  • Bias Term

  • Backpropagation

  • Vanishing and Exploding Gradients

  • NN Weight Initialization

  • What are the differences between a DNN and Logistic Regression?

  • Hyperparameter Tuning (Random Search, Grid Search)

  • Preventing Overfitting

  • Dropout

  • Batch Norm and Layer Norm

  • Learning Rate

  • Plateau and Saddle Point

  • Transfer Learning

  • Activation Functions (sigmoid, tanh, ReLU, leaky ReLU, maxout, ELU)

  • Why Non-Linear Activation Functions?

  • Optimizers (SGD, RMSprop, Momentum, Adagrad, Adam, AdamW)

  • Batch GD and SGD

  • Original Self-Attention (a minimal sketch follows this list)
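
For the last item, here is a minimal single-head sketch of the original scaled dot-product self-attention from "Attention Is All You Need", Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V. The projection matrices and example sizes are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a sequence.

    X: (seq_len, d_model); W_q, W_k, W_v: (d_model, d_k) projections.
    Returns: (seq_len, d_k) context vectors.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V

# Tiny usage example with random weights (hypothetical shapes).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))             # 5 tokens, d_model = 16
W_q, W_k, W_v = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)   # (5, 8)
```

This sketch omits masking and multi-head splitting; the paper's multi-head variant runs several such attentions in parallel on lower-dimensional projections and concatenates the results.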