Lecture 12

Improving Optimization

Today’s Topics:

1. Midterm
2. Learning Rate Decay
3. Learning Rate Schedulers in PyTorch
4. Training with “Momentum”
5. ADAM: Adaptive Learning Rates
6. Optimization Algorithms in PyTorch


1. Midterm

Content:


2. Learning Rate Decay:

Figure 1. Visualization shown during Lecture 12.

Common Forms of Learning Rate Decay

| Type | Formula | Notes |
| --- | --- | --- |
| Exponential decay | $\eta_t = \eta_0 e^{-k t}$ | Smooth, continuous decay; $k$ is the decay-rate hyperparameter |
| Step (halving) | $\eta_t = \eta_{t-1}/2$ every $T_0$ epochs | Simple; common in classification |
| Inverse decay | $\eta_t = \frac{\eta_0}{1 + k t}$ | Slower decay, often more stable |

⚖️ Key idea: Decay reduces update magnitude so the model “settles” near the minimum instead of bouncing around.
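
As a concrete illustration, here is a minimal sketch of the three decay rules from the table as plain Python functions. The names `eta0`, `k`, and `T0` mirror the symbols in the formulas; the numeric values in the example loop are made up.

```python
import math

def exponential_decay(eta0, k, t):
    """eta_t = eta0 * exp(-k * t)"""
    return eta0 * math.exp(-k * t)

def step_halving(eta0, T0, t):
    """Halve the learning rate every T0 epochs."""
    return eta0 * 0.5 ** (t // T0)

def inverse_decay(eta0, k, t):
    """eta_t = eta0 / (1 + k * t)"""
    return eta0 / (1 + k * t)

# Example: starting from eta0 = 0.1, compare the schedules over 10 epochs.
for t in range(10):
    print(t,
          round(exponential_decay(0.1, 0.1, t), 4),
          round(step_halving(0.1, 3, t), 4),
          round(inverse_decay(0.1, 0.1, t), 4))
```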



3. Learning Rate Schedulers in PyTorch:
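
PyTorch ships schedulers in `torch.optim.lr_scheduler` that implement decay rules like the ones above. The snippet below is a minimal sketch with a toy linear model, not the exact code from lecture; `StepLR` multiplies the learning rate by `gamma` every `step_size` epochs, and the commented-out `ExponentialLR` line gives exponential decay instead.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Step decay: multiply the LR by gamma every step_size epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

# Exponential decay instead: lr_t = lr_0 * gamma ** t
# scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(30):
    # ... one epoch of training would go here (forward, backward, per-batch steps) ...
    optimizer.step()                      # placeholder for the per-batch updates
    scheduler.step()                      # advance the schedule once per epoch
    print(epoch, scheduler.get_last_lr())
```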


4. Training with “Momentum”:

Figure 1. Visualization shown during Lecture 12.
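
Momentum keeps a running “velocity” that accumulates recent gradients, so updates keep moving along directions that are consistently downhill while oscillations are damped. Below is a minimal sketch of the PyTorch-style momentum update, $v \leftarrow \mu v + g$ followed by $\theta \leftarrow \theta - \eta v$; the function name and default values are illustrative.

```python
import torch

def sgd_momentum_step(param, grad, velocity, lr=0.01, mu=0.9):
    """One in-place SGD-with-momentum update on a single parameter tensor."""
    velocity.mul_(mu).add_(grad)   # v <- mu * v + g  (running velocity)
    param.sub_(lr * velocity)      # theta <- theta - lr * v
    return param, velocity

# This is essentially what torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# computes with its default settings.
```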

5. ADAM: Adaptive Learning Rates
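
Adam keeps running estimates of the gradient’s first moment (mean) and second moment (uncentered variance) and uses them to scale each parameter’s step individually, combining momentum-like smoothing with per-parameter adaptive learning rates. The sketch below follows the standard Adam update for a single parameter tensor; the variable names are illustrative, the defaults are the usual published values, and in practice you would simply use `torch.optim.Adam`.

```python
import torch

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One in-place Adam update; t is the 1-based step count."""
    m.mul_(beta1).add_((1 - beta1) * grad)           # first-moment (mean) estimate
    v.mul_(beta2).add_((1 - beta2) * grad * grad)    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                     # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param.sub_(lr * m_hat / (v_hat.sqrt() + eps))    # per-parameter adaptive step
    return param, m, v

# Built-in equivalent: torch.optim.Adam(model.parameters(), lr=1e-3)
```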


6. Optimization Algorithms in PyTorch
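
As a rough sketch (not the lecture’s exact code), the loop below shows how the optimizers discussed today plug into a standard PyTorch training loop: only the constructor call changes. The model, loss, and data here are dummies for illustration.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)                  # plain SGD
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # SGD + momentum
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)               # Adam

x, y = torch.randn(32, 10), torch.randn(32, 1)   # dummy batch
for step in range(100):
    optimizer.zero_grad()         # clear old gradients
    loss = loss_fn(model(x), y)   # forward pass
    loss.backward()               # backward pass: compute gradients
    optimizer.step()              # apply the chosen update rule
```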