Lecture 02

A Brief History of Deep Learning

Course: STAT 453 — Introduction to Deep Learning and Generative Models
Lecture: 02 — A Brief History of Deep Learning
Lecturer: Ben Lengerich
Notes prepared and improved by: Abel Zewdie


Reminders


Recap: What Is Machine Learning?

Formally, a computer program is said to learn from experience
( \mathcal{E} ) with respect to some task ( \mathcal{T} ) and performance measure ( \mathcal{P} ), if its performance at ( \mathcal{T} ), as measured by ( \mathcal{P} ), improves with experience ( \mathcal{E} ).

Definition of machine learning

Data Representation

Structured Data

Unstructured Data


Machine Learning Jargon


History of Machine Learning

Timeline of machine learning methods
Historical trends in neural networks

Artificial Neurons and Perceptrons

Neural computation models were first discussed in 1943 by McCulloch and Pitts.

McCulloch-Pitts neuron model
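
As a concrete illustration, here is a minimal Python sketch of a McCulloch-Pitts threshold unit; the AND/OR weights and thresholds below are illustrative choices, not taken from the lecture:

```python
def mcculloch_pitts(inputs, weights, threshold):
    """McCulloch-Pitts unit: fire (output 1) iff the weighted sum of
    binary inputs reaches the threshold; otherwise output 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Logical AND and OR realized as threshold units over binary inputs.
for x1 in (0, 1):
    for x2 in (0, 1):
        and_out = mcculloch_pitts((x1, x2), weights=(1, 1), threshold=2)
        or_out = mcculloch_pitts((x1, x2), weights=(1, 1), threshold=1)
        print(f"x=({x1},{x2})  AND={and_out}  OR={or_out}")
```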

Perceptrons

Perceptron model

Assume:

[ Y \sim \mathcal{N}(f(x), \sigma^2) ]

Then maximizing likelihood is equivalent to minimizing squared error:

[ \arg\min_w \sum_i \frac{1}{2}(y_i - f(x_i; w))^2 ]
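
To see the equivalence, write the per-example Gaussian log-density and drop the terms that do not depend on ( w ) (a short sketch under the assumption above):

[ \log p(y_i \mid x_i; w) = -\frac{1}{2\sigma^2}\bigl(y_i - f(x_i; w)\bigr)^2 - \frac{1}{2}\log\bigl(2\pi\sigma^2\bigr) ]

Summing over ( i ) and maximizing over ( w ) leaves only the squared-error term, so the maximum-likelihood and least-squares solutions coincide (up to the constant factor ( 1/\sigma^2 )).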

Weight update rule:

[ w_d \leftarrow w_d + \eta \sum_i (y_i - o_i)\, o_i(1 - o_i)\, x_d^{(i)} ]
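
A minimal NumPy sketch of this update, assuming ( o_i = \sigma(w^\top x_i) ) with a logistic (sigmoid) output unit (consistent with the ( o_i(1 - o_i) ) factor above); the toy data and names are illustrative, not from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_unit(X, y, eta=1.0, epochs=5000):
    """Batch gradient descent on squared error for a single sigmoid unit.

    Implements  w_d <- w_d + eta * sum_i (y_i - o_i) * o_i * (1 - o_i) * x_d^(i).
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        o = sigmoid(X @ w)              # outputs o_i for every example
        delta = (y - o) * o * (1 - o)   # per-example error signal
        w += eta * X.T @ delta          # sum over examples, one step per weight
    return w

# Toy, linearly separable data (logical AND) with a bias column of ones.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)
w = train_sigmoid_unit(X, y)
print(np.round(sigmoid(X @ w)))         # predictions should approach [0, 0, 0, 1]
```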


Can a Perceptron Represent XOR?

Answer: No.

Assume weights ( w_1, w_2 ) and a threshold ( \theta ) exist such that the perceptron outputs 1 exactly when ( w_1 x_1 + w_2 x_2 \geq \theta ). The four XOR cases then require:

[ \begin{aligned} (0,0) \mapsto 0: \quad & 0 < \theta \\ (1,0) \mapsto 1: \quad & w_1 \geq \theta \\ (0,1) \mapsto 1: \quad & w_2 \geq \theta \\ (1,1) \mapsto 0: \quad & w_1 + w_2 < \theta \end{aligned} ]

Adding the two middle inequalities gives ( w_1 + w_2 \geq 2\theta > \theta ) (since ( \theta > 0 ) from the first case), which contradicts the last inequality.

Conclusion: XOR is not linearly separable, so a single-layer perceptron cannot represent it.
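
By contrast, XOR becomes representable with one hidden layer. A minimal sketch with hand-chosen (illustrative) weights that composes OR and AND with threshold units:

```python
def step(z, threshold):
    """Threshold (step) unit: output 1 iff z >= threshold."""
    return 1 if z >= threshold else 0

def xor(x1, x2):
    """XOR as a two-layer network of threshold units:
    the hidden layer computes OR(x1, x2) and AND(x1, x2);
    the output fires when OR is on but AND is off."""
    h_or = step(x1 + x2, threshold=1)
    h_and = step(x1 + x2, threshold=2)
    return step(h_or - h_and, threshold=1)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"XOR({x1}, {x2}) = {xor(x1, x2)}")
```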


Backpropagation

Neural Networks as Computation Graphs

Neural networks can be viewed as compositions of functions represented as computation graphs.

Computation graph

Using the chain rule and working backward:

[ \frac{\partial f_n}{\partial x} = \sum_{i \in \pi(n)} \frac{\partial f_n}{\partial f_i} \frac{\partial f_i}{\partial x} ]

where ( \pi(n) ) denotes the set of parent nodes (direct inputs) of node ( n ) in the graph.
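
As a concrete (illustrative) instance of this backward pass, consider the tiny graph ( z = wx ), ( o = \sigma(z) ), ( L = \tfrac{1}{2}(y - o)^2 ); the values below are arbitrary:

```python
import math

# Tiny computation graph: z = w * x,  o = sigmoid(z),  L = 0.5 * (y - o)^2
x, w, y = 1.5, -0.8, 1.0

# Forward pass: evaluate each node of the graph.
z = w * x
o = 1.0 / (1.0 + math.exp(-z))
L = 0.5 * (y - o) ** 2

# Backward pass: apply the chain rule from the output node toward the inputs.
dL_do = -(y - o)          # dL/do
do_dz = o * (1 - o)       # d sigmoid(z) / dz
dL_dz = dL_do * do_dz     # chain rule through the sigmoid node
dL_dw = dL_dz * x         # dz/dw = x
dL_dx = dL_dz * w         # dz/dx = w

print(f"L = {L:.4f}, dL/dw = {dL_dw:.4f}, dL/dx = {dL_dx:.4f}")
```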


About the Term “Deep Learning”

“Representation learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification. Deep learning methods are representation-learning methods with multiple levels of representation.”

— LeCun, Bengio, & Hinton (2015)


Activation Functions

Activation functions
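
For reference, three common activation functions can be written in a few lines; the particular choices shown (sigmoid, tanh, ReLU) are an assumption about which ones the slide covers:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes inputs into (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Rectified linear unit: max(0, z), zero for negative inputs."""
    return np.maximum(0.0, z)

z = np.linspace(-3, 3, 7)
print(sigmoid(z), tanh(z), relu(z), sep="\n")
```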

Hardware

CPU vs GPU

CPU

GPU


Large-Scale Unsupervised Learning

From GPT-1 (2018) to GPT-4:


Open Directions