Lecture 01

Course Introduction, Introduction to DL

1. STAT 453 Overview

This course introduces the foundations of machine learning (ML) and deep learning (DL), with a focus on generative models.

Objectives of the Course

Learning Outcomes

By the end of this course, you should be able to:


2. Inspiration: Example Projects from Prior Years

Past student projects demonstrate the breadth and creativity of ML applications:

These examples show how ML can be applied across domains: predicting outcomes, generating creative content, analyzing language, or discovering insights in data.


3. About the Instructor

Ben Lengerich focuses on contextualized models, interpretability, and biomedical AI.


4. The AI Revolution


5. Course Logistics


6. What is Machine Learning?

Definition (Tom Mitchell, 1997)

A computer program is said to learn from experience E with respect to a task T and a performance measure P if its performance on T, as measured by P, improves with experience E.

Example:


7. Three Fundamental Questions in ML

  1. Representation – How do we describe the world?

    • E.g., using features, probability distributions, or graphs.
    • Can be deterministic or probabilistic.
  2. Inference – How do we answer questions using the model?

    • E.g., computing probabilities using Bayes’ rule.
    • Example: given symptoms, infer the probability of a disease.
  3. Learning – How do we find the “best” model?

    • Optimize parameters to maximize a performance measure (equivalently, to minimize a loss).
    • Example: choosing the hypothesis that best fits training data.
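As a rough illustration of the inference question, the sketch below applies Bayes' rule to the symptom/disease example. The numbers (a 1% prior, a 90% chance of the symptom given the disease, a 5% chance given no disease) and the function name `posterior` are made up for illustration and do not come from the lecture.

```python
def posterior(prior, p_symptom_given_disease, p_symptom_given_healthy):
    """Return P(disease | symptom) using Bayes' rule."""
    # Evidence: total probability of observing the symptom,
    # summed over the two cases (disease, no disease).
    evidence = (p_symptom_given_disease * prior
                + p_symptom_given_healthy * (1.0 - prior))
    # Bayes' rule: likelihood * prior / evidence.
    return p_symptom_given_disease * prior / evidence


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
p = posterior(prior=0.01,
              p_symptom_given_disease=0.9,
              p_symptom_given_healthy=0.05)
print(f"P(disease | symptom) = {p:.3f}")
```

With these assumed numbers the posterior is 0.009 / 0.0585, or about 0.154: even a fairly reliable symptom leaves the disease unlikely when the prior is low.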

Probabilistic View:


8. Broad Categories of ML

(a) Supervised Learning

(b) Unsupervised Learning

(c) Reinforcement Learning

(d) Semi-Supervised Learning

(e) Self-Supervised Learning


9. The Supervised Learning Workflow

  1. Collect Data – Gather examples (features + labels).
  2. Preprocess and Feature Extraction – Convert raw data into meaningful features.
  3. Split Data – Into disjoint training, validation, and test sets (the validation set is used for tuning and model selection).
  4. Train Model – Learn parameters from training data.
  5. Evaluate Model – Use test set to measure generalization.
  6. Deploy and Infer – Apply model to unseen data.

Golden Rule: Never evaluate a model on the data it was trained on. Performance on training data is optimistically biased and can hide overfitting.
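The workflow can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the toy one-feature dataset, the 60/20/20 split, and the midpoint-threshold classifier are not taken from the lecture, and a real project would more likely use a library from the scientific Python ecosystem.

```python
import random

# Step 1: collect data. Hypothetical toy dataset: one feature, binary label.
rng = random.Random(0)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
        + [(rng.gauss(3.0, 1.0), 1) for _ in range(100)])

# Step 2: preprocessing is skipped; the feature is already numeric.

# Step 3: shuffle, then split into disjoint train/validation/test sets.
rng.shuffle(data)
train, val, test = data[:120], data[120:160], data[160:]


def class_mean(examples, label):
    """Mean feature value of the examples carrying the given label."""
    values = [x for x, y in examples if y == label]
    return sum(values) / len(values)


def accuracy(threshold, examples):
    """Fraction of examples where the rule 'x > threshold' matches the label."""
    return sum((x > threshold) == bool(y) for x, y in examples) / len(examples)


# Step 4: train. The only learned parameter is a threshold halfway
# between the two class means, estimated from the training set alone.
threshold = (class_mean(train, 0) + class_mean(train, 1)) / 2

# Step 5: evaluate. Validation accuracy guides modelling choices;
# the test set is touched once, at the end, to estimate generalization.
val_acc = accuracy(threshold, val)
test_acc = accuracy(threshold, test)
print(f"validation accuracy: {val_acc:.2f}, test accuracy: {test_acc:.2f}")

# Step 6: deploy and infer on a new, unseen input.
prediction = int(2.5 > threshold)
```

The point of the split is that `threshold` never sees `val` or `test`, so the reported test accuracy is an honest estimate of performance on unseen data.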


10. Machine Learning vs Deep Learning


11. Structured vs Unstructured Data


12. Core ML Jargon


13. Scientific Python Ecosystem


14. Further Resources