Lecture 01
Course Introduction, Introduction to DL
1. STAT 453 Overview
This course introduces the foundations of machine learning (ML) and deep learning (DL), with a focus on generative models.
Objectives of the Course
- Provide a solid understanding of how machine learning works at both theoretical and practical levels.
- Cover classical ML methods (e.g., regression, decision trees) and modern deep learning methods (CNNs, RNNs, transformers).
- Introduce generative modeling, which is about teaching machines not only to recognize patterns but also to create new data (e.g., generating images, music, or text).
- Prepare students for real-world applications: from healthcare and finance to natural language processing and creative AI.
- Develop practical skills with Python libraries (NumPy, Pandas, PyTorch, Hugging Face).
Learning Outcomes
By the end of this course, you should be able to:
- Understand how to frame problems as ML tasks (supervised, unsupervised, reinforcement).
- Implement deep learning models for vision, audio, and text.
- Build systems capable of image generation, style transfer, and text analysis.
- Apply ML in research and projects, from predictive modeling to generative AI.
2. Inspiration: Example Projects from Prior Years
Past student projects demonstrate the breadth and creativity of ML applications:
- Healthcare: Breast cancer detection using ultrasound, Chest X-ray diagnosis.
- Finance: Stock price prediction using LSTMs.
- Art and Music: Surrealist artwork with GANs, LSTM music generation, song generation.
- AI Tools: Sentiment analysis with BERT, Gradio apps with self-supervised learning.
These examples show how ML can be applied across domains: predicting outcomes, generating creative content, analyzing language, or discovering insights in data.
3. About the Instructor
Ben Lengerich focuses on contextualized models, interpretability, and biomedical AI.
- Key Research Themes:
- Contextualized Models: Adapting foundation models (like LLMs) for specific domains such as medicine.
- Interpretability and Modularity: Developing neural architectures that remain transparent (e.g., Neural Additive Models).
- Biomedical Applications: Discovering treatment effectiveness, modeling cancer biology, drug discovery.
- Representative Papers:
- Neural Additive Models (NeurIPS 2021).
- RAG-IM for interpretable zero-shot clinical predictions (NeurIPS 2024 Workshop).
- Automated discovery of heterogeneous treatment effectiveness (JBI 2022).
4. The AI Revolution
- AI is currently experiencing a transformative moment, particularly since 2023–2024 with the rise of Generative AI (ChatGPT, Stable Diffusion, Midjourney).
- Key trends:
- Foundation Models: Pretrained on huge datasets, adaptable to many tasks.
- Diffusion Models: Generating hyper-realistic images.
- Large Language Models (LLMs): Powering conversational agents, coding assistants, and multimodal systems.
- The course contextualizes learning within this rapidly evolving landscape, preparing you to use these tools and also understand the theory behind them.
5. Course Logistics
- Textbook: Machine Learning with PyTorch and Scikit-Learn (Raschka et al., 2022).
- Serves as a reference guide, not exam material.
- Grading Breakdown:
- Homework: 20% (weekly, practical).
- Midterm: 20% (in-class, open-notes).
- Final Exam: 30% (comprehensive, open-notes).
- Final Project: 30% (team-based, creative).
- Extra credit: up to +5% via lecture notes.
- Late Policy: –10% per day late, for up to 3 days; after that, submissions are not accepted.
- Academic Integrity:
- AI tools are allowed for assignments, but submitted work must reflect your own understanding.
- Plagiarism and misconduct lead to penalties.

6. What is Machine Learning?
Definition (Tom Mitchell, 1997)
A computer program is said to learn from experience E with respect to a task T and performance measure P if its performance on T, as measured by P, improves with experience E.
Example:
- T: Spam detection.
- E: Emails labeled as “spam” or “not spam.”
- P: Accuracy of predictions. If accuracy improves as the system sees more labeled emails, it is learning.
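To make the definition concrete, here is a minimal sketch (assuming scikit-learn, with synthetic data standing in for labeled emails): performance P, measured as held-out accuracy, improves as the model sees more labeled experience E.

```python
# Mitchell's definition in code: P (accuracy) improves with E (labeled data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# T: binary classification ("spam" vs. "not spam" stand-in, synthetic).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# E: increasing amounts of labeled experience.
for n in [50, 200, 800, len(X_train)]:
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    # P: accuracy on held-out data should improve as E grows.
    print(n, accuracy_score(y_test, model.predict(X_test)))
```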
7. Three Fundamental Questions in ML
- Representation – How do we describe the world?
- E.g., using features, probability distributions, or graphs.
- Can be deterministic or probabilistic.
- Inference – How do we answer questions using the model?
- E.g., computing probabilities using Bayes’ rule.
- Example: given symptoms, infer the probability of a disease (see the sketch below).
- Learning – How do we find the “best” model?
- Optimize parameters to maximize performance.
- Example: choosing the hypothesis that best fits training data.
Probabilistic View:
- Representation = joint distribution of all variables.
- Inference = computing conditional probabilities.
- Learning = constraining the hypothesis space for efficient optimization.
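The disease example can be computed directly. Here is a worked sketch of Bayes-rule inference, using made-up illustrative probabilities:

```python
# Inference via Bayes' rule: P(disease | symptom).
p_disease = 0.01                  # prior P(disease)
p_symptom_given_disease = 0.90    # likelihood P(symptom | disease)
p_symptom_given_healthy = 0.05    # false-positive rate P(symptom | healthy)

# Marginal P(symptom) by the law of total probability.
p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))

# Posterior P(disease | symptom) = P(symptom | disease) P(disease) / P(symptom).
posterior = p_symptom_given_disease * p_disease / p_symptom
print(f"P(disease | symptom) = {posterior:.3f}")  # ≈ 0.154
```

Note how a 90%-sensitive test still yields only a ~15% posterior here, because the prior P(disease) is small.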

8. Broad Categories of ML
(a) Supervised Learning
- Uses labeled data (X, Y).
- Goal: learn a function h: X → Y (see the sketch below).
- Examples:
- Regression: Predict house prices.
- Classification: Spam filtering, medical diagnosis.
- Performance: Measured by error rate, accuracy, MSE, etc.
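A minimal regression sketch (assuming scikit-learn, with synthetic house-price data): fit h: X → Y and measure performance with MSE on held-out examples.

```python
# Supervised learning: labeled pairs (X, y), learn h: X -> Y, evaluate.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(500, 3500, size=(200, 1))         # house size (sq ft), made up
y = 100 * X[:, 0] + rng.normal(0, 20_000, 200)    # price with noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
h = LinearRegression().fit(X_train, y_train)      # learn h: X -> Y
print("MSE:", mean_squared_error(y_test, h.predict(X_test)))
```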

(b) Unsupervised Learning
- Uses unlabeled data (X only).
- Goal: discover hidden structure.
- Examples:
- Dimensionality Reduction: PCA, autoencoders.
- Clustering: Group customers by shopping habits.
- Useful for exploration, compression, feature learning.
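A minimal unsupervised sketch (assuming scikit-learn): reduce synthetic 10-dimensional data to 2 dimensions with PCA, then cluster it, all without ever seeing labels.

```python
# Unsupervised learning: only X, no labels; discover hidden structure.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# Dimensionality reduction: project 10-D data down to 2-D.
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group points without any labels.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print(clusters[:10])
```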

(c) Reinforcement Learning
- Agent interacts with environment.
- Learns a policy π: S → A to maximize cumulative reward (see the toy sketch below).
- Examples:
- Game-playing agents (AlphaGo).
- Robotics control.
- Drug discovery.
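A toy sketch of tabular Q-learning on a hypothetical 5-state corridor (an illustrative environment, not an example from the lecture): the agent learns a policy π(s) = argmax_a Q[s, a] that walks right toward the reward.

```python
# Q-learning on a 5-state corridor: reward only at the right end.
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0                            # start at the left end
    while s != n_states - 1:         # episode ends at the rewarded state
        # Epsilon-greedy action selection.
        a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))  # the greedy policy should prefer "right" in every state
```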
(d) Semi-Supervised Learning
- Mix of labeled + unlabeled data.
- Example: Training with a few labeled medical images, plus many unlabeled ones.
(e) Self-Supervised Learning
- Labels are generated from the data itself.
- Example: Predicting missing words in a sentence (basis of modern LLM training).
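A quick sketch of self-supervision using the Hugging Face transformers fill-mask pipeline (downloads pretrained BERT weights on first run): the training "label" is simply the masked word, derived from the text itself.

```python
# Self-supervised learning in action: predict a masked word.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Deep learning is [MASK]."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```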
9. The Supervised Learning Workflow
- Collect Data – Gather examples (features + labels).
- Preprocess and Feature Extraction – Convert raw data into meaningful features.
- Split Data – Into training, validation, and test sets.
- Train Model – Learn parameters from training data.
- Evaluate Model – Use test set to measure generalization.
- Deploy and Infer – Apply model to unseen data.
Golden Rule: Never train and test on the same data → reusing training data inflates performance estimates and hides overfitting.
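The whole workflow, sketched end-to-end with scikit-learn on a built-in dataset (a minimal illustration, not the course's required setup):

```python
# Supervised learning workflow: collect, split, preprocess, train, evaluate, infer.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# 1. Collect data (features X, labels y).
X, y = load_breast_cancer(return_X_y=True)

# 3. Split BEFORE any fitting: the test set stays untouched (golden rule).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 2. + 4. Preprocess and train in one pipeline, fit on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
model.fit(X_train, y_train)

# 5. Evaluate generalization on the held-out test set.
print("test accuracy:", model.score(X_test, y_test))

# 6. Deploy / infer: apply the model to new, unseen examples.
print("prediction:", model.predict(X_test[:1]))
```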
10. Machine Learning vs Deep Learning
- Traditional ML:
- Works best on structured/tabular data (databases, spreadsheets).
- Requires manual feature engineering.
- Examples: Logistic regression, Random Forests.
- Deep Learning:
- Excels on unstructured data (images, text, audio).
- Learns representations automatically through layers of abstraction.
- Examples: CNNs, RNNs, Transformers.
11. Structured vs Unstructured Data
- Structured Data:
- Well-organized, rows and columns.
- Example: Customer transactions, census data.
- Unstructured Data:
- Raw and complex formats.
- Example: Images (pixel arrays), audio (waveforms), text (sequences).
- Why DL matters: Deep learning is designed to handle the complexity of unstructured data.
12. Core ML Jargon
- Training a Model: Fitting parameters using data.
- Training Example: One input-output pair (x, y).
- Feature: Input variable (predictor, attribute, covariate).
- Target: The label/output to predict (ground truth).
- Prediction: Model’s output for a given input.
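The same jargon, mapped onto a tiny synthetic example (made-up numbers):

```python
# Core ML jargon in code.
import numpy as np

X = np.array([[1200, 3], [1800, 4]])  # features: size (sq ft), bedrooms
y = np.array([250_000, 340_000])      # targets: ground-truth prices
# (X[0], y[0]) is one training example: an input-output pair (x, y).

w = np.array([150.0, 20_000.0])       # parameters, fit when training a model
y_hat = X @ w                         # predictions: model output per input
print(y_hat)                          # [240000. 350000.]
```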
13. Scientific Python Ecosystem
- NumPy: Numerical computations.
- Pandas: Data manipulation and tabular handling.
- Scikit-Learn: Classical ML algorithms.
- PyTorch: Deep learning framework.
- Matplotlib/Seaborn: Visualization.
- Hugging Face Transformers: Pre-trained models for NLP and beyond.
14. Further Resources
- Blog: Intro to Deep Learning – by Sebastian Raschka.
- STAT451 lecture notes: GitHub Repo.
- Book: Python Machine Learning (Raschka, 2019, 3rd Ed.).