Lecture 01
Course Introduction, Introduction to DL
1. STAT 453 Overview
This course introduces the foundations of machine learning (ML) and deep learning (DL), with a focus on generative models.
Objectives of the Course
- Provide a solid understanding of how machine learning works at both theoretical and practical levels.
- Cover classical ML methods (e.g., regression, decision trees) and modern deep learning methods (CNNs, RNNs, transformers).
- Introduce generative modeling, which is about teaching machines not only to recognize patterns but also to create new data (e.g., generating images, music, or text).
- Prepare students for real-world applications: from healthcare and finance to natural language processing and creative AI.
- Develop practical skills with Python libraries (NumPy, Pandas, PyTorch, Hugging Face).
Learning Outcomes
By the end of this course, you should be able to:
- Understand how to frame problems as ML tasks (supervised, unsupervised, reinforcement).
- Implement deep learning models for vision, audio, and text.
- Build systems capable of image generation, style transfer, and text analysis.
- Apply ML in research and projects, from predictive modeling to generative AI.
2. Inspiration: Example Projects from Prior Years
Past student projects demonstrate the breadth and creativity of ML applications:
- Healthcare: Breast cancer detection using ultrasound, Chest X-ray diagnosis.
- Finance: Stock price prediction using LSTMs.
- Art and Music: Surrealist artwork with GANs, LSTM music generation, song generation.
- AI Tools: Sentiment analysis with BERT, Gradio apps with self-supervised learning.
These examples show how ML can be applied across domains: predicting outcomes, generating creative content, analyzing language, or discovering insights in data.
3. About the Instructor
Ben Lengerich focuses on contextualized models, interpretability, and biomedical AI.
- Key Research Themes:
- Contextualized Models: Adapting foundation models (like LLMs) for specific domains such as medicine.
- Interpretability and Modularity: Developing neural architectures that remain transparent (e.g., Neural Additive Models).
- Biomedical Applications: Discovering treatment effectiveness, modeling cancer biology, drug discovery.
- Representative Papers:
- Neural Additive Models (NeurIPS 2021).
- RAG-IM for interpretable zero-shot clinical predictions (NeurIPS 2024 Workshop).
- Automated discovery of heterogeneous treatment effectiveness (JBI 2022).
4. The AI Revolution
- AI is currently experiencing a transformative moment, particularly since 2023–2024 with the rise of Generative AI (ChatGPT, Stable Diffusion, Midjourney).
- Key trends:
- Foundation Models: Pretrained on huge datasets, adaptable to many tasks.
- Diffusion Models: Generating hyper-realistic images.
- Large Language Models (LLMs): Powering conversational agents, coding assistants, and multimodal systems.
- The course contextualizes learning within this rapidly evolving landscape, preparing you to use these tools and also understand the theory behind them.
5. Course Logistics
- Textbook: Machine Learning with PyTorch and Scikit-Learn (Raschka et al., 2022).
- Serves as a reference guide, not exam material.
- Grading Breakdown:
- Homework: 20% (weekly, practical).
- Midterm: 20% (in-class, open-notes).
- Final Exam: 30% (comprehensive, open-notes).
- Final Project: 30% (team-based, creative).
- Extra credit: up to +5% via lecture notes.
- Late Policy: –10% per day late, for up to 3 days; after that, submissions are not accepted.
- Academic Integrity:
- AI tools are allowed for assignments, but submitted work must reflect your own understanding.
- Plagiarism and misconduct lead to penalties.

6. What is Machine Learning?
Definition (Tom Mitchell, 1997)
A computer program is said to learn from experience E with respect to a task T and performance measure P if its performance on T, as measured by P, improves with experience E.
Example:
- T: Spam detection.
- E: Emails labeled as “spam” or “not spam.”
- P: Accuracy of predictions. If accuracy improves as the system sees more labeled emails, it is learning.
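To make the definition concrete, here is a minimal sketch (assuming scikit-learn, with synthetic data standing in for labeled emails): performance P, measured as held-out accuracy, improves as the model sees more labeled experience E.

```python
# Mitchell's definition in code: P (accuracy) improves with E (labeled data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# T: binary classification ("spam" vs. "not spam" stand-in, synthetic).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# E: increasing amounts of labeled experience.
for n in [50, 200, 800, len(X_train)]:
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    # P: accuracy on held-out data should improve as E grows.
    print(n, accuracy_score(y_test, model.predict(X_test)))
```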
7. Three Fundamental Questions in ML
- Representation – How do we describe the world?
- E.g., using features, probability distributions, or graphs.
- Can be deterministic or probabilistic.
- Inference – How do we answer questions using the model?
- E.g., computing probabilities using Bayes’ rule.
- Example: given symptoms, infer the probability of a disease (see the sketch below).
- Learning – How do we find the “best” model?
- Optimize parameters to maximize performance.
- Example: choosing the hypothesis that best fits training data.
Probabilistic View:
- Representation = joint distribution of all variables.
- Inference = computing conditional probabilities.
- Learning = constraining the hypothesis space for efficient optimization.
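The disease example can be computed directly. Here is a worked sketch of Bayes-rule inference, using made-up illustrative probabilities:

```python
# Inference via Bayes' rule: P(disease | symptom).
p_disease = 0.01                  # prior P(disease)
p_symptom_given_disease = 0.90    # likelihood P(symptom | disease)
p_symptom_given_healthy = 0.05    # false-positive rate P(symptom | healthy)

# Marginal P(symptom) by the law of total probability.
p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))

# Posterior P(disease | symptom) = P(symptom | disease) P(disease) / P(symptom).
posterior = p_symptom_given_disease * p_disease / p_symptom
print(f"P(disease | symptom) = {posterior:.3f}")  # ≈ 0.154
```

Note how a 90%-sensitive test still yields only a ~15% posterior here, because the prior P(disease) is small.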

8. Broad Categories of ML
(a) Supervised Learning
- Uses labeled data (X, Y).
- Goal: learn a function h: X → Y (see the sketch below).
- Examples:
- Regression: Predict house prices.
- Classification: Spam filtering, medical diagnosis.
- Performance: Measured by error rate, accuracy, MSE, etc.
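A minimal regression sketch (assuming scikit-learn, with synthetic house-price data): fit h: X → Y and measure performance with MSE on held-out examples.

```python
# Supervised learning: labeled pairs (X, y), learn h: X -> Y, evaluate.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(500, 3500, size=(200, 1))         # house size (sq ft), made up
y = 100 * X[:, 0] + rng.normal(0, 20_000, 200)    # price with noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
h = LinearRegression().fit(X_train, y_train)      # learn h: X -> Y
print("MSE:", mean_squared_error(y_test, h.predict(X_test)))
```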

(b) Unsupervised Learning
- Uses unlabeled data (X only).
- Goal: discover hidden structure.
- Examples:
- Dimensionality Reduction: PCA, autoencoders.
- Clustering: Group customers by shopping habits.
- Useful for exploration, compression, feature learning.
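A minimal unsupervised sketch (assuming scikit-learn): reduce synthetic 10-dimensional data to 2 dimensions with PCA, then cluster it, all without ever seeing labels.

```python
# Unsupervised learning: only X, no labels; discover hidden structure.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# Dimensionality reduction: project 10-D data down to 2-D.
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group points without any labels.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print(clusters[:10])
```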

(c) Reinforcement Learning
- Agent interacts with environment.
- Learns a policy π: S → A to maximize cumulative reward (see the toy sketch below).
- Examples:
- Game-playing agents (AlphaGo).
- Robotics control.
- Drug discovery.
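A toy sketch of tabular Q-learning on a hypothetical 5-state corridor (an illustrative environment, not an example from the lecture): the agent learns a policy π(s) = argmax_a Q[s, a] that walks right toward the reward.

```python
# Q-learning on a 5-state corridor: reward only at the right end.
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0                            # start at the left end
    while s != n_states - 1:         # episode ends at the rewarded state
        # Epsilon-greedy action selection.
        a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))  # the greedy policy should prefer "right" in every state
```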
(d) Semi-Supervised Learning
- Mix of labeled + unlabeled data.
- Example: Training with a few labeled medical images, plus many unlabeled ones.
(e) Self-Supervised Learning
- Labels are generated from the data itself.
- Example: Predicting missing words in a sentence (basis of modern LLM training).
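A quick sketch of self-supervision using the Hugging Face transformers fill-mask pipeline (downloads pretrained BERT weights on first run): the training "label" is simply the masked word, derived from the text itself.

```python
# Self-supervised learning in action: predict a masked word.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Deep learning is [MASK]."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```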
9. The Supervised Learning Workflow
- Collect Data – Gather examples (features + labels).
- Preprocess and Feature Extraction – Convert raw data into meaningful features.
- Split Data – Into training, validation, and test sets.
- Train Model – Learn parameters from training data.
- Evaluate Model – Use test set to measure generalization.
- Deploy and Infer – Apply model to unseen data.
Golden Rule: Never train and test on the same data → reusing training data inflates performance estimates and hides overfitting.
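The whole workflow, sketched end-to-end with scikit-learn on a built-in dataset (a minimal illustration, not the course's required setup):

```python
# Supervised learning workflow: collect, split, preprocess, train, evaluate, infer.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# 1. Collect data (features X, labels y).
X, y = load_breast_cancer(return_X_y=True)

# 3. Split BEFORE any fitting: the test set stays untouched (golden rule).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 2. + 4. Preprocess and train in one pipeline, fit on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
model.fit(X_train, y_train)

# 5. Evaluate generalization on the held-out test set.
print("test accuracy:", model.score(X_test, y_test))

# 6. Deploy / infer: apply the model to new, unseen examples.
print("prediction:", model.predict(X_test[:1]))
```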
10. Machine Learning vs Deep Learning
- Traditional ML:
- Works best on structured/tabular data (databases, spreadsheets).
- Requires manual feature engineering.
- Examples: Logistic regression, Random Forests.
- Deep Learning:
- Excels on unstructured data (images, text, audio).
- Learns representations automatically through layers of abstraction.
- Examples: CNNs, RNNs, Transformers.
11. Structured vs Unstructured Data
- Structured Data:
- Well-organized, rows and columns.
- Example: Customer transactions, census data.
- Unstructured Data:
- Raw and complex formats.
- Example: Images (pixel arrays), audio (waveforms), text (sequences).
- Why DL matters: Deep learning is designed to handle the complexity of unstructured data.
12. Core ML Jargon
- Training a Model: Fitting parameters using data.
- Training Example: One input-output pair (x, y).
- Feature: Input variable (predictor, attribute, covariate).
- Target: The label/output to predict (ground truth).
- Prediction: Model’s output for a given input.
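The same jargon, mapped onto a tiny synthetic example (made-up numbers):

```python
# Core ML jargon in code.
import numpy as np

X = np.array([[1200, 3], [1800, 4]])  # features: size (sq ft), bedrooms
y = np.array([250_000, 340_000])      # targets: ground-truth prices
# (X[0], y[0]) is one training example: an input-output pair (x, y).

w = np.array([150.0, 20_000.0])       # parameters, fit when training a model
y_hat = X @ w                         # predictions: model output per input
print(y_hat)                          # [240000. 350000.]
```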
13. Scientific Python Ecosystem
- NumPy: Numerical computations.
- Pandas: Data manipulation and tabular handling.
- Scikit-Learn: Classical ML algorithms.
- PyTorch: Deep learning framework.
- Matplotlib/Seaborn: Visualization.
- Hugging Face Transformers: Pre-trained models for NLP and beyond.
14. Further Resources
- Blog: Intro to Deep Learning – by Sebastian Raschka.
- STAT451 lecture notes: GitHub Repo.
- Book: Python Machine Learning (Raschka, 2019, 3rd Ed.).