Course Description and Objectives
Fast optimization algorithms that scale to massive datasets have powered the rapid progress in machine learning. We will learn the nuts and bolts of these algorithms and establish their theoretical foundations, with a particular focus on the challenges arising in modern large-scale ML. Some of the questions we will answer:
- what is convexity and why is it useful (convex optimization)
- why can we train using only mini-batches (stochastic optimization)
- why Adam is typically preferred over SGD (preconditioning and adaptivity; see the sketch after this list)
- how to train robust ML models (min-max optimization)
- how to explain feature learning in neural networks (non-convex optimization)
- how to train privately on distributed datasets (federated optimization)
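To give a flavor of the algorithms we will study, below is a minimal sketch (not course material) contrasting mini-batch SGD with Adam's adaptive, momentum-based update on a toy least-squares problem. The toy problem, function names, and hyperparameters are all illustrative assumptions, not part of the course.

```python
# Illustrative sketch only: mini-batch SGD vs. Adam on a toy least-squares problem.
# All names and hyperparameters are placeholders chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def minibatch_grad(w, batch_size=32):
    """Stochastic gradient of the least-squares loss on a random mini-batch."""
    idx = rng.choice(n, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch_size

def sgd(steps=500, lr=0.05):
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * minibatch_grad(w)          # plain stochastic gradient step
    return w

def adam(steps=500, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    w, m, v = np.zeros(d), np.zeros(d), np.zeros(d)
    for t in range(1, steps + 1):
        g = minibatch_grad(w)
        m = b1 * m + (1 - b1) * g            # first-moment (momentum) estimate
        v = b2 * v + (1 - b2) * g**2         # second-moment (adaptive scaling) estimate
        m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)   # bias correction
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)          # preconditioned step
    return w

print("SGD error: ", np.linalg.norm(sgd() - w_true))
print("Adam error:", np.linalg.norm(adam() - w_true))
```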
Objectives: The course will prepare you to view machine learning through the formalism of optimization. By the end of the course, you will be able to analyze and implement optimization algorithms. You will learn the wide range of algorithms used in modern ML and understand the tradeoffs each of them is designed to strike.
Prerequisites
While there are no official prerequisites, knowledge of Probability (at the level of MATH 505a), Linear Algebra, Multi-Variable Calculus (at the level of MATH 225), Analysis of Algorithms (at the level of CSCI 570), and Machine Learning (at the level of CSCI 567) is recommended.
Grading
- Final Exam (50%): Closed-book exam consisting of theoretical questions similar to the exercises. You may bring one cheat sheet (A4 paper, both sides may be used).
- Team Course Project (40%): The course includes a major project where you will present and reproduce a paper on optimization for machine learning in teams of 2-3. It has two graded components:
  - Presentation (15%): You will closely read a paper related to the course and give a 20-minute presentation. See this note by Yiling Chen for presentation advice and the evaluation criteria.
  - Report (25%): You will then reproduce the main experiment or theorem of the paper. For a theoretical paper, you will rederive and present the proofs of the main claims as you understand them. For an experimental paper, you will follow the ML reproducibility challenge and attempt to reproduce its main experiment.
- Discussion and participation in exercise sessions (10%): There will be a weekly exercise session with a mix of theoretical and practical Python exercises for each topic. Coming prepared and actively participating in these discussions counts for 10% of the grade.
Resources
We will mainly use the course lecture notes, adapted from those of Martin Jaggi and Nicolas Flammarion.
Additional resources which you will likely find useful:
- Bubeck, Sébastien. "Convex Optimization: Algorithms and Complexity." Foundations and Trends® in Machine Learning, 8.3-4 (2015): 231-357. [link]
- Bach, Francis. "Learning Theory from First Principles." MIT Press, 2024. [link]
- Google's Handbook on Practical Deep Learning. [link]
There are several related courses and materials that could serve as useful supplementary references.
Course Schedule
Week | Topics/Daily Activities | Recommended Reading | Deliverables |
---|---|---|---|
Week 1 | | Chapter 1 of lecture notes | Exercise 0 released |
Week 2 | | Chapter 2 of lecture notes | Exercise 1 released |
Week 3 | | | Exercise 2 released |
Week 4 | | | Exercise 3 released |
Week 5 | | | Exercise 4 released |
Week 6 | | | Exercise 5 released |
Week 7 | | | |
Week 8 | | | Exercise 7 released |
Material below will not be part of the exam unless explicitly stated otherwise. | | | |
Week 9 | | Bach and Chizat. "Gradient descent on infinitely wide neural networks: Global convergence and generalization." 2021 | |
Week 10 | | Kumar et al. "Fine-tuning can distort pretrained features and underperform out-of-distribution." ICLR 2022 | |
Week 11 | | Rafailov et al. "Direct preference optimization: Your language model is secretly a reward model." NeurIPS 2023 | |
Week 12 | | | |
Weeks 13-15 | | | |
Final | Final exam | | Project report due. Exam on university-scheduled date. |