CSCI 599: Optimization for Machine Learning

Instructor: Sai Praneeth Karimireddy (karimire@usc.edu)    Location: DMC 101    Time: Mon, Wed 4:00 pm to 5:50 pm

Course Description and Objectives

Fast optimization algorithms that scale to massive datasets have been powering the rapid progress in machine learning. We will learn the nuts and bolts of these algorithms and establish their theoretical foundations, with a particular focus on the challenges arising in modern large-scale ML. Some of the questions we will answer:

Note: this course is heavy on theory, though plenty of practical parts are mixed in.

Objectives: The course will prepare you to view machine learning through the formalism of optimization. By the end of the course, you will be able to analyze optimization algorithms and implement them. You will learn the plethora of algorithms used in modern ML and understand the tradeoffs each of them is designed to navigate.

Prerequisites

While there are no official prerequisites, knowledge of Probability (at the level of MATH 505a), Linear Algebra, Multi-Variable Calculus (at the level of MATH 225), Analysis of Algorithms (at the level of CSCI 570), and Machine Learning (at the level of CSCI 567) is recommended.

Grading

  1. Final Exam (50%): Closed-book exam consisting of theoretical questions similar to the exercises. You are allowed to bring one cheat sheet (A4 paper; both sides may be used).
  2. Team Course Project (40%): The course includes a major project where you will present and reproduce a paper on optimization for machine learning in teams of 2-3.
    • Presentation (15%): You will closely read a paper related to the course and give a 20-minute presentation. See this note for advice from Yiling Chen and the evaluation criteria.
    • Report (25%): You will then reproduce the main experiment/theorem of the paper. For a theoretical paper, you will rederive and present the proofs of the main claims as you understand them. For an experimental paper, you will follow the ML Reproducibility Challenge format and attempt to reproduce the paper's main experiment.
  3. Discussion and participation in exercise sessions (10%): There will be a weekly exercise session consisting of a mix of theoretical and practical Python exercises for each corresponding topic. Coming prepared and actively participating in the exercise session discussions counts for 10% of the grade.

Resources

We will mainly use the course lecture notes, adapted from those of Martin Jaggi and Nicolas Flammarion.

Additional resources which you will likely find useful:

There are several related courses and materials which could be useful supplementary references:

Course Schedule

Each week below lists the topics and daily activities, the recommended reading, and the deliverables.
Week 1
  • Lecture:
    • Course overview and objectives
    • Convex sets and convex functions
    • Fenchel conjugate and optimality conditions
  • Exercise Session:
    • Getting started with numpy programming (see the illustrative sketch below)
Reading: Chapter 1 of lecture notes. Deliverable: Exercise 0 released.
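Illustrative sketch (not part of the official course material): in the spirit of the numpy warm-up, the snippet below numerically spot-checks the convexity inequality f(tx + (1-t)y) <= t f(x) + (1-t) f(y) on random segments. The choice of log-sum-exp as the test function, the dimensions, and the tolerance are my own.

```python
# Numerically spot-check convexity of f(x) = log(sum(exp(x))) along random segments.
import numpy as np

def f(x):
    return np.log(np.sum(np.exp(x)))

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    t = rng.uniform()
    # Convexity: the function value on the segment lies below the chord.
    assert f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-9
print("No convexity violations found on 1000 random segments.")
```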
Week 2
  • Lecture:
    • Smoothness and Strong Convexity
    • Gradient descent and its convergence properties (see the sketch below)
  • Exercise Session:
    • Theoretical problems on properties of convex sets and functions
Reading: Chapter 2 of lecture notes. Deliverable: Exercise 1 released.
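Illustrative sketch (not from the lecture notes): for an L-smooth, mu-strongly convex function, gradient descent with step size 1/L contracts the suboptimality by a factor of at most (1 - mu/L) per step. The random positive-definite quadratic, the step size, and the iteration count below are my own choices made only to exhibit this bound.

```python
# Verify f(x_{t+1}) - f* <= (1 - mu/L) (f(x_t) - f*) for GD with step size 1/L
# on a quadratic f(x) = 0.5 * x^T A x (so x* = 0 and f* = 0).
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(10, 10))
A = M @ M.T + np.eye(10)           # symmetric positive definite
eigs = np.linalg.eigvalsh(A)
mu, L = eigs.min(), eigs.max()     # strong convexity / smoothness constants

f = lambda x: 0.5 * x @ A @ x
x = rng.normal(size=10)
for _ in range(50):
    f_old = f(x)
    x = x - (1.0 / L) * (A @ x)    # gradient step with step size 1/L
    assert f(x) <= (1 - mu / L) * f_old + 1e-9
print("Observed per-step decrease is within the (1 - mu/L) contraction bound.")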
Week 3
  • Lecture:
    • Subgradient descent
    • Proximal Gradient Descent (see the sketch below)
  • Exercise Session:
    • Implementing gradient descent
    • Theoretical problems from chapter 2
Deliverable: Exercise 2 released.
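Illustrative sketch (not the official exercise): a minimal version of proximal gradient descent (ISTA) on the Lasso objective 0.5 * ||Ax - b||^2 + lam * ||x||_1, also previewing the Week 4 Lasso exercise. The data dimensions, noise level, and regularization strength lam are arbitrary choices of mine.

```python
# ISTA: gradient step on the smooth part, then the soft-thresholding prox of the l1 norm.
import numpy as np

def soft_threshold(v, tau):
    # prox of tau * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(2)
A = rng.normal(size=(100, 20))
x_true = np.zeros(20); x_true[:3] = [2.0, -1.0, 0.5]   # sparse ground truth
b = A @ x_true + 0.01 * rng.normal(size=100)
lam = 0.5
eta = 1.0 / np.linalg.norm(A, 2) ** 2                  # 1/L with L = ||A||_2^2

x = np.zeros(20)
for _ in range(500):
    grad = A.T @ (A @ x - b)                           # gradient of the smooth part
    x = soft_threshold(x - eta * grad, eta * lam)      # proximal gradient step
print("indices of nonzero coordinates:", np.flatnonzero(np.abs(x) > 1e-3))
```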
Week 4
  • Lecture:
    • Stochastic Gradient Descent (SGD); see the sketch below
    • Non-convexity and local optimality
    • Convergence to critical points
  • Exercise Session:
    • Optimizing Lasso
    • Theoretical problems from chapters 3 & 4
Deliverable: Exercise 3 released.
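Illustrative sketch (not from the exercise sheet): mini-batch SGD on a least-squares problem. The problem sizes, batch size, and step size below are arbitrary illustrative choices.

```python
# Mini-batch SGD for min_x (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
import numpy as np

rng = np.random.default_rng(3)
n, d = 1000, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

x, batch, eta = np.zeros(d), 32, 0.01
for _ in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch    # stochastic gradient estimate
    x -= eta * grad
print("distance to x_true:", np.linalg.norm(x - x_true))
```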
Week 5
  • Lecture:
    • Nesterov Acceleration (see the sketch below)
    • Momentum as smoothing non-convexity
  • Exercise Session:
    • Comparing mini-batch and full batch methods
Deliverable: Exercise 4 released.
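Illustrative sketch (not part of the official materials): plain gradient descent versus the constant-momentum variant of Nesterov's method on an ill-conditioned quadratic. The diagonal test problem, condition number, and iteration budget are my own choices; the step size 1/L and momentum (sqrt(kappa) - 1) / (sqrt(kappa) + 1) are the standard textbook parameters for the strongly convex setting.

```python
# Compare GD and Nesterov's accelerated method on f(x) = 0.5 * x^T A x with kappa = 1000.
import numpy as np

d = 50
eigs = np.linspace(1.0, 1000.0, d)
A = np.diag(eigs)
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
mu, L = eigs.min(), eigs.max()
kappa = L / mu
beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)   # momentum coefficient

x_gd = np.ones(d)
x, x_prev = np.ones(d), np.ones(d)
for _ in range(300):
    x_gd = x_gd - (1 / L) * grad(x_gd)               # plain gradient descent
    y = x + beta * (x - x_prev)                      # momentum (look-ahead) point
    x_prev, x = x, y - (1 / L) * grad(y)             # gradient step at the look-ahead point
print(f"after 300 steps: GD f = {f(x_gd):.3e}, Nesterov f = {f(x):.3e}")
```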
Week 6
  • Lecture:
    • Newton’s method
    • Adaptive preconditioning: AdaGrad, Adam (see the sketch below)
    • (L0, L1)-smoothness and clipped SGD
  • Exercise Session:
    • Understanding momentum methods in practice
Deliverable: Exercise 5 released.
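Illustrative sketch: the Adam update written out by hand, to make the adaptive-preconditioning idea concrete. The hyperparameters are the commonly used defaults, and the toy quadratic objective is an arbitrary choice of mine, not taken from the notes.

```python
# Hand-written Adam update: momentum on the gradient plus a diagonal second-moment preconditioner.
import numpy as np

def adam_step(x, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g              # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * g ** 2         # second-moment (preconditioner) estimate
    m_hat = m / (1 - b1 ** t)              # bias corrections
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x, m, v = np.ones(5), np.zeros(5), np.zeros(5)
for t in range(1, 5001):
    x, m, v = adam_step(x, x, m, v, t)
print("final ||x||:", np.linalg.norm(x))
```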
Week 7
  • Lecture:
    • Variational inequalities
    • Gradient-descent-ascent
    • Extragradient method (see the sketch below)
    • Optimistic gradient method
  • Exercise Session:
    • Comparative analysis of SGD and adaptive methods
  • Deliverables: Project paper selection due; Exercise 6 released.
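Illustrative sketch (not from the lecture notes): a toy bilinear saddle point min_x max_y xy, whose unique equilibrium is (0, 0). Simultaneous gradient-descent-ascent spirals away from the equilibrium, while the extragradient look-ahead step restores convergence. The step size and iteration count are arbitrary.

```python
# Compare simultaneous gradient-descent-ascent (GDA) with the extragradient method on f(x, y) = x * y.
import numpy as np

eta = 0.1
x, y = 1.0, 1.0            # GDA iterate
u, v = 1.0, 1.0            # extragradient iterate
for _ in range(500):
    # simultaneous GDA: descend in x, ascend in y; known to spiral away from (0, 0)
    x, y = x - eta * y, y + eta * x
    # extragradient: take a look-ahead step, then update using the look-ahead gradients
    u_half, v_half = u - eta * v, v + eta * u
    u, v = u - eta * v_half, v + eta * u_half
print(f"GDA distance to (0, 0): {np.hypot(x, y):.2e}, extragradient: {np.hypot(u, v):.2e}")
```

On this problem the GDA iterates grow by a factor of sqrt(1 + eta^2) per step while the extragradient iterates shrink, which is the kind of behavior the variational-inequality viewpoint in lecture formalizes.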
Week 8
  • Lecture:
    • Linear neural networks
    • Gradient flows
  • Exercise Session:
    • Adversarial robust training
    • Discuss projects
Deliverable: Exercise 7 released.
Material below will not be part of the exam unless explicitly stated otherwise.
Week 9
  • Lecture:
    • Neural Tangent Kernel
    • Feature learning in neural networks
  • Exercise Session:
    • Visualizing feature learning in neural networks
Reading: Bach and Chizat. "Gradient descent on infinitely wide neural networks: Global convergence and generalization." 2021
Week 10
  • Lecture:
    • Why does pretraining help?
    • When it harms: Feature distortion in fine-tuning
  • Exercise Session:
    • Discuss projects
Reading: Kumar et al. "Fine-tuning can distort pretrained features and underperform out-of-distribution." ICLR 2022
Week 11
  • Lecture:
    • Overview of LLM training
    • Zeroth-order (memory-efficient) optimization methods; see the sketch below
    • RLHF and Direct Preference Optimization
  • Exercise Session:
    • Discuss projects
Reading: Rafailov et al. "Direct preference optimization: Your language model is secretly a reward model." NeurIPS 2023
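Illustrative sketch: the basic two-point zeroth-order gradient estimator that underlies memory-efficient, forward-pass-only optimization. This is a generic construction rather than any specific published method; the toy objective, probe scale eps, and step size are my own choices.

```python
# Estimate the gradient from two function evaluations along a random probe direction,
# then run plain SGD with the estimated gradients.
import numpy as np

def zo_gradient(f, x, u, eps=1e-3):
    # central finite-difference estimate of the directional derivative, mapped back onto u
    return (f(x + eps * u) - f(x - eps * u)) / (2 * eps) * u

f = lambda x: np.sum((x - 3.0) ** 2)       # toy objective with minimum at x = 3
rng = np.random.default_rng(4)
x = np.zeros(10)
for _ in range(5000):
    u = rng.normal(size=x.shape)           # random Gaussian probe direction
    x -= 0.01 * zo_gradient(f, x, u)       # SGD step with the estimated gradient
print("distance to the optimum:", np.linalg.norm(x - 3.0))
```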
Week 12
  • Lecture:
    • Heterogeneity in federated optimization
    • Communication compression
    • Byzantine robustness
  • Exercise Session:
    • Simulating federated learning environments (see the sketch below)
    • Discuss projects
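Illustrative sketch (not the official exercise): a minimal FedAvg-style simulation with heterogeneous quadratic clients, showing the local-update-then-average structure. The client objectives, step size, number of local steps, and number of rounds are arbitrary choices.

```python
# Each client holds f_i(x) = 0.5 * ||x - c_i||^2, runs a few local gradient steps,
# and the server averages the resulting models.
import numpy as np

rng = np.random.default_rng(5)
num_clients, d = 10, 5
centers = rng.normal(size=(num_clients, d))   # heterogeneous local optima c_i
global_opt = centers.mean(axis=0)             # minimizer of the average objective

x = np.zeros(d)                               # server model
eta, local_steps = 0.1, 5
for _ in range(100):                          # communication rounds
    client_models = []
    for c in centers:
        w = x.copy()
        for _ in range(local_steps):          # local gradient steps on f_i
            w -= eta * (w - c)                # gradient of 0.5 * ||w - c||^2
        client_models.append(w)
    x = np.mean(client_models, axis=0)        # server averages the local models
print("distance to the global optimum:", np.linalg.norm(x - global_opt))
```

Because every client here has the same curvature, plain averaging converges to the exact global minimizer; the heterogeneity effects discussed in lecture appear once the local objectives differ more substantially.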
Weeks 13-15
  • Student presentations:
    • In-class presentations
    • Option to schedule earlier in the semester
Final
  • Final exam on the university-scheduled exam date
Deliverable: Project report due.