CSCI 699: Privacy-Preserving Machine Learning

Instructor: Sai Praneeth Karimireddy (karimire@usc.edu)    Location: WPH 102    Time: Fri 1-4:20 pm

Course Description and Objectives

This course focuses on the foundations of privacy-preserving machine learning. Highly personal data is being collected at an unprecedented scale by ML companies. While training ML models on such confidential data can be highly beneficial, it also carries serious privacy risks. This course addresses the dual challenge of maximizing the utility of machine learning models while protecting individual privacy. We will cover the following topics: differential privacy; private training of ML models; privacy attacks and audits; federated and decentralized machine learning.

This course will prepare you to rigorously identify, reason about, and manage privacy risks in machine learning. You will learn to design algorithms that protect sensitive information, and to analyze the privacy leakage of any ML system. Additionally, the course will introduce you to cutting-edge research and practical applications. By the end of the course, you will be well-equipped to undertake research and address real-world privacy challenges in machine learning.

To provide anonymous feedback at any point during the course, please use this anonymous form.

Prerequisites

While there are no official prerequisites, knowledge of advanced probability (at the level of MATH 505a), linear algebra and multivariable calculus (at the level of MATH 225), analysis of algorithms (at the level of CSCI 570), introductory statistics and hypothesis testing (at the level of MATH 308), and machine learning (at the level of CSCI 567) is recommended.

Syllabus

Week | Topics/Daily Activities | Additional Readings | Deliverables
Week 1 Theory: Introduction to anonymity and data privacy; Data anonymization techniques; De-anonymization attacks; Linkage and Reconstruction attacks.
Practical: Implement some linkage attacks (bring laptop).
Lab-1a (solution)
Lab-1b (solution)
Week 1 slides
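As a preview of the Week 1 lab, here is a minimal sketch of a linkage attack. All records, names, and the `linkage_attack` helper are made up for illustration; the idea is the classic one of joining an "anonymized" table against a public record on shared quasi-identifiers (ZIP code, birth year, sex).

```python
# Toy "anonymized" medical table (names removed) and a public voter roll.
# All records here are hypothetical.
medical = [
    {"zip": "90089", "birth_year": 1985, "sex": "F", "diagnosis": "flu"},
    {"zip": "90007", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
]
voters = [
    {"name": "Alice", "zip": "90089", "birth_year": 1985, "sex": "F"},
    {"name": "Bob", "zip": "90007", "birth_year": 1990, "sex": "M"},
]

def linkage_attack(anon_rows, public_rows, keys=("zip", "birth_year", "sex")):
    """Re-identify 'anonymized' rows by joining on shared quasi-identifiers."""
    index = {tuple(r[k] for k in keys): r["name"] for r in public_rows}
    reidentified = {}
    for r in anon_rows:
        key = tuple(r[k] for k in keys)
        if key in index:
            reidentified[index[key]] = r["diagnosis"]
    return reidentified

print(linkage_attack(medical, voters))  # {'Alice': 'flu', 'Bob': 'asthma'}
```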
Week 2 Theory: Differential Privacy; Randomized response; Laplace mechanism; Hypothesis testing interpretation.
Annotated week 2 slides
Homework 1 (due Sep 20 on Brightspace)
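The two Week 2 mechanisms can be sketched in a few lines of Python (function names are illustrative, not from the course materials). Randomized response reports the true bit with probability p and satisfies eps-DP with eps = ln(p / (1 - p)); the Laplace mechanism adds Lap(sensitivity/eps) noise, sampled here via the inverse CDF. In practice one would use a vetted DP library rather than hand-rolled sampling.

```python
import math
import random

def randomized_response(truth: bool, p_truth: float = 0.75) -> bool:
    """Report the true bit with probability p_truth, else the flipped bit.
    Satisfies eps-DP with eps = ln(p_truth / (1 - p_truth))."""
    return truth if random.random() < p_truth else not truth

def laplace_mechanism(true_value: float, sensitivity: float, eps: float) -> float:
    """Release true_value + Laplace(sensitivity/eps) noise: eps-DP for a
    query with the given L1 sensitivity."""
    scale = sensitivity / eps
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    # Inverse-CDF sampling of the Laplace distribution.
    noise = -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return true_value + noise
```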
Week 3 Theory: ML training; gradient descent; SGD.
Week 3 slides
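The Week 3 material can be previewed with a minimal SGD loop (the `grad_fn` interface here is an illustrative choice, not from the course materials): sample an example, take a gradient step, repeat.

```python
import random

def sgd(grad_fn, w, data, lr=0.05, epochs=200):
    """Minimal SGD: one gradient step per example, one pass per epoch."""
    for _ in range(epochs):
        random.shuffle(data)
        for x in data:
            w = w - lr * grad_fn(w, x)
    return w

# Toy usage: minimizing the mean of ||w - x||^2 drives w toward the
# average of the data points (here, 2.0).
points = [1.0, 2.0, 3.0]
w_star = sgd(lambda w, x: 2.0 * (w - x), 0.0, points)
```

With a constant learning rate, the iterates hover near the minimizer rather than converging exactly; decaying the learning rate removes this residual noise.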
Week 4 Theory: Private ML training; DP-SGD; Gaussian DP; Sub-sampling; Composition.
Practical: Opacus Library for private deep learning (bring laptop).
HW 1 due before class.
Week 4 slides
Annotated slides (due Sep 27)
Lab 3
HW 2 (due Sep 27)
HW practical
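The core DP-SGD step from Week 4 (clip each per-example gradient, then add Gaussian noise) can be sketched as follows. The function signature is hypothetical; in the lab you would use Opacus rather than this hand-written version, which also omits privacy accounting via sub-sampling and composition.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr, clip_norm, noise_mult, rng):
    """One DP-SGD step: clip each per-example gradient to L2 norm clip_norm,
    sum, add Gaussian noise with std noise_mult * clip_norm, then average."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    noise = rng.normal(0.0, noise_mult * clip_norm, size=w.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / len(clipped)
    return w - lr * noisy_mean
```

Clipping bounds each example's influence (the sensitivity), which is what makes the added Gaussian noise sufficient for a DP guarantee.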
Week 5 Theory: Practical privacy auditing; Designing powerful membership inference attacks; Measuring the influence of training data.
Presentations
HW 2 due before class.
Week 5 slides
Annotated slides
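The simplest membership inference attack relevant to Week 5 is a loss threshold: examples the model fits unusually well are guessed to be training members. A toy sketch with made-up loss values:

```python
import numpy as np

def loss_threshold_mia(losses, threshold):
    """Guess 'member' whenever the model's loss on an example is below
    the threshold (overfit models have low loss on training data)."""
    return np.asarray(losses) < threshold

# Hypothetical loss values: members tend to have lower loss than non-members.
member_losses = np.array([0.05, 0.10, 0.08])
nonmember_losses = np.array([0.90, 1.20, 0.70])
preds_m = loss_threshold_mia(member_losses, threshold=0.5)
preds_n = loss_threshold_mia(nonmember_losses, threshold=0.5)
```

Stronger attacks calibrate the threshold per example, e.g. using shadow models, but the loss gap above is the signal they all exploit.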
Week 6 Theory: Privacy in LLMs; RLHF/prompt engineering for privacy; Data stealing attacks; Private in-context learning.
Week 6 slides
Annotated slides
Fall break
Week 7 Theory: Unlearning algorithms; guarantees; Model editing and correcting.
Practical: Implement unlearning (bring laptop).
Decide project topic.
HW 3 (DP auditing practical) released.
Week 8 Theory: Decentralized privacy; Local DP.
Confidential Computing: Guest lecture by Mengyuan Li
Practical: Comparing local vs. central DP (bring laptop).
HW 3 due
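The local-vs-central contrast from the Week 8 practical can be sketched as follows (helper names are my own): in the local model each user randomizes their own bit before sending, while in the central model a trusted curator adds Laplace noise to the exact count. For the same eps, the local estimate is unbiased but much noisier.

```python
import math
import random

def local_dp_mean(bits, eps, rng):
    """Local model: each user applies randomized response to their own bit;
    the server debiases the average. Satisfies eps-LDP per user."""
    p = math.exp(eps) / (1.0 + math.exp(eps))  # prob of reporting truthfully
    reports = [b if rng.random() < p else 1 - b for b in bits]
    return (sum(reports) / len(bits) - (1.0 - p)) / (2.0 * p - 1.0)

def central_dp_mean(bits, eps, rng):
    """Central model: a trusted curator adds Laplace(1/eps) noise to the
    exact count (sensitivity 1), then normalizes."""
    u = rng.random() - 0.5
    noise = -(1.0 / eps) * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return (sum(bits) + noise) / len(bits)
```

The central estimator's error shrinks like 1/n while the local one's shrinks like 1/sqrt(n), which is the key quantitative gap between the two trust models.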
Week 9 Theory: Federated learning; challenges due to data heterogeneity, communication compression; Privacy attacks in FL.
Practical: Federated learning on hospital data (bring laptop).
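One round of federated averaging (FedAvg), as covered in Week 9, can be sketched on a toy quadratic objective; the client data and loss below are illustrative, not from the course materials.

```python
import numpy as np

def fedavg_round(global_w, client_data, local_steps=10, lr=0.1):
    """One FedAvg round: every client runs local SGD on its own data
    (toy per-point loss ||w - x||^2), then the server averages the
    resulting models, weighted by local dataset size."""
    client_models, sizes = [], []
    for data in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            for x in data:
                w = w - lr * 2.0 * (w - x)  # gradient of ||w - x||^2
        client_models.append(w)
        sizes.append(len(data))
    total = sum(sizes)
    return sum(n / total * w for n, w in zip(sizes, client_models))
```

With heterogeneous data, many local steps pull each client toward its own optimum before averaging, which is exactly the "client drift" problem the lecture discusses.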
Week 10 Theory: Privacy in FL; Secure aggregation; Quantized DP.
Practical: DPFL vs. Local DP (bring laptop).
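Secure aggregation (Week 10) can be illustrated with pairwise additive masking: each pair of clients shares a random mask that one adds and the other subtracts, so the server sees only masked values, yet the masks cancel in the sum. This sketch simulates all clients in one process with a shared seed; a real protocol derives masks via key agreement and handles client dropouts.

```python
import random

def secure_aggregate(client_values):
    """Pairwise-masking sketch of secure aggregation. Clients i < j share a
    random mask r_ij; client i adds it and client j subtracts it. Individual
    masked values look random, but the masks cancel in the sum."""
    n = len(client_values)
    rng = random.Random(0)  # stands in for pairwise key agreement
    masks = {(i, j): rng.uniform(-100, 100)
             for i in range(n) for j in range(i + 1, n)}
    masked = []
    for i, v in enumerate(client_values):
        m = v
        for j in range(n):
            if i < j:
                m += masks[(i, j)]
            elif j < i:
                m -= masks[(j, i)]
        masked.append(m)
    return sum(masked)  # equals sum(client_values) up to float rounding
```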
Week 11 Theory: Privacy in Practice; Incentives; Relation to Copyright law.
Weeks 12-15 Student presentations
In-class presentations. Option to schedule earlier in the semester.
Final: Final project report, due on the university-scheduled date of the final exam.

Grading

Resources

There are no required textbooks. The following write-ups are excellent supplemental readings and may be used as references.

This course builds on several related courses that can serve as valuable additional references: