CSCI 699: Privacy-Preserving Machine Learning

Instructor: Sai Praneeth Karimireddy
Class: Mon 4-7:20 pm (SGM 226)
Office hours: Wed 5:30-7 pm (reserve slot), location: GCS Lower Level 2, SB 4

Course Description and Objectives

This course focuses on the foundations of privacy-preserving machine learning. Extremely personal data is being collected at an unprecedented scale by ML companies. While training ML models on such confidential data can be highly beneficial, it also comes with huge privacy risks. This course addresses the dual challenge of maximizing the utility of machine learning models while protecting individual privacy. We will cover the following topics: differential privacy; private training of ML models; privacy attacks and audits; federated and decentralized machine learning.

This course will prepare you to rigorously identify, reason about, and manage privacy risks in machine learning. You will learn to design algorithms that protect sensitive information, and to analyze the privacy leakage of any ML system. Additionally, the course will introduce you to cutting-edge research and practical applications. By the end of the course, you will be well-equipped to undertake research and address real-world privacy challenges in machine learning.

To provide anonymous feedback at any point in the course, please use this anonymous form.

Grading

Prerequisites

While there are no official prerequisites, knowledge of advanced probability (at the level of MATH 505a), linear algebra and multi-variable calculus (at the level of MATH 225), analysis of algorithms (at the level of CSCI 570), introductory statistics and hypothesis testing (at the level of MATH 308), and machine learning (at the level of CSCI 567) is recommended.

Syllabus

Week Date Lecture Presentation Items Due Lecture Material
1 Aug 25
  • Course logistics
  • Why privacy
  • Attempts at privacy
  • Linkage attacks
  • Differential privacy
Sep 1 Labor Day
2 Sep 8
  • Hypothesis testing
  • Laplace mechanism
  • Properties of DP
  • Gaussian mechanism
To be uploaded
3 Sep 15
  • Approximate DP
  • Advanced composition
  • Gradient descent (GD)
  • DP-GD
  • SGD
  • DP-SGD
  • f-DP
  • HW 1 due
To be uploaded
4 Sep 22
  • f-DP
  • Gaussian DP
  • Privacy auditing
  • HW 2 due
To be uploaded
5 Sep 29
  • Membership inference attacks
  • Privacy auditing
To be uploaded
6 Oct 6
  • Copyright
  • Memorization
  • Watermarking
  • HW 3 due
  • Project topic finalized
To be uploaded
7 Oct 13
  • Data attribution
  • Reconstruction attacks
  • LiRA membership inference attacks
To be uploaded
8 Oct 20
  • Unlearning
  • Measuring memorization
To be uploaded
9 Oct 27
  • Data attribution and watermarking
To be uploaded
10 Nov 3
  • Unlearning
To be uploaded
11 Nov 10
  • Local DP
  • Decentralized privacy
  • Federated learning
  • Privacy in LLMs
To be uploaded
12 Nov 17
  • Sanitization approaches
  • Prompt defenses
  • Contextual integrity
To be uploaded
13 Nov 24
  • Local DP
  • Decentralized privacy
To be uploaded
14 Dec 1
  • Federated privacy & law
To be uploaded
Dec 8 Study break
Dec 15
  • Project report due

Resources

There are no required textbooks. The following writeups are excellent supplemental readings and may be used as references.

This course builds on several related courses which can serve as valuable additional references: