CSCI 699: Privacy-Preserving Machine Learning

Course Description and Objectives

This course focuses on the foundations of privacy-preserving machine learning. Highly personal data is being collected at an unprecedented scale by companies that build ML systems. While training ML models on such confidential data can be highly beneficial, it also carries serious privacy risks. This course addresses the dual challenge of maximizing the utility of machine learning models while protecting individual privacy. We will cover the following topics: differential privacy; private training of ML models; privacy attacks and audits; and federated and decentralized machine learning.

This course will prepare you to rigorously identify, reason about, and manage privacy risks in machine learning. You will learn to design algorithms that protect sensitive information and to analyze the privacy leakage of ML systems. The course will also introduce you to cutting-edge research and practical applications. By the end of the course, you will be well equipped to undertake research and address real-world privacy challenges in machine learning.

To provide anonymous feedback at any point in the course, please use this form.

Grading

Assignments (30%)
- 3 assignments in the first half of the semester (submitted on Brightspace)
- Short conceptual checks and practical components
- Goal: ensure understanding of core concepts and let you “play” with them
Project Report (35%), due on exam day
- Option 1: Paper Reading
  - Team up with 1–3 others working on related papers
  - Teach each other your papers and background
  - Replicate core experiments from state-of-the-art papers
  - Submit a 4-page report
- Option 2: Research (encouraged)
  - Teams of 1–3
  - Develop your own research question (from class readings or otherwise)
  - Meet with the instructor before Oct 6 (Fall break) for feedback
  - Submit a 4-page report
Paper Reading & Discussion (35%)
- Uses a role-playing discussion format
- Each week after Fall break, we will discuss 2–3 papers
- Roles:
  - Presenter: present the paper in class (20% of the total grade)
  - Antagonist: identify flaws, missing experiments
  - Archaeologist: situate the paper in the broader field
  - Researcher: propose a follow-up abstract
  - Practitioner: pitch how to turn it into a product
- Each student rotates through all roles equally
- Non-presenters: submit a 1-paragraph role write-up on Brightspace before class, then join the in-class discussion (15% of the total grade)

Prerequisites

While there are no official prerequisites, knowledge of advanced probability (at the level of MATH 505a), linear algebra and multi-variable calculus (at the level of MATH 225), analysis of algorithms (at the level of CSCI 570), introductory statistics and hypothesis testing (at the level of MATH 308), and machine learning (at the level of CSCI 567) is recommended.

Syllabus

Week	Date	Lecture	Presentation	Items Due	Lecture Material
1	Aug 25	Course logistics, why privacy, attempts at privacy, linkage attacks, differential privacy			Week 1 slides; week 1 annotated slides; linkage attack practical (ungraded)
	Sep 1	Labor Day		HW1; HW1b (practical)
2	Sep 8	Hypothesis testing, Laplace mechanism, properties of DP, Gaussian mechanism			Week 2 slides; week 2 annotated slides; Sep 8 recording
3	Sep 15	Approximate DP, advanced composition, gradient descent (GD), DP-GD, SGD, DP-SGD, f-DP		HW1 due; HW2; HW2b (practical)	Week 3 slides; week 3 annotated slides; Sep 15 recording (part 1); Sep 15 recording (part 2)
4	Sep 22	f-DP, Gaussian DP, privacy auditing		HW2 due Sep 28	Week 4 slides; week 4 annotated slides; Sep 22 recording
5	Sep 29	Membership inference attacks, privacy auditing			Week 5 slides; week 5 annotated slides; Sep 29 recording
6	Oct 6	Unlearning, local DP, federated learning, project brainstorming		HW3 cancelled; project topic finalized	Week 6 slides; week 6 annotated slides; Oct 6 recording
7	Oct 13	Data attribution	Reconstruction attacks; LiRA membership inference attacks		To be uploaded
8	Oct 20	Unlearning	Measuring memorization		To be uploaded
9	Oct 27		Data attribution and watermarking		To be uploaded
10	Nov 3		Unlearning		To be uploaded
11	Nov 10	Local DP, decentralized privacy, federated learning	Privacy in LLMs		To be uploaded
12	Nov 17		Sanitization approaches, prompt defenses, contextual integrity		To be uploaded
13	Nov 24	Local DP, decentralized privacy			To be uploaded
14	Dec 1		Federated privacy & law		To be uploaded
	Dec 8	Study break
	Dec 15			Project report due

Resources

There are no required textbooks. The following write-ups are excellent supplemental readings and may be used as references.

C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 2014. Reference for differential privacy.
K. Nissim et al. Differential Privacy: A Primer for a Non-technical Audience. Vanderbilt Journal of Entertainment & Technology Law, 2018. A great read, with many examples connecting legal definitions of privacy to privacy in practice.
P. Kairouz et al. Advances and Open Problems in Federated Learning. Foundations and Trends in Machine Learning, 2021. Community survey on federated learning.

This course builds on several related courses which can serve as valuable additional references:

Privacy-Preserving Machine Learning by Aurélien Bellet at Inria
Trustworthy Machine Learning by Reza Shokri at NUS
Federated and Collaborative Learning by Virginia Smith at CMU
Large Scale Optimization for Machine Learning (ISE 633) by Meisam Razaviyayn at USC
Digital Privacy by Vitaly Shmatikov at Cornell