Course Description and Objectives
This course focuses on the foundations of privacy-preserving machine learning. Highly personal data is being collected at an unprecedented scale by companies building ML models. While training ML models on such confidential data can be highly beneficial, it also carries significant privacy risks. This course addresses the dual challenge of maximizing the utility of machine learning models while protecting individual privacy. We will cover the following topics: differential privacy; private training of ML models; privacy attacks and audits; and federated and decentralized machine learning.
This course will prepare you to rigorously identify, reason about, and manage privacy risks in machine learning. You will learn to design algorithms that protect sensitive information and to analyze the privacy leakage of ML systems. The course will also introduce you to cutting-edge research and practical applications. By the end of the course, you will be well equipped to undertake research and address real-world privacy challenges in machine learning.
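As a small taste of the material, here is a minimal sketch (illustrative only, not course material) of the Laplace mechanism, the canonical differentially private primitive, applied to a counting query with sensitivity 1:

```python
import numpy as np

def laplace_mechanism(true_value, epsilon, sensitivity=1.0):
    """Release a noisy answer satisfying epsilon-differential privacy.

    The Laplace mechanism adds noise with scale sensitivity / epsilon,
    so smaller epsilon (stronger privacy) means more noise.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

# Example: privately release a count of 100 with epsilon = 1.
noisy_count = laplace_mechanism(100, epsilon=1.0)
```

The noisy answer is unbiased, so averaging many independent releases would recover the true count; this is exactly the kind of utility/privacy trade-off the course analyzes formally.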
To provide feedback at any point in the course, please use this anonymous form.
Prerequisites
While there are no official prerequisites, knowledge of advanced probability (at the level of MATH 505a), linear algebra and multivariable calculus (at the level of MATH 225), analysis of algorithms (at the level of CSCI 570), introductory statistics and hypothesis testing (at the level of MATH 308), and machine learning (at the level of CSCI 567) is recommended.
Syllabus
Grading
- Three assignments worth 30% of the grade. Collaboration is allowed but must be stated. Grades are based on correctness. The theory part should be written in LaTeX and the coding part in Jupyter (Python) notebooks.
- Course Presentation and Project (55% of the grade):
- Presentations (25%): Students will be assigned a paper based on their interests and will present it in class for 30 minutes.
- Project (30%): Students will write a 4-page report on 1-2 papers, either on the paper they presented, supplemented by related readings, or on a different paper or papers of their choice. Pursuing a personal research topic is strongly encouraged.
- Discussions and participation will count for 15%. This will involve reviewing, commenting on, and discussing each other's presentations and projects using the role-playing reading group format.
Resources
There are no required textbooks. The following write-ups are excellent supplemental readings and may be used as references.
- C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 2014. pdf. Reference for DP.
- Nissim et al. Differential Privacy: A Primer for a Non-technical Audience. Vanderbilt Journal of Entertainment & Technology Law, 2018. pdf. A great read with many examples connecting legal definitions to privacy in practice.
- Kairouz et al. Advances and Open Problems in Federated Learning. Foundations and Trends in Machine Learning, 2021. pdf. Community survey on federated learning.
This course builds on several related courses which can serve as valuable additional references:
- Privacy-Preserving Machine Learning by Aurélien Bellet at Inria (link)
- Trustworthy Machine Learning by Reza Shokri at NUS (link)
- Federated and Collaborative Learning by Virginia Smith at CMU (link)
- Large Scale Optimization for Machine Learning (ISE 633) by Meisam Razaviyayn at USC (link)
- Digital Privacy by Vitaly Shmatikov at Cornell (link)