Mathematics of synthetic data and privacy

Combinatorics and Probability

Speaker:

Roman Vershynin

Speaker Link:

https://www.math.uci.edu/~rvershyn/index.html

Institution:

UCI

Time:

Wednesday, December 1, 2021 - 2:00pm to 3:00pm

Location:

Rowland Hall 510R

An emerging way to protect privacy is to replace true data with synthetic data. Medical records of artificial patients, for example, could retain meaningful statistical information while preserving the privacy of the true patients. But what is synthetic data, and what is privacy? How do we define these concepts mathematically? Is it possible to make synthetic data that is both useful and private? I will tie these questions to a simple-looking problem in probability theory: how much information about a random vector X is lost when we take the conditional expectation of X with respect to some sigma-algebra? This talk is based on a series of papers written jointly with March Boedihardjo and Thomas Strohmer, mainly this one: https://arxiv.org/abs/2107.05824
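
To make the probability question concrete, here is a minimal numerical sketch. It is not taken from the talk or the papers. It assumes a scalar X uniform on [0, 1) rather than a random vector, a sigma-algebra generated by a partition of [0, 1) into k equal intervals, and mean squared error as the measure of lost information. All function names are hypothetical. For a finite partition, taking the conditional expectation amounts to replacing each value by the average of its cell.

```python
import random


def conditional_expectation(samples, cell_of):
    """Replace each sample by the empirical mean of its partition cell.

    For a sigma-algebra generated by a finite partition, E[X | F] is
    constant on each cell and equals the average of X over that cell.
    """
    sums, counts = {}, {}
    for x in samples:
        c = cell_of(x)
        sums[c] = sums.get(c, 0.0) + x
        counts[c] = counts.get(c, 0) + 1
    return [sums[cell_of(x)] / counts[cell_of(x)] for x in samples]


def mean_squared_loss(samples, projected):
    """Empirical estimate of E|X - E[X | F]|^2 (assumed loss measure)."""
    return sum((x - y) ** 2 for x, y in zip(samples, projected)) / len(samples)


def loss_for_partition(samples, k):
    """Loss when F is generated by k equal subintervals of [0, 1)."""
    def cell_of(x):
        return min(int(x * k), k - 1)
    return mean_squared_loss(samples, conditional_expectation(samples, cell_of))


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [rng.random() for _ in range(10_000)]

# Coarser partitions (smaller k) discard more information about X.
losses = {k: loss_for_partition(samples, k) for k in (1, 4, 16)}
for k, loss in losses.items():
    print(f"k = {k:2d} cells: mean squared loss = {loss:.5f}")
```

With k = 1 the sigma-algebra is trivial, the conditional expectation is the overall mean, and the loss is the variance of X. Refining the partition can only decrease the loss, which illustrates the tension in the abstract: a coarse sigma-algebra hides more about individual values but keeps less statistical information.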