Mathematics of synthetic data and privacy

Speaker: 

Roman Vershynin

Institution: 

UCI

Time: 

Wednesday, December 1, 2021 - 2:00pm to 3:00pm

Location: 

Rowland Hall 510R

An emerging way to protect privacy is to replace true data by synthetic data. Medical records of artificial patients, for example, could retain meaningful statistical information while preserving privacy of the true patients. But what is synthetic data, and what is privacy? How do we define these concepts mathematically? Is it possible to make synthetic data that is both useful and private? I will tie these questions to a simple-looking problem in probability theory: how much information about a random vector X is lost when we take conditional expectation of X with respect to some sigma-algebra? This talk is based on a series of papers joint with March Boedihardjo and Thomas Strohmer, mainly this one: https://arxiv.org/abs/2107.05824

An invariance principle for Markov cookie random walks.

Speaker: 

Thomas Mountford

Institution: 

EPFL

Time: 

Wednesday, November 17, 2021 - 2:00pm to 3:00pm

Host: 

Location: 

510R

 

 

In joint work with E Kosygina and J Peterson, the

"natural" diffusive scaling is considered for the recurrent case

and the convergence to Brownian motion perturbed at extrema is shown.  The key ideas are coarse graining and

the Ray Knight approach.

Properties of the Riemann zeta distribution.

Speaker: 

Michael Cranston

Institution: 

UCI

Time: 

Wednesday, November 10, 2021 - 2:00pm to 3:00pm

Location: 

510R

In this talk we will discuss properties of integers selected according to the Riemann zeta distribution. We will emphasize two aspects of this distribution. The first is its faithful similarity to properties of an integer chosen according to the uniform distribution on a finite interval. The second aspect will be the appearance of Poisson behavior under this distribution. The Riemann zeta function is given for $\mbox{Re} z>1$ by 

$$\zeta(z)=\sum_{n=1}^\infty \frac{1}{n^z}.$$

An alternative description is given by

$$\zeta(z)=\Pi_{p\in\mathcal{P}}\lt(1-\frac{1}{p^z}\rt)^{-1},$$

where $\mathcal{P}$ denotes the set of primes.

In our discussions we will replace the complex $z$ by a real number $s>1.$ We will denote by $X_s$ a random variable with the distribution

P(X_s=n)=\frac{1}{\zeta(s)n^s},\, n=1,2,3,\cdots.

The statistical properties of $X_s$ is the focus of the talk.

The talk is based on joint work with Adrien Peltzer.

Clustering a mixture of Gaussians with unknown covariance

Speaker: 

Mateo Diaz

Institution: 

Caltech

Time: 

Wednesday, November 3, 2021 - 2:00pm to 3:00pm

Host: 

Location: 

510R Rowland Hall

Clustering is a fundamental data scientific task with broad application. This talk investigates a simple clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We show its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-cut problem appears to be computationally intractable. To overcome this, we develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we present numerical and theoretical evidence that supports the existence of a statistical-computational gap. Finally, we generalize the Max-Cut program to a k-means program that handles multi-component mixtures with possibly unequal weights and has similar guarantees.

Learning low degree functions in logarithmic number of random queries

Speaker: 

Paata Ivanishvili

Institution: 

UCI

Time: 

Wednesday, October 27, 2021 - 2:00pm to 3:00pm

Location: 

Rowland Hall 510R

Perhaps a very basic question one asks in learning theory is as follows: we are given a  function f on the hypercube {-1,1}^n, and we are allowed to query samples (X, f(X)) where X is uniformly distributed on {-1,1}^n. After getting these samples (X_1, f(X_1)), ..., (X_N, f(X_N)) we would like to construct a function h which approximates f up to an error epsilon (say in L^2). Of course h is a random function as it involves i.i.d. random variables X_1, ... , X_N in its construction. Therefore, we want to construct such h which can only fail to approximate f with probability at most delta. So given parameters epsilon, delta  in (0,1) the goal is to minimize the number of random queries N. I will show that around log(n) random queries are sufficient to learn bounded "low-complexity" functions. Based on joint work with Alexandros Eskenazis. 

Odd subgraphs are odd

Speaker: 

Asaf Ferber

Institution: 

UCI

Time: 

Wednesday, October 20, 2021 - 2:00pm to 3:00pm

Location: 

510R

In this talk we discuss the problems of finding large induced subgraphs of a given graph G with some degree-constraints. We survey some classical results, present some intersting and challenging open problems, and sketch solutions to some of them. 

This is based on joint works with Liam Hardiman and Michael Krivelevich. 

Sharp matrix concentration

Speaker: 

March Boedihardjo

Institution: 

UCI

Time: 

Wednesday, October 13, 2021 - 2:00pm to 3:00pm

Location: 

510R

Classical matrix concentration inequalities are sharp up to a logarithmic factor. This logarithmic factor is necessary in the commutative case but unnecessary in many classical noncommutative cases. We will present some matrix concentration results that are sharp in many cases, where we overcome this logarithmic factor by using an easily computable quantity that captures noncommutativity. Joint work with Afonso Bandeira and Ramon van Handel. Paper: https://arxiv.org/abs/2108.06312

Gaussian Spherical Tessellations and Learning Adaptively

Speaker: 

Anna Ma

Institution: 

UCI

Time: 

Wednesday, October 6, 2021 - 2:00pm to 3:00pm

Location: 

510R

Signed measurements of the form $y_i = sign(\langle a_i, x \rangle)$ for $i \in [M]$ are ubiquitous in large-scale machine learning problems where the overarching task is to recover the unknown, unit norm signal $x \in \mathbb{R}^d$. Oftentimes, measurements can be queried adaptively, for example based on a current approximation of $x$, leading to only a subset of the $M$ measurements being needed. Geometrically, these measurements emit a spherical hyperplane tessellation in $\mathbb{R}^{d}$ where one of the cells in the tessellation contains the unknown vector $x$. Motivated by this problem, in this talk we will present a geometric property related to spherical hyperplane tessellations in $\mathbb{R}^{d}$. Under the assumption that $a_i$ are Gaussian random vectors, we will show that with high probability there exists a subset of the hyperplanes whose cardinality is on the order of $d\log(d)\log(M)$ such that the radius of the cell containing $x$ induced by these hyperplanes is bounded above by, up to constants, $d\log(d)\log(M)/M$. The work presented is joint work with Rayan Saab and Eric Lybrand. 

Nodal domains of eigenvectors of Erdos-Renyi random graphs

Speaker: 

Mark Rudelson

Institution: 

University of Michigan

Time: 

Friday, March 6, 2020 - 2:00pm to 2:50pm

Host: 

Location: 

RH 340N

A nodal domain of a function is a connected component of the set where this function has a constant sign. Nodal domains of eignefunctions of the Laplacian are classical objects in geometry of manifolds. Their number of nodal domains corresponding to the k-th smallest eigenvalue is bounded by k and it typically increases as k gets larger. About 10 years ago, Dekel, Lee, and Linial proved that with high probability, the number of nodal domains of Erdos-Renyi graphs remains bounded as the size of the graph and the eigenvalues increase.  This runs contrary to the intuition we draw from the world of manifolds. We will survey some recent results on the structure of nodal domains of such graphs. Based in part on the joint work with Han Huang.

Independent sets in random graphs and random trees

Speaker: 

Steven Heilman

Institution: 

USC

Time: 

Tuesday, March 3, 2020 - 11:00am

Host: 

Location: 

Rowland Hall, 340P

An independent set of size k in a finite undirected graph is a set of k vertices of the graph, no two of which are connected by an edge. The structure of independent sets of size k as k varies is of interest in probability, statistical physics, combinatorics, and computer science. In 1987, Alavi, Malde, Schwenk and Erdos conjectured that the number of independent sets of size k in a tree is a unimodal sequence (this number goes up and then it goes down), and this problem is still open. A variation on this question is: do the number of independent sets of size k form a unimodal sequence for Erdos-Renyi random graphs, or random trees? By adapting an argument of Coja-Oghlan and Efthymiou, we show unimodality for Erdos-Renyi random graphs, random bipartite graphs and random regular graphs (with high probability as the number of vertices in the graph goes to infinity, when the expected degree of a single vertex is large). The case of random trees remains open, as we can only show weak partial results there. 

 

Pages

Subscribe to RSS - Combinatorics and Probability