Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

Speaker: 

Ludovic Stephan

Institution: 

EPFL

Time: 

Wednesday, November 9, 2022 - 2:00pm to 3:00pm

Location: 

510R Rowland Hall

Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the crossover between these two regimes in the high-dimensional setting, and in particular the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates. https://arxiv.org/abs/2202.00293
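
To make the setting concrete, here is a minimal sketch of online SGD for a two-layer network in a Saad & Solla-type teacher-student setup on Gaussian data. The widths, the tanh activation, and the 1/d step-size scaling are illustrative assumptions, not the paper's exact protocol.

    import numpy as np

    rng = np.random.default_rng(0)
    d, p, k = 300, 4, 2            # input dimension, student and teacher widths
    eta, steps = 0.5, 100000       # learning rate and number of one-pass SGD steps

    Wt = rng.standard_normal((k, d)) / np.sqrt(d)   # teacher weights, fixed
    W = rng.standard_normal((p, d)) / np.sqrt(d)    # student weights, trained

    for _ in range(steps):
        x = rng.standard_normal(d)                  # fresh Gaussian sample each step
        y = np.tanh(Wt @ x).sum() / np.sqrt(k)      # teacher label
        pre = W @ x
        yhat = np.tanh(pre).sum() / np.sqrt(p)      # student prediction
        grad = (yhat - y) / np.sqrt(p) * np.outer(1.0 - np.tanh(pre) ** 2, x)
        W -= (eta / d) * grad                       # 1/d step-size scaling (assumed)

    # order parameters of a Saad & Solla-type analysis: student-teacher overlaps
    print("overlap matrix W Wt^T:\n", W @ Wt.T)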

Large deviations for projections of high-dimensional measures

Speaker: 

Yin-Ting Liao

Institution: 

UC Irvine

Time: 

Wednesday, September 28, 2022 - 2:00pm to 3:00pm

Location: 

510R Rowland Hall

Random projections of high-dimensional probability measures have gained much attention in asymptotic convex geometry and high-dimensional statistics. While fluctuations at the level of the central limit theorem have been classically studied, only more recently has an inquiry into large deviation principles for such projections been initiated. In this talk, I will review existing work and describe our results on large deviations. I will also discuss sharp large deviation estimates that yield the prefactor in addition to the exponential decay rate, in the spirit of Bahadur and Ranga Rao. Applications to asymptotic convex geometry and a range of examples, including $\ell^p$ balls and Orlicz balls, will be given. This talk is based on several joint works with S. S. Kim and K. Ramanan.
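
As a small numerical illustration of the regime beyond the CLT, the sketch below samples uniformly from an $\ell^p$ ball (using the representation of Barthe, Guedon, Mendelson and Naor) and compares empirical tail frequencies of a one-dimensional projection with the Gaussian tail that the CLT alone would suggest; the dimension, exponent, and sample size are illustrative choices.

    import numpy as np
    from math import erfc, sqrt

    rng = np.random.default_rng(1)

    def sample_lp_ball(n, p, size):
        # uniform samples from the unit l^p ball, using the representation of
        # Barthe, Guedon, Mendelson and Naor (2005)
        g = rng.gamma(1.0 / p, size=(size, n)) ** (1.0 / p)
        g *= rng.choice([-1.0, 1.0], size=(size, n))
        y = rng.exponential(size=(size, 1))
        return g / ((np.abs(g) ** p).sum(axis=1, keepdims=True) + y) ** (1.0 / p)

    n, p, trials = 100, 1.5, 100000
    X = sample_lp_ball(n, p, trials)
    proj = X @ (np.ones(n) / np.sqrt(n))   # projection onto a fixed direction
    s = proj.std()
    for a in (1.0, 2.0, 3.0):
        emp = (proj > a * s).mean()
        gauss = 0.5 * erfc(a / sqrt(2.0))  # what the CLT alone would suggest
        print(f"a = {a}: empirical tail {emp:.2e}, Gaussian tail {gauss:.2e}")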

Optimal minimization of the covariance loss

Speaker: 

Vishesh Jain

Institution: 

Stanford University

Time: 

Thursday, June 2, 2022 - 2:00pm

Location: 

510R Rowland Hall

Let $X$ be a random vector valued in $\mathbb{R}^m$ such that $\|X\|_{2} \leq 1$ almost surely. In this talk, I will discuss two proofs -- one based on the pinning lemma from statistical physics and another based on randomized rounding -- showing that for every $k \geq 3$, there exists a sigma algebra $\mathcal{F}$ generated by a partition of $\mathbb{R}^{m}$ into $k$ sets such that
    \[ \|\operatorname{Cov}(X) - \operatorname{Cov}(\mathbb{E}[X\mid\mathcal{F}])\|_{\mathrm{F}} \lesssim \frac{1}{\sqrt{\log{k}}}. \]
This estimate is optimal up to the implicit constant, and improves a previous result of Boedihardjo, Strohmer, and Vershynin, obtained in connection with the design of accurate, privacy-preserving synthetic data, by a factor of $\sqrt{\log\log{k}}$. Joint work with Ashwin Sah (MIT) and Mehtaab Sawhney (MIT).
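
The covariance loss above is easy to evaluate empirically. The sketch below samples $X$ uniformly from the unit ball, partitions $\mathbb{R}^m$ into $k$ cells via k-means (purely an illustrative choice of partition, not the pinning or randomized-rounding constructions from the talk), and computes the Frobenius norm of the covariance gap.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    m, n, k = 5, 20000, 64       # dimension, sample size, number of cells

    # samples of X with ||X||_2 <= 1: uniform in the unit ball (illustrative)
    X = rng.standard_normal((n, m))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    X *= rng.uniform(size=(n, 1)) ** (1.0 / m)

    # a partition of R^m into k cells; k-means is only an illustrative choice
    labels = KMeans(n_clusters=k, n_init=4, random_state=0).fit_predict(X)

    # E[X | F]: replace each point by the mean of its cell
    cond = np.zeros_like(X)
    for c in range(k):
        cond[labels == c] = X[labels == c].mean(axis=0)

    gap = np.linalg.norm(np.cov(X.T) - np.cov(cond.T))   # Frobenius norm
    print(f"covariance loss: {gap:.4f}")
    print(f"1/sqrt(log k):   {1.0 / np.sqrt(np.log(k)):.4f}")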

Spectral asymptotics for contracted tensor ensembles

Speaker: 

Jorge Garza-Vargas

Institution: 

UC Berkeley

Time: 

Wednesday, April 13, 2022 - 2:00pm to 3:00pm

Location: 

510R Rowland Hall

Let $T_{d, N}$ be a random symmetric Wigner-type tensor of dimension $N$ and order $d$. For unit vectors $u_N^{(1)}, \dots, u_{N}^{(d-2)}$ we study the random matrix obtained by taking the contracted tensor $\frac{1}{N} T_{d, N} \left[u_N^{(1)}\otimes \cdots \otimes u_N^{(d-2)} \right]$ and show that, for large $N$, its empirical spectral distribution concentrates around a semicircular distribution whose radius is an explicit symmetric function of the $u_N^{(i)}$. We further generalize this result by considering a family of contractions of $T_{d, N}$ and show, using free probability concepts, that their joint distribution is well-approximated by a non-commutative semicircular family when $N$ is large. This is joint work with Benson Au (https://arxiv.org/abs/2110.01652).
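
For order $d = 3$ the contraction is straightforward to simulate. The sketch below symmetrizes a Gaussian tensor, contracts one index against a unit vector, and reports the spectral radius of the resulting matrix; the $1/\sqrt{N}$ normalization is chosen here so that the spectrum is order one and may differ from the paper's convention.

    import numpy as np
    from itertools import permutations

    rng = np.random.default_rng(0)
    N = 200

    # symmetric Wigner-type tensor of order d = 3 with Gaussian entries
    G = rng.standard_normal((N, N, N))
    T = np.zeros_like(G)
    for perm in permutations((0, 1, 2)):
        T += np.transpose(G, perm)
    T /= np.sqrt(6)                     # entries of T have variance ~ 1

    u = rng.standard_normal(N)
    u /= np.linalg.norm(u)

    # contract one index against the unit vector; normalized so the spectrum
    # is O(1) (this may differ from the paper's 1/N convention)
    M = np.einsum('ijk,i->jk', T, u) / np.sqrt(N)
    eigs = np.linalg.eigvalsh(M)        # M is symmetric since T is
    print("largest |eigenvalue|:", np.abs(eigs).max())   # ~ 2, semicircle radius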

High-dimensional Asymptotics of Feature Learning After One Step of Gradient Descent

Speaker: 

Zhichao Wang

Institution: 

UCSD

Time: 

Wednesday, March 30, 2022 - 2:00pm to 3:00pm

Location: 

510R Rowland Hall

In this talk, I will discuss the spectral properties of a two-layer neural network after one gradient step and their applications to ridge regression on random features. We consider the first gradient step in a two-layer randomly initialized neural network with the empirical MSE loss. In the proportional asymptotic limit, where all the dimensions go to infinity at the same rate, and an idealized student-teacher setting, we will show that the first gradient update contains a rank-1 ``spike'', which results in an alignment between the first-layer weights and the linear component of the teacher model. By verifying a Gaussian equivalence property, we can compute the prediction risk of ridge regression on the conjugate kernel after one gradient step. We will present two scalings of the first-step learning rate. For a small learning rate, we compute the asymptotic risk of the ridge regression estimator on top of the trained features, which does not outperform the best linear model. For a sufficiently large learning rate, by contrast, we prove that the ridge estimator on the trained features can go beyond this ``linear'' regime. Our analysis demonstrates that even one gradient step can lead to an advantage over the initial features. Our theoretical results are mainly based on random matrix theory and operator-valued free probability theory, which will be summarized in this talk. This is recent joint work with Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Denny Wu, and Greg Yang.
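
The rank-1 spike is visible in a direct simulation. The sketch below takes one full-batch gradient of the empirical MSE with respect to the first-layer weights in a student-teacher setup with a tanh teacher (so a linear component is present) and prints the top singular values of the gradient, together with the alignment of its top right singular vector with the teacher direction; the dimensions, activation, and scalings are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, m = 3000, 800, 800        # proportional regime: n, d, m of the same order

    X = rng.standard_normal((n, d))                   # Gaussian inputs
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)
    y = np.tanh(X @ beta)                             # teacher with a linear component

    W = rng.standard_normal((m, d)) / np.sqrt(d)      # student first layer
    a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed second layer

    pre = X @ W.T
    err = np.tanh(pre) @ a - y                        # residuals of the MSE loss
    # full-batch gradient of the empirical MSE w.r.t. the first-layer weights
    G = ((err[:, None] * (1.0 - np.tanh(pre) ** 2)) * a).T @ X / n

    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    print("top singular values:", s[:4])              # rank-1 spike: s[0] >> s[1]
    print("alignment with teacher:", abs(Vt[0] @ beta))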

Adjusted chi-square test for degree-corrected block models

Speaker: 

Arash A. Amini

Institution: 

UCLA

Time: 

Wednesday, May 4, 2022 - 2:00pm to 3:00pm

Location: 

510R Rowland Hall

We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM). The test is based on an adjusted chi-square statistic for measuring equality of means among groups of $n$ multinomial distributions with $d_1,\dots,d_n$ observations. In the context of network models, the number of multinomials, $n$, grows much faster than the number of observations, $d_i$, corresponding to the degree of node $i$, hence the setting deviates from classical asymptotics. We show that a simple adjustment allows the statistic to converge in distribution, under the null, as long as the harmonic mean of $\{d_i\}$ grows to infinity.

When applied sequentially, the test can also be used to determine the number of communities. The test operates on a (row) compressed version of the adjacency matrix, conditional on the degrees, and as a result is highly scalable to large sparse networks. We incorporate a novel idea of compressing the rows based on a $(K+1)$-community assignment when testing for $K$ communities. This approach increases the power in sequential applications without sacrificing computational efficiency, and we prove its consistency in recovering the number of communities. Since the test statistic does not rely on a specific alternative, its utility goes beyond sequential testing and can be used to simultaneously test against a wide range of alternatives outside the DCSBM family. 

The test can also be easily applied to Poisson count arrays in clustering or biclustering applications, as well as bipartite and directed networks. We show the effectiveness of the approach by extensive numerical experiments with simulated and real data. In particular, applying the test to the Facebook-100 dataset, a collection of one hundred social networks, we find that a DCSBM with a small number of communities (say $ < 25$) is far from a good fit in almost all cases. Despite the lack of fit, we show that the statistic itself can be used as an effective tool for exploring community structure, allowing us to construct a community profile for each network.

https://arxiv.org/abs/2012.15047
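
The pipeline can be sketched in a few lines. Below, the adjacency matrix is row-compressed by a community assignment as described above, and the compressed multinomial rows are compared within each community. Note the hedges: the statistic shown is the classical chi-square for homogeneity, not the paper's adjusted version for the regime where $n$ grows much faster than the degrees, and `labels` is an assumed (e.g., estimated) community assignment.

    import numpy as np

    rng = np.random.default_rng(0)

    def compressed_chi_square(A, labels, K):
        # Row compression: count each node's edges into each of the K
        # communities, then compare the resulting multinomial rows within
        # each community.  This is the classical chi-square statistic for
        # homogeneity; the paper's adjustment for n >> d_i is NOT here.
        n = A.shape[0]
        C = np.zeros((n, K))
        for k in range(K):
            C[:, k] = A[:, labels == k].sum(axis=1)
        stat = 0.0
        for k in range(K):
            rows = C[labels == k]              # one multinomial per node
            d = rows.sum(axis=1)               # node degrees
            p = rows.sum(axis=0) / d.sum()     # pooled block probabilities
            E = np.outer(d, p)                 # expected counts under equal means
            mask = E > 0
            stat += ((rows[mask] - E[mask]) ** 2 / E[mask]).sum()
        return stat

    # illustrative usage on a simulated two-block SBM
    n, K = 600, 2
    labels = rng.integers(0, K, size=n)
    P = np.where(labels[:, None] == labels[None, :], 0.08, 0.02)
    A = np.triu((rng.uniform(size=(n, n)) < P), 1).astype(float)
    A = A + A.T
    print(compressed_chi_square(A, labels, K))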

Mathematics of synthetic data. III. Superregular random walks and private measures.

Speaker: 

Roman Vershynin

Institution: 

UCI

Time: 

Wednesday, June 1, 2022 - 2:00pm to 3:00pm

Location: 

510R Rowland Hall

In this last talk of the series, we construct a superregular random walk. This will be done by modifying a standard construction of Brownian motion. Then we will use it to create private synthetic data on the interval. Using space-filling curves will allow us to extend the construction to higher dimensions. Joint work with March Boedihardjo and Thomas Strohmer, https://arxiv.org/abs/2204.09167

Mathematics of synthetic data. II. Random walks.

Speaker: 

Roman Vershynin

Institution: 

UCI

Time: 

Wednesday, May 25, 2022 - 2:00pm to 3:00pm

Location: 

510R Rowland Hall

In this talk we will reduce the problem of constructing private synthetic data to the construction of a superregular random walk. Such a walk locally looks like a simple random walk, but globally deviates from the origin much more slowly than Brownian motion. Joint work with March Boedihardjo and Thomas Strohmer, https://arxiv.org/abs/2204.09167

Mathematics of synthetic data. I. Differential privacy.

Speaker: 

Roman Vershynin

Institution: 

UCI

Time: 

Wednesday, May 18, 2022 - 2:00pm to 3:00pm

Location: 

510R Rowland Hall

This is a series of talks on synthetic data and its privacy. It is meant for beginners. In the first talk we will introduce the notion of differential privacy, construct a private mean estimator, and try (unsuccessfully) to construct a private measure. 
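
As a concrete preview of the first ingredient, here is a minimal sketch of a differentially private mean estimator via the standard Laplace mechanism: for data in $[0,1]$, changing one of $n$ points moves the empirical mean by at most $1/n$, so Laplace noise of scale $1/(n\varepsilon)$ yields $\varepsilon$-differential privacy. This is the textbook construction, not necessarily the exact estimator from the talk.

    import numpy as np

    def private_mean(x, eps, rng):
        # eps-differentially private mean for data in [0, 1] via the Laplace
        # mechanism: changing one of n points moves the empirical mean by at
        # most 1/n (its sensitivity), so Laplace noise of scale 1/(n * eps)
        # suffices for eps-differential privacy.
        x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
        return x.mean() + rng.laplace(scale=1.0 / (len(x) * eps))

    rng = np.random.default_rng(0)
    data = rng.uniform(size=10000)
    print(private_mean(data, eps=0.5, rng=rng))   # close to 0.5 for this n and eps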

Quantum Unique Ergodicity for generalized Wigner matrices

Speaker: 

Lucas Benigni

Institution: 

University of Chicago

Time: 

Wednesday, May 11, 2022 - 2:00pm to 3:00pm

Location: 

510R Rowland Hall

We prove a strong form of delocalization of eigenvectors, known as Quantum Unique Ergodicity, for general large random matrices. These estimates state that the mass of an eigenvector over a subset of entries tends to that of the uniform distribution. We are also able to prove that the fluctuations around the uniform distribution are Gaussian in a certain regime of subset sizes. The proof relies on new eigenvector observables, studied dynamically through the Dyson Brownian motion, combined with a novel bootstrap comparison argument.
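
The statement is easy to check numerically in the simplest case. The sketch below samples a GOE matrix (a special case of a generalized Wigner matrix), picks a bulk eigenvector, and compares its mass on a random subset of 10% of the entries to the uniform benchmark $|S|/N$; the sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1000

    # GOE matrix, a special case of a generalized Wigner matrix
    H = rng.standard_normal((N, N))
    H = (H + H.T) / np.sqrt(2 * N)

    vals, vecs = np.linalg.eigh(H)
    v = vecs[:, N // 2]                             # an eigenvector from the bulk

    S = rng.choice(N, size=N // 10, replace=False)  # random 10% of the entries
    print(f"mass on subset:        {(v[S] ** 2).sum():.4f}")
    print(f"uniform benchmark |S|/N: {len(S) / N:.4f}")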
