Strong freezing of the binary perceptron model

Combinatorics and Probability

Speaker:

Shuangping Li

Speaker Link:

https://fifalsp.github.io/

Institution:

Stanford University

Time:

Wednesday, October 26, 2022 - 2:00pm to 3:00pm

Location:

510R Rowland Hall

We consider the binary perceptron model, a simple model of neural networks that has gathered significant attention in the statistical physics, information theory and probability theory communities. We show that at low constraint density (m=n^{1-epsilon}), the model exhibits a strong freezing phenomenon with high probability, i.e. most solutions are isolated. We prove it by a refined analysis of the log partition function. Our proof technique relies on a second moment method and cluster expansions. This is based on joint work with Allan Sly.
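
To make the notion of an isolated solution concrete, here is a minimal brute-force sketch. It makes several assumptions that are not in the abstract: it uses the symmetric variant of the constraints, $|\langle g_a, x\rangle| \le \kappa\sqrt{n}$ with Gaussian $g_a$, picks $\kappa = 1$ and a Hamming radius of 1, and the helper names are hypothetical. At this toy size it only illustrates the definition; it says nothing about the regime $m = n^{1-\epsilon}$ studied in the talk.

```python
import itertools
import random


def perceptron_solutions(constraints, kappa):
    """Enumerate x in {-1,+1}^n with |<g, x>| <= kappa * sqrt(n) for every row g.

    Assumption: symmetric constraint variant, chosen only for illustration.
    """
    n = len(constraints[0])
    bound = kappa * n ** 0.5
    return [
        x
        for x in itertools.product((-1, 1), repeat=n)
        if all(abs(sum(g * xi for g, xi in zip(row, x))) <= bound
               for row in constraints)
    ]


def hamming(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def isolated_solutions(sols, radius):
    """Solutions with no other solution within Hamming distance `radius`."""
    return [
        x for x in sols
        if not any(y != x and hamming(x, y) <= radius for y in sols)
    ]


rng = random.Random(0)
n, m = 10, 3  # toy sizes so that all 2^n sign vectors can be enumerated
constraints = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

sols = perceptron_solutions(constraints, kappa=1.0)
isolated = isolated_solutions(sols, radius=1)
print(len(sols), "solutions,", len(isolated), "isolated at radius 1")
```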

Detecting Hidden Communities by Power Iterations with Connections to Vanilla Spectral Algorithms

Combinatorics and Probability

Speaker:

Jiapeng Zhang

Speaker Link:

https://sites.google.com/site/jiapeng0708/home

Institution:

USC

Time:

Wednesday, November 30, 2022 - 2:00pm to 3:00pm

Location:

510R Rowland Hall

Community detection in the stochastic block model is one of the central problems of graph clustering. In this setup, spectral algorithms have been one of the most widely used frameworks for the design of clustering algorithms. However, despite the long history of study, there are still unsolved challenges. One of the main open problems is the design and analysis of ``simple'' spectral algorithms, especially when the number of communities is large.

In this talk, I will discuss two algorithms. The first one is based on the power-iteration method. Our algorithm performs optimally (up to logarithmic factors) compared to the best known bounds in the dense graph regime by Van Vu (Combinatorics Probability and Computing, 2018).

Then, based on a connection between the powered adjacency matrix and eigenvectors, we provide a ``vanilla'' spectral algorithm for a large number of communities in the balanced case. Our spectral algorithm is as simple as PCA (principal component analysis).

This talk is based on joint works with Chandra Sekhar Mukherjee. (https://arxiv.org/abs/2211.03939)
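
As a rough illustration of the power-iteration idea, the sketch below runs textbook power iteration on a centered adjacency matrix of a two-community stochastic block model and splits nodes by sign. This is not the algorithm from the paper: the two-community setup, the centering step, the parameters ($n = 120$, $p = 0.6$, $q = 0.1$), and all function names are assumptions made for the example.

```python
import random


def sample_sbm(n, p, q, rng):
    """Symmetric 0/1 adjacency matrix; the first n // 2 nodes form community 0."""
    labels = [0 if i < n // 2 else 1 for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                adj[i][j] = adj[j][i] = 1.0
    return adj, labels


def power_iteration_partition(adj, rng, iters=50):
    """Split nodes by the sign of the dominant eigenvector of the centered matrix."""
    n = len(adj)
    # Subtracting the average edge density removes the all-ones direction,
    # so the dominant remaining direction is the community indicator.
    density = sum(map(sum, adj)) / (n * n)
    centered = [[a - density for a in row] for row in adj]
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        v = [sum(c * x for c, x in zip(row, v)) for row in centered]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return [0 if x >= 0 else 1 for x in v]


def accuracy(pred, truth):
    """Fraction of correctly labeled nodes, up to swapping the two labels."""
    agree = sum(a == b for a, b in zip(pred, truth)) / len(truth)
    return max(agree, 1.0 - agree)


rng = random.Random(2022)
adj, labels = sample_sbm(120, 0.6, 0.1, rng)
pred = power_iteration_partition(adj, rng)
acc = accuracy(pred, labels)
print("fraction of nodes recovered:", acc)
```

Each iteration is a single matrix-vector product, which is the practical appeal of power iteration over a full eigendecomposition.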

Estimation of the covariance matrix in the presence of outliers

Combinatorics and Probability

Speaker:

Nikita Zhivotovskiy

Speaker Link:

https://sites.google.com/view/nikitazhivotovskiy/

Institution:

UC Berkeley

Time:

Wednesday, November 16, 2022 - 2:00pm to 3:00pm

Host:

Roman Vershynin

Location:

510R Rowland Hall

Suppose we are observing a sample of independent random vectors, knowing that the original distribution was contaminated, so that a fraction of observations came from a different distribution. How to estimate the covariance matrix of the original distribution in this case? In this talk, we discuss an estimator of the covariance matrix that achieves the optimal dimension-free rate of convergence under two standard notions of data contamination: We allow the adversary to corrupt a fraction of the sample arbitrarily, while the distribution of the remaining data points only satisfies a certain (rather weak) moment equivalence assumption. Despite requiring the existence of only a few moments, our estimator achieves the same tail estimates as if the underlying distribution were Gaussian. Based on a joint work with Pedro Abdalla.

Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

Combinatorics and Probability

Speaker:

Ludovic Stephan

Speaker Link:

https://www.lstephan.fr/

Institution:

EPFL

Time:

Wednesday, November 9, 2022 - 2:00pm to 3:00pm

Location:

510R Rowland Hall

Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular, investigate the connection between the so-called mean field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates. https://arxiv.org/abs/2202.00293

Large deviations for projections of high-dimensional measures

Combinatorics and Probability

Speaker:

Yin-Ting Liao

Speaker Link:

https://sites.google.com/brown.edu/ytliao/home

Institution:

UC Irvine

Time:

Wednesday, September 28, 2022 - 2:00pm to 3:00pm

Location:

510R Rowland Hall

Random projections of high-dimensional probability measures have gained much attention in asymptotic convex geometry and high-dimensional statistics. While fluctuations at the level of the central limit theorem have been classically studied, only more recently has an inquiry into large deviation principles for such projections been initiated. In this talk, I will review existing work and describe our results on large deviations. I will also talk about sharp large deviation estimates, which capture the prefactor in addition to the exponential decay, in the spirit of Bahadur and Ranga-Rao. Applications to asymptotic convex geometry and a range of examples including $\ell^p$ balls and Orlicz balls will be given. This talk is based on several joint works with S. S. Kim and K. Ramanan.

Optimal minimization of the covariance loss

Combinatorics and Probability

Speaker:

Vishesh Jain

Speaker Link:

https://jainvishesh.github.io/

Institution:

Stanford University

Time:

Thursday, June 2, 2022 - 2:00pm

Location:

510R Rowland Hall

Let $X$ be a random vector valued in $\mathbb{R}^m$ such that $\|X\|_{2} \leq 1$ almost surely. In this talk, I will discuss two proofs -- one based on the pinning lemma from statistical physics and another based on randomized rounding -- showing that for every $k \geq 3$, there exists a sigma algebra $\mathcal{F}$ generated by a partition of $\mathbb{R}^{m}$ into $k$ sets such that
\[\|\operatorname{Cov}(X) - \operatorname{Cov}(\mathbb{E}[X\mid\mathcal{F}])
\|_{\mathrm{F}} \lesssim \frac{1}{\sqrt{\log{k}}}.\]
This estimate is optimal up to the implicit constant, and improves a previous result of Boedihardjo, Strohmer, and Vershynin, obtained in connection to the design of accurate, privacy-preserving synthetic data, by a factor of $\sqrt{\log\log{k}}$. Joint work with Ashwin Sah (MIT) and Mehtaab Sawhney (MIT).
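
For orientation, the quantity being bounded can be rewritten with the law of total covariance, a standard identity that is not part of the abstract. If $\mathcal{F}$ is generated by a partition $A_1, \dots, A_k$ of $\mathbb{R}^m$ (the cell names $A_j$ are notation introduced here), then
\[\operatorname{Cov}(X) - \operatorname{Cov}(\mathbb{E}[X\mid\mathcal{F}]) = \mathbb{E}[\operatorname{Cov}(X\mid\mathcal{F})] = \sum_{j=1}^{k} \mathbb{P}(X \in A_j)\, \operatorname{Cov}(X \mid X \in A_j),\]
where the sum runs over cells of positive probability. The estimate therefore says that the partition can be chosen so that the probability-weighted average of the within-cell covariances has Frobenius norm at most of order $1/\sqrt{\log k}$.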

Spectral asymptotics for contracted tensor ensembles

Combinatorics and Probability

Speaker:

Jorge Garza-Vargas

Speaker Link:

https://math.berkeley.edu/~jgarzav/

Institution:

UC Berkeley

Time:

Wednesday, April 13, 2022 - 2:00pm to 3:00pm

Location:

510R Rowland Hall

Let $T_{d, N}$ be a random symmetric Wigner-type tensor of dimension $N$ and order $d$. For unit vectors $u_N^{(1)}, \dots, u_{N}^{(d-2)}$ we study the random matrix obtained by taking the contracted tensor $\frac{1}{N} T_{d, N} \left[u_N^{(1)}\otimes \cdots \otimes u_N^{(d-2)} \right]$ and show that, for large $N$, its empirical spectral distribution concentrates around a semicircular distribution whose radius is an explicit symmetric function of the $u_N^{(i)}$. We further generalize this result by considering a family of contractions of $T_{d, N}$ and show, using free probability concepts, that its joint distribution is well-approximated by a non-commutative semicircular family when $N$ is large. This is joint work with Benson Au (https://arxiv.org/abs/2110.01652).

High-dimensional Asymptotics of Feature Learning After One Step of Gradient Descent

Combinatorics and Probability

Speaker:

Zhichao Wang

Institution:

UCSD

Time:

Wednesday, March 30, 2022 - 2:00pm to 3:00pm

Location:

510R Rowland Hall

In this talk, I will discuss the spectral properties of a two-layer neural network after one gradient step and their applications in random ridge regression. We consider the first gradient step in a two-layer randomly initialized neural network with the empirical MSE loss. In the proportional asymptotic limit, where all the dimensions go to infinity at the same rate, and an idealized student-teacher setting, we will show that the first gradient update contains a rank-1 ``spike'', which results in an alignment between the first-layer weights and the linear component of the teacher model. By verifying a Gaussian equivalent property, we can compute the prediction risk of ridge regression on the conjugate kernel after one gradient step. We will present two scalings of the first step learning rate. For a small learning rate, we compute the asymptotic risk of the ridge regression estimator on top of the trained features, which does not outperform the best linear model. For a sufficiently large learning rate, by contrast, we prove that the ridge estimator on the trained features can go beyond this ``linear'' regime. Our analysis demonstrates that even one gradient step can lead to an advantage over the initial features. Our theoretical results are mainly based on random matrix theory and operator-valued free probability theory, which will be summarized in this talk. This is recent joint work with Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Denny Wu, and Greg Yang.

Adjusted chi-square test for degree-corrected block models

Combinatorics and Probability

Speaker:

Arash A. Amini

Speaker Link:

http://www.stat.ucla.edu/~arashamini/

Institution:

UCLA

Time:

Wednesday, May 4, 2022 - 2:00pm to 3:00pm

Host:

Roman Vershynin

Location:

510R Rowland Hall

We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM). The test is based on an adjusted chi-square statistic for measuring equality of means among groups of $n$ multinomial distributions with $d_1,\dots,d_n$ observations. In the context of network models, the number of multinomials, $n$, grows much faster than the number of observations, $d_i$, corresponding to the degree of node $i$, hence the setting deviates from classical asymptotics. We show that a simple adjustment allows the statistic to converge in distribution, under null, as long as the harmonic mean of $\{d_i\}$ grows to infinity.

When applied sequentially, the test can also be used to determine the number of communities. The test operates on a (row) compressed version of the adjacency matrix, conditional on the degrees, and as a result is highly scalable to large sparse networks. We incorporate a novel idea of compressing the rows based on a $(K+1)$-community assignment when testing for $K$ communities. This approach increases the power in sequential applications without sacrificing computational efficiency, and we prove its consistency in recovering the number of communities. Since the test statistic does not rely on a specific alternative, its utility goes beyond sequential testing and can be used to simultaneously test against a wide range of alternatives outside the DCSBM family.

The test can also be easily applied to Poisson count arrays in clustering or biclustering applications, as well as bipartite and directed networks. We show the effectiveness of the approach by extensive numerical experiments with simulated and real data. In particular, applying the test to the Facebook-100 dataset, a collection of one hundred social networks, we find that a DCSBM with a small number of communities (say $ < 25$) is far from a good fit in almost all cases. Despite the lack of fit, we show that the statistic itself can be used as an effective tool for exploring community structure, allowing us to construct a community profile for each network.

https://arxiv.org/abs/2012.15047

Mathematics of synthetic data. III. Superregular random walks and private measures.

Combinatorics and Probability

Speaker:

Roman Vershynin

Speaker Link:

https://www.math.uci.edu/~rvershyn/index.html

Institution:

UCI

Time:

Wednesday, June 1, 2022 - 2:00pm to 3:00pm

Location:

510R Rowland Hall

In this last talk of the series, we construct a superregular random walk. This will be done by modifying a standard construction of Brownian motion. Then we will use it to create private synthetic data on the interval. Using space-filling curves will allow us to extend the construction to higher dimensions. Joint work with March Boedihardjo and Thomas Strohmer, https://arxiv.org/abs/2204.09167
