Roman Vershynin | Research

Want to learn more about my research?

My primary area of expertise is high-dimensional probability. I am interested in theoretical problems as well as applications of probability to data science.

For a deep dive, check out my textbook on high-dimensional probability. Not ready to read a 300+ page book? Here is a much shorter version of it, taking you on a walk through some modern probabilistic methods and their applications to data science. This survey explains relationships between high-dimensional probability, high-dimensional geometry and high-dimensional inference, and this tutorial connects these areas to random matrix theory. If you are ready to explore actual research papers, take a peek at my publications.

If you are thinking of becoming my student or postdoc, take a look at the work of my past and present mentees.

Why is this interesting, exactly?

Probability theory explains how global order can emerge from local chaos. Daily fluctuations of stock prices or movements of air molecules may look completely chaotic. But as we zoom out, such random processes start to look more "smooth" and predictable. Classical laws of probability theory -- the law of large numbers and the central limit theorem -- describe the global behavior of such processes.

Probabilists have traditionally been concerned with random processes in low dimensions. Dimension 1 corresponds to random numeric effects (such as stock prices), and dimension 3 corresponds to random processes in our three-dimensional world (for example, movements of particles). Big data, however, has big dimensions. Every pixel of a photograph, every gene, every health parameter of a patient counts as a dimension. Theoretical foundations of modern data science need to be built in a huge, potentially unlimited, number of dimensions. Can probability theory explain randomness in such huge-dimensional worlds? As the dimensionality of the world increases, can global order still emerge, or will such a high-dimensional world be globally chaotic?

Paradoxically, high dimensionality often tends to simplify the picture. Let me give an example. We normally visualize three-dimensional objects by their projections onto a two-dimensional plane. Projecting a three-dimensional cube onto a randomly chosen plane, we almost always get a hexagon. If the dimension of the cube increases to infinity, we might expect the shape of the projection to become more and more complicated. Surprisingly, this does not happen! A random projection of a cube approaches a circle, which is a very simple geometric shape. High dimensionality turns out to be not our enemy but our friend.
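If you would like to see this for yourself, here is a small numerical sketch. It is only an illustration under assumptions of my choosing: the cube is taken to be [-1, 1]^n, the random plane is obtained by orthonormalizing a Gaussian matrix, and "roundness" is measured by comparing the largest and smallest extent of the projection over all directions in the plane (a ratio of exactly 1 would be a perfect circle). The function name `roundness_ratio` and its parameters are hypothetical.

```python
import numpy as np

def roundness_ratio(n, num_angles=360, seed=0):
    """Max/min extent of a random 2D projection of the cube [-1, 1]^n.

    A value close to 1 means the projected shape is close to a disc.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal basis of a random 2D plane in R^n (columns of q).
    q, _ = np.linalg.qr(rng.standard_normal((n, 2)))
    # Unit directions in the plane; the shape is centrally symmetric,
    # so angles in [0, pi) cover every direction.
    angles = np.linspace(0.0, np.pi, num_angles, endpoint=False)
    dirs = np.stack([np.cos(angles), np.sin(angles)])  # shape (2, num_angles)
    # Extent of the projected cube in direction u: sum_i |<q_i, u>|,
    # where q_i is the i-th row of q.
    extent = np.abs(q @ dirs).sum(axis=0)
    return extent.max() / extent.min()

if __name__ == "__main__":
    for n in (3, 30, 300, 3000):
        print(n, roundness_ratio(n))
```

For n = 3 the projection is a hexagon and the ratio stays visibly above 1; as n grows, the printed ratio should drift toward 1.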

I am fascinated by high-dimensional worlds, by random objects living in them, by random actions one can make to "tame" high-dimensional objects. My purely aesthetic passion is fueled by a demand to build a mathematical theory of big data. Perhaps future generations will view us as something like pre-Newtonian physicists to whom the most fundamental laws of the physical universe were still unknown. Just like them, we are trying to discover the fundamental laws of big data, and we bet that high-dimensional probability will give us the key. So we are standing there knocking at the door, mesmerized by the patterns that arise from the still elusive laws of big data in high dimensions.

E-mail me