Many problems of contemporary interest in signal processing and machine learning involve highly non-convex optimization problems. While nonconvex problems are known to be intractable in general, simple local search heuristics such as (stochastic) gradient descent are often surprisingly effective at finding global optima on real or randomly generated data. In this talk I will discuss some results explaining the success of these heuristics by connecting convergence of nonconvex optimization algorithms to supremum of certain stochastic processes. I will focus on two problems.
The first problem, concerns the recovery of a structured signal from under-sampled random quadratic measurements. I will show that projected gradient descent on a natural nonconvex formulation finds globally optimal solutions with a near minimal number of samples, breaking through local sample complexity barriers that have emerged in recent literature. I will also discuss how these new mathematical developments pave the way for a new generation of data-driven phaseless imaging systems that can utilize prior information to significantly reduce acquisition time and enhance image reconstruction, enabling nano-scale imaging at unprecedented speeds and resolutions. The second problem is about learning the optimal weights of the shallowest of neural networks consisting of a single Rectified Linear Unit (ReLU). I will discuss this problem in the high-dimensional regime where the number of observations are fewer than the ReLU weights. I will show that projected gradient descent on a natural least-squares objective, when initialization at 0, converges at a linear rate to globally optimal weights with a number of samples that is optimal up to numerical constants.