Classical continuous optimization starts with linear and nonlinear programming. Over the past two decades, convex optimization (e.g., sparse linear regression with convex regularizers) has been a highly effective computational tool in applications to statistical estimation and machine learning. However, many modern data-science problems involve basic ``non''-properties that the convex approach ignores for the sake of computational convenience: the coupling of non-convexity, non-differentiability, and non-(Clarke) regularity. In this talk, we present a rigorous computational treatment of two such non-problems: piecewise affine regression and the feed-forward deep neural network. The algorithmic framework integrates a first-order non-convex majorization-minimization method with second-order non-smooth Newton methods. Numerical experiments demonstrate the effectiveness of the proposed approach. In contrast to existing methods for non-problems, which at best provide very weak guarantees on the solutions computed in practice, our rigorous mathematical treatment aims to understand the properties of these computed solutions with reference to both the empirical and the population risk minimizations. This is based on joint work with Jong-Shi Pang, Bodhisattva Sen and Ziyu He.
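To make the flavor of the piecewise affine regression problem concrete, the sketch below fits a max-affine model $y \approx \max_k (a_k^\top x + b_k)$ by a simple alternating partition/least-squares heuristic. This is only an illustrative stand-in, not the MM + non-smooth Newton framework of the talk; all function names and parameters here are my own.

```python
import numpy as np

def fit_max_affine(X, y, K=3, iters=50, seed=0):
    """Fit y ~= max_k (a_k @ x + b_k) by alternating between
    (1) least squares on each piece and (2) reassigning each point
    to its maximizing piece. Illustrative heuristic only."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # initialize: split points into K contiguous chunks along a random direction
    proj = X @ rng.standard_normal(d)
    z = (np.argsort(np.argsort(proj)) * K) // n
    A = np.zeros((K, d)); b = np.zeros(K)
    Xa = np.hstack([X, np.ones((n, 1))])  # augmented design matrix
    for _ in range(iters):
        for k in range(K):
            idx = z == k
            if idx.sum() >= d + 1:        # enough points to fit piece k
                w, *_ = np.linalg.lstsq(Xa[idx], y[idx], rcond=None)
                A[k], b[k] = w[:d], w[d]
        # reassign each point to its active (maximizing) piece
        z_new = (X @ A.T + b).argmax(axis=1)
        if np.array_equal(z_new, z):
            break                         # partition stable: converged
        z = z_new
    return A, b

# usage: recover the convex piecewise-affine function y = max(x, -x) = |x|
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.abs(X[:, 0])
A, b = fit_max_affine(X, y, K=2)
pred = (X @ A.T + b).max(axis=1)
print(float(np.max(np.abs(pred - y))))
```

Note that this heuristic, like the problem itself, is non-convex: the final fit depends on the initial partition, which is one reason rigorous guarantees on the computed solutions are delicate.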