Optimal transport, and the Wasserstein metrics it induces, now plays an essential role in data science. In this talk, we briefly review its development and its applications in machine learning. In particular, we focus on the optimal control problems it induces in density space and on the associated differential structures. We introduce the Wasserstein natural gradient in parametric models.
The Wasserstein metric tensor on probability density space is pulled back to a metric tensor on parameter space. We derive the Wasserstein gradient flows and proximal operators in parameter space. We demonstrate that the Wasserstein natural gradient works efficiently in learning, with examples in Boltzmann machines, generative adversarial networks (GANs), image classification, and adversarial robustness.
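
As a rough illustration of the pullback idea, the sketch below uses a toy model that is not taken from the talk: the one-dimensional Gaussian family N(mu, sigma^2) with the assumed parameterization theta = (mu, s), sigma = exp(s). For one-dimensional Gaussians the squared 2-Wasserstein distance is (mu1 - mu2)^2 + (sigma1 - sigma2)^2, so in these coordinates the pulled-back metric tensor is G(theta) = diag(1, exp(2s)). The natural gradient step is then theta <- theta - h G(theta)^{-1} grad F(theta). The loss, step size, and all function names are hypothetical choices made for this example only.

```python
import math

def wasserstein_metric(theta):
    """Diagonal of the pulled-back Wasserstein metric G(theta) for N(mu, exp(s)^2).

    Assumed toy setting: since W2^2 = dmu^2 + dsigma^2 and sigma = exp(s),
    the metric in (mu, s) coordinates is diag(1, exp(2s)).
    """
    _, s = theta
    return (1.0, math.exp(2.0 * s))

def loss_grad(theta, target):
    """Euclidean gradient of F(theta) = 0.5 * W2^2(N(mu, sigma^2), N(m, sigma_star^2))."""
    mu, s = theta
    m, sigma_star = target
    sigma = math.exp(s)
    # Chain rule: dF/ds = (sigma - sigma_star) * dsigma/ds = (sigma - sigma_star) * sigma
    return (mu - m, (sigma - sigma_star) * sigma)

def natural_gradient_step(theta, target, lr):
    """One step theta <- theta - lr * G(theta)^{-1} grad F(theta); G is diagonal here."""
    grad = loss_grad(theta, target)
    metric = wasserstein_metric(theta)
    return tuple(t - lr * g / G for t, g, G in zip(theta, grad, metric))

def fit(theta0, target, lr=0.1, steps=200):
    """Iterate the natural gradient step from theta0 (hypothetical defaults)."""
    theta = theta0
    for _ in range(steps):
        theta = natural_gradient_step(theta, target, lr)
    return theta

# Usage: start from N(0, 1) and move toward the target N(3, 2^2).
mu, s = fit((0.0, 0.0), (3.0, 2.0))
print(mu, math.exp(s))
```

In this toy case the metric is diagonal, so inverting it is a componentwise division; in a general parametric model G(theta) would be a full matrix and the step would require solving a linear system instead.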