Zixuan Cang




Tuesday, January 12, 2021 - 2:00pm to 3:00pm




Topological and geometric data analysis (TGDA) is a powerful framework for the quantitative description and simplification of datasets and shapes. It is especially suitable for modern biological data, which are intrinsically complex and high-dimensional. Traditional topological data analysis considers only the geometric features of a dataset, whereas in practice data often carry both geometric and non-geometric features. In this talk, I will introduce a persistent-cohomology-based method, the enriched barcode, which embeds non-geometric features in the topological invariants. I will then discuss a geometric method, unnormalized optimal transport, for integrating heterogeneous datasets, a step that is crucial for building a comprehensive topological perspective on the system of interest. Scientific data often have limited size and high complexity, and a straightforward application of machine learning to raw data can result in suboptimal performance. To tackle this challenge, we integrate the TGDA methods designed for biological data with deep learning. This topology-based deep learning strategy achieves top performance on standard benchmarks and in the D3R Grand Challenges, a worldwide competition series in computer-aided drug design. I will also show several applications of our geometric method to the analysis and integration of single-cell omics data. Finally, I will discuss future directions in data-driven modeling using topology, geometry, and machine-learning-based approaches.