In the era of big biological data, there is a pressing need for methods that visualize, integrate, and interpret high-throughput, high-dimensional data to enable biological discovery. Analyzing such data poses several major challenges, including the curse of dimensionality, noise, sparsity, missing values, bias, and collection artifacts. In my work, I address these problems with computational methods based on manifold learning. A manifold is a smoothly varying, low-dimensional structure embedded within the high-dimensional ambient measurement space. In my talk, I will present several recently completed and ongoing projects that exploit this manifold structure, using graph signal processing and deep learning, to understand large biomedical datasets. These include MAGIC, a data denoising and imputation method designed to ‘fix’ single-cell RNA-sequencing data; PHATE, a dimensionality reduction and visualization method specifically designed to reveal continuous progression structure; and two deep learning methods that use specially designed constraints to learn deep, interpretable representations of heterogeneous systems. I will demonstrate that these methods can give insight into diverse biological systems such as the breast cancer epithelial-to-mesenchymal transition, human embryonic stem cell development, the gut microbiome, and tumor-infiltrating lymphocytes.
Lunch for talk attendees will be available at 12:00pm.
To request accommodations for a disability, please contact Sara B. Thibeault at thibeault@princeton.edu at least one week prior to the event.