Location:
The members of
Examiners:
Readers:
Everyone is invited to attend
Abstract follows below:
Deciphering the regulatory code of gene expression is a critical challenge in human genetics, instrumental to unlocking the potential of personalized medicine. Modern experimental technologies have resulted in an abundance of high-dimensional genome-wide data, revealing the complex system of epigenetic interactions encoded in the genome. The development of computational approaches that can leverage this vast data to model chromatin interactions globally offers a new understanding of how genomic sequences specify regulatory functions. Specifically, sequence-based deep learning models have become the de facto standard for learning the functional properties encoded in DNA sequences based on large sequencing datasets. These models are powerful tools for interpreting molecular and phenotypic effects, capable of predicting the impact of any noncoding variant in the human genome, even rare or never-before-observed variants, and systematically characterizing their consequences beyond what is tractable from experiments and quantitative genetics alone.
In this thesis, we present two deep learning-based sequence models that predict different epigenetic properties of the genome that contribute to transcriptional regulation. First, Sei is a framework for integrating human genetics data with sequence information to discover the regulatory basis of traits and diseases. Sei learns a vocabulary of regulatory activities, called sequence classes, using a model that predicts 21,907 chromatin profiles across more than 1,300 cell lines and tissues. Sequence classes provide a global classification and quantification of sequences and variants based on diverse regulatory activities, such as cell type-specific enhancers.
Next, we developed Hedgehog, a model that enables the quantification of variation at methylation sites. Hedgehog predicts 296 continuous-valued methylation profiles across a range of cell types and tissues. Hedgehog is complementary to Sei and reveals new insights into the relationship between DNA methylation and other epigenetic modifications.
Finally, we show how deep learning-based methods can be applied to elucidate the regulatory basis of human health and disease. Specifically, we use Sei to study the contribution of noncoding mutations in cancer. Collectively, these results demonstrate novel frameworks for modeling the sequence dependencies of the epigenome and the capability of such approaches to delineate the regulatory mechanisms underlying complex diseases.