11-22
Leveraging Allelic Heterogeneity in Association Studies

Over 100,000 variants have been implicated in human traits through genome-wide association studies (GWAS). Virtually all reported GWAS identify associations by measuring the correlation between a single variant and a phenotype of interest. Recently, several studies have reported that many risk loci may harbor multiple causal variants, a phenomenon referred to as allelic heterogeneity. For a locus with multiple causal variants of small effect size, the standard association test is underpowered to detect the associations, and an approach that considers the effects of multiple variants simultaneously may increase statistical power. Counterintuitively, most approaches that consider multiple variants in association studies find fewer associations than the single-SNP association test. This is because most multiple-variant methods assume a structure of allelic heterogeneity that is very different from what is observed in genetic studies. In this work, we propose a new statistical method, Model-based Association test Reflecting causal Status (MARS), that tests for an association between the variants in a risk locus and a phenotype, considering multiple variants at each locus. One of the main advantages of MARS is that it requires only existing summary statistics to detect associated risk loci. Thus, MARS is applicable to any association study with summary statistics, even when individual-level data are not available for the study. Using extensive simulated data sets, we show that MARS increases the power to detect truly associated risk loci compared to previous approaches that consider multiple variants, while robustly controlling the type I error. Applied to data from 44 tissues provided by the Genotype-Tissue Expression (GTEx) consortium, MARS identifies more eGenes than previous approaches in most of the tissues; for example, MARS identified 16% more eGenes than were reported by the GTEx consortium. Moreover, applied to Northern Finland Birth Cohort (NFBC) data, MARS identifies associated loci with improved power in GWAS, finding 56% more loci than the standard association test.

Bio: Dr. Eleazar Eskin serves as the inaugural chair for the UCLA Department of Computational Medicine. Fascinated by the intersection of computer science and biology, Dr. Eskin is researching and developing computational methods for the analysis of genetic variations in human disease. There are millions of variants in the genome and identifying the variants involved in disease often requires tens of thousands of patient samples. In order to analyze these tremendously large datasets, Dr. Eskin and his team are solving challenging computational problems and developing new computational techniques. He received his PhD in computer science from Columbia University. A recipient of the Alfred P. Sloan Foundation Research Fellowship, Dr. Eskin’s work is supported by the National Science Foundation and the National Institutes of Health.

Lunch for talk attendees will be available at 12:00pm. 
To request accommodations for a disability, please contact Emily Lawrence, emilyl@cs.princeton.edu, 609-258-4624, at least one week prior to the event.

Date and Time
Friday, November 22, 2019, 12:30pm - 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Host
Ben Raphael

Contributions to and/or sponsorship of any event does not constitute departmental or institutional endorsement of the specific program, speakers or views presented.
