I will discuss our recent work on inference methodology for graphical models in three areas of bioinformatics: (1) Population structure and recombination hotspot inference, using a novel approach based on Dirichlet process priors. I present a hidden Markov version of the Dirichlet process that allows us to infer recombination events among haplotypes in an “open” ancestral space. (2) Comparative-genomic prediction of imperfectly conserved transcription factor binding sites, where multi-resolution phylogenetic inference is combined with Markovian inference to provide sensitive detection of motifs and their evolutionary turnover in eleven Drosophila species. (3) Reverse-engineering of temporally rewiring networks from gene expression time courses, where a novel hidden temporal exponential random graph model is used to model the temporal evolution of network topologies during a biological process and to facilitate inference of the transient regulatory circuitry (rather than a single universal one) underlying each time point of the microarray time series.
Bio: Eric Xing is an assistant professor in the Machine Learning Department, the Language Technologies Institute, and the Computer Science Department within the School of Computer Science at Carnegie Mellon University. His principal research interests lie in the development of machine learning and statistical methodology, especially for building quantitative models and predictive understandings of the evolutionary mechanisms, regulatory circuitry, and developmental processes of biological systems, and for building computational intelligence systems involving automated learning, reasoning, and decision-making in open, evolving possible worlds. Professor Xing received his B.S. in Physics from Tsinghua University, his first Ph.D. in Molecular Biology and Biochemistry from Rutgers University, and his second Ph.D. in Computer Science from UC Berkeley. He has been a member of the faculty at Carnegie Mellon University since 2004. His current work involves: 1) graphical models, Bayesian methodologies, inference algorithms, and optimization techniques for analyzing and mining high-dimensional, longitudinal, and relational data; 2) computational and comparative genomic analysis of biological sequences, systems biology investigation of gene regulation, and statistical analysis of genetic variation, demography, and disease linkage; and 3) application of statistical learning in text/image mining, vision, and machine translation.