HAPLOFREQ - Estimating Haplotype Frequencies Efficiently
Report ID: TR-706-04Author: Hazan, Elad / Halperin, Eran
Date: 2004-06-00
Pages: 19
Download Formats: |PDF|
Abstract:
We present a new method HAPLOFREQ to estimate haplotype frequencies over a short genomic region given the genotypes or haplotypes with missing data. Our approach incorporates a maximum likelihood model based on a simple random generative model which assumes that the genotypes are independently sampled from the population. We first show that if the phased haplotypes are given, possibly with missing data, we can estimate the frequency of the haplotypes in the population by finding the global optimum of the likelihood function in polynomial time. If the haplotypes are not phased, finding the maximum value of the likelihood function is NP-hard. In this case we define an alternative likelihood function which can be thought of as a relaxed likelihood function. We show  that the maximum relaxed likelihood can be found in polynomial time, and that  the optimal solution of the relaxed likelihood approaches asymptotically to the haplotype frequencies in the data.
In contrast to previous approaches, our algorithms are guaranteed to converge in polynomial time to a global maximum of the different likelihood functions. Preliminary experiments on biological data show that our estimates are about 10% more accurate than the popular program PHASE and about three to ten times faster.
Our techniques involve new algorithms in convex optimization. These algorithms may be of independent interest. Furthermore, the hardness proof involves a generalization of Turan's theorem, which may also be of independent interest.