On the quantity and quality of single nucleotide polymorphisms in the human genome
Rick Durrett and Vlada Limic
Single nucleotide polymorphisms (SNPs) are single nucleotides (the A's, T's, G's, and C's
that make up the genome) that are polymorphic, i.e., the most common allele has
frequency less than 99%. They are useful markers for locating genes since they occur
throughout the human genome and thousands can be scored at once using DNA microarrays.
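The 99% criterion above can be made concrete with a small check. This is a minimal sketch, assuming a site is given as a hypothetical list of the nucleotides observed on each sampled chromosome; the function name `is_polymorphic` is illustrative and not from the paper. It also shows that a site can be variable (more than one allele present) without meeting the polymorphism threshold.

```python
from collections import Counter


def is_polymorphic(alleles, threshold=0.99):
    """Return True if the most common allele at a site has frequency below `threshold`.

    `alleles` is a hypothetical input format: one nucleotide ('A', 'T', 'G', 'C')
    per sampled chromosome at a single site.
    """
    if not alleles:
        raise ValueError("need at least one observed allele")
    counts = Counter(alleles)
    top_count = counts.most_common(1)[0][1]  # count of the most common allele
    return top_count / len(alleles) < threshold


# 195 A's and 5 G's: most common allele has frequency 0.975 < 0.99, so polymorphic.
print(is_polymorphic("A" * 195 + "G" * 5))
# 199 A's and 1 G: the site is variable, but 0.995 >= 0.99, so not polymorphic.
print(is_polymorphic("A" * 199 + "G"))
```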
Here we use branching processes and coalescent theory to show that if one uses Kruglyak's
(1999) model of the growth of the human population and one assumes an average mutation
rate of 1 × 10^{-8} per nucleotide per generation, then there are about 2.8 million SNPs
in the human genome, or one every 529 base pairs. We also obtain results for the number
of SNPs that will be found in samples. When the sample size is n = 5, which roughly
corresponds to Celera's sequencing of the human genome, an average of 3.1 million
nucleotides will be variable in the sample. However, only about 70% of these, or about
2.3 million, will be polymorphic. This is very close to the 2.4 million SNPs that Celera
claims to have found.
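For orientation only, the simplest coalescent benchmark for counting variable sites in a sample is the standard neutral model with a constant population size. This is not Kruglyak's (1999) growth model used above, so it will not reproduce the figures quoted here. Under that constant-size, infinitely-many-sites assumption, Watterson's formula gives the expected number of sites variable in a sample of n sequences as θ L Σ_{i=1}^{n-1} 1/i, where θ is the scaled mutation rate per nucleotide and L is the number of nucleotides. The function name and parameters below are hypothetical.

```python
def expected_variable_sites(n, theta_per_site, length):
    """Expected number of sites variable in a sample of n sequences.

    Assumes the standard neutral coalescent with CONSTANT population size
    (not the growth model used in the text):
        E[S_n] = theta_per_site * length * sum_{i=1}^{n-1} 1/i
    """
    if n < 2:
        return 0.0  # a single sequence cannot show variation
    harmonic = sum(1.0 / i for i in range(1, n))  # 1 + 1/2 + ... + 1/(n-1)
    return theta_per_site * length * harmonic


# Illustrative (hypothetical) parameters, not estimates for humans.
print(expected_variable_sites(5, 0.001, 1200))
```

Because the harmonic sum grows only like log n, adding sequences increases the count of variable sites slowly. Many of the extra variable sites carry rare alleles that fail the 99% criterion, which is consistent with only a fraction of sample-variable sites being polymorphic.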