Linkage Disequilibrium (LD)

Haplotype and Allele Frequencies

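\[\begin{array}{l|cc|c}
\text{Locus A} \backslash \text{Locus B} & \text{Allele 1 } (B) & \text{Allele 2 } (b) & \text{Allele frequency} \\
\hline
\text{Allele 1 } (A) & p_{AB} & p_{Ab} & p_A \\
\text{Allele 2 } (a) & p_{aB} & p_{ab} & p_a \\
\hline
\text{Allele frequency} & p_B & p_b & 1
\end{array}\]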

First, since $p_a = 1 - p_A$ and $p_b = 1 - p_B$, knowing $p_A$ and $p_B$ also gives $p_a$ and $p_b$.

Second, since $p_{Ab} = p_{A} - p_{AB}$, $p_{aB} = p_{B} - p_{AB}$ and $p_{ab} = 1 - p_{A} - p_{B} + p_{AB}$, we can calculate $p_{Ab}$, $p_{aB}$ and $p_{ab}$ once we also know $p_{AB}$.

All four haplotype frequencies must be greater than or equal to 0. Requiring $p_{Ab}$, $p_{aB}$ and $p_{ab}$ to be non-negative gives (in addition to $p_{AB} \ge 0$ itself):

\[p_A + p_B - 1 \le p_{AB} \le min[p_A,p_B]\]
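The relations above can be sketched in a few lines of Python. The function names `haplotype_frequencies` and `p_AB_bounds` are hypothetical, chosen only for illustration. The `max(0, ...)` guard in the lower bound is an addition that covers $p_A + p_B < 1$; it has no effect when A and B are both major alleles.

```python
def haplotype_frequencies(p_A, p_B, p_AB):
    """Return (p_AB, p_Ab, p_aB, p_ab) from the two allele frequencies and p_AB."""
    p_Ab = p_A - p_AB
    p_aB = p_B - p_AB
    p_ab = 1.0 - p_A - p_B + p_AB
    return p_AB, p_Ab, p_aB, p_ab


def p_AB_bounds(p_A, p_B):
    """Range of p_AB for which all four haplotype frequencies are >= 0."""
    # Assumption: the max(0, ...) guard handles p_A + p_B < 1,
    # a case that cannot occur when both A and B are major alleles.
    lower = max(0.0, p_A + p_B - 1.0)
    upper = min(p_A, p_B)
    return lower, upper


if __name__ == "__main__":
    print(haplotype_frequencies(0.6, 0.7, 0.5))
    print(p_AB_bounds(0.6, 0.7))
```

Any $p_{AB}$ outside the returned range would make at least one of the four haplotype frequencies negative.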

Measure D

Define $D = p_{AB} - p_A \times p_B$. Substituting the bounds on $p_{AB}$ gives:

\[-(1-p_A)(1-p_B) \le D \le min[p_A(1-p_B), p_B(1-p_A)]\]

and, if the alleles at the two loci are independent,

\[p_{AB} = p_A \times p_B\text{, so }D = 0\]
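As a minimal sketch, $D$ and its bounds can be computed as follows. The names `ld_D` and `D_bounds` are hypothetical. The lower bound also includes $-p_Ap_B$ (from $p_{AB} \ge 0$), an added assumption that reduces to $-(1-p_A)(1-p_B)$ when A and B are both major alleles.

```python
def ld_D(p_A, p_B, p_AB):
    """Linkage disequilibrium coefficient D = p_AB - p_A * p_B."""
    return p_AB - p_A * p_B


def D_bounds(p_A, p_B):
    """Smallest and largest D compatible with the allele frequencies."""
    # Assumption: -p_A * p_B is included so the bound also holds when
    # A or B is a minor allele; for major alleles the second term wins.
    lower = max(-p_A * p_B, -(1 - p_A) * (1 - p_B))
    upper = min(p_A * (1 - p_B), p_B * (1 - p_A))
    return lower, upper


if __name__ == "__main__":
    print(ld_D(0.6, 0.7, 0.5))
    print(D_bounds(0.6, 0.7))
```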

Measure r2

The LD measure $r^2$ is the squared correlation, where $r$ scales $D$ by the standard deviations of the allele frequencies at the two loci:

\[r^2 = \frac{D^2}{p_Ap_B(1-p_A)(1-p_B)}\]

Assuming A and B are the major alleles at their respective loci (so $p_A \ge 0.5$ and $p_B \ge 0.5$), we have:

\[(1-p_A)^2(1-p_B)^2 \le p_A^2(1-p_B)^2\] \[(1-p_A)^2(1-p_B)^2\le(1-p_A)^2p_B^2\]

so $D^2$ is largest at the upper bound of $D$ rather than at its lower bound, and the range of $r^2$ is

\[0 \le r^2 \le min[\frac{p_A(1-p_B)}{(1-p_A)p_B}, \frac{(1-p_A)p_B}{p_A(1-p_B)}]\]
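The sketch below computes $r^2$ and its upper bound for given allele frequencies. The function names are hypothetical, and the code assumes A and B are the major alleles with $0.5 \le p_A, p_B < 1$, as in the text.

```python
def r_squared(p_A, p_B, p_AB):
    """r^2 = D^2 / (p_A * p_B * (1 - p_A) * (1 - p_B))."""
    D = p_AB - p_A * p_B
    return D * D / (p_A * p_B * (1 - p_A) * (1 - p_B))


def r_squared_max(p_A, p_B):
    """Upper bound on r^2, assuming A and B are the major alleles."""
    x = p_A * (1 - p_B)
    y = (1 - p_A) * p_B
    return min(x / y, y / x)


if __name__ == "__main__":
    # p_AB = min(p_A, p_B) puts D at its upper bound, so r^2 reaches its maximum.
    print(r_squared(0.6, 0.7, 0.6), r_squared_max(0.6, 0.7))
```

The bound equals 1 only when $p_A = p_B$, and it shrinks as the two allele frequencies move apart.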

Measure D’

D’ scales D by the largest absolute value D can take, given the allele frequencies and the sign of D:

\[D' = \frac{D}{min[p_A(1-p_B), p_B(1-p_A)]}\text{, if D > 0}\] \[D' = \frac{D}{min[p_Ap_B, (1-p_A)(1-p_B)]}\text{, if D < 0}\]

|D’| ranges from 0 to 1 regardless of the allele frequencies.
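A minimal sketch of the two-case definition follows. The name `d_prime` is hypothetical, and returning 0 when D is exactly 0 is an assumption, since the definition above covers only D > 0 and D < 0.

```python
def d_prime(p_A, p_B, p_AB):
    """D' = D / D_max, where D_max depends on the sign of D."""
    D = p_AB - p_A * p_B
    if D > 0:
        d_max = min(p_A * (1 - p_B), p_B * (1 - p_A))
    elif D < 0:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    else:
        # Assumption: D' is taken to be 0 when D is exactly 0.
        return 0.0
    return D / d_max


if __name__ == "__main__":
    print(d_prime(0.6, 0.7, 0.6))  # D at its upper bound
    print(d_prime(0.6, 0.7, 0.3))  # D at its lower bound
```

With this definition D’ keeps the sign of D, so the two calls in the usage example sit at opposite ends of the range of D’.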

Notes

  • high |D’| and high $r^2$: only two haplotypes are present, and the allele frequencies at the two loci differ little;
  • high |D’| and low $r^2$: only three haplotypes are present, and the allele frequencies at the two loci differ;
  • low |D’| and low $r^2$: alleles are coupled at random, and all four haplotypes are present.
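The three cases can be illustrated with made-up frequencies. The helper `ld_summary` and the example values are hypothetical, chosen only so that each case has the stated number of haplotypes; the code assumes $0 < p_A, p_B < 1$.

```python
def ld_summary(p_A, p_B, p_AB):
    """Return (|D'|, r^2) for allele frequencies p_A, p_B and haplotype frequency p_AB."""
    D = p_AB - p_A * p_B
    if D >= 0:
        d_max = min(p_A * (1 - p_B), p_B * (1 - p_A))
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    abs_d_prime = abs(D) / d_max
    r2 = D * D / (p_A * p_B * (1 - p_A) * (1 - p_B))
    return abs_d_prime, r2


# Illustrative (p_A, p_B, p_AB) triples, not real data.
examples = {
    "two haplotypes (AB, ab)": (0.6, 0.6, 0.6),
    "three haplotypes (AB, aB, ab)": (0.6, 0.9, 0.6),
    "four haplotypes": (0.6, 0.7, 0.43),
}

if __name__ == "__main__":
    for label, (p_A, p_B, p_AB) in examples.items():
        abs_d_prime, r2 = ld_summary(p_A, p_B, p_AB)
        print(f"{label}: |D'| = {abs_d_prime:.2f}, r^2 = {r2:.2f}")
```

In the second example |D’| stays at 1 because one haplotype is absent, while $r^2$ drops because $p_A$ and $p_B$ differ.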

Quantifying the relationship between allele frequencies and $r^2$

Furthermore, if we write $p_B = p_A + v$, the bounds on $p_{AB}$, $D$ and $r^2$ become:

\[2p_A+v-1 \le p_{AB} \le min[p_A,p_A+v]\] \[-(1-p_A)(1-p_A -v) \le D \le min[p_A(1-p_A-v),(p_A+v)(1-p_A)]\] \[0 \le r^2 \le min[\frac{p_A(1-p_A-v)}{(p_A+v)(1-p_A)}, \frac{(p_A+v)(1-p_A)}{p_A(1-p_A-v)}]\]

If we set a threshold $t$ with $0 \le t \le 1$ and require that $r^2 \ge t$ be attainable, i.e. that the upper bound on $r^2$ is at least $t$, then

\[-\frac{p_A(1-p_A)(1-t)}{1-p_A(1-t)} \le v \le \frac{p_A(1-p_A)(1-t)}{p_A(1-t)+t}\] \[\frac{tp_A}{1-p_A(1-t)} \le p_B \le \frac{p_A}{p_A(1-t)+t}\]
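As a rough sketch, the range of $p_B$ that can reach a given $r^2$ threshold follows directly from the last inequality. The name `p_B_range` is hypothetical, and the code assumes $0 < p_A < 1$ and $0 \le t \le 1$.

```python
def p_B_range(p_A, t):
    """Range of p_B for which r^2 >= t is attainable, given p_A."""
    lower = t * p_A / (1 - p_A * (1 - t))
    upper = p_A / (p_A * (1 - t) + t)
    return lower, upper


if __name__ == "__main__":
    # Which allele frequencies at locus B can reach r^2 >= 0.8 when p_A = 0.6?
    print(p_B_range(0.6, 0.8))
```

At $t = 1$ the range collapses to $p_B = p_A$, and at $t = 0$ it widens to the whole interval from 0 to 1.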

Further reading

  1. shiny app: https://huanwei.shinyapps.io/ldfrq/
  2. Naomi R. Wray. Twin Research and Human Genetics. 2005.