Genetic Relationship Matrix (GRM)
GRM, short for genetic relationship matrix, is a important matrix used in genetic analysis of human complex traits.
This blog is my study note for GRM.
Mixed linear model (MLM)
Based on Yang et al., AJHG, 2011
\[\pmb{y}_{n \times 1}=\pmb{X}_{n \times k}\pmb{\beta}_{k \times 1}+\pmb{W}_{n \times m}\pmb{u}_{m \times 1}+\pmb{\epsilon}_{n \times 1}\]- $\pmb{y}$ is a vector of phenotypes; n: the number of individuals;
- $\pmb{X}$ is a matrix for variables with fixed effects; k: the number of fixed effect factors;
- $\pmb{\beta}$ is a vector of fixed effects;
- $\pmb{u}$ is a vector of random effects; $\pmb{u} \sim N(0, \pmb{I}\sigma_u^2)$, where $\pmb{I}$ is a $n \times n$ identity matrix, and m i the number of SNPs;
- $\pmb{W}$ is a matrix for standardized genotype matrix, where its element for i-th SNP of j-th individual $w_{ij}=(x_{ij}-2p_i)/\sqrt{2p_i(1-p_i)}$ with $x_{ij}$ being the genotype value coded as the number of copies of the reference alleles {0, 1, 2};
If define $\sigma_g^2=m\sigma_u^2$ (the variance explained by all the SNPs), $\pmb{g}=\pmb{W}\pmb{u}$ and $\pmb{A}=\pmb{W}\pmb{W}^T/m$ (genetic relationship matrix (GRM) between individuals),
\[\pmb{y}=\pmb{X}\pmb{\beta}+\pmb{g}+\pmb{\sigma}\text{ and }\pmb{V}=\pmb{A}\sigma_g^2+\pmb{I}\sigma_{\epsilon}^2\]Practice
Simulate the genotype matrix
I use the R language to simulate a genotype matrix for n
individuals and m
SNPs
R code
n=5; m=3
set.seed(10)
p = runif(m, min=0.2, max=0.5) ### allele frequency were draw from a uniform distribution
x_A1 = t(replicate(n, rbinom(m, 1, p))) ### for the plink ped file
x_A2 = t(replicate(n, rbinom(m, 1, p)))
x = x_A1 + x_A2
colnames(x) = paste("SNP",1:m,sep="")
rownames(x) = paste("indi", 1:n, sep="")
x
#### now we save the matrix x, convert to plink bfile, and then calculate GRM using GCTA
x_A1 = ifelse(x_A1==0, "a", "A")
x_A2 = ifelse(x_A2==0, "a", "A")
dir.create("toy")
write.table(cbind(data.frame(FID=rownames(x), IID=rownames(x), 0, 0, 0, 0), cbind(x_A1, x_A2)[,order(c(1:m, 1:m+0.5))]), "toy/toy.ped", quote=F, sep="\t", row.names=F, col.names=F)
write.table(data.frame(CHR=1, SNP=colnames(x), cM=0, BP=10000+1:ncol(x)), "toy/toy.map", quote=F, sep="\t", row.names=F, col.names=F)
Output
## SNP1 SNP2 SNP3
## indi1 1 1 1
## indi2 0 1 0
## indi3 0 0 1
## indi4 0 1 0
## indi5 0 0 0
After standardization
R code
p_hat = apply(x, 2, sum)/(2*n)
w = apply(rbind(x,p_hat), 2, function(x) (x-2*x[length(x)])/sqrt(2*x[length(x)]*(1-x[length(x)])))[1:n,]
w
Output
## SNP1 SNP2 SNP3
## indi1 1.8856181 0.6172134 1.0606602
## indi2 -0.4714045 0.6172134 -0.7071068
## indi3 -0.4714045 -0.9258201 1.0606602
## indi4 -0.4714045 0.6172134 -0.7071068
## indi5 -0.4714045 -0.9258201 -0.7071068
Calculate the GRM
The diagonal elements in GRM
R code
A = w %*% t(w) /m
A
Output
## indi1 indi2 indi3 indi4 indi5
## indi1 1.6871693 -0.41931217 -0.1117725 -0.41931217 -0.73677249
## indi2 -0.4193122 0.36772487 -0.3664021 0.36772487 0.05026455
## indi3 -0.1117725 -0.36640212 0.7347884 -0.36640212 0.10978836
## indi4 -0.4193122 0.36772487 -0.3664021 0.36772487 0.05026455
## indi5 -0.7367725 0.05026455 0.1097884 0.05026455 0.52645503
R code
diag(A)
Output
## indi1 indi2 indi3 indi4 indi5
## 1.6871693 0.3677249 0.7347884 0.3677249 0.5264550
The off-diagonal elelments in GRM
R code
A[row(A)<col(A)]
Output
## [1] -0.41931217 -0.11177249 -0.36640212 -0.41931217 0.36772487 -0.36640212
## [7] -0.73677249 0.05026455 0.10978836 0.05026455
R code
A[row(A)>col(A)]
Output
## [1] -0.41931217 -0.11177249 -0.41931217 -0.73677249 -0.36640212 0.36772487
## [7] 0.05026455 -0.36640212 0.10978836 0.05026455
Run the “make-grm” using GCTA
Save the genotype matrix in Plink bfile formate
bash code
~/Software/PLINK2/plink --ped toy/toy.ped --map toy/toy.map --make-bed --out toy/toy ### generate PLINK bfiles: .bim, .fam, .bed
log message
## PLINK v1.90b4.4 64-bit (21 May 2017) www.cog-genomics.org/plink/1.9/
## (C) 2005-2017 Shaun Purcell, Christopher Chang GNU General Public License v3
## Logging to toy/toy.log.
## Options in effect:
## --make-bed
## --map toy/toy.map
## --out toy/toy
## --ped toy/toy.ped
##
## 16384 MB RAM detected; reserving 8192 MB for main workspace.
## Scanning .ped file... 0%20%40%60%80%100%
.ped scan complete (for binary autoconversion).
## Performing single-pass .bed write (3 variants, 5 people).
## 0%1%2%3%4%5%6%7%8%9%10%11%12%13%14%15%16%17%18%19%20%21%22%23%24%25%26%27%28%29%30%31%32%33%34%35%36%37%38%39%40%41%42%43%44%45%46%47%48%49%50%51%52%53%54%55%56%57%58%59%60%61%62%63%64%65%66%67%68%69%70%71%72%73%74%75%76%77%78%79%80%81%82%83%84%85%86%87%88%89%90%91%92%93%94%
--file: toy/toy-temporary.bed + toy/toy-temporary.bim + toy/toy-temporary.fam
## written.
## 3 variants loaded from .bim file.
## 5 people (0 males, 0 females, 5 ambiguous) loaded from .fam.
## Ambiguous sex IDs written to toy/toy.nosex .
## Using 1 thread (no multithreaded calculations invoked).
## Before main variant filters, 5 founders and 0 nonfounders present.
## Calculating allele frequencies... 0%1%2%3%4%5%6%7%8%9%10%11%12%13%14%15%16%17%18%19%20%21%22%23%24%25%26%27%28%29%30%31%32%33%34%35%36%37%38%39%40%41%42%43%44%45%46%47%48%49%50%51%52%53%54%55%56%57%58%59%60%61%62%63%64%65%66%67%68%69%70%71%72%73%74%75%76%77%78%79%80%81%82%83%84%85%86%87%88%89%90%91%92%93%94%95%96%97%98%99% done.
## 3 variants and 5 people pass filters and QC.
## Note: No phenotypes present.
## --make-bed to toy/toy.bed + toy/toy.bim + toy/toy.fam ... 0%1%2%3%4%5%6%7%8%9%10%11%12%13%14%15%16%17%18%19%20%21%22%23%24%25%26%27%28%29%30%31%32%33%34%35%36%37%38%39%40%41%42%43%44%45%46%47%48%49%50%51%52%53%54%55%56%57%58%59%60%61%62%63%64%65%66%67%68%69%70%71%72%73%74%75%76%77%78%79%80%81%82%83%84%85%86%87%88%89%90%91%92%93%94%95%96%97%98%99%done.
bash code
~/Software/gcta_1.93.2beta_mac/gcta64 --bfile toy/toy --make-grm --out toy/toy.grm ### generate grm using GCTA
log message
## *******************************************************************
## * Genome-wide Complex Trait Analysis (GCTA)
## * version 1.93.2 beta Mac
## * (C) 2010-present, Jian Yang, The University of Queensland
## * Please report bugs to Jian Yang <jian.yang.qt@gmail.com>
## *******************************************************************
## Analysis started at 16:41:12 AEST on Tue Aug 03 2021.
## Hostname: imb20-015312-lt
##
## Options:
##
## --bfile toy/toy
## --make-grm
## --out toy/toy.grm
##
## Note: GRM is computed using the SNPs on the autosome.
## Reading PLINK FAM file from [toy/toy.fam]...
## 5 individuals to be included from FAM file.
## 5 individuals to be included. 0 males, 0 females, 5 unknown.
## Reading PLINK BIM file from [toy/toy.bim]...
## 3 SNPs to be included from BIM file(s).
## Computing the genetic relationship matrix (GRM) v2 ...
## Subset 1/1, no. subject 1-5
## 5 samples, 3 markers, 15 GRM elements
## IDs for the GRM file has been saved in the file [toy/toy.grm.grm.id]
## Computing GRM...
## 100% finished in 0.0 sec
## 3 SNPs have been processed.
## Used 3 valid SNPs.
## The GRM computation is completed.
## Saving GRM...
## GRM has been saved in the file [toy/toy.grm.grm.bin]
## Number of SNPs in each pair of individuals has been saved in the file [toy/toy.grm.grm.N.bin]
##
## Analysis finished at 16:41:12 AEST on Tue Aug 03 2021
## Overall computational time: 0.00 sec.
Read GCTA-GRM output in R
R code
### ReadGRMBin function copied from GCTA website
ReadGRMBin=function(prefix, AllN=F, size=4){
sum_i=function(i){
return(sum(1:i))
}
BinFileName=paste(prefix,".grm.bin",sep="")
NFileName=paste(prefix,".grm.N.bin",sep="")
IDFileName=paste(prefix,".grm.id",sep="")
id = read.table(IDFileName)
n=dim(id)[1]
BinFile=file(BinFileName, "rb");
grm=readBin(BinFile, n=n*(n+1)/2, what=numeric(0), size=size)
NFile=file(NFileName, "rb");
if(AllN==T){
N=readBin(NFile, n=n*(n+1)/2, what=numeric(0), size=size)
}
else N=readBin(NFile, n=1, what=numeric(0), size=size)
i=sapply(1:n, sum_i)
return(list(diag=grm[i], off=grm[-i], id=id, N=N))
}
data = ReadGRMBin("toy/toy.grm")
data
Output
## $diag
## [1] 1.6871693 0.3677249 0.7347884 0.3677249 0.5264550
##
## $off
## [1] -0.41931218 -0.11177249 -0.36640212 -0.41931218 0.36772487 -0.36640212
## [7] -0.73677248 0.05026455 0.10978836 0.05026455
##
## $id
## V1 V2
## 1 indi1 indi1
## 2 indi2 indi2
## 3 indi3 indi3
## 4 indi4 indi4
## 5 indi5 indi5
##
## $N
## [1] 3