Genomic BLUP
Genomic BLUP
- Overview
- Load R objects
- Statistical model
- Computing a genomic relationship matrix (G)
- Incidence matrix X
- Incidence matrix Z
- Incidence matrix I
- Variance components estimation using the varComp package
- Inverse of V
- GBLUP of genetic values
- Variance components estimation using the regress package
- Save as R object
Overview
We will learn how to connect phenotypes and genomics using genomic best linear unbiased prediction (GBLUP).
Load R objects
Use the function load()
to load the phenotype object dat_day17.Rda, the OLS fit object fit_day17.Rda, the numerator relationship matrix A2.Rda, the genotype matrix W.Rda, and the results from pedigree-based BLUP day18.Rda we created in class.
load(file.choose()) # dat_day17.Rda
dim(dat)
load(file.choose()) # fit_day17.Rda
summary(fit)
load(file.choose()) # A2.Rda
dim(A2)
load(file.choose()) # W.Rda
dim(W)
load(file.choose()) # day18.Rda
ls()
Statistical model
The statistical model we will fit is given by \[ \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e}, \] where \(\mathbf{y}\) is the vector of birth weights, \(\mathbf{X}\) and \(\mathbf{Z}\) are incident matrices for fixed and random effects, respectively, \(\mathbf{b}\) is the vector of regression coefficients for fixed effects, \(\mathbf{u}\) is the vector of regression coefficients for random genetic values, and \(\mathbf{e}\) is the vector of residuals. Recall that BLUP of \(\mathbf{u}\) is given by Henderson (1975) (doi: 10.2307/2529430) \[ \begin{align*} BLUP(\mathbf{u}) &= E(\mathbf{u} | \mathbf{y}) \\ &= Cov(\mathbf{u}, \mathbf{y}') Var(\mathbf{y})^{-1} [\mathbf{y} - E(\mathbf{y})] \\ &= \mathbf{G}\sigma^2_{G} \mathbf{Z}'\mathbf{V}^{-1}(\mathbf{y} - \mathbf{X}\hat{\mathbf{b}}), \end{align*} \] where \[ \begin{align*} \mathbf{V} &= Var(\mathbf{y}) \\ &= \mathbf{Z}\mathbf{G}\sigma^2_G\mathbf{Z}' + \mathbf{I}\sigma^2_e \end{align*} \]
We predict genomic estimated breeding values (GEBV) or genetic values \(\hat{\mathbf{u}}\) in the following two steps.
- fit OLS to estimate fixed effects (\(\hat{\mathbf{b}}\))
- fit BLUP to obtain GEBV (\(\hat{\mathbf{u}}\)) condition on the estimated \(\hat{\mathbf{b}}\)
Later we will discuss how to obtain \(\hat{\mathbf{b}}\) and \(\hat{\mathbf{u}}\) simultaneously.
Computing a genomic relationship matrix (G)
We will compute the \(\mathbf{G}\) matrix defined by VanRaden (2008). The function computeG()
accepts two arguments: 1) a genotype matrix and 2) a cutoff point for minor allele frequency (MAF).
computeG <- function(W, maf) {
p <- colMeans(W)/2
maf2 <- pmin(p, 1 - p)
index <- which(maf2 < maf)
W2 <- W[, -index]
p2 <- p[-index]
Wc <- scale(W2, center = TRUE, scale = FALSE)
G <- tcrossprod(Wc)/(2 * sum(p2 * (1 - p2)))
return(G)
}
G <- computeG(W, maf = 0.05)
dim(G)
Exercise 1
Create a boxplot comparing elements of the numerator relationship matrix and geomic relationship matrix. What is the correlation between the pedigree relationships and geomic relationships?
Exercise 2
Repeat Exercise 1 by changing the MAF threshold equal to 0.1. Also, what is the correlation between the two \(\mathbf{G}\) matrices?
Incidence matrix X
We will now contruct each component one by one. First we will create the incidence matrix \(\mathbf{X}\). We first subset the variable dat
by age of dams and sex of calves, and create a new variable dat2
. The model.matrix()
function returns the incidence matrix \(X\).
dat2 <- dat[, c("AgeDam.mon.", "SEX")]
X <- model.matrix(~dat2$AgeDam.mon. + dat2$SEX)
dim(X)
head(X)
Incidence matrix Z
Next we will create the incidence matrix \(\mathbf{Z}\).
Z <- diag(nrow(G))
dim(Z)
diag(Z)
Incidence matrix I
The third incidence matrix we create is \(\mathbf{I}\).
I <- diag(nrow(G))
dim(I)
diag(I)
Variance components estimation using the varComp package
The varComp()
function in the varComp package fits a liner mixed model and estimates variance components (\(\sigma^2_G\) and \(\sigma^2_e\)) through a restricted maximum likelihood (REML). The variance-covariance structure of \(\mathbf{u}\) (argument = varcov
) is given by the genomic relationship matrix \(\mathbf{G}\).
install.packages("varComp")
library(varComp)
`?`(varComp)
y <- dat$BWT
varcomp <- varComp(fixed = y ~ -1 + X, random = ~Z, varcov = G)
sigma2G <- varcomp$varComps # additive genomic variance
sigma2G
sigma2e <- varcomp$sigma2 # residual variance
sigma2e
Exercise 3
What is the estimate of genomic heritability? Compare the estimate with the one we obtained from the pedigree relationship matrix.
Inverse of V
Here we compute the inverse of \(V\) matrix. We will use 1) the multiplication operator %*%
and 2) the function solve()
to obtain the inverse of matrix.
V <- Z %*% G %*% t(Z) * sigma2G + I * sigma2e
dim(V)
Vinv <- solve(V)
dim(Vinv)
GBLUP of genetic values
Now we compute GEBV (\(\hat{\mathbf{u}}\)).
uhatG <- sigma2G * G %*% t(Z) %*% Vinv %*% (matrix(y) - matrix(predict(fit)))
uhatG2 <- sigma2G * G %*% t(Z) %*% Vinv %*% (matrix(y) - matrix(X %*% fit$coefficients)) # alternative
table(uhatG == uhatG2)
head(uhatG)
tail(uhatG)
Let’s plot a histogram of GEBV.
hist(uhatG, col = "lightblue", main = "Histogram", xlab = "Genomic estimated breeding values")
Exercise 4
Create a scatter plot of GEBV vs. observed values. Interpret the result.
Exercise 5
Which individual has the highest GEBV? Which individual has the lowest GEBV? Use the which.max()
and which.min()
functions.
Exercise 6
Rank animals based on their GEBV. Show the top six animal IDs and their GEBV. Use the order()
function.
Exercise 7
Create a scatter plot of EBV vs. GEBV. Compute Spearman’s rank correlation coefficient.
Variance components estimation using the regress package
The regress
function in the regress package can also fit a liner mixed model and estimate variance components (\(\sigma^2_G\) and \(\sigma^2_e\)) through REML.
install.packages("regress")
library(regress)
`?`(regress)
varcomp2 <- regress(y ~ -1 + X, ~G)
summary(varcomp2)
Exercise 6
What is the estimate of genomic heritability based on the regress()
function? Compare the estimate with the one we obtained from the varComp()
function.
Save as R object
save(uhatG, h2G, h2G.a, G, file = "day20.Rda")