Decoding mixed model equations
Quantitative Genetics and Genomics Workshop @ESALQ
Background
This is the day 1 computer lab session for quantitative genetic analysis. The goal of this exercise is to set up mixed model equations (MME) and obtain solutions for best linear unbiased estimator (BLUE) and best linear unbiased prediction (BLUP).
Data
We are going to use the australia.soybean
data included in the agridat package. This dataset contains multi-environment trial of soybean in Australia for 58 variesties.
# install.packages('agridat')
library(agridat)
data(australia.soybean, package = "agridat")
`?`(australia.soybean)
head(australia.soybean)
We will use env
, gen
, and yield
columns in the current analysis. We can see that there are eight environments, each having 58 observatios. Also, each genotype is replicated eight times (one per environment).
dat <- australia.soybean[, c(1, 4, 5)]
dim(dat)
head(dat)
table(dat$env)
table(dat$gen)
table(dat$env, dat$gen)
The BLUP model we would like to set up is \[ \text{yield} = \text{Intercept} + \text{Environment} + \text{Genotype} + \text{Residual} \] We can rewrite this as \[ \mathbf{y} = \mathbf{Xb} + \mathbf{Zu} + \boldsymbol{\epsilon} \]
We will first create the variable y
including a vector of yields.
y <- dat$yield
We will now proceed to set up incident matrices. The incidence matrix X
can be easily created using the model.matrix
function.
X <- model.matrix(~+env, dat)
head(X)
head(dat)
tail(X)
tail(dat)
We will use the model.matrix
function again to create the Z
matrix.
Z <- model.matrix(~-1 + gen, dat)
head(Z)
head(dat)
tail(Z)
tail(dat)
We assume the following distribution. \[ \mathbf{u} \sim N(0, \mathbf{I}\sigma^2_u), \text{ where } \sigma^2_u = 0.1990775 \] \[ \boldsymbol{\epsilon} \sim N(0, \mathbf{I}\sigma^2_{\epsilon}), \text{ where } \sigma^2_e = 0.2508898\\ \]
sigma2u <- 0.1990775
sigma2e <- 0.2508898
We wil now start building the elements of MME. Recall that MME is given by
X’X
Creatig the X’X matrix.
XtX <- t(X) %*% X
XtX
X’Z
Creating the X’Z matrix.
XtZ <- t(X) %*% Z
XtZ
Z’X
Creating the Z’X matrix.
ZtX <- t(Z) %*% X
ZtX
Z’Z
Creating the Z’Z matrix.
ZtZ <- t(Z) %*% Z
ZtZ
Ratio of variance components
I <- diag(ncol(Z)) # assuming K = I
lambda <- sigma2e/sigma2u
X’y
Creating the X’y matrix.
Xty <- t(X) %*% y
Xty
sum(y)
Z’y
Creating the Z’y matrix.
Zty <- t(Z) %*% y
Zty
Building LHS and RHS
LHS1 <- cbind(XtX, XtZ)
LHS2 <- cbind(ZtX, ZtZ + I * lambda)
LHS <- rbind(LHS1, LHS2)
RHS <- rbind(Xty, Zty)
Solving for solutions
sol <- solve(LHS, RHS)
# BLUE
sol[1:8, ]
# BLUP
sol[9:15, ]
lme4 R package
We can confirm our resutls using the lme4 R package.
# install.packages('lme4')
library(lme4)
fit = lmer(yield ~ env + (1 | gen), data = dat)
fit@beta
rowMeans(ranef(fit)$gen)
rowMeans(ranef(fit)$gen)[1:7]
The variance components can also be obtained.
data.frame((summary(fit))$varcor)$vcov
Recall that the broad-sense heritability is defined as \[ H^2 = \frac{\sigma^2_u}{ \sigma^2_u + \sigma^2_{\epsilon}/\text{rep}}. \]
Thus, the estimate of \(H^2\) in this population is
sigma2u/(sigma2u + sigma2e/8)