Decoding mixed model equations
High-throughput Phenotyping Driven Quantitative Genetics @CMA-FCT-NOVA
Background
This is the day 1 computer lab session for quantitative genetic analysis. The goal of this exercise is to set up mixed model equations (MME) and obtain solutions for best linear unbiased estimator (BLUE) and best linear unbiased prediction (BLUP).
Data
We are going to use the australia.soybean
data included in the agridat package. This dataset contains multi-environment trial of soybean in Australia for 58 varieties.
# install.packages("agridat") if not installed yet
library(agridat)
data(australia.soybean, package = 'agridat')
?australia.soybeanhead(australia.soybean)
We will use env
, gen
, and yield
columns in the current analysis. We can see that there are eight environments, each having 58 observatios. Also, each genotype is replicated eight times (one per environment).
<- australia.soybean[, c(1, 4, 5)]
dat dim(dat)
head(dat)
table(dat$env)
table(dat$gen)
table(dat$env, dat$gen)
The BLUP model we would like to set up is \[ \text{yield} = \text{Intercept} + \text{Environment} + \text{Genotype} + \text{Residual} \] We can rewrite this as \[ \mathbf{y} = \mathbf{Xb} + \mathbf{Zu} + \boldsymbol{\epsilon} \]
We will first create the variable y
including a vector of yields.
<- dat$yield y
We will now proceed to set up incident matrices. The incidence matrix X
can be easily created using the model.matrix
function.
<- model.matrix(~ + env, dat)
X head(X)
head(dat)
tail(X)
tail(dat)
We will use the model.matrix
function again to create the Z
matrix.
<- model.matrix(~ -1 + gen, dat)
Z head(Z)
head(dat)
tail(Z)
tail(dat)
We assume the following distribution. \[ \mathbf{u} \sim N(0, \mathbf{I}\sigma^2_u), \text{ where } \sigma^2_u = 0.1990775 \] \[ \boldsymbol{\epsilon} \sim N(0, \mathbf{I}\sigma^2_{\epsilon}), \text{ where } \sigma^2_e = 0.2508898\\ \]
<- 0.1990775
sigma2u <- 0.2508898 sigma2e
We wil now start building the elements of MME. Recall that MME is given by
X’X
Creatig the X’X matrix.
<- t(X) %*% X
XtX XtX
X’Z
Creating the X’Z matrix.
<- t(X) %*% Z
XtZ XtZ
Z’X
Creating the Z’X matrix.
<- t(Z) %*% X
ZtX ZtX
Z’Z
Creating the Z’Z matrix.
<- t(Z) %*% Z
ZtZ ZtZ
Ratio of variance components
<- diag(ncol(Z)) # assuming K = I
I <- sigma2e / sigma2u lambda
X’y
Creating the X’y matrix.
<- t(X) %*% y
Xty
Xty
sum(y)
Z’y
Creating the Z’y matrix.
<- t(Z) %*% y
Zty Zty
Building LHS and RHS
<- cbind(XtX, XtZ)
LHS1 <- cbind(ZtX, ZtZ + I*lambda)
LHS2 <- rbind(LHS1, LHS2)
LHS
<- rbind(Xty, Zty) RHS
Solving for solutions
<- solve(LHS, RHS)
sol
# BLUE
1:8, ]
sol[
# BLUP
9:15, ] sol[
lme4 R package
We can confirm our resutls using the lme4 R package.
# install.packages("lme4")
library(lme4)
= lmer(yield ~ env + (1 | gen), data=dat)
fit @beta
fit
rowMeans(ranef(fit)$gen)
rowMeans(ranef(fit)$gen)[1:7]
The variance components can also be obtained.
data.frame((summary(fit))$varcor)$vcov
Recall that the broad-sense heritability is defined as \[ H^2 = \frac{\sigma^2_u}{ \sigma^2_u + \sigma^2_{\epsilon}/\text{rep}}. \]
Thus, the estimate of \(H^2\) in this population is
/ (sigma2u + sigma2e/8) sigma2u