# High-throughput Phenotyping Driven Quantitative Genetics @CMA-FCT-NOVA

## Background

This is the day 1 computer lab session for quantitative genetic analysis. The goal of this exercise is to set up mixed model equations (MME) and obtain solutions for best linear unbiased estimator (BLUE) and best linear unbiased prediction (BLUP).

## Data

We are going to use the australia.soybean data included in the agridat package. This dataset contains multi-environment trial of soybean in Australia for 58 varieties.

# install.packages("agridat") if not installed yet
library(agridat)
data(australia.soybean, package = 'agridat')

?australia.soybean
head(australia.soybean)

We will use env , gen, and yield columns in the current analysis. We can see that there are eight environments, each having 58 observatios. Also, each genotype is replicated eight times (one per environment).

dat <- australia.soybean[, c(1, 4, 5)]
dim(dat)
table(dat$env) table(dat$gen)
table(dat$env, dat$gen)

The BLUP model we would like to set up is $\text{yield} = \text{Intercept} + \text{Environment} + \text{Genotype} + \text{Residual}$ We can rewrite this as $\mathbf{y} = \mathbf{Xb} + \mathbf{Zu} + \boldsymbol{\epsilon}$

We will first create the variable y including a vector of yields.

y <- dat$yield We will now proceed to set up incident matrices. The incidence matrix X can be easily created using the model.matrix function. X <- model.matrix(~ + env, dat) head(X) head(dat) tail(X) tail(dat) We will use the model.matrix function again to create the Z matrix. Z <- model.matrix(~ -1 + gen, dat) head(Z) head(dat) tail(Z) tail(dat) We assume the following distribution. $\mathbf{u} \sim N(0, \mathbf{I}\sigma^2_u), \text{ where } \sigma^2_u = 0.1990775$ $\boldsymbol{\epsilon} \sim N(0, \mathbf{I}\sigma^2_{\epsilon}), \text{ where } \sigma^2_e = 0.2508898\\$ sigma2u <- 0.1990775 sigma2e <- 0.2508898 We wil now start building the elements of MME. Recall that MME is given by ## X’X Creatig the X’X matrix. XtX <- t(X) %*% X XtX  ## X’Z Creating the X’Z matrix. XtZ <- t(X) %*% Z XtZ  ## Z’X Creating the Z’X matrix. ZtX <- t(Z) %*% X ZtX ## Z’Z Creating the Z’Z matrix. ZtZ <- t(Z) %*% Z ZtZ ## Ratio of variance components I <- diag(ncol(Z)) # assuming K = I lambda <- sigma2e / sigma2u  ## X’y Creating the X’y matrix. Xty <- t(X) %*% y Xty sum(y) ## Z’y Creating the Z’y matrix. Zty <- t(Z) %*% y Zty ## Building LHS and RHS LHS1 <- cbind(XtX, XtZ) LHS2 <- cbind(ZtX, ZtZ + I*lambda) LHS <- rbind(LHS1, LHS2) RHS <- rbind(Xty, Zty) ## Solving for solutions sol <- solve(LHS, RHS) # BLUE sol[1:8, ] # BLUP sol[9:15, ] ## lme4 R package We can confirm our resutls using the lme4 R package. # install.packages("lme4") library(lme4) fit = lmer(yield ~ env + (1 | gen), data=dat) fit@beta rowMeans(ranef(fit)$gen)
rowMeans(ranef(fit)$gen)[1:7] The variance components can also be obtained. data.frame((summary(fit))$varcor)\$vcov

Recall that the broad-sense heritability is defined as $H^2 = \frac{\sigma^2_u}{ \sigma^2_u + \sigma^2_{\epsilon}/\text{rep}}.$

Thus, the estimate of $$H^2$$ in this population is

sigma2u / (sigma2u + sigma2e/8)