# Overview

We will learn how to connect phenotypes and pedigree using pedigree-based best linear unbiased prediction (PBLUP).

# Load R objects

Use the function load() to load the phenotype object (dat_day17.Rda), the OLS fit object (fit_day17.Rda), and the numerator relationship matrix (A2.Rda) we created in class (link1 and link2).

load(file.choose())  # dat_day17.Rda
dim(dat)
summary(fit)
dim(A2)

# Statistical model

The statistical model we fit is given by $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e},$ where $$\mathbf{y}$$ is the vector of birth weights, $$\mathbf{X}$$ and $$\mathbf{Z}$$ are incident matrices for fixed and random effects, respectively, $$\mathbf{b}$$ is the vector of regression coefficients for fixed effects, $$\mathbf{u}$$ is the vector of regression coefficients for random genetic values, and $$\mathbf{e}$$ is the vector of residuals. Recall that BLUP of $$\mathbf{u}$$ is given by Henderson (1975) (doi: 10.2307/2529430) \begin{align*} BLUP(\mathbf{u}) &= E(\mathbf{u} | \mathbf{y}) \\ &= Cov(\mathbf{u}, \mathbf{y}') Var(\mathbf{y})^{-1} [\mathbf{y} - E(\mathbf{y})] \\ &= \mathbf{A}\sigma^2_{A} \mathbf{Z}'\mathbf{V}^{-1}(\mathbf{y} - \mathbf{X}\hat{\mathbf{b}}), \end{align*} where \begin{align*} \mathbf{V} &= Var(\mathbf{y}) \\ &= \mathbf{Z}\mathbf{A}\sigma^2_A\mathbf{Z}' + \mathbf{I}\sigma^2_e \end{align*}

We predict estimated breeding values (EBV) or genetic values $$\hat{\mathbf{u}}$$ in the following two steps.

• fit OLS to estimate fixed effects ($$\hat{\mathbf{b}}$$)
• fit BLUP to obtain EBV ($$\hat{\mathbf{u}}$$) condition on the estimated $$\hat{\mathbf{b}}$$

Later we will discuss how to obtain $$\hat{\mathbf{b}}$$ and $$\hat{\mathbf{u}}$$ simultaneously.

# Incidence matrix X

We will now contruct each component one by one. First we will create the incidence matrix $$\mathbf{X}$$. We first subset the variable dat by age of dams and sex of calves, and create a new variable dat2. The model.matrix() function returns the incidence matrix $$X$$.

dat2 <- dat[, c("AgeDam.mon.", "SEX")]
X <- model.matrix(~dat2$AgeDam.mon. + dat2$SEX)
dim(X)
head(X)

# Incidence matrix Z

Next we will create the incidence matrix $$\mathbf{Z}$$.

Z <- diag(nrow(A2))
dim(Z)
diag(Z)

# Incidence matrix I

The third incidence matrix we create is $$\mathbf{I}$$.

I <- diag(nrow(A2))
dim(I)
diag(I)

# Variance components estimation using the varComp package

The varComp() function in the varComp package fits a liner mixed model and estimate variance components ($$\sigma^2_A$$ and $$\sigma^2_e$$) through a restricted maximum likelihood (REML). The variance-covariance structure of $$\mathbf{u}$$ (argument = varcov) is given by the numerator relationship matrix $$\mathbf{A}$$.

install.packages("varComp")
library(varComp)
?(varComp)

y <- dat$BWT varcomp <- varComp(fixed = y ~ -1 + X, random = ~Z, varcov = A2) sigma2A <- varcomp$varComps  # additive genetic variance
sigma2A
sigma2e <- varcomp$sigma2 # residual variance sigma2e ### Exercise 1 What is the estimate of heritability? Compare the estimate with the one we obtained from intraclass correlation using the sire model. # Inverse of V Here we compute the inverse of $$V$$ matrix. We will use 1) the multiplication operator %*% and 2) the function solve() to obtain the inverse of matrix. V <- Z %*% A2 %*% t(Z) * sigma2A + I * sigma2e dim(V) Vinv <- solve(V) dim(Vinv) # PBLUP of genetic values Now we compute EBV ($$\hat{\mathbf{u}}$$). uhatA <- sigma2A * A2 %*% t(Z) %*% Vinv %*% (matrix(y) - matrix(predict(fit))) uhatA2 <- sigma2A * A2 %*% t(Z) %*% Vinv %*% (matrix(y) - matrix(X %*% fit$coefficients))  # alternative
table(uhatA == uhatA2)
tail(uhatA)

Let’s plot a histogram of EBV.

hist(uhatA, col = "lightblue", main = "Histogram", xlab = "Estimated breeding values")

### Exercise 2

Create a scatter plot of EBV vs. observed values. Interpret the result.

### Exercise 3

Which individual has the highest EBV? Which individual has the lowest EBV? Use the which.max() and which.min() functions.

### Exercise 4

Rank animals based on their EBV. Show the top six animal IDs and their EBV. Use the order() function.

### Exercise 5

If you are going to select only top 10% of animals, which animals would you pick?

# Variance components estimation using the regress package

The regress function in the regress package can also fit a liner mixed model and estimate variance components ($$\sigma^2_A$$ and $$\sigma^2_e$$) through REML.

install.packages("regress")
library(regress)
?(regress)

varcomp2 <- regress(y ~ -1 + X, ~A2)
summary(varcomp2)

### Exercise 6

What is the estimate of heritability based on the regress() function? Compare the estimate with the one we obtained from the varComp() function.

# Save as R object

save(uhatA, h2A, h2A.a, file = "day18.Rda")

March 9, 2017