# Overview

We will learn how to estimate BLUP of SNP marker effects (SNP-BLUP) and the equivalency between GBLUP and SNP-BLUP.

Use the function load() to load the phenotype object dat_day17.Rda, the OLS fit object fit_day17.Rda, the numerator relationship matrix A2.Rda, the genotype matrix W.Rda, the results from pedigree-based BLUP day18.Rda, and the results from genomic BLUP day20.Rda we created in class.

load(file.choose())  # dat_day17.Rda
dim(dat)
summary(fit)
dim(A2)
dim(W)

ls()

# Statistical model

The statistical model we will fit is given by \begin{align*} \mathbf{y} &= \mathbf{X}\mathbf{b} + \mathbf{W}\mathbf{a} + \mathbf{e} \\ \mathbf{a} & \sim MVN(0, \mathbf{I}\sigma^2_a) \\ \mathbf{e} & \sim MVN(0, \mathbf{I}\sigma^2_e) \\ \end{align*} where $$\mathbf{y}$$ is the vector of birth weights, $$\mathbf{X}$$ is the incident matrix for fixed effects, $$\mathbf{W}$$ is the genotype matrix, $$\mathbf{b}$$ is the vector of regression coefficients for fixed effects, $$\mathbf{a}$$ is the vector of regression coefficients for random SNP effects, and $$\mathbf{e}$$ is the vector of residuals.

# BLUP of SNP marker effects

Recall that BLUP of $$\mathbf{a}$$ is given by Henderson (1975) (doi: 10.2307/2529430) \begin{align*} BLUP(\mathbf{a}) &= E(\mathbf{a} | \mathbf{y}) \\ &= Cov(\mathbf{a}, \mathbf{y}') Var(\mathbf{y})^{-1} [\mathbf{y} - E(\mathbf{y})] \\ &= \mathbf{I}\sigma^2_{a} \mathbf{W}'\mathbf{V}^{-1}(\mathbf{y} - \mathbf{X}\hat{\mathbf{b}}), \end{align*}

where \begin{align*} \mathbf{V} &= Var(\mathbf{y}) \\ &= \mathbf{W}\mathbf{I}\sigma^2_a\mathbf{W}' + \mathbf{I}\sigma^2_e \end{align*}

We predict SNP marker effects $$\hat{\mathbf{a}}$$ in the following two steps.

• fit OLS to estimate fixed effects ($$\hat{\mathbf{b}}$$)
• fit BLUP to obtain marker effects ($$\hat{\mathbf{a}}$$) condition on the estimated $$\hat{\mathbf{b}}$$

Later we will discuss how to obtain $$\hat{\mathbf{b}}$$ and $$\hat{\mathbf{a}}$$ simultaneously using MME.

# Incidence matrix X, I, and phenotype y

We will now contruct each component one by one. First we will create the incidence matrices X and $$n$$ by $$n$$ diagonal matrix I, and the vector of phenotypes y.

dat2 <- dat[, c("AgeDam.mon.", "SEX")]
X <- model.matrix(~dat2$AgeDam.mon. + dat2$SEX)

Inxn <- diag(nrow(W))

sigma2e <- varcompSNP$sigma[2] # residual variance -> sigma^2_e lambda <- sigma2e/sigma2a # Inverse of V Here we compute the inverse of $$V$$ matrix. We will use 1) the multiplication operator %*% and 2) the function solve() to obtain the inverse of matrix. V <- W2 %*% Imxm %*% t(W2) * sigma2a + Inxn * sigma2e dim(V) Vinv <- solve(V) dim(Vinv) # SNP effects Now we compute the BLUP of SNP effects ($$\hat{\mathbf{a}}$$). ahat <- sigma2a * Imxm %*% t(W2) %*% Vinv %*% (matrix(y) - matrix(predict(fit))) ahat2 <- sigma2a * Imxm %*% t(W2) %*% Vinv %*% (matrix(y) - matrix(X %*% fit$coefficients))  # alternative
table(ahat == ahat2)
tail(ahat)

### Exercise 2

Which SNP has the largest effect? What is the name of SNP?

# Mixed model equations

The corresponding MME is givin by simply replacing $$\mathbf{Z}$$ and $$\mathbf{K}$$ in the relationship-based MME with $$\mathbf{W}$$ and $$I_{m \times m}$$, respectively. \begin{align} \begin{bmatrix} \mathbf{X'X} & \mathbf{X'W} \\ \mathbf{W'X} & \mathbf{W'W} + \mathbf{I}^{-1} \lambda \end{bmatrix} \begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{a}} \end{bmatrix} = \begin{bmatrix} \mathbf{X'y} \\ \mathbf{W'y} \end{bmatrix}. \end{align}

where $$\lambda$$ is the ratio of variance components $$\frac{\sigma^2_e}{\sigma^2_a}$$.

Henderson later showed that $$\hat{\mathbf{b}}$$ from the above equation is a solution for generalized least square estimate and linear combination of $$\hat{\mathbf{b}}$$ and $$\hat{\mathbf{a}}$$ is BLUP. Thus, we can obtain BLUE of $$\hat{\mathbf{b}}$$ and BLUP of $$\hat{\mathbf{a}}$$ simultaneously by solveing MME. \begin{align} \begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{a}} \end{bmatrix} = \begin{bmatrix} \mathbf{X'X} & \mathbf{X'W} \\ \mathbf{W'X} & \mathbf{W'W} + \mathbf{I}^{-1} \lambda \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{X'y} \\ \mathbf{W'y} \end{bmatrix} \end{align}

# The rrBLUP package

The mixed.solve() function in the rrBLUP package estimates fixed effects and variance components it and predicts BLUP of SNP effects.

# install.packages('rrBLUP')
library(rrBLUP)
rr <- mixed.solve(y, X = X, Z = W2)
names(rr)

rr$beta # fixed effect rr$Vu  # additive SNP variance
rr$Ve # residual variance head(rr$u)  # BLUP of SNP marker effects
tail(rru) ### Exercise 3 Compare the estimates of fixed effects obtained from the following: the regress package and the rrBLUP package. Use the cor() function to calculate correlations. ### Exercise 4 Compare the estimates of variance components ($$\sigma^2_a$$ and $$\sigma^2_e$$) obtained from the following: the regress package and the rrBLUP package. ### Exercise 5 Compare the estimates of BLUP of SNP marker effects obtained from the following: the SNP-BLUP (ahat) and the rrBLUP package. Use the cor() function to calculate correlations. # Relationship between GBLUP and SNP-BLUP Recall that the statistical model of GBLUP is \begin{align*} \mathbf{y} &= \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e} \\ \mathbf{u} & \sim MVN(0, \mathbf{G}\sigma^2_G) \\ \mathbf{e} & \sim MVN(0, \mathbf{I}\sigma^2_e) \\ \end{align*} The SNP-BLUP model is given by \begin{align*} \mathbf{y} &= \mathbf{X}\mathbf{b} + \mathbf{W}\mathbf{a} + \mathbf{e} \\ \mathbf{a} & \sim MVN(0, \mathbf{I}\sigma^2_a) \\ \mathbf{e} & \sim MVN(0, \mathbf{I}\sigma^2_e) \\ \end{align*} The difference is that we approximate the true genetic signal $$\mathbf{g}$$ with $$\mathbf{Zu}$$ in GBLUP ($$\mathbf{g = Zu}$$), whereas we approximate the genetic signal with $$\mathbf{Wa}$$ in SNP-BLUP ($$\mathbf{g = Wa}$$). This suggests that the vector of estimated genomic breeding values $$\hat{\mathbf{u}}$$ is equivalent to the genotype matrix multiplied by the BLUP of SNP marker effects $$\mathbf{W}\hat{\mathbf{a}}$$. $\hat{\mathbf{u}} = \mathbf{W}\hat{\mathbf{a}}$ ### Exercise 6 Compare estimated genomic breeding values obtained from the following: the GBLUP (uhatG), the SNP-BLUP (ahat), and the rrBLUP package. Use the cor() function to calculate correlations. # The regress package The regress() function in the regress package can also predict genomic breeding values. regress.SNP <- BLUP(varcompSNP) names(regress.SNP) # GEBV head(regress.SNPMean)  # regress
head(c(uhatG))  # GBLUP