Data

Resende Jr. et al. (2012) (DOI: 10.1534/genetics.111.137026) analyzed 17 traits in loblolly pine (Pinus taeda) data, which include 951 individuals genotyped with 4853 SNPs. In this homework assignment, we will use the derregressed breeding values crown width across the planting beds at age 6 (CWAC6). Download the zip file and type the following code.

# read phenotype and SNP files
CWAC6 <- read.csv("DATA_nassau_age6_CWAC.csv", header=TRUE, stringsAsFactors = FALSE)
SNP <- read.csv("Snp_Data.csv", header=TRUE, stringsAsFactors = FALSE)

# remove missing phenotypes 
na.index <-  which(is.na(CWAC6$Derregressed_BV))
CWAC6 <- CWAC6[-na.index, ]
SNP <- SNP[-na.index, ]
table(CWAC6$Genotype == SNP[,1])

# phenotypes 
y <- CWAC6$Derregressed_BV
y <- matrix(y, ncol=1)

# markers 
SNP <- SNP[,-1] # 861 x 4853 
SNP[SNP == -9] <- NA

Genotype imputation

Replace missing marker genotypes with mean values. Then store the marker genotypes in a matrix object X.

X <- matrix(0, ncol=ncol(SNP), nrow=nrow(SNP))
for (j in 1:ncol(SNP)){
  #cat("j = ", j, '\n')
  X[,j] <- ifelse(is.na(SNP[,j]), mean(SNP[,j], na.rm=TRUE), SNP[,j])
}

Quality control

Perform a quality control by removing markers with MAF < 0.05. How many markers are removed? Save the filtered genotype matrix in X2.

freq <- colMeans(X) / 2 
maf <- ifelse(freq > 0.5, 1-freq, freq)
maf.index <- which(maf < 0.05)
length(maf.index)
X2 <- X[, -maf.index] #  861 x 3206

Standardization of genotype matrix

Standardize the genotype matrix to have a mean of zero and variance of one. Save this matrix as Xs.

Xs <- scale(X2, center = TRUE, scale = TRUE)

# dimensions 
n <- nrow(Xs)
m <- ncol(Xs)

Compute \(G\) matrix

Compute the second genomic relationship matrix of VanRaden (2008) G using the entire markers. Then add a very small positive constant (e.g., 0.001) to the diagonal elements so that G matrix is invertible.

G <- tcrossprod(Xs) / ncol(Xs)
G <- G + diag(n)*0.001

Question 1

Set up mixed model equations (MME) by fitting the model \(\mathbf{y = 1B + Zu + e}\), where \(\mathbf{B}\) is the intercept, \(\mathbf{Z}\) is the incident matrix of individuals, \(\mathbf{u}\) is the additive genetic values, and \(\mathbf{e}\) is the residual. Directly take the inverse of LHS to obtain the solutions for GBLUP. Report the estimates of intercept and additive genetic values. Use \(\lambda = 1.348411\).

Question 2

Repeat Question 1 and fit GBLUP by using the mixed.solve function in the rrBLUP R package. Report the estimates of intercept and additive genetic values. Do they agree with the estimates in Question 2? Also, report the estimated genomic heritability and the ratio of variance components \(\lambda = \frac{\sigma^2_e}{\sigma^2_A}\).

Question 3

Set up mixed model equations (MME) by fitting the model \(\mathbf{y = 1B + Wa + e}\), where \(\mathbf{B}\) is the intercept, \(\mathbf{W}\) is the standardized marker genotypes (Xs), \(\mathbf{a}\) is the additive marker genetic effects, and \(\mathbf{e}\) is the residual. Directly take the inverse of LHS to obtain the solutions for marker-based GBLUP (RR-BLUP). Report the estimates of intercept and marker additive genetic effects. Use \(\lambda = 4326.212\).

Question 4

Repeat Question 3 and fit RR-BLUP by using the mixed.solve function in the rrBLUP R package. Report the estimates of intercept and marker additive genetic effects. Do they agree with the estimates in Question 3? Also, report the ratio of variance components \(\lambda = \frac{\sigma^2_e}{\sigma^2_a}\).

Question 5

Recall that BLUP of marker effects is given by \(\mathbf{X}^T (\mathbf{X}\mathbf{X}^T)^{-1} BLUP(\mathbf{u})\). This suggests that we can go back and forth between GBLUP and RR-BLUP. Convert the esitmated additive genetic values obtained in Question 1 to marker additive genetic effects. Add a very small positive constant (e.g., 0.001) to the diagonals of \(\mathbf{XsXs^T}\) if necessary. Do the converted marker additive genetic effects agree with the estimates we obtained in Question 3?

Question 6

Recall that BLUP of marker effects is given by \(\mathbf{X}^T (\mathbf{X}\mathbf{X}^T)^{-1} BLUP(\mathbf{u})\). This suggests that we can go back and forth between GBLUP and RR-BLUP. Convert the esitmated additive genetic values obtained in Question 2 to marker additive genetic effects. Add a very small positive constant (e.g., 0.001) to the diagonals of \(\mathbf{XsXs^T}\) if necessary. Do the converted marker additive genetic effects agree with the estimates we obtained in Question 4?

Question 7

Repeat 1 (GBLUP) but treat the first 600 individuals as a training set and predict the additive genetic values of the remaining individuals in the testing set. What is the predictive correlation in the testing set? Use \(\lambda = 1.348411\).

Question 8

Repeat 3 (RR-BLUP) but treat the first 600 individuals as a training set and predict the additive genetic values of the remaining individuals in the testing set. What is the predictive correlation in the testing set? Use \(\lambda = 4326.212\). Also, compare this predictive correlation to the one from Question 7. If computed correctly, these two values should be exactly the same or very similar. Briefly explain why this is the case.

Genomic best linear unbiased prediction

Gota Morota, Matt Spangler

May 17, 2016