APSC 5984 Complex Trait Genomics
Homework assignment 5
Due date
Thursday, April 27, 5pm
Data
Resende Jr. et al. (2012) (DOI: 10.1534/genetics.111.137026) analyzed 17 traits in loblolly pine (Pinus taeda) data, which include 951 individuals genotyped with 4853 SNPs. In this homework assignment, we will use the derregressed breeding values of crown width across the planting beds at age 6 (CWAC6). Download the zip file and type the following code.
# read phenotype and SNP files
CWAC6 <- read.csv("Loblolly_Pine_Resende/Phenotypic_Data Folder/DATA_nassau_age6_CWAC.csv",
header = TRUE, stringsAsFactors = FALSE)
SNP <- read.csv("Loblolly_Pine_Resende/Snp_Data.csv", header = TRUE, stringsAsFactors = FALSE)
# remove missing phenotypes
na.index <- which(is.na(CWAC6$Derregressed_BV))
CWAC6 <- CWAC6[-na.index, ]
SNP <- SNP[-na.index, ]
table(CWAC6$Genotype == SNP[, 1])
# phenotypes
y <- CWAC6$Derregressed_BV
y <- matrix(y, ncol = 1)
# markers
SNP <- SNP[, -1] # 861 x 4853
SNP[SNP == -9] <- NA
Question 1
Replace missing marker genotypes with mean values. Then store the marker genotypes in a matrix object W
.
Question 2
Perform a quality control by removing markers with MAF < 0.05. How many markers are removed? Save the filtered genotype matrix in W2
.
Question 3
Scale the genotype matrix (W2
) to have a mean of zero and variance of one. Save this matrix as Ws
.
Question 4
Compute the second genomic relationship matrix of VanRaden (2008) G
using the entire markers. Then add a very small positive constant (e.g., 0.001) to the diagonal elements so that G
matrix is invertible.
Question 5
Fit GBLUP by using the mixed.solve
function in the rrBLUP R package. Report the estimates of intercept, additive genetic values, and genomic heritability.
Question 6
Repeat Question 5 by setting up mixed model equations (MME) with the model \(\mathbf{y = \mathbf{1}\mu + Zg + e}\), where \(\mu\) is the intercept, \(\mathbf{Z}\) is the incident matrix of individuals, \(\mathbf{g}\) is the additive genetic values, and \(\mathbf{e}\) is the residual. Use the ratio of variance components \(\lambda = \frac{\sigma^2_e}{\sigma^2_g}\) obtained from Question 5. Directly take the inverse of LHS to obtain the solutions for GBLUP. Report the estimates of intercept and additive genetic values. Do they agree with the estimates in Question 5?
Question 7
Fit RR-BLUP by using the mixed.solve
function in the rrBLUP R package. Report the estimates of intercept, marker additive genetic effects, and marker additive genetic variance.
Question 8
Repeat Question 7 by setting up mixed model equations (MME) with the model \(\mathbf{y = \mathbf{1}\mu + Wa + e}\), where \(\mu\) is the intercept, \(\mathbf{W}\) is the standardized marker genotypes (Ws
), \(\mathbf{a}\) is the additive marker genetic effects, and \(\mathbf{e}\) is the residual. Use the ratio of variance components \(\lambda = \frac{\sigma^2_e}{\sigma^2_a}\) obtained from Question 7. Directly take the inverse of LHS to obtain the solutions for marker-based GBLUP (RR-BLUP). Report the estimates of intercept and marker additive genetic effects. Do they agree with the estimates in Question 7?
Question 9
Repeat Questions 5 or 6 (GBLUP) but perform cross-validation by treating the first 600 individuals as a training set and predict the additive genetic values of the remaining individuals in the testing set. What is the predictive correlation in the testing set?
Question 10
Repeat Question 7 or 8 (RR-BLUP) but perform cross-validation by treating the first 600 individuals as a training set and predict the additive genetic values of the remaining individuals in the testing set. What is the predictive correlation in the testing set? Also, compare this predictive correlation to the one from Question 9. If computed correctly, these two values should be exactly the same or very similar. Briefly explain why this is the case.