## ASCI 896 Statistical Genomics

### Due date

Thursday, April 6, 5pm

## Data

Resende Jr. et al. (2012) (DOI: 10.1534/genetics.111.137026) analyzed 17 traits in loblolly pine (Pinus taeda), using data on 951 individuals genotyped with 4853 SNPs. In this homework assignment, we will use the deregressed breeding values of crown width across the planting beds at age 6 (CWAC6). Download the zip file and run the following code.

```r
# read phenotype and SNP files
header = TRUE, stringsAsFactors = FALSE)

# remove missing phenotypes
na.index <- which(is.na(CWAC6$Derregressed_BV))
CWAC6 <- CWAC6[-na.index, ]
SNP <- SNP[-na.index, ]
table(CWAC6$Genotype == SNP[, 1])

# phenotypes
y <- CWAC6$Derregressed_BV
y <- matrix(y, ncol = 1)

# markers
SNP <- SNP[, -1]  # 861 x 4853
SNP[SNP == -9] <- NA
```

## Question 1

Replace missing marker genotypes with mean values. Then store the marker genotypes in a matrix object X.
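As a minimal sketch of column-mean imputation, assuming missing genotypes are already coded as NA, the following uses a small made-up matrix. The names `X_toy` and `impute_mean` are hypothetical and are not part of the assignment data.

```r
# Toy genotype matrix (hypothetical data): 4 individuals x 3 markers, NA = missing
X_toy <- matrix(c(0, 1, NA, 2,
                  2, NA, 1, 1,
                  1, 0, 0, NA), nrow = 4)

# Replace each NA with the mean of the observed genotypes in the same column
impute_mean <- function(M) {
  M <- as.matrix(M)                        # also accepts a data frame of numeric columns
  col_means <- colMeans(M, na.rm = TRUE)   # per-marker mean over observed values
  miss <- which(is.na(M), arr.ind = TRUE)  # (row, col) positions of missing entries
  M[miss] <- col_means[miss[, "col"]]      # fill each gap with its column mean
  M
}

X_imp <- impute_mean(X_toy)
X_imp
```

Applying the same helper to the marker data frame would return a numeric matrix, which is one way to obtain the matrix object the question asks for.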

## Question 2

Perform quality control by removing markers with a minor allele frequency (MAF) below 0.05. How many markers are removed? Save the filtered genotype matrix in X2.
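One possible way to compute MAF is sketched below on made-up data. It assumes genotypes are coded as 0/1/2 copies of one allele with no missing values left; if the pine data use a different coding, the allele frequency formula must change accordingly. All object names here are hypothetical.

```r
# Toy genotypes (hypothetical data): 20 individuals x 4 markers, coded 0/1/2
X_toy <- cbind(m1 = rep(c(0, 1, 2, 1), 5),     # allele frequency 0.5
               m2 = rep(0, 20),                # monomorphic marker
               m3 = c(rep(2, 19), 1),          # rare minor allele
               m4 = rep(c(1, 1, 0, 2, 1), 4))  # allele frequency 0.5

p <- colMeans(X_toy) / 2   # frequency of the counted allele
maf <- pmin(p, 1 - p)      # minor allele frequency
keep <- maf >= 0.05        # markers passing the filter

X_filt <- X_toy[, keep, drop = FALSE]
n_removed <- sum(!keep)    # number of markers dropped
n_removed
```

Using `drop = FALSE` keeps the result a matrix even if only one marker survives the filter.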

## Question 3

Standardize each marker (column) of the genotype matrix to have a mean of zero and a variance of one. Save this matrix as Xs.
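A short sketch of column-wise standardization with base R's `scale()`, again on hypothetical toy data. Note that `scale()` divides by the sample standard deviation (denominator n - 1), and a marker with zero variance would produce NaN values, which is one reason to filter markers before standardizing.

```r
# Toy genotypes (hypothetical data): 6 individuals x 3 polymorphic markers
X_toy <- cbind(m1 = c(0, 1, 2, 1, 0, 2),
               m2 = c(1, 1, 0, 2, 1, 0),
               m3 = c(2, 2, 1, 0, 1, 2))

# Center each column to mean 0 and scale it to sample variance 1
Xs_toy <- scale(X_toy, center = TRUE, scale = TRUE)

col_means <- colMeans(Xs_toy)       # numerically zero
col_vars <- apply(Xs_toy, 2, var)   # one for every column
```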

## Question 4

Compute the second genomic relationship matrix G of VanRaden (2008) using all markers. Then add a very small positive constant (e.g., 0.001) to the diagonal elements so that G is invertible.
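The sketch below illustrates the idea on simulated toy genotypes, assuming 0/1/2 coding. It scales each marker by its expected variance 2p(1 - p), as in VanRaden's second method; standardizing with the sample variance instead (for example via `scale()`) is a closely related variant. Because each column is centered, the columns sum to zero over individuals, so G has at least one zero eigenvalue before the constant is added. All names and values are hypothetical.

```r
set.seed(1)
n <- 8                                   # toy number of individuals
p_sim <- runif(50, 0.2, 0.8)             # simulated allele frequencies
X_toy <- sapply(p_sim, function(p) rbinom(n, 2, p))   # n x 50 genotypes

# Keep only polymorphic markers so that 2p(1 - p) > 0
p <- colMeans(X_toy) / 2
poly <- p > 0 & p < 1
X_toy <- X_toy[, poly, drop = FALSE]
p <- p[poly]
m <- ncol(X_toy)

# Center by 2p and scale by sqrt(2p(1 - p)), marker by marker
W <- sweep(X_toy, 2, 2 * p, "-")
W <- sweep(W, 2, sqrt(2 * p * (1 - p)), "/")

G0 <- tcrossprod(W) / m                  # singular: G0 %*% 1 = 0
G <- G0 + diag(0.001, n)                 # small ridge makes G invertible
Ginv <- solve(G)
```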

## Question 5

Set up the mixed model equations (MME) for the model $$\mathbf{y = 1B + Zu + e}$$, where $$\mathbf{B}$$ is the intercept, $$\mathbf{Z}$$ is the incidence matrix of individuals, $$\mathbf{u}$$ is the vector of additive genetic values, and $$\mathbf{e}$$ is the vector of residuals. Invert the left-hand side (LHS) directly to obtain the GBLUP solutions. Report the estimates of the intercept and the additive genetic values. Use $$\lambda = 1.348411$$.
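For this model, Henderson's MME take the form $$\begin{bmatrix} \mathbf{1}^T\mathbf{1} & \mathbf{1}^T\mathbf{Z} \\ \mathbf{Z}^T\mathbf{1} & \mathbf{Z}^T\mathbf{Z} + \lambda\mathbf{G}^{-1} \end{bmatrix} \begin{bmatrix} \hat{B} \\ \hat{\mathbf{u}} \end{bmatrix} = \begin{bmatrix} \mathbf{1}^T\mathbf{y} \\ \mathbf{Z}^T\mathbf{y} \end{bmatrix}$$. A minimal sketch on simulated toy data follows; the data, the value of `lambda`, and all object names are hypothetical, and G is built here from `scale()`-standardized markers as one possible choice.

```r
set.seed(2)
n <- 10
X_toy <- matrix(rbinom(n * 40, 2, 0.5), n, 40)               # toy genotypes
X_toy <- X_toy[, apply(X_toy, 2, sd) > 0, drop = FALSE]      # drop constant markers
Ws <- scale(X_toy)
G <- tcrossprod(Ws) / ncol(Ws) + diag(0.001, n)              # G with a small ridge

y <- matrix(rnorm(n), ncol = 1)   # toy phenotypes
lambda <- 1.5                     # hypothetical variance ratio
ones <- matrix(1, n, 1)
Z <- diag(n)                      # one record per individual
Ginv <- solve(G)

# Build the left-hand side and right-hand side block by block
LHS <- rbind(cbind(crossprod(ones),    crossprod(ones, Z)),
             cbind(crossprod(Z, ones), crossprod(Z) + Ginv * lambda))
RHS <- rbind(crossprod(ones, y), crossprod(Z, y))

sol <- solve(LHS) %*% RHS         # direct inverse of the LHS
b_hat <- sol[1, 1]                # intercept estimate
u_hat <- sol[-1, 1]               # predicted additive genetic values
```

The first element of the solution vector is the intercept and the remaining n elements are the genetic values, in the row order of Z.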

## Question 6

Repeat Question 5 and fit GBLUP by using the mixed.solve function in the rrBLUP R package. Report the estimates of intercept and additive genetic values. Do they agree with the estimates in Question 5? Also, report the estimated genomic heritability and the ratio of variance components $$\lambda = \frac{\sigma^2_e}{\sigma^2_A}$$.

## Question 7

Set up the mixed model equations (MME) for the model $$\mathbf{y = 1B + Wa + e}$$, where $$\mathbf{B}$$ is the intercept, $$\mathbf{W}$$ is the matrix of standardized marker genotypes (Xs), $$\mathbf{a}$$ is the vector of additive marker genetic effects, and $$\mathbf{e}$$ is the vector of residuals. Invert the left-hand side (LHS) directly to obtain the solutions for marker-based GBLUP (RR-BLUP). Report the estimates of the intercept and the marker additive genetic effects. Use $$\lambda = 4326.212$$.
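In the marker-based model the lower-right block of the LHS becomes $$\mathbf{W}^T\mathbf{W} + \lambda\mathbf{I}$$. The sketch below sets this up on simulated toy data; the data, `lambda`, and object names are hypothetical. Because the columns of a standardized W sum to zero, the off-diagonal blocks vanish and the intercept estimate equals the phenotypic mean.

```r
set.seed(3)
n <- 12
X_toy <- matrix(rbinom(n * 30, 2, 0.5), n, 30)               # toy genotypes
X_toy <- X_toy[, apply(X_toy, 2, sd) > 0, drop = FALSE]      # drop constant markers
W <- scale(X_toy)                 # standardized markers
m <- ncol(W)

y <- matrix(rnorm(n), ncol = 1)   # toy phenotypes
lambda <- 20                      # hypothetical variance ratio
ones <- matrix(1, n, 1)

# LHS has dimension (m + 1) x (m + 1): intercept plus one row per marker
LHS <- rbind(cbind(crossprod(ones),    crossprod(ones, W)),
             cbind(crossprod(W, ones), crossprod(W) + diag(lambda, m)))
RHS <- rbind(crossprod(ones, y), crossprod(W, y))

sol <- solve(LHS) %*% RHS         # direct inverse of the LHS
b_hat <- sol[1, 1]                # intercept estimate
a_hat <- sol[-1, 1]               # estimated marker effects
```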

## Question 8

Repeat Question 7 and fit RR-BLUP by using the mixed.solve function in the rrBLUP R package. Report the estimates of intercept and marker additive genetic effects. Do they agree with the estimates in Question 7? Also, report the ratio of variance components $$\lambda = \frac{\sigma^2_e}{\sigma^2_a}$$.

## Question 9

Recall that the BLUP of marker effects is given by $$\mathbf{X}^T (\mathbf{X}\mathbf{X}^T)^{-1} \text{BLUP}(\mathbf{u})$$. This suggests that we can go back and forth between GBLUP and RR-BLUP. Convert the estimated additive genetic values obtained in Question 5 to marker additive genetic effects. Add a very small positive constant (e.g., 0.001) to the diagonal of $$\mathbf{Xs}\,\mathbf{Xs}^T$$ if necessary. Do the converted marker additive genetic effects agree with the estimates obtained in Question 7?
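The conversion can be sketched on simulated toy data as follows. To keep the comparison simple, this sketch uses the unscaled kernel K = W W^T with the same `lambda` for both models; if G is instead W W^T / m, the matching GBLUP ratio is `lambda / m`. Since centered markers make K singular, a small constant is added before inverting, so the agreement is close but not exact. All data and names are hypothetical.

```r
set.seed(4)
n <- 12
X_toy <- matrix(rbinom(n * 200, 2, 0.5), n, 200)             # toy genotypes
X_toy <- X_toy[, apply(X_toy, 2, sd) > 0, drop = FALSE]      # drop constant markers
W <- scale(X_toy)                 # standardized markers

y <- rnorm(n)                     # toy phenotypes
lambda <- 50                      # hypothetical variance ratio
K <- tcrossprod(W)                # W W^T, singular because columns of W are centered
r <- y - mean(y)                  # with centered W the intercept is mean(y)

# GBLUP genetic values and RR-BLUP marker effects from the same model
u_hat <- as.numeric(K %*% solve(K + lambda * diag(n), r))
a_rr  <- as.numeric(crossprod(W, solve(K + lambda * diag(n), r)))

# Convert genetic values to marker effects: W^T (W W^T + small ridge)^{-1} u_hat
a_conv <- as.numeric(crossprod(W, solve(K + diag(0.001, n), u_hat)))

max_diff <- max(abs(a_conv - a_rr))   # small, driven by the ridge constant
```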

## Question 10

Recall that the BLUP of marker effects is given by $$\mathbf{X}^T (\mathbf{X}\mathbf{X}^T)^{-1} \text{BLUP}(\mathbf{u})$$. This suggests that we can go back and forth between GBLUP and RR-BLUP. Convert the estimated additive genetic values obtained in Question 6 to marker additive genetic effects. Add a very small positive constant (e.g., 0.001) to the diagonal of $$\mathbf{Xs}\,\mathbf{Xs}^T$$ if necessary. Do the converted marker additive genetic effects agree with the estimates obtained in Question 8?

## Question 11

Repeat Question 5 (GBLUP), but treat the first 600 individuals as a training set and predict the additive genetic values of the remaining individuals in the testing set. What is the predictive correlation in the testing set? Use $$\lambda = 1.348411$$.
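One way to set this up, sketched on simulated toy data, is to keep G for all individuals but let Z contain columns of zeros for the testing individuals, so that only training phenotypes enter the equations while genetic values are still predicted for everyone. The data, the training size, `lambda`, and all names below are hypothetical, and the predictive correlation is taken here as the correlation between predicted genetic values and observed phenotypes in the testing set.

```r
set.seed(5)
n <- 30; n_train <- 20
X_toy <- matrix(rbinom(n * 100, 2, 0.5), n, 100)             # toy genotypes
X_toy <- X_toy[, apply(X_toy, 2, sd) > 0, drop = FALSE]      # drop constant markers
W <- scale(X_toy)
G <- tcrossprod(W) / ncol(W) + diag(0.001, n)                # G for all individuals

a_sim <- rnorm(ncol(W), sd = 0.1)                # simulated marker effects
y <- as.numeric(W %*% a_sim + rnorm(n))          # toy phenotypes
lambda <- 1.5                                    # hypothetical variance ratio

train <- 1:n_train
test <- (n_train + 1):n
y_tr <- matrix(y[train], ncol = 1)
ones <- matrix(1, n_train, 1)
Z <- cbind(diag(n_train), matrix(0, n_train, n - n_train))   # testing columns are zero

Ginv <- solve(G)
LHS <- rbind(cbind(crossprod(ones),    crossprod(ones, Z)),
             cbind(crossprod(Z, ones), crossprod(Z) + Ginv * lambda))
RHS <- rbind(crossprod(ones, y_tr), crossprod(Z, y_tr))

sol <- solve(LHS) %*% RHS
u_hat <- sol[-1, 1]                      # genetic values for all n individuals
pred_cor <- cor(u_hat[test], y[test])    # predictive correlation in the testing set
```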

## Question 12

Repeat Question 7 (RR-BLUP), but treat the first 600 individuals as a training set and predict the additive genetic values of the remaining individuals in the testing set. What is the predictive correlation in the testing set? Use $$\lambda = 4326.212$$. Also, compare this predictive correlation to the one from Question 11. If computed correctly, these two values should be exactly the same or very similar. Briefly explain why this is the case.
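The following sketch, on simulated toy data with hypothetical names and values, fits RR-BLUP on a training set, predicts the testing individuals as W_test times the estimated marker effects, and compares the result with a kernel-form GBLUP prediction that uses G = W W^T / m and the ratio `lambda_a / m`. The two coincide because of the identity (W^T W + lambda I)^{-1} W^T = W^T (W W^T + lambda I)^{-1}. In the assignment, the small constant added to the diagonal of G means the agreement may be very close rather than exact.

```r
set.seed(6)
n <- 30; n_train <- 20
X_toy <- matrix(rbinom(n * 100, 2, 0.5), n, 100)             # toy genotypes
X_toy <- X_toy[, apply(X_toy, 2, sd) > 0, drop = FALSE]      # drop constant markers
W <- scale(X_toy)
m <- ncol(W)

a_sim <- rnorm(m, sd = 0.1)                      # simulated marker effects
y <- as.numeric(W %*% a_sim + rnorm(n))          # toy phenotypes
lambda_a <- 150                                  # hypothetical RR-BLUP ratio
lambda_g <- lambda_a / m                         # matching GBLUP ratio for G = W W^T / m

train <- 1:n_train
test <- (n_train + 1):n
W_tr <- W[train, , drop = FALSE]
W_te <- W[test, , drop = FALSE]
y_tr <- matrix(y[train], ncol = 1)
ones <- matrix(1, n_train, 1)

# RR-BLUP via MME on the training set
LHS <- rbind(cbind(crossprod(ones),       crossprod(ones, W_tr)),
             cbind(crossprod(W_tr, ones), crossprod(W_tr) + diag(lambda_a, m)))
RHS <- rbind(crossprod(ones, y_tr), crossprod(W_tr, y_tr))
sol <- solve(LHS) %*% RHS
a_hat <- sol[-1, 1]
pred_rr <- as.numeric(W_te %*% a_hat)            # predicted genetic values, testing set

# GBLUP in kernel form on the same split
G <- tcrossprod(W) / m
V <- G[train, train] + lambda_g * diag(n_train)
b_g <- sum(solve(V, y_tr)) / sum(solve(V))       # GLS intercept
pred_g <- as.numeric(G[test, train] %*% solve(V, y_tr - b_g))

cor_rr <- cor(pred_rr, y[test])
cor_g <- cor(pred_g, y[test])
```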

March 28, 2017