APSC 5984 Complex Trait Genomics

Due date

Monday, March 30, 5pm

Wheat data

We will analyze the wheat data from Pérez-Rodríguez et al. (2012) (doi:10.1534/g3.112.003665). Download the supporting data, File S1. The R data file YLD_1.RData includes 306 elite wheat lines genotyped with 1,717 diversity array technology (DArT) markers. Note that each DArT marker is coded as either 0 or 1, indicating the presence or absence of an allele at a given locus. The marker genotypes are stored in the Markers variable.

rm(list = ls())
load("data_G3/YLD/YLD_1.RData")
ls()
[1] "A"         "Markers"   "Markers_w" "weights"   "y"         "y_w"      
W <- Markers
dim(W)
[1]  306 1717

Question 1

Read Whittaker et al. (2000) and summarize the paper in 300-500 words.

Question 2

Recall that the ridge estimate of the effect of marker \(j\) can be expressed as \(a^{ridge}_j = \frac{\sum^n_{i=1} w^2_{ij} }{\sum^n_{i=1} w^2_{ij} + \lambda} \times a^{ols}_{j}\). Create a centered genotype matrix Wc. Calculate the minor allele frequency (MAF) of each marker and, using Wc with \(\lambda = 100\), the amount of shrinkage for each marker. Plot MAF on the x-axis and the amount of shrinkage on the y-axis, and interpret the figure you obtain.
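As a rough illustration of the shrinkage formula above, the following sketch uses simulated 0/1 markers rather than the wheat data. All object names (W_toy, y_toy, Wc_toy, and so on) and the toy dimensions are hypothetical. It assumes each marker is fitted on its own, one at a time, which is the setting in which the ridge estimate reduces to a per-marker multiple of the OLS estimate.

```r
# Toy illustration on simulated data (not the wheat data).
set.seed(1)
n <- 50                                   # hypothetical number of lines
p <- 5                                    # hypothetical number of markers
W_toy <- matrix(rbinom(n * p, size = 1, prob = 0.3), nrow = n, ncol = p)
y_toy <- rnorm(n)
lambda <- 100

# Center each marker column and the phenotype
Wc_toy <- scale(W_toy, center = TRUE, scale = FALSE)
yc_toy <- y_toy - mean(y_toy)

# Per-marker sums of squares and the shrinkage factor from the formula
ss <- colSums(Wc_toy^2)
shrinkage <- ss / (ss + lambda)

# Single-marker OLS and ridge estimates, computed directly
a_ols   <- as.vector(crossprod(Wc_toy, yc_toy)) / ss
a_ridge <- as.vector(crossprod(Wc_toy, yc_toy)) / (ss + lambda)

# The ridge estimate equals the shrinkage factor times the OLS estimate
all.equal(a_ridge, shrinkage * a_ols)
```

The final line only checks the algebra; it does not compute MAF or produce the requested plot.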

Question 3

Fit a ridge regression using the glmnet R package. Create a scatterplot of the marker effects estimated with \(\lambda = 10\) (y-axis) against those estimated with \(\lambda = 0.1\) (x-axis). Interpret the role of \(\lambda\) in ridge regression. Use the variables y and W.
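For orientation on the glmnet interface, here is a minimal sketch on simulated data, assuming the glmnet package is installed. The objects W_toy, y_toy, and fit_toy are hypothetical stand-ins, not the wheat data. In glmnet, alpha = 0 requests the ridge penalty, and coef() with the s argument returns the coefficients at a chosen \(\lambda\). Note that glmnet scales the residual sum of squares by 1/(2n), so its \(\lambda\) may not be on the same scale as the \(\lambda\) used in Question 2.

```r
library(glmnet)

# Simulated toy data (hypothetical sizes and effect sizes)
set.seed(1)
n <- 100
p <- 20
W_toy <- matrix(rbinom(n * p, size = 1, prob = 0.4), nrow = n, ncol = p)
y_toy <- as.vector(W_toy %*% rnorm(p, sd = 0.5) + rnorm(n))

# alpha = 0 selects the ridge penalty (alpha = 1 would be the LASSO)
fit_toy <- glmnet(W_toy, y_toy, alpha = 0, lambda = c(10, 0.1))

# Extract marker effects at each lambda, dropping the intercept (first row)
a_big   <- as.matrix(coef(fit_toy, s = 10))[-1, 1]
a_small <- as.matrix(coef(fit_toy, s = 0.1))[-1, 1]

# Compare the overall size of the effects under the two penalties
c(sum(a_big^2), sum(a_small^2))
```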

Question 4

Unlike ridge regression, the LASSO can shrink marker effects to exactly zero. Therefore, the LASSO can be used as a subset selection method.

library(glmnet)
# alpha = 1 selects the LASSO penalty
aL <- glmnet(W, y, alpha = 1)
plot(aL$lambda, aL$df, ylab = "The number of nonzero markers", xlab = "Lambda")

This figure shows the relationship between the sequence of \(\lambda\) values and the number of nonzero markers. Interpret the figure in the context of GWAS.
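To see why the LASSO can set effects to exactly zero while ridge regression does not, consider the special case of an orthonormal design, where both estimators have closed forms in terms of the OLS estimates (up to how \(\lambda\) is scaled in the objective). The sketch below uses made-up OLS effects and a made-up \(\lambda\); it illustrates that special case only and is not how glmnet fits the wheat data.

```r
# Hypothetical OLS effects for six markers under an orthonormal design
a_ols <- c(-2.0, -0.6, -0.1, 0.2, 0.8, 3.0)
lambda <- 0.5

# Ridge: proportional shrinkage, never exactly zero for nonzero OLS effects
a_ridge <- a_ols / (1 + lambda)

# LASSO: soft-thresholding, which sets effects with |a_ols| <= lambda to zero
a_lasso <- sign(a_ols) * pmax(abs(a_ols) - lambda, 0)

data.frame(a_ols, a_ridge, a_lasso)
sum(a_lasso != 0)   # number of markers retained by the LASSO
```

Raising lambda in this sketch removes more markers from the LASSO solution, while every ridge effect stays nonzero.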

Gota Morota

March 23, 2020