APSC 5984 Complex Trait Genomics
Homework assignment 1
Due date
Friday, February 14 at 5 pm
Arabidopsis data
We will analyze Arabidopsis recombinant inbred lines (RILs) available from the Bay-0 x Shahdara project. Download the phenotypic and genotypic data. The target trait in this homework assignment is Flowering time in Long Days, measured as the number of days between germination and bolting (the beginning of inflorescence growth). You can learn more about the Bay-0 x Shahdara population in Loudet et al. (2002).
library(readxl)
# phenotypes
bayxshaY <- read_excel("BayxSha_PublishedPheno.xls", col_names=FALSE, sheet=1, skip=4)
y0 <- as.numeric(unlist(bayxshaY[,6]))
# drop lines with a missing phenotype (logical indexing also works when there are no NAs)
y1 <- y0[!is.na(y0)]
y1 <- scale(y1)
# genotypes
bayxshaG <- read_excel("BayxSha_2_Genotypes.xls", col_names=FALSE, skip=5)
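bayxshaG.id <- bayxshaG[,1]
# convert to a character matrix first; recent tibble versions refuse to assign numbers into character columns
bayxshaG <- as.matrix(bayxshaG[,-1])
# recode genotype letters as numeric codes; "D" is treated as missing
bayxshaG[bayxshaG == "A"] <- 2
bayxshaG[bayxshaG == "C"] <- 1
bayxshaG[bayxshaG == "B"] <- 0
bayxshaG[bayxshaG == "D"] <- NA
mode(bayxshaG) <- "numeric"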
# keep only the lines that have a phenotype, so rows match y1
bayxshaG <- bayxshaG[!is.na(y0), ]
Question 1
What are the dimensions of y1 and bayxshaG?
Question 2
How many markers contain missing genotype codes (NA)? Use the function is.na().
Question 3
Replace missing marker genotypes with column means (i.e., compute the mean of each column from its observed values and use it to replace the NAs in that column). Then store the marker genotypes in a matrix variable W1. Show that W1 does not include any NA.
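As a hint, column-wise mean imputation can be sketched on a small toy matrix. The matrix W_toy and its values are made up for illustration and are not the Bay-0 x Shahdara data; this is one possible approach, not the required one.

```r
# Toy genotype matrix with missing entries (hypothetical values).
# matrix() fills column by column, so each row of numbers below is one marker.
W_toy <- matrix(c(2, 0, NA, 2,
                  1, NA, 0, 0,
                  2, 2, 2, NA), nrow = 4, ncol = 3)

W_imp <- W_toy
for (j in seq_len(ncol(W_imp))) {
  missing <- is.na(W_imp[, j])
  # mean of the observed genotypes in column j
  W_imp[missing, j] <- mean(W_imp[, j], na.rm = TRUE)
}

# TRUE would indicate that some NA is still present
anyNA(W_imp)
```

The same loop should carry over to the real marker matrix once the toy object is swapped out.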
Question 4
Compute the allele frequencies of the markers in the W1 matrix. Then store the estimated allele frequencies in a vector p. What are the max, min, and mean of p?
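A minimal sketch of the calculation, assuming genotypes are coded 0/1/2 as the number of copies of the counted allele, so that the allele frequency of a marker is its column mean divided by 2. The toy matrix below is hypothetical.

```r
# Toy genotype matrix without missing values (hypothetical values);
# columns are markers, rows are lines.
W_toy <- matrix(c(2, 0, 2, 2,
                  1, 0, 0, 0,
                  2, 2, 2, 1), nrow = 4, ncol = 3)

# Each diploid individual carries two alleles per marker,
# so the frequency of the counted allele is mean(genotype) / 2.
p_toy <- colMeans(W_toy) / 2

c(max = max(p_toy), min = min(p_toy), mean = mean(p_toy))
```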
Question 5
Compute the minor allele frequencies of the markers. Then store the estimated minor allele frequencies in a vector maf. What are the max, min, and mean of maf?
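For a biallelic marker, the minor allele frequency is the smaller of p and 1 - p, so it can never exceed 0.5. A short sketch on a hypothetical frequency vector p_toy:

```r
# Hypothetical allele frequencies for three markers
p_toy <- c(0.75, 0.125, 0.875)

# Element-wise minimum of p and 1 - p
maf_toy <- pmin(p_toy, 1 - p_toy)
maf_toy
```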
Question 6
Create a new marker matrix W2 that includes an intercept column. Fit an ordinary least squares (OLS) model using the matrix multiplication operator (%*%). Report the first six and the last six marker effects.
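The OLS estimator solves the normal equations, beta_hat = (X'X)^{-1} X'y. The sketch below illustrates this on simulated toy data. The objects X_toy and y_toy are hypothetical, and the approach assumes X'X is invertible (more observations than columns, and no linearly dependent columns).

```r
set.seed(1)
n <- 20

# Toy design matrix: a column of ones (intercept) plus three markers coded 0/2
X_toy <- cbind(1, matrix(sample(c(0, 2), n * 3, replace = TRUE), nrow = n))
y_toy <- rnorm(n)

# Normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat <- solve(t(X_toy) %*% X_toy) %*% t(X_toy) %*% y_toy

# The first element is the intercept; the remaining ones are marker effects
beta_hat
```

If solve() reports a singular system on real data, it is worth checking the marker matrix for duplicated or constant columns.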
Question 7
Fit ordinary least squares by using the lm() function and W2. Report the first six and the last six marker effects. Verify that your results agree with the ones from Question 6. Also, explain how we can obtain the Std. Error, t value, and Pr(>|t|) of the marker effects.
Question 8
Discuss the advantages and limitations of ordinary least squares.