Homework assignment 1

Due date: Friday, February 5, 5pm

For this assignment, we will use the cattle data set included in the synbreedData package. Learn more about the Synbreed project and the synbreed R packages.

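library(synbreed)      # provides codeGeno()
library(synbreedData)  # provides the cattle data set
data(cattle)
pheno <- as.matrix(cattle$pheno[,1,1])
dim(cattle$geno)
## [1]  500 7250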
cattleC <- codeGeno(cattle,impute=TRUE,impute.type="random")
##      Summary of imputation 
##     total number of missing values                : 10000 
##     number of random imputations                  : 10000

Select only the SNP markers on chromosome 1. Answer the six questions below using the following two R objects: pheno, which contains the phenotype, and W, which is the genotype matrix.

W <- cattleC$geno[, which(cattleC$map$chr == 1)]
dim(W)
## [1] 500 250

Question 1

Compute the allele frequency of each SNP marker. Recall that the expectation of a genotype, \(E(W)\), is given by \(2p\), where \(p\) is the frequency of the reference allele. Verify that \(2p\) is equal to the mean of each marker genotype obtained from the colMeans() function.
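The relationship \(E(W) = 2p\) can be checked on simulated data before turning to W. The following is a minimal sketch, assuming genotypes coded as 0/1/2 copies of the reference allele; W_toy, p_true, and p_hat are hypothetical names and the data are simulated, not the cattle data.

```r
# Toy data (hypothetical): 200 individuals, 5 markers coded 0/1/2
set.seed(1)
n <- 200
p_true <- c(0.1, 0.3, 0.5, 0.7, 0.9)
W_toy <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))

# Allele frequency by allele counting:
# a genotype of 2 carries two reference alleles, a genotype of 1 carries one
n2 <- colSums(W_toy == 2)
n1 <- colSums(W_toy == 1)
p_hat <- (2 * n2 + n1) / (2 * n)

# 2p should agree with the column means of the genotype matrix
all.equal(2 * p_hat, colMeans(W_toy))
```

Counting alleles directly, rather than defining \(p\) as half the column mean, keeps the check from being circular.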

Question 2

Recall that, under Hardy-Weinberg equilibrium, the variance of a genotype, \(Var(W)\), is given by \(2p(1-p)\). Verify that \(2p(1-p)\) is close to the variance of each marker genotype obtained from the var() function.
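A small simulation illustrates why the two quantities are close but not identical. This is a sketch under the assumption that genotypes are drawn as Binomial(2, p), which implies Hardy-Weinberg proportions; all object names are hypothetical and the data are simulated.

```r
# Toy data (hypothetical): 5000 individuals, 3 markers in Hardy-Weinberg proportions
set.seed(2)
n <- 5000
p_true <- c(0.1, 0.3, 0.5)
W_toy <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))

p_hat <- colMeans(W_toy) / 2               # estimated allele frequencies
var_expected <- 2 * p_hat * (1 - p_hat)    # variance implied by 2p(1-p)
var_observed <- apply(W_toy, 2, var)       # sample variance of each column

# Side-by-side comparison, one row per marker
round(cbind(p_hat, var_expected, var_observed), 3)
```

In real data the gap between the two columns may be larger than in this simulation, since sampling error and departures from Hardy-Weinberg proportions both contribute to it.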

Question 3

Fit a single-marker regression for each SNP using ordinary least squares (OLS) and estimate the SNP marker effects. Create a scatter plot of the estimated marker effects against the SNP IDs.
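Single-marker regression fits one simple linear model per SNP. The sketch below shows the mechanics on simulated data, assuming an intercept is included in each model; W_toy, y_toy, a_hat, and a_closed are hypothetical names.

```r
# Toy data (hypothetical): 100 individuals, 20 markers, one simulated causal marker
set.seed(3)
n <- 100
m <- 20
W_toy <- matrix(rbinom(n * m, size = 2, prob = 0.4), nrow = n, ncol = m)
y_toy <- 0.8 * W_toy[, 5] + rnorm(n)

# Regress the phenotype on one marker at a time and keep the slope
a_hat <- apply(W_toy, 2, function(w) coef(lm(y_toy ~ w))[2])

# For a simple regression the OLS slope has the closed form cov(w, y) / var(w)
a_closed <- apply(W_toy, 2, function(w) cov(w, y_toy) / var(w))
all.equal(unname(a_hat), a_closed)

# Marker index on the x-axis, estimated effect on the y-axis
plot(seq_len(m), a_hat, xlab = "SNP index", ylab = "Estimated marker effect")
```

The closed form avoids calling lm() once per marker, which may matter when the number of markers is large.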

Question 4

The dimension of the genotype matrix W is 500 x 250. Can you use OLS to fit all 250 markers simultaneously? If not, explain why this is not possible.
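One way to investigate this question is to compare the rank of the design matrix with its number of columns. The sketch below uses simulated data in which one marker is deliberately duplicated to mimic complete linkage disequilibrium; it is not a statement about what the cattle data will show, and all names are hypothetical.

```r
# Toy data (hypothetical): 50 individuals, 10 markers
set.seed(4)
n <- 50
m <- 10
W_toy <- matrix(rbinom(n * m, size = 2, prob = 0.5), nrow = n, ncol = m)
W_toy[, 10] <- W_toy[, 1]   # marker 10 is an exact copy of marker 1

X <- cbind(1, W_toy)        # intercept plus all markers
qr(X)$rank                  # numerical rank of the design matrix
ncol(X)                     # number of coefficients to estimate

# lm() returns NA for coefficients that cannot be estimated separately
y_toy <- rnorm(n)
fit <- lm(y_toy ~ W_toy)
coef(fit)
```

When the rank is smaller than the number of columns, \(X'X\) is singular and the OLS solution is not unique.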

Question 5

Compute the multi-locus additive genetic variance under the linkage equilibrium (LE) assumption. Compare this value with the additive genetic variance that accounts for the covariance between genotypes. Apply the expression based on the correlation between genotypes by using the cor() function. Report the estimates of genomic heritability.
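The two variance expressions can be sketched on simulated data. The example assumes the LE variance is \(\sum_j 2p_j(1-p_j)a_j^2\), that the variance accounting for covariance is \(\sum_j \sum_k a_j a_k Cov(W_j, W_k)\) with each covariance rebuilt from cor() and the marker standard deviations, and that genomic heritability is the ratio of additive genetic variance to phenotypic variance. The marker effects a and all other names are hypothetical.

```r
# Toy data (hypothetical): 500 individuals, 4 markers, markers 1 and 2 in LD
set.seed(5)
n <- 500
m <- 4
W_toy <- matrix(rbinom(n * m, size = 2, prob = 0.4), nrow = n, ncol = m)
swap <- sample(n, 0.7 * n)
W_toy[swap, 2] <- W_toy[swap, 1]   # copy part of marker 1 into marker 2
a <- c(0.5, 0.3, -0.2, 0.1)        # hypothetical marker effects

# Additive genetic variance under LE: sum of 2p(1-p) * a^2
p_hat <- colMeans(W_toy) / 2
Va_LE <- sum(2 * p_hat * (1 - p_hat) * a^2)

# Covariance rebuilt from correlations: cov_jk = r_jk * sd_j * sd_k
s <- apply(W_toy, 2, sd)
C <- cor(W_toy) * outer(s, s)
Va_LD <- drop(t(a) %*% C %*% a)

# Sanity check: this equals the sample variance of the genetic values W a
all.equal(Va_LD, var(drop(W_toy %*% a)))

# Genomic heritability as the ratio of genetic to phenotypic variance
y_toy <- drop(W_toy %*% a) + rnorm(n)
h2_LE <- Va_LE / var(y_toy)
h2_LD <- Va_LD / var(y_toy)
c(Va_LE = Va_LE, Va_LD = Va_LD, h2_LE = h2_LE, h2_LD = h2_LD)
```

In the assignment, the marker effects would come from the estimates obtained earlier rather than from fixed values as in this sketch.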

Question 6

What is the net contribution of the first locus to the total additive genetic variance?
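One possible convention, stated here as an assumption rather than as the course's definition, attributes to locus \(j\) the quantity \(a_j \sum_k a_k Cov(W_j, W_k)\), that is, its own variance term plus its covariance terms with every other locus. Under this convention the per-locus contributions sum to the total additive genetic variance. The sketch below uses simulated data and hypothetical names.

```r
# Toy data (hypothetical): 500 individuals, 4 markers, markers 1 and 2 in LD
set.seed(6)
n <- 500
m <- 4
W_toy <- matrix(rbinom(n * m, size = 2, prob = 0.4), nrow = n, ncol = m)
swap <- sample(n, 0.7 * n)
W_toy[swap, 2] <- W_toy[swap, 1]
a <- c(0.5, 0.3, -0.2, 0.1)        # hypothetical marker effects

C <- cov(W_toy)                    # covariance matrix of the genotypes

# Per-locus contribution: a_j * sum_k a_k * Cov(W_j, W_k)
contrib <- a * drop(C %*% a)

# Locus 1 split into its own-variance part and its covariance (LD) part
own_part <- a[1]^2 * C[1, 1]
ld_part <- a[1] * sum(a[-1] * C[1, -1])
all.equal(contrib[1], own_part + ld_part)

# The contributions add up to the total additive genetic variance
all.equal(sum(contrib), drop(t(a) %*% C %*% a))
```

A contribution defined this way can be negative when the covariance terms outweigh the locus's own variance term.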