ASCI 896 Statistical Genomics

Due date

Tuesday, February 14, 5pm


We will continue analyzing the cattle data included in the synbreedData package. Learn more about the Synbreed project and synbreed R packages.

library(synbreed)
library(synbreedData)
help(package = "synbreedData")
data(cattle)
pheno <- as.matrix(cattle$pheno[, 1, 1])
pheno <- scale(pheno)
cattleC <- codeGeno(cattle, impute = TRUE, impute.type = "random")

Select only the SNP markers on chromosome 1. Answer the five questions below using the following two R objects: pheno, which contains the phenotype, and W, which is the genotype matrix.

W <- cattleC$geno[, which(cattleC$map$chr == 1)]

Question 1

Create a new variable W2 by subsetting the first 10 markers of W. The dimension of W2 is \(500 \times 10\). Verify that the covariance between allelic counts is \(Cov(W2[,i], W2[,j]) \approx 2D\), where \(D\) is the estimate of linkage disequilibrium. Use the W2 object and the LD() function from the genetics package to obtain \(D\).
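
The identity being verified can be illustrated on simulated data before turning to W2. The following is a minimal base R sketch, not the assignment's solution. It assumes random mating and known haplotype phase, and the haplotype frequencies and the names hapFreq and drawHap are hypothetical. With real unphased genotypes, \(D\) has to be estimated rather than counted from haplotypes.

```r
set.seed(1)
n <- 5000  # hypothetical number of individuals

# Assumed haplotype frequencies at two biallelic loci (alleles A/a and B/b)
hapFreq <- c(AB = 0.4, Ab = 0.1, aB = 0.1, ab = 0.4)
drawHap <- function(n) sample(names(hapFreq), n, replace = TRUE, prob = hapFreq)

# Each individual receives two independently drawn haplotypes (random mating)
h1 <- drawHap(n)
h2 <- drawHap(n)

# Allelic counts (0, 1, 2) of A at locus 1 and of B at locus 2
x1 <- (substr(h1, 1, 1) == "A") + (substr(h2, 1, 1) == "A")
x2 <- (substr(h1, 2, 2) == "B") + (substr(h2, 2, 2) == "B")

# D = p_AB - p_A * p_B, computed from the pooled haplotypes
haps <- c(h1, h2)
pA <- mean(substr(haps, 1, 1) == "A")
pB <- mean(substr(haps, 2, 2) == "B")
D <- mean(haps == "AB") - pA * pB

# The two quantities should agree up to sampling error
c(covariance = cov(x1, x2), twoD = 2 * D)
```

The factor of 2 arises because each individual carries two haplotypes, and under random mating only the within-haplotype covariances are nonzero in expectation.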

Question 2

Fit a single marker regression using ordinary least squares (OLS) and estimate SNP marker effects. Create a scatter plot of SNP IDs vs. effect size of markers. Use the objects W and pheno. Save the vector of marker effects into a.
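
The mechanics of single marker regression can be sketched on toy data as follows. This is an illustration under stated assumptions, not the expected answer: Wtoy, ytoy, and aToy are hypothetical stand-ins for W, pheno, and a, and the simulated effect at marker 5 is arbitrary.

```r
set.seed(2)
n <- 200  # toy number of individuals
m <- 50   # toy number of markers

# Toy genotype matrix coded 0/1/2 and a phenotype driven by marker 5
Wtoy <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
ytoy <- as.numeric(scale(0.8 * Wtoy[, 5] + rnorm(n)))

# Regress the phenotype on one marker at a time and keep the slope
aToy <- apply(Wtoy, 2, function(w) unname(coef(lm(ytoy ~ w))[2]))

# Marker index on the x-axis, estimated effect on the y-axis
plot(seq_along(aToy), aToy, xlab = "SNP index", ylab = "Estimated effect", pch = 19)
```

Each slope equals cov(w, y) / var(w), so the loop over lm() could be replaced by a vectorized computation when the number of markers is large.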

Question 3

Compute the allele frequency of the reference allele for each SNP marker. Report the estimate of the multi-locus additive genetic variance under the linkage equilibrium (LE) assumption.
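
Under LE and Hardy-Weinberg proportions, the multi-locus additive genetic variance reduces to a sum of per-locus terms, \(\sum_j 2 p_j (1 - p_j) a_j^2\). A minimal sketch on toy data follows. It assumes the 0/1/2 coding counts copies of the allele whose frequency is wanted, and Wtoy and aToy are hypothetical stand-ins with randomly drawn effects, not estimates.

```r
set.seed(3)
n <- 200  # toy number of individuals
m <- 50   # toy number of markers

Wtoy <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
aToy <- rnorm(m, sd = 0.1)  # stand-in marker effects, not estimates

# Frequency of the counted allele: mean allelic count divided by 2
p <- colMeans(Wtoy) / 2

# Sum of per-locus variances 2 * p * (1 - p) * a^2, ignoring covariances
VaLE <- sum(2 * p * (1 - p) * aToy^2)
VaLE
```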

Question 4

Compute the multi-locus additive genetic variance that accounts for linkage disequilibrium (LD). Apply the expression based on the correlation between genotypes by using the cor() function. Report the estimate of additive genetic variance.
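
Accounting for LD adds the covariance terms between loci, so the variance becomes a quadratic form in the marker effects. One way to write it with cor() is sketched below on toy data. This is an assumption about the intended expression rather than the course's exact formula: it uses empirical genotype standard deviations, and Wtoy and aToy are hypothetical stand-ins.

```r
set.seed(4)
n <- 200  # toy number of individuals
m <- 50   # toy number of markers

Wtoy <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
aToy <- rnorm(m, sd = 0.1)  # stand-in marker effects, not estimates

R <- cor(Wtoy)            # m x m genotype correlation matrix
s <- apply(Wtoy, 2, sd)   # per-marker genotype standard deviations

# Cov(w_i, w_j) = r_ij * s_i * s_j, so a' Cov(W) a can be written with R
b <- aToy * s
VaLD <- drop(t(b) %*% R %*% b)
VaLD
```

Replacing s with sqrt(2 * p * (1 - p)) gives the version that relies on Hardy-Weinberg proportions instead of the empirical standard deviations.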

Question 5

What is the net contribution of the first locus to the total additive genetic variance?
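
One possible reading of "net contribution", offered as an assumption rather than the course's definition, assigns locus \(i\) its own variance term plus its covariance terms with every other locus, \(a_i \sum_j a_j Cov(w_i, w_j)\), so that the contributions sum to the total. A toy sketch with hypothetical stand-ins Wtoy and aToy:

```r
set.seed(5)
n <- 200  # toy number of individuals
m <- 50   # toy number of markers

Wtoy <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
aToy <- rnorm(m, sd = 0.1)  # stand-in marker effects, not estimates

# Element (i, j) is a_i * a_j * Cov(w_i, w_j)
terms <- outer(aToy, aToy) * cov(Wtoy)

# Row sums split the total variance locus by locus
contrib <- rowSums(terms)
contrib[1]  # share attributed to the first locus under this reading
```

Under this decomposition a locus can have a negative net contribution when its covariance terms outweigh its own variance term.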

Visualization of LD in \(r^2\)

cattleLD <- pairwiseLD(cattleC, chr = 1, type = "data.frame")
LDDist(cattleLD, type = "p", pch = 19, colD = hsv(alpha = 0.1, v = 0))

Gota Morota

January 31, 2017