## Due date

Tuesday, February 12, 5pm

## Data

We will continue analyzing the cattle data included in the synbreedData package. Learn more about the Synbreed project and synbreed R packages.

library(synbreed)
library(synbreedData)
help(package = "synbreedData")
data(cattle)
?(cattle)
pheno <- as.matrix(cattle$pheno[, 1, 1])
pheno <- scale(pheno)
dim(cattle$geno)
set.seed(100)
cattleC <- codeGeno(cattle, impute = TRUE, impute.type = "random", reference.allele = "minor")

Select only the SNP markers on chromosome 1. Answer the seven questions below using the following two R objects: pheno, which contains the phenotype, and W, which is the genotype matrix.

W <- cattleC$geno[, which(cattleC$map$chr == 1)]
dim(W)

## Question 1

Create a new variable W2 by subsetting the first 10 markers. The dimension of W2 is equal to $$500 \times 10$$. Verify that the covariance between allelic counts is $$Cov(W2[,i], W2[,j]) \approx 2D$$, where $$D$$ is the estimate of linkage disequilibrium. Use the W2 object and the LD() function from the genetics package to obtain $$D$$.
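To see why the relationship is expected to hold, here is a minimal sketch on simulated data rather than the cattle data. It assumes random mating and known haplotype frequencies, so $$D$$ is computed directly from those frequencies instead of being estimated with LD(). All object names (`hap_freq`, `hap_alleles`, `geno`) and the chosen frequencies are illustrative assumptions.

```r
# Toy check of Cov(w_i, w_j) ~ 2D on simulated two-locus data (not the cattle data)
set.seed(1)
n <- 5000                                   # number of individuals (assumed)

# Haplotype frequencies for AB, Ab, aB, ab (assumed values that sum to 1)
hap_freq <- c(AB = 0.4, Ab = 0.1, aB = 0.2, ab = 0.3)

# True D = f(AB) - p_A * p_B
p_A <- hap_freq[["AB"]] + hap_freq[["Ab"]]
p_B <- hap_freq[["AB"]] + hap_freq[["aB"]]
D_true <- hap_freq[["AB"]] - p_A * p_B

# Indicator of the A and B alleles carried by each haplotype type
hap_alleles <- rbind(AB = c(1, 1), Ab = c(1, 0), aB = c(0, 1), ab = c(0, 0))

# Draw two haplotypes per individual independently (random mating)
h1 <- sample(4, n, replace = TRUE, prob = hap_freq)
h2 <- sample(4, n, replace = TRUE, prob = hap_freq)
geno <- hap_alleles[h1, ] + hap_alleles[h2, ]   # allelic counts coded 0/1/2

# Sample covariance between the two allelic-count columns vs. 2D
cov_ij <- cov(geno[, 1], geno[, 2])
c(cov = cov_ij, twoD = 2 * D_true)
```

With real genotypes the haplotype phase is unknown, so the two sides agree only approximately.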

## Question 2

Recall that the $$r^2$$ of Hill and Robertson (1968) and the squared correlation $$r^2$$ computed directly from the SNP marker matrix (allelic counts) are theoretically equivalent. Check whether the two agree using the W2 object. Create a scatter plot of $$r^2$$ (Hill and Robertson) vs. $$r^2$$ (squared correlation of the SNP matrix). Use the LD() and cor() functions. How good is the agreement?
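As a reminder of where the equivalence comes from, here is a short derivation sketch. It assumes random mating, so that each allelic count has variance $$2p(1-p)$$ and the covariance between counts is $$2D$$. The symbols $$w_i$$, $$p_i$$, and $$p_j$$ (allelic counts and reference allele frequencies at markers $$i$$ and $$j$$) are notation introduced here for illustration.

$$r^2_{HR} = \frac{D^2}{p_i (1 - p_i)\, p_j (1 - p_j)}$$

$$\mathrm{Cor}(w_i, w_j)^2 = \frac{(2D)^2}{2 p_i (1 - p_i) \cdot 2 p_j (1 - p_j)} = \frac{D^2}{p_i (1 - p_i)\, p_j (1 - p_j)}$$

The identity is exact only under these assumptions, so the scatter plot shows how closely it holds in the sample.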

## Question 3

Perform a GWAS using single-marker ordinary least squares (OLS) and estimate the SNP marker effects. Use the objects W and pheno, and the function summary(lm()). Save the vector of marker effects into a.
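The single-marker loop can be sketched as follows on a small simulated data set. The objects `W_toy`, `y_toy`, and `a_toy` are hypothetical stand-ins for W, pheno, and a, and the simulated effect on marker 2 is an arbitrary choice. This is one possible layout, not the required solution.

```r
# Single-marker OLS on toy data: one simple regression per marker
set.seed(2)
n <- 200   # individuals (assumed)
m <- 5     # markers (assumed)

# Simulated allelic counts (0/1/2) and a scaled phenotype driven by marker 2
W_toy <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
y_toy <- drop(scale(0.5 * W_toy[, 2] + rnorm(n)))

a_toy <- numeric(m)
for (j in seq_len(m)) {
  fit <- summary(lm(y_toy ~ W_toy[, j]))
  # Row 1 is the intercept, row 2 is the slope for marker j
  a_toy[j] <- fit$coefficients[2, "Estimate"]
}
a_toy
```

Each slope equals `cov(W_toy[, j], y_toy) / var(W_toy[, j])`, which gives a quick sanity check on the loop. A monomorphic marker would have no slope row and would need to be handled separately.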

## Question 4

Recode the SNP genotypes so that the major allele is now treated as the reference allele. Store the new coding in the W2 variable. Perform a single-marker GWAS using OLS and estimate the SNP marker effects. Use the objects W2 and pheno, and the function lm(). Save the vector of marker effects into a2. Compare a and a2.
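A minimal sketch of the recoding on a single simulated marker, assuming genotypes are coded as 0/1/2 counts so that switching the reference allele amounts to `2 - w`. The names `w`, `w_major`, `a_minor`, and `a_major` are illustrative only.

```r
# Effect of switching the reference allele on a single-marker OLS slope
set.seed(3)
w <- rbinom(300, size = 2, prob = 0.25)   # simulated minor-allele counts
y <- 0.4 * w + rnorm(300)                 # simulated phenotype

w_major <- 2 - w                          # counts of the other allele

a_minor <- coef(lm(y ~ w))[[2]]           # slope under minor-allele coding
a_major <- coef(lm(y ~ w_major))[[2]]     # slope under major-allele coding
c(a_minor = a_minor, a_major = a_major)
```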

## Question 5

Compute the frequency of the reference allele for each SNP marker. Report the estimate of the multi-locus additive genetic variance under the linkage equilibrium (LE) assumption. Use the object W.
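For reference, the usual expressions can be written as below. This is a sketch under the assumptions that W holds 0/1/2 counts of the reference allele for $$n$$ individuals and $$m$$ markers, that genotypes are in Hardy-Weinberg proportions, and that $$a_j$$ denotes the effect of marker $$j$$. Which set of marker effects to plug in is an assumption here; the symbols $$w_{ij}$$, $$\hat p_j$$, and $$\sigma^2_{a,LE}$$ are notation introduced for illustration.

$$\hat p_j = \frac{1}{2n} \sum_{i=1}^{n} w_{ij}$$

$$\sigma^2_{a,LE} = \sum_{j=1}^{m} 2 \hat p_j (1 - \hat p_j)\, a_j^2$$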

## Question 6

Compute multi-locus additive genetic variance that accounts for linkage disequilibrium (LD). Apply the expression based on the correlation between genotypes by using the cor() function. Report the estimate of additive genetic variance. Use the object W.
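One way to write the correlation-based expression is a quadratic form in the marker effects scaled by their genotype standard deviations. The sketch below uses simulated genotypes and made-up effects, and assumes Hardy-Weinberg proportions so that each marker has standard deviation $$\sqrt{2p(1-p)}$$. The names `W_toy`, `a_toy`, `s`, and `R` are hypothetical.

```r
# Additive variance with and without LD cross terms, on toy data
set.seed(4)
n <- 400
base <- rbinom(n, 2, 0.4)

# Markers 1 and 2 are correlated by construction; markers 3 and 4 are independent
W_toy <- cbind(base,
               ifelse(runif(n) < 0.8, base, rbinom(n, 2, 0.4)),
               rbinom(n, 2, 0.3),
               rbinom(n, 2, 0.2))
a_toy <- c(0.3, 0.2, -0.1, 0.4)          # assumed marker effects

p <- colMeans(W_toy) / 2                 # reference allele frequencies
s <- sqrt(2 * p * (1 - p))               # per-marker SD under Hardy-Weinberg
R <- cor(W_toy)                          # pairwise genotype correlations

x <- a_toy * s
var_LE <- sum(x^2)                       # diagonal terms only (LE)
var_LD <- drop(t(x) %*% R %*% x)         # all marker pairs (accounts for LD)
c(LE = var_LE, LD = var_LD, ratio = var_LE / var_LD)
```

Because the diagonal of `R` is 1, the two quantities differ only by the off-diagonal terms, which is what the ratio summarizes.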

## Question 7

What is the proportion of the genetic variance under LD that is explained by the genetic variance under LE?

## Visualization of LD in $$r^2$$

?(pairwiseLD)
cattleLD <- pairwiseLD(cattleC, chr = 1, type = "data.frame")
LDDist(cattleLD, type = "p", pch = 19, colD = hsv(alpha = 0.1, v = 0))

January 30, 2018