ASCI 896 Statistical Genomics

Due date

Tuesday, February 7, 5pm

Data

For this assignment, we will use the cattle data set included in the synbreedData package. See the Synbreed project pages and the synbreed R package documentation to learn more.

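library(synbreed)
library(synbreedData)
help(package = "synbreedData")
data(cattle)
?cattle
# first trait, first repeated measurement, standardized to mean 0 and variance 1
pheno <- as.matrix(cattle$pheno[, 1, 1])
pheno <- scale(pheno)
dim(cattle$geno)
# codeGeno() recodes genotypes as 0/1/2 and imputes missing values;
# random imputation needs a seed to be reproducible
set.seed(100)
cattleC <- codeGeno(cattle, impute = TRUE, impute.type = "random")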

Select only the SNP markers on chromosome 1. Answer the five questions below using the following two R objects: pheno, which contains the phenotype, and W, the genotype matrix.

# keep markers whose chromosome (the chr column of the map) is 1
W <- cattleC$geno[, which(cattleC$map$chr == 1)]
dim(W)
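A small toy sketch of the chromosome filter, using a hypothetical `geno` matrix and `map` data frame in place of `cattleC` (the map is assumed to have `chr` and `pos` columns, as a synbreed gpData map does). Comparing the whole data frame with `1` also flags any position that happens to equal 1, so filtering on the `chr` column alone is the safer choice.

```r
# Hypothetical stand-in for cattleC: 20 animals x 6 SNPs on two chromosomes
set.seed(1)
geno <- matrix(rbinom(20 * 6, size = 2, prob = 0.3), nrow = 20,
               dimnames = list(NULL, paste0("snp", 1:6)))
map <- data.frame(chr = c(1, 1, 2, 1, 2, 2),
                  pos = c(10, 25, 1, 40, 8, 15),  # snp3 (chr 2) sits at position 1
                  row.names = colnames(geno))

# Comparing the whole data frame matches both columns: returns 1 2 4 9,
# where 9 is the linear index of pos == 1 in the second column
which(map == 1)

# Filtering on the chr column keeps exactly the chromosome-1 markers
W <- geno[, which(map$chr == 1)]
dim(W)               # 20 animals x 3 markers
table(as.vector(W))  # codes should be 0, 1, 2 only
```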

Question 1

Compute the allele frequency of each SNP marker. Recall that the expected genotype, \(E(W)\), is \(2p\), where \(p\) is the frequency of the reference allele. Verify that \(2p\) equals the mean of each marker obtained from the colMeans() function.
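A minimal sketch, assuming W counts copies of the reference allele A (aa = 0, Aa = 1, AA = 2). A toy matrix simulated under Hardy-Weinberg proportions stands in for the cattle data; `n`, `m`, and `p_true` are illustrative names. The frequency is computed by counting alleles, so the comparison with colMeans() is not circular.

```r
# Toy genotypes standing in for W: rbinom(size = 2) draws the number of A alleles,
# which gives Hardy-Weinberg genotype frequencies
set.seed(2017)
n <- 500; m <- 10
p_true <- runif(m, 0.05, 0.95)
W <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))

# Reference allele frequency by counting: AA (2) carries two copies, Aa (1) one
n_AA <- colSums(W == 2)
n_Aa <- colSums(W == 1)
p <- (2 * n_AA + n_Aa) / (2 * nrow(W))

# E(W) = 2p should equal the column means
all.equal(2 * p, colMeans(W))  # TRUE
```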

Question 2

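Recall that, under Hardy-Weinberg equilibrium, the variance of the genotype, \(Var(W)\), is \(2p(1-p)\). Verify that \(2p(1-p)\) is close to the variance of each marker obtained from the var() function (note that var() applied to a matrix returns the covariance matrix, whose diagonal holds the per-marker variances).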
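A sketch of the comparison on the same kind of toy data (simulated, not the cattle genotypes). The two quantities agree only approximately: the sample variance uses an \(n-1\) denominator and the observed genotype frequencies deviate slightly from Hardy-Weinberg proportions.

```r
# Toy genotypes under Hardy-Weinberg proportions (stand-in for W)
set.seed(2017)
n <- 500; m <- 10
p_true <- runif(m, 0.05, 0.95)
W <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))

p <- colMeans(W) / 2
expected_var <- 2 * p * (1 - p)

# var() on a matrix returns the covariance matrix; per-marker variances are its diagonal
observed_var <- apply(W, 2, var)
all.equal(observed_var, diag(var(W)))  # TRUE

# Side by side: close but not identical
round(cbind(expected_var, observed_var), 3)
```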

Question 3

Create a new marker matrix X from W by recoding the markers so that the three genotypes \(AA\), \(Aa\), and \(aa\) are coded as 1, 0, and -1, respectively.
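If W counts copies of the reference allele A (an assumption about the coding; check it against your data), the recoding is a shift by one. The sketch below uses a toy simulated matrix in place of W.

```r
# Toy 0/1/2 matrix standing in for W (0 = aa, 1 = Aa, 2 = AA)
set.seed(2017)
W <- matrix(rbinom(200 * 5, size = 2, prob = 0.4), nrow = 200)

# Subtracting 1 maps aa: 0 -> -1, Aa: 1 -> 0, AA: 2 -> 1
X <- W - 1

# Cross-tabulate old and new codes: each W code pairs with exactly one X code
table(W = as.vector(W), X = as.vector(X))
```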

Recall that the expected genotype, \(E(X)\), is \(2p-1\), where \(p\) is the frequency of the reference allele. Verify that \(2p-1\) equals the mean of each marker obtained from the colMeans() function.
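A minimal check on toy simulated data, assuming the 0/1/2 coding of W counts reference alleles so that X = W - 1. Here \(p\) is counted from the X codes directly; algebraically, \(2p - 1 = (n_{AA} - n_{aa})/n\), which is exactly the column mean of X.

```r
# Toy genotypes standing in for W, recoded to X
set.seed(2017)
n <- 500; m <- 10
p_true <- runif(m, 0.05, 0.95)
W <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))
X <- W - 1  # AA = 1, Aa = 0, aa = -1

# Count reference alleles from the X codes: AA (1) carries two copies, Aa (0) one
p <- (2 * colSums(X == 1) + colSums(X == 0)) / (2 * nrow(X))

all.equal(2 * p - 1, colMeans(X))  # TRUE
```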

Question 4

Recall that the variance of the genotype, \(Var(X)\), is unchanged by this recoding and is still \(2p(1-p)\). Verify that \(2p(1-p)\) is close to the variance of each marker obtained from the var() function.
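Because X differs from W only by a constant, the column variances of the two matrices are identical, not merely close. A sketch on toy simulated data (names are illustrative):

```r
set.seed(2017)
n <- 500; m <- 10
p_true <- runif(m, 0.05, 0.95)
W <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))
X <- W - 1
p <- colMeans(W) / 2

var_W <- apply(W, 2, var)
var_X <- apply(X, 2, var)

# Shifting every code by the same constant leaves the variance unchanged
all.equal(var_X, var_W)  # TRUE

# Both remain close to the Hardy-Weinberg expectation 2p(1 - p)
round(cbind(expected = 2 * p * (1 - p), var_X), 3)
```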

Question 5

Verify that, regardless of how the markers are coded, the centered marker codes \(W - E(W)\) and \(X - E(X)\) are identical.
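The identity follows from \((W - 1) - (2p - 1) = W - 2p\). A sketch on toy simulated data, centering each column by its expectation with sweep() and, equivalently, with scale(..., scale = FALSE), which subtracts the column means:

```r
set.seed(2017)
n <- 500; m <- 10
p_true <- runif(m, 0.05, 0.95)
W <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))
X <- W - 1
p <- colMeans(W) / 2

# Subtract each column's expectation: 2p for W, 2p - 1 for X
Wc <- sweep(W, 2, 2 * p)
Xc <- sweep(X, 2, 2 * p - 1)
all.equal(Wc, Xc)  # TRUE

# Column means equal 2p and 2p - 1, so scale() gives the same centered codes;
# check.attributes = FALSE ignores the differing "scaled:center" attributes
all.equal(scale(W, scale = FALSE), scale(X, scale = FALSE),
          check.attributes = FALSE)  # TRUE
```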

Gota Morota

January 24, 2017