ASCI 896 Statistical Genomics
Homework assignment 1
Due date
Tuesday, February 7, 5pm
Data
For this assignment, we are going to use the cattle data included in the synbreedData package.
Learn more about the Synbreed project and synbreed R packages.
library(synbreed)
library(synbreedData)
help(package = "synbreedData")
data(cattle)
`?`(cattle)
pheno <- as.matrix(cattle$pheno[, 1, 1])
pheno <- scale(pheno)
dim(cattle$geno)
set.seed(100)
cattleC <- codeGeno(cattle, impute = TRUE, impute.type = "random")
Select only the SNP markers on chromosome 1. Answer the five questions below using the following two R objects: pheno, which contains the phenotype, and W, which is the genotype matrix.
# Compare only the chromosome column of the map, not the whole data frame
W <- cattleC$geno[, which(cattleC$map$chr == 1)]
dim(W)
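The marker selection above can be illustrated on a small made-up example. This is only a sketch: `map_toy` and `geno_toy` are hypothetical objects, and it assumes the marker map is a data frame with a `chr` column and a `pos` column, one row per marker. Comparing the `chr` column alone avoids accidentally matching a position that happens to equal 1, as the second marker does here.

```r
# Hypothetical toy map: 5 markers on 3 chromosomes (not the cattle data)
map_toy <- data.frame(
  chr = c(1, 1, 2, 2, 3),
  pos = c(0.5, 1.0, 0.2, 0.9, 0.4),
  row.names = paste0("SNP", 1:5)
)

# Hypothetical toy genotypes: 4 individuals x 5 markers coded 0/1/2
set.seed(10)
geno_toy <- matrix(
  sample(0:2, 20, replace = TRUE),
  nrow = 4,
  dimnames = list(NULL, rownames(map_toy))
)

# Keep only the markers whose chromosome label is 1
on_chr1 <- which(map_toy$chr == 1)
W_toy <- geno_toy[, on_chr1, drop = FALSE]
dim(W_toy)
```

Using `drop = FALSE` keeps the result a matrix even if only one marker is selected.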
Question 1
Compute the allele frequency of each SNP marker. Recall that the expectation of genotype, \(E(W)\), is given by \(2p\), where \(p\) is the frequency of the reference allele. Verify that \(2p\) is equal to the mean of each genotype obtained from the colMeans() function.
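One possible way to approach this check is sketched below on simulated genotypes rather than the cattle data. The names `W_toy`, `p_true`, and `p_hat` are illustrative, and the sketch assumes genotypes are coded as the number of copies of the reference allele (0, 1, or 2). The allele frequency is obtained by counting alleles, so that the comparison with colMeans() is not circular.

```r
# Simulated toy genotypes: 200 individuals x 5 SNPs coded 0/1/2
set.seed(1)
p_true <- c(0.1, 0.3, 0.5, 0.7, 0.9)  # hypothetical allele frequencies
W_toy <- sapply(p_true, function(p) rbinom(200, size = 2, prob = p))

# Allele frequency by allele counting: (2 * n_AA + n_Aa) / (2 * n)
n <- nrow(W_toy)
p_hat <- (2 * colSums(W_toy == 2) + colSums(W_toy == 1)) / (2 * n)

# 2p should match the column means up to floating-point error
all.equal(2 * p_hat, colMeans(W_toy))
```

The same two lines that compute `p_hat` and compare it with colMeans() should carry over to W once the coding assumption is confirmed for the real data.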
Question 2
Recall that the variance of genotype, \(Var(W)\), is given by \(2p(1-p)\). Verify that \(2p(1-p)\) is close to the variance of each genotype obtained from the var() function.
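A minimal sketch of this comparison on simulated data follows; the object names are illustrative and the genotypes are drawn under Hardy-Weinberg proportions, which is the setting in which \(2p(1-p)\) holds. Note that var() applied to a whole matrix returns a covariance matrix, so the per-marker variances are taken column by column here. The two quantities are expected to be close rather than identical, because of sampling variation and the \(n-1\) denominator used by var().

```r
# Simulated toy genotypes: 5000 individuals x 3 SNPs coded 0/1/2
set.seed(2)
p_true <- c(0.2, 0.4, 0.6)  # hypothetical allele frequencies
W_toy <- sapply(p_true, function(p) rbinom(5000, size = 2, prob = p))

# Estimated allele frequency and the variance it implies
p_hat <- colMeans(W_toy) / 2
expected_var <- 2 * p_hat * (1 - p_hat)

# Sample variance of each marker (column-wise)
observed_var <- apply(W_toy, 2, var)

# Side-by-side comparison and the largest absolute gap
cbind(expected_var, observed_var)
max(abs(expected_var - observed_var))
```

An equivalent way to get the per-marker variances is `diag(var(W_toy))`, at the cost of computing the full covariance matrix.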
Question 3
Create a new marker matrix X from W and recode the markers so that the three genotypes \(AA\), \(Aa\), and \(aa\) are coded as 1, 0, and -1, respectively.
Recall that the expectation of genotype, \(E(X)\), is given by \(2p-1\), where \(p\) is the frequency of the reference allele. Verify that \(2p-1\) is equal to the mean of each genotype obtained from the colMeans() function.
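The recoding can be sketched on simulated data as follows. This assumes that the 0/1/2 matrix stores \(aa\), \(Aa\), and \(AA\) as 0, 1, and 2, in which case subtracting 1 gives the -1/0/1 coding; if the real data use the opposite orientation, the recoding would need to be flipped. All object names are illustrative.

```r
# Simulated toy genotypes: 300 individuals x 3 SNPs coded 0/1/2
set.seed(3)
p_true <- c(0.25, 0.5, 0.75)  # hypothetical allele frequencies
W_toy <- sapply(p_true, function(p) rbinom(300, size = 2, prob = p))

# Recode: AA (2) -> 1, Aa (1) -> 0, aa (0) -> -1
X_toy <- W_toy - 1

# Allele frequency from the original 0/1/2 coding
p_hat <- colMeans(W_toy) / 2

# 2p - 1 should match the column means of the recoded matrix
all.equal(2 * p_hat - 1, colMeans(X_toy))
```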
Question 4
Recall that the variance of genotype, \(Var(X)\), remains the same and is given by \(2p(1-p)\). Verify that \(2p(1-p)\) is close to the variance of each genotype obtained from the var() function.
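The reason the variance does not change is that the -1/0/1 coding is a shift of the 0/1/2 coding by a constant, and adding a constant leaves the variance untouched. A small sketch on simulated data, with illustrative object names and the same coding assumption (\(AA\) stored as 2 in the 0/1/2 matrix):

```r
# Simulated toy genotypes: 5000 individuals x 3 SNPs coded 0/1/2
set.seed(4)
p_true <- c(0.2, 0.4, 0.6)  # hypothetical allele frequencies
W_toy <- sapply(p_true, function(p) rbinom(5000, size = 2, prob = p))
X_toy <- W_toy - 1  # -1/0/1 coding

# Per-marker sample variances under both codings
var_W <- apply(W_toy, 2, var)
var_X <- apply(X_toy, 2, var)

# The shift by 1 leaves the sample variance unchanged
all.equal(var_W, var_X)

# Both are close to, but not exactly, 2p(1-p)
p_hat <- colMeans(W_toy) / 2
cbind(expected = 2 * p_hat * (1 - p_hat), observed = var_X)
```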
Question 5
Verify that, no matter how we code the markers, the centered marker codes \(W - E(W)\) and \(X - E(X)\) are the same.
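A sketch of this check on simulated data is given below, with illustrative object names and the expectations estimated by the column means. It uses sweep() to subtract each column mean; scale(..., scale = FALSE) would also center the columns, but it attaches the column means as an attribute, and those differ between the two codings, so a direct all.equal() on its output would flag a difference in attributes.

```r
# Simulated toy genotypes: 100 individuals x 4 SNPs coded 0/1/2
set.seed(5)
p_true <- c(0.15, 0.35, 0.55, 0.85)  # hypothetical allele frequencies
W_toy <- sapply(p_true, function(p) rbinom(100, size = 2, prob = p))
X_toy <- W_toy - 1  # -1/0/1 coding

# Center each column by its own mean (sweep subtracts by default)
W_centered <- sweep(W_toy, 2, colMeans(W_toy))
X_centered <- sweep(X_toy, 2, colMeans(X_toy))

# The centered matrices should agree up to floating-point error
all.equal(W_centered, X_centered)
```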