## Due date

Tuesday, February 7, 5pm

## Data

For this assignment, we are going to use the cattle data included in the synbreedData package. Learn more about the Synbreed project and synbreed R packages.

library(synbreed)
library(synbreedData)
help(package = "synbreedData")
data(cattle)
?cattle
pheno <- as.matrix(cattle$pheno[, 1, 1])
pheno <- scale(pheno)
dim(cattle$geno)
set.seed(100)
cattleC <- codeGeno(cattle, impute = TRUE, impute.type = "random")

Select only the SNP markers on chromosome 1. Answer the five questions below using the following two R objects: pheno, which contains the phenotype, and W, which is the genotype matrix.

W <- cattleC$geno[, which(cattleC$map$chr == 1)]
dim(W)

## Question 1

Compute the allele frequency of each SNP marker. Recall that the expectation of a genotype, $$E(W)$$, is given by $$2p$$, where $$p$$ is the frequency of the reference allele. Verify that $$2p$$ is equal to the mean of each marker obtained from the colMeans() function.
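As a rough illustration of the idea, the following sketch uses a small simulated genotype matrix rather than the cattle data. The names `n`, `p_true`, and the simulated `W` are assumptions made for this example only, and the markers are assumed to be coded 0, 1, 2 as the count of the reference allele.

```r
# Simulated stand-in for W: 200 individuals, 4 markers (hypothetical values).
set.seed(1)
n <- 200
p_true <- c(0.1, 0.3, 0.5, 0.8)
W <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))

# Allele frequency by counting: each individual carries 2 alleles per marker.
p <- colSums(W) / (2 * nrow(W))

# Compare 2p with the column means of W.
print(cbind(two_p = 2 * p, col_mean = colMeans(W)))
print(all.equal(2 * p, colMeans(W)))
```

The two columns agree because the column mean is the allele count divided by the number of individuals, which is twice the allele frequency.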

## Question 2

Recall that, under Hardy-Weinberg equilibrium, the variance of a genotype, $$Var(W)$$, is given by $$2p(1-p)$$. Verify that $$2p(1-p)$$ is close to the variance of each marker obtained from the var() function.
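A minimal sketch of this comparison, again on simulated genotypes rather than the cattle data (`n`, `p_true`, and the simulated `W` are illustrative assumptions). Note that var() applied to a matrix returns the full covariance matrix, so the per-marker variances are taken from its diagonal.

```r
# Simulated stand-in for W, drawn under Hardy-Weinberg proportions.
set.seed(1)
n <- 2000
p_true <- c(0.1, 0.3, 0.5, 0.8)
W <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))

# Estimated allele frequencies and the expected variance 2p(1 - p).
p <- colMeans(W) / 2
expected_var <- 2 * p * (1 - p)

# Sample variance of each marker: diagonal of the covariance matrix.
sample_var <- diag(var(W))

print(cbind(expected_var, sample_var))
```

The two columns should be close but not identical: the sample variance depends on the observed genotype counts, which only approximately follow Hardy-Weinberg proportions in a finite sample.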

## Question 3

Create a new marker matrix X from W and recode the markers so that the three genotypes $$AA$$, $$Aa$$, and $$aa$$ are coded as 1, 0, and -1, respectively.

Recall that the expectation of a genotype, $$E(X)$$, is given by $$2p-1$$, where $$p$$ is the frequency of the reference allele. Verify that $$2p-1$$ is equal to the mean of each marker obtained from the colMeans() function.
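One possible way to do the recoding, sketched on simulated data: if W is coded 2, 1, 0 for $$AA$$, $$Aa$$, $$aa$$ (an assumption about the coding), then subtracting 1 gives the 1, 0, -1 coding. The names `n`, `p_true`, and the simulated `W` are hypothetical.

```r
# Simulated stand-in for W coded 0/1/2 (count of the reference allele).
set.seed(1)
n <- 200
p_true <- c(0.1, 0.3, 0.5, 0.8)
W <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))

# Recode 2/1/0 to 1/0/-1 by shifting every entry by one.
X <- W - 1

# Allele frequency from the original coding, then compare 2p - 1 with colMeans(X).
p <- colSums(W) / (2 * nrow(W))
print(cbind(two_p_minus_1 = 2 * p - 1, col_mean = colMeans(X)))
print(all.equal(2 * p - 1, colMeans(X)))
```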

## Question 4

Recall that the variance of genotype, $$Var(X)$$, remains the same and is given by $$2p(1-p)$$. Verify that $$2p(1-p)$$ is close to the variance of each genotype obtained from the var() function.
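A short sketch of why the variance is unchanged, using simulated genotypes (`n`, `p_true`, and the simulated `W` are assumptions for illustration). Shifting every entry by a constant moves the mean but leaves the spread intact, so the sample variances of X and W coincide.

```r
# Simulated stand-in for W and its 1/0/-1 recoding X.
set.seed(1)
n <- 2000
p_true <- c(0.1, 0.3, 0.5, 0.8)
W <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))
X <- W - 1

p <- colMeans(W) / 2
expected_var <- 2 * p * (1 - p)

# Per-marker sample variances under both codings.
var_W <- apply(W, 2, var)
var_X <- apply(X, 2, var)

print(cbind(expected_var, var_W, var_X))
print(all.equal(var_W, var_X))
```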

## Question 5

Verify that no matter how we code markers, centered marker codes, $$W - E(W)$$ and $$X - E(X)$$, are the same.
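The centering step can be sketched as follows on simulated data (`n`, `p_true`, and the simulated `W` are hypothetical). Here sweep() subtracts a per-column value from every row. It is used instead of scale() because scale() attaches the centering values as attributes, which would differ between the two codings even when the centered matrices match.

```r
# Simulated stand-in for W and its 1/0/-1 recoding X.
set.seed(1)
n <- 200
p_true <- c(0.1, 0.3, 0.5, 0.8)
W <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))
X <- W - 1
p <- colMeans(W) / 2

# Center each coding by its own expectation: E(W) = 2p, E(X) = 2p - 1.
Wc <- sweep(W, 2, 2 * p)      # default FUN is "-"
Xc <- sweep(X, 2, 2 * p - 1)

print(all.equal(Wc, Xc))
```

Algebraically, $$X - E(X) = (W - 1) - (2p - 1) = W - 2p$$, so the shift introduced by the recoding cancels.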

January 24, 2017