## Due date

Thursday, February 1, 5pm

## Data

For this assignment, we will use the cattle data set included in the synbreedData package. Learn more about the Synbreed project and the synbreed R packages.

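```r
library(synbreed)
library(synbreedData)
help(package = "synbreedData")  # overview of the package
data(cattle)
?cattle                         # description of the cattle data
dim(cattle$geno)

# Recode genotypes as 0/1/2 and randomly impute missing values,
# using the minor allele as the reference allele.
set.seed(100)
cattleC <- codeGeno(cattle, impute = TRUE, impute.type = "random",
                    reference.allele = "minor")
```

Select only the SNP markers on chromosome 1. Answer the six questions below using W, a SNP genotype matrix with individuals in rows and markers in columns, where each entry counts the copies of the reference (minor) allele.

```r
W <- cattleC$geno[, which(cattleC$map$chr == 1)]
dim(W)
```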

## Question 1

Compute the allele frequency of each SNP marker. Recall that the expected genotype code, $$E(W)$$, is $$2p$$, where $$p$$ is the frequency of the reference allele. Verify that $$2p$$ equals the mean of each marker (column) obtained from the colMeans() function.
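A minimal sketch of this check on a simulated toy matrix. W_toy and p_true are hypothetical names, not part of synbreedData, and the genotypes are drawn under Hardy-Weinberg equilibrium so the example runs without the cattle data. Replacing W_toy with W applies the same steps to the assignment data.

```r
# Hypothetical toy stand-in for W: 1000 individuals x 5 SNPs, coded 0/1/2
# as copies of the reference allele, simulated under Hardy-Weinberg
# equilibrium with assumed reference-allele frequencies p_true.
set.seed(1)
p_true <- c(0.05, 0.15, 0.25, 0.35, 0.45)
W_toy <- sapply(p_true, function(q) rbinom(1000, size = 2, prob = q))

# Allele frequency from genotype counts: each heterozygote (1) carries one
# reference allele, each homozygote coded 2 carries two, out of 2n alleles.
p_hat <- apply(W_toy, 2, function(g) {
  (sum(g == 1) + 2 * sum(g == 2)) / (2 * length(g))
})

# 2p should equal the column means of the genotype codes.
all.equal(2 * p_hat, colMeans(W_toy))  # TRUE
```

Counting alleles this way makes the identity visible: the sum of a column is the number of reference alleles, so dividing it by $$n$$ gives both the column mean and $$2p$$.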

## Question 2

Recall that the variance of the genotype code, $$Var(W)$$, is $$2p(1-p)$$ under Hardy-Weinberg equilibrium. Verify that $$2p(1-p)$$ is close to the variance of each marker obtained from the var() function.
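A sketch of the comparison, again on a hypothetical simulated matrix W_toy (assumed frequencies, Hardy-Weinberg sampling) rather than the cattle data. It shows why the match is only approximate: var() divides by $$n-1$$, and a finite sample never sits exactly at Hardy-Weinberg proportions.

```r
# Hypothetical toy stand-in for W: 1000 individuals x 5 SNPs coded 0/1/2,
# simulated under Hardy-Weinberg equilibrium with assumed frequencies.
set.seed(1)
p_true <- c(0.05, 0.15, 0.25, 0.35, 0.45)
W_toy <- sapply(p_true, function(q) rbinom(1000, size = 2, prob = q))
p_hat <- colMeans(W_toy) / 2

# Expected variance under Hardy-Weinberg equilibrium vs. sample variance.
expected_var <- 2 * p_hat * (1 - p_hat)
sample_var   <- apply(W_toy, 2, var)

# Side-by-side table; the differences should be small but not exactly zero.
round(cbind(expected_var, sample_var, diff = sample_var - expected_var), 4)
```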

## Question 3

Create a new marker matrix X from W by recoding the markers so that the three genotypes $$AA$$, $$Aa$$, and $$aa$$ are coded as 1, 0, and -1, respectively, where $$A$$ is the reference allele.

Recall that the expected genotype code, $$E(X)$$, is $$2p-1$$, where $$p$$ is the frequency of the reference allele. Verify that $$2p-1$$ equals the mean of each marker obtained from the colMeans() function.
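Because W counts copies of the reference allele, the 1/0/-1 coding is simply W shifted down by one. A minimal sketch on a hypothetical simulated matrix W_toy (not the cattle data):

```r
# Hypothetical toy stand-in for W: 1000 individuals x 5 SNPs coded 0/1/2,
# simulated under Hardy-Weinberg equilibrium with assumed frequencies.
set.seed(1)
p_true <- c(0.05, 0.15, 0.25, 0.35, 0.45)
W_toy <- sapply(p_true, function(q) rbinom(1000, size = 2, prob = q))
p_hat <- colMeans(W_toy) / 2

# Recode with A as the reference allele: aa = 0 -> -1, Aa = 1 -> 0,
# AA = 2 -> 1, i.e. subtract 1 from every code.
X_toy <- W_toy - 1

# 2p - 1 should equal the column means of X.
all.equal(2 * p_hat - 1, colMeans(X_toy))  # TRUE
```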

## Question 4

Recall that the variance of the genotype code, $$Var(X)$$, is unchanged by this recoding and is still $$2p(1-p)$$. Verify that $$2p(1-p)$$ is close to the variance of each marker obtained from the var() function.
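Subtracting a constant from every code leaves the variance untouched, so var() gives identical results for W and X. A sketch on a hypothetical simulated matrix W_toy, assuming X is obtained as W - 1:

```r
# Hypothetical toy stand-in for W: 1000 individuals x 5 SNPs coded 0/1/2,
# simulated under Hardy-Weinberg equilibrium with assumed frequencies.
set.seed(1)
p_true <- c(0.05, 0.15, 0.25, 0.35, 0.45)
W_toy <- sapply(p_true, function(q) rbinom(1000, size = 2, prob = q))
p_hat <- colMeans(W_toy) / 2
X_toy <- W_toy - 1  # 1/0/-1 coding

var_W <- apply(W_toy, 2, var)
var_X <- apply(X_toy, 2, var)

# The shift does not change the variance ...
all.equal(var_X, var_W)  # TRUE
# ... so Var(X) is exactly as close to 2p(1 - p) as Var(W) is.
round(cbind(expected = 2 * p_hat * (1 - p_hat), var_X), 4)
```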

## Question 5

Verify that the centered marker codes, $$W - E(W)$$ and $$X - E(X)$$, are identical no matter which coding is used.
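Since $$X = W - 1$$ and $$E(X) = E(W) - 1$$, the constant cancels when each coding is centered by its own expectation. A sketch on a hypothetical simulated matrix W_toy, estimating $$E(W)$$ by $$2\hat{p}$$ and using base R's sweep() to subtract column-wise:

```r
# Hypothetical toy stand-in for W: 1000 individuals x 5 SNPs coded 0/1/2,
# simulated under Hardy-Weinberg equilibrium with assumed frequencies.
set.seed(1)
p_true <- c(0.05, 0.15, 0.25, 0.35, 0.45)
W_toy <- sapply(p_true, function(q) rbinom(1000, size = 2, prob = q))
p_hat <- colMeans(W_toy) / 2
X_toy <- W_toy - 1  # 1/0/-1 coding

# Subtract each column's expectation: 2p for W, 2p - 1 for X.
W_centered <- sweep(W_toy, 2, 2 * p_hat)
X_centered <- sweep(X_toy, 2, 2 * p_hat - 1)

all.equal(W_centered, X_centered)  # TRUE
```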

## Question 6

Now recode the SNP genotypes so that the major allele is treated as the reference allele. Store the new coding in the W2 variable.

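```r
# Swap codes 0 and 2 (3 is a temporary placeholder); heterozygotes (1) are unchanged.
W2 <- W
W2[W2 == 0] <- 3
W2[W2 == 2] <- 0
W2[W2 == 3] <- 2
```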

Compute the allele frequency of the SNP markers using W2 and compare the result with the allele frequency obtained from W.
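Swapping 0 and 2 means each entry now counts the other allele, so W2 equals 2 - W and the new frequency should be one minus the old one. A sketch on a hypothetical simulated matrix W_toy, repeating the swap steps from the W2 recoding above:

```r
# Hypothetical toy stand-in for W: 1000 individuals x 5 SNPs coded 0/1/2
# (copies of the minor allele), simulated with assumed frequencies < 0.5.
set.seed(1)
p_true <- c(0.05, 0.15, 0.25, 0.35, 0.45)
W_toy <- sapply(p_true, function(q) rbinom(1000, size = 2, prob = q))

# Same swap as for W2: 0 -> 2 and 2 -> 0, with 3 as a placeholder.
W2_toy <- W_toy
W2_toy[W2_toy == 0] <- 3
W2_toy[W2_toy == 2] <- 0
W2_toy[W2_toy == 3] <- 2

# The swap is equivalent to counting the other allele: W2 = 2 - W.
all.equal(W2_toy, 2 - W_toy)  # TRUE

p_W  <- colMeans(W_toy) / 2   # frequency of the minor (reference) allele
p_W2 <- colMeans(W2_toy) / 2  # frequency of the major allele
all.equal(p_W2, 1 - p_W)      # TRUE
```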

January 18, 2018