# ASCI 896 Statistical Genomics

# Homework assignment 1

## Due date

Tuesday, February 7, 5pm

## Data

For this assignment, we are going to use the `cattle` data included in the `synbreedData` package. Learn more about the Synbreed project and the synbreed R packages.

```
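# Load the synbreed package and its example data sets
library(synbreed)
library(synbreedData)
help(package = "synbreedData")
data(cattle)
?cattle
# First trait (first slice of the repeated-measures dimension) as a one-column matrix
pheno <- as.matrix(cattle$pheno[, 1, 1])
# Center and scale the phenotype to mean 0 and variance 1
pheno <- scale(pheno)
dim(cattle$geno)
# Random imputation of missing genotypes is stochastic, so fix the seed
set.seed(100)
# Recode genotypes numerically (0/1/2) and impute missing values at random
cattleC <- codeGeno(cattle, impute = TRUE, impute.type = "random")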
```

Select only the SNP markers on chromosome 1. Answer the five questions below using the following two R objects: `pheno`, which contains the phenotype, and `W`, which is the genotype matrix.

```
# Keep only the markers whose map entry (column chr) lies on chromosome 1
W <- cattleC$geno[, which(cattleC$map$chr == 1)]
dim(W)
```
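A toy illustration of the column selection above. It uses a small hypothetical genotype matrix `geno_toy` and map `map_toy`, laid out like a synbreed `gpData` object (markers as columns of the genotype matrix and as rows of a map with `chr` and `pos` columns). Like the code above, it assumes the map rows are in the same order as the genotype columns.

```r
# Hypothetical toy data: 4 individuals x 5 markers, coded 0/1/2
geno_toy <- matrix(c(0, 1, 2, 1, 0,
                     2, 1, 0, 0, 1,
                     1, 1, 1, 2, 2,
                     0, 2, 1, 1, 0),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(paste0("id", 1:4), paste0("snp", 1:5)))
# Hypothetical map: one row per marker, same order as the columns of geno_toy
map_toy <- data.frame(chr = c(1, 1, 2, 2, 1),
                      pos = c(10, 25, 5, 40, 60),
                      row.names = colnames(geno_toy))
# Compare only the chr column (not the whole data frame) with 1
W_toy <- geno_toy[, which(map_toy$chr == 1)]
# Sanity check: the kept columns are exactly the chromosome-1 markers
stopifnot(identical(colnames(W_toy), rownames(map_toy)[map_toy$chr == 1]))
dim(W_toy)
```

Comparing `map_toy$chr` rather than the whole data frame keeps a `pos` value that happens to equal 1 from being matched by mistake.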

## Question 1

Compute the allele frequency of the SNP markers. Recall that the expectation of a genotype, \(E(W)\), is \(2p\), where \(p\) is the frequency of the reference allele and \(W\) counts the copies of that allele (0, 1, or 2). Verify that \(2p\) equals the mean of each marker (column of `W`) obtained from the `colMeans()` function.
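A minimal sketch on simulated data, so it does not depend on the `cattle` object. It draws a hypothetical 0/1/2 matrix `W_sim` under Hardy-Weinberg proportions, estimates \(p\) by allele counting, and compares \(2p\) with `colMeans()`. The simulation settings (200 individuals, 10 markers, the seed) are arbitrary assumptions.

```r
set.seed(1)
n <- 200   # individuals (arbitrary)
m <- 10    # markers (arbitrary)
p_true <- runif(m, 0.05, 0.5)
# Hypothetical genotypes: copies of the reference allele, Binomial(2, p) under HWE
W_sim <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))
# Allele counting: AA contributes two reference alleles, Aa one, aa none
n_AA <- colSums(W_sim == 2)
n_Aa <- colSums(W_sim == 1)
p_hat <- (2 * n_AA + n_Aa) / (2 * n)
# 2p from allele counting matches the column means of the 0/1/2 codes
all.equal(2 * p_hat, colMeans(W_sim))
```

With the real data, replace `W_sim` by `W` and `n` by `nrow(W)`.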

## Question 2

Recall that the variance of a genotype, \(Var(W)\), is \(2p(1-p)\) under Hardy-Weinberg equilibrium. Verify that \(2p(1-p)\) is close to the variance of each marker obtained from the `var()` function.
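A sketch of the comparison on a hypothetical simulated matrix `W_sim` drawn under Hardy-Weinberg equilibrium (sample size and seed are arbitrary). Note that `var()` applied to a matrix returns the covariance matrix, so the per-marker variances are its diagonal, or equivalently `apply(W, 2, var)`.

```r
set.seed(2)
n <- 5000  # individuals (arbitrary, large so sampling noise is small)
m <- 10    # markers (arbitrary)
p_true <- runif(m, 0.05, 0.5)
# Hypothetical HWE genotypes coded as copies of the reference allele
W_sim <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))
p_hat <- colMeans(W_sim) / 2               # from E(W) = 2p
expected_var <- 2 * p_hat * (1 - p_hat)    # HWE expectation
observed_var <- apply(W_sim, 2, var)       # sample variance of each marker
round(cbind(expected_var, observed_var), 3)
max(abs(expected_var - observed_var))
```

The two columns agree only approximately: the sample variance also reflects chance departures of the genotype frequencies from Hardy-Weinberg proportions.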

## Question 3

Create a new marker matrix `X` from `W` by recoding the markers so that the three genotypes \(AA\), \(Aa\), and \(aa\) are coded as 1, 0, and -1, respectively.

Recall that the expectation of a genotype, \(E(X)\), is \(2p-1\), where \(p\) is the frequency of the reference allele. Verify that \(2p-1\) equals the mean of each marker obtained from the `colMeans()` function.
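If \(W\) counts copies of allele \(A\) (so \(AA = 2\), \(Aa = 1\), \(aa = 0\)), the requested coding is \(X = W - 1\), and therefore \(E(X) = 2p - 1\). A sketch on a hypothetical simulated matrix `W_sim`, assuming that allele coding:

```r
set.seed(3)
n <- 200   # individuals (arbitrary)
m <- 10    # markers (arbitrary)
p_true <- runif(m, 0.05, 0.5)
# Hypothetical genotypes: copies of reference allele A (aa = 0, Aa = 1, AA = 2)
W_sim <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))
# Recode AA/Aa/aa from 2/1/0 to 1/0/-1
X_sim <- W_sim - 1
table(X_sim)                    # only -1, 0 and 1 should appear
p_hat <- colMeans(W_sim) / 2
# E(X) = 2p - 1 matches the column means of the recoded matrix
all.equal(2 * p_hat - 1, colMeans(X_sim))
```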

## Question 4

Recall that the variance of a genotype, \(Var(X)\), is unchanged by this recoding and is still \(2p(1-p)\). Verify that \(2p(1-p)\) is close to the variance of each marker obtained from the `var()` function.
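Since \(X = W - 1\) only shifts every code by a constant, \(Var(X) = Var(W)\). A sketch checking both the equality and the Hardy-Weinberg expectation on a hypothetical simulated matrix (sample size and seed are arbitrary):

```r
set.seed(4)
n <- 5000  # individuals (arbitrary)
m <- 10    # markers (arbitrary)
p_true <- runif(m, 0.05, 0.5)
# Hypothetical HWE genotypes in 0/1/2 coding, then the 1/0/-1 recoding
W_sim <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))
X_sim <- W_sim - 1
p_hat <- colMeans(W_sim) / 2
# Per-marker variances: the diagonal of the covariance matrix from var()
var_W <- diag(var(W_sim))
var_X <- diag(var(X_sim))
# Shifting by a constant leaves each variance unchanged
all.equal(var_W, var_X)
# Both are close to the HWE expectation 2p(1 - p)
max(abs(var_X - 2 * p_hat * (1 - p_hat)))
```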

## Question 5

Verify that, regardless of how the markers are coded, the centered marker codes \(W - E(W)\) and \(X - E(X)\) are the same.
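Because \(X = W - 1\) and \(E(X) = E(W) - 1\), the constant cancels: \(X - E(X) = (W - 1) - (2p - 1) = W - 2p = W - E(W)\). A numerical check on a hypothetical simulated matrix, centering with the estimated frequencies and, equivalently, with `scale(..., scale = FALSE)`:

```r
set.seed(5)
n <- 200   # individuals (arbitrary)
m <- 10    # markers (arbitrary)
p_true <- runif(m, 0.05, 0.5)
# Hypothetical genotypes in 0/1/2 coding and the 1/0/-1 recoding
W_sim <- sapply(p_true, function(p) rbinom(n, size = 2, prob = p))
X_sim <- W_sim - 1
p_hat <- colMeans(W_sim) / 2
# Subtract each column's expectation (sweep defaults to FUN = "-")
Wc <- sweep(W_sim, 2, 2 * p_hat)
Xc <- sweep(X_sim, 2, 2 * p_hat - 1)
all.equal(Wc, Xc)
# Same result from column-mean centering; ignore the "scaled:center" attribute
all.equal(Wc, scale(W_sim, scale = FALSE), check.attributes = FALSE)
```

`all.equal()` is used instead of `identical()` because the two routes can differ by floating-point rounding.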