## Overview

We will learn how to compute allele and genotypic frequencies in R by using the beef cattle data set.

Use the function read.table to read the genotype file (imputed1000.geno) into a data frame. Here file.choose() opens a file dialog so that you can select the file interactively.

W <- read.table(file = file.choose(), header = TRUE, stringsAsFactors = FALSE)

We can access a particular element of the data frame by entering its row and column coordinates in the single square bracket [] operator. Let’s first access the element in the first row and the first column. When the row coordinate is omitted, the operator returns the entire column; for a single column, the result is a plain vector rather than a data frame.

W[1, 1]  # 1st row and 1st column
head(W[, 1])  # first few elements of the 1st column (a vector)
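As a minimal sketch of the indexing forms above, the following uses a small made-up data frame (`toy` and its values are illustrative and are not taken from the cattle data). It shows that selecting one column with `[, j]` returns a plain vector by default, while `drop = FALSE` keeps a one-column data frame.

```r
# Toy data frame with made-up values (not the cattle data)
toy <- data.frame(id   = c("a1", "a2", "a3"),
                  snp1 = c(10, 0, -10),
                  snp2 = c(0, 0, 10),
                  stringsAsFactors = FALSE)

toy[1, 1]                      # single element: 1st row, 1st column
toy[, 2]                       # whole 2nd column, returned as a vector
toy[, 2, drop = FALSE]         # same column, kept as a one-column data frame

class(toy[, 2])                # "numeric"
class(toy[, 2, drop = FALSE])  # "data.frame"
```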

We then drop the first column of the data frame, which contains the animal IDs. A negative index drops the corresponding columns, so -1 drops the first column.

W <- W[, -1]

Next, we will convert W from a data frame into a matrix. In R, matrices are more memory efficient and more convenient for linear algebra operations.

W <- as.matrix(W)
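The sketch below, on a made-up data frame (`toy` is hypothetical), illustrates why the ID column is dropped before the conversion: a matrix holds a single type, so converting a data frame that still contains a character column yields a character matrix, whereas the genotype columns alone give a numeric one.

```r
# Toy data frame with made-up values (not the cattle data)
toy <- data.frame(id   = c("a1", "a2", "a3"),
                  snp1 = c(10, 0, -10),
                  snp2 = c(0, 0, 10),
                  stringsAsFactors = FALSE)

m_with_id <- as.matrix(toy)        # ID column forces everything to character
m_geno    <- as.matrix(toy[, -1])  # genotype columns only: numeric matrix

is.character(m_with_id)  # TRUE
is.numeric(m_geno)       # TRUE
dim(m_geno)              # 3 rows (animals) and 2 columns (SNPs)
```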

What is the dimension of W?

dim(W)

### Recoding of markers

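We will recode the genotypes so that the three genotypes AA, Aa, and aa are coded as 2, 1, and 0, respectively. As the assignments below imply, the raw file codes them as 10, 0, and -10. The order of the assignments matters: the original 0 (Aa) must be recoded to 1 before -10 (aa) is recoded to 0.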

W[W == 10] <- 2
W[W == 0] <- 1
W[W == -10] <- 0
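To check the recoding logic on something small, here is a sketch using a made-up matrix (`toy` is hypothetical and assumes the 10 / 0 / -10 coding implied by the lines above). It also shows an arithmetic shortcut that gives the same result for this particular coding.

```r
# Toy genotype matrix with made-up values: 10 = AA, 0 = Aa, -10 = aa
toy <- matrix(c(10, 0, -10,
                 0, 10, 10), nrow = 3)

recoded <- toy
recoded[recoded == 10]  <- 2  # AA
recoded[recoded == 0]   <- 1  # Aa (must happen before -10 becomes 0)
recoded[recoded == -10] <- 0  # aa

# Shortcut valid only for this coding: 10 -> 2, 0 -> 1, -10 -> 0
recoded2 <- toy / 10 + 1

all.equal(recoded, recoded2)  # TRUE
```

The step-by-step version makes the genotype mapping explicit, while the shortcut avoids any dependence on the order of the assignments.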

## Allele frequency

Let’s compute the allele frequency of the first SNP. The table function returns the counts of each genotype.

table(W[, 1])

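We can see that there are 563 AA animals, 372 Aa animals, and 65 aa animals. Let’s assign these counts to variables. The table is labeled by genotype code, so "2", "1", and "0" select the counts of AA, Aa, and aa, respectively.

nAA <- table(W[, 1])[["2"]]  # number of AA animals
nAa <- table(W[, 1])[["1"]]  # number of Aa animals
naa <- table(W[, 1])[["0"]]  # number of aa animals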

Allele frequency of A is given by $f(A) = p = \frac{2 \times (\text{no. of } AA \text{ individuals}) + 1 \times (\text{no. of } Aa \text{ individuals})}{2 \times \text{total no. of individuals}}.$
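The formula can be tried out on a small made-up genotype vector before applying it to the real data. In this sketch, `g` and the `toy_` variables are hypothetical names, and the genotypes are assumed to be coded 2 = AA, 1 = Aa, 0 = aa.

```r
# Made-up genotypes for 10 animals at one SNP (2 = AA, 1 = Aa, 0 = aa)
g <- c(2, 2, 1, 0, 1, 2, 2, 1, 0, 2)

toy_nAA <- sum(g == 2)  # 5 AA animals
toy_nAa <- sum(g == 1)  # 3 Aa animals
toy_naa <- sum(g == 0)  # 2 aa animals
toy_n   <- length(g)    # 10 animals in total

# Each AA animal carries two A alleles, each Aa animal carries one
toy_p <- (2 * toy_nAA + toy_nAa) / (2 * toy_n)  # frequency of A: 13/20
toy_q <- 1 - toy_p                              # frequency of a:  7/20
```

Because there are only two alleles, the frequency of a can be obtained either as 1 - p or by applying the same counting argument to aa and Aa animals.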

### Exercise 1

Use the variables nAA, nAa, and naa defined above and compute the allele frequency of A and a in the first SNP.

## Genotypic frequency

Genotypic frequency is given by

$$
\begin{aligned}
f(AA) &= P = \frac{\text{No. of } AA \text{ individuals}}{\text{Total no. of individuals}} \\
f(Aa) &= H = \frac{\text{No. of } Aa \text{ individuals}}{\text{Total no. of individuals}} \\
f(aa) &= Q = \frac{\text{No. of } aa \text{ individuals}}{\text{Total no. of individuals}}.
\end{aligned}
$$
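As a sketch of these three ratios, the following uses a made-up genotype vector (`g` and the `toy_` names are hypothetical; coding assumed to be 2 = AA, 1 = Aa, 0 = aa). Wrapping the vector in `factor()` with explicit levels is a design choice that keeps a genotype class in the table even when its count is zero.

```r
# Made-up genotypes for 10 animals at one SNP (2 = AA, 1 = Aa, 0 = aa)
g <- c(2, 2, 1, 0, 1, 2, 2, 1, 0, 2)

# Counts per genotype class, with all three classes always present
toy_counts <- table(factor(g, levels = c(0, 1, 2)))

# Divide each count by the total number of animals
toy_freqs <- prop.table(toy_counts)

toy_freqs[["2"]]  # f(AA) = 0.5
toy_freqs[["1"]]  # f(Aa) = 0.3
toy_freqs[["0"]]  # f(aa) = 0.2
sum(toy_freqs)    # the three frequencies sum to 1
```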

### Exercise 2

What is the genotypic frequency of AA, Aa, and aa in the first SNP?

## Another approach for obtaining allele frequency

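The allele frequency can also be computed from the genotypic frequencies instead of the genotype counts:

$f(A) = p = \frac{2 \times (\text{frequency of } AA) + 1 \times (\text{frequency of } Aa)}{2 \times (\text{frequency of } AA + Aa + aa)}.$

Because the three genotypic frequencies sum to 1, this simplifies to $p = P + \frac{1}{2} H$.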
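The sketch below checks this relationship on a made-up genotype vector (`g`, `toy_W`, and the `toy_` names are hypothetical; coding assumed to be 2 = AA, 1 = Aa, 0 = aa). It also illustrates a consequence of the 0/1/2 coding: each code equals the number of A alleles an animal carries, so half the mean code equals the frequency of A, assuming there are no missing genotypes.

```r
# Made-up genotypes for 10 animals at one SNP (2 = AA, 1 = Aa, 0 = aa)
g <- c(2, 2, 1, 0, 1, 2, 2, 1, 0, 2)

toy_P <- mean(g == 2)  # f(AA) = 0.5
toy_H <- mean(g == 1)  # f(Aa) = 0.3
toy_Q <- mean(g == 0)  # f(aa) = 0.2

# Allele frequency of A from genotypic frequencies
toy_p <- (2 * toy_P + toy_H) / (2 * (toy_P + toy_H + toy_Q))

# Same value, since mean(g) is the average number of A alleles per animal
toy_p_mean <- mean(g) / 2

# The same idea applied column-wise to a made-up two-SNP matrix
toy_W <- cbind(snp1 = g, snp2 = c(0, 0, 1, 1, 2, 0, 1, 0, 0, 1))
toy_p_all <- colMeans(toy_W) / 2
```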

### Exercise 3

Use the genotypic frequencies P, H, and Q obtained in Exercise 2 to compute the allele frequency of A and a in the first SNP.

## Save R objects

Save the variable W so that we can reuse it in the next class.

save(W, file = "W.Rda")
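For reference, an object saved this way can be restored with `load()`. The sketch below does a round trip with a made-up matrix written to a temporary file (`toy_W` and the path are hypothetical, so the real W.Rda is not touched). Note that `load()` restores the object under the name it had when it was saved.

```r
# Made-up matrix standing in for W
toy_W <- matrix(c(2, 1, 0, 1), nrow = 2)

# Save to a temporary location rather than the working directory
path <- file.path(tempdir(), "toy_W.Rda")
save(toy_W, file = path)

rm(toy_W)   # remove the object from the workspace
load(path)  # restores it under its original name, toy_W

dim(toy_W)  # 2 rows and 2 columns, as before
```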

January 24, 2017