Overview

We will learn how to compute allele and genotypic frequencies in R by using the beef cattle data set.

Use the function read.table to read the genotype file (imputed1000.geno) into a data frame. Here file.choose() opens a dialog for selecting the file interactively.

W <- read.table(file = file.choose(), header = TRUE, stringsAsFactors = FALSE)
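As a small, self-contained illustration of what read.table does with header = TRUE, the sketch below writes a made-up three-animal genotype file to a temporary location and reads it back. The file contents and the names toy_path, toy, ID, snp1, and snp2 are hypothetical and are not meant to reflect the actual layout of imputed1000.geno.

```r
# Hypothetical toy file: an ID column followed by two SNP columns
toy_path <- tempfile(fileext = ".geno")
writeLines(c("ID snp1 snp2",
             "a1 10 0",
             "a2 0 -10",
             "a3 -10 10"), toy_path)

# header = TRUE uses the first line as column names;
# stringsAsFactors = FALSE keeps the IDs as character strings
toy <- read.table(file = toy_path, header = TRUE, stringsAsFactors = FALSE)
str(toy)
```

In a script, a file path can be passed directly in place of file.choose(), as done here with toy_path.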

We can access an element of the data frame by giving its row and column coordinates inside the single square bracket operator []. Let’s first access the element in the first row and the first column. When the row coordinate is omitted, the operator returns the entire column; for a single column of a data frame, the result is a vector rather than a one-column data frame unless drop = FALSE is given.

W[1, 1]  # 1st row and 1st column
head(W[, 1])  # first few entries of the 1st column (a vector)
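The indexing rules can be checked on a small example. The sketch below uses a hypothetical toy data frame (the name toy and its values are invented for illustration) to show what each form of [] returns.

```r
# Hypothetical toy data frame standing in for W
toy <- data.frame(ID = c("a1", "a2", "a3"),
                  snp1 = c(10, 0, -10),
                  snp2 = c(0, -10, 10),
                  stringsAsFactors = FALSE)

toy[1, 1]               # single element in row 1, column 1
toy[, 1]                # whole first column, returned as a vector
toy[, 1, drop = FALSE]  # same column, kept as a one-column data frame
toy[2, ]                # whole second row, returned as a one-row data frame
```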

We then drop the first column of the data frame, which contains the animal IDs. A negative index excludes the corresponding column, so -1 drops the first column.

W <- W[, -1]

Next, we will convert W from a data frame into a matrix. In R, matrices are more memory efficient and more convenient for linear algebra operations.

W <- as.matrix(W)

What are the dimensions of W?

dim(W)
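To see why the ID column is removed before the conversion, the following sketch repeats the steps on a hypothetical toy data frame (toy and toy_geno are invented names). A matrix holds a single data type, so converting with the character ID column still present would turn every entry into a character string.

```r
# Hypothetical toy data frame: an ID column plus two SNP columns
toy <- data.frame(ID = c("a1", "a2", "a3"),
                  snp1 = c(10, 0, -10),
                  snp2 = c(0, -10, 10),
                  stringsAsFactors = FALSE)

# Drop the IDs first, then convert: the result is a numeric matrix
toy_geno <- as.matrix(toy[, -1])
dim(toy_geno)         # number of rows (animals), number of columns (SNPs)
is.numeric(toy_geno)

# Converting with the ID column included coerces everything to character
is.character(as.matrix(toy))
```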

Recoding of markers

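We will recode the genotypes so that the three genotypes AA, Aa, and aa are coded as 2, 1, and 0, respectively. The code below maps the original codes 10, 0, and -10 to 2, 1, and 0. The order of the three assignments matters: the original 0 must be recoded to 1 before -10 is recoded to 0.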

W[W == 10] <- 2
W[W == 0] <- 1
W[W == -10] <- 0
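The recoding can be verified on a small matrix. The sketch below applies the same three assignments to a hypothetical toy matrix (toy_geno, recoded, and recoded_alt are invented names) and compares the result with an arithmetic one-step alternative, which is offered only as an illustration and assumes the matrix contains no values other than 10, 0, and -10.

```r
# Hypothetical toy genotype matrix in the original 10 / 0 / -10 coding
toy_geno <- matrix(c(10, 0, -10,
                     0, -10, 10),
                   nrow = 3,
                   dimnames = list(NULL, c("snp1", "snp2")))

# Sequential recoding, in the same order as in the text
recoded <- toy_geno
recoded[recoded == 10] <- 2
recoded[recoded == 0] <- 1
recoded[recoded == -10] <- 0

# One-step alternative: maps -10, 0, 10 to 0, 1, 2
recoded_alt <- toy_geno / 10 + 1

all.equal(recoded, recoded_alt)
```

If the assignment for -10 were run before the assignment for 0, the aa genotypes would first become 0 and then be turned into 1, which is why the sequential version depends on its order and the arithmetic version does not.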

Allele frequency

Let’s compute the allele frequency of the first SNP. The table function returns the number of animals with each genotype.

table(W[, 1])

We can see that there are 563 AA animals, 372 Aa animals, and 65 aa animals. Let’s assign these numbers to variables.

tab <- table(W[, 1])  # counts ordered by genotype code: 0 (aa), 1 (Aa), 2 (AA)
nAA <- tab[3]
nAa <- tab[2]
naa <- tab[1]
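Positional indexing into the table works when all three genotypes are present, but it would pick the wrong entries for a SNP where one genotype is absent. One possible safeguard, sketched below on a hypothetical toy SNP (toy_snp, counts, and the n_ variables are invented names), is to fix the levels before counting and to index by genotype code.

```r
# Hypothetical toy SNP with no aa animals
toy_snp <- c(2, 2, 1, 2, 1, 2)

# Plain table() silently drops the missing genotype
length(table(toy_snp))

# Fixing the levels keeps a zero count for the absent genotype
counts <- table(factor(toy_snp, levels = c(0, 1, 2)))
n_aa <- counts[["0"]]
n_Aa <- counts[["1"]]
n_AA <- counts[["2"]]
c(aa = n_aa, Aa = n_Aa, AA = n_AA)
```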

The allele frequency of A is given by $f(A) = p = \frac{2 \times (\text{no. of } AA \text{ individuals}) + 1 \times (\text{no. of } Aa \text{ individuals})}{2 \times \text{total no. of individuals}}.$ The frequency of a follows as $f(a) = q = 1 - p$.
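As a worked illustration of the formula, the sketch below plugs in made-up counts for ten animals. The counts and the toy_ variable names are hypothetical and are not taken from the cattle data.

```r
# Hypothetical genotype counts for 10 animals
toy_nAA <- 5
toy_nAa <- 4
toy_naa <- 1
toy_n <- toy_nAA + toy_nAa + toy_naa

# Each AA animal carries two copies of A, each Aa animal carries one
toy_p <- (2 * toy_nAA + toy_nAa) / (2 * toy_n)  # 14 / 20
# Same reasoning for allele a
toy_q <- (2 * toy_naa + toy_nAa) / (2 * toy_n)  # 6 / 20

toy_p + toy_q  # the two allele frequencies sum to 1
```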

Exercise 1

Use the variables nAA, nAa, and naa defined above and compute the allele frequency of A and a in the first SNP.

Genotypic frequency

Genotypic frequencies are given by $f(AA) = P = \frac{\text{no. of } AA \text{ individuals}}{\text{total no. of individuals}}$, $f(Aa) = H = \frac{\text{no. of } Aa \text{ individuals}}{\text{total no. of individuals}}$, and $f(aa) = Q = \frac{\text{no. of } aa \text{ individuals}}{\text{total no. of individuals}}$.
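One convenient way to turn counts into proportions is prop.table, which divides each count by the total. The sketch below applies it to a hypothetical toy SNP in the 2 / 1 / 0 coding (toy_snp and geno_freq are invented names).

```r
# Hypothetical toy SNP: 5 AA (2), 4 Aa (1), and 1 aa (0)
toy_snp <- c(2, 2, 1, 0, 2, 1, 2, 1, 2, 1)

# Counts per genotype code, then each count divided by the total
geno_freq <- prop.table(table(toy_snp))
geno_freq

sum(geno_freq)  # genotypic frequencies sum to 1
```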

Exercise 2

What is the genotypic frequency of AA, Aa, and aa in the first SNP?

Another approach for obtaining allele frequency

The allele frequency of A can also be computed from the genotypic frequencies P, H, and Q: $f(A) = p = \frac{2 \times (\text{frequency of } AA) + 1 \times (\text{frequency of } Aa)}{2 \times (\text{frequency of } AA + Aa + aa)}.$
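Under the 2 / 1 / 0 coding, the mean genotype code of a SNP equals 2 × f(AA) + 1 × f(Aa), so halving the column means gives the frequency of A for every SNP at once. The sketch below checks this on a hypothetical toy matrix (all toy_ names are invented) and assumes there are no missing genotypes.

```r
# Hypothetical toy matrix: 4 animals (rows) by 2 SNPs (columns), coded 2 / 1 / 0
toy_geno <- matrix(c(2, 1, 0, 2,
                     1, 1, 2, 0),
                   nrow = 4,
                   dimnames = list(NULL, c("snp1", "snp2")))

# Frequency of A for all SNPs: half of the mean genotype code
toy_p_all <- colMeans(toy_geno) / 2

# Check the first SNP against the genotypic-frequency formula
toy_P <- mean(toy_geno[, 1] == 2)
toy_H <- mean(toy_geno[, 1] == 1)
toy_Q <- mean(toy_geno[, 1] == 0)
toy_p_check <- (2 * toy_P + toy_H) / (2 * (toy_P + toy_H + toy_Q))

all.equal(toy_p_check, toy_p_all[["snp1"]])
```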

Exercise 3

Use the variables P, H, and Q defined above and compute the allele frequency of A and a in the first SNP.

Save R objects

Save the variable W so that we can reuse it in the next class.

save(W, file = "W.Rda")
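A file written by save can be read back with load, which restores the object under its original name. The sketch below shows the round trip on a hypothetical toy matrix written to a temporary file (toy_W, rda_path, and restored are invented names).

```r
# Hypothetical toy matrix saved to a temporary file
toy_W <- matrix(c(2, 1, 0, 1), nrow = 2)
rda_path <- tempfile(fileext = ".Rda")
save(toy_W, file = rda_path)

# Remove the object, then restore it from the file
rm(toy_W)
restored <- load(rda_path)  # load() returns the names of the restored objects
restored
exists("toy_W")
```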

January 24, 2017