# Review of allele and genotypic frequencies

## Overview

We will learn how to compute allele and genotypic frequencies in R by using the beef cattle data set.

## Read a file

Use the function `read.table` to read the genotype file (imputed1000.geno) into a data frame.

`W <- read.table(file = file.choose(), header = TRUE, stringsAsFactors = FALSE)`

We can access an element of the data frame by giving its row and column indices inside the single square bracket operator `[]`. Let’s first access the element in the first row and the first column. When the row index is omitted, the operator returns the entire column; for a single column of a data frame, the result is a vector rather than a one-column data frame.

```
W[1, 1] # 1st row and 1st column
head(W[, 1]) # first elements of the 1st column (returned as a vector)
```
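The difference between a vector and a one-column data frame can be seen on a small example. This is a minimal sketch on a made-up data frame (`toy` and its column names are hypothetical, not part of the cattle data); it relies on the `drop` argument of `[`, which defaults to `TRUE` when a single column is selected.

```r
# Hypothetical toy data frame; names and values are made up for illustration
toy <- data.frame(id = c("a1", "a2", "a3"),
                  snp1 = c(10, 0, -10),
                  stringsAsFactors = FALSE)

col_vec <- toy[, 1]               # default drop = TRUE: a character vector
col_df  <- toy[, 1, drop = FALSE] # drop = FALSE: stays a one-column data frame

class(col_vec)
class(col_df)
dim(col_df)
```

Use `drop = FALSE` whenever later code expects a data frame regardless of how many columns are selected.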

We then drop the first column of the data frame, which contains the animal IDs. A negative index drops the corresponding element, so `-1` in the column position drops the first column.

`W <- W[, -1]`

Next, we will convert `W` from a data frame into a matrix. In R, matrices are more memory efficient and more convenient for linear algebra operations.

`W <- as.matrix(W)`

What is the dimension of `W`?

`dim(W)`

### Recoding of markers

We will recode the genotypes so that the three genotypes *AA*, *Aa*, and *aa*, which are stored in the file as 10, 0, and -10, are coded as 2, 1, and 0, respectively.

```
W[W == 10] <- 2
W[W == 0] <- 1
W[W == -10] <- 0
```
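Note that the three assignments above depend on their order: 0 has to be recoded to 1 before -10 is recoded to 0, otherwise the newly created zeros would be turned into ones. As an alternative, the same mapping can be written as a single arithmetic step that does not depend on order. The sketch below uses a made-up matrix (`toy_geno` is hypothetical) and assumes the only codes present are 10, 0, and -10.

```r
# Hypothetical toy matrix using the original codes 10 (AA), 0 (Aa), -10 (aa)
toy_geno <- matrix(c(10, 0, -10, 0, 10, 10), nrow = 3)

# Map 10 -> 2, 0 -> 1, -10 -> 0 in one vectorized step
toy_recoded <- (toy_geno + 10) / 10
toy_recoded
```

This form is only valid if no other codes (for example, a missing-value code) appear in the data.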

## Allele frequency

Let’s compute the allele frequency of the first SNP. The `table` function returns the count of each genotype.

`table(W[, 1])`

We can see that there are 563 *AA* animals, 372 *Aa* animals, and 65 *aa* animals. Let’s assign these counts to variables.

```
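tab <- table(W[, 1]) # genotype counts, named "0", "1", and "2"
nAA <- tab[["2"]]    # AA is coded as 2
nAa <- tab[["1"]]    # Aa is coded as 1
naa <- tab[["0"]]    # aa is coded as 0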
```

Allele frequency of *A* is given by \[
f(A) = p = \frac{2 \times (\text{no. of } AA \text{ individuals}) + 1 \times (\text{no. of } Aa \text{ individuals})}{2 \times \text{total no. of individuals}}.
\]
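To see how the formula translates into R, here is a minimal sketch with made-up genotype counts (the `toy_` variables are hypothetical and are not taken from the cattle data). It also assumes a SNP with only two alleles, so that the frequency of *a* is one minus the frequency of *A*.

```r
# Hypothetical genotype counts, not taken from the cattle data
toy_nAA <- 50
toy_nAa <- 40
toy_naa <- 10
toy_n <- toy_nAA + toy_nAa + toy_naa   # total number of individuals

toy_p <- (2 * toy_nAA + toy_nAa) / (2 * toy_n)  # frequency of A
toy_q <- 1 - toy_p                              # frequency of a (two alleles only)
c(p = toy_p, q = toy_q)
```

Each *AA* individual contributes two copies of *A* and each *Aa* individual contributes one, while every individual carries two alleles in total, which is where the factor of 2 in the denominator comes from.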

### Exercise 1

Use the variables `nAA`, `nAa`, and `naa` defined above and compute the allele frequency of *A* and *a* in the first SNP.

## Genotypic frequency

Genotypic frequency is given by \[
\begin{aligned}
f(AA) &= P = \frac{\text{no. of } AA \text{ individuals}}{\text{total no. of individuals}}, \\
f(Aa) &= H = \frac{\text{no. of } Aa \text{ individuals}}{\text{total no. of individuals}}, \\
f(aa) &= Q = \frac{\text{no. of } aa \text{ individuals}}{\text{total no. of individuals}}.
\end{aligned}
\]
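One possible way to obtain these proportions in R is `prop.table`, which divides each count in a table by the total. The sketch below uses a made-up genotype vector (`toy_snp` is hypothetical) coded as 2, 1, and 0 for *AA*, *Aa*, and *aa*.

```r
# Hypothetical genotype vector coded 2 (AA), 1 (Aa), 0 (aa)
toy_snp <- c(2, 2, 1, 0, 1, 2, 2, 1, 2, 0)

toy_counts <- table(toy_snp)        # count of each genotype code
toy_freq <- prop.table(toy_counts)  # each count divided by the total
toy_freq
sum(toy_freq)                       # genotypic frequencies sum to 1
```

The entries of `toy_freq` are named by genotype code, so the entry named "2" corresponds to *AA* here.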

### Exercise 2

What are the genotypic frequencies of *AA*, *Aa*, and *aa* in the first SNP? Store them in the variables `P`, `H`, and `Q`, respectively.

## Another approach for obtaining allele frequency

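Allele frequency can also be computed from the genotypic frequencies instead of the genotype counts: \[ f(A) = p = \frac{2 \times (\text{frequency of } AA) + 1 \times (\text{frequency of } Aa)}{2 \times (\text{frequency of } AA + \text{frequency of } Aa + \text{frequency of } aa)} = \frac{2P + H}{2(P + H + Q)}. \]

Because \(P + H + Q = 1\), this simplifies to \(p = P + H/2\).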
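As a quick check that the full expression and the shortcut of adding half the heterozygote frequency to the *AA* frequency agree, here is a minimal sketch with made-up genotypic frequencies (the `toy_` variables are hypothetical and chosen so that they sum to 1).

```r
# Hypothetical genotypic frequencies; they must sum to 1
toy_P <- 0.5  # AA
toy_H <- 0.4  # Aa
toy_Q <- 0.1  # aa

p_full  <- (2 * toy_P + toy_H) / (2 * (toy_P + toy_H + toy_Q))
p_short <- toy_P + toy_H / 2   # same value because the frequencies sum to 1
all.equal(p_full, p_short)
```

`all.equal` is used rather than `==` because it compares floating-point numbers within a small tolerance.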

### Exercise 3

Use the variables `P`, `H`, and `Q` defined above and compute the allele frequency of *A* and *a* in the first SNP.

## Save R objects

Save the variable `W` so that we can reuse it in the next class.

`save(W, file = "W.Rda")`
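In a later session, `load("W.Rda")` restores `W` under its original name, assuming the file is in the working directory. The round trip can be sketched with a small made-up object and a temporary file (`toy_W` and `tmp` are hypothetical names used only for this illustration).

```r
# Write a small hypothetical object to a temporary file, then restore it
toy_W <- matrix(c(2, 1, 0, 1), nrow = 2)
tmp <- tempfile(fileext = ".Rda")
save(toy_W, file = tmp)

rm(toy_W)              # remove the object from the workspace
restored <- load(tmp)  # load() returns the names of the restored objects
restored
exists("toy_W")        # the object is back under its original name
```

Because `load` recreates objects under the names they were saved with, it silently overwrites any existing object of the same name in the workspace.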