Review of allele and genotypic frequencies

Overview

We will learn how to compute allele and genotypic frequencies in R using the cattle data set.

Install and load R packages

First, we are going to use the cattle data included in the synbreedData package. Learn more about the Synbreed project and synbreed R packages.

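# Install the packages once if they are not yet available
# install.packages("synbreed")
# install.packages("synbreedData")
library(synbreed)
library(synbreedData)
data(cattle)   # load the cattle data set
set.seed(100)  # make the random imputation reproducible
# Recode the genotypes numerically and impute missing genotypes at random
cattle2 <- codeGeno(cattle, impute = TRUE, impute.type = "random", reference.allele = "A")
W <- cattle2$geno  # marker matrix: individuals in rows, SNPs in columns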

Allele frequency

Let’s compute the allele frequency of the first SNP. The table function returns the frequencies (counts) of the genotypes.

table(W[, 1])

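We can see that there are 426 AA individuals, 70 Aa individuals, and 4 aa individuals. The table lists the genotype codes in increasing order (0, 1, and 2 copies of A), so the first entry is the aa count and the third entry is the AA count. Let’s assign these numbers to variables.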

nAA <- table(W[, 1])[3]
nAa <- table(W[, 1])[2]
naa <- table(W[, 1])[1]
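Positional indexing such as [3] only works when all three genotype classes are present at the SNP. As a minimal sketch of a more defensive way to count, the example below fixes the genotype levels before tabulating. The vector snp is a hypothetical toy SNP (not taken from the cattle data), and the 0/1/2 coding is assumed to count copies of A.

```r
# Hypothetical genotype codes for one SNP (assumed coding: 0 = aa, 1 = Aa, 2 = AA).
# No individual carries genotype 0 in this toy example.
snp <- c(2, 2, 1, 2, 1, 2)

# Fixing the levels guarantees one count per genotype class, in a known order,
# even when a class is absent from the data.
counts <- table(factor(snp, levels = c(0, 1, 2)))

naa.toy <- counts[["0"]]  # number of aa individuals
nAa.toy <- counts[["1"]]  # number of Aa individuals
nAA.toy <- counts[["2"]]  # number of AA individuals
c(aa = naa.toy, Aa = nAa.toy, AA = nAA.toy)
```

Without the fixed levels, table(snp) would return only two entries here, and [3] would give NA.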

The allele frequency of A is given by \[ f(A) = p = \frac{2 \times (\text{no. of } AA \text{ individuals}) + 1 \times (\text{no. of } Aa \text{ individuals})}{2 \times \text{total no. of individuals}}. \]

Exercise 1

Use the variables nAA, nAa, and naa defined above and compute the allele frequencies of A and a in the first SNP.

p <- (2 * nAA + 1 * nAa) / (2 * (nAA + nAa + naa))
p 
q <- 1 - p
q
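As a quick arithmetic check, the same formula can be evaluated with the genotype counts reported above for the first SNP (426, 70, and 4) typed in by hand. The variable names ending in .chk are introduced here only for illustration.

```r
# Genotype counts for the first SNP, as reported in the text above
nAA.chk <- 426
nAa.chk <- 70
naa.chk <- 4

n.chk <- nAA.chk + nAa.chk + naa.chk           # 500 individuals in total
p.chk <- (2 * nAA.chk + nAa.chk) / (2 * n.chk) # 922 A alleles out of 1000
q.chk <- 1 - p.chk
c(p = p.chk, q = q.chk)                        # 0.922 and 0.078
```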

Genotypic frequency

Genotypic frequency is given by \[ \begin{aligned} f(AA) &= P = \frac{\text{No. of } AA \text{ individuals}}{\text{Total no. of individuals}} \\ f(Aa) &= H = \frac{\text{No. of } Aa \text{ individuals}}{\text{Total no. of individuals}} \\ f(aa) &= Q = \frac{\text{No. of } aa \text{ individuals}}{\text{Total no. of individuals}}. \end{aligned} \]

Exercise 2

What are the genotypic frequencies of AA, Aa, and aa in the first SNP?

P <- (nAA) / (nAA + nAa + naa)
P
H <- (nAa) / (nAA + nAa + naa)
H
Q <-  (naa) / (nAA + nAa + naa)
Q 
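The three genotypic frequencies can also be obtained in one step with the base R function prop.table, which divides each count by the total. The sketch below uses a hypothetical toy SNP rather than the cattle data, and again assumes that 0, 1, and 2 code aa, Aa, and AA.

```r
# Hypothetical genotype codes for ten individuals at one SNP
snp <- c(2, 2, 1, 2, 0, 2, 1, 2, 2, 2)

# Count each genotype class (levels fixed so that absent classes give 0),
# then convert the counts to proportions.
geno.freq <- prop.table(table(factor(snp, levels = c(0, 1, 2))))
geno.freq       # Q (aa), H (Aa), P (AA), in that order
sum(geno.freq)  # genotypic frequencies sum to 1
```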

Another approach for obtaining allele frequency

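The allele frequency can also be computed from the genotypic frequencies instead of the genotype counts: \[ f(A) = p = \frac{2 \times (\text{frequency of } AA) + 1 \times (\text{frequency of } Aa)}{2 \times (\text{frequency of } AA + Aa + aa)}. \]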

Exercise 3

Use the variables P, H, and Q defined above and compute the allele frequencies of A and a in the first SNP.

p.ex3 <- (2 * P + 1 * H) / (2 * (P + H + Q))
p.ex3 
q.ex3 <- 1 - p.ex3
q.ex3
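Because the three genotypic frequencies sum to one, the denominator of the formula is simply 2, so the calculation reduces to a shorter form. As a worked check with the counts reported for the first SNP (426, 70, and 4 out of 500 individuals):

\[ P + H + Q = 1 \;\Rightarrow\; p = P + \tfrac{1}{2} H, \qquad q = Q + \tfrac{1}{2} H \]

\[ p = \frac{426}{500} + \frac{1}{2} \times \frac{70}{500} = 0.852 + 0.070 = 0.922, \qquad q = \frac{4}{500} + \frac{1}{2} \times \frac{70}{500} = 0.008 + 0.070 = 0.078 \]

These values agree with the allele frequencies computed directly from the genotype counts.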

Exercise 4

What are the allele and genotypic frequencies of AA, Aa, and aa in the second SNP?

nAA <- table(W[, 2])[3]
nAa <- table(W[, 2])[2]
naa <- table(W[, 2])[1]

p <- (2 * nAA + 1 * nAa) / (2 * (nAA + nAa + naa))
p 
q <- 1 - p
q

P <- (nAA) / (nAA + nAa + naa)
P
H <- (nAa) / (nAA + nAa + naa)
H
Q <-  (naa) / (nAA + nAa + naa)
Q 

Compute allele frequencies for all SNPs

So far we have learned how to compute the allele frequency of a single SNP using the table function. Here, we consider how to compute allele frequencies for all SNPs. Of course, we could apply the table function manually, one SNP at a time. However, this approach is far too slow for 7,250 SNPs. Recall that the allele frequency of A is given by \[ f(A) = p = \frac{2 \times (\text{no. of } AA \text{ individuals}) + 1 \times (\text{no. of } Aa \text{ individuals})}{2 \times \text{total no. of individuals}}. \] We can rewrite this equation as \[ f(A) = p = \frac{(\text{no. of } A \text{ alleles in the population})}{2 \times \text{total no. of individuals}}. \] This suggests that all we need is the number of copies of the reference allele \(A\) for each SNP. Because each genotype is coded as the number of copies of \(A\) it carries, the sum function applied to a column returns the number of reference alleles \(A\) at that SNP.

sum(W[,1]) # number of A alleles at the first SNP
sum(W[,2]) # number of A alleles at the second SNP

How do we repeat this operation for 7,250 SNPs? The colSums function returns the sum of each column in a matrix as a vector.

colSums(W) 

Note that colSums(W) gives the numerator of the above equation. We then divide this number by \(2 \times \text{total no. of individuals}\). The function nrow returns the number of rows, which here is the number of individuals.

p <- colSums(W) / (2 * nrow(W))

The variable p is a vector that contains the frequencies of the reference allele at the 7,250 SNPs.
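To see what the vectorized calculation does, here is a minimal sketch on a small made-up marker matrix. W.toy and its row and column names are hypothetical, and the matrix is assumed to be coded as copies of the reference allele with no missing values, as in the imputed W.

```r
# Hypothetical marker matrix: 4 individuals (rows) by 3 SNPs (columns),
# coded as the number of copies of the reference allele (0, 1, or 2)
W.toy <- matrix(c(2, 1, 0,
                  2, 2, 1,
                  1, 0, 0,
                  2, 1, 1),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("ind", 1:4), paste0("snp", 1:3)))

# Number of reference alleles per SNP divided by the total number of alleles
p.toy <- colSums(W.toy) / (2 * nrow(W.toy))
p.toy

# Equivalent shortcut: the column mean is the average allele count per individual
p.toy2 <- colMeans(W.toy) / 2
all.equal(p.toy, p.toy2)
```

The first column contains 7 reference alleles among 8 alleles in total, so its frequency is 0.875.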

Exercise 5

What is the frequency of the reference allele at the 400th SNP?

p[400]

Exercise 6

What is the mean of the reference allele frequencies in this population?

mean(p)

Minor allele frequency

In most cases, people report the minor allele frequency, which is the frequency of the less frequent allele at a given SNP. We can convert allele frequencies into minor allele frequencies by using the ifelse function: whenever p exceeds 0.5, the other allele is the minor one, so we take 1 - p instead.

maf <- ifelse(p > 0.5, 1 - p, p)
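An equivalent way to write this conversion is with pmin, which takes the element-wise minimum of p and 1 - p. This is offered only as an alternative sketch. The vector p.toy below is hypothetical and includes the boundary cases 0.5, 0, and 1.

```r
# Hypothetical reference allele frequencies, including boundary cases
p.toy <- c(0.875, 0.5, 0.25, 1, 0)

# Approach used in the text: flip the frequency when it exceeds 0.5
maf.ifelse <- ifelse(p.toy > 0.5, 1 - p.toy, p.toy)

# Alternative: element-wise minimum of the two allele frequencies
maf.pmin <- pmin(p.toy, 1 - p.toy)

maf.ifelse
all.equal(maf.ifelse, maf.pmin)
```

Both versions return values between 0 and 0.5. A minor allele frequency of 0 indicates a SNP that is fixed for one allele in the sample.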

Exercise 7

What is the minor allele frequency at the 400th SNP?

maf[400]

Exercise 8

What is the mean of the minor allele frequencies?

mean(maf)

Gota Morota
http://morotalab.org/

January 27, 2020