Introduction to Quantitative Genetics

Data

We are going to use the mice data included in the BGLR package.

library(BGLR)
data(mice)
W <- mice.X[, 1:100]

We will learn how to compute allele and genotypic frequencies in R using the cattle data set.

Let’s compute the allele frequency of the first SNP. The table function returns frequncies of genotypes.

table(W[, 1])

We can see that there are 541 AA animals, 929 Aa animals, and 344 aa animals. Let’s assign these numbers into variables.

nAA <- table(W[, 1])[3]
nAa <- table(W[, 1])[2]
naa <- table(W[, 1])[1]

Allele frequency of A is given by \[ f(A) = p = \frac{2 \times (\text{no. of } AA \text{ individuals}) + 1 \times (\text{no. of } Aa \text{ individuals})}{2 \times \text{total no. of individuals}}. \]

Exercise 1

Use the variables nAA, nAa, and naa defined above and compute the allele frequencies of A and a in the first SNP.

p <- (2 * nAA + 1 * nAa)/(2 * (nAA + nAa + naa))
p
q <- 1 - p
q

Genotypic frequency

Genotypic frequency is given by \[ f(AA) = P = \frac{\text{No. of } AA \text{ individuals}}{\text{Total no. individuals}} \\ f(Aa) = H = \frac{\text{No. of } Aa \text{ individuals}}{\text{Total no. individuals}} \\ f(aa) = Q = \frac{\text{No. of } aa \text{ individuals}}{\text{Total no. individuals}}. \\ \]

Exercise 2

What are the genotypic frequencies of AA, Aa, and aa in the first SNP?

P <- (nAA)/(nAA + nAa + naa)
P
H <- (nAa)/(nAA + nAa + naa)
H
Q <- (naa)/(nAA + nAa + naa)
Q

Another approach for obtaining allele frequency

\[ f(A) = p = \frac{2 \times (\text{frequency of } AA) + 1 \times (\text{frequency of } Aa)}{2 \times (\text{frequency of } AA + Aa + aa)}. \]

Exercise 3

Use the variables P, H, and Q defined above and compute the allele frequencies of A and a in the first SNP.

p.ex3 <- (2 * P + 1 * H)/(2 * (P + H + Q))
p.ex3
q.ex3 <- 1 - p.ex3
q.ex3

Exercise 4

What are the genotypic frequencies of AA, Aa, and aa in the second SNP?

nAA <- table(W[, 2])[3]
nAa <- table(W[, 2])[2]
naa <- table(W[, 2])[1]

p <- (2 * nAA + 1 * nAa)/(2 * (nAA + nAa + naa))
p
q <- 1 - p
q

P <- (nAA)/(nAA + nAa + naa)
P
H <- (nAa)/(nAA + nAa + naa)
H
Q <- (naa)/(nAA + nAa + naa)
Q

Compute allele frequencies for all SNPs

So far we have learned how to compute the allele frequency of a single SNP using the table function. Here, we consider how to compute allele frequencies for the entire SNPs. Of course we can apply the table function manually one at a time. However, this approach takes too much time to compute allele frequencies for 6,960 SNPs. Recall that allele frequency of A is given by \[ f(A) = p = \frac{2 \times (\text{no. of } AA \text{ individuals}) + 1 \times (\text{no. of } Aa \text{ individuals})}{2 \times \text{total no. of individuals}}. \] We can rewrite this equation into \[ f(A) = p = \frac{(\text{no. of } A \text{ allele in the population})}{2 \times \text{total no. of individuals}}. \] This suggests that all we need is the number of \(A\) allele or reference allele \(a\) for each SNP. The sum function returns the number of reference allele \(A\).

sum(W[, 1])  # sum of A allele in the first SNP
sum(W[, 2])  # sum of A allele in the second SNP

How to repeat this operation for 300 SNPs? The colSums function returns the sum of each column in a matrix as a vector.

colSums(W)

Note that colSums(W) gives the numerator of the above equation. We then divide this number by \(2 \times \text{total no. of individuals}\). The function nrows returns the number of rows.

p <- colSums(W)/(2 * nrow(W))

The variable p is a vector and it contains the allele frequencies of reference allele for 6,960 SNPs.

Exercise 5

What is the allele frequency of reference allele in the 50th SNP?

p[50]

Minor allele frequency

In most cases, people report a minor allele frequency, which is the frequency of less frequent allele in a given SNP. We can convert allele frequencies into minor allele frquencies by using the ifelse function.

maf <- ifelse(p > 0.5, 1 - p, p)

Exercise 6

What is the minor allele frquency of reference allele in the 30th SNP?

maf[30]

Exercise 7

What is the mean of minor allele frquencies?

mean(maf)

Expectations and variances

Exercise 8

Compute the allele frequency of SNP markers. Recall that the expectation of genotype, \(E(W)\), is given by \(2p\), where \(p\) is the frequency of reference allele. Verify that \(2p\) is equal to the mean of each genotype obtained from the colMeans() function.

p3 <- (2 * colSums(W == 2) + 1 * colSums(W == 1))/(2 * nrow(W))
table(colMeans(W) == 2 * p3)

p <- colSums(W)/(2 * nrow(W))
table(colMeans(W) == 2 * p)

p2 <- colMeans(W)/2
table(p == p2)

Exercise 9

Recall that the variance of genotype, \(Var(W)\), is given by \(2p(1-p)\). Verify that \(2p(1-p)\) is close to the variance of each genotype obtained from the var() function.

var1 <- 2 * p * (1 - p)
var2 <- diag(var(W))
var2 <- apply(W, 2, var)
table(var1 == var2)
head(var1)
head(var2)
all.equal(var1, var2, tol = 0.05)

Exercise 10

Create a new marker matrix X from W and recode markers so that three genotypes \(AA\), \(Aa\), and \(aa\) are coded as 1, 0, and -1, respectively.

X <- W - 1

Recall that the expectation of genotype, \(E(X)\), is given by \(2p-1\), where \(p\) is the frequency of reference allele. Verify that \(2p-1\) is equal to the mean of each genotype obtained from the colMeans() function.

p2 <- (2 * colSums(X == 1) + 1 * colSums(X == 0))/(2 * nrow(X))
table(p == p2)
all.equal(2 * p - 1, colMeans(X), tol = 0.005)

Exercise 12

Recall that the variance of genotype, \(Var(X)\), remains the same and is given by \(2p(1-p)\). Verify that \(2p(1-p)\) is close to the variance of each genotype obtained from the var() function.

var1 <- 2 * p * (1 - p)
var2 <- var(X)
var2 <- apply(X, 2, var)
all.equal(var1, var2, tol = 0.05)

Exercise 13

Verify that no matter how we code markers, centered marker codes, \(W - E(W)\) and \(X - E(X)\), remain the same.

Exercise 14

We will recode the SNP genotypes so that now the major allele is treated as a reference allele. Store the new coding into the W2 variable.

W2 <- W
W2[W2 == 0] <- 3
W2[W2 == 2] <- 0
W2[W2 == 3] <- 2

Compute the allele freqeuncy of SNP markers using W2. Compare your result with the allele frequency obtained from W.

# Freq of A (W)
p <- colSums(W)/(2 * nrow(W))
# Freq of A (W2)
pW2 <- colSums(W2)/(2 * nrow(W2))

head(p)
head(pW2)


# SNP 1
table(W[, 1])
table(W2[, 1])

# SNP 16
table(W[, 16])
table(W2[, 16])

Gota Morota

November 12, 2018