# Introduction to Quantitative Genetics

# Statistical Methods for Omics-assisted Breeding Hands-on Workshop

## Data

We are going to use the `mice`

data included in the BGLR package.

```
library(BGLR)
data(mice)
W <- mice.X[, 1:100]
```

We will learn how to compute allele and genotypic frequencies in R using the cattle data set.

Let’s compute the allele frequency of the first SNP. The `table`

function returns frequncies of genotypes.

`table(W[, 1])`

We can see that there are 541 *AA* animals, 929 *Aa* animals, and 344 *aa* animals. Let’s assign these numbers into variables.

```
nAA <- table(W[, 1])[3]
nAa <- table(W[, 1])[2]
naa <- table(W[, 1])[1]
```

Allele frequency of *A* is given by \[
f(A) = p = \frac{2 \times (\text{no. of } AA \text{ individuals}) + 1 \times (\text{no. of } Aa \text{ individuals})}{2 \times \text{total no. of individuals}}.
\]

### Exercise 1

Use the variables `nAA`

, `nAa`

, and `naa`

defined above and compute the allele frequencies of *A* and *a* in the first SNP.

```
p <- (2 * nAA + 1 * nAa)/(2 * (nAA + nAa + naa))
p
q <- 1 - p
q
```

## Genotypic frequency

Genotypic frequency is given by \[ f(AA) = P = \frac{\text{No. of } AA \text{ individuals}}{\text{Total no. individuals}} \\ f(Aa) = H = \frac{\text{No. of } Aa \text{ individuals}}{\text{Total no. individuals}} \\ f(aa) = Q = \frac{\text{No. of } aa \text{ individuals}}{\text{Total no. individuals}}. \\ \]

### Exercise 2

What are the genotypic frequencies of `AA`

, `Aa`

, and `aa`

in the first SNP?

```
P <- (nAA)/(nAA + nAa + naa)
P
H <- (nAa)/(nAA + nAa + naa)
H
Q <- (naa)/(nAA + nAa + naa)
Q
```

## Another approach for obtaining allele frequency

\[ f(A) = p = \frac{2 \times (\text{frequency of } AA) + 1 \times (\text{frequency of } Aa)}{2 \times (\text{frequency of } AA + Aa + aa)}. \]

### Exercise 3

Use the variables `P`

, `H`

, and `Q`

defined above and compute the allele frequencies of *A* and *a* in the first SNP.

```
p.ex3 <- (2 * P + 1 * H)/(2 * (P + H + Q))
p.ex3
q.ex3 <- 1 - p.ex3
q.ex3
```

### Exercise 4

What are the genotypic frequencies of `AA`

, `Aa`

, and `aa`

in the second SNP?

```
nAA <- table(W[, 2])[3]
nAa <- table(W[, 2])[2]
naa <- table(W[, 2])[1]
p <- (2 * nAA + 1 * nAa)/(2 * (nAA + nAa + naa))
p
q <- 1 - p
q
P <- (nAA)/(nAA + nAa + naa)
P
H <- (nAa)/(nAA + nAa + naa)
H
Q <- (naa)/(nAA + nAa + naa)
Q
```

## Compute allele frequencies for all SNPs

So far we have learned how to compute the allele frequency of a single SNP using the `table`

function. Here, we consider how to compute allele frequencies for the entire SNPs. Of course we can apply the `table`

function manually one at a time. However, this approach takes too much time to compute allele frequencies for 6,960 SNPs. Recall that allele frequency of *A* is given by \[
f(A) = p = \frac{2 \times (\text{no. of } AA \text{ individuals}) + 1 \times (\text{no. of } Aa \text{ individuals})}{2 \times \text{total no. of individuals}}.
\] We can rewrite this equation into \[
f(A) = p = \frac{(\text{no. of } A \text{ allele in the population})}{2 \times \text{total no. of individuals}}.
\] This suggests that all we need is the number of \(A\) allele or reference allele \(a\) for each SNP. The `sum`

function returns the number of reference allele \(A\).

```
sum(W[, 1]) # sum of A allele in the first SNP
sum(W[, 2]) # sum of A allele in the second SNP
```

How to repeat this operation for 300 SNPs? The `colSums`

function returns the sum of each column in a matrix as a vector.

`colSums(W)`

Note that `colSums(W)`

gives the numerator of the above equation. We then divide this number by \(2 \times \text{total no. of individuals}\). The function `nrows`

returns the number of rows.

`p <- colSums(W)/(2 * nrow(W))`

The variable `p`

is a vector and it contains the allele frequencies of reference allele for 6,960 SNPs.

### Exercise 5

What is the allele frequency of reference allele in the 50th SNP?

`p[50]`

## Minor allele frequency

In most cases, people report a minor allele frequency, which is the frequency of less frequent allele in a given SNP. We can convert allele frequencies into minor allele frquencies by using the `ifelse`

function.

`maf <- ifelse(p > 0.5, 1 - p, p)`

### Exercise 6

What is the minor allele frquency of reference allele in the 30th SNP?

`maf[30]`

### Exercise 7

What is the mean of minor allele frquencies?

`mean(maf)`

## Expectations and variances

### Exercise 8

Compute the allele frequency of SNP markers. Recall that the expectation of genotype, \(E(W)\), is given by \(2p\), where \(p\) is the frequency of reference allele. Verify that \(2p\) is equal to the mean of each genotype obtained from the `colMeans()`

function.

```
p3 <- (2 * colSums(W == 2) + 1 * colSums(W == 1))/(2 * nrow(W))
table(colMeans(W) == 2 * p3)
p <- colSums(W)/(2 * nrow(W))
table(colMeans(W) == 2 * p)
p2 <- colMeans(W)/2
table(p == p2)
```

### Exercise 9

Recall that the variance of genotype, \(Var(W)\), is given by \(2p(1-p)\). Verify that \(2p(1-p)\) is close to the variance of each genotype obtained from the `var()`

function.

```
var1 <- 2 * p * (1 - p)
var2 <- diag(var(W))
var2 <- apply(W, 2, var)
table(var1 == var2)
head(var1)
head(var2)
all.equal(var1, var2, tol = 0.05)
```

### Exercise 10

Create a new marker matrix `X`

from `W`

and recode markers so that three genotypes \(AA\), \(Aa\), and \(aa\) are coded as 1, 0, and -1, respectively.

`X <- W - 1`

Recall that the expectation of genotype, \(E(X)\), is given by \(2p-1\), where \(p\) is the frequency of reference allele. Verify that \(2p-1\) is equal to the mean of each genotype obtained from the `colMeans()`

function.

```
p2 <- (2 * colSums(X == 1) + 1 * colSums(X == 0))/(2 * nrow(X))
table(p == p2)
all.equal(2 * p - 1, colMeans(X), tol = 0.005)
```

### Exercise 12

Recall that the variance of genotype, \(Var(X)\), remains the same and is given by \(2p(1-p)\). Verify that \(2p(1-p)\) is close to the variance of each genotype obtained from the `var()`

function.

```
var1 <- 2 * p * (1 - p)
var2 <- var(X)
var2 <- apply(X, 2, var)
all.equal(var1, var2, tol = 0.05)
```

### Exercise 13

Verify that no matter how we code markers, centered marker codes, \(W - E(W)\) and \(X - E(X)\), remain the same.

### Exercise 14

We will recode the SNP genotypes so that now the major allele is treated as a reference allele. Store the new coding into the `W2`

variable.

```
W2 <- W
W2[W2 == 0] <- 3
W2[W2 == 2] <- 0
W2[W2 == 3] <- 2
```

Compute the allele freqeuncy of SNP markers using `W2`

. Compare your result with the allele frequency obtained from `W`

.

```
# Freq of A (W)
p <- colSums(W)/(2 * nrow(W))
# Freq of A (W2)
pW2 <- colSums(W2)/(2 * nrow(W2))
head(p)
head(pW2)
# SNP 1
table(W[, 1])
table(W2[, 1])
# SNP 16
table(W[, 16])
table(W2[, 16])
```