# Repeatability and Heritability

# Overview

We will learn how to compute repeatability and heritability using an analysis of variance (ANOVA) in R.

# Reading file

Use the function `read.table`

to read the phenotype file (1000withEffects.redangus) in a data frame format.

```
dat <- read.table(file = file.choose(), header = TRUE, stringsAsFactors = FALSE)
head(dat)
```

# Pre-adjustment of phenotypes

We will calculate repeatability by considering calf birth weights as repeated records on the sire. The ANOVA discussed in class is called one-way ANOVA, which assumes that there are no other factors or covariates influencing repeated records. This is not true for the beef cattle dataset because `SEX`

and `AgeDam.mon.`

are known to contribute to birth weights. Pre-correction of birth weights is therefore required before applying the ANOVA.

## Adjustment of `SEX`

effect

To pre-correct the effect of `SEX`

, we will first calculate the difference between means of bulls and cows and then add back the mean difference to cows’ birth weight records. The `table()`

function returns the number of bulls and cows.

`table(dat$SEX)`

Let’s compute the mean of cow birth weights first. We will begin with obtaining a vector of indices. The `indexC`

indicates which animals are cows.

```
indexC <- which(dat$SEX == "C")
head(indexC)
length(indexC)
table(dat[indexC, "SEX"])
```

We use the vector of indices `indexC`

to compute the mean of cow birth weights.

```
BWT_C <- dat$BWT[indexC]
muC <- mean(BWT_C)
muC
```

We apply the same code for bulls. The `indexB`

tells us which animals are bulls.

```
indexB <- which(dat$SEX != "C")
head(indexB)
length(indexB)
table(dat[indexB, "SEX"])
```

Because we have 1000 animals in total, the sum of length of `indexC`

and length of `indexB`

is 1000.

`length(indexC) + length(indexB)`

Similarly, we use the vector of indices `indexB`

to compute the mean of bull birth weights.

```
BWT_B <- dat$BWT[indexB]
muB <- mean(BWT_B)
muB
```

We can visualize the means and distributions of birth weights by using a boxplot.

`boxplot(BWT_B, BWT_C, names = c("Bulls", "Cows"))`

The mean difference between bull and cow birth weights is the difference between `muB`

and `muC`

.

```
adj <- muB - muC
adj
```

We then copy the data frame `dat`

to a new data frame `dat2`

and add `adj`

to the cow birth weights.

```
dat2 <- dat
dat2$BWT[indexC] <- dat2$BWT[indexC] + adj
```

### Exercise 1

Check whether the means of birth weights for cows and bulls are the same in `dat2`

.

### Exercise 2

Plot a boxplot of cow and bull birth weights in `dat2`

.

## Adjustment of `AgeDam.mon.`

effect

We will account for the age of dam effect by restricting the data to 48 months and older. We will use the `subset()`

function to subset the `dat2`

and create a new data frame `dat3`

.

```
dat3 <- subset(dat2, AgeDam.mon. >= 48)
dim(dat3)
table(dat3$SEX)
table(dat3$AgeDam.mon.)
```

# Number of records per sire

Let’s investigate how many records each sire has. The `table()`

functions returns a frequency table and the `barplot()`

function generates a barplot.

```
table(dat3$SIRE)
barplot(table(dat3$SIRE), xlab = "Sire IDs")
```

Now we count the number of sires that has only one record. Again we will use the `table()`

and `barplot()`

functions.

```
table(table(dat3$SIRE))
barplot(table(table(dat3$SIRE)), xlab = "Number of records", ylab = "Number of sires")
```

We will remove the sires with one record because they do not add any information. We will do this first by finding a vector of sire indices with the number of records greater or equal to two.

```
num.rec <- table(dat3$SIRE)
head(num.rec)
length(num.rec)
index3 <- which(num.rec != 1)
head(index3)
length(index3)
```

### Exercise 3

Show that the difference between the lengths of `num.rec`

and `index3`

is equal to the number of sires with only one record.

We will now use the vector `index3`

to find sires with the number of records greater or equal to two. The `sire`

object contains IDs of those sires.

```
sire <- names(num.rec[index3])
head(sire)
```

Then we only select records provided by those sires by using the `%in%`

operator. We will use the data frame `dat4`

to compute repeatability and heritability.

```
index4 <- dat3$SIRE %in% sire
head(index4)
dat4 <- dat3[index4, ]
dim(dat4)
```

# Unbalanced data

The ANOVA we discussed in class is called balanced ANOVA because the number of records per turkey was fixed with `n`

= 10. Unlike such an experimental design, data from observatinal studies are hihgly unbalanced as in the case of our beef cattle dataset. Because the number of records per sire varies, we will calculate an adjusted `n`

or `k1`

according to Becker (1975). This is given by \[
\begin{align*}
k_1 &= \frac{1}{s} \times (N - \frac{\sum_{i}^{s}n_i^2}{N})
\end{align*},
\] where \(s\) is the number of sires, \(N\) is the total number of observations, and \(n_i\) is the number of records from \(i\)th sire. The following R code computes \(s\), \(N\), and \(\sum_{i}^{s}n_i^2\).

```
N <- nrow(dat4)
N
s <- length(table(dat4$SIRE))
s
ni2_sum <- sum(table(dat4$SIRE)^2)
ni2_sum
```

### Exercise 4

Compute `k1`

which is the adjusted number of records per sire.

### Exercise 5

Compare `k1`

with the mean of number of records per sire.

# ANOVA

We will fit the ANOVA to the beef cattle dataset. The `aov()`

function performs ANOVA. This function takes two arguments: a vector of observations and a vector of group variables.

```
`?`(aov)
my.aov <- aov(dat4$BWT ~ factor(dat4$SIRE))
my.aov
```

The `summary()`

functions returns an ANOVA table.

```
aov.summary <- summary(my.aov)
aov.summary
```

Because the object `aov.summary`

is a list, we can use the double square brackets followed by the single square brackets to access each number in the ANOVA table.

```
str(aov.summary)
aov.summary[[1]]
MSB <- aov.summary[[1]][1, 3]
MSB
MSW <- aov.summary[[1]][2, 3]
MSW
```

### Exercise 6

Recall that repeatability is the corrleation between repeated records on an individual. This is given by \[
\begin{align*}
r &= \frac{V(B)}{V(B) + V(W)}
\end{align*},
\] where \(V(B) = \frac{MS_B - MS_W}{k_1}\) and \(V(W) = MS_W\). We can now compute repeatability while controlling for `SEX`

and `AgeDam.mon`

. Report the estimate of repeatability.

### Exercise 7

Recall that the heritability is the percentage of phenotypic difference in the parent observed in the offspring. Report the estimate of heritability using a sire model.

# References

- Becker WA. Manual of Quantitative Genetics. 1975. Program in Genetic, Washington State University, Pullman, Washington. [Amazon] [Book review]