Repeatability and Heritability


We will learn how to compute repeatability and heritability using an analysis of variance (ANOVA) in R.

Reading file

Use the function read.table to read the phenotype file (1000withEffects.redangus) in a data frame format.

dat <- read.table(file = file.choose(), header = TRUE, stringsAsFactors = FALSE)

Pre-adjustment of phenotypes

We will calculate repeatability by considering calf birth weights as repeated records on the sire. The ANOVA discussed in class is called one-way ANOVA, which assumes that there are no other factors or covariates influencing repeated records. This is not true for the beef cattle dataset because SEX and AgeDam.mon. are known to contribute to birth weights. Pre-correction of birth weights is therefore required before applying the ANOVA.

Adjustment of SEX effect

To pre-correct the effect of SEX, we will first calculate the difference between means of bulls and cows and then add back the mean difference to cows’ birth weight records. The table() function returns the number of bulls and cows.


Let’s compute the mean of cow birth weights first. We will begin with obtaining a vector of indices. The indexC indicates which animals are cows.

indexC <- which(dat$SEX == "C")
table(dat[indexC, "SEX"])

We use the vector of indices indexC to compute the mean of cow birth weights.

BWT_C <- dat$BWT[indexC]
muC <- mean(BWT_C)

We apply the same code for bulls. The indexB tells us which animals are bulls.

indexB <- which(dat$SEX != "C")
table(dat[indexB, "SEX"])

Because we have 1000 animals in total, the sum of length of indexC and length of indexB is 1000.

length(indexC) + length(indexB)

Similarly, we use the vector of indices indexB to compute the mean of bull birth weights.

BWT_B <- dat$BWT[indexB]
muB <- mean(BWT_B)

We can visualize the means and distributions of birth weights by using a boxplot.

boxplot(BWT_B, BWT_C, names = c("Bulls", "Cows"))

The mean difference between bull and cow birth weights is the difference between muB and muC.

adj <- muB - muC

We then copy the data frame dat to a new data frame dat2 and add adj to the cow birth weights.

dat2 <- dat
dat2$BWT[indexC] <- dat2$BWT[indexC] + adj

Exercise 1

Check whether the means of birth weights for cows and bulls are the same in dat2.

Exercise 2

Plot a boxplot of cow and bull birth weights in dat2.

Adjustment of AgeDam.mon. effect

We will account for the age of dam effect by restricting the data to 48 months and older. We will use the subset() function to subset the dat2 and create a new data frame dat3.

dat3 <- subset(dat2, AgeDam.mon. >= 48)

Number of records per sire

Let’s investigate how many records each sire has. The table() functions returns a frequency table and the barplot() function generates a barplot.

barplot(table(dat3$SIRE), xlab = "Sire IDs")

Now we count the number of sires that has only one record. Again we will use the table() and barplot() functions.

barplot(table(table(dat3$SIRE)), xlab = "Number of records", ylab = "Number of sires")

We will remove the sires with one record because they do not add any information. We will do this first by finding a vector of sire indices with the number of records greater or equal to two.

num.rec <- table(dat3$SIRE)

index3 <- which(num.rec != 1)

Exercise 3

Show that the difference between the lengths of num.rec and index3 is equal to the number of sires with only one record.

We will now use the vector index3 to find sires with the number of records greater or equal to two. The sire object contains IDs of those sires.

sire <- names(num.rec[index3])

Then we only select records provided by those sires by using the %in% operator. We will use the data frame dat4 to compute repeatability and heritability.

index4 <- dat3$SIRE %in% sire
dat4 <- dat3[index4, ]

Unbalanced data

The ANOVA we discussed in class is called balanced ANOVA because the number of records per turkey was fixed with n = 10. Unlike such an experimental design, data from observatinal studies are hihgly unbalanced as in the case of our beef cattle dataset. Because the number of records per sire varies, we will calculate an adjusted n or k1 according to Becker (1975). This is given by \[ \begin{align*} k_1 &= \frac{1}{s} \times (N - \frac{\sum_{i}^{s}n_i^2}{N}) \end{align*}, \] where \(s\) is the number of sires, \(N\) is the total number of observations, and \(n_i\) is the number of records from \(i\)th sire. The following R code computes \(s\), \(N\), and \(\sum_{i}^{s}n_i^2\).

N <- nrow(dat4)
s <- length(table(dat4$SIRE))
ni2_sum <- sum(table(dat4$SIRE)^2)

Exercise 4

Compute k1 which is the adjusted number of records per sire.

Exercise 5

Compare k1 with the mean of number of records per sire.


We will fit the ANOVA to the beef cattle dataset. The aov() function performs ANOVA. This function takes two arguments: a vector of observations and a vector of group variables.

my.aov <- aov(dat4$BWT ~ factor(dat4$SIRE))

The summary() functions returns an ANOVA table.

aov.summary <- summary(my.aov)

Because the object aov.summary is a list, we can use the double square brackets followed by the single square brackets to access each number in the ANOVA table.

MSB <- aov.summary[[1]][1, 3]
MSW <- aov.summary[[1]][2, 3]

Exercise 6

Recall that repeatability is the corrleation between repeated records on an individual. This is given by \[ \begin{align*} r &= \frac{V(B)}{V(B) + V(W)} \end{align*}, \] where \(V(B) = \frac{MS_B - MS_W}{k_1}\) and \(V(W) = MS_W\). We can now compute repeatability while controlling for SEX and AgeDam.mon. Report the estimate of repeatability.

Exercise 7

Recall that the heritability is the percentage of phenotypic difference in the parent observed in the offspring. Report the estimate of heritability using a sire model.


  • Becker WA. Manual of Quantitative Genetics. 1975. Program in Genetic, Washington State University, Pullman, Washington. [Amazon] [Book review]

Gota Morota

February 09, 2017