Repeatability and Heritability

Overview

We will learn how to compute repeatability and heritability using an analysis of variance (ANOVA) in R.

Reading file

Use the function read_excel in the readxl package to read the pedigree file (Simdata.xlsx) in a data frame format.

library(readxl)
dat <- read_excel(file.choose())
dim(dat)
head(dat)

Pre-adjustment of phenotypes

We will calculate repeatability by considering calf body weights (BW) at 205 day as repeated records on the sire. The ANOVA discussed in class is called one-way ANOVA, which assumes that there are no other factors or covariates influencing repeated records. This is not true for the beef cattle dataset because Sex and AgeDamCat are known to contribute to BW. Pre-correction of BW is therefore required before applying the ANOVA. One way to achieve this is to fit ordinary least squares (OLS). Below we will learn how to connect phenotypes and systematic effects (fixed effects) using OLS.

Phenotypes

The hist() function plots a histogram.

hist(dat$BW_205d, col = "lightblue", main = "Histogram", xlab = "BW")

The argument prob=TRUE returns a probability on the y axis. We use the lines() function to add a density estimate along with the histogram.

hist(dat$BW_205d, col = "lightblue", main = "Histogram", prob = TRUE, ylim = c(0, 
    0.01), xlab = "BW")
lines(density(dat$BW_205d), col = "red")

Exercise 1

What are the mean, median, standard deviation, maximum, and minimum of BW?

Systematic effects

We will estimate the effects of dam age categories and sex of calves on BW. Let’s first check the distributions of the systematic effects.

Age of dams

The hist() functions plots a histogram.

hist(dat$AgeDamCat, col = "lightblue", main = "Histogram", xlab = "Age of dams")

Below is a histogram with the argumment prob=TRUE.

hist(dat$AgeDamCat, col = "lightblue", main = "Histogram", prob = TRUE, xlab = "Age of dams")
lines(density(dat$AgeDamCat), col = "red")

Exercise 2

What are the mean, median, standard deviation, maximum, and minimum of dam age categories?

Sex of calves

The barplot() function generates a barplot.

barplot(table(dat$Sex), main = "Distribution", ylab = "Frequency", names.arg = c("Female", 
    "Male"), col = "lightblue")

BW and age of dams

Here we visualize the relationship between age of dams and BW.

plot(dat$AgeDamCat, dat$BW_205d, xlab = "Age of dams", ylab = "BW_205d")
abline(lm(dat$BW_205d ~ dat$AgeDamCat), lwd = 3, col = "red")
lines(lowess(dat$AgeDamCat, dat$BW_205d), lwd = 4, col = "blue")

Exercise 3

What is the correlation between dam age categories and BW?

BW and sex of calves

The boxplot() function generates distributions of BW for bulls and cows.

boxplot(dat$BW_205d[dat$Sex == "M"], dat$BW_205d[dat$Sex == "F"], names = c("Male", 
    "Female"), ylab = "BW")

Exercise 4

What is the mean difference between BW of bulls and cows?

Ordinary least squares

We will use the lm() function to fit OLS and adjust for covariates. This function offers an intuitive formula syntax. The summary() function summarizes a model fit.

fit <- lm(BW_205d ~ factor(AgeDamCat) + factor(Sex), data = dat)
summary(fit)
residuals(fit)

Exercise 5

Check whether the distributions of BW for bulls and cows are similar in residuals(fit). Use the boxplot() function.

BW and age of dams

Visualize the relationship between age of dams and BW after the correction.

plot(dat$AgeDamCat, residuals(fit), xlab = "Age of dams", ylab = "BW_205d")
abline(lm(residuals(fit) ~ dat$AgeDamCat), lwd = 3, col = "red")
lines(lowess(dat$AgeDamCat, residuals(fit)), lwd = 4, col = "blue")

Number of records per sire

Let’s investigate how many records each sire has. The table() functions returns a frequency table and the barplot() function generates a barplot.

table(dat$Sire)
barplot(table(dat$Sire), xlab = "Sire IDs")

Exercise 6

Count the number of sires that has only one record. Again use the table() and barplot() functions.

Unbalanced data

The ANOVA we discussed in class is called balanced ANOVA because the number of records per bird was fixed with n = 10. Unlike such an experimental design, data from observatinal studies are hihgly unbalanced as in the case of our simulated beef cattle dataset. Because the number of records per sire varies, we will calculate an adjusted n or k1 according to Becker (1975). This is given by \[ \begin{align*} k_1 &= \frac{1}{s-1} \times (N - \frac{\sum_{i}^{s}n_i^2}{N}) \end{align*}, \] where \(s\) is the number of sires, \(N\) is the total number of observations, and \(n_i\) is the number of records from \(i\)th sire. The following R code computes \(s\), \(N\), and \(\sum_{i}^{s}n_i^2\).

N <- nrow(dat)
N
s <- length(table(dat$Sire))
s
ni2_sum <- sum(table(dat$Sire)^2)
ni2_sum

Exercise 7

Compute k1 which is the adjusted number of records per sire.

Exercise 8

Compare k1 with the mean of number of records per sire.

ANOVA

We will fit the ANOVA to the simulated beef cattle dataset. The aov() function performs ANOVA. This function takes two arguments: a vector of observations and a vector of group variables.

`?`(aov)
my.aov <- aov(residuals(fit) ~ factor(dat$Sire))
my.aov

The summary() functions returns an ANOVA table.

aov.summary <- summary(my.aov)
aov.summary

Because the object aov.summary is a list, we can use the double square brackets followed by the single square brackets to access each number in the ANOVA table.

str(aov.summary)
aov.summary[[1]]
MSB <- aov.summary[[1]][1, 3]
MSB
MSW <- aov.summary[[1]][2, 3]
MSW

Exercise 9

Recall that repeatability is the corrleation between repeated records on an individual. This is given by \[ \begin{align*} r &= \frac{V(B)}{V(B) + V(W)} \end{align*}, \] where \(V(B) = \frac{MS_B - MS_W}{k_1}\) and \(V(W) = MS_W\). We can now compute repeatability while controlling for covariates. Report the estimate of repeatability.

Exercise 10

Recall that the heritability is the percentage of phenotypic difference in the parent observed in the offspring. Report the estimate of heritability using the sire model.

References

  • Becker WA. Manual of Quantitative Genetics. 1975. Program in Genetic, Washington State University, Pullman, Washington. [Amazon] [Book review]

Gota Morota

February 20, 2018