Repeatability and Heritability
Overview
We will learn how to compute repeatability and heritability using an analysis of variance (ANOVA) in R.
Reading file
Use the function read.table
to read the phenotype file (1000withEffects.redangus) in a data frame format.
dat <- read.table(file = file.choose(), header = TRUE, stringsAsFactors = FALSE)
head(dat)
Pre-adjustment of phenotypes
We will calculate repeatability by considering calf birth weights as repeated records on the sire. The ANOVA discussed in class is called one-way ANOVA, which assumes that there are no other factors or covariates influencing repeated records. This is not true for the beef cattle dataset because SEX
and AgeDam.mon.
are known to contribute to birth weights. Pre-correction of birth weights is therefore required before applying the ANOVA.
Adjustment of SEX
effect
To pre-correct the effect of SEX
, we will first calculate the difference between means of bulls and cows and then add back the mean difference to cows’ birth weight records. The table()
function returns the number of bulls and cows.
table(dat$SEX)
Let’s compute the mean of cow birth weights first. We will begin with obtaining a vector of indices. The indexC
indicates which animals are cows.
indexC <- which(dat$SEX == "C")
head(indexC)
length(indexC)
table(dat[indexC, "SEX"])
We use the vector of indices indexC
to compute the mean of cow birth weights.
BWT_C <- dat$BWT[indexC]
muC <- mean(BWT_C)
muC
We apply the same code for bulls. The indexB
tells us which animals are bulls.
indexB <- which(dat$SEX != "C")
head(indexB)
length(indexB)
table(dat[indexB, "SEX"])
Because we have 1000 animals in total, the sum of length of indexC
and length of indexB
is 1000.
length(indexC) + length(indexB)
Similarly, we use the vector of indices indexB
to compute the mean of bull birth weights.
BWT_B <- dat$BWT[indexB]
muB <- mean(BWT_B)
muB
We can visualize the means and distributions of birth weights by using a boxplot.
boxplot(BWT_B, BWT_C, names = c("Bulls", "Cows"))
The mean difference between bull and cow birth weights is the difference between muB
and muC
.
adj <- muB - muC
adj
We then copy the data frame dat
to a new data frame dat2
and add adj
to the cow birth weights.
dat2 <- dat
dat2$BWT[indexC] <- dat2$BWT[indexC] + adj
Exercise 1
Check whether the means of birth weights for cows and bulls are the same in dat2
.
Exercise 2
Plot a boxplot of cow and bull birth weights in dat2
.
Adjustment of AgeDam.mon.
effect
We will account for the age of dam effect by restricting the data to 48 months and older. We will use the subset()
function to subset the dat2
and create a new data frame dat3
.
dat3 <- subset(dat2, AgeDam.mon. >= 48)
dim(dat3)
table(dat3$SEX)
table(dat3$AgeDam.mon.)
Number of records per sire
Let’s investigate how many records each sire has. The table()
functions returns a frequency table and the barplot()
function generates a barplot.
table(dat3$SIRE)
barplot(table(dat3$SIRE), xlab = "Sire IDs")
Now we count the number of sires that has only one record. Again we will use the table()
and barplot()
functions.
table(table(dat3$SIRE))
barplot(table(table(dat3$SIRE)), xlab = "Number of records", ylab = "Number of sires")
We will remove the sires with one record because they do not add any information. We will do this first by finding a vector of sire indices with the number of records greater or equal to two.
num.rec <- table(dat3$SIRE)
head(num.rec)
length(num.rec)
index3 <- which(num.rec != 1)
head(index3)
length(index3)
Exercise 3
Show that the difference between the lengths of num.rec
and index3
is equal to the number of sires with only one record.
We will now use the vector index3
to find sires with the number of records greater or equal to two. The sire
object contains IDs of those sires.
sire <- names(num.rec[index3])
head(sire)
Then we only select records provided by those sires by using the %in%
operator. We will use the data frame dat4
to compute repeatability and heritability.
index4 <- dat3$SIRE %in% sire
head(index4)
dat4 <- dat3[index4, ]
dim(dat4)
Unbalanced data
The ANOVA we discussed in class is called balanced ANOVA because the number of records per turkey was fixed with n
= 10. Unlike such an experimental design, data from observatinal studies are hihgly unbalanced as in the case of our beef cattle dataset. Because the number of records per sire varies, we will calculate an adjusted n
or k1
according to Becker (1975). This is given by \[
\begin{align*}
k_1 &= \frac{1}{s} \times (N - \frac{\sum_{i}^{s}n_i^2}{N})
\end{align*},
\] where \(s\) is the number of sires, \(N\) is the total number of observations, and \(n_i\) is the number of records from \(i\)th sire. The following R code computes \(s\), \(N\), and \(\sum_{i}^{s}n_i^2\).
N <- nrow(dat4)
N
s <- length(table(dat4$SIRE))
s
ni2_sum <- sum(table(dat4$SIRE)^2)
ni2_sum
Exercise 4
Compute k1
which is the adjusted number of records per sire.
Exercise 5
Compare k1
with the mean of number of records per sire.
ANOVA
We will fit the ANOVA to the beef cattle dataset. The aov()
function performs ANOVA. This function takes two arguments: a vector of observations and a vector of group variables.
`?`(aov)
my.aov <- aov(dat4$BWT ~ factor(dat4$SIRE))
my.aov
The summary()
functions returns an ANOVA table.
aov.summary <- summary(my.aov)
aov.summary
Because the object aov.summary
is a list, we can use the double square brackets followed by the single square brackets to access each number in the ANOVA table.
str(aov.summary)
aov.summary[[1]]
MSB <- aov.summary[[1]][1, 3]
MSB
MSW <- aov.summary[[1]][2, 3]
MSW
Exercise 6
Recall that repeatability is the corrleation between repeated records on an individual. This is given by \[
\begin{align*}
r &= \frac{V(B)}{V(B) + V(W)}
\end{align*},
\] where \(V(B) = \frac{MS_B - MS_W}{k_1}\) and \(V(W) = MS_W\). We can now compute repeatability while controlling for SEX
and AgeDam.mon
. Report the estimate of repeatability.
Exercise 7
Recall that the heritability is the percentage of phenotypic difference in the parent observed in the offspring. Report the estimate of heritability using a sire model.
References
- Becker WA. Manual of Quantitative Genetics. 1975. Program in Genetic, Washington State University, Pullman, Washington. [Amazon] [Book review]