Ordinary least squares and BLUE
Overview
We will learn how to connect phenotypes and systematic effects (fixed effects) using ordinary least squares (OLS).
Read file
Use the function read.table()
to read the phenotype file (1000withEffects.redangus).
dat <- read.table(file = file.choose(), header = TRUE, stringsAsFactors = FALSE)
dim(dat)
Phenotypes
The hist()
function plot a histogram.
hist(dat$BWT, col = "lightblue", main = "Histogram", xlab = "Birth weights")
The argument prob=TRUE
returns a probability on the y axis. We use the lines()
function to add a density estimate along with the histogram.
hist(dat$BWT, col = "lightblue", main = "Histogram", prob = TRUE, ylim = c(0,
0.04), xlab = "Birth weights")
lines(density(dat$BWT), col = "red")
Exercise 1
What are the mean, median, standard deviation, maximum, and minimum of birth weights?
Systematic effects
We will estimate the effects of age of dams and sex of calves on birth weights. Let’s first check the distributions of the systematic effects.
Age of dams
The hist()
functions plot a histogram.
hist(dat$AgeDam.mon., col = "lightblue", main = "Histogram", xlab = "Age of dams")
Below is a histogram with the argumment prob=TRUE
.
hist(dat$AgeDam.mon., col = "lightblue", main = "Histogram", prob = TRUE, xlab = "Age of dams")
lines(density(dat$AgeDam.mon), col = "red")
Exercise 2
What are the mean, median, standard deviation, maximum, and minimum of age of dams?
Sex of calves
The barplot()
function generates a barplot.
barplot(table(dat$SEX), main = "Distribution", ylab = "Frequency", names.arg = c("Bulls",
"Cows", "Steers"), col = "lightblue")
Because there is only one steer, we treat him as a bull.
table(dat$SEX)
dat$SEX[dat$SEX == "S"] <- "B"
table(dat$SEX)
Birth weights and age of dams
Here we visualize the relationship between age of dams and birth weights.
plot(dat$AgeDam.mon., dat$BWT, xlab = "BWT", ylab = "Age of dams")
abline(lm(dat$BWT ~ dat$AgeDam.mon.), lwd = 3, col = "red")
lines(lowess(dat$AgeDam.mon., dat$BWT), lwd = 4, col = "blue")
Exercise 3
What is the correlation between age of dams and birth weights?
Birth weights and sex of calves
The boxplot()
function generates distributions of birth weights for bulls and cows.
boxplot(dat$BWT[dat$SEX == "B"], dat$BWT[dat$SEX == "C"], names = c("Bulls",
"Cows"), ylab = "Birth weights")
Exercise 4
What is the mean difference between birth weights of bulls and cows?
Ordinary least squares
We will use the lm()
function to fit OLS. This function offers an intuitive formula syntax. The summary()
function summarizes a model fit.
fit <- lm(BWT ~ AgeDam.mon. + factor(SEX), data = dat)
summary(fit)
Save R objects
Use the save()
function to save variables as R objects.
save(dat, file = "dat_day17.Rda")
save(fit, file = "fit_day17.Rda")
Exercise 5
Comfirm the estimates of p-values from t-statistics by using the pt
function. The pt()
function gives the probability density function.
Exercise 6
Explain why the degrees of freedom is 997.
Exercise 7
Comfirm the residual standard error by using the predict()
function. This function returns \(\hat{\mathbf{y}}\). Note that the residual standard error is given by \[
\begin{align*}
\hat{\sigma}^2_e &= \frac{\sum_{i=1}^{n} (y_i - \sum_{j=1}^{m} x_{ij}\hat{b}_{j})^2 }{n - k - 1} \\
&= \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }{n - k - 1}. \\
\end{align*}
\]
Exercise 8
Compute the multiple R-squared (\(R^2\)) by yourself. The multiple R-squared indicates the proportion of explained variance. It is given by the squared correlation between the observed and predicted values. \[ R^2 = cor(\mathbf{y}, \hat{\mathbf{y}})^2 \]
Exercise 9
Compute the adjusted multiple R-squared by yourself. The adjusted multiple R-squared is given by \[ R^2_{adj} = 1 - (1 - R^2) \left(\frac{n-1}{n - k - 1} \right). \]