# Overview

We will learn how to connect phenotypes and systematic effects (fixed effects) using ordinary least squares (OLS).

Use the function read.table() to read the phenotype file (1000withEffects.redangus).

dat <- read.table(file = file.choose(), header = TRUE, stringsAsFactors = FALSE)
dim(dat)

# Phenotypes

The hist() function plot a histogram.

hist(dat$BWT, col = "lightblue", main = "Histogram", xlab = "Birth weights") The argument prob=TRUE returns a probability on the y axis. We use the lines() function to add a density estimate along with the histogram. hist(dat$BWT, col = "lightblue", main = "Histogram", prob = TRUE, ylim = c(0,
0.04), xlab = "Birth weights")
lines(density(dat$BWT), col = "red") ### Exercise 1 What are the mean, median, standard deviation, maximum, and minimum of birth weights? # Systematic effects We will estimate the effects of age of dams and sex of calves on birth weights. Letâ€™s first check the distributions of the systematic effects. ## Age of dams The hist() functions plot a histogram. hist(dat$AgeDam.mon., col = "lightblue", main = "Histogram", xlab = "Age of dams")

Below is a histogram with the argumment prob=TRUE.

hist(dat$AgeDam.mon., col = "lightblue", main = "Histogram", prob = TRUE, xlab = "Age of dams") lines(density(dat$AgeDam.mon), col = "red")

### Exercise 2

What are the mean, median, standard deviation, maximum, and minimum of age of dams?

## Sex of calves

The barplot() function generates a barplot.

barplot(table(dat$SEX), main = "Distribution", ylab = "Frequency", names.arg = c("Bulls", "Cows", "Steers"), col = "lightblue") Because there is only one steer, we treat him as a bull. table(dat$SEX)
dat$SEX[dat$SEX == "S"] <- "B"
table(dat$SEX) ## Birth weights and age of dams Here we visualize the relationship between age of dams and birth weights. plot(dat$AgeDam.mon., dat$BWT, xlab = "BWT", ylab = "Age of dams") abline(lm(dat$BWT ~ dat$AgeDam.mon.), lwd = 3, col = "red") lines(lowess(dat$AgeDam.mon., dat$BWT), lwd = 4, col = "blue") ### Exercise 3 What is the correlation between age of dams and birth weights? ## Birth weights and sex of calves The boxplot() function generates distributions of birth weights for bulls and cows. boxplot(dat$BWT[dat$SEX == "B"], dat$BWT[dat\$SEX == "C"], names = c("Bulls",
"Cows"), ylab = "Birth weights")

### Exercise 4

What is the mean difference between birth weights of bulls and cows?

# Ordinary least squares

We will use the lm() function to fit OLS. This function offers an intuitive formula syntax. The summary() function summarizes a model fit.

fit <- lm(BWT ~ AgeDam.mon. + factor(SEX), data = dat)
summary(fit)

# Save R objects

Use the save() function to save variables as R objects.

save(dat, file = "dat_day17.Rda")
save(fit, file = "fit_day17.Rda")

### Exercise 5

Comfirm the estimates of p-values from t-statistics by using the pt function. The pt() function gives the probability density function.

### Exercise 6

Explain why the degrees of freedom is 997.

### Exercise 7

Comfirm the residual standard error by using the predict() function. This function returns $$\hat{\mathbf{y}}$$. Note that the residual standard error is given by \begin{align*} \hat{\sigma}^2_e &= \frac{\sum_{i=1}^{n} (y_i - \sum_{j=1}^{m} x_{ij}\hat{b}_{j})^2 }{n - k - 1} \\ &= \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }{n - k - 1}. \\ \end{align*}

### Exercise 8

Compute the multiple R-squared ($$R^2$$) by yourself. The multiple R-squared indicates the proportion of explained variance. It is given by the squared correlation between the observed and predicted values. $R^2 = cor(\mathbf{y}, \hat{\mathbf{y}})^2$

### Exercise 9

Compute the adjusted multiple R-squared by yourself. The adjusted multiple R-squared is given by $R^2_{adj} = 1 - (1 - R^2) \left(\frac{n-1}{n - k - 1} \right).$

March 7, 2017