# Ordinary least squares and BLUE

# Overview

We will learn how to connect phenotypes and systematic effects (fixed effects) using ordinary least squares (OLS).

# Read file

Use the function `read.table()`

to read the phenotype file (1000withEffects.redangus).

```
dat <- read.table(file = file.choose(), header = TRUE, stringsAsFactors = FALSE)
dim(dat)
```

# Phenotypes

The `hist()`

function plot a histogram.

`hist(dat$BWT, col = "lightblue", main = "Histogram", xlab = "Birth weights")`

The argument `prob=TRUE`

returns a probability on the y axis. We use the `lines()`

function to add a density estimate along with the histogram.

```
hist(dat$BWT, col = "lightblue", main = "Histogram", prob = TRUE, ylim = c(0,
0.04), xlab = "Birth weights")
lines(density(dat$BWT), col = "red")
```

### Exercise 1

What are the mean, median, standard deviation, maximum, and minimum of birth weights?

# Systematic effects

We will estimate the effects of age of dams and sex of calves on birth weights. Letâ€™s first check the distributions of the systematic effects.

## Age of dams

The `hist()`

functions plot a histogram.

`hist(dat$AgeDam.mon., col = "lightblue", main = "Histogram", xlab = "Age of dams")`

Below is a histogram with the argumment `prob=TRUE`

.

```
hist(dat$AgeDam.mon., col = "lightblue", main = "Histogram", prob = TRUE, xlab = "Age of dams")
lines(density(dat$AgeDam.mon), col = "red")
```

### Exercise 2

What are the mean, median, standard deviation, maximum, and minimum of age of dams?

## Sex of calves

The `barplot()`

function generates a barplot.

```
barplot(table(dat$SEX), main = "Distribution", ylab = "Frequency", names.arg = c("Bulls",
"Cows", "Steers"), col = "lightblue")
```

Because there is only one steer, we treat him as a bull.

```
table(dat$SEX)
dat$SEX[dat$SEX == "S"] <- "B"
table(dat$SEX)
```

## Birth weights and age of dams

Here we visualize the relationship between age of dams and birth weights.

```
plot(dat$AgeDam.mon., dat$BWT, xlab = "BWT", ylab = "Age of dams")
abline(lm(dat$BWT ~ dat$AgeDam.mon.), lwd = 3, col = "red")
lines(lowess(dat$AgeDam.mon., dat$BWT), lwd = 4, col = "blue")
```

### Exercise 3

What is the correlation between age of dams and birth weights?

## Birth weights and sex of calves

The `boxplot()`

function generates distributions of birth weights for bulls and cows.

```
boxplot(dat$BWT[dat$SEX == "B"], dat$BWT[dat$SEX == "C"], names = c("Bulls",
"Cows"), ylab = "Birth weights")
```

### Exercise 4

What is the mean difference between birth weights of bulls and cows?

# Ordinary least squares

We will use the `lm()`

function to fit OLS. This function offers an intuitive formula syntax. The `summary()`

function summarizes a model fit.

```
fit <- lm(BWT ~ AgeDam.mon. + factor(SEX), data = dat)
summary(fit)
```

# Save R objects

Use the `save()`

function to save variables as R objects.

```
save(dat, file = "dat_day17.Rda")
save(fit, file = "fit_day17.Rda")
```

### Exercise 5

Comfirm the estimates of p-values from t-statistics by using the `pt`

function. The `pt()`

function gives the probability density function.

### Exercise 6

Explain why the degrees of freedom is 997.

### Exercise 7

Comfirm the residual standard error by using the `predict()`

function. This function returns \(\hat{\mathbf{y}}\). Note that the residual standard error is given by \[
\begin{align*}
\hat{\sigma}^2_e &= \frac{\sum_{i=1}^{n} (y_i - \sum_{j=1}^{m} x_{ij}\hat{b}_{j})^2 }{n - k - 1} \\
&= \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }{n - k - 1}. \\
\end{align*}
\]

### Exercise 8

Compute the multiple R-squared (\(R^2\)) by yourself. The multiple R-squared indicates the proportion of explained variance. It is given by the squared correlation between the observed and predicted values. \[ R^2 = cor(\mathbf{y}, \hat{\mathbf{y}})^2 \]

### Exercise 9

Compute the adjusted multiple R-squared by yourself. The adjusted multiple R-squared is given by \[ R^2_{adj} = 1 - (1 - R^2) \left(\frac{n-1}{n - k - 1} \right). \]