Review of basic statistics

Introduction

We will learn how to compute basic statistics in R using a subset of a beef cattle data set.

Read a file

Read the phenotypic data

We will first set the path to the phenotypic data set. Insert your path to the file inside the double quotation marks " ". Your path may differ from the one below if you stored the file in a different directory.

FILE <- "../data/sub/1000withEffects.redangus"

The read.table function reads a file in table format and creates a data frame from it. Calling help() with a function name opens that function's documentation page.

help(read.table)

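The read.table function has many arguments, but we will use only three in this exercise: file (the path to the file), header (whether the first line contains the column names), and stringsAsFactors (whether character columns are converted to factors).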

dat <- read.table(file = FILE, header = TRUE, stringsAsFactors = FALSE)
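To see what the three arguments do without the course file, the sketch below writes a small whitespace-delimited table to a temporary file and reads it back. The file contents, the ID column, and the values are hypothetical and only illustrate read.table; they are not the beef cattle data.

```r
# Hypothetical toy file: NOT the beef cattle data, only an illustration
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID BWT CE",
             "a1 80 1",
             "a2 75 2",
             "a3 90 1"), tmp)

# header = TRUE: the first line holds the column names
# stringsAsFactors = FALSE: character columns such as ID stay character
toy <- read.table(file = tmp, header = TRUE, stringsAsFactors = FALSE)
str(toy)      # 3 observations of 3 variables: ID, BWT, CE
unlink(tmp)   # remove the temporary file
```

Since R 4.0.0 the default of stringsAsFactors is already FALSE, so setting it explicitly mainly matters when the same script runs on older R versions.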

The head function returns the first six rows of a data frame. The number of rows returned can be changed with the argument n.

head(dat)
head(dat, n = 10)

The dim function returns the dimensions of a data frame: the number of rows followed by the number of columns.

dim(dat)

We will use two phenotypes to compute basic statistics. Each column of a data frame can be accessed with the $ operator followed by the column name.

dat$BWT
dat$CE

The length function returns the length of a vector, that is, its number of elements.

length(dat$BWT)
length(dat$CE)
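The same functions can be tried on a small stand-in data frame. The sketch below assumes a hypothetical toy data frame with made-up BWT and CE values; it shows that head, dim, the $ operator, and length fit together, since a column extracted with $ has one element per row.

```r
# Hypothetical toy data frame standing in for dat (values are made up)
toy <- data.frame(BWT = c(80, 75, 90, 85, 70, 95, 88),
                  CE  = c(1, 2, 1, 1, 3, 1, 2))

head(toy)          # first 6 of the 7 rows
head(toy, n = 3)   # first 3 rows
dim(toy)           # 7 2: rows, then columns
toy$BWT            # one column as a numeric vector
length(toy$BWT)    # 7: a column has one element per row
```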

Mean

The mean function computes the mean of a vector.

mean(dat$BWT)
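The mean function agrees with the textbook definition, the sum of the values divided by their number. The sketch below uses a hypothetical vector of weights; it also shows that a single missing value makes mean return NA unless na.rm = TRUE is set, which is worth checking on real phenotype files.

```r
x <- c(80, 75, 90, 85, 70)   # hypothetical body weights

mean(x)                 # 80
sum(x) / length(x)      # 80: the definition of the sample mean

# A missing value makes mean() return NA unless na.rm = TRUE
mean(c(x, NA))                 # NA
mean(c(x, NA), na.rm = TRUE)   # 80
```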

Exercise 1

Verify that \(E(aX) = aE(X)\) with \(a = 10\), where \(X\) is the vector of body weights. Use the multiplication operator *.
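For the sample mean computed by mean(), the identity holds because the constant \(a\) factors out of the sum. Here \(N\) is the number of observations and \(X_i\) is the \(i\)-th body weight:

```latex
\begin{aligned}
\overline{aX} &= \frac{1}{N}\sum_{i=1}^{N} aX_i \\
              &= a\,\frac{1}{N}\sum_{i=1}^{N} X_i \\
              &= a\bar{X}
\end{aligned}
```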

Variance

The var function computes the sample variance of a vector.

var(dat$BWT)

Alternatively, we can use the equation \(Var(X) = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2\), where \(N\) is the number of observations and \(\bar{X}\) is the mean of \(X\).

sum((dat$BWT - mean(dat$BWT))^2)/(length(dat$BWT) - 1)

Exercise 2

Verify that \(Var(aX) = a^2Var(X)\) with \(a = 10\).
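Starting from the variance equation above and using the fact that the mean of \(aX\) is \(a\bar{X}\) (Exercise 1), the constant comes out of each squared deviation as \(a^2\):

```latex
\begin{aligned}
Var(aX) &= \frac{1}{N-1}\sum_{i=1}^{N}\left(aX_i - a\bar{X}\right)^2 \\
        &= \frac{1}{N-1}\sum_{i=1}^{N} a^2\left(X_i - \bar{X}\right)^2 \\
        &= a^2\,Var(X)
\end{aligned}
```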

Covariance

The cov function computes the covariance of two vectors.

cov(dat$BWT, dat$CE)

Alternatively, we can use the equation \(Cov(X, Y) = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})\).

sum((dat$BWT - mean(dat$BWT)) * (dat$CE - mean(dat$CE))) / (length(dat$BWT) - 1)

Exercise 3

Verify that \(Cov(aX, Y) = aCov(X, Y)\) with \(a = 10\). Here \(X\) is the vector of body weights and \(Y\) is the vector of calving ease.

Exercise 4

Verify that \(Cov(aX, bY) = abCov(X, Y)\) with \(a = 10\) and \(b = 5\).
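Using the covariance equation above, together with the means \(a\bar{X}\) and \(b\bar{Y}\) of the scaled vectors, both constants factor out of every product of deviations. Setting \(b = 1\) recovers Exercise 3:

```latex
\begin{aligned}
Cov(aX, bY) &= \frac{1}{N-1}\sum_{i=1}^{N}\left(aX_i - a\bar{X}\right)\left(bY_i - b\bar{Y}\right) \\
            &= ab\,\frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right) \\
            &= ab\,Cov(X, Y)
\end{aligned}
```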

Correlation

The cor function computes the correlation between two vectors (by default, the Pearson correlation coefficient).

cor(dat$BWT, dat$CE)
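The Pearson correlation is the covariance scaled by both standard deviations, \(Cov(X, Y)/\sqrt{Var(X)Var(Y)}\), so the \(\frac{1}{N-1}\) factors cancel. The sketch below checks this on hypothetical paired values standing in for body weight and calving ease.

```r
# Hypothetical paired observations (not the course data)
x <- c(80, 75, 90, 85, 70)
y <- c(1, 2, 1, 1, 3)

r_builtin <- cor(x, y)                       # Pearson by default
r_manual  <- cov(x, y) / (sd(x) * sd(y))     # definition of the correlation
all.equal(r_builtin, r_manual)               # TRUE
```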

Regression

The regression coefficient of \(Y\) on \(X\) is \(b_{Y,X} = \frac{Cov(X,Y)}{Var(X)}\). In the code below, \(Y\) is calving ease and \(X\) is body weight.

cov(dat$BWT, dat$CE)/var(dat$BWT)
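This ratio is the slope of an ordinary least-squares line of \(Y\) on \(X\) fitted with an intercept, so it can be cross-checked against lm. The sketch below uses hypothetical paired values; coef(fit)["x"] extracts the slope of the predictor named x.

```r
# Hypothetical paired observations (not the course data)
x <- c(80, 75, 90, 85, 70)
y <- c(1, 2, 1, 1, 3)

b_manual <- cov(x, y) / var(x)          # b_{Y,X} from the formula
fit <- lm(y ~ x)                        # least-squares fit of y on x, with intercept
b_lm <- unname(coef(fit)["x"])          # slope estimated by lm
all.equal(b_manual, b_lm)               # TRUE
```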

Exercise 5

Verify that the regression of \(Y\) on \(aX\) is \(b_{Y, aX} = \frac{Cov(aX,Y)}{Var(aX)} = \frac{Cov(X,Y)}{aVar(X)}\) with \(a = 10\).
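The identity follows from Exercises 2 and 3: the numerator scales by \(a\) and the denominator by \(a^2\), so scaling \(X\) by \(a\) divides the regression coefficient by \(a\):

```latex
\begin{aligned}
b_{Y,aX} &= \frac{Cov(aX, Y)}{Var(aX)} \\
         &= \frac{a\,Cov(X, Y)}{a^2\,Var(X)} \\
         &= \frac{Cov(X, Y)}{a\,Var(X)} = \frac{1}{a}\,b_{Y,X}
\end{aligned}
```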


Gota Morota

January 12, 2017