Review of basic statistics
Introduction
We will learn how to compute basic statistics in R using a subset of a beef cattle data set.
Read a file
Read the phenotypic data
We will first set the path to the phenotypic data set. Insert the path to your file inside the double quotation marks " ". Your path may differ from the one below if you stored the file in a different directory.
FILE <- "../data/sub/1000withEffects.redangus"
The function read.table reads a file into a data frame. Typing help followed by the function name in parentheses opens its documentation page.
help(read.table)
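The read.table function has many arguments, but we will use only three in this exercise: file gives the path to the file, header = TRUE states that the first line contains the column names, and stringsAsFactors = FALSE keeps character columns as character vectors instead of converting them to factors.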
dat <- read.table(file = FILE, header = TRUE, stringsAsFactors = FALSE)
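If the data file is not at hand, the same call can be tried on a small stand-in file. The sketch below is only an illustration: the temporary file, the object name demo, and the values in it are invented, and only the column names BWT and CE are borrowed from this exercise.

```r
# Hypothetical stand-in for the real file: a tiny space-delimited table
# with a header row. All values are invented for illustration.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID BWT CE",
             "a1 78.5 1.2",
             "a2 82.0 -0.4",
             "a3 75.3 0.9"), con = tmp)

# Check that the path resolves before trying to read the file.
file.exists(tmp)

# Same three arguments as in the call above.
demo <- read.table(file = tmp, header = TRUE, stringsAsFactors = FALSE)

# ID is read as character; BWT and CE are read as numeric.
str(demo)

# Remove the temporary file.
unlink(tmp)
```

Checking file.exists(FILE) first is a quick way to tell a wrong path apart from a problem with the file contents.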
The head function returns the first six rows of a data frame. The number of rows returned can be changed with the argument n.
head(dat)
head(dat, n = 10)
The dim function returns the dimensions of a data frame: the number of rows followed by the number of columns.
dim(dat)
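A few related base R functions give the same information in other forms. The sketch below uses a small invented data frame (demo is a hypothetical name, not the cattle data) so that it runs on its own.

```r
# Hypothetical three-row data frame; values are invented for illustration.
demo <- data.frame(ID  = c("a1", "a2", "a3"),
                   BWT = c(78.5, 82.0, 75.3),
                   CE  = c(1.2, -0.4, 0.9),
                   stringsAsFactors = FALSE)

dim(demo)           # number of rows, then number of columns
nrow(demo)          # rows only
ncol(demo)          # columns only
tail(demo, n = 2)   # last two rows, the counterpart of head
str(demo)           # column names, types, and the first few values
```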
We will use two phenotypes to compute basic statistics. Each column of a data frame can be accessed with the $ operator followed by the column name.
dat$BWT
dat$CE
The length function returns the length of a vector.
length(dat$BWT)
length(dat$CE)
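Because every column of a data frame is a vector with one element per row, its length should match the number of rows. A minimal sketch on an invented data frame (demo is a hypothetical name) also shows the double-bracket form, which extracts a column in the same way as $:

```r
# Hypothetical data frame; values are invented for illustration.
demo <- data.frame(BWT = c(78.5, 82.0, 75.3, 90.1),
                   CE  = c(1.2, -0.4, 0.9, -1.5))

# $ and [[ ]] return the same vector.
identical(demo$BWT, demo[["BWT"]])

# The length of a column equals the number of rows.
length(demo$BWT) == nrow(demo)
```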
Mean
The mean function computes the arithmetic mean of a vector.
mean(dat$BWT)
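The mean is the sum of the values divided by their number, which can be checked by hand. The vector below is invented for illustration. The second half of the sketch shows how mean behaves when a value is missing, in case a real phenotype column contains NA entries.

```r
# Hypothetical weights; values are invented for illustration.
x <- c(78.5, 82.0, 75.3, 90.1)

# Mean computed from its definition.
m_manual <- sum(x) / length(x)
all.equal(mean(x), m_manual)

# With a missing value, mean returns NA unless na.rm = TRUE is set.
x_na <- c(x, NA)
mean(x_na)
mean(x_na, na.rm = TRUE)
```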
Exercise 1
Verify that \(E(aX) = aE(X)\). Set \(a = 10\). Here \(X\) is a vector of body weight. Use the multiplication operator *.
Variance
The var function computes the sample variance of a vector.
var(dat$BWT)
Alternatively, we can use the equation \(Var(X) = \frac{1}{N-1}\sum(X - \bar{X})^2\), where \(\bar{X}\) is the sample mean and \(N\) is the number of observations:
sum((dat$BWT - mean(dat$BWT))^2)/(length(dat$BWT) - 1)
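The two computations should agree up to floating-point rounding, so all.equal is a safer comparison than ==. The sketch below uses an invented vector and also shows that the standard deviation returned by sd is the square root of the sample variance.

```r
# Hypothetical weights; values are invented for illustration.
x <- c(78.5, 82.0, 75.3, 90.1, 85.6)

# Sample variance from the formula, with divisor N - 1.
v_manual <- sum((x - mean(x))^2) / (length(x) - 1)

# Agrees with var up to rounding error.
all.equal(var(x), v_manual)

# The standard deviation is the square root of the variance.
all.equal(sd(x), sqrt(var(x)))
```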
Exercise 2
Verify that \(Var(aX) = a^2Var(X)\). Set \(a = 10\).
Covariance
The cov function computes the sample covariance of two vectors.
cov(dat$BWT, dat$CE)
Alternatively, we can use the equation \(Cov(X, Y) = \frac{1}{N-1}\sum(X - \bar{X})(Y - \bar{Y})\), where \(N\) is the number of paired observations:
sum((dat$BWT - mean(dat$BWT)) * (dat$CE - mean(dat$CE)))/(length(dat$BWT) - 1)
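As with the variance, the formula and the built-in function can be compared on a small example. The vectors below are invented for illustration. The sketch also checks two properties that follow directly from the formula: the covariance is symmetric, and the covariance of a vector with itself is its variance.

```r
# Hypothetical paired observations; values are invented for illustration.
x <- c(78.5, 82.0, 75.3, 90.1, 85.6)
y <- c(1.2, -0.4, 0.9, -1.5, -0.8)

# Sample covariance from the formula, with divisor N - 1.
c_manual <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)
all.equal(cov(x, y), c_manual)

# Symmetry: Cov(X, Y) = Cov(Y, X).
all.equal(cov(x, y), cov(y, x))

# Cov(X, X) = Var(X).
all.equal(cov(x, x), var(x))
```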
Exercise 3
Verify that \(Cov(aX, Y) = aCov(X, Y)\) by setting \(a = 10\). Here \(X\) is a vector of body weight and \(Y\) is a vector of calving ease.
Exercise 4
Verify that \(Cov(aX, bY) = abCov(X, Y)\) by setting \(a = 10\) and \(b=5\).
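For reference, the identity in Exercise 4 can be worked out from the sample covariance formula given in the Covariance section. The only extra fact used is that the mean of \(aX\) is \(a\bar{X}\) (and likewise for \(bY\)), so the constants factor out of each deviation:

\[
Cov(aX, bY) = \frac{1}{N-1}\sum(aX - a\bar{X})(bY - b\bar{Y}) = ab\,\frac{1}{N-1}\sum(X - \bar{X})(Y - \bar{Y}) = ab\,Cov(X, Y)
\]

Setting \(b = 1\) gives the identity in Exercise 3, and setting \(Y = X\) and \(b = a\) gives \(Var(aX) = a^2Var(X)\) from Exercise 2.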
Correlation
The cor function computes the correlation of two vectors. By default it returns the Pearson correlation coefficient.
cor(dat$BWT, dat$CE)
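The Pearson correlation is the covariance divided by the product of the two standard deviations, so it can be rebuilt from functions already introduced. The vectors in this sketch are invented for illustration. The last line shows that, unlike the covariance, the correlation does not change when one vector is multiplied by a positive constant.

```r
# Hypothetical paired observations; values are invented for illustration.
x <- c(78.5, 82.0, 75.3, 90.1, 85.6)
y <- c(1.2, -0.4, 0.9, -1.5, -0.8)

# Correlation from covariance and standard deviations.
r_manual <- cov(x, y) / (sd(x) * sd(y))
all.equal(cor(x, y), r_manual)

# Rescaling x by a positive constant leaves the correlation unchanged.
all.equal(cor(10 * x, y), cor(x, y))
```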
Regression
The regression coefficient (slope) of \(Y\) on \(X\) is given by \(b_{Y,X} = \frac{Cov(X,Y)}{Var(X)}\). In the code below, \(X\) is BWT and \(Y\) is CE.
cov(dat$BWT, dat$CE)/var(dat$BWT)
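This ratio is the slope of the simple least-squares regression line, so it can be cross-checked against the lm function from base R. The sketch below uses invented vectors; lm is not used elsewhere in this exercise and appears here only as a check.

```r
# Hypothetical paired observations; values are invented for illustration.
x <- c(78.5, 82.0, 75.3, 90.1, 85.6)
y <- c(1.2, -0.4, 0.9, -1.5, -0.8)

# Slope from the covariance and variance.
b_manual <- cov(x, y) / var(x)

# Slope from a fitted simple linear regression of y on x.
fit <- lm(y ~ x)
b_lm <- unname(coef(fit)["x"])

all.equal(b_manual, b_lm)
```

The intercept of the same line is mean(y) - b_manual * mean(x), which should match the first element of coef(fit).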
Exercise 5
Verify that the regression of \(Y\) on \(aX\) is \(b_{Y, aX} = \frac{Cov(aX,Y)}{Var(aX)} = \frac{Cov(X,Y)}{aVar(X)}\). Set \(a = 10\).
Online R Tutorials
Additional reading
- Fienberg (2014). What is Statistics? Annual Review of Statistics and Its Application.