Data

We are going to use the cattle data included in the synbreedData package. Learn more about Synbreed project and synbreed R packages.

if(!require(synbreed) | !require(synbreedData)){
    install.packages("synbreed", "synbreedData")
}
library(synbreed)
library(synbreedData)
data(cattle)
y <- as.matrix(cattle$pheno[,1,1])
dim(cattle$geno)
X <- codeGeno(cattle)$geno # 500 x 7250

Genotype imputation

Replace missing marker genotypes with mean values. Then store the marker genotypes in a matrix object X.

for (j in 1:ncol(X)){
  X[,j] <- ifelse(is.na(X[,j]), mean(X[,j], na.rm=TRUE), X[,j])
}

Quality control

Perform a quality control analysis by removing markers with MAF < 0.05. How many markers are removed? Save the filtered genotype matrix in X.

# MAF < 0.05
p <- colMeans(X) / 2
maf <- ifelse(p > 0.5, 1-p, p)
maf.index <- which(maf < 0.05)
X <- X[, -maf.index] # 500 x 6714

Standardization of genotype matrix

Standardize the genotype matrix to have a mean of zero and variance of one. Save this matrix as Xs.

Xs <- scale(X, center = TRUE, scale = TRUE)

Question 1

Define two functions that compute a G matrix and a Gaussian kernel matrix.

Question 2

Create a G matrix and also three Gaussin kernel matrices where means of upper triangler matrix are about 0.25, 0.50, and 0.85, respectively. Plot histograms to compare off-diagonal elements of these matrices.

Question 3

Jiang and Reif (2015) ( DOI: 10.1534/genetics.115.177907) showed that a Gaussian kerenel matrix can be decomposed into \(GK = \Lambda \tilde{H} \Lambda\), where \(\Lambda = diag[\exp (- \theta \sum^m_{k=1} x^2_{1k} ) , \cdots, \exp (- \theta \sum^m_{k=1}x^2_{nk})]\), \(\tilde{H} = 1_{n \times n} + \sum^{\infty}_{k=1} \frac{(2 \theta m)^k}{k!} G^{\#k}\), and \(G\) is the second genomic relationship matrix of VanRaden (2008). Confirm that this is indeed true. Also, up to what order of additive x additive epistatic relationship matrix is required to reproduce the original Gaussian kernel matrix? Use the bandwidth parameter \(\theta = 0.0001\).