ASCI 896 Statistical Genomics

Due date

Thursday, March 16, 5pm

Mice data

Load the mouse SNP data available in the BGLR R package.

library(BGLR)
data(mice)  # loads mice.X (SNP genotypes) and mice.pheno (phenotypes), among others
?mice
?mice.X
?mice.pheno

Question 1

Write a function that returns the total allelic (TA) relationship of Nejati-Javaremi et al. (1997). Apply your function to each pair among the first three individuals (i.e., ID1-ID2, ID1-ID3, and ID2-ID3), using the first five SNPs (i.e., SNP1, SNP2, SNP3, SNP4, and SNP5) in mice.X.
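A minimal sketch of a TA function, under two assumptions that should be checked against the paper and ?mice.X: genotypes are coded 0/1/2 as counts of one allele, and each locus contributes the number of matching alleles over the four allele-by-allele comparisons divided by 2, averaged over loci. The helper name ta_relationship and the toy genotypes are hypothetical. With the coding z = x - 1, the per-locus score equals z_i z_j + 1, so the matrix form ZZ'/m + 1 is computed as a cross-check.

```r
# Hypothetical helper: TA relationship between two individuals, ASSUMING
# genotypes coded 0/1/2 and a per-locus score of (number of matching alleles
# over the 4 allele-by-allele comparisons) / 2, averaged over loci.
ta_relationship <- function(g1, g2) {
  # Split each genotype into two alleles: 1 = counted allele, 0 = other allele
  alleles <- function(g) cbind(as.integer(g >= 1), as.integer(g == 2))
  a1 <- alleles(g1)
  a2 <- alleles(g2)
  matches <- (a1[, 1] == a2[, 1]) + (a1[, 1] == a2[, 2]) +
    (a1[, 2] == a2[, 1]) + (a1[, 2] == a2[, 2])
  mean(matches / 2)   # per-locus score ranges from 0 to 2
}

# Toy genotypes (3 individuals x 5 SNPs), not taken from mice.X
X <- matrix(c(0, 1, 2, 2, 1,
              2, 1, 0, 2, 1,
              1, 1, 2, 0, 0), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("ID", 1:3), paste0("SNP", 1:5)))
pairs <- combn(rownames(X), 2)
ta <- apply(pairs, 2, function(ids) ta_relationship(X[ids[1], ], X[ids[2], ]))
names(ta) <- apply(pairs, 2, paste, collapse = "-")
ta  # expected: 0.8 (ID1-ID2), 1.0 (ID1-ID3), 0.6 (ID2-ID3)

# Cross-check: with Z = X - 1 coded -1/0/1, the same scores are Z Z' / m + 1
Z <- X - 1
ta_matrix <- tcrossprod(Z) / ncol(Z) + 1
```

Replacing X with mice.X[1:3, 1:5] (assuming its rows carry names) applies the same code to the three pairs the question asks about.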

Question 2

Compute the first genomic relationship matrix (\(\mathbf{G}_1\)) of VanRaden (2008) using all markers and all individuals. Report the median of the lower triangular part of the \(\mathbf{G}_1\) matrix.
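A sketch of \(\mathbf{G}_1\), assuming 0/1/2 coding and allele frequencies \(p_j\) estimated from the data: each column is centered by \(2p_j\) and \(\mathbf{G}_1 = \mathbf{W}\mathbf{W}'/(2\sum_j p_j(1-p_j))\). The helper name vanraden_g1 and the simulated matrix are hypothetical. Note that lower.tri() excludes the diagonal by default, which is one reading of "lower triangular part".

```r
# Hypothetical helper: VanRaden (2008) first G matrix. ASSUMES X is an
# n x m matrix of 0/1/2 allele counts with individuals in rows.
vanraden_g1 <- function(X) {
  p <- colMeans(X) / 2                     # allele frequency per marker
  W <- sweep(X, 2, 2 * p)                  # center column j by 2 * p_j
  tcrossprod(W) / (2 * sum(p * (1 - p)))   # W W' / (2 * sum_j p_j (1 - p_j))
}

# Simulated genotypes (20 individuals x 50 SNPs), not mice.X
set.seed(1)
X <- matrix(rbinom(20 * 50, 2, 0.4), nrow = 20)
G1 <- vanraden_g1(X)
# lower.tri() excludes the diagonal by default
median(G1[lower.tri(G1)])
```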

Question 3

Compute the second genomic relationship matrix (\(\mathbf{G}_2\)) of VanRaden (2008) using all markers and all individuals. Report the median of the lower triangular part of the \(\mathbf{G}_2\) matrix. What is the correlation between the lower triangular parts of the \(\mathbf{G}_1\) and \(\mathbf{G}_2\) matrices?
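\(\mathbf{G}_2\) weights each centered marker by the reciprocal of its expected variance, \(\mathbf{G}_2 = \mathbf{W}\mathbf{D}\mathbf{W}'\) with \(D_{jj} = 1/[m \cdot 2p_j(1-p_j)]\). The sketch below assumes 0/1/2 coding, drops monomorphic markers (an assumption about handling \(p_j \in \{0, 1\}\), where that weight is undefined), and recomputes \(\mathbf{G}_1\) so it runs on its own. The helper name vanraden_g2 and the simulated data are hypothetical.

```r
# Hypothetical helper: VanRaden (2008) second G matrix, G2 = W D W' with
# D_jj = 1 / (m * 2 p_j (1 - p_j)). Monomorphic markers are dropped (ASSUMPTION).
vanraden_g2 <- function(X) {
  p <- colMeans(X) / 2
  keep <- p > 0 & p < 1
  p <- p[keep]
  W <- sweep(X[, keep, drop = FALSE], 2, 2 * p)
  # Dividing column j by sqrt(2 p_j (1 - p_j)) makes Ws Ws' / m equal W D W'
  Ws <- sweep(W, 2, sqrt(2 * p * (1 - p)), "/")
  tcrossprod(Ws) / ncol(Ws)
}

set.seed(1)
X <- matrix(rbinom(20 * 50, 2, 0.4), nrow = 20)   # simulated, not mice.X
# G1 recomputed here so this block runs on its own
p <- colMeans(X) / 2
W <- sweep(X, 2, 2 * p)
G1 <- tcrossprod(W) / (2 * sum(p * (1 - p)))
G2 <- vanraden_g2(X)
low <- lower.tri(G2)   # off-diagonal lower triangle
median(G2[low])
r <- cor(G1[low], G2[low])
r
```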

Question 4

Compute the dominance genomic relationship matrix (\(\mathbf{D}_1\)) of Su et al. (2012) using all markers and all individuals. Report the median of the lower triangular part of the \(\mathbf{D}_1\) matrix.
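A sketch of \(\mathbf{D}_1\), assuming the Su et al. (2012) coding in which heterozygotes get \(1 - 2p_jq_j\) and homozygotes get \(-2p_jq_j\), with scaling by \(\sum_j 2p_jq_j(1 - 2p_jq_j)\). The helper name su_dominance and the simulated data are hypothetical.

```r
# Hypothetical helper: Su et al. (2012) dominance matrix. ASSUMES 0/1/2 coding;
# heterozygotes are coded 1 - 2pq and homozygotes -2pq.
su_dominance <- function(X) {
  p <- colMeans(X) / 2
  q <- 1 - p
  het <- (X == 1) * 1                  # 1 for heterozygotes, 0 for homozygotes
  H <- sweep(het, 2, 2 * p * q)        # subtract 2 p_j q_j in column j
  tcrossprod(H) / sum(2 * p * q * (1 - 2 * p * q))
}

set.seed(1)
X <- matrix(rbinom(20 * 50, 2, 0.4), nrow = 20)   # simulated, not mice.X
D1 <- su_dominance(X)
median(D1[lower.tri(D1)])
```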

Question 5

Compute the dominance genomic relationship matrix (\(\mathbf{D}_2\)) of Vitezica et al. (2013) using all markers and all individuals. Report the median of the lower triangular part of the \(\mathbf{D}_2\) matrix. What is the correlation between the lower triangular parts of the \(\mathbf{D}_1\) and \(\mathbf{D}_2\) matrices?
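A sketch of \(\mathbf{D}_2\), assuming the Vitezica et al. (2013) coding with \(p_j\) the frequency of the allele counted by the 0/1/2 code: genotype 2 gets \(-2q_j^2\), genotype 1 gets \(2p_jq_j\), genotype 0 gets \(-2p_j^2\), and \(\mathbf{D}_2 = \mathbf{W}_d\mathbf{W}_d'/\sum_j (2p_jq_j)^2\). \(\mathbf{D}_1\) is recomputed inline so the correlation line runs on its own; the helper name vitezica_dominance and the simulated data are hypothetical.

```r
# Hypothetical helper: Vitezica et al. (2013) dominance matrix. ASSUMES 0/1/2
# coding, with p the frequency of the allele counted by the code.
vitezica_dominance <- function(X) {
  p <- colMeans(X) / 2
  q <- 1 - p
  P <- matrix(p, nrow(X), ncol(X), byrow = TRUE)   # p_j repeated down column j
  Q <- 1 - P
  Wd <- ifelse(X == 2, -2 * Q^2,                  # genotype 2: -2 q^2
               ifelse(X == 1, 2 * P * Q,          # genotype 1:  2 p q
                      -2 * P^2))                  # genotype 0: -2 p^2
  tcrossprod(Wd) / sum((2 * p * q)^2)
}

set.seed(1)
X <- matrix(rbinom(20 * 50, 2, 0.4), nrow = 20)   # simulated, not mice.X
D2 <- vitezica_dominance(X)
low <- lower.tri(D2)
median(D2[low])
# D1 (Su et al. 2012 coding) recomputed inline so this block runs on its own
p <- colMeans(X) / 2
H <- sweep((X == 1) * 1, 2, 2 * p * (1 - p))
D1 <- tcrossprod(H) / sum(2 * p * (1 - p) * (1 - 2 * p * (1 - p)))
r_d <- cor(D1[low], D2[low])
r_d
```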

Question 6

Perform a single-marker, OLS-based genome-wide association (GWA) analysis by fitting \(y = \mathbf{X}\beta + \mathbf{W}_{ac} a + \mathbf{W}_d d + e\), where \(y\) is the BMI phenotype, \(\mathbf{X}\) includes the systematic effects of intercept, sex, litter size, and cage density, \(\mathbf{W}_{ac}\) is the centered additive marker genotype matrix, \(\mathbf{W}_{d}\) is the dominance marker genotype matrix defined in Vitezica et al. (2013), \(a\) is the additive marker effect, and \(d\) is the dominance marker effect. Report 1) the ID of the SNP with the smallest \(p\)-value for the additive marker effect and 2) the ID of the SNP with the smallest \(p\)-value for the dominance marker effect. If the additive and dominance marker effects are not simultaneously estimable, set their \(p\)-values to NA. Ignore multiple-testing corrections for simplicity. Use the lm() function.
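A sketch of the per-marker scan with lm(), assuming 0/1/2 genotypes. For each SNP it builds \(w_{ac} = x - 2p\) and the Vitezica et al. (2013) dominance code, fits both together with the covariates, and leaves the \(p\)-values as NA when lm() reports an aliased (NA) coefficient. The helper name single_marker_gwa, the covariate names sex, litter, and cage, and the simulated data are hypothetical; the matching columns of mice.pheno have to be looked up with ?mice.pheno.

```r
# Hypothetical helper: single-marker OLS scan with additive and dominance
# terms; covars is a data frame of systematic effects, X holds 0/1/2 genotypes.
single_marker_gwa <- function(y, covars, X) {
  p_vals <- matrix(NA_real_, ncol(X), 2,
                   dimnames = list(colnames(X), c("additive", "dominance")))
  for (j in seq_len(ncol(X))) {
    x <- X[, j]
    p <- mean(x) / 2
    q <- 1 - p
    dat <- covars
    dat$y <- y
    dat$wac <- x - 2 * p                                   # centered additive code
    dat$wd <- ifelse(x == 2, -2 * q^2,                     # Vitezica et al. (2013)
                     ifelse(x == 1, 2 * p * q, -2 * p^2))  # dominance code
    fit <- lm(y ~ ., data = dat)            # intercept + covariates + wac + wd
    if (anyNA(coef(fit)[c("wac", "wd")])) next  # aliased: leave both as NA
    ct <- coef(summary(fit))
    p_vals[j, ] <- ct[c("wac", "wd"), "Pr(>|t|)"]
  }
  p_vals
}

# Simulated data (not mice.X / mice.pheno); covariate names are placeholders
set.seed(2)
n <- 100
m <- 20
X <- matrix(rbinom(n * m, 2, 0.3), n, m,
            dimnames = list(NULL, paste0("SNP", seq_len(m))))
X[, m] <- rbinom(n, 1, 0.5)   # SNP20 has only genotypes 0 and 1
covars <- data.frame(sex = factor(sample(c("F", "M"), n, replace = TRUE)),
                     litter = rpois(n, 8),
                     cage = runif(n, 1, 5))
y <- 25 + 2 * X[, 3] + rnorm(n)   # simulated additive effect at SNP3
pv <- single_marker_gwa(y, covars, X)
rownames(pv)[which.min(pv[, "additive"])]
rownames(pv)[which.min(pv[, "dominance"])]
```

A SNP with only two genotype classes (SNP20 here) makes the dominance code an affine function of the additive code, so the two effects are not simultaneously estimable and lm() returns an NA coefficient; which.min() skips those NA entries.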

Gota Morota

March 2, 2017