## ASCI 896 Statistical Genomics

### Due date

Thursday, March 16, 5pm

## Mice data

Load the mouse SNP data available in the BGLR R package.

    library(BGLR)
    data(mice)
    ?(mice)
    ?(mice.X)
    ?(mice.pheno)

## Question 1

Write a function that returns the total allelic (TA) relationship of Nejati-Javaremi et al. (1997) (doi). Apply your function to the first three individuals (i.e., the pairs ID1-ID2, ID1-ID3, and ID2-ID3) using the first five SNPs (i.e., SNP1, SNP2, SNP3, SNP4, and SNP5) in mice.X.
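As a starting point, here is a minimal sketch of one way to compute the TA relationship. It assumes genotypes are coded 0/1/2 as allele counts with no missing values. Under that coding, the number of allele matches between two individuals at a locus is $$xy + (2 - x)(2 - y)$$, and half of that, averaged over loci, equals $$\mathbf{W}\mathbf{W}'/L + 1$$ with $$\mathbf{W} = \mathbf{X} - 1$$. The function name `ta_relationship` and the toy matrix are hypothetical.

```r
# Sketch only: total allelic relationship for genotypes coded 0/1/2.
# Assumes no missing genotypes; names here are illustrative.
ta_relationship <- function(X) {
  W <- X - 1                      # recode 0/1/2 as -1/0/1
  tcrossprod(W) / ncol(W) + 1     # W W' / L + 1, averaged over L loci
}

# Toy genotypes: 3 individuals (rows) by 3 SNPs (columns)
X_toy <- rbind(c(2, 2, 0),
               c(2, 1, 0),
               c(0, 0, 2))
TA <- ta_relationship(X_toy)
TA[lower.tri(TA)]                 # pairs 1-2, 1-3, 2-3

# For the assignment, the same call would be applied to mice.X[1:3, 1:5]
```

A fully homozygous individual has a self-relationship of 2 under this definition, which is a quick sanity check on the output.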

## Question 2

Compute the first genomic relationship matrix ($$\mathbf{G}_1$$) of VanRaden (2008) (doi) using all markers and all individuals. Report the median of the lower triangular part of the $$\mathbf{G}_1$$ matrix.
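The sketch below shows one possible implementation, assuming 0/1/2 genotype coding, no missing values, and allele frequencies estimated from the sample itself. It uses the usual form $$\mathbf{G}_1 = \mathbf{Z}\mathbf{Z}' / \left(2 \sum_j p_j (1 - p_j)\right)$$ with $$\mathbf{Z} = \mathbf{X} - 2\mathbf{p}$$. The name `vanraden_g1` and the toy data are hypothetical.

```r
# Sketch only: first VanRaden-style genomic relationship matrix.
# Assumes genotypes coded 0/1/2 and sample-based allele frequencies.
vanraden_g1 <- function(X) {
  p <- colMeans(X) / 2                    # allele frequency per marker
  Z <- sweep(X, 2, 2 * p)                 # center each marker by 2p
  tcrossprod(Z) / (2 * sum(p * (1 - p)))  # Z Z' / (2 * sum p(1-p))
}

# Toy genotypes: 4 individuals by 3 SNPs
X_toy <- rbind(c(0, 1, 2),
               c(1, 1, 0),
               c(2, 1, 1),
               c(1, 1, 1))
G1 <- vanraden_g1(X_toy)

# Strictly lower triangle (diagonal excluded); use diag = TRUE to include it
median(G1[lower.tri(G1)])
```

Whether "lower triangular part" includes the diagonal is not stated in the question, so the choice of `lower.tri(G1)` without the diagonal is an assumption.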

## Question 3

Compute the second genomic relationship matrix ($$\mathbf{G}_2$$) of VanRaden (2008) using all markers and all individuals. Report the median of the lower triangular part of the $$\mathbf{G}_2$$ matrix. What is the correlation between the lower triangular parts of the $$\mathbf{G}_1$$ and $$\mathbf{G}_2$$ matrices?
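A possible sketch for the second matrix follows. It scales each centered marker by $$\sqrt{2 p_j (1 - p_j)}$$ and divides by the number of markers. Because that scaling is undefined for fixed markers, the sketch drops markers with $$p_j$$ equal to 0 or 1, which is an assumption about how to handle them, not something the question specifies. The names `vanraden_g2` and `lower_tri_cor` are hypothetical.

```r
# Sketch only: second VanRaden-style genomic relationship matrix.
# Assumes genotypes coded 0/1/2; fixed markers are dropped (an assumption).
vanraden_g2 <- function(X) {
  p <- colMeans(X) / 2
  keep <- p > 0 & p < 1                       # scaling undefined when p is 0 or 1
  X <- X[, keep, drop = FALSE]
  p <- p[keep]
  Z <- sweep(X, 2, 2 * p)                     # center by 2p
  Zs <- sweep(Z, 2, sqrt(2 * p * (1 - p)), "/")  # scale each marker
  tcrossprod(Zs) / ncol(Zs)
}

# Correlation between the strictly lower triangles of two matrices
lower_tri_cor <- function(A, B) cor(A[lower.tri(A)], B[lower.tri(B)])

# Toy genotypes: 4 individuals by 4 SNPs (the second SNP is fixed)
X_toy <- cbind(c(0, 1, 2, 1), c(0, 0, 0, 0), c(2, 0, 1, 1), c(0, 0, 1, 1))
G2 <- vanraden_g2(X_toy)

# Unscaled counterpart computed inline for the comparison
p_toy <- colMeans(X_toy) / 2
Z_toy <- sweep(X_toy, 2, 2 * p_toy)
G1 <- tcrossprod(Z_toy) / (2 * sum(p_toy * (1 - p_toy)))

median(G2[lower.tri(G2)])
lower_tri_cor(G1, G2)
```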

## Question 4

Compute the dominance genomic relationship matrix ($$\mathbf{D}_1$$) of Su et al. (2012) (doi) using all markers and all individuals. Report the median of the lower triangular part of the $$\mathbf{D}_1$$ matrix.
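One way this could be sketched is shown below, assuming 0/1/2 coding with no missing values. Heterozygotes are coded 1 and homozygotes 0, each marker is centered by $$2 p_j q_j$$, and the cross-product is divided by $$\sum_j 2 p_j q_j (1 - 2 p_j q_j)$$. The function name `su_d1` and the toy data are hypothetical.

```r
# Sketch only: dominance relationship matrix in the style of Su et al. (2012).
# Assumes genotypes coded 0/1/2 with no missing values.
su_d1 <- function(X) {
  p <- colMeans(X) / 2
  het <- 2 * p * (1 - p)               # expected heterozygosity 2pq per marker
  H <- sweep((X == 1) * 1, 2, het)     # heterozygote indicator minus 2pq
  tcrossprod(H) / sum(het * (1 - het)) # H H' / sum 2pq(1 - 2pq)
}

# Toy genotypes: 4 individuals by 2 SNPs
X_toy <- cbind(c(0, 1, 2, 1), c(1, 1, 0, 2))
D1 <- su_d1(X_toy)
median(D1[lower.tri(D1)])
```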

## Question 5

Compute the dominance genomic relationship matrix ($$\mathbf{D}_2$$) of Vitezica et al. (2013) (doi) using all markers and all individuals. Report the median of the lower triangular part of the $$\mathbf{D}_2$$ matrix. What is the correlation between the lower triangular parts of the $$\mathbf{D}_1$$ and $$\mathbf{D}_2$$ matrices?
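A hedged sketch of the genotypic dominance coding is given below. It assumes 0/1/2 coding in which the value counts copies of the allele with frequency $$p$$, so that genotypes 2, 1, and 0 are coded $$-2q^2$$, $$2pq$$, and $$-2p^2$$, and the cross-product is divided by $$\sum_j (2 p_j q_j)^2$$. The name `vitezica_d2` and the toy data are hypothetical, and the allele orientation should be checked against the paper.

```r
# Sketch only: dominance relationship matrix in the style of Vitezica et al. (2013).
# Assumes integer genotypes 0/1/2 counting the allele with frequency p.
vitezica_d2 <- function(X) {
  p <- colMeans(X) / 2
  q <- 1 - p
  P <- matrix(p, nrow(X), ncol(X), byrow = TRUE)  # p repeated down each column
  Q <- 1 - P
  W <- ifelse(X == 2, -2 * Q^2,
       ifelse(X == 1, 2 * P * Q, -2 * P^2))        # dominance coding per genotype
  tcrossprod(W) / sum((2 * p * q)^2)               # W W' / sum (2pq)^2
}

# Toy genotypes: 4 individuals by 2 SNPs
X_toy <- cbind(c(0, 1, 2, 1), c(0, 0, 1, 1))
D2 <- vitezica_d2(X_toy)
median(D2[lower.tri(D2)])

# Given a second matrix D1 of the same size, the requested correlation would be
# cor(D1[lower.tri(D1)], D2[lower.tri(D2)])
```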

## Question 6

Perform a single-marker OLS-based GWA analysis by fitting $$y = \mathbf{X}\beta + \mathbf{W}_{ac} a + \mathbf{W}_d d + e$$, where $$y$$ is the BMI phenotype, $$\mathbf{X}$$ includes the systematic effects of intercept, sex, litter size, and cage density, $$\mathbf{W}_{ac}$$ is the centered additive marker genotype matrix, $$\mathbf{W}_{d}$$ is the dominance marker genotype matrix defined in Vitezica et al. (2013), $$a$$ is the additive marker effect, and $$d$$ is the dominance marker effect. Report 1) the ID of the SNP with the smallest $$p$$-value for the additive marker effect and 2) the ID of the SNP with the smallest $$p$$-value for the dominance marker effect. If the additive and dominance marker effects are not simultaneously estimable for a SNP, set both of its $$p$$-values to NA. Ignore multiple-testing corrections for simplicity. Use the lm() function.
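The loop below is a minimal sketch of how such a scan might be organized, not a reference solution. It assumes 0/1/2 genotypes, codes each marker as in the Vitezica-style additive and dominance columns, and relies on `summary()` of an `lm` fit omitting aliased coefficients to detect markers where both effects cannot be estimated together. The function name `single_marker_gwa`, the covariate names, and the simulated data are hypothetical; the actual column names should be looked up with `names(mice.pheno)`.

```r
# Sketch only: single-marker additive + dominance scan with lm().
# y: phenotype vector; covars: data frame of systematic effects;
# X: genotype matrix coded 0/1/2 with SNP IDs as column names.
single_marker_gwa <- function(y, covars, X) {
  res <- data.frame(snp = colnames(X), p_add = NA_real_, p_dom = NA_real_)
  for (j in seq_len(ncol(X))) {
    x <- X[, j]
    p <- mean(x) / 2
    q <- 1 - p
    w_a <- x - 2 * p                                   # centered additive code
    w_d <- ifelse(x == 2, -2 * q^2,
           ifelse(x == 1, 2 * p * q, -2 * p^2))        # dominance code
    fit <- lm(y ~ ., data = data.frame(y = y, covars, w_a = w_a, w_d = w_d))
    co <- coef(summary(fit))                           # aliased terms are omitted
    if (all(c("w_a", "w_d") %in% rownames(co))) {
      res$p_add[j] <- co["w_a", "Pr(>|t|)"]
      res$p_dom[j] <- co["w_d", "Pr(>|t|)"]
    }                                                  # otherwise both stay NA
  }
  res
}

# Simulated toy data (hypothetical names, not the mice data)
set.seed(1)
n <- 60
covars <- data.frame(sex = factor(rep(c("F", "M"), each = 30)),
                     litter = rep(c(3, 4, 5, 6), 15))
X_toy <- cbind(snp1 = rep(c(0, 1, 2), 20),   # three genotype classes
               snp2 = rep(2, n),             # monomorphic
               snp3 = rep(c(0, 1), 30))      # two genotype classes only
y <- rnorm(n) + 0.5 * X_toy[, "snp1"]

res <- single_marker_gwa(y, covars, X_toy)
res$snp[which.min(res$p_add)]   # SNP with the smallest additive p-value
res$snp[which.min(res$p_dom)]   # SNP with the smallest dominance p-value
```

`which.min()` skips NA entries, so markers flagged as non-estimable do not affect which SNP is reported.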

March 2, 2017