Processing math: 0%
+ - 0:00:00
Notes for current slide
Notes for next slide

Introduction to Quantitative Genetics

Decoding mixed model equations
The 64th RBras and 18th SEAGRO Meeting

Gota Morota
http://morotalab.org/

2019/8/1

1 / 95

About me

2 / 95

Tapioca!

3 / 95

Tapioca!

4 / 95

Quantitative Genetics

5 / 95

Quantitative Genetics

Analysis of complex or multifactorial traits

6 / 95

Quantitative Genetics

Analysis of complex or multifactorial traits

All genes affect all traits - the question is by how much?

7 / 95

Quantitative Genetics

Analysis of complex or multifactorial traits

All genes affect all traits - the question is by how much?

Infinitesimal model

8 / 95

Quantitative Genetics

Analysis of complex or multifactorial traits

All genes affect all traits - the question is by how much?

Infinitesimal model

Quasi-infinitesimal model, Oligogenic model

9 / 95

What is quantitative genetics?

10 / 95

What is quantitative genetics?

Population genetics

  • Mathematics is language of population genetics, population genetics is language of evolution.
11 / 95

What is quantitative genetics?

Population genetics

  • Mathematics is language of population genetics, population genetics is language of evolution.

Quantitative genetics

  • Statistics is language of quantitative genetics, quantitative genetics is language of complex trait genetics.
12 / 95

What is quantitative genetics?

Population genetics

  • Mathematics is language of population genetics, population genetics is language of evolution.

Quantitative genetics

  • Statistics is language of quantitative genetics, quantitative genetics is language of complex trait genetics.

Phenotypes first in quantitative genetics

In the era of genomics, phenotype is king

via GIPHY

13 / 95

Regression model

Galton (1886). Regression towards mediocrity in hereditary stature "

14 / 95

Complex traits

15 / 95

Genetic values

Quantitative genetic model:

\begin{align*} \mathbf{y} &= \mathbf{g} + \boldsymbol{\epsilon} \\ \end{align*} where \mathbf{y} is the vector of observed phenotypes, \mathbf{g} is the vector of genetic values, and \boldsymbol{\epsilon} is the vector of residuals.

Example:

Animal ID y g e
1 10 ? ?
2 7 ? ?
3 12 ? ?
16 / 95

Genetic values

Quantitative genetic model:

\begin{align*} \mathbf{y} &= \mathbf{g} + \boldsymbol{\epsilon} \\ \end{align*} where \mathbf{y} is the vector of observed phenotypes, \mathbf{g} is the vector of genetic values, and \boldsymbol{\epsilon} is the vector of residuals.

Example:

Animal ID y g e
1 10 5 5
2 7 6 1
3 12 2 10
17 / 95

Genetic values

Quantitative genetic model:

\begin{align*} \mathbf{y} &= \mathbf{g} + \boldsymbol{\epsilon} \\ \end{align*} where \mathbf{y} is the vector of observed phenotypes, \mathbf{g} is the vector of genetic values, and \boldsymbol{\epsilon} is the vector of residuals.

Example:

Animal ID y g e
1 10 5 5
2 7 6 1
3 12 2 10

Phenotypes can be observed and measured but genotypic and additive genetic effects cannot

18 / 95

Source of genetic signals

  1. linkage disequilibrium between QTL and SNP

  2. relationship among individuals

via GIPHY

19 / 95

Conventional Phenotyping

  • labor intensive

  • prone to measurement error

20 / 95

Genomic information (e.g., SNPs)

Repeat of numbers 0, 1, and 2

21 / 95

Quantitative genetics

Connecting phenotypic data with genomic information

= + error
\begin{align*} \mathbf{y} &= \mathbf{g} + \boldsymbol{\epsilon} \\ &\approx \mathbf{W}\mathbf{a} + \boldsymbol{\epsilon} \end{align*} We approximate unknown \mathbf{g} with \mathbf{Wa}. \begin{align*} \underbrace{\begin{bmatrix} y_1\\ y_2\\ \vdots \\ y_n\end{bmatrix}}_{n \times 1} &= \underbrace{\begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}}_{n \times m} \quad \underbrace{\begin{bmatrix} a_1\\ a_2\\ \vdots \\ a_m\end{bmatrix}}_{m \times 1} +\underbrace{\begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots \\ \epsilon_m\end{bmatrix}}_{n \times 1} \end{align*}

where n is the number of individuals (e.g., accessions) and m is the number of SNPs.

22 / 95

Precision agriculture using advanced technologies

23 / 95

Precision (digital) phenotyping

  • automated process

  • less labor intensive

  • less prone to measurement error

24 / 95

Image phenotypes

   

Image data

25 / 95

Quantitative genetics

Connecting image data with genomic information

= + error
\begin{align*} \mathbf{y} &= \mathbf{g} + \boldsymbol{\epsilon} \\ &\approx \mathbf{W}\mathbf{a} + \boldsymbol{\epsilon} \end{align*} We approximate unknown \mathbf{g} with \mathbf{Wa}. \begin{align*} \underbrace{\begin{bmatrix} y_1\\ y_2\\ \vdots \\ y_n\end{bmatrix}}_{n \times 1} &= \underbrace{\begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}}_{n \times m} \quad \underbrace{\begin{bmatrix} a_1\\ a_2\\ \vdots \\ a_m\end{bmatrix}}_{m \times 1} +\underbrace{\begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots \\ \epsilon_m\end{bmatrix}}_{n \times 1} \end{align*}

where n is the number of individuals (e.g., accessions) and m is the number of SNPs.

26 / 95

Google Trends: 2012-2017

27 / 95

Big and messy data

Big data are being generated in almost every field

  • too large to permit visual inspection

  • big data \ne clean data

  • missing observations

  • empty cells

  • confounding

  • outliers

28 / 95

Big in n or m?

  • n: number of individuals (records)

  • m number of SNPs (genetic markers)

29 / 95

Why we use big data in genetics? (Makowsky et al., 2011)

"

30 / 95

Why we use big data in genetics? (Erbe et al., 2013)

"

31 / 95

How to parameterize response variable y?

  • Prediction of additive genetic effects

    • \mathbf{ y = S + a + \boldsymbol{\epsilon}}
  • Prediction of total genetic effects parametrically

    • \mathbf{ y = \mathbf{S} + \underbrace{\mathbf{ a + d + a*a + a*d + d*d}}_{g} + } \boldsymbol{\epsilon}
  • Prediction of total genetic effects non-parametrically

    • \mathbf{ y = \mathbf{S} + \mathbf{g} + \boldsymbol{\epsilon}}
32 / 95

Prediction vs. Inference

Complex traits are controlled by large number of genes with small effects, and influenced by both genetics and environments

  • Inference (location)

    • average effects of allele substitution
  • Inference (variability)

    • variance component estimation
    • genomic heritability

Combination of above two (e.g., estimate proportion of additive genetic variance explained by QTLs)

  • Prediction
    • genomic selection
    • prediction of yet-to-be observed phenotypes
33 / 95

Prediction vs. Inference

34 / 95

GWAS vs. Prediction

Wikimedia Commons

35 / 95

Genomic prediction - use all available markers

Meuwissen et al. (2001)

  • Genomic selection
  • Genome-enabled selection
  • Genome-assisted selection
  • Genomic prediction
  • Genome-enabled prediction
  • Genome-assisted prediction
  • generation interval
  • prediction performance
36 / 95

Single trait linear mixed model

\mathbf{y = Xb + Zu + e}

  • \mathbf{X}: incidence matrix of systematic effects

  • \mathbf{Z}: incidence matrix of random effects

  • \mathbf{K}: genomic relationship matrix

  • \mathbf{R}: residual relationship matrix

  • \sigma^2_{u}: genomic variance; \sigma^2_{e}: residual variance

  • BLUE: \hat{\mathbf{b}} = (\mathbf{X'V^{-1}X})^{-}\mathbf{X'V^{-1}y}; BLUP: \hat{\mathbf{u}} = \mathbf{KZ'}\mathbf{V^{-1}}(\mathbf{y - X\hat{b}})

  • where \mathbf{V} = \mathbf{ZK\sigma^2_{u}Z' + R\sigma^2_{e}}

37 / 95

Mixed model equations (Henderson)

The corresponding mixed model equations (MME) are

  • \mathbf{G}^* = \sigma^2_u \mathbf{K}

  • \mathbf{R}^* = \sigma^2_e \mathbf{R}

If we multipy \mathbf{R}^* = \sigma^2_e\mathbf{I} to the both sides

where \lambda = \sigma^2_e / \sigma^2_u

  • MME produces BLUE (E-BLUE) and BLUP (E-BLUP) simultaneously
38 / 95

Decoding single trait MME

N Phe Env Gen
1 47 E1 G1
2 51 E1 G2
3 46 E1 G3
4 58 E1 G4
5 52 E2 G1
6 46 E2 G2
7 52 E2 G3
8 54 E2 G4
9 53 E3 G1
10 48 E3 G2
11 58 E3 G3
12 52 E3 G4
39 / 95

What is X?

N EnvE1 EnvE2 EnvE3
1 1 0 0
2 1 0 0
3 1 0 0
4 1 0 0
5 0 1 0
6 0 1 0
7 0 1 0
8 0 1 0
9 0 0 1
10 0 0 1
11 0 0 1
12 0 0 1
  • \mathbf{X} is the 12 x 3 matrix
40 / 95

What is Z?

N GenG1 GenG2 GenG3 GenG4
1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 0 0 0 1
5 1 0 0 0
6 0 1 0 0
7 0 0 1 0
8 0 0 0 1
9 1 0 0 0
10 0 1 0 0
11 0 0 1 0
12 0 0 0 1
  • \mathbf{Z} is the 12 x 4 matrix
41 / 95

What is X'X?

  • \mathbf{X'} is the 3 x 12 matrix

  • \mathbf{X} is the 12 x 3 matrix

  • \mathbf{X'X} is the 3 x 3 matrix

EnvE1 EnvE2 EnvE3
EnvE1 4 0 0
EnvE2 0 4 0
EnvE3 0 0 4
42 / 95

What is X'X?

  • \mathbf{X'} is the 3 x 12 matrix

  • \mathbf{X} is the 12 x 3 matrix

  • \mathbf{X'X} is the 3 x 3 matrix

EnvE1 EnvE2 EnvE3
EnvE1 4 0 0
EnvE2 0 4 0
EnvE3 0 0 4
  • \mathbf{X'X} is the 3 x 3 diagonal matrix counting the number of phenotypes observed in each environment
43 / 95

What is X'Z?

  • \mathbf{X'} is the 3 x 12 matrix

  • \mathbf{Z} is the 12 x 4 matrix

  • \mathbf{X'Z} is the 3 x 4 matrix

GenG1 GenG2 GenG3 GenG4
EnvE1 1 1 1 1
EnvE2 1 1 1 1
EnvE3 1 1 1 1
44 / 95

What is X'Z?

  • \mathbf{X'} is the 3 x 12 matrix

  • \mathbf{Z} is the 12 x 4 matrix

  • \mathbf{X'Z} is the 3 x 4 matrix

GenG1 GenG2 GenG3 GenG4
EnvE1 1 1 1 1
EnvE2 1 1 1 1
EnvE3 1 1 1 1
  • \mathbf{X'Z} is the 3 x 4 matrix counting the number of each genotype in each environment
45 / 95

What is Z'X?

  • \mathbf{Z'} is the 4 x 12 matrix

  • \mathbf{X} is the 12 x 3 matrix

  • \mathbf{X'Z} is the 4 x 3 matrix

EnvE1 EnvE2 EnvE3
GenG1 1 1 1
GenG2 1 1 1
GenG3 1 1 1
GenG4 1 1 1
  • \mathbf{Z'X} is the 4 x 3 matrix counting the number of each genotype in each environment
46 / 95

What is Z'Z?

  • \mathbf{Z'} is the 4 x 12 matrix

  • \mathbf{Z} is the 12 x 4 matrix

  • \mathbf{Z'Z} is the 4 x 4 matrix

GenG1 GenG2 GenG3 GenG4
GenG1 3 0 0 0
GenG2 0 3 0 0
GenG3 0 0 3 0
GenG4 0 0 0 3
  • \mathbf{Z'Z} is the 4 x 4 diagonal matrix counting the number of phenotypes observed for each genotype
47 / 95

What is Z'Z + \lambda \mathbf{K}^{-1}?

  • \mathbf{Z'} is the 4 x 12 matrix

  • \mathbf{Z} is the 12 x 4 matrix

  • \mathbf{Z'Z} is the 4 x 4 matrix

48 / 95

What is Z'Z + \lambda \mathbf{K}^{-1}?

  • \mathbf{Z'} is the 4 x 12 matrix

  • \mathbf{Z} is the 12 x 4 matrix

  • \mathbf{Z'Z} is the 4 x 4 matrix

  • assume \mathbf{K} = \mathbf{I} (no relationship)

  • \lambda = \sigma^2_e / \sigma^2_u = 1.64/9.56 = 0.17

49 / 95

What is Z'Z + \lambda \mathbf{K}^{-1}?

  • \mathbf{Z'} is the 4 x 12 matrix

  • \mathbf{Z} is the 12 x 4 matrix

  • \mathbf{Z'Z} is the 4 x 4 matrix

  • assume \mathbf{K} = \mathbf{I} (no relationship)

  • \lambda = \sigma^2_e / \sigma^2_u = 1.64/9.56 = 0.17

GenG1 GenG2 GenG3 GenG4
GenG1 3.17 0 0 0
GenG2 0 3.17 0 0
GenG3 0 0 3.17 0
GenG4 0 0 0 3.17
  • \mathbf{Z'Z} + \lambda \mathbf{I} is the 4 x 4 diagonal matrix counting the number of phenotypes observed for each genotype + \lambda value in the diagonal elements
50 / 95

What is the left hand side of MME?

EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
EnvE1 4 0 0 1 1 1 1
EnvE2 0 4 0 1 1 1 1
EnvE3 0 0 4 1 1 1 1
GenG1 1 1 1 3.17 0 0 0
GenG2 1 1 1 0 3.17 0 0
GenG3 1 1 1 0 0 3.17 0
GenG4 1 1 1 0 0 0 3.17
51 / 95

What is X'y?

  • \mathbf{X'} is the 3 x 12 matrix

  • \mathbf{y} is the 12 x 1 matrix

  • \mathbf{X'y} is the 3 x 1 matrix

EnvE1 202
EnvE2 204
EnvE3 211
52 / 95

What is X'y?

  • \mathbf{X'} is the 3 x 12 matrix

  • \mathbf{y} is the 12 x 1 matrix

  • \mathbf{X'y} is the 3 x 1 matrix

EnvE1 202
EnvE2 204
EnvE3 211
  • \mathbf{X'y} is the 3 x1 matrix counting the sum of phenotypes in each environment
53 / 95

What is Z'y?

  • \mathbf{Z'} is the 4 x 12 matrix

  • \mathbf{y} is the 12 x 1 matrix

  • \mathbf{Z'y} is the 4 x 1 matrix

GenG1 152
GenG2 145
GenG3 156
GenG4 164
54 / 95

What is Z'y?

  • \mathbf{Z'} is the 4 x 12 matrix

  • \mathbf{y} is the 12 x 1 matrix

  • \mathbf{Z'y} is the 4 x 1 matrix

GenG1 152
GenG2 145
GenG3 156
GenG4 164
  • \mathbf{Z'y} is the 4 x1 matrix counting the sum of phenotypes for each genotype
55 / 95

What is the right hand side of MME?

EnvE1 202
EnvE2 204
EnvE3 211
GenG1 152
GenG2 145
GenG3 156
GenG4 164
56 / 95

Solutions

EnvE1 50.50
EnvE2 51.00
EnvE3 52.75
GenG1 -0.71
GenG2 -2.92
GenG3 0.55
GenG4 3.08
  • These are BLUE and BLUP of environments and genotypes, respectively
57 / 95

The role of lambda

BLUE = sum / n_{x} = the sum of phenotypes in each environment / the number of phenotypes observed in each environment

  • BLUE is simply computing averages

BLUP = sum / n_{z} + \lambda = the sum of phenotypes for each genotype / the number of phenotypes observed for each genotype + \lambda

  • BLUP is shrinked toward zero (proportional to \lambda)
58 / 95

The role of lambda

BLUE = sum / n_{x} = the sum of phenotypes in each environment / the number of phenotypes observed in each environment

  • BLUE is simply computing averages

BLUP = sum / n_{z} + \lambda = the sum of phenotypes for each genotype / the number of phenotypes observed for each genotype + \lambda

  • BLUP is shrinked toward zero (proportional to \lambda)

Note that \lambda = \frac{1-h^2}{h^2}

  • More observations \rightarrow less shrinkage

  • Higher heritability \rightarrow less shrinkage

59 / 95

Genomic relationship matrix

1: The first type of \mathbf{G} matrix

\mathbf{G} = \frac{\mathbf{W_c}\mathbf{W_c'}}{\sum 2 p_j (1-p_j)}

  • \mathbf{W_c}: centered marker matrix

  • p_j: allele frequency at jth marker

2: The second type of \mathbf{G} matrix

\mathbf{G} = \frac{\mathbf{W_{cs}}\mathbf{W_{cs}'}}{m}

  • \mathbf{W_{cs}}: centered and scaled marker matrix

  • m: number of markers

60 / 95

Genomic relationship matrix

61 / 95

When relationships are known

Suppose \mathbf{K} is given by

GenG1 GenG2 GenG3 GenG4
GenG1 1.00 0.64 0.23 0.48
GenG2 0.64 1.00 0.33 0.67
GenG3 0.23 0.33 1.00 0.31
GenG4 0.48 0.67 0.31 1.00
62 / 95

When relationships are known

Suppose \mathbf{K} is given by

GenG1 GenG2 GenG3 GenG4
GenG1 1.00 0.64 0.23 0.48
GenG2 0.64 1.00 0.33 0.67
GenG3 0.23 0.33 1.00 0.31
GenG4 0.48 0.67 0.31 1.00

Then \lambda \mathbf{K}^{-1} is

GenG1 GenG2 GenG3 GenG4
GenG1 0.15 -0.09 0.00 -0.01
GenG2 -0.09 0.22 -0.02 -0.10
GenG3 0.00 -0.02 0.10 -0.02
GenG4 -0.01 -0.10 -0.02 0.17
  • \lambda = \sigma^2_e / \sigma^2_u = 1.64/17.70 = 0.09
63 / 95

What is the left hand side of MME?

EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
EnvE1 4 0 0 1 1 1 1
EnvE2 0 4 0 1 1 1 1
EnvE3 0 0 4 1 1 1 1
GenG1 1 1 1 3.15 -0.09 0.00 -0.01
GenG2 1 1 1 -0.09 3.22 -0.02 -0.10
GenG3 1 1 1 0.00 -0.02 3.10 -0.02
GenG4 1 1 1 -0.01 -0.10 -0.02 3.17
64 / 95

When there are missing phenotypes

N Phe Env Gen
1 47 E1 G1
2 51 E1 G2
3 NA E1 G3
4 58 E1 G4
5 52 E2 G1
6 46 E2 G2
7 52 E2 G3
8 NA E2 G4
9 53 E3 G1
10 48 E3 G2
11 58 E3 G3
12 52 E3 G4
65 / 95

What is X?

N EnvE1 EnvE2 EnvE3
1 1 0 0
2 1 0 0
4 1 0 0
5 0 1 0
6 0 1 0
7 0 1 0
9 0 0 1
10 0 0 1
11 0 0 1
12 0 0 1
  • Remove missing rows

  • \mathbf{X} is the 10 x 3 matrix

66 / 95

What is Z?

N GenG1 GenG2 GenG3 GenG4
1 1 0 0 0
2 0 1 0 0
4 0 0 0 1
5 1 0 0 0
6 0 1 0 0
7 0 0 1 0
9 1 0 0 0
10 0 1 0 0
11 0 0 1 0
12 0 0 0 1
  • Remove missing rows

  • \mathbf{Z} is the 10 x 4 matrix

67 / 95

What is X'X?

  • \mathbf{X'} is the 3 x 10 matrix

  • \mathbf{X} is the 10 x 3 matrix

  • \mathbf{X'X} is the 3 x 3 matrix

EnvE1 EnvE2 EnvE3
EnvE1 3 0 0
EnvE2 0 3 0
EnvE3 0 0 4
68 / 95

What is X'X?

  • \mathbf{X'} is the 3 x 10 matrix

  • \mathbf{X} is the 10 x 3 matrix

  • \mathbf{X'X} is the 3 x 3 matrix

EnvE1 EnvE2 EnvE3
EnvE1 3 0 0
EnvE2 0 3 0
EnvE3 0 0 4
  • \mathbf{X'X} is the 3 x 3 diagonal matrix counting the number of phenotypes observed in each environment
69 / 95

What is X'Z?

  • \mathbf{X'} is the 3 x 10 matrix

  • \mathbf{Z} is the 10 x 4 matrix

  • \mathbf{X'Z} is the 3 x 4 matrix

GenG1 GenG2 GenG3 GenG4
EnvE1 1 1 0 1
EnvE2 1 1 1 0
EnvE3 1 1 1 1
70 / 95

What is X'Z?

  • \mathbf{X'} is the 3 x 10 matrix

  • \mathbf{Z} is the 10 x 4 matrix

  • \mathbf{X'Z} is the 3 x 4 matrix

GenG1 GenG2 GenG3 GenG4
EnvE1 1 1 0 1
EnvE2 1 1 1 0
EnvE3 1 1 1 1
  • \mathbf{X'Z} is the 3 x 4 matrix counting the number of each genotype in each environment
71 / 95

What is Z'X?

  • \mathbf{Z'} is the 4 x 10 matrix

  • \mathbf{X} is the 10 x 3 matrix

  • \mathbf{X'Z} is the 4 x 3 matrix

EnvE1 EnvE2 EnvE3
GenG1 1 1 1
GenG2 1 1 1
GenG3 0 1 1
GenG4 1 0 1
  • \mathbf{Z'X} is the 4 x 3 matrix counting the number of each genotype in each environment
72 / 95

What is Z'Z?

  • \mathbf{Z'} is the 4 x 10 matrix

  • \mathbf{Z} is the 4 x 10 matrix

  • \mathbf{Z'Z} is the 4 x 4 matrix

GenG1 GenG2 GenG3 GenG4
GenG1 3 0 0 0
GenG2 0 3 0 0
GenG3 0 0 2 0
GenG4 0 0 0 2
  • \mathbf{Z'Z} is the 4 x 4 diagonal matrix counting the number of phenotypes observed for each genotype
73 / 95

What is the left hand side of MME?

EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
EnvE1 3 0 0 1 1 0 1
EnvE2 0 3 0 1 1 1 0
EnvE3 0 0 4 1 1 1 1
GenG1 1 1 1 3.10 -0.06 0.00 -0.01
GenG2 1 1 1 -0.06 3.15 -0.01 -0.07
GenG3 0 1 1 0.00 -0.01 2.07 -0.01
GenG4 1 0 1 -0.01 -0.07 -0.01 2.11
  • \lambda = \sigma^2_e / \sigma^2_u = 1.64/19.61 = 0.06
74 / 95

What is the right hand side of MME?

EnvE1 156
EnvE2 150
EnvE3 211
GenG1 152
GenG2 145
GenG3 110
GenG4 110
75 / 95

The first genotype is missing phenotypes

N Phe Env Gen
1 NA E1 G1
2 51 E1 G2
3 46 E1 G3
4 58 E1 G4
5 NA E2 G1
6 46 E2 G2
7 52 E2 G3
8 54 E2 G4
9 NA E3 G1
10 48 E3 G2
11 58 E3 G3
12 52 E3 G4
76 / 95

What is X?

N EnvE1 EnvE2 EnvE3
2 1 0 0
3 1 0 0
4 1 0 0
6 0 1 0
7 0 1 0
8 0 1 0
10 0 0 1
11 0 0 1
12 0 0 1
  • Remove missing rows

  • \mathbf{X} is the 9 x 3 matrix

77 / 95

What is Z?

N GenG1 GenG2 GenG3 GenG4
2 0 1 0 0
3 0 0 1 0
4 0 0 0 1
6 0 1 0 0
7 0 0 1 0
8 0 0 0 1
10 0 1 0 0
11 0 0 1 0
12 0 0 0 1
  • Remove missing rows

  • \mathbf{Z} is the 9 x 4 matrix

78 / 95

What is X'X?

  • \mathbf{X'} is the 3 x 9 matrix

  • \mathbf{X} is the 9 x 3 matrix

  • \mathbf{X'X} is the 3 x 3 matrix

EnvE1 EnvE2 EnvE3
EnvE1 3 0 0
EnvE2 0 3 0
EnvE3 0 0 3
79 / 95

What is X'X?

  • \mathbf{X'} is the 3 x 9 matrix

  • \mathbf{X} is the 9 x 3 matrix

  • \mathbf{X'X} is the 3 x 3 matrix

EnvE1 EnvE2 EnvE3
EnvE1 3 0 0
EnvE2 0 3 0
EnvE3 0 0 3
  • \mathbf{X'X} is the 3 x 3 diagonal matrix counting the number of phenotypes observed in each environment
80 / 95

What is X'Z?

  • \mathbf{X'} is the 3 x 9 matrix

  • \mathbf{Z} is the 9 x 4 matrix

  • \mathbf{X'Z} is the 3 x 4 matrix

GenG1 GenG2 GenG3 GenG4
EnvE1 0 1 1 1
EnvE2 0 1 1 1
EnvE3 0 1 1 1
81 / 95

What is X'Z?

  • \mathbf{X'} is the 3 x 9 matrix

  • \mathbf{Z} is the 9 x 4 matrix

  • \mathbf{X'Z} is the 3 x 4 matrix

GenG1 GenG2 GenG3 GenG4
EnvE1 0 1 1 1
EnvE2 0 1 1 1
EnvE3 0 1 1 1
  • \mathbf{X'Z} is the 3 x 4 matrix counting the number of each genotype in each environment
82 / 95

What is Z'X?

  • \mathbf{Z'} is the 4 x 12 matrix

  • \mathbf{X} is the 12 x 3 matrix

  • \mathbf{X'Z} is the 4 x 3 matrix

EnvE1 EnvE2 EnvE3
GenG1 0 0 0
GenG2 1 1 1
GenG3 1 1 1
GenG4 1 1 1
  • \mathbf{Z'X} is the 4 x 3 matrix counting the number of each genotype in each environment
83 / 95

What is Z'Z?

  • \mathbf{Z'} is the 4 x 9 matrix

  • \mathbf{Z} is the 4 x 9 matrix

  • \mathbf{Z'Z} is the 4 x 4 matrix

GenG1 GenG2 GenG3 GenG4
GenG1 0 0 0 0
GenG2 0 3 0 0
GenG3 0 0 3 0
GenG4 0 0 0 3
  • \mathbf{Z'Z} is the 4 x 4 diagonal matrix counting the number of phenotypes observed for each genotype
84 / 95

What is the left hand side of MME?

EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
EnvE1 3 0 0 0 1 0 1
EnvE2 0 3 0 0 1 1 0
EnvE3 0 0 3 0 1 1 1
GenG1 0 0 0 0.14 -0.08 0.00 -0.01
GenG2 1 1 1 -0.08 3.19 -0.02 -0.09
GenG3 0 1 1 0.00 -0.02 3.09 -0.01
GenG4 1 0 1 -0.01 -0.09 -0.01 3.15
  • \lambda = \sigma^2_e / \sigma^2_u = 1.79/22.78 = 0.08
85 / 95

What is the right hand side of MME?

EnvE1 155
EnvE2 152
EnvE3 158
GenG1 0
GenG2 145
GenG3 156
GenG4 164
86 / 95

Solutions

EnvE1 52.06
EnvE2 51.06
EnvE3 53.06
GenG1 -1.82
GenG2 -3.48
GenG3 -0.07
GenG4 2.38
  • These are BLUE and BLUP of environments and genotypes, respectively
87 / 95

Dimension of left hand side

  • s: number of unique fixed effects

  • q: number of unique genotypes

  • does not depend on n

88 / 95

Extending a linear mixed model for GWAS

Previous model \mathbf{y = Xb + Zu + e}

Linear mixed model single-marker regression \mathbf{y = Xb + Wa + Zu + e}

  • \mathbf{W}: marker matrix

  • \mathbf{a}: vector of marker effect

89 / 95

Linear mixed model for GWAS

Single marker-based mixed model association (MMA) \begin{align*} \mathbf{y} &= \mu + \mathbf{w_ja_j} + \mathbf{Zg} + \boldsymbol{\epsilon} \\ \mathbf{g} &\sim N(0, \mathbf{G}\sigma^2_{g}) \end{align*}

\mathbf{G} captures population structure and polygenic effects

90 / 95

Linear mixed model for GWAS

Single marker-based mixed model association (MMA) \begin{align*} \mathbf{y} &= \mu + \mathbf{w_ja_j} + \mathbf{Zg} + \boldsymbol{\epsilon} \\ \mathbf{g} &\sim N(0, \mathbf{G}\sigma^2_{g}) \end{align*}

\mathbf{G} captures population structure and polygenic effects

Double counting?

91 / 95

Linear mixed model for GWAS

Single marker-based mixed model association (MMA) \begin{align*} \mathbf{y} &= \mu + \mathbf{w_ja_j} + \mathbf{Zg} + \boldsymbol{\epsilon} \\ \mathbf{g} &\sim N(0, \mathbf{G}\sigma^2_{g}) \end{align*}

\mathbf{G} captures population structure and polygenic effects

Double counting?

Alternatively, \begin{align*} \mathbf{y} &= \mu + \mathbf{w_ja_j} + \mathbf{Zg} + \boldsymbol{\epsilon} \\ \mathbf{g} &\sim N(0, \mathbf{G}_{-k}\sigma^2_{g_{-k}}) \end{align*} where -k denotes the kth chromosome removed

See

92 / 95

How to solve the linear mixed model?

1: Mixed model equations (MME) \begin{align*} \mathbf{y} &= \mu + \mathbf{w_ja_j} + \mathbf{Zg} + \boldsymbol{\epsilon} \\ \end{align*}

The mixed model equations of Henderson (1949) are given by

2: Weighted least squares \begin{align*} \hat{\mathbf{a}} &= (\mathbf{W'U T U'W})^{-1}\mathbf{W'U} \mathbf{T} \mathbf{U'y} \end{align*} where

  • \mathbf{U}: eigenvectors of the \mathbf{G} matrix
  • \mathbf{D}: eigenvalues of the \mathbf{G} matrix

\begin{align*} \mathbf{T} = [\mathbf{D} + \lambda \mathbf{I} ]^{-1} \end{align*}

93 / 95

Important literature

Animal

  • Kennedy et al. 1992. Estimation of effects of single genes on quantitative traits. J Anim Sci. 70:2000-2012.

Plant

  • Yu et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 38:203-208.
94 / 95

Bibliography

95 / 95

About me

2 / 95
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow