Multiple testing correction in GWAS

Gota Morota

April 9, 2020

Data

This example illustrates how to perform multiple testing correction in GWAS using 1) the Bonferroni correction, 2) the Šidák correction, and 3) the Li and Ji correction.

We will use a subset of the mice data in the BGLR R package: 100 individuals genotyped at 300 markers.

## [1] 100 300
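The dimensions above could come from a subset like the one sketched below. The object name `W` and the choice of the first 100 individuals and the first 300 markers are assumptions made for illustration; the selection originally used is not shown here.

```r
# Load the mice data shipped with BGLR; this creates the marker matrix mice.X
library(BGLR)
data(mice)

# Assumed subset: first 100 individuals (rows) and first 300 markers (columns)
W <- mice.X[1:100, 1:300]
dim(W)
```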

Bonferroni correction

The genome-wide statistical significance threshold using the Bonferroni correction is \[\begin{align*} \alpha_{bonferroni} &= \alpha / m \end{align*}\] where \(\alpha\) is the non-adjusted statistical significance threshold and \(m\) is the number of markers. If we set \(\alpha = 0.05\) with \(m = 300\) markers, the threshold is

## [1] 0.0001666667
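The value above can be reproduced with a few lines of base R. This is a minimal sketch; the variable names are illustrative, and \(m = 300\) is taken from the number of columns in the subset.

```r
alpha <- 0.05   # non-adjusted significance threshold
m <- 300        # number of markers in the subset

# Bonferroni: divide the threshold by the number of tests
alpha_bonferroni <- alpha / m
alpha_bonferroni
```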

Šidák correction

The genome-wide statistical significance threshold using the Šidák correction is \[\begin{align*} \alpha_{sid} &= 1 - (1 - \alpha)^{1/m} \end{align*}\] where \(\alpha\) and \(m\) are defined as above. If we set \(\alpha = 0.05\) with \(m = 300\) markers, the threshold is

## [1] 0.000170963
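The same calculation in base R, again as a minimal sketch with illustrative variable names:

```r
alpha <- 0.05   # non-adjusted significance threshold
m <- 300        # number of markers in the subset

# Sidak: per-test threshold that keeps the family-wise error rate at alpha
# when the m tests are independent
alpha_sidak <- 1 - (1 - alpha)^(1 / m)
alpha_sidak
```

For any \(m > 1\) the Šidák threshold is slightly larger (less stringent) than \(\alpha / m\).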

We can see that the Bonferroni and the Šidák corrections produce nearly identical thresholds (about 0.000167 versus 0.000171), with the Šidák threshold being marginally less stringent.

Li and Ji (2005)

The multiple testing correction of Li and Ji (2005) is implemented in the poolr R package. See also its GitHub repository and GitHub Pages site.

## Loading required package: Matrix

The function meff() with method = "liji" performs the Li and Ji correction. The input required is the correlation matrix of markers.

## [1] 48
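A sketch of how this value might be computed with `meff()` follows. The subset of `mice.X` used here is an assumption, so the result need not equal 48, and the step that drops monomorphic markers is a precaution added for this sketch rather than something stated above.

```r
library(BGLR)
library(poolr)

data(mice)
W <- mice.X[1:100, 1:300]   # assumed subset: 100 individuals, 300 markers

# cor() returns NA for markers with zero variance, so keep polymorphic ones only
W <- W[, apply(W, 2, var) > 0, drop = FALSE]

# Marker-by-marker correlation matrix, the input required by meff()
R <- cor(W)

# Effective number of markers according to Li and Ji (2005)
Meff <- meff(R, method = "liji")
Meff
```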

The effective number of markers according to Li and Ji (2005) is \(Meff = 48\). Once Meff is obtained, we apply the Šidák correction but replace \(m\) with \(Meff\).

\[\begin{align*} \alpha_{liji} &= 1 - (1 - \alpha)^{1/Meff} \end{align*}\]

## [1] 0.00106804
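The threshold above follows from plugging \(Meff = 48\) into the Šidák formula. A minimal sketch in base R, with illustrative variable names:

```r
alpha <- 0.05   # non-adjusted significance threshold
Meff <- 48      # effective number of markers reported above

# Sidak correction with m replaced by Meff
alpha_liji <- 1 - (1 - alpha)^(1 / Meff)
alpha_liji
```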

We can see that the Li and Ji correction yields a less stringent genome-wide statistical significance threshold (about 0.00107) than the Bonferroni and Šidák corrections (about 0.00017), because it counts 48 effective markers rather than all 300.

Li and Ji (2005) for high-dimensional data

When the marker matrix is large, consider calculating a chromosome-specific Meff for each chromosome and then summing these values to obtain the genome-wide Meff. Suppose geno contains 400,000 markers of the rice data (12 chromosomes).
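The chromosome-wise strategy can be sketched as follows. The helper `meff_by_chr()` and the vector `chr` (the chromosome of each column of `geno`) are hypothetical names introduced for illustration, and a small simulated matrix stands in for the rice data so that the example runs on its own.

```r
library(poolr)

# Hypothetical helper: sum chromosome-specific Meff values.
#   geno: numeric matrix with individuals in rows and markers in columns
#   chr:  vector giving the chromosome of each column of geno
meff_by_chr <- function(geno, chr) {
  stopifnot(ncol(geno) == length(chr))
  # Group column indices by chromosome and compute Meff within each group
  per_chr <- sapply(split(seq_len(ncol(geno)), chr), function(idx) {
    meff(cor(geno[, idx, drop = FALSE]), method = "liji")
  })
  list(per_chr = per_chr, genome_wide = sum(per_chr))
}

# Simulated stand-in (NOT the rice data): 50 individuals, 12 chromosomes,
# 10 markers per chromosome
set.seed(1)
geno <- matrix(rbinom(50 * 120, size = 2, prob = 0.5), nrow = 50)
chr <- rep(1:12, each = 10)

res <- meff_by_chr(geno, chr)
res$per_chr       # chromosome-specific Meff
res$genome_wide   # genome-wide Meff as the sum over chromosomes
```

Working one chromosome at a time keeps each correlation matrix far smaller than the full marker-by-marker matrix, at the cost of ignoring correlation between markers on different chromosomes.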