SEM-GWAS

Haipeng Yu
https://haipengu.github.io/
Mehdi Momen
https://mehdimomen.github.io/
Gota Morota
http://morotalab.org/

2019/11/25

Introduction

Multi-trait analysis is common in genome-wide association studies (GWAS) because it leverages genetic correlations among phenotypes. While a multi-trait genome-wide association study (MTM-GWAS) is a powerful approach, it does not distinguish the causal relationships among phenotypes. A structural equation model allows us to incorporate trait network structures into the MTM-GWAS framework. This tutorial illustrates the application of structural equation model GWAS (SEM-GWAS) using shoot biomass (PSA), root biomass (RB), water use (WU), and water use efficiency (WUE) in rice (Momen et al. 2019).

SEM-GWAS modeling

In this tutorial, we will fit a single-marker SEM-GWAS using WOMBAT. WOMBAT requires the following: 1) a network structure between traits; 2) a phenotype file; 3) a marker genotype file; 4) a relationship matrix (e.g., a genomic relationship matrix); and 5) a WOMBAT parameter file.

Network structure between traits

You can specify the network structure between traits from prior biological knowledge or by estimating it from the data. We will assume the following network structure in this tutorial.

Phenotype file (Pheno.dat)

The phenotype file includes columns for the trait number, individual ID, SNP marker, causal covariates (phenotypes), and phenotypic records. Genotypes (individuals) are ordered within each of the four traits (columns 1, 2, and 6), and the two causal covariates (i.e., PSA and RB) are included in columns 4 and 5. The third column is treated as a placeholder for the SNP marker, with values randomly sampled from a binomial distribution.
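The column layout described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' script: the function names, the trait numbering (1 = PSA, 2 = RB, 3 = WU, 4 = WUE), the Binomial(2, 0.5) placeholder, and the space-separated output are all assumptions made for the example.

```python
import random


def build_pheno_rows(ids, psa, rb, records_by_trait, seed=1):
    """Build rows of (trait number, ID, SNP placeholder, PSA, RB, record).

    records_by_trait holds one list of phenotypic records per trait,
    each in the same individual order as ids (assumed layout).
    """
    rng = random.Random(seed)
    rows = []
    # Individuals are ordered within trait: all of trait 1, then trait 2, ...
    for trait_no, records in enumerate(records_by_trait, start=1):
        for ind, cov_psa, cov_rb, y in zip(ids, psa, rb, records):
            # Placeholder SNP code drawn as Binomial(2, 0.5): sum of two coin flips.
            snp = rng.randint(0, 1) + rng.randint(0, 1)
            rows.append((trait_no, ind, snp, cov_psa, cov_rb, y))
    return rows


def write_pheno(rows, path="Pheno.dat"):
    """Write one space-separated record per line (separator is an assumption)."""
    with open(path, "w") as fh:
        for row in rows:
            fh.write(" ".join(str(v) for v in row) + "\n")
```

Calling `build_pheno_rows` with n individuals and four trait lists yields 4n rows, which `write_pheno` then saves as `Pheno.dat`.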

Genotype file (QTLAllels.dat)

The marker matrix is ordered as markers by individuals (one row per marker, one column per individual), and the file name must be QTLAllels.dat.
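Genotype data are often stored as individuals by markers, so a transpose is needed to obtain the orientation described above. The following is a minimal sketch under that assumption. The function names are hypothetical, and the separator used between allele codes is left as a parameter because it should be checked against the WOMBAT manual.

```python
def markers_by_individuals(geno):
    """Transpose an individuals-by-markers matrix (list of lists)."""
    if not geno:
        return []
    n_markers = len(geno[0])
    if any(len(row) != n_markers for row in geno):
        raise ValueError("every individual must have the same number of markers")
    # zip(*geno) groups the k-th marker of every individual into one row.
    return [list(marker) for marker in zip(*geno)]


def write_genotypes(matrix, path="QTLAllels.dat", sep=" "):
    """Write one marker per line; sep is an assumption, not a WOMBAT requirement."""
    with open(path, "w") as fh:
        for marker_row in matrix:
            fh.write(sep.join(str(code) for code in marker_row) + "\n")
```

For two individuals with three markers each, `markers_by_individuals` returns three rows of two allele codes.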

Relationship matrix (animal.gin)

Only the upper triangular elements (including the diagonal) are needed in this file. The determinant of the relationship matrix, which is optional, is provided in the first row.
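Extracting the upper triangle of a symmetric matrix can be sketched as below. The row, column, value layout with 1-based indices and the function names are assumptions made for illustration. Consult the WOMBAT manual for the exact format it expects, including whether the file should hold the matrix or its inverse.

```python
def upper_triangle(matrix):
    """Return (row, column, value) triples for elements on or above the diagonal."""
    n = len(matrix)
    # Indices are 1-based in the output (assumed convention).
    return [(i + 1, j + 1, matrix[i][j]) for i in range(n) for j in range(i, n)]


def write_gin(matrix, path="animal.gin", determinant=None):
    """Write the optional determinant first, then one triple per line."""
    with open(path, "w") as fh:
        if determinant is not None:
            fh.write(f"{determinant}\n")
        for row, col, value in upper_triangle(matrix):
            fh.write(f"{row} {col} {value}\n")
```

An n by n matrix produces n(n + 1)/2 triples, so a 3 by 3 matrix gives six lines plus the optional first row.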

Parameter file

The parameter file of WOMBAT is shown below. In the second row, the option --snap (Meyer and Tier 2012) is specified to fit a single marker GWAS. In the model section, the causal covariates (i.e., PSA and RB) are explicitly assigned to WU and WUE according to the trait network structure. Check this page for additional information.

Run SEM-GWAS using WOMBAT

./wombat -cv --snap parameterfile.par
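Before launching a run, it can help to confirm that the input files described in this tutorial are present. The sketch below wraps the same command in a small Python helper. The function names are hypothetical, and it assumes the `wombat` executable and all input files sit in the same working directory.

```python
import subprocess
from pathlib import Path

# File names taken from the sections above.
REQUIRED_FILES = ("Pheno.dat", "QTLAllels.dat", "animal.gin", "parameterfile.par")


def missing_inputs(workdir="."):
    """Return the required input files that are not found in workdir."""
    base = Path(workdir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


def run_sem_gwas(workdir="."):
    """Run WOMBAT with the tutorial's options after checking the inputs."""
    missing = missing_inputs(workdir)
    if missing:
        raise FileNotFoundError("Missing input files: " + ", ".join(missing))
    # Same command as in the tutorial, executed inside workdir.
    return subprocess.run(
        ["./wombat", "-cv", "--snap", "parameterfile.par"],
        cwd=workdir,
        check=True,
    )
```

`missing_inputs()` returns an empty list when everything is in place, and `run_sem_gwas()` raises an error instead of starting WOMBAT when a file is absent.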

References

Meyer, Karin, and Bruce Tier. 2012. “‘SNP Snappy’: A Strategy for Fast Genome-Wide Association Studies Fitting a Full Mixed Model.” Genetics 190 (1). Genetics Soc America: 275–77.

Momen, Mehdi, Malachy T Campbell, Harkamal Walia, and Gota Morota. 2019. “Utilizing Trait Networks and Structural Equation Models as Tools to Interpret Multi-Trait Genome-Wide Association Studies.” Plant Methods 15 (1). Springer: 107.