Introduction
A Bayesian network (BN) describes the joint probability distribution of random variables through their conditional dependencies/independencies and represents as a graphical model (directed acyclic graph)(Scutari and Denis 2014). Thus, a BN provides an approach to decipher the probabilistic dependencies among variables of interest and has been applied in plant and animal genetics (G. Morota et al. 2012; Yu et al. 2019).
In the following examples, we use the bnlearn
package (Scutari 2010) to infer the BN of seven rice phenotypes including Na+, K+, and Na+: K+ content in roots and shoots (Yu et al. 2019). The rice genotypes are from the Rice Diversity Panel 1 (Zhao et al. 2011). We will use two different approaches to infer BN: 1) data-driven approach and 2) the combination of the data-driven approach and prior knowledge. You can download the rice dataset here.
Load R packages
# install.packages("bnlearn")
library(bnlearn)
# install.packages("cowplot")
library(cowplot)
# install.packages("ggplot2")
library(ggplot2)
# install.packages("gaston")
library(gaston)
# install.packages("BiocManager")
# BiocManager::install("Rgraphviz")
library(Rgraphviz)
# install.packages("devtools")
library(devtools)
# install_github("QuantGen/MTM")
library(MTM)
Data import
rm(list = ls())
# Import phenotypes
pheno <- read.csv("salt_phenotypes.csv", row.names = 1)
dim(pheno) # 385 7
## [1] 385 7
pheno <- na.omit(pheno) # remove NAs
dim(pheno) # 366 7
## [1] 366 7
head(pheno)
## Na.K.Shoot Na.Shoot K.Shoot.Salt K.Shoot.Control Na.K.Root
## NSFTV_1 217.9371 0.1949661 0.8960666 0.9117648 500.9474
## NSFTV_10 137.4868 0.1250447 0.9109011 0.9235181 427.0097
## NSFTV_100 165.0296 0.1667733 1.0206480 1.0795380 454.0931
## NSFTV_101 146.2632 0.1574553 1.1010989 1.0868638 466.6072
## NSFTV_102 187.6632 0.1537053 0.8338489 0.9042839 522.5943
## NSFTV_103 119.3382 0.1102053 0.9420989 1.0959358 383.7196
## Na.Root K.Root.Salt
## NSFTV_1 0.2933769 0.6045812
## NSFTV_10 0.2970459 0.7066461
## NSFTV_100 0.3127060 0.6728234
## NSFTV_101 0.3068186 0.6662068
## NSFTV_102 0.4205350 0.8076836
## NSFTV_103 0.2723217 0.7132342
# Import genotype
geno <- read.bed.matrix("sativas413")
## Reading sativas413.fam
## Reading sativas413.bim
## Reading sativas413.bed
## ped stats and snps stats have been set.
## 'p' has been set.
## 'mu' and 'sigma' have been set.
dim(geno) # 413 36901
## [1] 413 36901
## Subset accesions with both genotype and phenotype records
geno@ped$id <- paste0("NSFTV_", geno@ped$id)
geno <- geno[geno@ped$id %in% row.names(pheno), ]
dim(geno) # 363 36901
## [1] 363 36901
pheno <- pheno[match(geno@ped$id, row.names(pheno)), ]
dim(pheno) # 363 7
## [1] 363 7
## Check if the ids of genotypes and phenotypes are identical
table(geno@ped$id == rownames(pheno))
##
## TRUE
## 363
Construct a genomic relationship matrix
# Quality control of genotypes
geno <- select.snps(geno, c(maf > 0.05 & callrate > 0.9))
dim(geno) # 363 29499
## [1] 363 29499
# Construct a genomic relationship matrix
grm <- GRM(geno)
dim(grm) # 363 363
## [1] 363 363
Fit a multivariate mixed model to obtain estimated breeding values (EBV)
# Define the MTM_func using the MTM Package
MTM_func <- function(Y, G, nTrait, nIter, burnIn, thin, prefix) {
library(MTM)
set.seed(007) # reproducible results
MTM ( Y = Y, K = list ( list ( K = G, COV = list ( type = 'UN', df0 = nTrait,
S0 = diag (nTrait) ) ) ), resCov = list ( type = 'UN', S0 = diag (nTrait),
df0 = nTrait), nIter = nIter, burnIn = burnIn, thin = thin, saveAt = prefix)
}
Y <- scale(pheno, center = TRUE, scale = TRUE)
G <- grm
nTrait <- ncol(Y)
MTM_fit <- MTM_func(Y = Y, G = G, nTrait = nTrait, nIter = 1200, burnIn = 200, thin = 5, prefix = 'MTM_fit')
## Check Gibbs samples
list.files(pattern = 'MTM_fit')
## Retrieve estimates
str(MTM_fit)
BV <- MTM_fit$K[[1]]$U # estimated breeding values
dim(BV) # 363 7
head(BV)
colnames(BV) <- colnames(pheno)
BV <- as.data.frame(BV)
head(BV)
Structure learning using EBV
## Check strength and direction in BN
check_boot <- function(boot, Strength_Thres, Direction_Thres) {
boot[(boot$strength >= Strength_Thres) & (boot$direction > Direction_Thres),]
}
Structure learning: data-driven approach
# Score-based algorithm: Tabu Search
tabu_simple <- tabu(BV)
graphviz.plot(tabu_simple, main = "Tabu_data-driven", shape = "ellipse",
layout = "dot")
# Model Averaging with 500 bootstrap samples to account for uncertainty
set.seed(007)
boot_tabu <- boot.strength (BV, algorithm = "tabu", R = 500)
# Check strength (>= 0.8) and direction (> 0.5)
check_boot(boot_tabu, 0.8, 0.5)
## from to strength direction
## 1 Na.K.Shoot Na.Shoot 1.000 0.9470000
## 2 Na.K.Shoot K.Shoot.Salt 1.000 0.5360000
## 12 Na.Shoot K.Root.Salt 0.896 0.8649554
## 14 K.Shoot.Salt Na.Shoot 0.976 0.9303279
## 15 K.Shoot.Salt K.Shoot.Control 1.000 0.9420000
## 28 Na.K.Root K.Shoot.Control 0.946 0.9704017
## 29 Na.K.Root Na.Root 1.000 0.9960000
## 30 Na.K.Root K.Root.Salt 1.000 0.9960000
## 40 K.Root.Salt K.Shoot.Control 0.988 0.7419028
## 42 K.Root.Salt Na.Root 1.000 0.7510000
# Average bootstrap samples using threshold = 0.8
ave_model_tabu <- averaged.network(boot_tabu, threshold = 0.8)
graphviz.plot(ave_model_tabu, shape = 'ellipse')
Structure learning: data + prior knowledge
If prior knowledge regarding the network structure is available, it is possible to incorporate this into the process of structure learning. We can add and eliminate the edge using arguments of whitelist
and blacklist
, accordingly.
# Score-based algorithm: Tabu Search
tabu_simple <- tabu(BV)
graphviz.plot(tabu_simple, main = "Tabu", shape = "ellipse", layout = "dot")
# Incorporate subjective prior
whtlist <- c(from = 'Na.K.Root', to = 'Na.K.Shoot')
blklist <- data.frame(from = c("K.Shoot.Salt", "K.Shoot.Control" ),
to = c("K.Shoot.Control", "K.Shoot.Salt"))
# Data + subjective prior
tabu_blk <- tabu(BV, whitelist = whtlist, blacklist = blklist)
graphviz.plot(tabu_blk, main = "Tabu_data-and-prior", shape = "ellipse",
layout = "dot")
Remove the dependence in EBV with Cholesky Decomposition
Theoretically, BN learning algorithms assume sample independence. However, the genomic relationship matrix introduces dependencies among EBV, which may act as a confounder. We can transform EBV to obtain adjusted EBV by eliminating the dependencies. We can decompose \(\mathbf{G}\) into its Cholesky factors \(\mathbf{G} = \mathbf{L} \mathbf{L'}\). Here, \(\mathbf{L}\) is an \(n \times n\) lower triangular matrix. For a single trait we can remove the dependancy from the BV \(\mathbf{u}\) and yield the adjusted breeding values \(\mathbf{u^*}\) by \(\mathbf{u^*} = \mathbf{L^{-1}} \mathbf{u}\). When we have \(t\) traits, the dimension of \(\mathbf{u}\) becomes \((n \times t) \times 1\) and \(\mathbf{u^*} = \mathbf{M^{-1}} \mathbf{u}\), where \(\mathbf{M^{-1}} = \mathbf{I_{(n \times t) \times (n \times t)}} \otimes \mathbf{L^{-1}}\).
Linv <- solve(t(chol(grm)))
Minv <- kronecker(diag(7), Linv)
BV_adj <- matrix(Minv %*% c(as.matrix(BV)), nrow = 363, ncol = 7)
colnames(BV_adj) <- names(pheno)
BV_adj <- as.data.frame(BV_adj)
Structure learning: data-driven approach
# Score-based algorithm: Tabu Search
tabu_simple <- tabu(BV_adj)
graphviz.plot(tabu_simple, main = "Tabu", shape = "ellipse", layout = "dot")
# Model Averaging with 500 bootstrap samples to account for uncertainty
set.seed(007)
boot_tabu <- boot.strength (BV_adj, algorithm = "tabu", R = 500)
# Check strength (>= 0.8) and direction (> 0.5)
check_boot(boot_tabu, 0.8, 0.5)
## from to strength direction
## 1 Na.K.Shoot Na.Shoot 1.000 0.8300000
## 2 Na.K.Shoot K.Shoot.Salt 1.000 0.6400000
## 6 Na.K.Shoot K.Root.Salt 0.950 0.5105263
## 14 K.Shoot.Salt Na.Shoot 0.954 0.7442348
## 15 K.Shoot.Salt K.Shoot.Control 1.000 0.9790000
## 35 Na.Root Na.K.Root 1.000 0.6970000
## 40 K.Root.Salt K.Shoot.Control 0.956 0.9895397
## 41 K.Root.Salt Na.K.Root 1.000 0.7730000
## 42 K.Root.Salt Na.Root 1.000 0.6320000
# Average bootstrap samples use threshold = 0.8
ave_model_tabu <- averaged.network(boot_tabu, threshold = 0.8)
graphviz.plot(ave_model_tabu, shape = 'ellipse')
Structure learning: data + prior knowledge
# Score-based algorithm: Tabu Search
tabu_simple <- tabu(BV_adj)
graphviz.plot(tabu_simple, main = "Tabu", shape = "ellipse", layout = "dot")
# Incorporate subjective prior
whtlist <- c(from = 'Na.K.Root', to = 'Na.K.Shoot')
blklist <- data.frame(from = c("K.Shoot.Salt", "K.Shoot.Control" ),
to = c("K.Shoot.Control", "K.Shoot.Salt"))
# Data + subjective prior
tabu_blk <- tabu(BV_adj, whitelist = whtlist, blacklist = blklist)
graphviz.plot(tabu_blk, main = "Tabu_data-and-prior", shape = "ellipse",
layout = "dot")
Available BN learning algorithms in the bnlearn
package
In the above examples, we used the score-based algorithm Tabu Search for structure learning. Alternatively, you can use other algorithms by replacing the tabu()
function and the algorithm = "tabu"
argument in the boot.strength()
function with the followings. The details can be found in the help page of bnlearn.
- Score-based structure learning algorithms:
- hc: Hill Climbing (HC)
- tabu: Tabu Search (Tabu)
- Constraint-based structure learning algorithms:
- pc.stable : the first practical application of the IC algorithm
- gs: Grow-Shrink (GS)
- iamb: Incremental Association Markov Blanket (IAMB)
- fast.iamb: Fast Incremental Association (Fast-IAMB)
- inter.iamb: Interleaved Incremental Association (Inter-IAMB)
- mmpc: Max-Min Parents & Children (MMPC)
- si.hiton.pc: Semi-Interleaved Hiton-PC (SI-HITON-PC)
- Hybrid structure learning algorithms:
- mmhc: Max-Min Hill Climbing (MMHC)
- rsmax2: General 2-Phase Restricted Maximization (RSMAX2)
References
Morota, G, BD Valente, GJM Rosa, KA Weigel, and D Gianola. 2012. “An Assessment of Linkage Disequilibrium in Holstein Cattle Using a Bayesian Network.” Journal of Animal Breeding and Genetics 129 (6). Wiley Online Library: 474–87.
Scutari, Marco. 2010. “Learning Bayesian Networks with the Bnlearn R Package.” Journal of Statistical Software, Articles 35 (3): 1–22. doi:10.18637/jss.v035.i03.
Scutari, Marco, and Jean-Baptiste Denis. 2014. Bayesian Networks: With Examples in R. Chapman; Hall/CRC.
Yu, Haipeng, Malachy T Campbell, Qi Zhang, Harkamal Walia, and Gota Morota. 2019. “Genomic Bayesian Confirmatory Factor Analysis and Bayesian Network to Characterize a Wide Spectrum of Rice Phenotypes.” G3: Genes, Genomes, Genetics. G3: Genes, Genomes, Genetics, g3–400154.
Zhao, Keyan, Chih-Wei Tung, Georgia C Eizenga, Mark H Wright, M Liakat Ali, Adam H Price, Gareth J Norton, et al. 2011. “Genome-Wide Association Mapping Reveals a Rich Genetic Architecture of Complex Traits in Oryza Sativa.” Nature Communications 2. Nature Publishing Group: 467.