In [66]:
# Install Julia from https://julialang.org/

# Install JWAS from https://reworkhow.github.io/JWAS.jl/latest/
In [67]:
# Single-trait analysis - https://github.com/reworkhow/JWAS.jl/wiki/Single-Trait-Analysis
In [68]:
# Step 1: Load packages
In [69]:
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets
In [70]:
# Step 2: Read data
In [71]:
phenofile  = dataset("phenotypes.csv")
pedfile    = dataset("pedigree.csv")
genofile   = dataset("genotypes.csv")
Out[71]:
"/Users/gota/.julia/packages/JWAS/6mjQ8/src/4.Datasets/src/../data/genotypes.csv"
In [72]:
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])
pedigree   = get_pedigree(pedfile,separator=",",header=true);
genotypes  = get_genotypes(genofile,separator=',',method="BayesC");
The delimiter in pedigree.csv is ','.
Pedigree information:
#individuals: 100
#sires:       39
#dams:        38
#founders:    20
The delimiterd in genotypes.csv is ','. The header (marker IDs) is provided in genotypes.csv.
Genotype informatin:
#markers: 1000; #individuals: 60
In [73]:
first(phenotypes,5)
Out[73]:

5 rows × 11 columns (omitted printing of 2 columns)

IDy1y2y3x1x2x3dambv1
StringFloat64Float64Float64Float64StringStringString?Float64
1a1-0.579682-0.05820891.04560.77g1mmissing-1.07018
2a2-2.02246-2.3843-1.57196-1.02g2mmissing-1.03328
3a3-1.48071-0.421258-0.2722450.52g1fmissing-1.2916
4a4-3.03031-2.0634-0.604634-1.05g4mmissing-1.39308
5a52.188192.224251.723062.06g3mmissing0.267546
In [74]:
# Step 3: Build Model Equations
In [75]:
model_equation  ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes"
Out[75]:
"y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes"
In [76]:
model = build_model(model_equation);
In [77]:
# Step 4: Set Factors or Covariates
In [78]:
set_covariate(model,"x1");
In [79]:
# Step 5: Set Random or Fixed Effects
In [80]:
set_random(model,"x2");
# set_random(model,"ID dam",pedigree);
In [81]:
# # Step 6: Run Analysis
In [82]:
out=runMCMC(model,phenotypes);
The folder results is created to save results.
Checking genotypes...
Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
In this complete genomic data (non-single-step) analyis, 40 phenotyped individuals are not genotyped. These are removed from the analysis.
Predicted values for individuals of interest will be obtained as the summation of Any[] (Note that genomic data is always included for now).Phenotypes for 60 observations are used in the analysis.These individual IDs are saved in the file IDs_for_individuals_with_phenotypes.txt.
Prior information for genomic variance is not provided and is generated from the data.
Prior information for residual variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.

The prior for marker effects variance is calculated from the genetic variance and π.
The mean of the prior for the marker effects variance is: 0.003622



A Linear Mixed Model was build using model equations:

y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1              covariate    fixed                1
x2              factor       random               5
x2*x3           interaction  fixed               10
ID              factor       fixed               60
dam             factor       fixed               29

MCMC Information:

chain_length                                    100
burnin                                            0
starting_value                                 true
printout_frequency                              101
output_samples_frequency                          1
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y1:x2):              [0.623]
residual variances:                           0.623

Genomic Information:

complete genomic data (i.e., non-single-step analysis)

Genomic Category                          genotypes
Method                                       BayesC
genetic variances (genomic):                  1.247
marker effect variances:                      0.004
π                                               0.0
estimatePi                                     true
estimateScale                                 false

Degree of freedom for hyper-parameters:

residual variances:                           4.000
random effect variances:                      5.000
marker effect variances:                      4.000



The file results/MCMC_samples_residual_variance.txt is created to save MCMC samples for residual_variance.
The file results/MCMC_samples_marker_effects_genotypes_y1.txt is created to save MCMC samples for marker_effects_genotypes_y1.
The file results/MCMC_samples_marker_effects_variances_genotypes.txt is created to save MCMC samples for marker_effects_variances_genotypes.
The file results/MCMC_samples_pi_genotypes.txt is created to save MCMC samples for pi_genotypes.
The file results/MCMC_samples_y1.x2_variances.txt is created to save MCMC samples for y1:x2_variances.
The file results/MCMC_samples_EBV_y1.txt is created to save MCMC samples for EBV_y1.
The file results/MCMC_samples_genetic_variance.txt is created to save MCMC samples for genetic_variance.
The file results/MCMC_samples_heritability.txt is created to save MCMC samples for heritability.
running MCMC ...100%|███████████████████████████████████| Time: 0:00:00

The version of Julia and Platform in use:

Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, haswell)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.


In [83]:
# Check the results
In [84]:
out
Out[84]:
Dict{Any, Any} with 7 entries:
  "EBV_y1"                   => 60×3 DataFrame…
  "heritability"             => 1×3 DataFrame…
  "location parameters"      => 106×5 DataFrame…
  "residual variance"        => 1×3 DataFrame…
  "pi_genotypes"             => 1×3 DataFrame…
  "genetic_variance"         => 1×3 DataFrame…
  "marker effects genotypes" => 1000×5 DataFrame…
In [85]:
keys(out)
Out[85]:
KeySet for a Dict{Any, Any} with 7 entries. Keys:
  "EBV_y1"
  "heritability"
  "location parameters"
  "residual variance"
  "pi_genotypes"
  "genetic_variance"
  "marker effects genotypes"
In [86]:
out["heritability"]
Out[86]:

1 rows × 3 columns

CovarianceEstimateSD
AnyAnyAny
1y10.648410.141073
In [87]:
out["residual variance"]
Out[87]:

1 rows × 3 columns

CovarianceEstimateSD
AnyAnyAny
1y1_y10.4149340.283966
In [88]:
out["marker effects genotypes"]
Out[88]:

1,000 rows × 5 columns

TraitMarker_IDEstimateSDModel_Frequency
AnyAnyAnyAnyAny
1y1m1-0.004063940.03648310.8
2y1m20.006340280.04698960.84
3y1m3-0.008253310.04364170.81
4y1m4-0.005313350.0426850.87
5y1m50.003256270.04590290.83
6y1m6-0.004930910.04112990.83
7y1m70.00824310.04353720.81
8y1m8-0.001290830.04568030.79
9y1m9-0.01038260.04091480.87
10y1m100.004375310.0439690.83
11y1m110.004331190.04424280.78
12y1m12-0.0009376530.04020710.8
13y1m13-0.005450290.04431240.84
14y1m140.003281230.04591520.9
15y1m15-0.006966940.04567810.85
16y1m165.13691e-50.04196890.83
17y1m170.00306850.04371310.83
18y1m180.005252230.04518650.83
19y1m19-0.008288770.04519340.79
20y1m200.00784350.04680860.85
21y1m210.001744290.04473790.87
22y1m22-0.007310890.04192740.81
23y1m23-0.014030.04232160.91
24y1m240.003640110.04904320.88
25y1m250.005946730.0543510.8
26y1m260.003592380.04269030.79
27y1m27-0.001752650.0426880.82
28y1m28-0.001658770.04196330.77
29y1m29-0.004368270.04280240.8
30y1m300.005067340.03983620.76
⋮⋮⋮⋮⋮⋮
In [89]:
# Bayesian marker effect SEM-GWAS - https://github.com/reworkhow/JWAS.jl/wiki/Integrating-Phenotypic-Causal-Networks-in-GWAS
In [90]:
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets
In [91]:
# Step 2: Read data 
phenofile  = dataset("phenotypes.csv")
pedfile    = dataset("pedigree.csv")
genofile   = dataset("genotypes.csv")
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])
pedigree   = get_pedigree(pedfile,separator=",",header=true);
genotypes  = get_genotypes(genofile,separator=',',method="BayesC");
first(phenotypes,5)
The delimiter in pedigree.csv is ','.
Pedigree information:
#individuals: 100
#sires:       39
#dams:        38
#founders:    20
The delimiterd in genotypes.csv is ','. The header (marker IDs) is provided in genotypes.csv.
Genotype informatin:
#markers: 1000; #individuals: 60
Out[91]:

5 rows × 11 columns (omitted printing of 2 columns)

IDy1y2y3x1x2x3dambv1
StringFloat64Float64Float64Float64StringStringString?Float64
1a1-0.579682-0.05820891.04560.77g1mmissing-1.07018
2a2-2.02246-2.3843-1.57196-1.02g2mmissing-1.03328
3a3-1.48071-0.421258-0.2722450.52g1fmissing-1.2916
4a4-3.03031-2.0634-0.604634-1.05g4mmissing-1.39308
5a52.188192.224251.723062.06g3mmissing0.267546
In [92]:
# Step 3: Build Model Equations

model_equation  ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes
                  y2 = intercept + x1 + x2 + ID + genotypes
                  y3 = intercept + x1 + ID + genotypes";
model = build_model(model_equation);
In [93]:
# Step 4: Set Factors or Covariates
set_covariate(model,"x1");
In [94]:
# Step 5: Set Random or Fixed Effects
set_random(model,"x2");
set_random(model,"ID dam",pedigree);
x2 is not found in model equation 3.
dam is not found in model equation 2.
dam is not found in model equation 3.
In [95]:
# Step 6: Run Analysis
# If `causal_structure` is provided, e.g., causal_structure = [0.0 0.0 0.0;1.0 0.0 0.0;1.0 0.0 0.0] (column index affects row index, and a lower triangular matrix is required) for
# trait 1 -> trait 2 and trait 1 -> trait 3, phenotypic causal networks will be incorporated using structure equation models.
my_structure = [0.0 0.0 0.0
                1.0 0.0 0.0
                1.0 0.0 0.0]
out=runMCMC(model,phenotypes,causal_structure=my_structure);
The folder results already exists.
The folder results1 is created to save results.
Checking pedigree...
Checking genotypes...
Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
In this complete genomic data (non-single-step) analyis, 40 phenotyped individuals are not genotyped. These are removed from the analysis.
Predicted values for individuals of interest will be obtained as the summation of Any["y1:ID", "y2:ID", "y3:ID", "y1:dam"] (Note that genomic data is always included for now).Phenotypes for 60 observations are used in the analysis.These individual IDs are saved in the file IDs_for_individuals_with_phenotypes.txt.
Prior information for genomic variance is not provided and is generated from the data.
Prior information for residual variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.

Pi (Π) is not provided.
Pi (Π) is generated assuming all markers have effects on all traits.

The prior for marker effects covariance matrix is calculated from genetic covariance matrix and Π.
The mean of the prior for the marker effects covariance matrix is:
 0.001811  0.0       0.0
 0.0       0.001849  0.0
 0.0       0.0       0.001429



A Linear Mixed Model was build using model equations:

y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes
y2 = intercept + x1 + x2 + ID + genotypes
y3 = intercept + x1 + ID + genotypes

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1              covariate    fixed                1
x2              factor       random               5
x2*x3           interaction  fixed               10
ID              factor       random             100
dam             factor       random             100

MCMC Information:

chain_length                                    100
burnin                                            0
starting_value                                 true
printout_frequency                              101
output_samples_frequency                          1
constraint                                     true
missing_phenotypes                            false
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y1:x2,y2:x2):
 0.623  0.0
 0.0    0.637
random effect variances (y1:ID,y2:ID,y3:ID,y1:dam):
 0.623  0.0    0.0    0.0
 0.0    0.637  0.0    0.0
 0.0    0.0    0.492  0.0
 0.0    0.0    0.0    0.623
genetic variances (polygenic):
 0.623  0.0    0.0    0.0
 0.0    0.637  0.0    0.0
 0.0    0.0    0.492  0.0
 0.0    0.0    0.0    0.623
residual variances:           
 0.623  0.0    0.0
 0.0    0.637  0.0
 0.0    0.0    0.492

Genomic Information:

complete genomic data (i.e., non-single-step analysis)

Genomic Category                          genotypes
Method                                       BayesC
genetic variances (genomic):  
 0.623  0.0    0.0
 0.0    0.637  0.0
 0.0    0.0    0.492
marker effect variances:      
 0.002  0.0    0.0
 0.0    0.002  0.0
 0.0    0.0    0.001

Π: (Y(yes):included; N(no):excluded)

["y1", "y2", "y3"]         probability
["Y", "Y", "Y"]                 1.0
["N", "N", "N"]                 0.0
["N", "N", "Y"]                 0.0
["Y", "Y", "N"]                 0.0
["N", "Y", "Y"]                 0.0
["Y", "N", "N"]                 0.0
["Y", "N", "Y"]                 0.0
["N", "Y", "N"]                 0.0

estimatePi                                     true
estimateScale                                 false

Degree of freedom for hyper-parameters:

residual variances:                           7.000
random effect variances:                      6.000
polygenic effect variances:                   8.000
marker effect variances:                      4.000



The file results1/MCMC_samples_residual_variance.txt is created to save MCMC samples for residual_variance.
The file results1/MCMC_samples_polygenic_effects_variance.txt is created to save MCMC samples for polygenic_effects_variance.
The file results1/MCMC_samples_marker_effects_genotypes_y1.txt is created to save MCMC samples for marker_effects_genotypes_y1.
The file results1/MCMC_samples_marker_effects_genotypes_y2.txt is created to save MCMC samples for marker_effects_genotypes_y2.
The file results1/MCMC_samples_marker_effects_genotypes_y3.txt is created to save MCMC samples for marker_effects_genotypes_y3.
The file results1/MCMC_samples_marker_effects_variances_genotypes.txt is created to save MCMC samples for marker_effects_variances_genotypes.
The file results1/MCMC_samples_pi_genotypes.txt is created to save MCMC samples for pi_genotypes.
The file results1/MCMC_samples_y1.x2_y2.x2_variances.txt is created to save MCMC samples for y1:x2_y2:x2_variances.
The file results1/MCMC_samples_y1.ID_y2.ID_y3.ID_y1.dam_variances.txt is created to save MCMC samples for y1:ID_y2:ID_y3:ID_y1:dam_variances.
The file results1/MCMC_samples_EBV_y1.txt is created to save MCMC samples for EBV_y1.
The file results1/MCMC_samples_EBV_y2.txt is created to save MCMC samples for EBV_y2.
The file results1/MCMC_samples_EBV_y3.txt is created to save MCMC samples for EBV_y3.
The file results1/MCMC_samples_genetic_variance.txt is created to save MCMC samples for genetic_variance.
The file results1/MCMC_samples_heritability.txt is created to save MCMC samples for heritability.
running MCMC ...100%|███████████████████████████████████| Time: 0:00:02

The version of Julia and Platform in use:

Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, haswell)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.


Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
In [96]:
# GWAS of Direct Marker Effects on Trait y2

#Compute the model frequency for each marker (the probability the marker is included in the model).
marker_effects_file="results1/MCMC_samples_marker_effects_genotypes_y2.txt"
out = GWAS(marker_effects_file,header=true)
Compute the model frequency for each marker (the probability the marker is included in the model).
Out[96]:

1,000 rows × 2 columns

marker_IDmodelfrequency
Abstrac…Float64
1m10.91
2m20.94
3m30.88
4m40.96
5m50.94
6m61.0
7m70.98
8m80.96
9m90.8
10m100.96
11m110.95
12m120.99
13m130.97
14m140.89
15m150.96
16m160.81
17m170.99
18m180.96
19m190.96
20m200.98
21m211.0
22m220.96
23m231.0
24m240.93
25m250.96
26m260.94
27m270.85
28m280.96
29m290.98
30m300.96
⋮⋮⋮
In [97]:
# GWAS of Indirect Marker Effects on Trait y2

#Compute the model frequency for each marker (the probability the marker is included in the model).
marker_effects_file="results1/MCMC_samples_indirect_marker_effects_genotypes_y2.txt"
out=GWAS(marker_effects_file,header=true)
Compute the model frequency for each marker (the probability the marker is included in the model).
Out[97]:

1,000 rows × 2 columns

marker_IDmodelfrequency
Abstrac…Float64
1m10.92
2 m20.92
3 m30.95
4 m41.0
5 m50.91
6 m60.96
7 m70.99
8 m80.98
9 m90.9
10 m100.99
11 m110.94
12 m120.98
13 m130.95
14 m140.91
15 m150.94
16 m160.88
17 m170.99
18 m180.99
19 m190.96
20 m200.99
21 m210.95
22 m220.96
23 m231.0
24 m240.92
25 m250.94
26 m260.96
27 m270.93
28 m280.95
29 m290.98
30 m300.94
⋮⋮⋮
In [ ]: