# Install Julia from https://julialang.org/
# Install JWAS from https://reworkhow.github.io/JWAS.jl/latest/
# Single-trait analysis - https://github.com/reworkhow/JWAS.jl/wiki/Single-Trait-Analysis
# Step 1: Load packages
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets
# Step 2: Read data
phenofile = dataset("phenotypes.csv")
pedfile = dataset("pedigree.csv")
genofile = dataset("genotypes.csv")
"/Users/gota/.julia/packages/JWAS/6mjQ8/src/4.Datasets/src/../data/genotypes.csv"
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])
pedigree = get_pedigree(pedfile,separator=",",header=true);
genotypes = get_genotypes(genofile,separator=',',method="BayesC");
The delimiter in pedigree.csv is ','. Pedigree information: #individuals: 100 #sires: 39 #dams: 38 #founders: 20 The delimiterd in genotypes.csv is ','. The header (marker IDs) is provided in genotypes.csv. Genotype informatin: #markers: 1000; #individuals: 60
first(phenotypes,5)
ID | y1 | y2 | y3 | x1 | x2 | x3 | dam | bv1 | |
---|---|---|---|---|---|---|---|---|---|
String | Float64 | Float64 | Float64 | Float64 | String | String | String? | Float64 | |
1 | a1 | -0.579682 | -0.0582089 | 1.0456 | 0.77 | g1 | m | missing | -1.07018 |
2 | a2 | -2.02246 | -2.3843 | -1.57196 | -1.02 | g2 | m | missing | -1.03328 |
3 | a3 | -1.48071 | -0.421258 | -0.272245 | 0.52 | g1 | f | missing | -1.2916 |
4 | a4 | -3.03031 | -2.0634 | -0.604634 | -1.05 | g4 | m | missing | -1.39308 |
5 | a5 | 2.18819 | 2.22425 | 1.72306 | 2.06 | g3 | m | missing | 0.267546 |
# Step 3: Build Model Equations
model_equation ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes"
"y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes"
model = build_model(model_equation);
# Step 4: Set Factors or Covariates
set_covariate(model,"x1");
# Step 5: Set Random or Fixed Effects
set_random(model,"x2");
# set_random(model,"ID dam",pedigree);
# # Step 6: Run Analysis
out=runMCMC(model,phenotypes);
The folder results is created to save results. Checking genotypes... Checking phenotypes... Individual IDs (strings) are provided in the first column of the phenotypic data. In this complete genomic data (non-single-step) analyis, 40 phenotyped individuals are not genotyped. These are removed from the analysis. Predicted values for individuals of interest will be obtained as the summation of Any[] (Note that genomic data is always included for now).Phenotypes for 60 observations are used in the analysis.These individual IDs are saved in the file IDs_for_individuals_with_phenotypes.txt. Prior information for genomic variance is not provided and is generated from the data. Prior information for residual variance is not provided and is generated from the data. Prior information for random effect variance is not provided and is generated from the data. The prior for marker effects variance is calculated from the genetic variance and π. The mean of the prior for the marker effects variance is: 0.003622 A Linear Mixed Model was build using model equations: y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes Model Information: Term C/F F/R nLevels intercept factor fixed 1 x1 covariate fixed 1 x2 factor random 5 x2*x3 interaction fixed 10 ID factor fixed 60 dam factor fixed 29 MCMC Information: chain_length 100 burnin 0 starting_value true printout_frequency 101 output_samples_frequency 1 constraint false missing_phenotypes true update_priors_frequency 0 seed false Hyper-parameters Information: random effect variances (y1:x2): [0.623] residual variances: 0.623 Genomic Information: complete genomic data (i.e., non-single-step analysis) Genomic Category genotypes Method BayesC genetic variances (genomic): 1.247 marker effect variances: 0.004 π 0.0 estimatePi true estimateScale false Degree of freedom for hyper-parameters: residual variances: 4.000 random effect variances: 5.000 marker effect variances: 4.000 The file results/MCMC_samples_residual_variance.txt is created to save MCMC samples for residual_variance. The file results/MCMC_samples_marker_effects_genotypes_y1.txt is created to save MCMC samples for marker_effects_genotypes_y1. The file results/MCMC_samples_marker_effects_variances_genotypes.txt is created to save MCMC samples for marker_effects_variances_genotypes. The file results/MCMC_samples_pi_genotypes.txt is created to save MCMC samples for pi_genotypes. The file results/MCMC_samples_y1.x2_variances.txt is created to save MCMC samples for y1:x2_variances. The file results/MCMC_samples_EBV_y1.txt is created to save MCMC samples for EBV_y1. The file results/MCMC_samples_genetic_variance.txt is created to save MCMC samples for genetic_variance. The file results/MCMC_samples_heritability.txt is created to save MCMC samples for heritability.
running MCMC ...100%|███████████████████████████████████| Time: 0:00:00
The version of Julia and Platform in use: Julia Version 1.6.3 Commit ae8452a9e0 (2021-09-23 17:34 UTC) Platform Info: OS: macOS (x86_64-apple-darwin19.5.0) CPU: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz WORD_SIZE: 64 LIBM: libopenlibm LLVM: libLLVM-11.0.1 (ORCJIT, haswell) The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.
# Check the results
out
Dict{Any, Any} with 7 entries: "EBV_y1" => 60×3 DataFrame… "heritability" => 1×3 DataFrame… "location parameters" => 106×5 DataFrame… "residual variance" => 1×3 DataFrame… "pi_genotypes" => 1×3 DataFrame… "genetic_variance" => 1×3 DataFrame… "marker effects genotypes" => 1000×5 DataFrame…
keys(out)
KeySet for a Dict{Any, Any} with 7 entries. Keys: "EBV_y1" "heritability" "location parameters" "residual variance" "pi_genotypes" "genetic_variance" "marker effects genotypes"
out["heritability"]
Covariance | Estimate | SD | |
---|---|---|---|
Any | Any | Any | |
1 | y1 | 0.64841 | 0.141073 |
out["residual variance"]
Covariance | Estimate | SD | |
---|---|---|---|
Any | Any | Any | |
1 | y1_y1 | 0.414934 | 0.283966 |
out["marker effects genotypes"]
Trait | Marker_ID | Estimate | SD | Model_Frequency | |
---|---|---|---|---|---|
Any | Any | Any | Any | Any | |
1 | y1 | m1 | -0.00406394 | 0.0364831 | 0.8 |
2 | y1 | m2 | 0.00634028 | 0.0469896 | 0.84 |
3 | y1 | m3 | -0.00825331 | 0.0436417 | 0.81 |
4 | y1 | m4 | -0.00531335 | 0.042685 | 0.87 |
5 | y1 | m5 | 0.00325627 | 0.0459029 | 0.83 |
6 | y1 | m6 | -0.00493091 | 0.0411299 | 0.83 |
7 | y1 | m7 | 0.0082431 | 0.0435372 | 0.81 |
8 | y1 | m8 | -0.00129083 | 0.0456803 | 0.79 |
9 | y1 | m9 | -0.0103826 | 0.0409148 | 0.87 |
10 | y1 | m10 | 0.00437531 | 0.043969 | 0.83 |
11 | y1 | m11 | 0.00433119 | 0.0442428 | 0.78 |
12 | y1 | m12 | -0.000937653 | 0.0402071 | 0.8 |
13 | y1 | m13 | -0.00545029 | 0.0443124 | 0.84 |
14 | y1 | m14 | 0.00328123 | 0.0459152 | 0.9 |
15 | y1 | m15 | -0.00696694 | 0.0456781 | 0.85 |
16 | y1 | m16 | 5.13691e-5 | 0.0419689 | 0.83 |
17 | y1 | m17 | 0.0030685 | 0.0437131 | 0.83 |
18 | y1 | m18 | 0.00525223 | 0.0451865 | 0.83 |
19 | y1 | m19 | -0.00828877 | 0.0451934 | 0.79 |
20 | y1 | m20 | 0.0078435 | 0.0468086 | 0.85 |
21 | y1 | m21 | 0.00174429 | 0.0447379 | 0.87 |
22 | y1 | m22 | -0.00731089 | 0.0419274 | 0.81 |
23 | y1 | m23 | -0.01403 | 0.0423216 | 0.91 |
24 | y1 | m24 | 0.00364011 | 0.0490432 | 0.88 |
25 | y1 | m25 | 0.00594673 | 0.054351 | 0.8 |
26 | y1 | m26 | 0.00359238 | 0.0426903 | 0.79 |
27 | y1 | m27 | -0.00175265 | 0.042688 | 0.82 |
28 | y1 | m28 | -0.00165877 | 0.0419633 | 0.77 |
29 | y1 | m29 | -0.00436827 | 0.0428024 | 0.8 |
30 | y1 | m30 | 0.00506734 | 0.0398362 | 0.76 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
# Bayesian marker effect SEM-GWAS - https://github.com/reworkhow/JWAS.jl/wiki/Integrating-Phenotypic-Causal-Networks-in-GWAS
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets
# Step 2: Read data
phenofile = dataset("phenotypes.csv")
pedfile = dataset("pedigree.csv")
genofile = dataset("genotypes.csv")
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])
pedigree = get_pedigree(pedfile,separator=",",header=true);
genotypes = get_genotypes(genofile,separator=',',method="BayesC");
first(phenotypes,5)
The delimiter in pedigree.csv is ','. Pedigree information: #individuals: 100 #sires: 39 #dams: 38 #founders: 20 The delimiterd in genotypes.csv is ','. The header (marker IDs) is provided in genotypes.csv. Genotype informatin: #markers: 1000; #individuals: 60
ID | y1 | y2 | y3 | x1 | x2 | x3 | dam | bv1 | |
---|---|---|---|---|---|---|---|---|---|
String | Float64 | Float64 | Float64 | Float64 | String | String | String? | Float64 | |
1 | a1 | -0.579682 | -0.0582089 | 1.0456 | 0.77 | g1 | m | missing | -1.07018 |
2 | a2 | -2.02246 | -2.3843 | -1.57196 | -1.02 | g2 | m | missing | -1.03328 |
3 | a3 | -1.48071 | -0.421258 | -0.272245 | 0.52 | g1 | f | missing | -1.2916 |
4 | a4 | -3.03031 | -2.0634 | -0.604634 | -1.05 | g4 | m | missing | -1.39308 |
5 | a5 | 2.18819 | 2.22425 | 1.72306 | 2.06 | g3 | m | missing | 0.267546 |
# Step 3: Build Model Equations
model_equation ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes
y2 = intercept + x1 + x2 + ID + genotypes
y3 = intercept + x1 + ID + genotypes";
model = build_model(model_equation);
# Step 4: Set Factors or Covariates
set_covariate(model,"x1");
# Step 5: Set Random or Fixed Effects
set_random(model,"x2");
set_random(model,"ID dam",pedigree);
x2 is not found in model equation 3. dam is not found in model equation 2. dam is not found in model equation 3.
# Step 6: Run Analysis
# If `causal_structure` is provided, e.g., causal_structure = [0.0 0.0 0.0;1.0 0.0 0.0;1.0 0.0 0.0] (column index affects row index, and a lower triangular matrix is required) for
# trait 1 -> trait 2 and trait 1 -> trait 3, phenotypic causal networks will be incorporated using structure equation models.
my_structure = [0.0 0.0 0.0
1.0 0.0 0.0
1.0 0.0 0.0]
out=runMCMC(model,phenotypes,causal_structure=my_structure);
The folder results already exists. The folder results1 is created to save results. Checking pedigree... Checking genotypes... Checking phenotypes... Individual IDs (strings) are provided in the first column of the phenotypic data. In this complete genomic data (non-single-step) analyis, 40 phenotyped individuals are not genotyped. These are removed from the analysis. Predicted values for individuals of interest will be obtained as the summation of Any["y1:ID", "y2:ID", "y3:ID", "y1:dam"] (Note that genomic data is always included for now).Phenotypes for 60 observations are used in the analysis.These individual IDs are saved in the file IDs_for_individuals_with_phenotypes.txt. Prior information for genomic variance is not provided and is generated from the data. Prior information for residual variance is not provided and is generated from the data. Prior information for random effect variance is not provided and is generated from the data. Prior information for random effect variance is not provided and is generated from the data. Pi (Π) is not provided. Pi (Π) is generated assuming all markers have effects on all traits. The prior for marker effects covariance matrix is calculated from genetic covariance matrix and Π. The mean of the prior for the marker effects covariance matrix is: 0.001811 0.0 0.0 0.0 0.001849 0.0 0.0 0.0 0.001429 A Linear Mixed Model was build using model equations: y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes y2 = intercept + x1 + x2 + ID + genotypes y3 = intercept + x1 + ID + genotypes Model Information: Term C/F F/R nLevels intercept factor fixed 1 x1 covariate fixed 1 x2 factor random 5 x2*x3 interaction fixed 10 ID factor random 100 dam factor random 100 MCMC Information: chain_length 100 burnin 0 starting_value true printout_frequency 101 output_samples_frequency 1 constraint true missing_phenotypes false update_priors_frequency 0 seed false Hyper-parameters Information: random effect variances (y1:x2,y2:x2): 0.623 0.0 0.0 0.637 random effect variances (y1:ID,y2:ID,y3:ID,y1:dam): 0.623 0.0 0.0 0.0 0.0 0.637 0.0 0.0 0.0 0.0 0.492 0.0 0.0 0.0 0.0 0.623 genetic variances (polygenic): 0.623 0.0 0.0 0.0 0.0 0.637 0.0 0.0 0.0 0.0 0.492 0.0 0.0 0.0 0.0 0.623 residual variances: 0.623 0.0 0.0 0.0 0.637 0.0 0.0 0.0 0.492 Genomic Information: complete genomic data (i.e., non-single-step analysis) Genomic Category genotypes Method BayesC genetic variances (genomic): 0.623 0.0 0.0 0.0 0.637 0.0 0.0 0.0 0.492 marker effect variances: 0.002 0.0 0.0 0.0 0.002 0.0 0.0 0.0 0.001 Π: (Y(yes):included; N(no):excluded) ["y1", "y2", "y3"] probability ["Y", "Y", "Y"] 1.0 ["N", "N", "N"] 0.0 ["N", "N", "Y"] 0.0 ["Y", "Y", "N"] 0.0 ["N", "Y", "Y"] 0.0 ["Y", "N", "N"] 0.0 ["Y", "N", "Y"] 0.0 ["N", "Y", "N"] 0.0 estimatePi true estimateScale false Degree of freedom for hyper-parameters: residual variances: 7.000 random effect variances: 6.000 polygenic effect variances: 8.000 marker effect variances: 4.000 The file results1/MCMC_samples_residual_variance.txt is created to save MCMC samples for residual_variance. The file results1/MCMC_samples_polygenic_effects_variance.txt is created to save MCMC samples for polygenic_effects_variance. The file results1/MCMC_samples_marker_effects_genotypes_y1.txt is created to save MCMC samples for marker_effects_genotypes_y1. The file results1/MCMC_samples_marker_effects_genotypes_y2.txt is created to save MCMC samples for marker_effects_genotypes_y2. The file results1/MCMC_samples_marker_effects_genotypes_y3.txt is created to save MCMC samples for marker_effects_genotypes_y3. The file results1/MCMC_samples_marker_effects_variances_genotypes.txt is created to save MCMC samples for marker_effects_variances_genotypes. The file results1/MCMC_samples_pi_genotypes.txt is created to save MCMC samples for pi_genotypes. The file results1/MCMC_samples_y1.x2_y2.x2_variances.txt is created to save MCMC samples for y1:x2_y2:x2_variances. The file results1/MCMC_samples_y1.ID_y2.ID_y3.ID_y1.dam_variances.txt is created to save MCMC samples for y1:ID_y2:ID_y3:ID_y1:dam_variances. The file results1/MCMC_samples_EBV_y1.txt is created to save MCMC samples for EBV_y1. The file results1/MCMC_samples_EBV_y2.txt is created to save MCMC samples for EBV_y2. The file results1/MCMC_samples_EBV_y3.txt is created to save MCMC samples for EBV_y3. The file results1/MCMC_samples_genetic_variance.txt is created to save MCMC samples for genetic_variance. The file results1/MCMC_samples_heritability.txt is created to save MCMC samples for heritability.
running MCMC ...100%|███████████████████████████████████| Time: 0:00:02
The version of Julia and Platform in use: Julia Version 1.6.3 Commit ae8452a9e0 (2021-09-23 17:34 UTC) Platform Info: OS: macOS (x86_64-apple-darwin19.5.0) CPU: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz WORD_SIZE: 64 LIBM: libopenlibm LLVM: libLLVM-11.0.1 (ORCJIT, haswell) The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files. Compute the model frequency for each marker (the probability the marker is included in the model). Compute the model frequency for each marker (the probability the marker is included in the model). Compute the model frequency for each marker (the probability the marker is included in the model). Compute the model frequency for each marker (the probability the marker is included in the model). Compute the model frequency for each marker (the probability the marker is included in the model). Compute the model frequency for each marker (the probability the marker is included in the model). Compute the model frequency for each marker (the probability the marker is included in the model). Compute the model frequency for each marker (the probability the marker is included in the model). Compute the model frequency for each marker (the probability the marker is included in the model).
# GWAS of Direct Marker Effects on Trait y2
#Compute the model frequency for each marker (the probability the marker is included in the model).
marker_effects_file="results1/MCMC_samples_marker_effects_genotypes_y2.txt"
out = GWAS(marker_effects_file,header=true)
Compute the model frequency for each marker (the probability the marker is included in the model).
marker_ID | modelfrequency | |
---|---|---|
Abstrac… | Float64 | |
1 | m1 | 0.91 |
2 | m2 | 0.94 |
3 | m3 | 0.88 |
4 | m4 | 0.96 |
5 | m5 | 0.94 |
6 | m6 | 1.0 |
7 | m7 | 0.98 |
8 | m8 | 0.96 |
9 | m9 | 0.8 |
10 | m10 | 0.96 |
11 | m11 | 0.95 |
12 | m12 | 0.99 |
13 | m13 | 0.97 |
14 | m14 | 0.89 |
15 | m15 | 0.96 |
16 | m16 | 0.81 |
17 | m17 | 0.99 |
18 | m18 | 0.96 |
19 | m19 | 0.96 |
20 | m20 | 0.98 |
21 | m21 | 1.0 |
22 | m22 | 0.96 |
23 | m23 | 1.0 |
24 | m24 | 0.93 |
25 | m25 | 0.96 |
26 | m26 | 0.94 |
27 | m27 | 0.85 |
28 | m28 | 0.96 |
29 | m29 | 0.98 |
30 | m30 | 0.96 |
⋮ | ⋮ | ⋮ |
# GWAS of Indirect Marker Effects on Trait y2
#Compute the model frequency for each marker (the probability the marker is included in the model).
marker_effects_file="results1/MCMC_samples_indirect_marker_effects_genotypes_y2.txt"
out=GWAS(marker_effects_file,header=true)
Compute the model frequency for each marker (the probability the marker is included in the model).
marker_ID | modelfrequency | |
---|---|---|
Abstrac… | Float64 | |
1 | m1 | 0.92 |
2 | m2 | 0.92 |
3 | m3 | 0.95 |
4 | m4 | 1.0 |
5 | m5 | 0.91 |
6 | m6 | 0.96 |
7 | m7 | 0.99 |
8 | m8 | 0.98 |
9 | m9 | 0.9 |
10 | m10 | 0.99 |
11 | m11 | 0.94 |
12 | m12 | 0.98 |
13 | m13 | 0.95 |
14 | m14 | 0.91 |
15 | m15 | 0.94 |
16 | m16 | 0.88 |
17 | m17 | 0.99 |
18 | m18 | 0.99 |
19 | m19 | 0.96 |
20 | m20 | 0.99 |
21 | m21 | 0.95 |
22 | m22 | 0.96 |
23 | m23 | 1.0 |
24 | m24 | 0.92 |
25 | m25 | 0.94 |
26 | m26 | 0.96 |
27 | m27 | 0.93 |
28 | m28 | 0.95 |
29 | m29 | 0.98 |
30 | m30 | 0.94 |
⋮ | ⋮ | ⋮ |