# Install Julia from https://julialang.org/

# Install JWAS from https://reworkhow.github.io/JWAS.jl/latest/


# Single-trait analysis - https://github.com/reworkhow/JWAS.jl/wiki/Single-Trait-Analysis


# Step 1: Load packages


using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets


# Step 2: Read data


phenofile  = dataset("phenotypes.csv")
pedfile    = dataset("pedigree.csv")
genofile   = dataset("genotypes.csv")

"/Users/gota/.julia/packages/JWAS/6mjQ8/src/4.Datasets/src/../data/genotypes.csv"


phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])
pedigree   = get_pedigree(pedfile,separator=",",header=true);
genotypes  = get_genotypes(genofile,separator=',',method="BayesC");

The delimiter in pedigree.csv is ','.
Pedigree information:
#individuals: 100
#sires:       39
#dams:        38
#founders:    20
The delimiterd in genotypes.csv is ','. The header (marker IDs) is provided in genotypes.csv.
Genotype informatin:
#markers: 1000; #individuals: 60


first(phenotypes,5)


# Step 3: Build Model Equations


model_equation  ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes"

"y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes"


model = build_model(model_equation);


# Step 4: Set Factors or Covariates


set_covariate(model,"x1");


# Step 5: Set Random or Fixed Effects


set_random(model,"x2");
# set_random(model,"ID dam",pedigree);


# # Step 6: Run Analysis


out=runMCMC(model,phenotypes);

The folder results is created to save results.
Checking genotypes...
Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
In this complete genomic data (non-single-step) analyis, 40 phenotyped individuals are not genotyped. These are removed from the analysis.
Predicted values for individuals of interest will be obtained as the summation of Any[] (Note that genomic data is always included for now).Phenotypes for 60 observations are used in the analysis.These individual IDs are saved in the file IDs_for_individuals_with_phenotypes.txt.
Prior information for genomic variance is not provided and is generated from the data.
Prior information for residual variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.

The prior for marker effects variance is calculated from the genetic variance and π.
The mean of the prior for the marker effects variance is: 0.003622


A Linear Mixed Model was build using model equations:

y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1              covariate    fixed                1
x2              factor       random               5
x2*x3           interaction  fixed               10
ID              factor       fixed               60
dam             factor       fixed               29

MCMC Information:

chain_length                                    100
burnin                                            0
starting_value                                 true
printout_frequency                              101
output_samples_frequency                          1
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y1:x2):              [0.623]
residual variances:                           0.623

Genomic Information:

complete genomic data (i.e., non-single-step analysis)

Genomic Category                          genotypes
Method                                       BayesC
genetic variances (genomic):                  1.247
marker effect variances:                      0.004
π                                               0.0
estimatePi                                     true
estimateScale                                 false

Degree of freedom for hyper-parameters:

residual variances:                           4.000
random effect variances:                      5.000
marker effect variances:                      4.000


The file results/MCMC_samples_residual_variance.txt is created to save MCMC samples for residual_variance.
The file results/MCMC_samples_marker_effects_genotypes_y1.txt is created to save MCMC samples for marker_effects_genotypes_y1.
The file results/MCMC_samples_marker_effects_variances_genotypes.txt is created to save MCMC samples for marker_effects_variances_genotypes.
The file results/MCMC_samples_pi_genotypes.txt is created to save MCMC samples for pi_genotypes.
The file results/MCMC_samples_y1.x2_variances.txt is created to save MCMC samples for y1:x2_variances.
The file results/MCMC_samples_EBV_y1.txt is created to save MCMC samples for EBV_y1.
The file results/MCMC_samples_genetic_variance.txt is created to save MCMC samples for genetic_variance.
The file results/MCMC_samples_heritability.txt is created to save MCMC samples for heritability.

running MCMC ...100%|███████████████████████████████████| Time: 0:00:00


The version of Julia and Platform in use:

Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, haswell)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.


# Check the results

out

Dict{Any, Any} with 7 entries:
  "EBV_y1"                   => 60×3 DataFrame…
  "heritability"             => 1×3 DataFrame…
  "location parameters"      => 106×5 DataFrame…
  "residual variance"        => 1×3 DataFrame…
  "pi_genotypes"             => 1×3 DataFrame…
  "genetic_variance"         => 1×3 DataFrame…
  "marker effects genotypes" => 1000×5 DataFrame…


keys(out)

KeySet for a Dict{Any, Any} with 7 entries. Keys:
  "EBV_y1"
  "heritability"
  "location parameters"
  "residual variance"
  "pi_genotypes"
  "genetic_variance"
  "marker effects genotypes"


out["heritability"]


out["residual variance"]


out["marker effects genotypes"]


# Bayesian marker effect SEM-GWAS - https://github.com/reworkhow/JWAS.jl/wiki/Integrating-Phenotypic-Causal-Networks-in-GWAS


using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets


# Step 2: Read data 
phenofile  = dataset("phenotypes.csv")
pedfile    = dataset("pedigree.csv")
genofile   = dataset("genotypes.csv")
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])
pedigree   = get_pedigree(pedfile,separator=",",header=true);
genotypes  = get_genotypes(genofile,separator=',',method="BayesC");
first(phenotypes,5)

The delimiter in pedigree.csv is ','.
Pedigree information:
#individuals: 100
#sires:       39
#dams:        38
#founders:    20
The delimiterd in genotypes.csv is ','. The header (marker IDs) is provided in genotypes.csv.
Genotype informatin:
#markers: 1000; #individuals: 60


# Step 3: Build Model Equations

model_equation  ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes
                  y2 = intercept + x1 + x2 + ID + genotypes
                  y3 = intercept + x1 + ID + genotypes";
model = build_model(model_equation);


# Step 4: Set Factors or Covariates
set_covariate(model,"x1");


# Step 5: Set Random or Fixed Effects
set_random(model,"x2");
set_random(model,"ID dam",pedigree);

x2 is not found in model equation 3.
dam is not found in model equation 2.
dam is not found in model equation 3.


# Step 6: Run Analysis
# If `causal_structure` is provided, e.g., causal_structure = [0.0 0.0 0.0;1.0 0.0 0.0;1.0 0.0 0.0] (column index affects row index, and a lower triangular matrix is required) for
# trait 1 -> trait 2 and trait 1 -> trait 3, phenotypic causal networks will be incorporated using structure equation models.
my_structure = [0.0 0.0 0.0
                1.0 0.0 0.0
                1.0 0.0 0.0]
out=runMCMC(model,phenotypes,causal_structure=my_structure);

The folder results already exists.
The folder results1 is created to save results.
Checking pedigree...
Checking genotypes...
Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
In this complete genomic data (non-single-step) analyis, 40 phenotyped individuals are not genotyped. These are removed from the analysis.
Predicted values for individuals of interest will be obtained as the summation of Any["y1:ID", "y2:ID", "y3:ID", "y1:dam"] (Note that genomic data is always included for now).Phenotypes for 60 observations are used in the analysis.These individual IDs are saved in the file IDs_for_individuals_with_phenotypes.txt.
Prior information for genomic variance is not provided and is generated from the data.
Prior information for residual variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.

Pi (Π) is not provided.
Pi (Π) is generated assuming all markers have effects on all traits.

The prior for marker effects covariance matrix is calculated from genetic covariance matrix and Π.
The mean of the prior for the marker effects covariance matrix is:
 0.001811  0.0       0.0
 0.0       0.001849  0.0
 0.0       0.0       0.001429


A Linear Mixed Model was build using model equations:

y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes
y2 = intercept + x1 + x2 + ID + genotypes
y3 = intercept + x1 + ID + genotypes

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1              covariate    fixed                1
x2              factor       random               5
x2*x3           interaction  fixed               10
ID              factor       random             100
dam             factor       random             100

MCMC Information:

chain_length                                    100
burnin                                            0
starting_value                                 true
printout_frequency                              101
output_samples_frequency                          1
constraint                                     true
missing_phenotypes                            false
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y1:x2,y2:x2):
 0.623  0.0
 0.0    0.637
random effect variances (y1:ID,y2:ID,y3:ID,y1:dam):
 0.623  0.0    0.0    0.0
 0.0    0.637  0.0    0.0
 0.0    0.0    0.492  0.0
 0.0    0.0    0.0    0.623
genetic variances (polygenic):
 0.623  0.0    0.0    0.0
 0.0    0.637  0.0    0.0
 0.0    0.0    0.492  0.0
 0.0    0.0    0.0    0.623
residual variances:           
 0.623  0.0    0.0
 0.0    0.637  0.0
 0.0    0.0    0.492

Genomic Information:

complete genomic data (i.e., non-single-step analysis)

Genomic Category                          genotypes
Method                                       BayesC
genetic variances (genomic):  
 0.623  0.0    0.0
 0.0    0.637  0.0
 0.0    0.0    0.492
marker effect variances:      
 0.002  0.0    0.0
 0.0    0.002  0.0
 0.0    0.0    0.001

Π: (Y(yes):included; N(no):excluded)

["y1", "y2", "y3"]         probability
["Y", "Y", "Y"]                 1.0
["N", "N", "N"]                 0.0
["N", "N", "Y"]                 0.0
["Y", "Y", "N"]                 0.0
["N", "Y", "Y"]                 0.0
["Y", "N", "N"]                 0.0
["Y", "N", "Y"]                 0.0
["N", "Y", "N"]                 0.0

estimatePi                                     true
estimateScale                                 false

Degree of freedom for hyper-parameters:

residual variances:                           7.000
random effect variances:                      6.000
polygenic effect variances:                   8.000
marker effect variances:                      4.000


The file results1/MCMC_samples_residual_variance.txt is created to save MCMC samples for residual_variance.
The file results1/MCMC_samples_polygenic_effects_variance.txt is created to save MCMC samples for polygenic_effects_variance.
The file results1/MCMC_samples_marker_effects_genotypes_y1.txt is created to save MCMC samples for marker_effects_genotypes_y1.
The file results1/MCMC_samples_marker_effects_genotypes_y2.txt is created to save MCMC samples for marker_effects_genotypes_y2.
The file results1/MCMC_samples_marker_effects_genotypes_y3.txt is created to save MCMC samples for marker_effects_genotypes_y3.
The file results1/MCMC_samples_marker_effects_variances_genotypes.txt is created to save MCMC samples for marker_effects_variances_genotypes.
The file results1/MCMC_samples_pi_genotypes.txt is created to save MCMC samples for pi_genotypes.
The file results1/MCMC_samples_y1.x2_y2.x2_variances.txt is created to save MCMC samples for y1:x2_y2:x2_variances.
The file results1/MCMC_samples_y1.ID_y2.ID_y3.ID_y1.dam_variances.txt is created to save MCMC samples for y1:ID_y2:ID_y3:ID_y1:dam_variances.
The file results1/MCMC_samples_EBV_y1.txt is created to save MCMC samples for EBV_y1.
The file results1/MCMC_samples_EBV_y2.txt is created to save MCMC samples for EBV_y2.
The file results1/MCMC_samples_EBV_y3.txt is created to save MCMC samples for EBV_y3.
The file results1/MCMC_samples_genetic_variance.txt is created to save MCMC samples for genetic_variance.
The file results1/MCMC_samples_heritability.txt is created to save MCMC samples for heritability.

running MCMC ...100%|███████████████████████████████████| Time: 0:00:02


The version of Julia and Platform in use:

Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, haswell)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.


Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).
Compute the model frequency for each marker (the probability the marker is included in the model).


# GWAS of Direct Marker Effects on Trait y2

#Compute the model frequency for each marker (the probability the marker is included in the model).
marker_effects_file="results1/MCMC_samples_marker_effects_genotypes_y2.txt"
out = GWAS(marker_effects_file,header=true)

Compute the model frequency for each marker (the probability the marker is included in the model).


# GWAS of Indirect Marker Effects on Trait y2

#Compute the model frequency for each marker (the probability the marker is included in the model).
marker_effects_file="results1/MCMC_samples_indirect_marker_effects_genotypes_y2.txt"
out=GWAS(marker_effects_file,header=true)

Compute the model frequency for each marker (the probability the marker is included in the model).

	ID	y1	y2	y3	x1	x2	x3	dam	bv1
	String	Float64	Float64	Float64	Float64	String	String	String?	Float64
1	a1	-0.579682	-0.0582089	1.0456	0.77	g1	m	missing	-1.07018
2	a2	-2.02246	-2.3843	-1.57196	-1.02	g2	m	missing	-1.03328
3	a3	-1.48071	-0.421258	-0.272245	0.52	g1	f	missing	-1.2916
4	a4	-3.03031	-2.0634	-0.604634	-1.05	g4	m	missing	-1.39308
5	a5	2.18819	2.22425	1.72306	2.06	g3	m	missing	0.267546

	Covariance	Estimate	SD
	Any	Any	Any
1	y1	0.64841	0.141073

	Covariance	Estimate	SD
	Any	Any	Any
1	y1_y1	0.414934	0.283966

	Trait	Marker_ID	Estimate	SD	Model_Frequency
	Any	Any	Any	Any	Any
1	y1	m1	-0.00406394	0.0364831	0.8
2	y1	m2	0.00634028	0.0469896	0.84
3	y1	m3	-0.00825331	0.0436417	0.81
4	y1	m4	-0.00531335	0.042685	0.87
5	y1	m5	0.00325627	0.0459029	0.83
6	y1	m6	-0.00493091	0.0411299	0.83
7	y1	m7	0.0082431	0.0435372	0.81
8	y1	m8	-0.00129083	0.0456803	0.79
9	y1	m9	-0.0103826	0.0409148	0.87
10	y1	m10	0.00437531	0.043969	0.83
11	y1	m11	0.00433119	0.0442428	0.78
12	y1	m12	-0.000937653	0.0402071	0.8
13	y1	m13	-0.00545029	0.0443124	0.84
14	y1	m14	0.00328123	0.0459152	0.9
15	y1	m15	-0.00696694	0.0456781	0.85
16	y1	m16	5.13691e-5	0.0419689	0.83
17	y1	m17	0.0030685	0.0437131	0.83
18	y1	m18	0.00525223	0.0451865	0.83
19	y1	m19	-0.00828877	0.0451934	0.79
20	y1	m20	0.0078435	0.0468086	0.85
21	y1	m21	0.00174429	0.0447379	0.87
22	y1	m22	-0.00731089	0.0419274	0.81
23	y1	m23	-0.01403	0.0423216	0.91
24	y1	m24	0.00364011	0.0490432	0.88
25	y1	m25	0.00594673	0.054351	0.8
26	y1	m26	0.00359238	0.0426903	0.79
27	y1	m27	-0.00175265	0.042688	0.82
28	y1	m28	-0.00165877	0.0419633	0.77
29	y1	m29	-0.00436827	0.0428024	0.8
30	y1	m30	0.00506734	0.0398362	0.76
⋮	⋮	⋮	⋮	⋮	⋮

	ID	y1	y2	y3	x1	x2	x3	dam	bv1
	String	Float64	Float64	Float64	Float64	String	String	String?	Float64
1	a1	-0.579682	-0.0582089	1.0456	0.77	g1	m	missing	-1.07018
2	a2	-2.02246	-2.3843	-1.57196	-1.02	g2	m	missing	-1.03328
3	a3	-1.48071	-0.421258	-0.272245	0.52	g1	f	missing	-1.2916
4	a4	-3.03031	-2.0634	-0.604634	-1.05	g4	m	missing	-1.39308
5	a5	2.18819	2.22425	1.72306	2.06	g3	m	missing	0.267546

	marker_ID	modelfrequency
	Abstrac…	Float64
1	m1	0.91
2	m2	0.94
3	m3	0.88
4	m4	0.96
5	m5	0.94
6	m6	1.0
7	m7	0.98
8	m8	0.96
9	m9	0.8
10	m10	0.96
11	m11	0.95
12	m12	0.99
13	m13	0.97
14	m14	0.89
15	m15	0.96
16	m16	0.81
17	m17	0.99
18	m18	0.96
19	m19	0.96
20	m20	0.98
21	m21	1.0
22	m22	0.96
23	m23	1.0
24	m24	0.93
25	m25	0.96
26	m26	0.94
27	m27	0.85
28	m28	0.96
29	m29	0.98
30	m30	0.96
⋮	⋮	⋮

	marker_ID	modelfrequency
	Abstrac…	Float64
1	m1	0.92
2	m2	0.92
3	m3	0.95
4	m4	1.0
5	m5	0.91
6	m6	0.96
7	m7	0.99
8	m8	0.98
9	m9	0.9
10	m10	0.99
11	m11	0.94
12	m12	0.98
13	m13	0.95
14	m14	0.91
15	m15	0.94
16	m16	0.88
17	m17	0.99
18	m18	0.99
19	m19	0.96
20	m20	0.99
21	m21	0.95
22	m22	0.96
23	m23	1.0
24	m24	0.92
25	m25	0.94
26	m26	0.96
27	m27	0.93
28	m28	0.95
29	m29	0.98
30	m30	0.94
⋮	⋮	⋮