Contents

1 Bayesian penalized regression

We will use the BGLR R package to fit Bayesian ridge regression and Bayesian LASSO.

1.1 Rice data

We will use the the same rice data from last week.

1.2 Data cleaning

library(BGLR)
out<- read_ped("../../../../../teaching/2016/ASCI896/day16/RiceDiversity_44K_Genotypes_PLINK/sativas413.ped")
p=out$p
n=out$n
out=out$x
#Recode snp to 0,1,2 format using allele 1
# 0 --> 0
# 1 --> 1
# 2 --> NA
# 3 --> 2
out[out==2]=NA
out[out==3]=2
W=matrix(out,nrow=p,ncol=n,byrow=TRUE)
W=t(W)

1.3 Genotype imputation

for (j in 1:ncol(W)){
  W[,j] <- ifelse(is.na(W[,j]), mean(W[,j], na.rm=TRUE), W[,j])
}

1.4 Quality control

# accession ID
fam <-read.table("../../../../../teaching/2016/ASCI896/day16/RiceDiversity_44K_Genotypes_PLINK/sativas413.fam", header = FALSE, stringsAsFactors = FALSE)  
head(fam)
rownames(W) <- fam$V2 # 413 x 36901

# MAF=0.05
p <- colSums(W) / (2*nrow(W))
maf <- ifelse(p > 0.5, 1-p, p)
maf.index <- which(maf < 0.05)
W <- W[, -maf.index]  # 413 x 33701

1.5 Phenotypes

# phenotypes
rice.pheno <- read.table("http://ricediversity.org/data/sets/44kgwas/RiceDiversity_44K_Phenotypes_34traits_PLINK.txt",header=TRUE, stringsAsFactors = FALSE, sep = "\t")
table(rownames(W) == rice.pheno$NSFTVID)
y <- rice.pheno$Flowering.time.at.Arkansas # # use the first trait 
index <- !is.na(y)
y <- y[index] # 374
W <- W[index,] # 374 33701

1.6 Bayesian ridge regression

We will first create a list named ETA that includes a marker matrix and a type of model we want to fit. If you set model = BRR, the BGLR function will run Bayesian ridge regression. In the BGLR function, set how many MCMC and burn-in samples you would like to take by setting the nIter and burnIn arguments.

# rule-based priors
?BGLR
ETA <- list(MRK=list(X = W, model = "BRR"))
fit.BRR <- BGLR(y = y, ETA = ETA, nIter = 500, burnIn = 250, verbose = TRUE)

head(fit.BRR$ETA[[1]]$b) # estimated marker effects 

1.7 Bayesian LASSO

To fit Bayesian LASSO, set model = BL. In the BGLR function, set how many MCMC and burn-in samples you would like to take by setting the nIter and burnIn arguments.

# rule-based priors 
?BGLR
ETA <- list(MRK=list(X = W, model="BL"))
fit.BL <- BGLR(y = y, ETA = ETA, nIter = 500, burnIn = 250, verbose = TRUE)

head(fit.BL$ETA[[1]]$b) # estimated marker effects