Marker genotype imputation

Gota Morota
http://morotalab.org/

2/12/2020

Background
Simulating data
Imputation based on the mean of marker codes
Imputation based on the binomial distribution

Background

This exercise illustrates how to impute markers.

Simulating data

rm(list=ls()) # delete the objects in the current environment 

set.seed(777)
W <- matrix(sample(c(0, 1, 2, NA), 100, replace = TRUE, prob=c(0.4, 0.3, 0.2, 0.1)), nrow= 10, ncol=10) # 10 x 10 marker matrix
W

Imputation based on the mean of marker codes

colMeans(W, na.rm = TRUE)

Imputation based on the binomial distribution

p <- colSums(W, na.rm = TRUE) / (2 * nrow(W))
p

# Marker 1
set.seed(777)
rbinom(2, 1, prob = p[1])
set.seed(777)
sum(rbinom(2, 1, prob = p[1]))

# Marker 2
set.seed(777)
rbinom(2, 1, prob = p[2])
set.seed(777)
sum(rbinom(2, 1, prob = p[2]))

# Marker 10
set.seed(777)
rbinom(2, 1, prob = p[10])
set.seed(777)
sum(rbinom(2, 1, prob = p[10]))