class: center, middle, inverse, title-slide # Computing A and its inverse / HWE test ## ASCI 431-831 ### Gota Morota ### 2018/02/01 --- # Comparison of algorithms The simulate dataset involves 1,004 animals. - What is the fastest algorithm to compute an `\(\mathbf{A}\)` matrix? - Tabular method ([Wright 1922](http://www.journals.uchicago.edu/doi/abs/10.1086/279872), [Cruden 1949](https://academic.oup.com/jhered/article-abstract/40/9/248/868992), and [Emik and Terrill 1949](https://academic.oup.com/jhered/article-abstract/40/2/51/802925)) - Recursive method for computing L ([Henderson, 1976](https://www.jstor.org/stable/2529339)) - What is the fastest algorithm to compute the inverse of `\(\mathbf{A}\)` matrix? - A simple method using L ([Henderson, 1976](https://www.jstor.org/stable/2529339)) - Rapid computation of the diagonal of L ([Quaas, 1976](https://www.jstor.org/stable/2529279)) - A simple method NOT using L designed for a non-inbred population ([Henderson, 1976](https://www.jstor.org/stable/2529339)) - Take the direct inverse of `\(\mathbf{A}\)` using the `solve()` function Developing a more efficient algorithm for computing `\(\mathbf{A}\)` or `\(\mathbf{A}^{-1}\)` is (was?) active research area in animal breeding. --- # Benchmarking - Programming languages - C++ vs. R - Functions - R function written in C vs. R function written in plain R - Number of times to evaluate the expression - Order of evaluations - Operating systems --- # Measuring running time of R code - [Sys.time()](https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/Sys.time) built in function - [system.time()](https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/system.time) built in function - [rbenchmark](https://cran.r-project.org/web/packages/rbenchmark/index.html) package - [tictoc](https://cran.r-project.org/web/packages/tictoc/index.html) package - [microbenchmark](https://cran.r-project.org/web/packages/microbenchmark/index.html) package -- Our choice of function is the `microbenchmark` function in the microbenchmark package. Follow the link [here](http://morotalab.org/asci431-2018/day08/day08Amat.html) to see the comparisons of algorithms. --- # Invert a large matrix? -- - Use sovle() to obtain `\(\mathbf{A}^{-1}\)` from `\(\mathbf{A}\)` -- - Obtain `\(\mathbf{A}^{-1}\)` with setting up `\(\mathbf{A}\)`? -- - Rounding errors --- # Source code - synbreed::kin ```r libary(synbreed) > kin function (gpData, ret = c("add", "kin", "dom", "gam", "realized", "realizedAB", "sm", "sm-smin", "gaussian"), DH = NULL, maf = NULL, selfing = NULL, lambda = 1, P = NULL, cores = 1) { ret <- match.arg(ret, choices = c("add", "kin", "dom", "gam", "realized", "realizedAB", "sm", "sm-smin", "gaussian"), several.ok = FALSE) NAs <- FALSE if (ret %in% c("realized", "realizedAB", "sm", "sm-smin")) if (any(is.na(gpData$geno))) NAs <- TRUE ``` --- # Source code - kinship2::kinship ```r libary(kinship2) > getAnywhere(kinship.pedigree) It was found in the following places registered S3 method for kinship from namespace kinship2 namespace:kinship2 with value function (id, chrtype = "autosome", ...) { chrtype <- match.arg(casefold(chrtype), c("autosome", "x")) if (any(duplicated(id$id))) stop("All id values must be unique") n <- length(id$id) pdepth <- kindepth(id) havemz <- FALSE if (!is.null(id$relation) && any(id$relation$code == "MZ twin")) { havemz <- TRUE ``` --- # What creates departure from the Hardy-Weinberg equilibrium? -- - indication of genotyping error -- - selection -- - migration (admixed populations) -- - polymorphisms are inside of copy number variations --- # Hardy-Weinberg equilibrium test - Why the degree of freedom is 1 instead of 2? -- - estimate allele frequency -- - derive `\(\chi^2\)` statistic