Computing A and its inverse / HWE test

class: center, middle, inverse, title-slide

# Computing A and its inverse / HWE test
## ASCI 431-831
### Gota Morota
### 2018/02/01

---

# Comparison of algorithms 
The simulate dataset involves 1,004 animals.

- What is the fastest algorithm to compute an `$\mathbf{A}$` matrix?
   - Tabular method ([Wright 1922](http://www.journals.uchicago.edu/doi/abs/10.1086/279872), [Cruden 1949](https://academic.oup.com/jhered/article-abstract/40/9/248/868992), and [Emik and Terrill 1949](https://academic.oup.com/jhered/article-abstract/40/2/51/802925)) 
   - Recursive method for computing L ([Henderson, 1976](https://www.jstor.org/stable/2529339))
 
- What is the fastest algorithm to compute the inverse of `$\mathbf{A}$` matrix?
   - A simple method using L ([Henderson, 1976](https://www.jstor.org/stable/2529339))
   - Rapid computation of the diagonal of L ([Quaas, 1976](https://www.jstor.org/stable/2529279))
   -  A simple method NOT using L designed for a non-inbred population ([Henderson, 1976](https://www.jstor.org/stable/2529339))
   - Take the direct inverse of `$\mathbf{A}$` using the `solve()` function

Developing a more efficient algorithm for computing `$\mathbf{A}$` or `$\mathbf{A}^{-1}$` is (was?) active research area in animal breeding.

---
# Benchmarking

- Programming languages 
   - C++ vs. R
 
- Functions
   - R function written in C vs. R function written in plain R
    
- Number of times to evaluate the expression

- Order of evaluations

- Operating systems

---
# Measuring running time of R code

- [Sys.time()](https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/Sys.time) built in function

- [system.time()](https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/system.time) built in function

- [rbenchmark](https://cran.r-project.org/web/packages/rbenchmark/index.html) package

- [tictoc](https://cran.r-project.org/web/packages/tictoc/index.html) package

- [microbenchmark](https://cran.r-project.org/web/packages/microbenchmark/index.html) package

Our choice of function is the `microbenchmark` function in the microbenchmark package. Follow the link [here](http://morotalab.org/asci431-2018/day08/day08Amat.html) to see the comparisons of algorithms.

---
# Invert a large matrix?

- Use sovle() to obtain `$\mathbf{A}^{-1}$` from `$\mathbf{A}$`

- Obtain `$\mathbf{A}^{-1}$` with setting up `$\mathbf{A}$`?

- Rounding errors

---
# Source code - synbreed::kin

```r
libary(synbreed)
> kin
function (gpData, ret = c("add", "kin", "dom", "gam", "realized", 
    "realizedAB", "sm", "sm-smin", "gaussian"), DH = NULL, maf = NULL, 
    selfing = NULL, lambda = 1, P = NULL, cores = 1) 
{
    ret <- match.arg(ret, choices = c("add", "kin", "dom", "gam", 
        "realized", "realizedAB", "sm", "sm-smin", "gaussian"), 
        several.ok = FALSE)
    NAs <- FALSE
    if (ret %in% c("realized", "realizedAB", "sm", "sm-smin")) 
        if (any(is.na(gpData$geno))) 
            NAs <- TRUE
```

---
# Source code - kinship2::kinship

```r
libary(kinship2)
> getAnywhere(kinship.pedigree)
It was found in the following places
  registered S3 method for kinship from namespace kinship2
  namespace:kinship2
with value

function (id, chrtype = "autosome", ...) 
{
    chrtype <- match.arg(casefold(chrtype), c("autosome", "x"))
    if (any(duplicated(id$id))) 
        stop("All id values must be unique")
    n <- length(id$id)
    pdepth <- kindepth(id)
    havemz <- FALSE
    if (!is.null(id$relation) && any(id$relation$code == "MZ twin")) {
        havemz <- TRUE
```

---
# What creates departure from the Hardy-Weinberg equilibrium?

- indication of genotyping error

- selection

- migration (admixed populations)

- polymorphisms are inside of copy number variations

---
# Hardy-Weinberg equilibrium test

- Why the degree of freedom is 1 instead of 2?

- estimate allele frequency 
   
--

- derive `$\chi^2$` statistic