R: Plot the proportion of missing genotype information

plot.info {qtl}

R Documentation

Plot the proportion of missing genotype information

Description

Plot a measure of the proportion of missing information in the genotype data.

Usage

plot.info(x, chr, method=c("both","entropy","variance"), step=1,
          off.end=0, error.prob=0.001,
          map.function=c("haldane","kosambi","c-f","morgan"),
          alternate.chrid=FALSE, ...)

Arguments

`x`	An object of class `cross`. See `read.cross` for details.
`chr`	Optional vector indicating the chromosomes to plot. This should be a vector of character strings referring to chromosomes by name; numeric values are converted to strings. Prefix chromosome names with `-` to plot all chromosomes except those named. A logical (TRUE/FALSE) vector may also be used.
`method`	Indicates whether to plot the entropy version of the information, the variance version, or both.
`step`	Maximum distance (in cM) between positions at which the missing information is calculated, though for `step=0`, it is calculated only at the marker locations.
`off.end`	Distance (in cM) past the terminal markers on each chromosome to which the genotype probability calculations will be carried.
`error.prob`	Assumed genotyping error rate used in the calculation of the penetrance Pr(observed genotype \| true genotype).
`map.function`	Indicates whether to use the Haldane, Kosambi, Carter-Falconer or Morgan map function when converting genetic distances into recombination fractions.
`alternate.chrid`	If TRUE and more than one chromosome is plotted, alternate the placement of chromosome axis labels, so that they may be more easily distinguished.
`...`	Passed to `plot.scanone`.
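
As an illustration of what `map.function` selects, the following is a minimal sketch of the standard textbook Haldane, Kosambi and Morgan map functions, converting a distance in cM into a recombination fraction. The function names `haldane_rf`, `kosambi_rf` and `morgan_rf` are hypothetical helpers written for this example; they are not part of the qtl package and are not necessarily how the package performs the conversion internally. The Carter-Falconer function is omitted because it has no closed-form expression in this direction.

```r
# Hypothetical helpers (not part of qtl): textbook map functions.
# d_cM is a genetic distance in centiMorgans; the result is a
# recombination fraction between 0 and 0.5.

haldane_rf <- function(d_cM) {
  d <- d_cM / 100                 # convert cM to Morgans
  0.5 * (1 - exp(-2 * d))
}

kosambi_rf <- function(d_cM) {
  d <- d_cM / 100
  0.5 * tanh(2 * d)
}

morgan_rf <- function(d_cM) {
  d <- d_cM / 100
  pmin(d, 0.5)                    # linear, capped at 0.5
}

d <- c(0, 1, 10, 50)
cbind(cM = d,
      haldane = haldane_rf(d),
      kosambi = kosambi_rf(d),
      morgan  = morgan_rf(d))
```

For short distances the three functions nearly agree; they diverge as the distance grows, because they make different assumptions about crossover interference.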

Details

The entropy version of the missing information: for a single individual at a single genomic position, we measure the missing information as H = - sum p[g] log p[g] / log n, where p[g] is the probability of the genotype g, and n is the number of possible genotypes, defining 0 log 0 = 0. This takes values between 0 and 1, assuming the value 1 when the genotypes (given the marker data) are equally likely and 0 when the genotypes are completely determined. We calculate the missing information at a particular position as the average of H across individuals. For an intercross, we scale not by log n but by the entropy for the genotype probabilities (1/4, 1/2, 1/4).
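
The entropy measure described above can be sketched in plain R as follows. This is an illustration only: `entropy_info` is a hypothetical helper written for this example, not a function in the qtl package, and the probabilities are made-up inputs rather than output of the package's genotype probability calculations.

```r
# Hypothetical helper (not part of qtl): entropy version of the missing
# information for one individual at one position.
# p     : vector of genotype probabilities (sums to 1)
# scale : normalizing constant; defaults to log(n), n = number of genotypes
entropy_info <- function(p, scale = NULL) {
  if (is.null(scale)) scale <- log(length(p))
  p <- p[p > 0]                    # implements the convention 0 log 0 = 0
  -sum(p * log(p)) / scale
}

# Backcross-like case with two genotypes
entropy_info(c(0.5, 0.5))          # equally likely genotypes
entropy_info(c(1, 0))              # genotype completely determined

# Intercross: scale by the entropy of (1/4, 1/2, 1/4) instead of log(3)
f2_prior <- c(0.25, 0.5, 0.25)
f2_scale <- -sum(f2_prior * log(f2_prior))
entropy_info(f2_prior, scale = f2_scale)

# Missing information at a position: average of H across individuals
# (rows are individuals, columns are genotypes; values are illustrative)
prob <- rbind(c(1, 0), c(0.5, 0.5), c(0.9, 0.1))
mean(apply(prob, 1, entropy_info))
```

The first two calls give the extremes of the scale (1 and 0), and the intercross call shows why the alternative scaling is used: with it, the uninformative probabilities (1/4, 1/2, 1/4) correspond to a value of 1.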

The variance version of the missing information: we calculate the average, across individuals, of the variance of the genotype distribution (conditional on the observed marker data) at a particular locus, and scale by the maximum such variance.
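
A minimal sketch of the variance measure follows. It rests on two assumptions that the text above does not state: genotypes are coded numerically as 0, 1, ..., n-1, and the maximum variance is supplied by the caller (0.25 for two equally likely genotypes under this coding). `variance_info` is a hypothetical helper, not a function in the qtl package, and the probabilities are illustrative.

```r
# Hypothetical helper (not part of qtl): variance version of the missing
# information at one position.
# prob    : matrix of genotype probabilities, one row per individual
# max_var : maximum possible variance, used for scaling (assumed input)
variance_info <- function(prob, max_var) {
  codes <- seq_len(ncol(prob)) - 1      # assumed coding: 0, 1, ..., n-1
  m1 <- prob %*% codes                  # E[g] for each individual
  m2 <- prob %*% codes^2                # E[g^2] for each individual
  mean(m2 - m1^2) / max_var             # average variance, scaled
}

# Two-genotype example; max variance is 0.25 at probabilities (1/2, 1/2)
prob_bc <- rbind(c(1, 0), c(0.5, 0.5), c(0.1, 0.9))
variance_info(prob_bc, max_var = 0.25)
```

As with the entropy version, an individual whose genotype is fully determined contributes 0, and one with equally likely genotypes contributes the maximum.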

Calculations are done in C (for the sake of speed in the presence of little thought about programming efficiency) and the plot is created by a call to plot.scanone.

Note that summary.scanone may be used to display the maximum missing information on each chromosome.

Value

An object with class scanone: a data.frame whose columns are the chromosome IDs and cM positions, followed by the entropy and/or variance version of the missing information.

Author(s)

Karl W Broman, kbroman@biostat.wisc.edu

Examples

library(qtl)
data(hyper)

plot.info(hyper,chr=c(1,4))

# save the results and view maximum missing info on each chr
info <- plot.info(hyper)
summary(info)

[Package qtl version 1.11-12 Index]