CRAN Task View: Statistical Genetics
| Maintainer: | Giovanni Montana and Neil Shephard |
| Contact: | g.montana at imperial.ac.uk |
| Version: | 2008-04-26 |
Great advances have been made in the field of genetic analysis in recent years. The availability of millions
of single nucleotide polymorphisms (SNPs) in widely available databases, coupled with major advances in SNP genotyping
technology that reduce costs and increase throughput, is enabling a host of studies aimed at elucidating the genetic basis
of complex diseases. The focus of this task view is on R packages implementing statistical methods and algorithms for the
analysis of genetic data and for related population genetics studies.
A number of R packages are already available, and many more are likely to be developed in the near future.
Please send your comments and suggestions to either of the task view maintainers.
-
Population Genetics
:
genetics implements classes and methods for representing genotype and haplotype data, and has several functions for
population genetic analysis (e.g. estimation and testing of Hardy-Weinberg and linkage disequilibria). Geneland has
functions for detecting spatial structures from genetic data within a Bayesian framework via MCMC estimation. rmetasim
provides an interface to the metasim engine for population genetics simulations. hapsim simulates haplotype data with
pre-specified allele frequencies and LD patterns. A few population genetics functions are also implemented in gap and
popgen. popgen has functions for clustering SNP genotype data and for simulating SNPs from a Multinomial-Dirichlet model.
hierfstat allows the estimation of hierarchical F-statistics from haploid or diploid genetic data. LDheatmap creates a
heat map plot of measures of pairwise LD. mapLD measures linkage disequilibrium and constructs haplotype blocks. hwde
fits models for genotypic disequilibria, whilst HardyWeinberg provides graphical representation of disequilibria via
ternary plots (also known as de Finetti diagrams). The Biodem package provides functions for biodemographical analysis;
for example, Fst() calculates Fst from the conditional kinship matrix. Package kinship offers some functions for the
analysis of pedigrees. The adegenet package implements a number of different methods for analysing population structure
using multivariate statistics, graphics and spatial statistics.
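The Hardy-Weinberg testing mentioned above can be illustrated with the classical Pearson chi-square test for a biallelic
marker. The following is a minimal sketch in Python, not the interface of genetics, hwde or HardyWeinberg; the function
name hwe_chisq is hypothetical, and genotypes are assumed to be already tallied into counts of AA, Aa and aa.

```python
import math


def hwe_chisq(n_AA, n_Aa, n_aa):
    """Pearson chi-square test (1 df) of Hardy-Weinberg equilibrium for a
    biallelic marker, given observed genotype counts.

    Sketch only (hypothetical helper, not any R package's API).
    Returns (statistic, p_value).
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Frequency of allele A: each individual carries two alleles.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("marker is monomorphic; the test is undefined")
    # Expected genotype counts under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, hwe_chisq(25, 50, 25) gives a statistic of 0 and a p-value of 1, because those counts match Hardy-Weinberg
proportions exactly for an allele frequency of 0.5. With rare alleles or small samples an exact test is usually preferred
to this asymptotic approximation.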
-
Phylogenetics
:
Phylogenetic and evolutionary analyses can be performed via ape and apTreeshape. Package ouch provides
Ornstein-Uhlenbeck models for phylogenetic comparative hypotheses. PHYLOGR is a suite of functions for the analysis of
phylogenetically simulated data sets and for phylogenetically based GLS model fitting. stepwise implements a method for
stepwise detection of recombination breakpoints in sequence alignments.
-
Linkage and Association
:
Packages in this category provide statistical methods to test for associations between genetic markers and a phenotype.
gap is a package for genetic data analysis of both population and family data; it contains functions for sample size
calculations, probability of familial disease aggregation, kinship calculation, and some tests for linkage and
association analyses. Among other functions, genecounting() estimates haplotype frequencies from genotype data, and
gcontrol() implements a Bayesian genomic control statistic for association studies. For family data, tdthap offers an
implementation of the Transmission/Disequilibrium Test (TDT) for extended marker haplotypes, whereas powerpkg performs
power analyses for the affected sib pair and TDT designs. The catmap package can be used for the meta-analysis of
case-control and TDT data.
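The basic idea behind the TDT, which tdthap extends to haplotypes, can be shown for a single biallelic marker: among
heterozygous parents of affected children, compare how often each allele is transmitted. Under the null hypothesis a
heterozygous parent transmits either allele with probability 1/2. The sketch below is in Python, is not tdthap's
implementation, and uses a hypothetical function name tdt; it assumes the transmission counts b and c have already been
tallied from parent-child trios.

```python
import math


def tdt(b, c):
    """Classic single-marker transmission/disequilibrium test (McNemar form).

    Sketch only (hypothetical helper, not tdthap's code).
    b: transmissions of allele 1 from heterozygous parents to affected children
    c: transmissions of allele 2 from heterozygous parents to affected children
    Returns (statistic, p_value) for a chi-square with 1 df.
    """
    n = b + c
    if n == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    stat = (b - c) ** 2 / n
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For instance, tdt(60, 40) gives a statistic of 4.0 and a p-value of about 0.046.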
-
Linkage Disequilibrium and haplotype mapping
:
The package hapassoc performs likelihood inference of trait associations with haplotypes in GLMs, and haplo.ccs
estimates haplotype and covariate relative risks in case-control data by weighted logistic regression. haplo.stats also
contains functions for the analysis of indirectly measured haplotypes; its statistical methods assume that all subjects
are unrelated and that haplotypes are ambiguous (due to unknown linkage phase of the genetic markers). tdthap implements
transmission/disequilibrium tests for extended marker haplotypes. ldDesign is a package for the design of experiments
for association studies aimed at detecting linkage disequilibrium. LDheatmap creates a heat map plot of measures of
pairwise LD. mapLD measures linkage disequilibrium and constructs haplotype blocks.
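Pairwise LD, such as that plotted by LDheatmap, is commonly summarized by the coefficient D, its normalized form D' and
the squared correlation r². The following minimal Python sketch computes all three for two biallelic loci. It assumes
phase-known haplotype counts; with unphased genotype data the haplotype frequencies would first have to be estimated
(for example by gene counting, as genecounting() in gap does). The function name pairwise_ld is hypothetical.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r2) for two biallelic loci from phase-known
    haplotype counts. Sketch only (hypothetical helper, not an R package API).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n  # frequency of allele B at locus 2
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("a locus is monomorphic; LD is undefined")
    D = p_AB - p_A * p_B
    # D' divides D by the largest magnitude D can reach given the allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    D_prime = D / d_max
    r2 = D * D / denom
    return D, D_prime, r2
```

Counts of 50 AB and 50 ab haplotypes (complete LD) give D' = 1 and r² = 1, while equal counts of all four haplotypes give
D = D' = r² = 0.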
-
Genome-Wide Association
:
With recent advances in high-throughput genotyping technology, genome-wide association (GWA) analysis has become a
feasible strategy. A number of packages are available to facilitate the analysis of these large data sets. GenABEL is
designed for the efficient storage and handling of GWA data, with fast analysis tools for quality control and for
association with binary and quantitative traits, as well as tools for visualizing results. pbatR provides a GUI to the
PBAT software, which performs family-based and population-based studies. The software has been implemented to take
advantage of parallel processing, which vastly reduces the computational time required for GWA studies. SNPassoc is
another package for carrying out GWA analysis. It provides descriptive statistics of the data (including patterns of
missing data) and tests for Hardy-Weinberg equilibrium. Single-point analyses with binary or quantitative traits are
implemented via generalized linear models, and multiple SNPs can be analysed for haplotypic associations or epistasis.
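For a single SNP in a case-control study, a single-point analysis of a binary trait with additive genotype coding is
closely related to the Cochran-Armitage trend test. The Python sketch below implements that test as an illustration; it
is not the code used by GenABEL or SNPassoc, the function name trend_test is hypothetical, and genotype counts per group
are assumed to be already tallied.

```python
import math


def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for one SNP in a case-control study.

    Sketch only (hypothetical helper, not a GWA package's API).
    cases, controls: genotype counts (aa, Aa, AA) in each group
    scores: genotype scores; (0, 1, 2) corresponds to an additive model
    Returns (chi-square statistic with 1 df, p_value).
    """
    n = [r + s for r, s in zip(cases, controls)]  # genotype totals
    R = sum(cases)
    N = sum(n)
    S = N - R
    sx_r = sum(x * r for x, r in zip(scores, cases))
    sx_n = sum(x * k for x, k in zip(scores, n))
    sx2_n = sum(x * x * k for x, k in zip(scores, n))
    var_term = N * sx2_n - sx_n ** 2
    if R == 0 or S == 0 or var_term == 0:
        raise ValueError("test undefined: need cases, controls and genotype variation")
    stat = N * (N * sx_r - R * sx_n) ** 2 / (R * S * var_term)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

In a genome-wide scan such a test would be repeated for every SNP, so the resulting p-values call for a multiple-testing
correction.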
-
QTL mapping
:
Packages in this category implement methods for the analysis of experimental crosses to identify markers contributing
to variation in quantitative traits. bqtl implements both likelihood-based and Bayesian methods for inbred crosses and
recombinant inbred lines. qtl provides several functions and a data structure for QTL mapping, including the function
scanone() for genome-wide scans. The package qtlDesign has functions for designing QTL experiments, including power
computations. qtlbim implements Bayesian interval mapping for QTL.
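A genome-wide scan such as the one performed by scanone() evaluates the evidence for a QTL along the genome. Its simplest
form is single-marker regression in a backcross, where the LOD score compares a model with a separate phenotype mean for
each genotype class against a model with one common mean: LOD = (n/2) log10(RSS0/RSS1). The Python sketch below assumes
normally distributed phenotypes, backcross genotypes coded 0/1 and no missing data; it is not the qtl API (the function
name marker_regression_lod is hypothetical), and qtl offers more refined methods such as interval mapping.

```python
import math


def marker_regression_lod(phenotypes, genotypes):
    """Single-marker LOD scores for a backcross, by marker regression.

    Sketch only (hypothetical helper, not the qtl package's API).
    phenotypes: trait values, one per individual
    genotypes:  dict mapping marker name -> list of 0/1 genotype codes,
                in the same individual order (no missing data assumed)
    Returns a dict mapping marker name -> LOD score.
    """
    n = len(phenotypes)
    overall_mean = sum(phenotypes) / n
    # Null model: one common mean for all individuals.
    rss0 = sum((y - overall_mean) ** 2 for y in phenotypes)
    if rss0 == 0:
        raise ValueError("phenotype shows no variation")
    lods = {}
    for marker, codes in genotypes.items():
        groups = {0: [], 1: []}
        for y, g in zip(phenotypes, codes):
            groups[g].append(y)
        if not groups[0] or not groups[1]:
            raise ValueError(f"marker {marker!r} does not segregate")
        # Alternative model: a separate mean for each genotype class.
        rss1 = 0.0
        for values in groups.values():
            m = sum(values) / len(values)
            rss1 += sum((y - m) ** 2 for y in values)
        # LOD under normal errors: (n / 2) * log10(RSS0 / RSS1).
        lods[marker] = math.inf if rss1 == 0 else (n / 2) * math.log10(rss0 / rss1)
    return lods
```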
-
Multiple testing
:
The package qvalue implements false discovery rate (FDR) methods; the main function qvalue() estimates q-values from a
list of p-values. Package multtest also offers several non-parametric bootstrap and permutation resampling-based
multiple testing procedures.
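The q-value of a test is the smallest false discovery rate at which that test would be called significant. A minimal
Python sketch of Storey's approach is shown below: the proportion of true null hypotheses, π0, is estimated from the
p-values above a single tuning value λ (lam), and q-values follow from the ordered p-values. This is a simplification,
not the qvalue() function itself (the package offers more refined estimators of π0), and the function name
storey_qvalues is hypothetical.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Return (q-values, pi0) for a list of p-values (simplified Storey method).

    Sketch only (hypothetical helper, not the qvalue package's code).
    pi0, the estimated proportion of true null hypotheses, uses one tuning
    value lam. With lam=0 and all p-values > 0, pi0 is 1 and the q-values
    equal Benjamini-Hochberg adjusted p-values.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(pvalues)
    if m == 0:
        return [], 1.0
    # Null p-values are uniform, so roughly pi0 * m * (1 - lam) of them exceed lam.
    # Note: this estimate is 0 if no p-value exceeds lam; the sketch does not guard against that.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values stay monotone (and <= 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q, pi0
```

Setting lam=0 with all p-values positive forces π0 = 1, in which case the result coincides with Benjamini-Hochberg
adjusted p-values; larger λ values let π0 fall below 1 when many tests are non-null.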
-
Importing Sequence Data
:
There are utilities in the seqinr package to import sequence data from various sources, including files of aligned
sequences in mase, clustal, phylip, fasta and msf format, which may be useful for some population genetic analyses.
Users interested in using R for sequence data and bioinformatics are also referred to the Bioconductor project.
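To show what importing one of these formats involves, here is a minimal Python sketch of a FASTA reader, together with a
check that all sequences have equal length, as expected for an aligned file. It is not seqinr's reader; the function
names read_fasta and is_alignment are hypothetical, and only the basic '>' header convention is handled.

```python
def read_fasta(lines):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs.

    Sketch only (hypothetical helper, not seqinr's API).
    lines: any iterable of strings, e.g. an open text file.
    """
    records = []
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_alignment(records):
    """True if all sequences have equal length, as in an aligned FASTA file."""
    return len({len(seq) for _, seq in records}) <= 1
```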