Introduction
The RefCov software suite was written as a toolkit to provide multiple
methods for analyzing coverage of sequence data across a reference. As
such, it does not answer a single question, but rather provides the
ability to formulate and answer multiple analytical questions. In a
typical use case, RefCov is provided a SAM/BAM file containing reads
aligned to a genomic reference and a BED file defining genomic regions
of interest within which RefCov will calculate a variety of sequence
coverage (breadth and depth) metrics. Methods and functionality are
provided for everything from general coverage statistics to
per-base-position information as determined by sequence read
alignments to a reference backbone. The software was designed with
“real world” analysis in mind: coverage data from large, highly
parallel sequencing platforms across multiple genomes. Accordingly,
attention was paid to scalability and to processing strategies for
massive input data. The toolkit does not provide an alignment
algorithm; rather, it is a post-processing application for alignments
produced by external tools (e.g., BWA).
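To make the two metric families concrete, the following minimal Python sketch summarizes a per-base depth profile for a single ROI. It is not RefCov's Perl API. Here, breadth is the fraction of ROI positions covered at or above a minimum depth, and depth is summarized as the mean and median read depth. The function name, the `min_depth` parameter, and the particular set of statistics are illustrative assumptions, not RefCov's actual output fields.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class CoverageStats:
    roi_length: int       # number of reference positions in the ROI
    covered_bases: int    # positions with depth >= min_depth
    breadth: float        # covered_bases / roi_length (0.0 to 1.0)
    mean_depth: float     # average depth over every ROI position
    median_depth: float   # median depth over every ROI position


def coverage_stats(depth, min_depth=1):
    """Summarize a per-base depth profile for one region of interest.

    `depth` is a list with one read-depth value per ROI position.
    `min_depth` (hypothetical parameter) sets the breadth threshold.
    """
    n = len(depth)
    if n == 0:
        raise ValueError("empty region of interest")
    covered = sum(1 for d in depth if d >= min_depth)
    return CoverageStats(
        roi_length=n,
        covered_bases=covered,
        breadth=covered / n,
        mean_depth=sum(depth) / n,
        median_depth=float(median(depth)),
    )


stats = coverage_stats([0, 1, 2, 2, 0, 3])
print(stats.covered_bases, round(stats.breadth, 3), round(stats.mean_depth, 3))
```

Raising `min_depth` tightens the breadth metric. For example, a region can be fully "covered" at 1x while only half of it reaches 2x.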
The software requires very little input: at a minimum, it runs on an
alignment SAM/BAM file and a region-of-interest (ROI) file in valid
BED format. The suite is written in Perl and is a constituent part of
our larger Genome Modeling System. SAM/BAM files are read using
BioPerl's Bio-SamTools module, and the subsequent coverage
calculations are performed efficiently with PDL (the Perl Data
Language). The software transparently determines which reads should
be used in whole or in part based on each ROI's start and stop
boundaries. Once the software creates a reference coverage object, any
number of methods may be applied to it, with functions ranging from
coverage topography metrics to assisting in the direct comparison of
two experimental conditions or disparate technologies.
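The boundary handling described above (using reads in whole or in part depending on the ROI's start and stop) can be illustrated with a short Python sketch. It is not RefCov's implementation. It assumes each read has been reduced to a single ungapped (start, end) span in 0-based, end-exclusive reference coordinates, which ignores CIGAR insertions, deletions, and clipping. The helpers `parse_bed_line` and `clipped_depth` are hypothetical names.

```python
def parse_bed_line(line):
    """Return (chrom, start, end, name) from one BED line.

    BED coordinates are 0-based with an exclusive end, so the region
    covers positions start .. end - 1. The name column is optional; a
    "chrom:start-end" label is used when it is absent (an assumption).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
    if end <= start:
        raise ValueError(f"invalid BED interval: {line!r}")
    return chrom, start, end, name


def clipped_depth(roi_start, roi_end, reads):
    """Per-base depth over [roi_start, roi_end) from aligned read spans.

    Reads fully inside the ROI count in whole, reads straddling a
    boundary count only for their overlapping bases, and reads outside
    the ROI are ignored.
    """
    length = roi_end - roi_start
    diff = [0] * (length + 1)  # difference array: +1 at start, -1 at end
    for start, end in reads:
        s, e = max(start, roi_start), min(end, roi_end)  # clip to ROI
        if s >= e:
            continue  # no overlap with this ROI
        diff[s - roi_start] += 1
        diff[e - roi_start] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]  # prefix sum turns the differences into depth
        depth.append(running)
    return depth


chrom, start, end, name = parse_bed_line("chr1\t100\t110\texon1")
reads = [(95, 103), (100, 110), (105, 120), (200, 250)]
print(name, clipped_depth(start, end, reads))
```

The difference array keeps the work proportional to the number of reads plus the ROI length rather than to their product, which matters when evaluating deep coverage.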
How to Cite RefCov
Please include the version number used, and cite RefCov with the following URL:
Wylie T, Walker J, & Mardis ER
URL: http://gmt.genome.wustl.edu/gmt-refcov
Latest RefCov News:
RefCov v0.3 provides critical fixes and several new features. Fixes include restoring several modules that were missing from the previous release. New features include:
1) cluster-coverage, for detecting contiguous clusters of sequence reads across a reference;
2) the ability to evaluate coverage of entire chromosomes by using the BAM file header as the region of interest (e.g., --roi-file-path=$BAM --roi-file-format=bam);
3) normalization of coverage using a specified Perl-compatible equation;
4) relative coverage based on a defined list of size bins; and
5) optional output of the chromosome start and end as BED-style columns.
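The cluster-coverage feature can be illustrated with a small Python sketch. The text above does not describe RefCov's cluster-coverage algorithm or its parameters, so this sketch assumes one plausible definition: a cluster is a maximal run of consecutive positions whose depth meets a minimum threshold. Clusters are reported in BED-style (0-based, end-exclusive) coordinates. `find_clusters`, `min_depth`, and `min_length` are hypothetical names, not RefCov options.

```python
def find_clusters(depth, offset=0, min_depth=1, min_length=1):
    """Find maximal runs of consecutive positions with depth >= min_depth.

    `depth` holds one read-depth value per position, and `offset` is the
    reference coordinate of depth[0] (e.g. an ROI or chromosome start).
    Returns (start, end, peak_depth) tuples in 0-based, end-exclusive
    (BED-style) coordinates; runs shorter than `min_length` are dropped.
    """
    clusters = []
    run_start, peak = None, 0
    for i, d in enumerate(depth):
        if d >= min_depth:
            if run_start is None:
                run_start, peak = i, d  # open a new cluster
            else:
                peak = max(peak, d)     # extend the current cluster
        elif run_start is not None:
            if i - run_start >= min_length:
                clusters.append((offset + run_start, offset + i, peak))
            run_start = None            # close the current cluster
    # A cluster that extends to the final position is closed here.
    if run_start is not None and len(depth) - run_start >= min_length:
        clusters.append((offset + run_start, offset + len(depth), peak))
    return clusters


depth = [0, 2, 3, 0, 0, 1, 1, 1, 0, 4]
for start, end, peak in find_clusters(depth, offset=1000):
    print(f"chr1\t{start}\t{end}\tpeak={peak}")
```

Emitting end-exclusive coordinates keeps the output directly usable as BED intervals, consistent with the BED-style columns mentioned above.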
More Information