RefCov v0.3 provides critical fixes and several new features. The fixes restore several modules that were missing from the previous release. New features include: 1) cluster-coverage, for detecting contiguous clusters of sequence reads across a reference; 2) the ability to evaluate coverage of entire chromosomes using the BAM file header as the region of interest, e.g., --roi-file-path=$BAM --roi-file-format=bam; 3) normalization of coverage using a defined Perl-compatible equation; 4) relative coverage based on a defined list of size bins; and 5) optional output of the chromosome start and end as BED-style columns.
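To make the idea of contiguous read clusters concrete, here is a minimal Python sketch. It is not RefCov code (RefCov is written in Perl) and it does not claim to reproduce the cluster-coverage method; it only assumes that a cluster can be approximated as a run of overlapping or abutting alignments on the same reference. The function name `find_clusters`, the tuple layout, and the 0-based half-open coordinates are all assumptions made for illustration.

```python
from collections import defaultdict


def find_clusters(alignments):
    """Group alignments into contiguous clusters per reference.

    alignments: iterable of (ref_name, start, end) tuples, using
    0-based half-open coordinates (an assumption of this sketch).
    Returns a dict mapping ref_name to a list of
    (cluster_start, cluster_end, read_count) tuples.
    """
    by_ref = defaultdict(list)
    for ref, start, end in alignments:
        by_ref[ref].append((start, end))

    clusters = {}
    for ref, intervals in by_ref.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            # Overlapping or abutting reads extend the current cluster.
            if merged and start <= merged[-1][1]:
                prev_start, prev_end, count = merged[-1]
                merged[-1] = (prev_start, max(prev_end, end), count + 1)
            else:
                merged.append((start, end, 1))
        clusters[ref] = merged
    return clusters


# Hypothetical alignments, for illustration only.
alignments = [
    ("chr1", 100, 150),
    ("chr1", 140, 200),
    ("chr1", 500, 550),
    ("chr2", 10, 60),
    ("chr1", 200, 230),
]
clusters = find_clusters(alignments)
for ref in sorted(clusters):
    print(ref, clusters[ref])
```

In this toy input the first, second, and fifth alignments on chr1 form one cluster and the read at 500-550 forms another. A real tool would likely also apply a minimum depth or minimum cluster size, which this sketch omits.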
The RefCov software suite was written as a toolkit that provides multiple methods for analyzing coverage of sequence data across a reference. As such, it does not answer a single question, but rather provides the means to formulate and answer multiple analytical questions. In a typical use case, RefCov is given a SAM/BAM file containing reads aligned to a genomic reference and a BED file defining genomic regions of interest, within which RefCov calculates a variety of sequence coverage (breadth and depth) metrics. Methods are provided for everything from general coverage statistics to per-base-position information, as determined by sequence read alignments to a reference backbone. The software was written with “real world” analysis in mind: coverage from large, highly parallel sequencing platforms across multiple genomes. Consideration was therefore given to scalability and to processing approaches for massive input data.
The toolkit does not provide an alignment algorithm; it is a post-processing application for alignments produced externally (e.g., by BWA). The software requires very little input: at minimum, it needs an alignment SAM/BAM file and a region of interest (ROI) file in proper BED format. The suite is written in Perl and is a constituent part of our larger Genome Modeling System. The SAM/BAM file format is interrogated using BioPerl's Bio-SamTools module, and subsequent coverage calculations are done efficiently using PDL. The software transparently evaluates which reads should be used in whole or in part, based on a given ROI's start and stop boundaries. Once the software creates a reference coverage object, any number of methods may be applied to it, with functions ranging from coverage topography metrics to assisting in the direct comparison of two experimental conditions or disparate technologies.
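The breadth and depth metrics, and the use of reads in whole or in part at ROI boundaries, can be illustrated with a short Python sketch. This is not RefCov's implementation (which uses Perl, Bio-SamTools, and PDL); it is a plain-Python approximation under stated assumptions: reads are given as (start, end) pairs in 0-based half-open coordinates, alignment gaps are ignored, and the function name `roi_coverage` is hypothetical.

```python
def roi_coverage(reads, roi_start, roi_end):
    """Compute per-base depth, breadth, and mean depth for one ROI.

    reads: iterable of (start, end) alignment intervals.
    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    length = roi_end - roi_start
    if length <= 0:
        raise ValueError("ROI must have positive length")

    # Difference array: +1 where coverage begins, -1 where it ends.
    diff = [0] * (length + 1)
    for read_start, read_end in reads:
        # Clip the read to the ROI so that reads overhanging a
        # boundary contribute only their overlapping part.
        start = max(read_start, roi_start)
        end = min(read_end, roi_end)
        if start >= end:
            continue  # Read lies entirely outside the ROI.
        diff[start - roi_start] += 1
        diff[end - roi_start] -= 1

    # A running sum of the difference array gives per-base depth.
    depth = []
    running = 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)

    covered = sum(1 for d in depth if d > 0)
    return {
        "depth": depth,                     # per-base depth across the ROI
        "breadth": covered / length,        # fraction of ROI bases covered
        "mean_depth": sum(depth) / length,  # average depth over the ROI
    }


# Hypothetical reads and ROI, for illustration only.
reads = [(90, 110), (105, 115), (150, 160), (120, 130)]
stats = roi_coverage(reads, 100, 125)
print(stats["breadth"], stats["mean_depth"])
```

Here the first and last reads overhang the ROI and are counted only in part, the third read is skipped entirely, and positions 115-119 remain uncovered, so breadth is below 1. The difference-array approach keeps the cost proportional to the number of reads plus the ROI length, which is one simple way to stay efficient on large inputs.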
How to Cite RefCov
Please note the version number, and use the following URL to cite RefCov:
Wylie T, Walker J, & Mardis ER