SomaticSniper User Manual
SomaticSniper has two modes. The joint genotyping mode (-J) takes into account that the tumor and normal samples are not entirely independent, and it also incorporates the prior probability of a somatic mutation. This probability can be scaled to control the sensitivity of the algorithm. An accurate value for this prior would be , but this may result in a severe lack of sensitivity at lower depths. A less realistic prior probability yields more sensitive results at the expense of more false positives. To get a similar sensitivity to the default mode, we recommend using a prior of . The default mode treats the two samples as if they came from two different individuals. This mode uses a less accurate mathematical model, but it yields good results, especially if the normal may contain some tumor cells or the tumor is quite impure.
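To illustrate why the prior controls sensitivity, the sketch below applies Bayes' rule to a deliberately simplified two-hypothesis model (somatic vs. not somatic). It is not SomaticSniper's actual genotype likelihood model; the function names and the likelihood values are hypothetical and chosen only to show the direction of the effect.

```python
import math

def toy_somatic_posterior(lik_somatic: float, lik_not_somatic: float, prior: float) -> float:
    """Posterior P(somatic | data) in a hypothetical two-hypothesis toy model.

    lik_somatic / lik_not_somatic are made-up likelihoods of the observed
    tumor/normal reads under each hypothesis; prior is P(somatic).
    """
    num = lik_somatic * prior
    return num / (num + lik_not_somatic * (1.0 - prior))

def phred(p_error: float) -> float:
    """Phred-scale an error probability: -10 * log10(p)."""
    return -10.0 * math.log10(p_error)

# Same (hypothetical) evidence for every prior: data 1000x more likely if somatic.
for prior in (1e-6, 1e-3, 1e-2):
    post = toy_somatic_posterior(1000.0, 1.0, prior)
    print(f"prior={prior:g}  posterior={post:.4f}  phred={phred(1.0 - post):.1f}")
```

With identical evidence, raising the prior raises the posterior and its Phred-scaled score, so more sites clear a fixed score cutoff. In this toy model that is exactly the sensitivity versus false-positive trade-off described for the joint mode.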
Usage
bam-somaticsniper [options] -f <ref.fasta> <tumor.bam> <normal.bam> <snv_output_file>
Required Option:
| -f | FILE REQUIRED reference sequence in the FASTA format |
Options:
| -q | INT filter out reads with mapping quality less than INT [0] |
| -Q | INT filter out somatic SNV output with somatic quality less than INT [15] |
| -p | FLAG disable priors in the somatic calculation. Increases sensitivity for solid tumors. |
| -J | FLAG Use prior probabilities accounting for the somatic mutation rate |
| -s | FLOAT prior probability of a somatic mutation (implies -J) [0.01] |
| -T | FLOAT theta in maq consensus calling model (for -c/-g) [0.850000] |
| -N | INT number of haplotypes in the sample (for -c/-g) [2] |
| -r | FLOAT prior of a difference between two haplotypes (for -c/-g) [0.001000] |
| -F | STRING select output format (vcf or classic) [classic] |
Notes on running SomaticSniper
At minimum, you must provide the reference FASTA that the BAMs were aligned against (passed with the -f option), a tumor BAM, a normal BAM, and the name of the output file. We recommend filtering out reads with a mapping quality of 0 (i.e. use -q 1), as they are typically placed randomly in the genome. We have also found that few variants with a somatic score below 15 validate, but you may lower the minimum score or raise it to a stricter threshold (e.g. -Q 40). To obtain high-confidence sites, we also recommend requiring a minimum average mapping quality for the variant base of 40 for reads aligned with BWA, or 70 for reads aligned with MAQ. We have not tested other aligners at this time. Disabling priors (-p) is not recommended, but it may increase sensitivity at the cost of reduced specificity.
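A minimal Python sketch that assembles a command line following these recommendations, using only the options listed in the table above. The helper name build_somaticsniper_cmd and its defaults are hypothetical; it only builds the argument list and does not run the tool.

```python
import shlex
from typing import List, Optional

def build_somaticsniper_cmd(
    ref_fasta: str,
    tumor_bam: str,
    normal_bam: str,
    output: str,
    min_mapq: int = 1,                      # -q 1: drop MAPQ 0 reads, as recommended
    min_somatic_q: int = 15,                # -Q: default somatic score cutoff
    output_format: str = "classic",         # -F: "classic" or "vcf"
    somatic_prior: Optional[float] = None,  # -s: joint-mode prior (implies -J)
) -> List[str]:
    """Hypothetical helper: return the argv list for bam-somaticsniper."""
    if output_format not in ("classic", "vcf"):
        raise ValueError(f"unsupported output format: {output_format}")
    cmd = ["bam-somaticsniper", "-q", str(min_mapq), "-Q", str(min_somatic_q),
           "-F", output_format]
    if somatic_prior is not None:
        cmd += ["-s", str(somatic_prior)]
    # Positional arguments follow the usage line: -f ref, tumor, normal, output.
    cmd += ["-f", ref_fasta, tumor_bam, normal_bam, output]
    return cmd

cmd = build_somaticsniper_cmd("ref.fa", "tumor.bam", "normal.bam", "calls.snv",
                              min_somatic_q=40)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)  (needs `import subprocess`)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.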
File Formats
The output of SomaticSniper consists of one line for each site whose consensus differs from the reference base. Each available output format is described below.
Classic:
Each line contains the following tab-separated values:
- Chromosome
- Position
- Reference base
- IUB genotype of tumor
- IUB genotype of normal
- Somatic Score
- Tumor Consensus quality
- Tumor variant allele quality
- Tumor mean mapping quality
- Normal Consensus quality
- Normal variant allele quality
- Normal mean mapping quality
- Depth in tumor (# of reads crossing the position)
- Depth in normal (# of reads crossing the position)
- Mean base quality of reads supporting reference in tumor
- Mean mapping quality of reads supporting reference in tumor
- Depth of reads supporting reference in tumor
- Mean base quality of reads supporting variant(s) in tumor
- Mean mapping quality of reads supporting variant(s) in tumor
- Depth of reads supporting variant(s) in tumor
- Mean base quality of reads supporting reference in normal
- Mean mapping quality of reads supporting reference in normal
- Depth of reads supporting reference in normal
- Mean base quality of reads supporting variant(s) in normal
- Mean mapping quality of reads supporting variant(s) in normal
- Depth of reads supporting variant(s) in normal
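A sketch of reading the classic format and applying the high-confidence mapping-quality threshold recommended in the notes on running SomaticSniper. It assumes tab-separated lines with exactly the 26 columns listed above, and it assumes that the "average mapping quality for the variant base" corresponds to column 19 (mean mapping quality of reads supporting the variant(s) in the tumor). The column names, function names, and example record are made up for illustration.

```python
from typing import Iterable, Iterator

# Hypothetical column names, in the order listed above (0-based indices).
CLASSIC_COLUMNS = [
    "chrom", "pos", "ref", "tumor_gt", "normal_gt",              # 0-4
    "somatic_score", "tumor_cons_q", "tumor_vaq", "tumor_mapq",  # 5-8
    "normal_cons_q", "normal_vaq", "normal_mapq",                # 9-11
    "tumor_depth", "normal_depth",                               # 12-13
    "tumor_ref_bq", "tumor_ref_mq", "tumor_ref_depth",           # 14-16
    "tumor_var_bq", "tumor_var_mq", "tumor_var_depth",           # 17-19
    "normal_ref_bq", "normal_ref_mq", "normal_ref_depth",        # 20-22
    "normal_var_bq", "normal_var_mq", "normal_var_depth",        # 23-25
]

def parse_classic(line: str) -> dict:
    """Split one classic-format line into a dict with numeric fields converted."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(CLASSIC_COLUMNS):
        raise ValueError(f"expected {len(CLASSIC_COLUMNS)} columns, got {len(fields)}")
    rec = dict(zip(CLASSIC_COLUMNS, fields))
    rec["pos"] = int(rec["pos"])
    for key in CLASSIC_COLUMNS[5:]:  # every column from somatic score onward is numeric
        rec[key] = float(rec[key])
    return rec

def high_confidence(lines: Iterable[str], min_var_mapq: float = 40.0) -> Iterator[dict]:
    """Yield records whose tumor variant-read mean MAPQ meets the threshold
    (40 suggested for BWA, 70 for MAQ)."""
    for line in lines:
        if line.strip():
            rec = parse_classic(line)
            if rec["tumor_var_mq"] >= min_var_mapq:
                yield rec

# Made-up record (not real output): tumor variant reads have mean MAPQ 60.
example = "\t".join([
    "1", "1000", "A", "R", "A", "40",     # chrom .. somatic score
    "40", "40", "60", "50", "0", "60",    # tumor/normal consensus, VAQ, mean MAPQ
    "30", "35",                           # depth in tumor, depth in normal
    "35", "60", "20", "34", "60", "10",   # tumor ref stats, tumor variant stats
    "36", "60", "35", "0", "0", "0",      # normal ref stats, normal variant stats
])
print([r["pos"] for r in high_confidence([example])])  # prints [1000]
```

Passing min_var_mapq=70 applies the stricter MAQ threshold, which this example record would fail.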
VCF
VCF output from SomaticSniper conforms to version 4.1 of the VCF specification. Hence, each non-header output line contains the following fields:
- Chromosome
- Position
- ID (unused)
- Reference base
- Alternate bases (comma separated)
- Quality (unused)
- Filters (unused)
- INFO (unused)
- FORMAT specification for each sample
- NORMAL sample data
- TUMOR sample data
The following FORMAT fields will be populated for each of NORMAL and TUMOR.
| ID | Number | Type | Description |
| GT | 1 | String | Genotype |
| IGT | 1 | String | Genotype when called independently (only filled if called in joint prior mode) |
| DP | 1 | Integer | Total read depth |
| DP4 | 4 | Integer | Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases |
| BCOUNT | 4 | Integer | Occurrence count for each base at this site (A,C,G,T) |
| GQ | 1 | Integer | Genotype quality |
| JGQ | 1 | Integer | Joint genotype quality (only filled if called in joint prior mode) |
| VAQ | 1 | Integer | Variant quality |
| BQ | . | Integer | Average base quality |
| MQ | . | Integer | Average mapping quality |
| SS | 1 | Integer | Variant status relative to non-adjacent normal: 0=wildtype, 1=germline, 2=somatic, 3=LOH, 4=unknown |
| SSC | 1 | Integer | Somatic Score |
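A sketch of pulling the per-sample FORMAT values out of a VCF data line, decoding the SS status code, and deriving the alt-allele fraction from DP4. The function names and the example line are hypothetical; the code assumes the column order listed above (FORMAT, then NORMAL, then TUMOR) and that header lines starting with # have already been skipped.

```python
from typing import Dict

# SS status codes as listed in the FORMAT table above.
SS_CODES = {"0": "wildtype", "1": "germline", "2": "somatic", "3": "LOH", "4": "unknown"}

def parse_samples(vcf_line: str) -> Dict[str, Dict[str, str]]:
    """Return {"NORMAL": {key: value}, "TUMOR": {key: value}} for one data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")  # FORMAT column
    # Columns 9 and 10 (0-based) hold NORMAL and TUMOR, in that order.
    return {name: dict(zip(keys, cols[idx].split(":")))
            for name, idx in (("NORMAL", 9), ("TUMOR", 10))}

def alt_fraction(dp4: str) -> float:
    """Share of high-quality bases supporting alt; DP4 = refF,refR,altF,altR."""
    ref_f, ref_r, alt_f, alt_r = (int(x) for x in dp4.split(","))
    total = ref_f + ref_r + alt_f + alt_r
    return (alt_f + alt_r) / total if total else 0.0

# Made-up data line (not real SomaticSniper output).
fmt = "GT:IGT:DP:DP4:BCOUNT:GQ:JGQ:VAQ:BQ:MQ:SS:SSC"
normal = "0/0:.:20:10,10,0,0:20,0,0,0:60:.:0:35:60:0:."
tumor = "0/1:.:30:10,10,5,5:20,0,10,0:50:.:50:35:60:2:40"
line = "\t".join(["1", "1000", ".", "A", "G", ".", ".", ".", fmt, normal, tumor])

samples = parse_samples(line)
print(SS_CODES[samples["TUMOR"]["SS"]], alt_fraction(samples["TUMOR"]["DP4"]))
```

Reading the keys from the FORMAT column, rather than hard-coding their positions, keeps the parser working if the field order differs between versions.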
User Support
Please mail genome-dev@genome.wustl.edu with problems or questions.