The decreasing cost of sequencing has moved the focus of cancer genomics beyond single-genome studies to the analysis of tens or hundreds of patients diagnosed with similar cancers. Besides the routine discovery and validation of SNVs, indels, and SVs in individual genomes, it is now paramount to systematically analyze the function and recurrence of mutations across a cohort, and to describe how they interact with one another or associate with clinical data. To this end we have developed the Mutational Significance In Cancer (MuSiC) suite of tools. It consists of downstream analysis tools that can:
- Apply statistical methods to identify significantly mutated genes
- Highlight significantly altered pathways
- Investigate the proximity of amino acid mutations in the same gene
- Search for gene-based or site-based correlations to mutations and relationships between mutations themselves
- Correlate mutations with clinical features, using standard correlation measures and generalized linear models
- Cross-reference findings with relevant databases such as Pfam, COSMIC, and OMIM
- Generate typical visualizations such as Kaplan-Meier survival estimates and mutation status matrices
To remain versatile and powerful, MuSiC provides a command-line interface that requires only a few inputs:
- Coverage: Mapped reads from a group of tumor/normal sample pairs in BAM format. UCSC WIG files describing coverage may be used if BAMs are unavailable.
- Variants: The predicted or validated SNVs and indels from the cohort in TCGA MAF format (VCF support coming soon).
- Regions of Interest: A set of regions the user is interested in studying. These are typically the boundaries of coding regions, but can be expanded to non-coding RNA, conserved regions, whole genome, chromosome bands, etc.
- Clinical data: Any relevant clinical data segregated as qualitative and quantitative types (to apply appropriate statistical methods).
- Reference sequence: Must correspond to the BAM/WIG files used. Examples include HG18, GRCh37, non-human genomes, microbiome, squished transcriptome, etc.
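
To give a feel for the Variants input, the following is a minimal Python sketch, not part of MuSiC, that reads a tab-delimited MAF and counts how many distinct tumor samples carry a mutation in each gene. It assumes the standard TCGA MAF column names `Hugo_Symbol` and `Tumor_Sample_Barcode`; the function name `samples_mutated_per_gene` and the toy records are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def samples_mutated_per_gene(maf_handle):
    """Return {gene: number of distinct tumor samples mutated in that gene}."""
    # MAF files may begin with comment lines such as "#version 2.4"; skip them.
    rows = (line for line in maf_handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    samples = defaultdict(set)
    for record in reader:
        # A set ensures a sample with several hits in one gene is counted once.
        samples[record["Hugo_Symbol"]].add(record["Tumor_Sample_Barcode"])
    return {gene: len(barcodes) for gene, barcodes in samples.items()}


# Toy records, invented for this example (only three MAF columns shown).
EXAMPLE_MAF = (
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\n"
    "TP53\tSAMPLE-01\tMissense_Mutation\n"
    "TP53\tSAMPLE-01\tNonsense_Mutation\n"
    "TP53\tSAMPLE-02\tMissense_Mutation\n"
    "KRAS\tSAMPLE-02\tMissense_Mutation\n"
)

if __name__ == "__main__":
    counts = samples_mutated_per_gene(io.StringIO(EXAMPLE_MAF))
    for gene, n in sorted(counts.items()):
        print(f"{gene}\t{n}")
```

A raw count like this says nothing about significance; it ignores coverage and background mutation rate, which is why the suite also takes the Coverage and Regions of Interest inputs.
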
The tools in the suite may be run individually, or chained together and run in series. If you decide to parallelize execution, note that a few tools require the outputs of others. The important dependencies are:
- bmr calc-bmr requires the files generated by bmr calc-covg
- smg requires the gene_mrs file generated by bmr calc-bmr
- path-scan requires the gene_covgs directory generated by bmr calc-covg
- mutation-relation is computationally expensive, and impractical with more than 100 genes, so it is a good idea to set its --gene-list to the significantly mutated gene list generated by smg
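
The dependencies listed above can be turned into an execution plan. The sketch below is an illustration, not part of MuSiC: it uses Python's standard `graphlib` module (Python 3.9+) to group the tools into waves, where every tool in a wave can run in parallel once the previous wave has finished. The `DEPENDS_ON` table and the `execution_waves` helper are hypothetical names, and the edge from mutation-relation to smg encodes the recommendation above rather than a hard requirement.

```python
from graphlib import TopologicalSorter

# Each tool maps to the tools whose outputs it needs (taken from the list above).
# mutation-relation -> smg is a recommendation (for --gene-list), not a hard rule.
DEPENDS_ON = {
    "bmr calc-covg": set(),
    "bmr calc-bmr": {"bmr calc-covg"},
    "smg": {"bmr calc-bmr"},
    "path-scan": {"bmr calc-covg"},
    "mutation-relation": {"smg"},
}


def execution_waves(depends_on):
    """Group tools into waves; tools within a wave do not depend on each other."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs now exist
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished so dependents become ready
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDS_ON), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Under these assumptions, bmr calc-covg runs first, bmr calc-bmr and path-scan can then run side by side, and smg and mutation-relation follow in that order.
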
Updates to MuSiC are distributed through our applications server and are immediately available to the standard package update mechanisms on Ubuntu and other Debian-based systems.