
Introduction

The decreasing cost of sequencing has moved the focus of cancer genomics beyond single-genome studies to the analysis of tens or hundreds of patients diagnosed with similar cancers. Besides the routine discovery and validation of SNVs, indels, and SVs in individual genomes, it is now essential to systematically analyze the function and recurrence of mutations across a cohort, and to describe how they interact with one another and associate with clinical data. To this end we have developed the Mutational Significance In Cancer (MuSiC) suite of downstream analysis tools, which can:

  1. Apply statistical methods to identify significantly mutated genes
  2. Highlight significantly altered pathways
  3. Investigate the proximity of amino acid mutations in the same gene
  4. Search for gene-based or site-based correlations to mutations and relationships between mutations themselves
  5. Correlate mutations with clinical features using standard correlation measures and generalized linear models
  6. Cross-reference findings with relevant databases such as Pfam, COSMIC, and OMIM
  7. Generate typical visualizations such as Kaplan-Meier survival estimates and mutation status matrices
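MuSiC's own significantly-mutated-gene test is more elaborate than this (the v0.3 notes, for example, mention sample-specific mutation rates). Purely as a simplified illustration of the underlying idea, comparing a gene's observed mutation count against a background mutation rate (BMR), here is a one-sided binomial test. It is a sketch, not MuSiC's method; the function names and the numbers in the example are hypothetical.

```python
from math import exp, lgamma, log, log1p


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """Log of P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))


def binomial_sf(k: int, n: int, p: float) -> float:
    """One-sided p-value P(X >= k): the chance of k or more mutations in n covered
    bases if every base mutates independently at background rate p (0 < p < 1)."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        term = exp(log_binom_pmf(i, n, p))
        total += term
        # Past the mean n*p the terms only shrink, so stop once they are negligible.
        if i > n * p and term <= total * 1e-16:
            break
    return total


# Hypothetical gene: 3,000 covered bases in each of 100 samples,
# an assumed BMR of 1.5e-6 per base, and 8 observed mutations.
covered_bases = 3_000 * 100
p_value = binomial_sf(8, covered_bases, 1.5e-6)
print(f"expected {covered_bases * 1.5e-6:.2f} mutations, observed 8, p = {p_value:.2e}")
```

Pooling coverage across samples into a single n, as done here, assumes one shared BMR for the whole cohort, which is exactly the simplification that sample-specific rates are meant to remove.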

To remain versatile, MuSiC provides a command-line interface that needs only a minimal set of inputs:

  1. Coverage: Mapped reads from a group of tumor/normal sample pairs in BAM format. UCSC WIG files describing coverage may be used if BAMs are unavailable.
  2. Variants: The predicted or validated SNVs and indels from the cohort in TCGA MAF format (VCF support coming soon).
  3. Regions of Interest: A set of regions the user wants to study, typically the boundaries of coding regions, though these can be extended to non-coding RNAs, conserved regions, the whole genome, chromosome bands, etc.
  4. Clinical data: Any relevant clinical data, separated into qualitative and quantitative types so that the appropriate statistical methods can be applied.
  5. Reference sequence: Must match the reference used to create the BAM/WIG files, e.g., hg18, GRCh37, a non-human genome, a microbiome, a squished transcriptome, etc.
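To make the Variants input concrete: a TCGA MAF file is tab-delimited text with a header row naming columns such as Hugo_Symbol and Tumor_Sample_Barcode, optionally preceded by `#` comment lines such as a version line. The sketch below, which is not part of MuSiC, reads such a file and records which samples carry a mutation in each gene, the most basic recurrence count. The function name and the toy rows are hypothetical, and a real MAF has many more columns.

```python
import csv
import io
from collections import defaultdict


def mutated_samples_per_gene(maf_lines):
    """Map each Hugo_Symbol to the set of Tumor_Sample_Barcodes with a mutation in it."""
    # Skip '#' comment lines (such as a '#version' line) that precede the header row.
    rows = (line for line in maf_lines if not line.startswith("#"))
    genes = defaultdict(set)
    for record in csv.DictReader(rows, delimiter="\t"):
        genes[record["Hugo_Symbol"]].add(record["Tumor_Sample_Barcode"])
    return dict(genes)


# Toy MAF fragment with made-up rows and only three columns.
TOY_MAF = (
    "#version 2.3\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "TP53\tMissense_Mutation\tSAMPLE-01\n"
    "TP53\tMissense_Mutation\tSAMPLE-02\n"
    "TP53\tNonsense_Mutation\tSAMPLE-02\n"
    "KRAS\tMissense_Mutation\tSAMPLE-03\n"
)

counts = {gene: len(samples)
          for gene, samples in mutated_samples_per_gene(io.StringIO(TOY_MAF)).items()}
print(counts)  # {'TP53': 2, 'KRAS': 1}
```

Counting distinct samples rather than rows keeps a sample with two hits in the same gene (SAMPLE-02 above) from inflating that gene's recurrence.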

The tools in the suite may be run individually or chained together serially. If you parallelize execution, note that a few tools require the outputs of others:

  • bmr calc-bmr requires the files generated by bmr calc-covg
  • smg requires the gene_mrs file generated by bmr calc-bmr
  • path-scan requires the gene_covgs directory generated by bmr calc-covg
  • mutation-relation is computationally expensive and impractical for more than 100 genes, so it is a good idea to set its --gene-list to the significantly mutated gene list generated by smg
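The dependencies above form a small directed acyclic graph, so a safe parallel schedule follows from a topological sort. This sketch uses Python's standard `graphlib` module (3.9+) to group the steps into stages whose members can run concurrently. It only computes an order and does not invoke MuSiC. The mutation-relation to smg edge reflects the --gene-list recommendation rather than a hard requirement.

```python
from graphlib import TopologicalSorter

# Prerequisites of each MuSiC step, taken from the list above.
DEPENDENCIES = {
    "bmr calc-covg": set(),
    "bmr calc-bmr": {"bmr calc-covg"},
    "smg": {"bmr calc-bmr"},
    "path-scan": {"bmr calc-covg"},
    "mutation-relation": {"smg"},  # recommended, so that --gene-list can use the smg output
}


def execution_stages(deps):
    """Group steps into stages; every step in a stage has all its prerequisites done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for number, stage in enumerate(execution_stages(DEPENDENCIES), 1):
    print(f"stage {number}: {', '.join(stage)}")
```

This yields four stages: bmr calc-covg alone, then bmr calc-bmr and path-scan side by side, then smg, then mutation-relation.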

Updates to MuSiC are distributed through our applications server, so Ubuntu and other Debian-based systems receive them immediately through their standard update mechanisms.

Latest Genome MuSiC News:


Genome MuSiC v0.4 Released

MuSiC v0.4 is now available for download. This release adds new visualization tools, performance improvements, support for TCGA MAF v2.3, and support for coverage files in UCSC WIG format when BAMs are impractical. Here is the complete changelog:

  • Added tools to generate typical visualizations such as Kaplan-Meier survival estimates and mutation status matrices.
  • Support for TCGA Mutation Annotation Format (MAF) version 2.3.
  • Performance improvements in mutation rate calculations, and more efficient memory usage.
  • Added support for wiggle track format files describing coverage, if BAMs are unavailable.
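For readers unfamiliar with the Kaplan-Meier survival estimates mentioned in this release, the product-limit estimator multiplies, at each time with an observed event, the fraction of at-risk patients who survive that time. The sketch below is a plain illustration of that estimator, not MuSiC's visualization code; the function name and the follow-up data are hypothetical.

```python
from itertools import groupby


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function S(t).

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if the patient was censored
    Returns a list of (t, S(t)) pairs, one per distinct time with at least one event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    for t, group in groupby(data, key=lambda pair: pair[0]):
        group = list(group)
        deaths = sum(event for _, event in group)
        if deaths:
            # Patients censored at time t still count as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= len(group)
    return curve


# Hypothetical follow-up data (months) for six patients carrying a given mutation.
curve = kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(f"t={t}: S={s:.3f}")
```

Computing one curve for mutated and one for wild-type patients and plotting them together gives the usual mutation-versus-survival comparison.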

Genome MuSiC v0.3 Released

MuSiC (Mutational Significance in Cancer) 0.3 is now available for download featuring numerous fixes and several new features. MuSiC performs a variety of statistical analyses on the somatic (and germline) alterations discovered in any cancer cohort. Improvements in this version include an enhanced significantly mutated gene test which introduces the ability to 1) take into account sample-specific mutation rates and 2) identify significantly mutated non-genic regions of the genome. The clinical correlation module now features a generalized linear model option allowing for the elimination of covariate influences on mutation-phenotype relationships. Support for MAF 2.2 and for Pfam annotation of GRCh37 (hg19) are now standard. Additionally, several MuSiC components have been optimized and parallelized for faster execution.


Genome MuSiC v0.2 Released

The decreasing cost of sequencing has moved the focus of cancer genomics beyond single-genome studies to the analysis of tens or hundreds of patients diagnosed with similar cancers. Besides the routine discovery and validation of SNVs, indels, and SVs in individual genomes, it is now essential to systematically analyze the function and recurrence of mutations across a cohort, and to describe how they interact with one another and with the associated clinical data. To this end we have developed the Mutational Significance In Cancer package (MuSiC). It consists of a suite of downstream analysis tools designed to (1) apply statistical methods to identify significantly mutated genes, (2) highlight significantly altered pathways, (3) investigate the proximity of amino acid mutations in the same gene, (4) search for gene-based or site-based correlations to mutations and relationships between mutations themselves, (5) correlate mutations with clinical features, and (6) cross-reference findings with relevant databases such as Pfam, COSMIC, and OMIM.
