Variant aggregation/summarization software tools | Whole-genome sequencing data analysis
There are many tools for variant calling and effect prediction, but little to tie together large sample groups. Aggregating, sorting, and summarizing variants and effects across a cohort is often done with ad hoc scripts that must be re-written for every new project.
Permits users to parse, analyze and manipulate VCF files. VCFtools is a software package composed of two modules: the first is a general API that allows various operations to be performed on VCF files, including format validation, merging, comparing, intersecting, making complements and basic overall statistics; the second module analyzes single-nucleotide polymorphism (SNP) data in VCF format, helping researchers estimate allele frequencies, levels of linkage disequilibrium and various quality control (QC) metrics.
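To make the allele-frequency estimate concrete, here is a minimal Python sketch that computes the alternate-allele frequency per site from the GT field of VCF records. It is not VCFtools code and does not use its API; the function name `allele_frequencies` and the example records are hypothetical, and the sketch assumes biallelic sites and skips everything else.

```python
def allele_frequencies(vcf_lines):
    """Return (chrom, pos, ref, alt, alt_freq) for each biallelic VCF record.

    Illustrative only: assumes tab-separated VCF text with a FORMAT column
    that contains GT. Records without genotypes are skipped.
    """
    results = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, nothing to count
        chrom, pos, _id, ref, alt = fields[:5]
        if "," in alt:
            continue  # multiallelic site, out of scope for this sketch
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")
        alt_count = 0
        called = 0
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            # Treat phased (|) and unphased (/) genotypes the same way.
            for allele in genotype.replace("|", "/").split("/"):
                if allele == ".":
                    continue  # missing call
                called += 1
                if allele == "1":
                    alt_count += 1
        freq = alt_count / called if called else None
        results.append((chrom, int(pos), ref, alt, freq))
    return results


# Hypothetical three-sample example.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/0",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0|0:12\t./.:0\t0|1:9",
]
frequencies = allele_frequencies(example_vcf)
```

Missing genotypes are excluded from the denominator, so the frequency is computed over called alleles only; a production tool would also need to handle multiallelic sites and ploidy differences.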
Recognizes a causative gene candidate using only two alleles of a male-sterile Drosophila locus. SnpSift can interpret weak single nucleotide polymorphisms (SNPs) such as those located in the 5’UTR or promoter regions. This strategy can be used to discover mutations that contain SNPs in regulatory regions of genes, as in many population studies. This tool helps users retrieve causative SNPs in mutants derived from random chemical mutagenesis screens.
Contains mappings between Entrez Gene identifiers and GenBank accession numbers. org.Hs.eg.db is an R object and an organism-specific package that provides detailed information about the species abbreviated in the second part of the package name (Hs, Homo sapiens). This package is updated biannually. Objects in this package are accessed using the select() interface from the AnnotationDbi package.
Aims to facilitate the analysis of genome-scale data from several standard file formats. CGAT permits users to filter, compare, convert, summarize and annotate genomic intervals, gene sets and sequences. The software comprises more than 50 tagged tools, each with documentation and examples. The tags associate tools with broad themes (genomic intervals, gene sets, sequences), standard genomic file formats and the type of computation performed by the tool, such as statistical summary, format conversion, annotation, comparison or filtering.
Enables summarization, analysis and visualization of mutation annotation format (MAF) files. Maftools is an R package that provides several functions to carry out routinely performed analyses and visualizations in cohort-based cancer studies. The software also allows annotation and format conversions and is able to handle the ICGC Simple Somatic Mutation format. It was applied to the TCGA acute myeloid leukemia (LAML) cohort.
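As an illustration of what a cohort-level MAF summary involves, the following Python sketch counts how many distinct samples carry a mutation in each gene and tallies variant classifications. It is not the Maftools API (Maftools is an R package); the function name `summarize_maf` and the toy records are hypothetical. Only the standard MAF column names `Hugo_Symbol`, `Variant_Classification` and `Tumor_Sample_Barcode` are assumed to be present.

```python
import csv
import io
from collections import defaultdict


def summarize_maf(handle):
    """Summarize a tab-separated MAF stream.

    Returns (gene_table, class_counts):
      gene_table   - list of (gene, number of distinct mutated samples),
                     sorted by descending count, then gene name
      class_counts - dict mapping variant classification to record count
    """
    # MAF files may start with '#' comment lines before the header row.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    samples_by_gene = defaultdict(set)
    class_counts = defaultdict(int)
    for record in reader:
        samples_by_gene[record["Hugo_Symbol"]].add(record["Tumor_Sample_Barcode"])
        class_counts[record["Variant_Classification"]] += 1
    gene_table = sorted(
        ((gene, len(samples)) for gene, samples in samples_by_gene.items()),
        key=lambda item: (-item[1], item[0]),
    )
    return gene_table, dict(class_counts)


# Hypothetical toy cohort: three samples, two genes.
toy_maf = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "TP53\tMissense_Mutation\tS1\n"
    "TP53\tNonsense_Mutation\tS2\n"
    "TP53\tMissense_Mutation\tS1\n"
    "DNMT3A\tMissense_Mutation\tS3\n"
)
gene_table, class_counts = summarize_maf(toy_maf)
```

Counting distinct samples per gene, rather than raw records, keeps a sample with several hits in the same gene from inflating that gene's rank.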
A lightweight tool for performing operations (e.g., intersection, difference, …) on genomic data contained in .vcf and .bed files. Joinx also provides some limited analysis functions (concordance reports). An important assumption that joinx makes is that the input data is always sorted. This allows it to compute its results in an efficient manner.
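The sketch below illustrates why a sorted-input assumption pays off: two sorted interval lists can be intersected in a single linear pass with two cursors, without loading an index or comparing every pair. It is not joinx's actual implementation; the function name `intersect_sorted` is hypothetical, and the sketch further assumes BED-style half-open intervals, no overlaps within a single input, and the same chromosome ordering (here plain string order) in both inputs.

```python
def intersect_sorted(a, b):
    """Intersect two lists of (chrom, start, end) half-open intervals.

    Both lists must be sorted by (chrom, start) and must not contain
    overlapping intervals internally. Runs in O(len(a) + len(b)).
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever cursor is on the earlier chromosome.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start = max(start_a, start_b)
        end = min(end_a, end_b)
        if start < end:
            out.append((chrom_a, start, end))
        # The interval that ends first cannot overlap anything later.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return out


# Hypothetical example inputs.
regions_a = [("chr1", 10, 20), ("chr1", 30, 40), ("chr2", 5, 15)]
regions_b = [("chr1", 15, 35), ("chr2", 0, 10)]
shared = intersect_sorted(regions_a, regions_b)
```

If the inputs were unsorted, the same result would require either sorting first or an all-against-all comparison, which is the cost the sorted-input requirement avoids.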
Gathers base-level metrics across a whole genome from a group of BAMs. qPileup is part of AdamaJava, a project that holds code for variant callers and pipeline tools related to next-generation sequencing (NGS). It creates a summary of information for each position in a reference sequence. It uses the HDF5 data storage format to store the data, allowing for small file sizes via compression and quick access via indexing.