Variant identification software tools | RNA sequencing data analysis
Identifying genomic variation is a crucial step in unraveling the relationship between genotype and phenotype, and can yield important insights into human diseases. Prevailing methods rely on costly whole-genome sequencing (WGS) or whole-exome sequencing (WES), whereas identifying genomic variants from RNA sequencing (RNA-seq) data, which often already exists, remains a challenge because of the intrinsic complexity of the transcriptome.
Focuses on variant discovery and genotyping. GATK is a toolkit developed at the Broad Institute, composed of several tools and able to support projects of any size. The application bundles an assortment of command-line tools for analyzing high-throughput sequencing (HTS) data in various formats such as SAM, BAM, CRAM or VCF. The website includes extensive documentation to guide users.
Allows users to interact with high-throughput sequencing data. SAMtools permits the manipulation of alignments in the SAM/BAM/CRAM formats: reading, writing, editing, indexing, viewing and converting them. It limits the mapping quality of reads with excessive mismatches and applies base alignment quality to fix alignment errors. This tool can sort and merge alignments, remove polymerase chain reaction (PCR) duplicates or generate per-position information.
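As a rough illustration of the SAM text format that these operations work on, the Python sketch below parses tab-separated SAM records and drops unmapped reads, reads flagged as duplicates, and reads below a mapping-quality cutoff. It is not how SAMtools is implemented. The flag bits 0x4 (unmapped) and 0x400 (PCR or optical duplicate) come from the SAM specification; the function names and the default cutoff of 20 are assumptions made for this example.

```python
# Flag bits defined in the SAM specification.
UNMAPPED_FLAG = 0x4      # read is unmapped
DUPLICATE_FLAG = 0x400   # read is a PCR or optical duplicate


def parse_sam_line(line):
    """Split one SAM alignment line into its first six mandatory fields."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),   # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
    }


def keep_record(record, min_mapq=20):
    """Return True for mapped, non-duplicate reads with MAPQ >= min_mapq.

    min_mapq=20 is a hypothetical default chosen for illustration only.
    """
    if record["flag"] & UNMAPPED_FLAG:
        return False
    if record["flag"] & DUPLICATE_FLAG:
        return False
    return record["mapq"] >= min_mapq


def filter_sam(lines, min_mapq=20):
    """Parse SAM lines, skip '@' header lines, and keep records that pass."""
    kept = []
    for line in lines:
        if line.startswith("@"):
            continue
        record = parse_sam_line(line)
        if keep_record(record, min_mapq):
            kept.append(record)
    return kept
```

In practice the same filtering would be done with SAMtools itself on BAM or CRAM files; the sketch only shows which fields of a record such a filter inspects.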
Assists users in mapping reads to a reference genome. Subread offers a suite of programs for processing next-generation sequencing read data. This package includes Subread (an aligner), Subjunc (an aligner), Sublong (a long-read aligner), Subindel (a long indel detection program), featureCounts (a read quantification program), exactSNP (an SNP calling program) and other utility programs.
Supports the functional analysis of gene expression and genomic data. Babelomics makes it possible to explore the effects of altered gene expression levels or changes in gene sequences within a functional context. It provides user-friendly access to a full range of methods that cover: (1) primary data analysis; (2) a variety of tests for different experimental designs; and (3) different enrichment and network analysis algorithms for the interpretation of the results of such tests in the proper functional context.
An accurate read aligner with novel mapping schemes and an index tree structure that aims to reduce false-positive mappings caused by highly similar regions. RASER shows the best mapping accuracy compared to other popular algorithms and the highest sensitivity in identifying multiply mapped reads. As a result, RASER is effective in the unbiased mapping of the alternative alleles of SNPs and in the identification of RNA editing sites.
A flexible and easy to use interface that programmers of many levels of experience can use to access information in the popular and common SAM/BAM format. bio-samtools 2 provides new classes for describing genomic regions and genetic variants, allows the easy addition of newly developed SAMtools features and can produce publication-quality visualizations of data with minimal effort by the coder.
An RNA-Seq mapping software tool that also discovers transcriptomic and genomic variants such as splice junctions, chimeric junctions, SNVs and indels in a single analysis step, using a built-in error detection method that enables high precision and sensitivity. CRAC is not a pipeline, but a single program that can replace a combination of Bowtie, SAMtools, and TopHat/TopHat-fusion, and can be viewed as an effort to simplify NGS analysis.
Maps mutations generated from forward genetic screens or that spontaneously arise in a population. MMAPPR can identify candidate mutations without any parental strain or genotype information, without previously identified single nucleotide polymorphism (SNP) map databases, and without data from separate individuals. By using only single RNA-seq libraries from a small number of pooled mutant individuals and their phenotypically wild-type siblings, it requires fewer animals and less sequencing data than whole-genome sequence mapping.
Uses RNA-seq data to identify both a region of the genome linked to a mutation and candidate mutations that may be causal for the phenotype of interest. RNAmapper can identify mutations that cause nonsense or missense changes to codons, alter transcript splicing, or alter gene expression levels.
Predicts transcriptomic structural variants (TSVs) from RNA-seq data. SQUID is a computational tool that divides the reference genome into segments and builds a genome segment graph from both concordant and discordant RNA-seq read alignments. Using an integer linear program, it rearranges the segments of the reference genome so that as many read alignments as possible are concordant with the rearranged sequence. It can detect both fusion-gene events and TSVs that incorporate previously non-transcribed regions into transcripts.
Finds, manages and investigates genetic variants such as single nucleotide polymorphisms (SNPs). SNiPlay can sort large next-generation sequencing (NGS) datasets, allowing users to display and study genome-wide data. It can be useful for population stratification, distance tree interpretation and visualization of SNP density. This tool can identify genetic loci involved in the control of agronomic traits.
A method for the de novo identification, differential analysis and annotation of variants from RNA-seq data in non-model species. TWAS takes as input RNA-seq reads from at least two conditions (e.g. the modalities of the phenotype) with at least two replicates each, and outputs variants associated with the condition. The method requires neither a reference genome nor a database of SNPs. TWAS can therefore be applied to any species at a very reasonable cost.
A versatile variant caller for both DNA- and RNA-sequencing data. VarDict has several features that distinguish it from other variant callers: performance that scales linearly with sequencing depth, intrinsic local realignment, built-in de-duplication, detection of polymerase chain reaction (PCR) artifacts, support for both DNA- and RNA-seq data, paired analysis to detect variant frequency shifts alongside somatic and loss of heterozygosity (LOH) variant detection, and structural variant (SV) calling. VarDict facilitates the application of next-generation sequencing in cancer research, enabling researchers to use one tool in place of a computationally expensive ensemble of tools.
Examines epigenomic and transcriptomic next generation sequencing (NGS) data. Octopus-toolkit can be used for antibody- or enzyme-mediated experiments and studies for the quantification of gene expression. It can accelerate the data mining of public epigenomic and transcriptomic NGS data for basic biomedical research. This tool provides a private and a public mode: one to process the user’s own data, and the other to analyze public NGS data by retrieving raw files from the GEO database.
Allows users to execute DNA-seq/RNA-seq pipelines. Halvade is a Hadoop MapReduce implementation that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure. The software depends on existing tools, and running the pipeline requires additional data besides the raw sequenced reads. It provides functionalities to partition the reference genome in chunks and to copy external dependencies (files or databases) to the worker nodes.
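To make the idea of partitioning a reference genome into chunks concrete, here is a minimal Python sketch that splits each contig into fixed-size, non-overlapping regions that could then be handed to parallel workers. The description above does not say how Halvade chooses its chunk boundaries, so the fixed-size strategy, the function name and the 1-based inclusive coordinates are all assumptions made for illustration.

```python
def partition_reference(contig_lengths, chunk_size):
    """Split each contig into consecutive regions of at most chunk_size bases.

    contig_lengths: dict mapping contig name -> length in bases.
    Returns a list of (contig, start, end) tuples with 1-based inclusive
    coordinates. Hypothetical helper; not part of Halvade.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    regions = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            # The last chunk of a contig may be shorter than chunk_size.
            end = min(start + chunk_size - 1, length)
            regions.append((contig, start, end))
            start = end + 1
    return regions
```

For example, `partition_reference({"chr1": 250}, 100)` produces three regions for `chr1`, the last of which covers only the remaining 50 bases. A production pipeline would likely also balance chunks by read coverage rather than by length alone.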
Detects and allows interactive visualization of single-nucleotide polymorphisms (SNPs). QualitySNPng combines SNP detection and genotyping with interactive visualization of the results. This software provides a graphical user interface with pre-set filter options that is configurable for specific needs. It is appropriate to use in marker SNP identification or to analyze RNA-seq data with up to several million reads per transcript to genotype a mixture of a hundred accessions.
A method to detect expressed single nucleotide variants (eSNVs) with high specificity and sensitivity from high-throughput transcriptome sequencing data. Alignments from multiple aligners are used to compensate for aligner bias, and multiple genomic features are used to improve specificity. For the expressed SNVs detected, it can also identify the amino acid change and classify the protein domains.
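One simple way to combine evidence from several aligners is a voting scheme over the variant calls derived from each alignment. The Python sketch below keeps a variant when at least `min_support` call sets contain it. The description above does not state the rule this method actually uses, so the voting scheme, the function name and the `(chrom, pos, ref, alt)` tuple layout are assumptions for illustration.

```python
from collections import Counter


def consensus_variants(call_sets, min_support=2):
    """Return variants present in at least min_support of the call sets.

    call_sets: iterable of collections of (chrom, pos, ref, alt) tuples,
    one collection per aligner. Hypothetical helper for illustration.
    """
    support = Counter()
    for calls in call_sets:
        # Deduplicate so each aligner contributes at most one vote per variant.
        support.update(set(calls))
    return sorted(v for v, n in support.items() if n >= min_support)
```

Setting `min_support=1` gives the union of all call sets, which favors sensitivity, while setting it to the number of aligners gives the intersection, which favors specificity.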
Analyzes raw data from RNA-seq experiments effectively and outputs results in a manner that is compatible with a wide variety of specialized downstream analyses on desktop computers. RNASEQR yields more accurate estimates for gene expression, complete gene structures and new transcript isoforms, as well as more accurate detection of single nucleotide variants (SNVs).
Identifies single nucleotide polymorphisms (SNPs) in RNA-seq data. SNPiR consists of (1) a modified RNA-seq read-mapping procedure that allows alignment of reads to the reference in a splice-aware manner, (2) variant calling using the Genome Analysis Toolkit (GATK) and (3) vigorous filtering of false-positive calls. The software allows the detection of variants even for lowly expressed genes. It was applied to data from the GM12878 human lymphoblastoid cell line and peripheral blood mononuclear cells (PBMCs) from another healthy individual.
Examines breakpoint predictions, together with their associated structural variation and gene context. BPS is a pipeline for integrating large, complex data sets with a scalable architecture supporting analysis from individual samples on a laptop to very large data sets on compute clusters. Its rendering engine is coupled with a flexible pipeline to detect structural variants, and it accommodates a variety of toolsets and analyses of whole genome sequencing (WGS) and RNA sequencing (RNA-Seq) data.