A universal framework that processes big immunome data from raw sequences to quantitated clonotypes. MiXCR efficiently handles paired- and single-end reads, considers sequence quality, corrects PCR errors and identifies germline hypermutations. The software supports both partial- and full-length profiling and employs all available RNA or DNA information, including sequences upstream of V and downstream of J gene segments.
Allows users to analyze T-cell antigen receptor (TCR) sequencing data. MiTCR is a program permitting the study of hundreds of millions of raw high-throughput sequencing reads containing sequences encoding human or mouse α or β TCR chains. It also allows the extraction of T-cell clones from next generation sequencing (NGS) data.
A computational method to reconstruct full-length, paired T cell receptor (TCR) sequences from T lymphocyte single-cell RNA sequence data. TraCeR links T cell specificity with functional response by revealing clonal relationships between cells alongside their transcriptional profiles. TraCeR extracts TCR-derived sequencing reads for each cell by alignment against ‘combinatorial recombinomes’ comprising all possible combinations of V and J segments. Reads are then assembled into contiguous sequences that are analyzed to find full-length, recombined TCR sequences. Importantly, the reconstructed recombinant sequences typically contain nearly the complete length of the TCR V(D)J region and so allow high-confidence discrimination between closely related gene segments. Our method is sensitive, accurate and easy to adapt to any species for which annotated TCR gene sequences are available.
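The idea of a "combinatorial recombinome" can be illustrated with a minimal sketch: one synthetic reference sequence per V/J pair. The segment names, toy sequences, and the run of N characters used as a junction placeholder are assumptions made for illustration; this is not TraCeR's own code.

```python
from itertools import product


def build_recombinome(v_segments, j_segments, spacer="N" * 20):
    """Return one synthetic reference per V/J pair: V sequence + spacer + J sequence.

    v_segments and j_segments map segment names to nucleotide sequences.
    The spacer stands in for the unknown junction (an assumption of this sketch).
    """
    references = {}
    for (v_name, v_seq), (j_name, j_seq) in product(
        v_segments.items(), j_segments.items()
    ):
        references[f"{v_name}_{j_name}"] = v_seq + spacer + j_seq
    return references


# Toy input: hypothetical segment names and sequences.
v = {"TRAV1": "ACGTAC", "TRAV2": "ACGTTT"}
j = {"TRAJ1": "GGATCC", "TRAJ2": "GGTTAA", "TRAJ3": "GGCCAA"}
recombinome = build_recombinome(v, j)
```

With two V and three J segments the sketch yields six references, which could then be passed to any read aligner as a reference set.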
A computational framework for Bayesian estimation of antigen-driven selection in immunoglobulin sequences based on the analysis of somatic mutation patterns. BASELINe represents a fundamental advance over previous methods by shifting the problem from one of simply detecting selection to one of quantifying selection. Along with providing a more intuitive means to assess and visualize selection, BASELINe allows comparative analysis between groups of sequences derived from different germline V(D)J segments.
Provides a hidden Markov model (HMM)-based framework for studying B-cell receptor (BCR) sequences. Partis is open source software that can annotate and simulate BCR sequences and infer their clonal families. The program infers the parameters of the rearrangement process before performing annotation inference on each sequence in the set. It is designed to be effective for analyzing modern large sequencing data sets.
Allows users to analyze T-cell receptor (TCR) repertoire sequences produced by deep sequencing. Decombinator is a program that employs a string matching algorithm to search the FASTQ files produced by high-throughput sequencing (HTS) machines for rearranged TCR sequences. The pipeline's central algorithm searches for 'tag' sequences, the presence of which indicates the inclusion of particular V or J genes in a recombination.
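The tag-search idea can be sketched as follows: a read is assigned a V and a J gene when it contains one tag from each set, and the sequence between the two tags is kept as the junction insert. The sketch uses Python's plain `str.find` for matching, and the tag sequences and gene names are made up for illustration; it is not Decombinator's implementation.

```python
def find_tag(read, tags):
    """Return (gene, position) for the first tag found in the read, else None."""
    for gene, tag in tags.items():
        pos = read.find(tag)
        if pos != -1:
            return gene, pos
    return None


def classify_read(read, v_tags, j_tags):
    """Return (v_gene, j_gene, insert) or None if the read lacks a V or J tag."""
    v_hit = find_tag(read, v_tags)
    j_hit = find_tag(read, j_tags)
    if v_hit is None or j_hit is None:
        return None
    (v_gene, v_pos), (j_gene, j_pos) = v_hit, j_hit
    v_end = v_pos + len(v_tags[v_gene])
    if j_pos < v_end:
        return None  # the J tag must lie downstream of the V tag
    return v_gene, j_gene, read[v_end:j_pos]


# Hypothetical tags and read, for illustration only.
v_tags = {"TRBV-a": "TTGACC", "TRBV-b": "CCATGG"}
j_tags = {"TRBJ-x": "GAGCTC"}
result = classify_read("AAACCATGGTGTGCCAGCGAGCTCTT", v_tags, j_tags)
```

A production tool would need a multi-pattern matcher to stay fast with hundreds of tags; the linear scan here only keeps the example short.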
Identifies germline V genes from expressed repertoires to a specificity of 100%. IgDiscover uses a cluster identification process to produce candidate sequences that, once filtered, result in individualized germline V gene databases. IgDiscover was tested in multiple species, validated by genomic cloning and cross-library comparisons, and produces comprehensive gene databases even where limited genomic sequence is available. IgDiscover analysis of the allelic content of Indian and Chinese-origin rhesus macaques reveals high levels of immunoglobulin gene diversity in this species.
Allows users to reconstruct native αβ T-cell receptors (TCRs) from single-cell RNA-seq data of Ag-specific T cells and to link them with the gene expression profile of individual cells. VDJPuzzle enables analysis of TCR diversity and its relationship with the transcriptional profile of different clones. Single-cell transcriptome analysis can also successfully distinguish Ag-specific T cell populations sorted directly from resting memory cells in peripheral blood from those sorted after ex vivo stimulation. The tool has additionally been adapted for B-cell receptors (BCRs) and includes features to reliably characterize somatic hypermutations (SHMs).
A software package that significantly improves the completeness and accuracy of TCR/IG profiling from deep sequence data and includes procedures to identify novel alleles of gene segments. The alignment step in LymAnalyzer, which is based on a fast-tag-searching algorithm, results in rapid identification of VDJ gene segments, with significantly improved accuracy and completeness compared to existing tools applied to TCR data. In addition, LymAnalyzer can be applied to IG sequences, includes an integrated single nucleotide polymorphism (SNP) calling algorithm that identifies novel alleles of the VDJ gene segments, and produces lineage mutation trees to represent the affinity maturation process of the IGs. On real and simulated data sets LymAnalyzer produces highly accurate and complete results. Although to date we have applied it to TCR/IG data from human and mouse, it can be applied to data from any species for which an appropriate database of reference genes is available.
Analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, like IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and impacted 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes, and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.
Handles alignments with insertions/deletions (indels), while maintaining or improving the speed. HTJOINSOLVER is a desktop application, provided as an implementation of the partitioning method. Similar to the original JOINSOLVER algorithm, it conserves motifs to initiate the alignment process. The original algorithm was developed as a fast and accurate method, but was not designed to handle indels.
Assists users in automating analysis of the generated next-generation-sequencing-spectratyping (NGS-S) data. TCRProfiler is a software that allows users (i) to determine, for each T cell receptor beta locus (TRB) chain analyzed, the exact length and sequence of the CDR3, (ii) to identify the rearranged TRBV (variable), TRBD (diversity) and TRBJ (joining) genes, (iii) to analyze nontemplate nucleotides added at the junction sites, and (iv) to exclude non-functional transcripts.
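Step (iv), excluding non-functional transcripts, is commonly approximated by checking that the CDR3 nucleotide sequence is in frame and free of stop codons. The sketch below applies that common criterion; it is an assumption made for illustration and not necessarily the rule TCRProfiler itself uses.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(cdr3_nt):
    """Return True if the CDR3 is in frame (length divisible by 3) and has no stop codon."""
    seq = cdr3_nt.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False
    # Split into consecutive codons and look for any stop codon.
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


# Toy CDR3 sequences (hypothetical).
candidates = ["TGTGCCAGCTTC", "TGTGCCAGCTT", "TGTTGAAGCTTC"]
functional = [s for s in candidates if is_productive(s)]
```

Only the first toy sequence passes: the second is out of frame and the third contains the stop codon TGA.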
Analyzes vast amounts of IGHV (immunoglobulin heavy-chain variable) sequences and supports exploration of the resulting data. ImmuneDB can take raw FASTA/FASTQ data as input, identify genes, determine clones, and construct lineages, as well as provide information such as selection pressure and mutation analysis. It uses an industry-leading database, MySQL, to provide fast analysis and avoid the complexities of error-prone flat files. After analysis, the resulting data can be easily visualized, queried, and exported through its web-based interface.
Covers a range of complex analysis tasks for immunoglobulin (Ig) repertoire sequencing data. Change-O is a suite of utilities that (i) processes the output of V(D)J alignment tools, (ii) assigns clonal clusters to Ig sequences and (iii) reconstructs germline sequences. It also offers applications to import data from the frequently used IMGT/HighV-QUEST tool and a set of utilities to perform basic database operations, such as sorting, filtering and modifying annotations.
A method to derive clonotype repertoires from next generation sequencing data with sophisticated routines for handling errors stemming from PCR and sequencing artefacts. The application can handle different kinds of input data originating from single- or paired-end sequencing in different configurations and is generic regarding the species and gene of interest.
Performs error correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires. IgRepertoireConstructor takes Illumina MiSeq paired-end reads as input and expects that the left and right reads of each pair overlap and cover the variable region of the antibody. IgRepertoireConstructor stitches paired-end reads and removes contaminants by aligning the stitched reads against an Ig germline database. This procedure significantly improves the quality of the input reads and allows one to construct reads covering the variable region of the antibody.
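Stitching overlapping paired-end reads can be sketched in a few lines: the right read is reverse-complemented and the longest suffix of the left read that matches its prefix is taken as the overlap. The function names, the minimum overlap, and the mismatch limit below are assumptions made for illustration; this naive version ignores base qualities and is not IgRepertoireConstructor's algorithm.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def stitch(left, right, min_overlap=5, max_mismatches=0):
    """Merge a read pair into one sequence, or return None if no overlap is found.

    Tries the longest candidate overlap first and accepts the first one whose
    mismatch count does not exceed max_mismatches.
    """
    right_rc = reverse_complement(right)
    longest = min(len(left), len(right_rc))
    for overlap in range(longest, min_overlap - 1, -1):
        tail, head = left[-overlap:], right_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatches:
            return left + right_rc[overlap:]
    return None


# Toy fragment (hypothetical) sequenced from both ends with a 7-base overlap.
fragment = "ACGTTGCAAGGCTTACGGATCCA"
left_read = fragment[:15]
right_read = reverse_complement(fragment[8:])
merged = stitch(left_read, right_read)
```

In this toy case the merged sequence equals the original fragment; pairs with no acceptable overlap return None and could be discarded or handled separately.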
A dynamic programming approach to learn the distribution of rearrangement scenarios from large numbers of non-productive sequences in an efficient way. This approach is based on a hidden Markov model (HMM) formulation of the problem, and learns its parameters using a modified Baum-Welch (BW) algorithm to avoid the full enumeration of all scenarios. We tested our software tool on sequence data for both the alpha and beta chains of the T cell receptor. To test the validity of our algorithm, we also generated synthetic sequences produced by a known model, and confirmed that its parameters could be accurately inferred back from the sequences. The inferred model can be used to generate synthetic sequences, to calculate the probability of generation of any receptor sequence, as well as the theoretical diversity of the repertoire. We estimate this diversity to be ≈ 10^23 for human T cells. The model gives a baseline to investigate the selection and dynamics of immune repertoires.
A pipeline that accurately recovers T cell receptor (TCR) sequences, including rare TCR sequences, from high throughput sequencing data (including barcoded data) even at low coverage. RTCR employs a data-driven statistical model to rectify PCR and sequencing errors in an adaptive manner. Using simulations we demonstrate that RTCR can easily adapt to the error profiles of different types of sequencers and exhibits consistently high recall and high precision even at low coverages where other pipelines perform poorly. Using published real data we show that RTCR accurately resolves sequencing errors and outperforms all other pipelines.
Simulating antibody repertoires is often the only feasible way to benchmark new immunoinformatics tools. The IgSimulator tool addresses various complications in generating realistic antibody repertoires. IgSimulator's code has a modular structure and can be easily adapted to new simulation requirements.
A web-based, interactive application for the analysis of immunoglobulins and T cell receptors using next-generation sequencing data. ARResT/Interrogate can organize and filter large amounts of immunogenetic data by various criteria. It calculates several relevant statistics, and presents results in the form of multiple interconnected visualizations. ARResT/Interrogate is implemented primarily in R, and is freely available.
Enables the simulation of time-resolved antibody repertoires. AbSim is an R package that allows users to comprehensively control the vast, biologically relevant parameter space of antibody repertoire selection and evolution. The following parameters can be controlled: (i) total time of evolution, (ii) rate and method of somatic hypermutation (SHM), (iii) number and rate of recombination events, (iv) rate at which new sequences are produced, (v) baseline mutation rate, (vi) clonal frequency, and (vii) germline gene usage distribution.
Recognizes B cell clones from adaptive immune receptor repertoire sequencing (AIRR-Seq) datasets in an unsupervised manner. SCOPe conducts spectral clustering of the B cell receptor (BCR) junction region within groups of BCR sequences. It estimates the local sequence neighborhood. This tool permits the improvement of the sensitivity and specificity over a simple fixed threshold for all junction lengths.
Provides immunology researchers with powerful data analysis infrastructure. VDJserver offers tools for performing immune repertoire analysis with a focus on collaboration, streamlined metadata capture, and reproducibility. It also produces basic repertoire characterization results for both B cell receptor and T cell receptor repertoires.
Annotates next-generation sequencing data, identifies transcripts in a lineage of interest, and tracks lineage development across multiple time points. SONAR is capable of automating both general repertoire analysis and specialized techniques for investigating specific lineages. It also generates figures, such as identity–divergence plots and longitudinal phylogenetic “birthday” trees, and provides interfaces to other programs such as DNAML and BEAST. This software provides a tool for the processing of large next-generation sequencing datasets and the ontogenic analysis of neutralizing antibody lineages.
Identifies novel B- and T-cell receptor (BCR/TCR) genes and alleles de novo. IMPre is a method that provides a comprehensive approach for identifying novel BCR/TCR genes and alleles in certain species with greatly improved speed, cost, and accuracy. The package comprises four main steps: data processing, clustering, assembly, and optimization. IMPre is stable with animal and long-sequence data.
A stand-alone pipeline primarily based in R programming for the integral analysis of B cell repertoire data generated by HTS. The pipeline integrates GNU software and in house scripts to perform quality filtering, sequencing noise correction and repertoire reconstruction based on V, D and J segment assignment, clonal origin and unique heavy chain identification. Post-analysis scripts generate a wealth of repertoire metrics that in conjunction with a rich graphical output facilitates sample comparison and repertoire mining. The performance of ImmunediveRsity was tested with raw and curated human and mouse 454-Roche sequencing benchmarks providing good approximations of repertoire structure. Although ImmunediveRsity is similar to other recently developed tools, it offers significant advantages that facilitate repertoire analysis and repertoire mining. ImmunediveRsity is open source and free for academic purposes and it runs on 64 bit GNU/Linux and MacOS.
A complementary software suite that solves a wide range of RepSeq post-analysis tasks, provides a detailed tabular output and publication-ready graphics, and is built on top of a flexible API. The main aims of the VDJtools Project are: (i) Ensure consistency between post-analysis methods and results, (ii) Save the time of bioinformaticians analyzing RepSeq data, (iii) Create an API framework facilitating development of new RepSeq analysis applications, and (iv) Provide a simple enough command line tool so it could be used by immunologists and biologists with little computational background.
Utilizes re-alignment to identify V(D)J genes and alleles after common local alignment. A methodology is developed to correct the PCR and sequencing errors, and to minimize the PCR bias among various rearranged sequences with different V and J gene families. IMonitor provides general adaptation for sequences from all receptor chains of different species and outputs useful statistics and visualizations. Usefulness of IMonitor was demonstrated on minimal residual disease detection of patients with B-cell Acute Lymphoblastic leukemia.
Implements algorithms for obtaining T cell receptors (TCR) sequence pairs. ALPHABETR determines CDR3A/CDR3B pairs from high-throughput sequencing data from repeated samples of antigen-specific T cell populations. With ALPHABETR, users can determine CDR3A/CDR3B pairs, dual TCR-alpha clones and clones that share CDR3A or CDR3B sequences, and estimate clonal frequencies.
A collection of tools for downstream analysis of Rep-Seq data, including clustering and phylogenetic analysis. TRigS assists with the determination and analysis of B-cell lineage trees from next-generation sequencing data. TRigS consists of various tools, including: (1) AnnotateTree, which creates annotated lineage trees and sequence alignments showing the point at which amino acid substitutions occur; (2) RevertToGermline, which uses a simple approach to infer the germline ancestor of a B-cell variable region sequence, given the IMGT junction analysis; (3) clustering tools, which support the clustering of sequences for clonal analysis and their large-scale depiction; and (4) tools for junction parsing and results manipulation, which support the integration of IgBLAST into an IMGT-style pipeline and ease the processing of tab-separated analysis files.
A pipeline to identify clonal lymphocyte populations from paired-end RNA sequencing reads, with a primary focus on sequencing mRNA from neoplastic cells. VDJSeq-Solver detects the main clone for the tissue of interest by capturing the most abundant V(D)J junction among the existing ones. It is able to characterize the V(D)J rearrangement by obtaining the exact sequence of the main clone. VDJSeq-Solver overcomes limitations of currently available lymphocyte rearrangements recognition methods that are designed to work with amplicon sequencing data and perform badly in case of irregular junction coverage. It was shown to correctly identify the main clone in case of five Mantle Cell Lymphoma samples and was further successfully tested on twelve Diffuse Large B-Cell Lymphoma samples.
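The step of picking the main clone as the most abundant V(D)J junction can be sketched with a simple tally. The function name and the toy junction sequences are hypothetical, and the sketch assumes junctions have already been extracted from the reads; it does not reproduce VDJSeq-Solver's detection procedure.

```python
from collections import Counter


def main_clone(junctions):
    """Return (junction, fraction) for the most abundant junction sequence."""
    if not junctions:
        raise ValueError("no junction sequences supplied")
    counts = Counter(junctions)
    sequence, count = counts.most_common(1)[0]
    return sequence, count / len(junctions)


# Toy junction sequences (hypothetical), one per supporting read.
observed = ["TGTGCC", "TGTGCC", "TGTAAA", "TGTGCC", "TGTGGG"]
clone, fraction = main_clone(observed)
```

Reporting the fraction alongside the sequence gives a rough indication of how dominant the main clone is within the sample.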
A platform designed for TCR repertoire data analysis in R after preprocessing data with MiTCR or other CDR3 extraction and gene alignment software tools. With the power and flexibility of R language and procedures supported by tcR users can perform advanced statistical analysis of TCR repertoires.
A convenient web-based application for analyzing next-generation sequencing results and reporting IGH gene rearrangements for both repertoire and clonality studies. IGGalaxy has two analysis options: one uses the built-in igBLAST algorithm and the other uses output from IMGT. In either case, repertoire summaries for the B-cell populations tested are available. IGGalaxy supports multi-sample and multi-replicate input analysis for both igBLAST and IMGT/HIGHV-QUEST. IGGalaxy is built on top of the Galaxy framework (http://galaxyproject.org/) and is distributed as a virtual machine image.
A user-friendly tool for analyzing and visualizing IG and TR repertoires based on NGS data. IMEX offers several algorithms for analyzing the clonality and diversity on multiple levels such as V-(D)-J arrangement, CDR, and nucleotide sequences of the whole reads. Moreover, it also provides features for analyzing primer efficiency. IMEX includes various visualization possibilities such as pie charts, histograms, line charts, bubble charts, and heat maps.
A flexible toolkit for the processing and analysis of antigen receptor repertoire sequencing data at single-cell level. The software combines bioinformatics tools for immunoglobulin sequence annotation with a relational database, where raw data and analysis results are stored and linked. sciReptor supports attribution of additional data categories such as cell surface marker expression or immunological metadata. Furthermore, it comprises a quality control module as well as basic repertoire visualization tools.
A standardized file format for representing V(D)J analysis results. VDJML facilitates downstream processing of the results in an application-agnostic manner. The VDJML file format specification is accompanied by a support library called VDJMLpy. It's a module for working with the results of immune receptor sequence alignment in VDJML format.
Allows application of standard machine learning techniques to mine full set of repertoire sequences for sequence patterns. MaxSnippetModel scores every snippet by its biochemical features with a detector function. It also aggregates the scores into a single value that can represent a diagnosis. The model is fitted to the training data using gradient based optimization techniques.
Profiles immune repertoire of high-throughput T-cell receptors (TR) and B-cell receptors (BcR) sequencing data. IRProfiler provides core quantification and comparison functions on annotated TR beta chain or BcR IG heavy chain high-throughput sequencing (HTS) data. It employs multiple clonotype definitions with different levels of detail to make flexible analysis. This tool is useful for immunogeneticists and immunoinformaticians.
Aligns non-regular T-cell receptor and immunoglobulin sequences. TRIg aligns reads to the whole receptor gene instead of only to the coding regions. This tool applies a heuristic strategy and incorporates gene annotation to identify authentic alignments. TRIg correctly identified non-regularly recombined sequences, which current programs could not detect. TRIg should benefit research on the immune system and improve the prediction of various immune diseases.
Estimates differences between repertoires. RDI is a computational approach for comparing different types of repertoires. It is applicable to any analysis of immune sequences and can be extended to other repertoire experiments. The method also helps account for the variance-inflating effects of low sequencing depth.
Provides custom plots for the descriptive, comparative investigation of immune repertoires. ClonoPlot allows the visualization of observations in comparative B cell receptor (BCR) and T cell receptor (TCR) repertoire analysis. It removes any requirement for users to become familiar with the programming language R in order to create high-quality figures. Users can adapt the tool to the latest developments in the field or incorporate additional features based on their existing in-house expertise.
Processes raw immune sequence reads from any source and learns unbiased statistics of recombination and somatic hypermutations. IGoR is a flexible computational method that outputs a whole list of potential recombination and hypermutation scenarios, with their corresponding likelihoods. It learns a context-dependent hypermutation model to identify hotspots, which allows for a comprehensive analysis of the mutational landscape of B cell receptors (BCRs).
Characterizes the full distribution of the T cell receptor (TCR) repertoire. powerTCR offers an overview of the state of the immune repertoire. It simulates large clones that are above the threshold, where the power law begins, using the generalized Pareto distribution (GPD). This tool also models the small clones below the threshold using a truncated Gamma distribution. It is useful to discover descriptors of strengths and deficiencies in the immune system.
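The generalized Pareto distribution used for clones above the threshold has a closed-form density, which the sketch below evaluates. The parameter names (threshold, scale, shape) and the example values are assumptions, and the code shows only the standard continuous GPD density, not powerTCR's fitted model or its treatment of clones below the threshold.

```python
import math


def gpd_pdf(x, threshold, scale, shape):
    """Density of the generalized Pareto distribution at x.

    For shape != 0: (1/scale) * (1 + shape * z) ** (-1/shape - 1),
    with z = (x - threshold) / scale; for shape == 0 it reduces to
    an exponential density. Returns 0.0 outside the support.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (x - threshold) / scale
    if z < 0:
        return 0.0
    if shape == 0:
        return math.exp(-z) / scale
    base = 1 + shape * z
    if base <= 0:
        return 0.0  # beyond the upper end of the support when shape < 0
    return base ** (-1 / shape - 1) / scale


# Example values (hypothetical): clone sizes above a threshold of 10.
densities = [gpd_pdf(size, threshold=10, scale=2, shape=0.5) for size in (10, 12, 20)]
```

A positive shape parameter gives the heavy, power-law-like tail that motivates using this distribution for large clones.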
Assists users in analyzing T-cell receptor (TCR) repertoire utilizing a sequence analysis approach inspired by phylogenetics. ImmunoMap is a program able to display and quantify immune repertoire diversity and enables assessment of similarity between TCR sequences. Moreover, it can supply insight into the biological impact of tumors on T-cell responses and TCR usage by studying TCR repertoire changes in the presence of tumor.