Pan-genome data analysis software tools | Shotgun metagenomic sequencing
A pan-genome is defined as the set of all unique gene families found in one or more strains of a prokaryotic species. Studies of pan-genomes have become popular due to the easy access to whole-genome sequence data for prokaryotes. A pan-genome study reveals species diversity and gene families that may be of special interest, e.g., because of their role in bacterial survival or their ability to discriminate strains. Pan-genome data analysis software tools are used to build large-scale pan-genomes, identify core and accessory genes, predict protein families de novo, and more.
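As a minimal sketch of the definitions above, the following Python snippet splits a set of gene families into core and accessory genes from a per-strain presence list. The strain names, gene family labels, and the `core_fraction` parameter are illustrative assumptions, not taken from any of the tools listed below, and real tools first have to cluster genes into families before this step.

```python
from collections import Counter


def partition_pan_genome(strain_genes, core_fraction=1.0):
    """Split gene families into pan, core and accessory sets.

    strain_genes: dict mapping strain name -> set of gene family IDs.
    core_fraction: share of strains a family must occur in to count as core
    (1.0 means present in every strain; this threshold is an assumption,
    and tools differ in how strictly they define "core").
    """
    n_strains = len(strain_genes)
    # Count in how many strains each gene family occurs.
    counts = Counter(fam for fams in strain_genes.values() for fam in set(fams))
    pan = set(counts)
    core = {fam for fam, c in counts.items() if c >= core_fraction * n_strains}
    accessory = pan - core
    return pan, core, accessory


# Hypothetical toy input: three strains with a handful of gene families.
strains = {
    "strainA": {"dnaA", "gyrB", "recA", "tetM"},
    "strainB": {"dnaA", "gyrB", "recA"},
    "strainC": {"dnaA", "gyrB", "vanA"},
}

pan, core, accessory = partition_pan_genome(strains)
print("pan-genome size:", len(pan))
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

Lowering `core_fraction` (for example to 0.95) gives a "soft core" that tolerates families missing from a few strains, which can be useful when some assemblies are incomplete.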
Performs five analytic functions with only one command, including cluster analysis of functional genes, pan-genome profile analysis, genetic variation analysis of functional genes, species evolution analysis and function enrichment analysis of gene clusters. PGAP's performance has been evaluated on 11 Streptococcus pyogenes strains.
A tool that rapidly builds large-scale pan genomes, identifying the core and accessory genes. Roary makes construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results. Using a single CPU, Roary can produce a pan genome consisting of 1000 isolates in 4.5 hours using 13 GB of RAM, with further speedups possible using multiple processors.
Determines the core and accessory regions among a collection of genomic sequences based on user-defined parameters. Panseq readily extracts regions unique to a genome or group of genomes and identifies single nucleotide polymorphisms (SNPs) within shared core genomic regions. It constructs files for use in phylogeny programs based on both the presence/absence of accessory regions and SNPs within core regions and produces a graphical overview of the output.
Allows users to view and examine prokaryotic genomes in a circular or linear context. GView is a research and visualization tool. This software is a complete rewrite of the CGView program that fully supports the CGView XML format, has a public application programming interface (API) that facilitates its incorporation into other Java programs and has a command-line interface that allows it to be run from scripts. The GView Server assists users in performing commonly desired comparative genomics analyses and rendering the corresponding genome maps.
Assists users with pan-genomic analysis of closely related prokaryotic species or strains. PanOCT is a graph-based method that combines basic local alignment search tool (BLAST) results, BLAST score ratio (BSR) and conserved gene neighborhood (CGN) in a weighted scoring scheme. It generates clusters containing a single orthologous gene from each of multiple genomes, and it detects and accounts for potential frameshifts.
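The score ratio (BSR) in the scheme above is usually defined as the bit score of a query against a match divided by the bit score of the query against itself, which normalizes hits to a 0 to 1 scale. The snippet below is a minimal sketch of that ratio only. It is not PanOCT's implementation, it does not reproduce the CGN weighting, and the gene names and bit scores are made-up example values.

```python
def blast_score_ratio(match_bit_score, self_bit_score):
    """Return the ratio of a hit's bit score to the query's self-hit bit score."""
    if self_bit_score <= 0:
        raise ValueError("self-hit bit score must be positive")
    return match_bit_score / self_bit_score


# Hypothetical values: bit score of a query protein aligned to itself,
# and bit scores of its best hits in two other genomes.
self_score = 500.0
hits = {"genomeB_gene17": 450.0, "genomeC_gene03": 200.0}

ratios = {
    subject: blast_score_ratio(score, self_score)
    for subject, score in hits.items()
}
for subject, ratio in ratios.items():
    print(f"{subject}\t{ratio:.2f}")
```

A ratio close to 1 indicates a near-identical, near-full-length match, while a low ratio indicates a weak or partial match. Any cutoff applied to the ratio is a choice made by the analyst or the tool.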
Brings together many aspects of today’s cutting-edge genomic, metagenomic, and metatranscriptomic analysis practices to address a wide array of needs. Anvi’o is an advanced analysis and visualization platform that offers automated and human-guided characterization of microbial genomes in metagenomic assemblies, with interactive interfaces that can link ‘omics data from multiple sources into a single, intuitive display. It empowers researchers without extensive bioinformatics skills to perform and communicate in-depth analyses on large ‘omics datasets.
An open-source software package that builds on popular orthology-calling approaches, making highly customizable and detailed pangenome analyses of microorganisms accessible to nonbioinformaticians. GET_HOMOLOGUES can cluster homologous gene families using the bidirectional best-hit, COGtriangles, or OrthoMCL clustering algorithms. Clustering stringency can be adjusted by scanning the domain composition of proteins using the HMMER3 package, by imposing desired pairwise alignment coverage cutoffs, or by selecting only syntenic genes.
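The bidirectional best-hit idea named above can be illustrated for a single pair of genomes: two genes are paired only if each is the other's top-scoring hit. The following is a simplified sketch under that assumption, not the GET_HOMOLOGUES implementation, which works across many genomes and applies further filters. The function name, gene IDs, and scores are hypothetical.

```python
def bidirectional_best_hits(hits_a_to_b, hits_b_to_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit.

    Each argument maps a query gene to a list of (subject gene, bit score)
    tuples, e.g. parsed from tabular similarity search output.
    """
    def best(hits):
        # Keep only the top-scoring subject for every query that has hits.
        return {
            query: max(subjects, key=lambda hit: hit[1])[0]
            for query, subjects in hits.items()
            if subjects
        }

    best_ab = best(hits_a_to_b)
    best_ba = best(hits_b_to_a)
    return sorted(
        (a, b) for a, b in best_ab.items() if best_ba.get(b) == a
    )


# Hypothetical search results between genome A and genome B.
a_to_b = {
    "a1": [("b1", 300.0), ("b2", 120.0)],
    "a2": [("b2", 250.0)],
    "a3": [("b1", 90.0)],
}
b_to_a = {
    "b1": [("a1", 295.0), ("a3", 88.0)],
    "b2": [("a2", 240.0), ("a1", 118.0)],
}

pairs = bidirectional_best_hits(a_to_b, b_to_a)
print(pairs)
```

Here `a3` hits `b1`, but `b1` prefers `a1`, so `a3` stays unpaired. This asymmetry check is what makes the criterion conservative compared with one-way best hits.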