Allows detection and analysis of circular extrachromosomal DNA (ECDNA). ECdetect is an image analysis software package that provides a method for quantifying ECDNA from DAPI-stained metaphases in an unbiased, semi-automated fashion.
Allows users to handle files stored in the GCTx format, which is especially suited to managing data matrices accompanied by metadata annotations. cmap gathers an assortment of open source software, available in four different languages, providing the ability to interact with the features supplied by the Connectivity Map resources. These programs aim to ease their integration in existing frameworks.
Provides a web app designed for the prediction of eukaryotic selenocysteine insertion sequence (SECIS) elements and selenoprotein genes. Selenoprotein prediction server offers two modules: (i) SECISearch3, a method based on the Infernal suite (INFERence of RNA ALignment) that has at its core a manually curated alignment of more than a thousand eukaryotic SECIS elements and (ii) Seblastian, a pipeline to predict selenoprotein genes in nucleotide sequences which employs the identification of SECIS elements.
Predicts BRCA1/BRCA2 deficiency in cancer. HRDetect is a whole genome sequencing (WGS)-based predictor for detection of homologous recombination (HR)-deficient tumors. The model was assessed using independent cohorts of breast, ovarian and pancreatic cancers. It was applied to a cohort of 560 breast cancer patients with 22 known germline BRCA1/BRCA2 mutation carriers and identified an additional 22 somatic BRCA1/BRCA2 null tumours and 47 tumours with functional BRCA1/BRCA2-deficiency where no mutation was detected.
Permits ‘genecentric’ annotation of the human genome for laboratory and analytical work carried out at the Core Genotyping Facility (CGF) of the National Cancer Institute. Genewindow integrates data available in the public databases with internal annotations from sequence data generated by our laboratory. It is configured for the human genome and can be applied to other genomes and integrated with the analysis, storage and archiving of data generated in any laboratory setting.
Incorporates information on gene regulation from a set of markers, increases the power to detect associations relative to traditional SNP-based GWAS and known gene-based tests under a broad range of genetic architectures and provides mechanistic insights and more easily interpreted direction of effect into the observed associations. PrediXcan can detect known and novel genes associated with disease traits and provide insights into the mechanism of these associations.
Evaluates pleiotropy in a standard Mendelian randomization (MR) model. MR-PRESSO contains three main functions: (1) detection of pleiotropy (MR-PRESSO global test); (2) correction of pleiotropy via outlier removal (MR-PRESSO outlier test); and (3) testing of significant differences in the causal estimates before and after correction for outliers (MR-PRESSO distortion test). The tool is flexible and can be used in several different MR tests.
Characterizes the orientation of individual duplicons for a given primate genomic sequence. DupMasker fixes the fine mosaic substructure for a given complex duplication block. It gathers data about the ancestral origin of a major part of human segmental duplications. This tool is able to construct annotation of the duplication composition of sequenced clones. It starts by screening sequences for all common interspersed repeats.
A mixed-model approach that enables joint analysis across multiple correlated traits while accounting for population structure and relatedness. mtSet effectively combines the benefits of set tests with multi-trait modeling and is computationally efficient, enabling genetic analysis of large cohorts (up to 500,000 individuals) and multiple traits.
Integrates quantitative trait loci (QTL) information with genome-wide association study (GWAS) results to map disease-associated genes. MetaXcan was trained on transcriptome models in 44 human tissues from Genotype-Tissue Expression (GTEx) and was able to estimate their effect on phenotypes from available genome-wide association meta-analysis (GWAMA) studies. It uses single nucleotide polymorphism (SNP)-level association results.
Allows users to generate sliding window plots of seven different sequence properties. ASEQH is intended for analysis of prokaryotic genomes but it can be applied to eukaryotic chromosomes with some limitations. Percentage of G and C nucleotides in the sliding window is plotted with respect to the position of the center of the window. The G+C percentage is calculated for all genes with their mid-point within the window.
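The G+C sliding-window plot described above can be illustrated with a minimal sketch. This is not ASEQH code: the function name, the `window` and `step` parameters, and the 0-based center coordinate are assumptions made for illustration, and only the plain nucleotide G+C profile is covered, not the per-gene variant.

```python
def gc_percent_windows(seq, window, step):
    """Return (center_position, gc_percent) pairs, one per full window.

    Hypothetical helper: names and coordinate convention are illustrative.
    """
    seq = seq.upper()
    points = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        center = start + window // 2  # 0-based index of the window center
        points.append((center, 100.0 * gc / window))
    return points


# Example: three windows of length 4, moved 2 bases at a time.
profile = gc_percent_windows("GGCCAATT", window=4, step=2)
```

Each pair gives the x coordinate (window center) and y value (G+C percentage) of one plotted point.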
Identifies gene-by-gene and gene-by-environment interactions. GMDR allows users to perform analyses for detection of multifactor interactions with large-scale data. The software implements a set of methods for the analysis of interactions with diverse study designs such as case-control design, family-based design or a combination of both. GMDR also provides features such as large-scale data management and preprocessing. It can assist in revealing genetic architecture in terms of gene-gene interactions underlying complex traits.
Offers a solution to explore the problematic topic of genomic data sharing. DNAdigest enables efficient and ethical data sharing. It aims to engage the research community in discussions about the problems and potential solutions, and to build tools to incentivize and increase data access and reuse in genomics research. The platform gives access to raw genomic data in order to validate research hypotheses.
Reassembles position weight matrices (PWMs) from sequence logo images with the aim of helping in the study of transcription factor (TF)-DNA interaction. Logo2PWM is available as a web application or as standalone software. It converts the image file to a common three-channel RGB formatted file, and automatically re-generates the sequence logo for the estimated PWM using enoLOGOS.
Provides a comprehensive selection of methods for the identification of important transcriptional regulators. The web service of RegulatorTrail can be used in four distinct application scenarios to analyse gene lists, gene expression data or epigenetic data. It detects meaningful regulators that might explain the increased malignancy of metastatic melanoma compared to primary tumors as well as important regulators in macrophages.
Searches for conserved splice sites around peaks of local sequence similarity in order to identify candidate exons from which complete gene models are then constructed. AGenDA is a homology-based gene-finding program. The program takes a DIALIGN alignment of two genomic sequences as input. It is possible to apply comparative gene-finding approaches to different species at varying evolutionary distances.
Rapid and accurate detection of antibiotic resistance in pathogens is an urgent need, affecting both patient care and population-scale control. Mykrobe predictor is a generic framework extensible to many species which can identify species, resistance profile and other genomic features such as virulence elements and phylogenetic lineage, within 3 minutes on a standard laptop.
Assists in the elucidation of differences in gene structure between individuals of a species. ACE detects possible functional changes in gene structure that may result from sequence variants. The software can identify changes to gene structure that may alter the function of the resulting protein, even if that protein is highly conserved between species. It is applicable to nonhuman species (other model and non-model animal or plant species). ACE includes the splice graph random field (SGRF) model for variant-aware gene structure prediction.
Detects multiple occurrences of homologous gene clusters across multiple bacterial genomes. GCQuery is based on an algorithm that studies gene clustering in bacterial genomes and performs comparative analysis of operon occurrences, gene orientations and rearrangements both within and across clusters. This software does not require genes in a cluster to share the same orientation, nor clusters to share the same gene density.
Detects disease-specific, non-coding risk variants. DIVAN provides a learning framework including a feature selection step using a wide library of genomic and epigenomic annotations. This step can also be used for assisting researchers in identifying biologically meaningful features. The application takes into account a large number of cell type-specific epigenomic profiles as features to accommodate the cell type-specific nature of the epigenome.
Predicts the genome build version of orphan genomic track files. Genome Build Predictor supports about 20 species. It was tested on public data from ENCODE and appears to be able to predict the correct genome build for 98.2% of the broad peak files (n = 223) for the K562 cell line.
Provides simple-to-use REST web services to query and retrieve gene annotation data. MyGene.info is designed with simplicity and performance emphasized. You can use it to power a web application that requires querying genes and obtaining common gene annotations (for example, MyGene.info services power BioGPS), or use it in an analysis pipeline to retrieve always up-to-date gene annotations.
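As a rough sketch of what querying such a REST service from an analysis pipeline might look like, the snippet below only builds a query URL. The base URL, the `/query` path and the `q`, `species` and `fields` parameters are assumptions not stated in the description above and should be checked against the MyGene.info documentation. The helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed public endpoint; not given in the description above.
BASE_URL = "https://mygene.info/v3"


def build_gene_query_url(term, species="human", fields="symbol,name,entrezgene"):
    """Build a gene query URL (hypothetical helper; no request is sent)."""
    params = {"q": term, "species": species, "fields": fields}
    return f"{BASE_URL}/query?{urlencode(params)}"


url = build_gene_query_url("cdk2")
# To run the query, fetch `url` with any HTTP client
# (for example urllib.request.urlopen) and parse the JSON body.
```

Keeping URL construction separate from the network call makes the pipeline step easy to test offline.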
Provides a comprehensive solution for translation. Virtual Ribosome can map the underlying intron/exon structure of a gene onto its protein product. It can be used to demonstrate how the underlying exon structure is reflected in the protein. This tool enables users to submit a sequence for translation using the default parameters. It offers a large set of features that are described.
An open-source enrichment kit and comprehensive bioinformatic package for accurate, high-throughput, high-resolution HLA typing, effectively by-passing the laborious work step of PCR-based enrichment prior to NGS. To allow scientists to verify their results, a user-friendly graphical user interface is provided.
SurvNet is a bioinformatics web app for identifying network-based biomarkers that most correlate with patient survival data. SurvNet takes three input files: one biological network file as the searching platform (one human protein interaction network is provided as default), one molecular profiling file (e.g., array-based gene expression or DNA methylation data or mutation data), and one patient survival data file. Given user-defined parameters, SurvNet will automatically identify sub-networks that most correlate with patient survival data and display the results in a visually appealing manner.
Allows comparison of Infernal RNA family models. CMCws has two main functions: (1) identification of models with poor specificity; and (2) exploration of the relationship between models. This online tool supplies an interface to check the discriminatory power of proposed RNA family models. This tool can be used for detecting covariance models designed for the same structural RNA family.
Links human genes to the body parts they affect. Gene ORGANizer is built upon an exhaustive curated database that links more than 7,000 genes to approximately 150 anatomical parts using more than 150,000 gene-organ associations. The tool offers user-friendly platforms to analyze the anatomical effects of individual genes, and identify trends within groups of genes. Gene ORGANizer can be used to make new discoveries and is expected to be useful in a variety of evolutionary, medical and molecular studies aimed at understanding the phenotypic effects of genes.
Includes all the necessary functions to build and test predictors from expression data. Phenopredict is an R package that uses expression data and phenotype information from any study to build a phenotype predictor. This package allows for (i) regions to be selected for the phenotype of interest, (ii) a predictor to be built for either continuous or categorical variables, (iii) the predictor to be tested on the training data to assess resubstitution error, (iv) data to be extracted from a new data set for the same regions upon which the predictor was built, and (v) phenotype prediction in a new data set.
Introduces a scalable algorithm that weights important features without tedious cross-validation of tuning parameters. Hetero-RP is a scalable and tuning-free preprocessing framework for integrative genomic studies. It aims to seek better weights of features that match up with the auxiliary knowledge containing ‘positive-links’ and ‘negative-links’, particularly tailored for heterogeneous data from multiple sources.
Identifies origin of transfer sites (oriTs) in bacterial mobile genetic element (MGE) sequences. oriTfinder is a pipeline relying on a pre-computed database of more than 1000 oriT regions with a co-localization of the flanking relaxase gene. The application can also recognize putative relaxase genes, type IV coupling protein (T4CP) genes, type IV secretion system (T4SS) gene clusters and virulence factor (VF) genes.
Assists users in performing colocalization analysis of genomic features. Coloc-stats is a web application that offers users several alternative methods for analyzing their data. It also permits researchers to examine the robustness of their conclusions by comparing results across several methods. This program can be applied to entire track collections or to pairwise relations.
A genome analysis software package. TimeZone is designed to detect footprints of positive selection for functionally adaptive point mutations. The uniqueness of TimeZone lies in its ability to predict recent adaptive mutations that are overlooked by conventional microevolutionary tools. TimeZone analyzes adaptive footprints in either individual genes or in sets of genomes. It allows three major workflows: (i) extraction of orthologous gene sets from multiple genomes; (ii) alignment and phylogenetic analysis of genes; and (iii) identification of candidate genes under positive selection for point mutations, taking into account the effect of recombination events.
Generates an evolutionary gene print (EvoP) of invariant DNA sequences as they appear in the reference DNA. EVOPRINTER superimposes multiple alignment readouts of individual reference-DNA versus test-genome alignments. It permits users to find multispecies-conserved sequences (MCSs) that are shared among three or more orthologous DNAs. This tool can be useful to understand gene regulation in all animals.
Provides inference of gene regulations using eQTLs as causal anchors by accounting for hidden confounding factors and weak regulations. Findr is a program that performs causal inference between gene expression traits. Moreover, it allows users to predict microRNA and transcription factor targets in human lymphoblastoid cells.
Assists in classifying paired-end reads. Rtax is a method that permits the taxonomic classification of short paired-end sequence reads from the 16S ribosomal RNA gene. For paired-end query sequences, this procedure selects those reference clusters that matched both reads simultaneously with an average percent identity within 0.5% identity of the maximum.
Exploits information related to gene function, phenotype, and diseases to rank potentially damaging alleles highlighted by variant-prioritization tools. Phevor is a web application which focuses on single-exome and family trio-based diagnostic analyses with the aim of complementing existing methods. The platform provides a term browser as well as the ability to link the Human Phenotype Ontology (HPO) to five ontologies including the Gene Ontology (GO) or ChEBI.
Offers a method for association-detection testing of samples with related individuals from structured populations. ROADTRIPS is able to correct for pedigree and population structure, including admixture, thanks to a covariance matrix. It includes features for handling missing data and for including both unaffected controls and controls of unknown phenotype in the process. This tool is able to deliver an enriched analysis by incorporating data about the pedigree structure of the sampled individuals.
Estimates annotation-stratified genetic covariance between traits using genome-wide association study (GWAS) summary statistics. GNOVA provides accurate covariance estimates and powerful statistical inference that are robust to linkage disequilibrium (LD) and sample overlap. It was applied to estimate genetic correlations for 50 complex traits using publicly available GWAS summary statistics. The results show that the tool is more powerful than LD score regression (LDSC) when genetic correlation is moderate.
Combines sequence data and species tree information to improve gene tree reconstructions. TreeFix consists of three basic components: (i) a test of statistical equivalence to filter out gene tree topologies that are suboptimal, (ii) a gene tree and species tree reconciliation method to compute the reconciliation cost, and (iii) a tree search to explore the space of alternative gene tree topologies. The authors have compared its performance with that of several other gene tree reconstruction methods. They find that TreeFix shows drastic improvement over existing sequence-only and hybrid approaches, with performance comparable to the most sophisticated species tree aware Bayesian approaches.
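The paired-end selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not Rtax code: the function name and the dictionary input format are hypothetical, and "within 0.5%" is interpreted as an absolute difference of 0.5 percentage points from the best average identity.

```python
def select_clusters(hits_read1, hits_read2, tolerance=0.5):
    """Pick reference clusters supported by both reads of a pair.

    hits_read1 / hits_read2: dict mapping a reference cluster id to the
    percent identity of that read against the cluster (assumed format).
    """
    shared = set(hits_read1) & set(hits_read2)  # clusters matched by both reads
    if not shared:
        return []
    avg = {c: (hits_read1[c] + hits_read2[c]) / 2.0 for c in shared}
    best = max(avg.values())
    # Keep every cluster whose average identity is close enough to the best.
    return sorted(c for c, value in avg.items() if best - value <= tolerance)


read1 = {"A": 99.0, "B": 98.0, "C": 97.0}
read2 = {"A": 98.0, "B": 98.4, "D": 99.0}
selected = select_clusters(read1, read2)
```

Clusters "C" and "D" are dropped because each is matched by only one read of the pair.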
Analyzes intrachromosomal heterogeneity of the periodic signal. PerScan applies the PerPlot technique in a sliding window. The main output of the software is a heat map where the level of gray in the plot area indicates the intensity of the periodic signal with the period shown on the vertical axis and the window location determined by the horizontal axis. It includes an option to mask out the protein coding sequences or noncoding sequences.
Provides efficient, platform-independent memory and file management for genome-wide numerical data. gdsfmt provides the genomic data structure (GDS) file format for array-oriented data. It is useful for large-scale datasets, especially for data that exceed the available random-access memory.
Computes comprehensive DNA features based on the built-in and user-defined physicochemical properties. repDNA aims to simplify the studies of DNA and nucleotides. It can be adapted and allows users to construct their own predictors. This tool is based on machine learning packages and holds very high potential for enhancing the power in dealing with many problems in computational genomics and genome sequence analysis.
An R package for phylogenetic molecular clock analyses of multi-gene data sets. ClockstaR uses the patterns of among-lineage rate variation for the different genes to select the clock-partitioning strategy. The method uses a phylogenetic tree distance metric and an unsupervised machine learning algorithm to identify the optimal number of clock-partitions, and which genes should be analysed under each of the partitions. The partitioning strategy selected in ClockstaR can be used for subsequent molecular clock analysis with programs such as BEAST, MrBayes, PhyloBayes and others. This method will be particularly useful for improving molecular-clock analyses of phylogenomic data, which are often hindered by their computational requirements.
Estimates the optimal window size for analysis of low-coverage next-generation sequence data. NGSoptwin is an R package that provides a data-based estimation of optimal window size, using Akaike’s information criterion (AIC) and cross-validation (CV) log-likelihood. The software can estimate an optimal window size that minimizes the distance between the observed reads density in the low-coverage data and the ‘true’ underlying density.
Identifies chimeric noisy long reads based on short token matches within local genomic regions. rMFilter can accelerate structural variation (SV) calling pipelines without loss of effectiveness. It quickly filters the reads to substantially improve the overall speed of long read-based SV calling pipelines. The first step of the tool consists in the localization of a local region which has the highest number of short token (k-mer) matches. The second step aims to reduce false negatives.
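The first step described above, finding the local region with the most k-mer matches, can be illustrated with a small sketch. This is not rMFilter's implementation: the exact-match hash index, the fixed-size non-overlapping regions, and all function and parameter names are assumptions chosen for brevity.

```python
from collections import Counter, defaultdict


def index_kmers(reference, k):
    """Map every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def best_region(read, index, k, region_size):
    """Return (region_start, match_count) for the region with most k-mer hits."""
    votes = Counter()
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], ()):
            votes[pos // region_size] += 1  # each hit votes for its region
    if not votes:
        return None, 0
    region, count = votes.most_common(1)[0]
    return region * region_size, count


reference = "AAAAAAAAAA" + "CGTACGTAGC" + "TTTTTTTTTT"
kmer_index = index_kmers(reference, k=4)
start, hits = best_region("CGTACGTAGC", kmer_index, k=4, region_size=10)
```

A read whose k-mer hits concentrate in one region is a candidate for that locus, while hits split across distant regions would hint at a chimeric read.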
A user-friendly tool to quickly extract human genetic variation data from the latest release of the 1000 Genomes (1KG) Project. Ferret was developed as a straightforward Java application to be accessible even for non-specialists who are not adept at bioinformatics. By converting the 1KG vcf files to a format that can be read by popular pre-existing tools (e.g. Plink and HaploView), Ferret offers easy manipulation and visualization of the 1KG SNP and indel data, easy access to allelic frequency, linkage disequilibrium and haplotype information, and eventually tagSNP design.
Performs investigation of RNA editing from next-generation sequencing (NGS) data. REDItools contains three main scripts to study RNA editing using both RNA-Seq and DNA-Seq data from the same sample/individual or RNA-Seq data alone: (1) REDItoolDnaRNA.py for detecting RNA editing candidates, (2) REDItoolKnown.py for exploring the RNA editing potential of RNA-Seq experiments, and (3) REDItoolDenovo.py for performing de novo detection of RNA editing candidates. It also includes some accessory scripts and allows users to annotate all candidate positions using relevant databases.
A score for alignment-free sequence comparison. D2z is based on comparing the frequencies of all fixed-length words in the two sequences. This method gives quadratic improvement in the time complexity of calculating the D2z score, over the naïve method. The performance of the D2z score is compared to five other alignment-free similarity measures, and shown to be consistently superior to all of these measures.
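To make the word-frequency comparison concrete, the sketch below computes the raw D2 statistic, i.e. the number of matching k-word pairs between two sequences. It is an illustration only: the step that standardizes this count into a z-score is not shown, and the function names are hypothetical rather than taken from the D2z software.

```python
from collections import Counter


def word_counts(seq, k):
    """Count every overlapping word of length k in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_score(seq_a, seq_b, k):
    """Raw D2: sum over words of count in seq_a times count in seq_b."""
    counts_a = word_counts(seq_a, k)
    counts_b = word_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


shared_words = d2_score("ACGT", "ACGA", k=2)
```

Counting words once per sequence and then multiplying the counts avoids comparing every word position in one sequence against every position in the other.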
Offers both functional annotations and linkage disequilibrium information for bi-allelic genomic variants (SNPs and SNVs). SNiPA combines LD data based on the 1000 Genomes Project with various annotation layers, such as gene annotations, phenotypic trait associations, and expression/metabolic quantitative trait loci.