Cox-MDR / Cox Multifactor Dimensionality Reduction method
Consists of an extension of the generalized multifactor dimensionality reduction (GMDR) to the survival phenotype. Cox-MDR is an algorithm that uses the martingale residual of the Cox regression model as a score to classify multi-locus genotype combinations into high- and low-risk groups. It is able to adjust for covariates and can be extended to some types of high-dimensional data such as copy number variation (CNV) and next generation sequencing (NGS) data.
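The classification step described for Cox-MDR can be illustrated with a minimal sketch. It assumes the martingale residuals have already been computed from a fitted Cox model, and it assumes a cell is called high-risk when its summed residual exceeds zero; the function name `classify_cells`, the `threshold` parameter and the toy data are hypothetical and not taken from the Cox-MDR implementation.

```python
from collections import defaultdict


def classify_cells(genotypes, residuals, threshold=0.0):
    """Label each multi-locus genotype combination as 'high' or 'low' risk.

    genotypes: one tuple per individual, e.g. (0, 2) for two loci.
    residuals: martingale residuals, assumed precomputed from a Cox model.
    threshold: assumed cut-off on the summed residual (hypothetical default).
    """
    if len(genotypes) != len(residuals):
        raise ValueError("genotypes and residuals must have the same length")
    sums = defaultdict(float)
    for cell, residual in zip(genotypes, residuals):
        sums[tuple(cell)] += residual  # accumulate the score per genotype cell
    return {cell: ("high" if total > threshold else "low")
            for cell, total in sums.items()}


# Toy data: five individuals genotyped at two loci (illustrative values only).
genotypes = [(0, 1), (0, 1), (1, 2), (1, 2), (2, 2)]
residuals = [0.8, 0.3, -0.6, -0.2, 0.1]
labels = classify_cells(genotypes, residuals)
```

In this toy example the cells (0, 1) and (2, 2) have positive summed residuals and are labeled high-risk, while (1, 2) is labeled low-risk.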
AMLGAM / Automated Machine Learning Guided Atom Mapping
Enables estimation of bond stabilities based on the chemical environment of each bond using machine learning techniques. AMLGAM is an automated optimization-based approach that finds the reaction mechanism which favors the breakage/formation of the less stable bonds. It was tested on a manually curated dataset of 382 chemical reactions and run on a large and diverse dataset of more than 7,400 chemical reactions.
MuSSeL / Multi-fingerprint Similarity Search aLgorithm
Identifies drug targets. MuSSeL is a method that reports the targets which can interact with a molecule of interest and its corresponding measures of bioactivity in terms of Ki and IC50 values. This approach includes two main steps: (i) it selects drug targets biased by the query compound; (ii) it predicts Ki or IC50 values towards each selected drug target. It can be used to pair novel compounds to putative drug targets or to repurpose known drugs to apparently unrelated diseases.
MAIA / Multidimensional Assessment of Interoceptive Awareness
Measures multiple dimensions of interoception by self-report. MAIA is an 8-scale state-trait questionnaire with 37 items that has been translated into 20 other languages and used in numerous studies worldwide. It consists of eight scales corresponding to its 8-factor structure: noticing, not-distracting, not-worrying, attention regulation, emotional awareness, self-regulation, body listening, and trust. It can be useful for interoception research and the evaluation of clinical mind-body interventions.
EBHiC / Empirical Bayes model for peak detection from HiC
Enables peak detection from Hi-C data. EBHiC identifies peaks in terms of accuracy, biological interpretability, and the consistency across biological replicates. This tool offers principled probability distribution estimates for Hi-C counts, and provides flexible modeling of over-dispersion by explicitly including the “true” interaction intensities as latent variables, without any restrictive parametric assumptions.
LCP / Locally Consistent Parsing
Aims to improve string processing. LCP starts from a set of patterns and proposes a two-step approach that (i) handles each label in the string whose neighbors are not identical to it, then (ii) handles substrings which consist of a single repeating label. It then aims to generate a data structure that supports insertion, deletion and search.
SSCMDA / Spy and Super Cluster strategy for MiRNA-Disease Association prediction
Predicts potential miRNA-disease associations. SSCMDA is based on known miRNA-disease associations, integrated disease similarity and integrated miRNA similarity. This tool adopts a spy strategy to identify reliable negative samples among all the unknown miRNA-disease pairs, which are a mix of potential associations and real negative samples.
GIMDA / Graphlet Interaction for MiRNA‐Disease Association prediction
Integrates the disease semantic similarity, miRNA functional similarity, Gaussian interaction profile kernel similarity and the experimentally confirmed miRNA‐disease associations in a prediction model. GIMDA describes the complex relationship between two nodes based on graphlet interaction, in which both direct and indirect links between the nodes are considered. This method combines the association score of a miRNA‐disease pair calculated in the miRNA graph with the score calculated in the disease graph, which makes it applicable to new diseases without any known related miRNAs and to new miRNAs without any known related diseases.
CsreHMM / cell type-specific regulatory elements by Hidden Markov Model
Detects cell type-specific regulatory elements (CSREs). CsreHMM is an integrative and comparative method based on a hidden Markov model that systematically reveals CSREs along the whole genome and simultaneously recognizes the histone codes (mark combinations) characterizing them. This method also reveals the subclasses of CSREs and labels those shared by a few cell types.
RIM / Regression-based Inference of Modulation
Performs statistical inference of modulated gene regulation. RIM is an algorithm that infers the dynamic gene regulation modulated by continuous-state modulators. It is able to identify continuous-state modulation using a sliding window-based scheme. The algorithm was applied to genome-wide expression profiles of 520 glioblastoma multiforme (GBM) tumors, which made it possible to investigate miRNA- and transcription factor (TF)-modulated gene regulatory networks and to highlight their association with dynamic cellular processes and brain-related functions in GBM.
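One way to picture a sliding window-based scheme for continuous-state modulation is sketched below. This is an assumption-laden illustration, not the RIM procedure itself: samples are ordered by modulator level, and within each window the least-squares slope of a target gene on a regulator is computed, so that a slope drifting across windows suggests modulation. The function name `windowed_slopes` and all data are hypothetical.

```python
def windowed_slopes(modulator, regulator, target, window):
    """Regression slope of target on regulator in windows of samples
    ordered by modulator level (illustrative sketch only)."""
    n = len(modulator)
    if not (n == len(regulator) == len(target)):
        raise ValueError("all inputs must have the same length")
    if window < 2 or window > n:
        raise ValueError("window must be between 2 and the number of samples")
    order = sorted(range(n), key=lambda i: modulator[i])
    slopes = []
    for start in range(n - window + 1):
        idx = order[start:start + window]
        x = [regulator[i] for i in idx]
        y = [target[i] for i in idx]
        mean_x = sum(x) / window
        mean_y = sum(y) / window
        sxx = sum((a - mean_x) ** 2 for a in x)
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        # A window with no spread in the regulator carries no slope information.
        slopes.append(sxy / sxx if sxx > 0 else 0.0)
    return slopes


# Toy data: the regulator activates the target at low modulator levels
# and represses it at high modulator levels.
modulator = [0, 1, 2, 3, 4, 5]
regulator = [1, 2, 3, 1, 2, 3]
target = [1, 2, 3, -1, -2, -3]
slopes = windowed_slopes(modulator, regulator, target, window=3)
```

Here the first window yields a positive slope and the last a negative one, which is the kind of change a modulation test would look for.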
Allows biomedical information retrieval. This framework is based on learning to rank, a series of state-of-the-art information retrieval techniques. It integrates learning to rank methods into biomedical information retrieval and allows comparison of the performance of several state-of-the-art learning to rank methods. This method proposes two novel labeling strategies: (1) one focusing on constructing an optimal ranking target, and (2) the other based on the group-wise learning to rank method.
CSRBFO / Coevolutionary Structure-Redesigned-Based Bacteria Foraging Optimization
Aims to improve the performance of bacterial foraging optimization (BFO). CSRBFO is an approach that incorporates the algorithm structure redesign, coevolutionary strategy and convergence status evaluation into the standard BFO. This algorithm consists of two main steps: (1) chemotaxis and (2) elimination & dispersal. It employs a general loop to replace the nested loop and eliminate the reproduction step of BFO.
GPSO-PG / Global Particle Swarm Optimization Personal-Best-Position Guidance
Consists of a variant of the particle swarm optimization (PSO) algorithm that maintains the population diversity by preserving the diversity of exemplars. GPSO-PG is an approach that, for each generation, divides the whole population into several groups and proposes a strategy to select a pbest, but not gbest, for guiding the whole swarm’s direction. This algorithm can consider both convergence and diversity maintenance simultaneously.
Mines large-scale chemical genomics and disease association data for prediction of novel drug-gene-disease associations. ANTENNA is a multi-rank, multi-layered recommender system that integrates a tri-factorization based dual-regularized weighted and imputed one-class collaborative filtering (OCCF) algorithm with a statistical framework based on random walk with restart. It has three main components for: (1) integrating multiple chemical genomics and disease association data sets, and linking them as a multi-layered network, (2) inferring genome-wide novel chemical-gene associations, and (3) predicting chemical-disease associations and assessing their reliabilities.
AMOSA / Archived Multi Objective Simulated Annealing
Permits users to choose features which can train Linear Discriminant Analysis (LDA) models. AMOSA selects functions for the task of transcription start sites (TSS) prediction.
AMOSA-GRN / Archived Multi Objective Simulated Annealing for Gene Regulatory Networks
Solves the bi-objective optimization problem of gene regulatory network (GRN) reconstruction. AMOSA-GRN is an approach based on a multi-objective meta-heuristic algorithm named archived multi-objective simulated annealing (AMOSA). The algorithm proposes a perturbation strategy, named Get-Neighbour-MOSA for making a balance between intensification and diversification in the state space search.
Identifies carbonylation sites in human proteins. CarSite predicts carbonylation sites by position-specific amino acid propensity feature extraction in combination with the composition of k-spaced amino acid pairs, amino acid composition and the composition of hydrophobic and hydrophilic amino acids feature extractions.
CellSIUS / Cell Subtype Identification from Upregulated gene Sets
Incorporates long-range contact interactions. ICOSA correlates contact distance and orientation in pair-wise residues interactions. This tool only needs information of the backbone atoms, which makes it particularly suitable for modeling protein structures with reduced representation. In addition, it can be extended to an all-atom potential, where the icosahedral local coordinates are built to correlate orientation and distance in each atom pair interaction.
Serves for genome-wide identification of upstream open reading frames with evolutionarily conserved sequences and determination of the taxonomic range of their conservation. ESUCA is an algorithm that embeds features permitting detection of conserved peptide upstream open reading frames (CPuORFs) conserved in various taxonomic ranges. Additionally, it can be used for selecting CPuORFs to encode functional peptides.
MGRNNM / Multi Graph Regularized Nuclear Norm Minimization
Predicts potential interactions between drugs and targets. MGRNNM is a chemogenomic approach for predicting the drug-target interactions. This tool is a graph regularized version of the traditional nuclear norm minimization (NNM) algorithm which incorporates multiple Graph Laplacians over the drugs and targets into the framework for an improved interaction prediction.
ProClusEnsem / Protein Cluster Metrics Ensemble
Allows users to predict membrane protein types by fusing different modes of pseudo amino acid composition. ProClusEnsem is an algorithm that works by decomposing an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning. Additionally, this method is only suitable for binary classification problems.
Analyzes the corneal topography of the eye using a convolutional neural network (CNN). KeratoDetect is a program able to extract and learn the features of a keratoconus (KTC) eye. This tool is designed for distinguishing the differences between a normal cornea and one affected by keratoconus disease.
bestFSA / best First Search Algorithm
Assists users in performing the peak selection by defining a virtual target state and the distance from each peak to this target state. bestFSA furnishes a method for distinguishing actual glycans and can be used in practical glycan identification. Additionally, this program can serve for the automatic detection of glycan mass spectrometry (MSn).
OGRE / Overlap Graph-based metagenomic Read clustEring
Clusters metagenomic reads. OGRE works in three steps: (1) construct an overlap graph by identifying overlaps between reads, (2) from the list of overlaps select those that are expected to be an overlap between two reads from the same species, and (3) cluster the reads that are in the same connected component in the overlap graph. Additionally, this program uses Minimap2, which relies on a heuristic approach, for the construction of the overlap graph.
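Steps (2) and (3) above can be sketched with a union-find structure over the reads. The snippet assumes the overlaps are already available as (read, read, score) triples and that a simple score cut-off stands in for the same-species selection; the `min_score` parameter, the scoring scale and the function name `cluster_reads` are hypothetical and not part of OGRE.

```python
def cluster_reads(read_ids, overlaps, min_score=0.8):
    """Group reads into connected components of a filtered overlap graph.

    overlaps: iterable of (read_a, read_b, score) triples.
    min_score: hypothetical cut-off standing in for the same-species filter.
    """
    parent = {read: read for read in read_ids}

    def find(read):
        # Follow parent links to the root, compressing the path on the way.
        while parent[read] != read:
            parent[read] = parent[parent[read]]
            read = parent[read]
        return read

    for read_a, read_b, score in overlaps:
        if score < min_score:
            continue  # step (2): discard overlaps unlikely to be same-species
        root_a, root_b = find(read_a), find(read_b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    clusters = {}
    for read in read_ids:  # step (3): one cluster per connected component
        clusters.setdefault(find(read), []).append(read)
    return sorted(sorted(members) for members in clusters.values())


reads = ["r1", "r2", "r3", "r4", "r5"]
overlaps = [("r1", "r2", 0.95), ("r2", "r3", 0.90),
            ("r3", "r4", 0.40), ("r4", "r5", 0.99)]
clusters = cluster_reads(reads, overlaps)
```

With these toy inputs the weak r3-r4 overlap is filtered out, leaving two clusters: r1, r2, r3 and r4, r5.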
MOD-CO / Meta-omics Data and Collection Objects
Consists of a model for processing sample data in meta-omics research. MOD-CO is designed to be used as the logic backbone of R&D laboratory information and may also be applicable as a core schema for a transformation tool between commercial LIMS and ELN software products. Additionally, this program works with a hierarchical organization of the concepts describing collection samples, as well as products and data objects being generated during operational workflows.
RUV-z / Removing Unwanted Variation in GWAS z-score matrix
Permits researchers to execute machine learning (ML) routines on genome-wide association studies (GWAS) summary statistics. RUV-z is a program used for studying two types of prevalent confounding effects: polygenic bias and non-genetic confounders. Additionally, this program allows users to handle single nucleotide polymorphism (SNP)-level sparsity or select relevant ranks in matrix factorization.
Performs high resolution protein structure modeling. VSGB is an energy model that contains an optimized solvent model and physics-based correction terms for hydrogen bonding, π-π interactions, self-contact interactions and hydrophobic interactions. This algorithm was fit to a large database of protein single side chain and loop (11-13 residues) prediction. It was evaluated by predicting structures for a set of 115 super long loops of 14-20 residues.
DyDE / Dynamical Differential Expression
Permits users to highlight and characterize differentiated regulatory dynamics between genes to investigate mechanisms involved in nicotinamide (NAM)-induced perturbations in the circadian system of Arabidopsis. DyDE is a modeling framework that leans on a black box-type modeling approach. It can be used for formulating hypotheses for determining drug targets in complex biological systems.
Assists users in reducing metabolic modeling of complex microbial communities for analyzing experimental datasets from anaerobic digestion. RedCom consists of an algorithm used for modeling communities of up to nine organisms involved in typical degradation steps of anaerobic digestion in biogas plants.
Serves for real-time object recognition. VoxNet consists of a 3D convolutional neural network (CNN) that can be applied to create object class detectors for 3D point cloud data. This program has demonstrated during the tests that it can achieve high precision in object recognition tasks with three different sources of 3D data: LiDAR point clouds, RGBD point clouds, and CAD models.
Allows users to perform large-scale phylogeny estimation. INC consists of an algorithm designed to improve scalability for phylogeny estimation methods to ultra-large datasets, and it can be used in a variety of settings including tree estimation from unaligned sequences, and species tree estimation from gene trees. Additionally, this program includes features for constructing different types of phylogenetic trees.
Performs global pathway similarity search based on topological and biological features. ToBio is an algorithm considering both topological and biological features for executing pathway similarity search. It exploits the topological and biological information as input features to predict similar pathways against a query pathway of interest. It also combines subgraph signatures with sequence similarity and gene ontology score to construct the input features.
Predicts microRNA-disease associations based on similarities of microRNAs and diseases. DNRLMF-MDA integrates known miRNA-disease associations, functional similarity and Gaussian Interaction Profile (GIP) kernel similarity of miRNAs, and functional similarity and GIP kernel similarity of diseases. This program is able to calculate the probability that a miRNA would interact with a disease by a logistic matrix factorization method.
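In logistic matrix factorization, the association probability is typically obtained by passing the inner product of two latent vectors through the logistic function. The sketch below shows only that final scoring step, under the assumption that latent vectors for a miRNA and a disease have already been learned; the function names and the example vectors are hypothetical and do not come from DNRLMF-MDA.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def association_probability(mirna_factors, disease_factors):
    """Probability of association from two latent vectors (assumed pre-trained)."""
    if len(mirna_factors) != len(disease_factors):
        raise ValueError("latent vectors must have the same length")
    score = sum(u * v for u, v in zip(mirna_factors, disease_factors))
    return sigmoid(score)


# Illustrative latent vectors of rank 3 (made-up values).
p = association_probability([0.5, -1.0, 2.0], [1.0, 0.5, 0.25])
```

A larger inner product gives a probability closer to 1, and orthogonal latent vectors give 0.5.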
Detects interdependence of peak intensity and exploits this information to quantify the existence of derivative peaks. ProbPS is a statistical model leaning on a method that correlates the presence of both derivative and complementary peaks to primary peak intensity. This approach was developed for improving both peak selection and tag identification.
pSGLD / Preconditioned Stochastic Gradient Langevin Dynamics
Predicts out-of-distribution (OoD) antibiotic resistant/non-resistant genes. pSGLD is a variation of SGLD in which noise is introduced into RMSprop (a modification of the stochastic gradient descent algorithm). The algorithm classifies antibiotic resistant genes for the classes on which it was trained.
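The idea of adding noise to an RMSprop update can be sketched as a single parameter step. This is a simplified illustration, assuming the gradient of the negative log-posterior is supplied by the caller and omitting the curvature correction term that appears in some formulations of preconditioned SGLD; the function name `psgld_step` and the default hyperparameters are hypothetical.

```python
import math
import random


def psgld_step(theta, grad, v, step=0.01, alpha=0.99, lam=1e-5,
               add_noise=True, rng=random):
    """One simplified pSGLD-style update on a list of parameters.

    grad: gradient of the negative log-posterior at theta (supplied by caller).
    v: running average of squared gradients, as in RMSprop.
    """
    new_theta, new_v = [], []
    for t, g, vi in zip(theta, grad, v):
        vi = alpha * vi + (1.0 - alpha) * g * g   # RMSprop second-moment estimate
        precond = 1.0 / (lam + math.sqrt(vi))     # per-parameter preconditioner
        t = t - 0.5 * step * precond * g          # preconditioned gradient step
        if add_noise:
            # Langevin noise whose variance is scaled by the same preconditioner.
            t += rng.gauss(0.0, math.sqrt(step * precond))
        new_theta.append(t)
        new_v.append(vi)
    return new_theta, new_v


# With the noise switched off, the step reduces to an RMSprop-like update.
theta, v = psgld_step([1.0], [2.0], [0.0], add_noise=False)
```

Switching `add_noise` back on turns the optimizer into a sampler: repeated steps then produce parameter draws rather than a single point estimate.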
Predicts recurrent miscarriage (RM) thanks to high-resolution typing and the use of linear algebra on peptide binding affinity data. IMMATCH proceeds in four steps: (1) it creates a data matrix D of size 95×9 and a label vector I; (2) it creates a submatrix Di of size 94×25 and a sub-vector l; (3) it trains a support vector machine (SVM) with Di to predict the probability p; and (4) it repeats the process for all the couples i. This tool helps to understand the genetic causes of unexplained infertility and could serve as a gamete matching platform that could increase pregnancy success rates.
PAN / Personalized Annotation-based Networks
Allows the construction of sample-specific networks. PAN is an approach that is based on the exploitation of information from curated annotation databases to transform gene expression data into a graph. This program can generate networks where nodes represent functional terms and edges represent the similarity between them. This method can be applied for forecasting breast cancer relapse.
EMLZerD / Electron Microscopy Local 3D Zernike descriptor-based Docking algorithm
Generates a pool of candidate multiple protein docking conformations of component proteins. EMLZerD can serve for fitting multiple high-resolution structures into an electron microscopy (EM) map, which combines a multiple protein docking procedure and an assessment for fitness of the protein complex structures and the EM map using the 3D Zernike descriptor (3DZD). This algorithm requires a set of atomic resolution structures of component proteins and an EM map of the protein complex structure for determining the positions and the orientations of the component proteins.
Produces conformer ensembles of small molecules. Conformator consists of a knowledge-based algorithm including a clustering method that uses partial presorting of consecutively generated conformers. This method can sample the conformational space of macrocycles, includes several rules for sampling torsion angles as well as the ability to support SMILES and InChI format.
NMFk / Nonnegative Matrix Factorization
Investigates phase separation in a system of mixed lipids directly from the pre-processed trajectories derived by molecular dynamics (MD) simulations. NMFk provides an unsupervised machine learning (ML) algorithm based on the nonnegative matrix factorization merged with a custom clustering. It can be used to describe the lateral lipid segregation in a non-complex lipid “raft” model or to extract features from a more complex biological membrane.
Allows human pose estimation. DeepPose is a method for human pose estimation based on Deep Neural Networks (DNNs). This algorithm formulates the problem as DNN-based regression to joint coordinates and leans on a cascade of regressors, which has the advantage of capturing context and reasoning about pose in a holistic manner.
GPS / Geographic Population Structure
Serves for inference of biogeographical affinities and can be employed for the geo-localization of various human populations worldwide. GPS was applied for biogeographical analyses and localization of the ancestral origins of wild and captive gorilla genomes of unknown geographic source.
Two-way latent / Two-Way Latent Structure Model
Markov models
DIRN / Dynamical Important Residue Network
Deformity Index
Bayesian ZIB / Bayesian Zero-Inflated Binomial
best predictor / Predicting Protein Stability Changes Upon Mutations
Cluster robustness score
drug pair knowledge
Multi-Model Inference
Predicting PKS
SDNB / Symptom-Dependency-aware Naïve Bayes classifier
Supervised dimension reduction
CDD text similarity corpus
CFI / Codon Frustration Index
DCA-based method
MALDI BeeTyping