iQSA / improved Quaternion-based Signal Analysis
Provides an improvement to the quaternion-based signal analysis (QSA) method. iQSA is an analysis algorithm for use in the feature extraction and classification phases. It provides a technique for real-time applications, focusing on the online analysis of electroencephalography (EEG) signals while reducing the sample sizes needed to a tenth of those required by QSA. This results in faster responses and fewer delays, improving execution times in real-time actions.
DBSCAN / Density-Based Spatial Clustering of Applications with Noise
Assists in clustering spatial data. DBSCAN is a non-parametric, density-based clustering technique. It provides features that are useful when detecting objects, classes, patterns, or structures of different shapes and sizes. This algorithm is a good candidate for finding ‘natural’ clusters and their arrangement within the data space when they have a comparable density, without any preliminary information about the groups present in a data set.
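As an illustration of the density-based idea, here is a minimal pure-Python sketch of DBSCAN. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a core point) follow common usage and are assumptions, as is the brute-force neighbor search; it is not a reference implementation.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is a core point, so keep expanding
    return labels

# Two dense groups and one isolated point (made-up coordinates).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

No number of clusters is supplied: the groups emerge from the density parameters alone, and the isolated point is reported as noise.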
CLEAR / Comparison of Laboratory Extreme Abnormality Ratio
Allows the management of all drugs, all laboratory results, and all standard nursing statements (SNSs). CLEAR is an algorithm, consisting in an electronic health record (EHR)–based pharmacovigilance method, that serves for adverse drug reaction (ADR) signal detection. It matches each drug-exposed patient to up to 4 non-exposed patients by age, gender, admitting department, and diagnosis.
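The matching step can be sketched as follows. This is only an illustration: exact matching on all four attributes, matching without replacement, and the record field names (`id`, `age`, `gender`, `department`, `diagnosis`) are assumptions, not details taken from CLEAR itself.

```python
from collections import defaultdict

def match_controls(exposed, non_exposed, max_controls=4):
    """Map each exposed patient id to up to max_controls non-exposed ids."""
    def key(p):
        # Hypothetical matching key built from the four attributes.
        return (p["age"], p["gender"], p["department"], p["diagnosis"])

    pool = defaultdict(list)
    for p in non_exposed:
        pool[key(p)].append(p["id"])

    matches = {}
    for p in exposed:
        candidates = pool[key(p)]
        matches[p["id"]] = candidates[:max_controls]
        del candidates[:max_controls]  # each control is used at most once
    return matches
```

A patient with no matching non-exposed record simply receives an empty list, so the caller can decide whether to drop or keep such cases.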
PatkarEtAl2017
Offers an approach for characterizing the sample-specific function of a gene. The algorithm aims to investigate molecular changes related to cancer using network-based features. It uses guilt-by-association to extend existing annotations. This method can be used for determining gained or lost functions in cancer relative to normal tissues as well as for investigating the gain or loss of a specific function in a gene.
NETBAGs / NETwork Based clustering Approach with Gene signatures
Provides an algorithm that classifies samples according to their smoothed network profiles instead of their gene signatures. NETBAGs aims to provide an alternative and complementary option for cancer subtyping analysis. It also permits the identification of molecular markers through their significantly expressed network profiles. The method was tested on several breast cancer datasets.
MultiCluster
Provides an approach for detecting three-way clustering patterns in multi-tissue multi-individual gene expression data. MultiCluster is an algorithm that proposes a multi-way clustering method based on a tensor decomposition that uses nonnegativity constraints and the sharing of information across modes to detect multi-modal specificities in this context. It was tested on simulated data and on the Genotype-Tissue Expression (GTEx) dataset.
HuMiTar
Provides a miRNA target prediction method. HuMiTar is an approach composed of two main steps: (i) it first determines candidate targets by investigating the 3' untranslated regions (UTRs) of a given mRNA; and (ii) the results are then filtered with a composite scoring function. The application aims to assist users in understanding translational gene regulation by microRNAs.
LREM / Logistic Regression-based Ensemble Method
Allows branchpoint (BP) prediction. LREM is an ensemble learning scheme that integrates different features and different classifiers to build BP prediction models. The method can predict TNA BPs as well as other types of BPs. It was tested on a benchmark dataset, using 5-fold cross-validation (5CV). This approach combines its base predictors through a nonlinear relationship and returns satisfactory results for branchpoint prediction tasks.
GAEM / Genetic Algorithm-based weighted average Ensemble Method
Allows branchpoint (BP) determination. GAEM is an ensemble learning method that integrates several features and multiple classifiers to construct BP prediction models. The method can find TNA BPs as well as other types of BPs. It was evaluated on a benchmark dataset, using 5-fold cross-validation (5CV). This method combines its base predictors through a linear relationship.
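A weighted-average ensemble of this kind can be sketched in a few lines. The scores and weights below are made-up placeholders, and the genetic-algorithm search that would tune the weights is deliberately not shown.

```python
def weighted_average(scores, weights):
    """Combine per-classifier scores for one sample using normalized weights."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total

# Hypothetical outputs of three base classifiers for two candidate sites.
site_scores = [[0.9, 0.6, 0.8], [0.2, 0.4, 0.1]]
weights = [0.5, 0.2, 0.3]  # placeholder values; a GA would search for these
combined = [weighted_average(s, weights) for s in site_scores]
```

Because the combination is a linear function of the base scores, the only free parameters are the weights, which is what makes a search procedure such as a genetic algorithm applicable.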
MinDistB
Identifies potential transmission in the context of epidemiological diseases. MinDistB defines the distance between viral populations as the minimum Hamming distance between their representatives. It is able to take into account the sizes of the relative borders of each pair of viral populations. This tool was tested on experimental outbreak sequencing data. It relies on minimal distances between intra-host viral populations.
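The core distance can be illustrated as below. This sketch covers only the minimum Hamming distance between two sets of aligned sequences; the border-size weighting mentioned above is not modeled, and the function names are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def min_dist(pop_a, pop_b):
    """Minimum Hamming distance over all cross-population sequence pairs."""
    return min(hamming(a, b) for a in pop_a for b in pop_b)

# Toy intra-host populations (made-up sequences).
host_1 = ["ACGT", "ACGA"]
host_2 = ["TCGA", "TTTT"]
print(min_dist(host_1, host_2))
```

Here the closest pair is `ACGA` and `TCGA`, which differ at a single position.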
CMC / Clustering-based on Maximal Cliques
Finds complexes from weighted protein-protein interaction (PPI) networks. CMC employs a maximal-clique approach. It follows three steps: (1) discovering all the maximal cliques from the weighted PPI network; (2) ranking the cliques according to their weighted density; and (3) merging or removing highly overlapped cliques. This tool is able to list all maximal cliques by employing a depth-first search strategy.
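The three steps can be sketched as follows. Several choices here are assumptions made for illustration: cliques are enumerated with the Bron-Kerbosch recursion, weighted density is taken as the mean edge weight inside a clique, and step (3) only removes overlapping cliques (the merge branch is omitted). The `max_overlap` parameter is hypothetical.

```python
from itertools import combinations

def maximal_cliques(adj):
    """Enumerate maximal cliques with a recursive depth-first search."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found

def weighted_density(clique, weight):
    """Mean edge weight within a clique (assumed definition)."""
    pairs = list(combinations(sorted(clique), 2))
    return sum(weight[frozenset(e)] for e in pairs) / len(pairs) if pairs else 0.0

def rank_and_filter(cliques, weight, max_overlap=0.5):
    """Rank by weighted density, then drop cliques overlapping a better one."""
    ranked = sorted(cliques, key=lambda c: weighted_density(c, weight), reverse=True)
    kept = []
    for c in ranked:
        if all(len(c & k) / len(c) <= max_overlap for k in kept):
            kept.append(c)
    return kept
```

The input is an adjacency mapping from each protein to the set of its neighbors, plus a weight dictionary keyed by `frozenset` pairs so that edge direction does not matter.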
SMC / Sequential Monte Carlo
Allows users to determine probable genealogies from observed genetic data. SMC is an algorithm based on a sequential Monte Carlo (SMC) sampler for static models that employs the posterior distributions of population genetics parameters. It can be used with various data such as DNA sequence, protein sequence or microsatellite data. It was tested on both simulated datasets and real biological sequences.
Common Base Method
Provides a statistical method dedicated to the analysis of real-time PCR (qPCR) data for relative gene expression studies. Common Base Method provides a flexible approach that can be used to analyze unpaired and paired experimental designs and can be adapted to analyses within the general linear model. The model aims to keep all calculations in the log scale.
DenyEtAl2017
Computes retinal ganglion cell activity. This model predicts how fast OFF ganglion cells would respond to distant, complex stimuli, and how these distant stimuli would be integrated with other stimuli simultaneously displayed inside the receptive field center.
GAT / Genscale Assembly Tool
Allows scaffolding and gap filling phases in the case of circular genomes. GAT is an algorithm, based on a version of the longest path problem solved by mixed integer linear programming (MILP) modeling, that works with mate-pair and paired-end distances. The approach is a global optimization in which the scaffolding, gap-filling, and scaffold extension steps are solved simultaneously in the framework of a common objective function. The algorithm was tested on a set of 33 chloroplast genome datasets.
DILI prediction models
Aims to predict the risk of drug-induced liver injury (DILI). DILI prediction models are based on a pattern recognition algorithm, Decision Forest (DF), that uses a large set of drugs named DILIrank. It can be useful during preclinical development and for reducing hepatotoxicity-related attrition. This tool uses consensus modeling of multiple decision tree models. It was tested on 1,000 iterations of cross-validation and 1,000 bootstrapping iterations.
CNS TAP / Central Nervous System Targeted Agent Prediction
Assists users in picking a suitable therapy for neuro-oncology patients. CNS TAP is an algorithm that takes into account drug properties, clinical and pre-clinical data, and patient-specific sequencing data for scoring relevant targeted agents. This program is available through a mobile application, and the formula is intended to be regularly updated through the addition of new agents and published data.
HIGA / History Information guided Genetic Algorithm
Allows users to perform protein-ligand docking. HIGA is a running history information guided genetic algorithm. It starts its analysis with a random population initialization followed by a CE crossover, an ED mutation, a BSP tree, a local search, a selection and, finally, a fitness evaluation. This method represents an extension of the LGA-based algorithm, modified by adding binary space partitioning (BSP), ED mutation, and CE crossover.
RFcluE / Random Forest cluster Ensemble
Allows discovery of the underlying structure of genetic data. RFcluE is a cluster ensemble approach based on a Random Forest (RF) algorithm that addresses the problem of population structure analysis. The software is composed of two stages: (1) ensemble construction, in which an RF-based clustering method is applied to generate a set of clusters for the same dataset; and (2) a consensus function, which integrates all the clusters to produce a final data clustering.
LinEtAl2017
Predicts the effect of single amino acid substitutions on enzyme catalytic activity. This method allows users to predict if a given mutation, made anywhere in the enzyme, will cause a decrease in kcat/Km value of ≥ 95%. The accuracy of this technique allows the experimentalist to reduce the number of mutations necessary to probe the enzyme reaction mechanism. It has a 2.5-fold increase in precision.
LRSSLMDA / Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction
Infers potential miRNA-disease associations. LRSSLMDA is a computational model for predicting disease-miRNA associations. It adopts sparse subspace learning with Laplacian regularization on the known miRNA-disease association network and the informative feature profiles extracted from the integrated miRNA/disease similarity networks. It was developed to make reliable predictions and guide future experimental studies on miRNA-disease associations.
BernertAndYvert2017
Allows users to perform unsupervised spike-sorting. The algorithm provides a method that continuously processes the stream of data recorded by a microelectrode and directly outputs trains of artificial spikes corresponding to the sorted activity of the recorded cells. This approach is based on the use of an artificial spiking neural network implementing different plasticity rules.
VISoR / Volumetric Imaging with Synchronized on-the-fly-oblique-scan and Readout
Enables high-speed three-dimensional imaging of large samples at high resolution. VISoR can achieve image acquisition of an entire mouse brain in a couple of hours with individual synaptic spines in cortical neurons visible. This method can be applied to the study of 3D structures of other biological and pathological specimens beyond brain mapping.
OTP Model / The Odorant Transduction Process Model
Offers a minimal transduction model. OTP Model provides an approach for model execution of the olfactory sensory neurons of the fruit fly spread across the antennae/maxillary palps. The algorithm incorporates features of temporal dynamics of other computational models. It consists of the active receptor model, and the co-receptor channel model.
PDE model / Partial Differential Equation model
Offers a mathematical model of movement in an abstract space. PDE model represents trajectories in the differentiation space as a graph and models the directed and random movement on the graph with partial differential equations. It was tested with scRNA-seq data to predict the early stages of pathogenesis of acute myeloid leukemia.
3D reconstruction algorithm
Rebuilds 3D models of lesioned arteries and enables quantitative assessment of stenoses. 3D reconstruction algorithm builds a 3D computational patient-specific model of a lesioned vessel directly from 2D projections acquired during invasive coronary X-ray angiography, and is relevant for immediate geometric and quantitative analysis. The resulting 3D model is appropriate for isogeometric analysis (IGA) of blood flow in the coronary arteries.
xMaP / flexible MaP descriptor
Provides an alignment-free molecular descriptor. xMaP is derived from MaP, a three-dimensional (3D) descriptor tool. This algorithm handles the fourth dimension (4D) and uses an ensemble of conformers generated by conformational searches. It functions through a five-step procedure, and the most important descriptor variables are determined with chemometric regression tools. It can also display the derived quantitative structure-activity relationships.
BioSITe / Biotinylation Site Identification Technology
Detects biotinylated peptides. BioSITe utilizes antibiotin antibodies to directly capture and find biotinylated peptides in a single liquid chromatography coupled tandem mass spectrometry (LC-MS/MS) run. It allows for a quantitative analysis that is optimal for characterizing molecular differences across different biological conditions. This tool involves biotinylation as a strategy to tag proteins or post-translational modifications.
ODE model / Ordinary Differential Equations model
Provides a model for antibody-dependent cellular cytotoxicity (ADCC). ODE model can extract the rate at which natural killer (NK) cells kill tumour cells from in vitro cytotoxicity data by applying perturbation methods. It is able to relate the rate at which NK cells kill in response to binding antibody with the percentage of cancer cells killed at the end of a cytotoxicity assay. This tool can predict how different initial antibody levels yield different information about the functional form of the kill rate function.
DeCarliEtAl2017
Aims to extract replication signal with high throughput. This method studies raw images with Autodetect to obtain DNA molecules’ coordinates and barcode label positions. It then makes correspondence between schematized molecules and the reference genome using IrysView and RefAligner. It couples triple-colour fluorescent labelling of DNA, replication tracks and nicking endonuclease sites with DNA combing to proceed.
Hackmann2017
Offers an approach dedicated to the management of errors in ribosomal and other DNA sequences. The method allows users to correct the similarity of DNA sequences for sequencing errors as well as to evaluate the original sequence similarity. The algorithm uses raw sequence similarity and error rates which can be calculated from Phred quality scores. It does not require clustering or presumption about which sequences are correct.
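The error rates that feed this method can be derived from Phred scores with the standard relation P = 10^(-Q/10). The sketch below shows only that conversion; the ASCII offset of 33 (the common FASTQ encoding) is an assumption about the input, and the similarity correction itself is not reproduced here.

```python
def phred_to_error(q):
    """Per-base error probability for a Phred quality score q."""
    return 10 ** (-q / 10)

def mean_error_rate(quality_string, offset=33):
    """Average error probability over a read's encoded quality string."""
    probs = [phred_to_error(ord(ch) - offset) for ch in quality_string]
    return sum(probs) / len(probs)

# Q20 corresponds to a 1% error probability, Q30 to 0.1%.
print(phred_to_error(20), phred_to_error(30))
```

Averaging probabilities rather than quality scores matters, because the Phred scale is logarithmic and a few low-quality bases dominate the mean error rate.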
CCAFS / CCA-based feature selection
Allows users to select features from multi-omics data. CCAFS is a multi-view feature selection algorithm that uses the projective transformation learned by canonical correlation analysis (CCA). The method is able to integrate high-dimensional multi-omics data. It was applied to develop integrated models intended to predict kidney renal clear cell carcinoma (KIRC) survival.
APC Method / Atom Pair Contribution Method
Provides an atom pair contribution (APC) model. APC Method can predict the formation enthalpies of organic molecules in the gas phase via its APC additivity scheme. This algorithm is based on increments associated with pairs of bonded and geminal atoms, along with 15 structural corrections. It also draws on a large amount of experimental and theoretical data compiled to validate the model.
MAXDo / Molecular Association via Cross Docking
Furnishes a systematic rigid-body docking algorithm. MAXDo is an approach derived from the ATTRACT protocol and uses a multiple energy minimization scheme. It can be used for systematic docking simulations. This tool is able to correctly predict protein complex structures, that is, the position of the ligand protein with respect to its receptor.
BOINC / Barcoding of Individual Neuronal Connectivity
Enables the conversion of high-throughput sequencing (HTS) data into a neuronal connection matrix. Based on the Barcoding of Individual Neuronal Connectivity (BOINC) protocol, the proposed computational framework builds a connectivity map at single-neuron resolution using neurons barcoded by a pseudorabies virus (PRV). This method works on co-cultured neurons and is a microscopy-free neuro-connectivity method.
MixMD / Mixed-solvent Molecular Dynamics
Accounts for the interaction of both water and small molecule probes with a protein’s surface. MixMD recognizes conserved and displaceable water sites. It employs an occupancy-based metric to find sites which are consistently occupied by water even in the presence of probe molecules. This method is able to determine which functional groups are capable of displacing which water sites. It stores the displacement of water sites by common functional groups.
hMuLab
Allows multi-label classification. hMuLab is a multi-label modeling classifier that simultaneously integrates the feature information and neighbor label information and employs a multi-output regression model with regularization for multi-label prediction. The algorithm can produce multiple label assignments for a query sample. It was evaluated using three biomedical multi-label datasets, which are representative for their specific patterns.
GtTR / Genotyping Tandem Repeats
Genotypes tandem repeats (TRs) at population scale. GtTR is a probabilistic algorithm that genotypes TRs from short read sequencing data (targeted capture sequencing or whole genome sequencing) by comparison of regional read-depth with a single long-read reference sample. The algorithm can be used in combination with short read sequencing technologies to assess TR variation at a population scale. It was applied to Illumina targeted sequencing data to determine the genotypes of the targeted variable number tandem repeat (VNTR) regions.
LigBEnD / Ligand-Biased Ensemble Docking
Provides a hybrid ligand/receptor structure-based docking. LigBEnD was developed by incorporating the atomic property field (APF) method into structure-based ensemble docking. This method assumes the following: (1) compounds that are similar to co-crystallized ligands are likely to bind in a similar pose, (2) compounds that are chemically dissimilar to co-crystallized ligands might share similarity in the properties of atoms occupying the same 3D space and (3) compounds belonging to the same chemical class should have consistent, similar poses.
SCARE / SCan Alanines and REfine algorithm
Docks a ligand to a receptor structure by homology represented by a single conformer. SCARE is an induced fit docking algorithm based on 4 steps: it (i) produces multiple variants of the receptor pocket, (ii) docks the flexible ligand to each variant of the receptor pocket, records the best-scored poses, geometrically clusters them and selects the best-scoring position, (iii) restrains the ligand for each of the unique ligand poses and (iv) re-scores all optimized ligand-receptor pairs and selects the top-scoring pose.
LiuEtAl2018
Proposes a method for reconstructing task-related sources. The application proposes an electroencephalography (EEG) source imaging model based on temporal graph regularized low-rank representation composed of: (i) a data fitting term, (ii) a temporal graph embedding regularization term, and (iii) an ℓ1 norm sparsity penalty and a nuclear norm. This model is solved by an algorithm using the alternating direction method of multipliers (ADMM) that is able to extract low-rank task-related source patterns.
STRAPS / Spatio-Temporally Regularized Algorithm for m/eeg Patch Source imaging
Offers a method for magnetoencephalography and electroencephalography (M/EEG) patch source imaging on high-resolution cortices. STRAPS is a state-space modeling and estimation algorithm that uses local spatio-temporal constraints for estimating cortical sources. The algorithm was tested on both synthetic electroencephalography (EEG) data from numerical simulations and real magnetoencephalography (MEG) data.
WangEtAl2018
Performs link prediction for general directed or undirected complex networks, regardless of their technological, biological, or social nature. This method considers the adjacency matrix of a network as the pixel matrix of a binary image. It treats links as follows: present links correspond to pixels of value 0. This approach can build fake images that are employed as a training set for link prediction in the original image.
MITOMIX
Optimizes the global minimal genetic distance in a given population. MITOMIX employs the shared haplogroup distance (SHD) method to differentiate closely related and/or admixing populations. It can recognize all traces and the extent of admixture, offering a substitute to whole genome admixture analysis. This tool can be employed to explain the observed mtDNA haplogroup (mt Hg) distributions of any studied population.
GuptaEtAl2018
Retrieves structured information from free-text clinical narratives. This approach starts with the detection of the named entities of interest and their relations. It then builds “information frames” from each extracted item. This method is based on a natural language processing (NLP) system and unsupervised machine learning techniques. It was applied in the particular context of mammography reports.
Shrinkage Clustering
Identifies the optimal number of clusters while partitioning data. Shrinkage Clustering is a non-negative matrix factorization (NMF) based method that optimizes cluster memberships while simultaneously shrinking the number of clusters to an optimum. This algorithm generates clusters of sufficiently large sample sizes as required by the user and can handle clustering applications with minimum cluster size constraints.
MPH method / Molecular Process Heterogeneity method
Analyzes transcriptomic data from cellularly homogeneous samples to define functional heterogeneity. MPH method quantifies the functional heterogeneity of a homogeneous cell population based on transcriptomic data. This tool combines molecular processes and proportions with an approximation of the differences in gene expression levels for cells. This algorithm employs a non-negative matrix factorization (NMF) method.
ZhangEtAl
Facilitates latent disease-gene association discovery in literature mining. This algorithm provides insights into the relationship between cellular and molecular processes and diseases. It discovers novel disease mechanisms via the integration of topic modelling and network decomposition techniques. This tool is built on Latent Dirichlet Allocation (LDA) modelling to find significant disease topics based on multiple disease-gene associations mined from publications.
QC-RFSC / QC-based Random Forest Signal Correction algorithm
Removes unwanted variations at the feature level in large-scale metabolomics and proteomics data. QC-RFSC is an algorithm that integrates a random forest (RF) based ensemble learning approach to learn the unwanted variations from quality control (QC) samples. It also predicts the correction factor for the responses of the neighboring real samples. Besides metabolomics data analysis, this method significantly improves data quality for proteomics.
MaREA / Metabolic Reaction Enrichment Analysis
Assists users in investigating cancer metabolism when data on metabolic measurements are not available. MaREA can be used to: (i) rank the reactions according to the variation in their activity observed between different phenotypes and/or experimental conditions, (ii) enrich the map of human metabolic routes with the variation observed in the Reaction Activity Score (RAS) of each reaction, and (iii) stratify samples according to their metabolic activity.
Scluster
Finds cancer subtypes. Scluster is based on an adaptive sparse reduced-rank regression (S-rrr) method. It builds a fused patient-by-patient network and then applies a spectral clustering method for the identification of cancer subtypes. This tool can deal with high-dimensional statistical data under the Gaussian variable model. It recognizes the principal low-dimensional subspace and can be run without a data preprocessing step.
BakiriEtAl2018
Allows deconvolution of complex nuclear magnetic resonance (NMR) spectra of metabolite mixtures. This approach exploits heteronuclear multiple bond correlation spectroscopy (HMBC) and heteronuclear single quantum correlation spectroscopy (HSQC) correlation data to accelerate the identification of natural metabolites. It can help initiate the chemical profiling of natural extracts. This method allows cross-validation of the results obtained.
tetraBASE / tetrahedron-based backbone statistical energy model
Realizes realistic modeling of through-space packing of polypeptide backbones. tetraBASE derives statistical energies from known sequence and structural data of native proteins and their complexes. It can consider the effects of peptide local conformation, local structural environment as well as inter-residue geometries on amino-acid sequences. This method provides a representation of inter-backbone site packing geometries.
PSCPP / Protein Side-Chain Packing Problem
Estimates the side-chain conformation of every residue of a protein. PSCPP aims to find a set of rotamers from a rotamer library that minimizes the given scoring function. The method is composed of different parts: (1) a rotamer library, (2) a scoring function (SF), and (3) a search algorithm. It performs a relaxation process through a molecular dynamics simulation considering only the asymmetric unit of monomeric proteins surrounded by water, to take into account a realistic environment for the protein.
NLM-CHEMSORT
Arranges chemical names. NLM-CHEMSORT is based on a method consisting of a primary sort key of over 80 alphabetic characters and a secondary sort key of 16 alphanumeric characters, both generated from the chemical name. These de novo sort keys do not increase permanent-storage costs and yield logical sequences of chemical names. This algorithm is suitable for obtaining chemical names from smaller files such as the Toxicology Data Bank.
Clote2005
Computes the number of locally optimal secondary structures of an RNA molecule with respect to the Nussinov–Jacobson energy model. The publication offers an efficient algorithm that was applied to analyze the folding landscape of selenocysteine insertion sequence (SECIS) elements, hammerhead ribozymes from Rfam, and tRNAs from Sprinzl’s database. Applications of this algorithm extend knowledge of the energy landscape differences between naturally occurring and random RNA.
PIE / Processing Images Easily
Automatically tracks colonies of the yeast Saccharomyces cerevisiae in low-magnification brightfield images by combining adaptive object-center detection with gradient-based object-outline detection. PIE integrates a colony-tracking procedure that allows tracked colonies to be joined across subsequent time points. This permits simple calculation of growth rates, as well as simultaneous tracking of other colony properties (e.g. fluorescence).
MaxEnt / Maximum Entropy
Deconvolutes complete electrospray spectra of protein mixtures. MaxEnt is a program that can produce zero-charge mass spectra on a molecular mass scale. It automatically connects data peaks of different charge state, with no need for explicit identification. Furthermore, quantitative relative intensity data can be derived from the areas under the peaks in the MaxEnt output.
BernEtAl2018
Consists of a parsimonious charge deconvolution algorithm that produces fewer artifacts. This algorithm is well-suited to high-resolution native mass spectrometry of intact glycoproteins and protein complexes. It simplifies the utilization of native mass spectrometry for the quantitative analysis of protein and protein assemblies, and can deconvolve monomer and dimer simultaneously.
PLPD / Protein Localization Predictor based on D-SVDD
Predicts protein localization. PLPD can detect the likelihood of a specific localization for a protein by using the Density-induced Support Vector Data Description (D-SVDD). D-SVDD is extended for this algorithm to predict protein subcellular localization. It utilizes three measurements to assess and refine the protein localization predictor. The PLPD approach is complementary to other methods such as the nearest neighbor or the covariant discriminant method.
SozRank / Seizure Onset Zone Ranking
Identifies epileptic seizures. SozRank constructs an empirical distribution of the scores calculated over random blocks recorded while the patient is resting. It is based on a combination of a parametric causality measure, Granger causality, and a non-parametric measure, directed information. This tool is able to quantify the pair-wise causal influences between the recordings.
PopKLD / Population Kullback-Leibler Divergence
Summarizes the raw, continuous, inherently noisy, outlier-ridden, biased electronic health record (EHR) data in a high throughput setting. PopKLD aims to reduce the amount of human effort necessary to clean and summarize the data. It employs a non-parametric probability distribution estimate to proceed. This method creates an estimate of the mean and variance for every individual.
FaEtAl2018
Predicts protein function using multi-task deep neural networks (MTDNN). This algorithm tackles the multi-label problem. MTDNN can learn both a shared feature representation from all Gene Ontology (GO) terms and specific patterns from individual terms by employing two stacked multi-layer structures, one shared by all tasks and another one specific to each task on top of the shared one.
Parallel Dual NeuInf
Deduces the functional neural network and synaptic connections. Parallel Dual NeuInf is based on the kernel method to map the nonlinear inference problem to a linear equivalent in the kernel space. It scales to large datasets of recorded neural activities. This tool can deal with both deterministic and stochastic LIF neurons through the same framework.
NSforest
Recognizes the set of necessary and sufficient marker genes from an sc/snRNAseq experiment. NSforest is based on a random forest of decision trees machine learning approach. It creates standard cell type definitions. The result of this method can serve as a reference knowledgebase to support interoperability of information about the role of cellular phenotypes in human health and disease.
MiLeSIM / Machine Learning Structured Illumination Microscopy
Allows users to investigate the structure of large populations of viruses. MiLeSIM measures virus population heterogeneity by coupling methods of super-resolution microscopy (SRM), machine learning (ML)-based classification and image analysis. The application is suited for examining large numbers of particles with high specificity. It can be applied in virus-based biotechnology, such as vaccine development.
FilterBlink
Deletes the vertical electrooculogram (VEOG) artifacts from the electroencephalography (EEG). FilterBlink subtracts the grand-average of all detected VEOGs from the respective channel from an EEG segment when it shows sufficient correlation with the template. It provides a threshold of similarity between template and EEG segment that allows the user to set the sensitivity and specificity of the filter.
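The template-subtraction rule can be sketched as below. Pearson correlation as the similarity measure, the default threshold of 0.8, and the function names are assumptions for illustration; the detection of VEOGs and the computation of the grand-average template are not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_segment(segment, template, threshold=0.8):
    """Subtract the template only when the segment resembles it closely enough."""
    if pearson(segment, template) >= threshold:
        return [s - t for s, t in zip(segment, template)]
    return list(segment)  # left untouched below the similarity threshold
```

Raising the threshold makes the filter more specific (fewer clean segments altered), while lowering it makes the filter more sensitive (fewer blinks missed).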
SeqStruct
Recognizes orthologs that are not identifiable with the standard BLOSUM matrices. SeqStruct derives a universal amino acid similarity matrix that incorporates the effects of a large set of pairwise substitutions. It can offer a strong agreement between structure and sequence similarities. This tool aims to improve the results in many applications of sequence matching in biology, advancing particularly the fields of molecular, structural and evolutionary biology.
BEAD / Bead-conjugated EV Assay Detected
Studies extracellular vesicle (EV) proteins from human samples. BEAD exploits biotinylated EVs captured on streptavidin-coated polystyrene (PS) beads. It can: (1) improve EV capture efficiency due to the high affinity biotin-streptavidin interaction; (2) offer a simplified assay measured using conventional flow cytometers; (3) enhance detection sensitivity due to EV (and biomarker) concentration on polystyrene (PS) beads.
mSTARR-seq
Evaluates the causal relationship between DNA methylation and regulatory activity within a cellular context. mSTARR-seq quantifies enhancer activity via self-transcribing episomal reporter assays with enzymatic manipulation of DNA methylation at millions of unique CpG sites. It is able to recognize the CpG sites for which DNA methylation variation is most tightly linked to gene expression variation in human primary cells.
CALM / the Causal ReguLatory Modules
Identifies microRNA-messenger RNA (miRNA-mRNA) causal regulatory modules. CALM is an algorithm that consists of three steps: formation of the causal regulatory relationships between miRNAs and genes from gene expression profiles, detection of miRNA clusters according to the GO function information of their target genes, and expansion of each miRNA cluster by adding target genes to maximize the modularity score. It was applied to four databases (EMT, breast, ovarian and thyroid cancer).
RefHap
Enables single individual haplotyping. RefHap finds the best cut based on a heuristic algorithm for max-cut and then builds haplotypes consistent with that cut. The algorithm is able to perform whole chromosome haplotyping. It was tested with preliminary real data from fosmid-based sequencing, and several simulation experiments were performed to test the behavior of RefHap under a wide range of circumstances.
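The two stages named above (a max-cut heuristic over fragments, then haplotypes consistent with the cut) can be illustrated with a small sketch. Everything specific here is an assumption for illustration: fragments are modelled as dictionaries mapping SNP index to a 0/1 allele, the edge weight is mismatches minus matches on shared SNPs, and the heuristic is a plain single-vertex local search. The actual scoring and heuristic used by the tool may differ.

```python
def edge_weight(frag_a, frag_b):
    """Mismatches minus matches over the SNPs both fragments cover."""
    shared = frag_a.keys() & frag_b.keys()
    return sum(1 if frag_a[s] != frag_b[s] else -1 for s in shared)


def local_search_max_cut(fragments):
    """Greedy max-cut heuristic: move a fragment whenever that increases the cut."""
    n = len(fragments)
    w = [[edge_weight(fragments[i], fragments[j]) if i != j else 0
          for j in range(n)] for i in range(n)]
    side = [0] * n
    improved = True
    while improved:
        improved = False
        for v in range(n):
            # Moving v turns same-side edges into cut edges and vice versa.
            gain = sum(w[v][u] if side[u] == side[v] else -w[v][u]
                       for u in range(n) if u != v)
            if gain > 0:
                side[v] ^= 1
                improved = True
    return side


def build_haplotypes(fragments, side, n_snps):
    """Majority vote per SNP; fragments on side 1 vote for the complement."""
    votes = [0] * n_snps
    for frag, s in zip(fragments, side):
        for snp, allele in frag.items():
            allele_on_h1 = allele if s == 0 else 1 - allele
            votes[snp] += 1 if allele_on_h1 == 1 else -1
    h1 = [1 if v > 0 else 0 for v in votes]
    h2 = [1 - a for a in h1]
    return h1, h2


# Four error-free fragments sampled from haplotypes 0110 and 1001.
fragments = [{0: 0, 1: 1}, {1: 1, 2: 1, 3: 0}, {0: 1, 1: 0, 2: 0}, {2: 0, 3: 1}]
side = local_search_max_cut(fragments)
h1, h2 = build_haplotypes(fragments, side, n_snps=4)
```

Local search only guarantees a locally optimal cut, so on noisy data the result depends on the starting partition and the visiting order.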
FastHare
Reconstructs single individual single nucleotide polymorphism (SNP) haplotypes. FastHare is a heuristic whose input is a SNP-fragment matrix M with n rows and m columns, together with an additional parameter. The output consists of three objects: two haplotypes of length n, a SNP-fragment matrix M with m columns and n rows, and a partition of the rows of M (fragments) into two groups.
SVR_CAF
Allows users to predict structural similarity between native structures and their decoys. SVR_CAF is a scoring function for protein structure selection that proposes a machine learning score. It sorts the structures by using energy considerations and network topology characteristics. The application incorporates three normalized scores: residue contact energy, the amino acid network and the fast Fourier transform.
MVP / Missense Variant Pathogenicity
Assists users in leveraging large training data sets and many correlated predictors. MVP is a prediction method that was developed to improve missense variant pathogenicity prediction. It uses many correlated predictors, broadly grouped into four categories: (a) variant level, (b) gene mutation intolerance, (c) protein structure and modification and (d) published scores from other selected methods.
LP / Linkage Probability
Serves as a computationally-tractable method to assess linkage disequilibrium. LP estimates the posterior probability of a relation between two categorical data sets and detects potential biases from latent variables. This algorithm allows users to reveal and estimate 3D steric effects in 1D single-nucleotide polymorphism (SNP) data. It constitutes the basis of several software tools such as INTERSNP, Haploview or PLINK.
YiEtAl2018
Predicts microvessels in hematoxylin and eosin (H&E)-stained pathology images. This method consists of a deep learning algorithm that can be applied to different types of cancer. It can serve to investigate the role of micro blood vessels in tumor progression and treatment response using public datasets. This approach has been used in a real patient cohort. It is able to determine patient clinical outcome.
PARDIS / protein’s PAtch Resolves DImer Stability
Determines protein-dimer stability from the properties of the protein binding patch (PBP) on a single subunit. PARDIS is based on a nonlinear analysis of binding-patch properties and employs a database of protein complexes with known complex stabilities. It can help users predict the stability of large populations of determined protein-protein interaction (PPI) networks in order to design suitable complex-stability modulators.
OMEGA
Creates conformers with a prebuilt library of fragments and a knowledge base of torsion angles. OMEGA can sample the conformational space around solid-state structures of druglike molecules. It is able to recreate experimental structures, decreasing the size of the conformer ensemble needed for good reproduction and the run time required.
CAESAR / Conformer Algorithm based on Energy Screening and Recursive Buildup
Discovers the X-ray conformation and covers the pharmacophore space. CAESAR is based on a divide-and-conquer approach: recursive decomposition of a molecule into the smallest units followed by recursive buildup of molecular conformations from the smallest units. It can reproduce the receptor-bound X-ray conformations. This approach integrates energy pruning on fine torsion grids instead of the geometry optimization method.
Pathfinder
Aims to estimate the causal single nucleotide polymorphism (SNP) and the causal mark within a gene region that are influencing expression of a given gene. Pathfinder models the hierarchical relationships between genome, chromatin, and gene expression. It is based on a hierarchical statistical method. This method generates well-calibrated posterior probabilities and can prioritize SNPs and marks for functional validation.
EM algorithm
Allows users to calculate maximum likelihood estimates from incomplete data.
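As a concrete instance, the sketch below runs expectation-maximization on a two-component one-dimensional Gaussian mixture, where the "incomplete" part of the data is the unobserved component label of each point. The function names, the starting values and the variance floor `min_var` are illustrative choices, not part of any particular implementation.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, init_means, iterations=50, min_var=1e-6):
    """Fit a two-component 1D Gaussian mixture by expectation-maximization."""
    mean1, mean2 = init_means
    var1 = var2 = 1.0
    weight1 = 0.5  # mixing proportion of component 1
    n = len(data)
    for _ in range(iterations):
        # E-step: posterior probability that each point came from component 1.
        resp = []
        for x in data:
            p1 = weight1 * normal_pdf(x, mean1, var1)
            p2 = (1 - weight1) * normal_pdf(x, mean2, var2)
            resp.append(p1 / (p1 + p2))
        # M-step: re-estimate parameters from the responsibility-weighted data.
        n1 = sum(resp)
        n2 = n - n1
        mean1 = sum(r * x for r, x in zip(resp, data)) / n1
        mean2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        var1 = max(sum(r * (x - mean1) ** 2 for r, x in zip(resp, data)) / n1, min_var)
        var2 = max(sum((1 - r) * (x - mean2) ** 2 for r, x in zip(resp, data)) / n2, min_var)
        weight1 = n1 / n
    return weight1, (mean1, mean2), (var1, var2)


# Two well-separated groups centred near -2 and 3.
data = [-2.1, -1.9, -2.0, -2.2, -1.8, 3.0, 3.1, 2.9, 3.2, 2.8]
weight1, (mean1, mean2), variances = em_two_gaussians(data, init_means=(-1.0, 1.0))
```

Each iteration cannot decrease the likelihood, but the algorithm converges to a local optimum, so the result can depend on the initial means.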
HSNE / Hierarchical Stochastic Neighbor Embedding
Aims to visualize meaningful landmarks that represent sets of high-dimensional data points. HSNE can process 3D MSI data at full spectral and full spatial resolution. This algorithm builds scatter plots showing the distribution of the landmarks based on the similarity of their mass spectral profiles in the full high-dimensional space.
FastEtch
Permits users to assemble genomes by using a sublinear streaming data structure, the Count-Min (CM) sketch. FastEtch computes the assembly based on an approximate, space-efficient version of the de Bruijn graph, storing only a subset of vertices. This algorithm can also detect edges on the fly as contigs are generated, which saves on both vertex and edge space requirements.
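The Count-Min sketch itself is a generic data structure, and a minimal version applied to k-mer counting is sketched below. The table dimensions, the salted BLAKE2b hashing and the `count_kmers` helper are assumptions for illustration; how FastEtch sizes the sketch and selects vertices from it is not described here.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        """Yield one (row, column) cell per row, using a differently salted hash per row."""
        data = item.encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The minimum over rows is the cell least inflated by collisions.
        return min(self.table[row][col] for row, col in self._columns(item))


def count_kmers(reads, k, sketch):
    """Stream every k-mer of every read into the sketch."""
    for read in reads:
        for i in range(len(read) - k + 1):
            sketch.add(read[i:i + k])
    return sketch


sketch = count_kmers(["ACGTACGT", "ACGTAC"], k=4, sketch=CountMinSketch())
acgt_estimate = sketch.estimate("ACGT")  # true count is 3; the estimate is at least 3
```

Memory use is fixed at `width * depth` counters regardless of how many distinct k-mers are seen, which is what makes the structure sublinear in the input.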
S-GSOM / Seeded Growing Self-Organising Map
Automatically identifies clusters in the feature map using the already-available labelled samples (seeds). S-GSOM is an algorithm that consists of three core procedures: (1) the very small number of available or selected seeds is combined with other unlabeled samples; (2) the combined samples are presented to GSOM for training, in which the seeds are treated the same as the unlabeled data; and (3) S-GSOM performs an extra phase, the cluster identification phase, as post-processing.
CASINO / Community And Systems-level INteractive Optimization
Serves for analysis of microbial communities through metabolic modeling. CASINO performs iterative multi-level optimization to calculate the relative uptake of carbohydrates by each species. This tool can be used to quantify the community interactions and the relative glucose uptake by the individual species. Moreover, this tool allows inclusion of several species in the simulations.
PAcIFIC / Precursor Acquisition Independent From Ion Count
Assists in processing precursor acquisition. PAcIFIC can process the entire predicted soluble bacterial proteome without any sample fractionation other than C18-based liquid chromatography. It can reduce sample preparation to a minimum prior to fully automated liquid chromatography-mass spectrometry (LC-MS) and MS operation. This method is compatible with standard instrumentation and software.
NBOR / Neighborhood-Based ORdering of single cells
Establishes the position of a given cell in a developmental continuum. NBOR measures the similarity of each single cell’s gene-expression profile to a defined gene set of a particular cell population. It then arranges each cell, according to its similarity score, into a spatial continuum around the cell population. This tool can be used to create an unsupervised visualization of single-cell mRNA profiles in a linear developmental order.
SOMSC / Self-Organization-Map for high-dimensional Single-Cell data
Identifies cellular states, reconstructs cellular state transition paths and builds the pseudotime ordering of cells. SOMSC proceeds by following six main steps: (1) measuring a topographic chart of single cell data, (2) identifying basins of the topographic chart, (3) organizing the cellular states and building their transition paths, (4) constructing the cellular state map for all cells, (5) detecting the state-driven genes and ultimately (6) estimating the cellular state replication and transition probabilities.
HRDetect
Predicts BRCA1/BRCA2 deficiency in cancer. HRDetect is a whole genome sequencing (WGS)-based predictor for detection of homologous recombination (HR)-deficient tumors. The model was assessed using independent cohorts of breast, ovarian and pancreatic cancers. It was applied to a cohort of 560 breast cancer patients with 22 known germline BRCA1/BRCA2 mutation carriers and identified an additional 22 somatic BRCA1/BRCA2 null tumours and 47 tumours with functional BRCA1/BRCA2-deficiency where no mutation was detected.
MSIGNET / Metropolis sampling based SIGnificant NETwork
Identifies a globally optimal significant network by integrating gene expression data with a protein-protein interaction (PPI) network. MSIGNET is a method developed to identify a significant network of genes that are significantly over- or under-expressed in a certain condition. The method was applied to real data, including one Parkinson patient data set and two ovarian cancer patient data sets. It can be applied to any network without size limitation.
BSSV / Bayesian based Somatic Structural Variation
Calculates a significance p-value for each somatic structural variation (SSV) by comparing the read alignments in the tumor sample to those observed in the normal sample. BSSV is a Bayesian SSV detection method developed to identify cancer-specific genomic changes. The p-values it calculates give biologists a ranked list of SSVs for further experimental validation. The method was tested on both simulated and real data sets.
SPoC / Source Power Comodulation
Relates electro- and magnetoencephalography (EEG/MEG) data to a given target variable. SPoC can discover a spatial filter that extracts an oscillatory signal whose power modulation follows a given target variable. It solves the problem of component extraction for band power correlation/covariance. This tool is useful to extract information from auditory sources generating steady-state responses.
YuEtAl2014
Predicts protein complexes in protein-protein interaction (PPI) networks. This algorithm helps users discover complexes in protein interaction networks (PINs) by learning from true complexes. In this method, the semantic similarity between two proteins is calculated based on the annotation size of the gene ontology (GO) term to which both proteins are annotated.
semi-supervised learning method
Helps users detect protein complexes. This algorithm consists of a heuristic method that utilizes multiple features that define protein complexes in protein-protein interaction (PPI) networks. This program evaluates candidate subgraphs: if the evaluation score exceeds the threshold, the candidate subgraph is predicted to be a complex.
NeoDTI / NEural integration of neighbOr information for DTI prediction
Forecasts drug-target interactions (DTIs) from disparate data. NeoDTI relies on nonlinear feature extraction by neural networks. It performs several information passing and aggregation operations that update the node embeddings, yielding topology-preserving learning of those embeddings. The application aims to assist users in drug discovery.
HT-eQTL / High Tissue expression Quantitative Trait Loci
Analyzes multi-tissue expression quantitative trait loci (eQTL). HT-eQTL provides a scalable computational method built upon an empirical Bayesian framework that was designed to address the scaling issue associated with a large number of tissues. The algorithm can capture tissue specificity and can be used to increase the discovery of genetic regulatory pathways underlying complex diseases.
multiOrthoAlign
Deals with duplications, losses and rearrangements when aligning a set of gene orders related through a phylogenetic tree. multiOrthoAlign is based on a heuristic generalization of OrthoAlign, a previously developed pairwise alignment algorithm. This software can be extended and applied to other rearrangement operations such as substitutions, insertions, and tandem or inverted duplications.
csuWGCNA / Combination of Signed and Unsigned Weighted Gene Co-expression Network Analysis
Captures negative correlations by combining signed and unsigned weighted gene co-expression network analysis (WGCNA). csuWGCNA captures negative miRNA-target and lncRNA-gene pairs more efficiently by using two gene expression profiles. This algorithm can pinpoint modules containing genes with negative correlations.
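Why negative correlations need special handling can be seen from the two standard WGCNA adjacency definitions, sketched below for a single correlation value with soft-thresholding power `beta`. The `combined_adjacency` function is only an assumed illustration of how the two forms might be merged (the signed formula applied to the absolute correlation); the description above does not state the formula the method actually uses.

```python
def unsigned_adjacency(cor, beta):
    """Standard unsigned WGCNA adjacency: |cor| ** beta (the sign is discarded)."""
    return abs(cor) ** beta


def signed_adjacency(cor, beta):
    """Standard signed WGCNA adjacency: ((1 + cor) / 2) ** beta."""
    return ((1 + cor) / 2) ** beta


def combined_adjacency(cor, beta):
    """ASSUMED combination for illustration: signed form on the absolute correlation."""
    return ((1 + abs(cor)) / 2) ** beta


# A strongly negative pair, such as a miRNA and its repressed target.
cor = -0.9
unsigned_value = unsigned_adjacency(cor, beta=6)
signed_value = signed_adjacency(cor, beta=12)
combined_value = combined_adjacency(cor, beta=12)
```

With the signed definition a correlation of -0.9 yields an adjacency close to zero, so the pair is effectively disconnected, whereas the other two definitions keep it strongly connected.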
MAESTER / Moneyball Approach for Estimating Specific Tissue adverse Events using Random forests
Predicts the probability of a compound presenting with different tissue-specific drug adverse events. MAESTER is based on a data-driven machine learning approach and incorporates information on a compound’s structure, targets, and downstream effects. This algorithm combines compound and target properties to predict the likelihood of events. It can directly predict clinical effects and can be improved to predict patient specific adverse events.