GABNI / Genetic Algorithm-based Boolean Network Inference
Allows users to infer generalized regulatory relations. GABNI deduces the interaction type by examining the binary expression values of a target gene and a regulatory gene in binary gene expression data. This method can be used for large-scale inference problems and targets both structural and dynamic accuracy.
WangEtAl2018
Deduces target gene expression profiles using a conditional generative model. This approach stabilizes adversarial training with an L1-norm loss on the gene regression model. It is based on generative adversarial networks (GANs) and produces high-dimensional outputs that lack spatial structure. This tool makes predictions that are robust to outliers and captures the low-frequency structure of samples.
NadianEtAl2018
Automates spike sorting. This algorithm combines two methods: t-distributed stochastic neighbor embedding (t-SNE) and density-based spatial clustering of applications with noise (DBSCAN). It can be used with a large number of simultaneously recorded units. It was tested on simulated 10-minute extracellular recordings as well as on real multi-electrode array neural recordings.
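The density-based clustering stage of such a pipeline can be illustrated with a plain DBSCAN implementation. This is a minimal sketch of generic DBSCAN. It is not the authors' code and assumes that spike waveforms have already been embedded in two dimensions, for example by t-SNE. The function name `dbscan` and its `eps` and `min_pts` parameters are illustrative choices.

```python
import math

def dbscan(points, eps, min_pts):
    """Generic DBSCAN (illustrative sketch): label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points (including i itself) within distance eps of point i.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels

points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
print(dbscan(points, eps=0.5, min_pts=2))
```

Here two dense groups become clusters 0 and 1, and the isolated point is labeled as noise. DBSCAN does not need the number of units in advance, which suits recordings where the neuron count is unknown.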
SIFT / Spherical Deconvolution Informed Filtering of tractograms
Quantifies the density of underlying white matter fibers. SIFT performs fiber orientation distribution (FOD) segmentation and assigns streamlines to the FOD lobes they traverse, based on both the voxels they pass through and their tangent within each voxel. It allows the quantitative properties assigned to a whole-brain reconstruction to be retained when sampling the structural connections emanating from one or more regions of interest.
TiwariEtAl2018
Retrieves 3D structural models from libraries of biological shapes. This approach compares experimental image data with projection images generated from existing structural data. It can rescale the 3D models to determine their volumetric size, allowing for the possibility that a small novel protein has a shape similar to that of a large protein complex. This tool can recognize possible shapes for novel single particles and estimate the number of conformations present in the experimental data.
LowEtAl2018
Allows characterization of population activity as a trajectory on a nonlinear manifold. This algorithm captures correlations between neurons and temporal relationships between states, which reflect constraints arising from the underlying network architecture and inputs. It may also find broader use in probing the organization and computational role of circuit dynamics in other brain regions.
Modular inverse reinforcement learning
Permits estimation of both rewards and discount factors from human behavioral data. Modular inverse reinforcement learning enables predictions of human navigation behavior in virtual reality across different subjects and tasks. It also provides a strategy for estimating the subjective value of actions and how they influence sensorimotor decisions in natural behavior.
SQUICH / SeQUential DepletIon and enriCHment
Provides a method for molecular sampling. This approach uses computations performed by molecular ensembles to encode the abundance of each species in a sample before measurement. It can quantify each of a large number of molecular species in a pool and is useful for measuring massive numbers of single-cell RNA profiles. It enables sampling that scales logarithmically, or even sub-logarithmically, with the precision desired in common sequencing applications.
JiEtAl2018
Automates the classification of cells into cell types. This method combines prior knowledge with observed cytometry data. It is based on a Bayesian framework that lets users integrate biologically meaningful prior information capturing the domain expertise of human experts. This approach organizes individual cells into a hierarchical structure that models the tree-structured, recursive process of manual gating.
muMAPseq / multisource Multiplexed Analysis of Projections by sequencing
Determines mesoscale connectivity networks in individual animals. muMAPseq can construct mesoscale connectivity atlases tailored to each lab’s particular model system. It offers a systematic foundation for investigating circuits in mouse models whose connectivity deviates from that of C57BL/6J males. This method is useful in a wide range of non-standard animal model systems, including Peromyscus, voles, marmosets and others.
MajumderEtAl2018
Measures the activity of daptomycin on Staphylococcus aureus strains with different membrane compositions. This method employs an artificial neural network (ANN) model to relate membrane composition to activity. It takes into account the effect of the same drug candidate on multiple membrane compositions.
ChaiEtAl2018
Provides a logistic regression model combining semi-supervised learning and active learning for disease classification. This algorithm requires little engineering overhead and uses unlabeled gene expression samples for disease classification. Its regression model exploits the complementarity of semi-supervised learning and active learning. It can also minimize the number of falsely pseudo-labeled samples through a built-in mechanism that updates pseudo-labeled samples.
HeEtAl2018
Provides a two-stage biomedical event trigger detection approach. This method comprises two subtasks: trigger recognition and trigger classification. It alleviates the problem of class imbalance, and different features are selected in each stage. This approach also integrates word embeddings to represent words semantically and syntactically. It was evaluated on the test set of the multi-level event extraction (MLEE) corpus.
Covariate Assisted Principal regression
Identifies the components predicted by linear models of the covariates. Covariate Assisted Principal regression is a regression method for outcomes consisting of multiple covariance matrices. This method avoids the massive number of hypothesis tests required by the element-wise regression approach. Applied to resting-state functional magnetic resonance imaging (fMRI) data, this approach identified human brain network changes associated with age and sex.
OPLRAreg
Helps users develop quantitative structure-activity relationship (QSAR) models. OPLRAreg is a piecewise linear regression algorithm that identifies features that partition the data into regions and fits a linear equation predicting the outcome variable in each region. Researchers can add customized constraints to the model.
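The core idea of piecewise linear regression is to split the data into regions and fit a separate linear equation in each one. The sketch below shows this in simplified form with one input feature, a single breakpoint found by exhaustive search, and an ordinary least-squares line per region. It is only an assumption-based illustration. OPLRAreg itself is an optimization-based method that chooses the partitioning feature and supports custom constraints, and neither of those is reproduced here. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b, sum of squared errors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def piecewise_fit(xs, ys, min_size=2):
    """Hypothetical helper: try every split of the x-sorted data and keep the lowest total SSE.

    Returns (total_sse, split_point, (a1, b1), (a2, b2)).
    """
    data = sorted(zip(xs, ys))
    best = None
    for k in range(min_size, len(data) - min_size + 1):
        left, right = data[:k], data[k:]
        a1, b1, e1 = fit_line(*zip(*left))
        a2, b2, e2 = fit_line(*zip(*right))
        if best is None or e1 + e2 < best[0]:
            split = (left[-1][0] + right[0][0]) / 2  # midpoint between the two regions
            best = (e1 + e2, split, (a1, b1), (a2, b2))
    return best

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 1, 2, 3, 10, 12, 14, 16]  # y = x below the split, y = 2x + 2 above it
print(piecewise_fit(xs, ys))
```

On this toy data the search recovers the split at 3.5 and both linear equations exactly.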
AIM-SNPtag
Selects the most membership-informative single nucleotide polymorphisms (SNPs), with potential applications in forensic science. AIM-SNPtag can find ancestry-informative markers (AIMs) for ancestry or membership inference. It was assessed with Monte Carlo cross-validation. This tool can be applied to multiple-population genome-wide SNP data and is useful for inferring an individual’s continental or biogeographic origins.
S-QFC / Secure Quaternion Feistel Cipher
Encrypts DICOM images. S-QFC is a quaternion encryption algorithm based on a modified Feistel network with modular quaternion arithmetic. This approach aims to make the security process more efficient by exploiting two-sided modular matrix multiplication combined with quaternion Julia sets and a fractal division process.
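S-QFC builds on a Feistel network. The defining property of that structure is that decryption reuses the same round function with the round keys applied in reverse order, so the round function itself never needs to be inverted. The toy integer Feistel cipher below shows only that property. It does not implement the quaternion arithmetic, Julia sets or fractal division of S-QFC, and it is not secure. The round function and the 16-bit half size are arbitrary placeholders.

```python
MASK = 0xFFFF  # each half is a 16-bit integer in this toy example

def round_function(half, key):
    # Hypothetical placeholder mixing function; a Feistel network works with any function here.
    return ((half * 31) ^ key) & MASK

def feistel_encrypt(left, right, keys):
    for k in keys:
        # (L, R) -> (R, L xor F(R, k))
        left, right = right, left ^ round_function(right, k)
    return left, right

def feistel_decrypt(left, right, keys):
    for k in reversed(keys):
        # Undo one round: (L', R') -> (R' xor F(L', k), L')
        left, right = right ^ round_function(left, k), left
    return left, right

keys = [0x1234, 0xBEEF, 0x0F0F]
ciphertext = feistel_encrypt(0x0123, 0x4567, keys)
print(feistel_decrypt(*ciphertext, keys))  # recovers the original halves
```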
WangEtAl2018
Investigates the noise performance of short-pulse lasers using dynamical methods. This approach is useful for optimizing the design of short-pulse lasers. It employs a parabolic gain profile for the amplifier. This tool can characterize lasers mode-locked with either a fast or a slow saturable absorber.
SSC-LRR / self-training Subspace Clustering algorithm under Low-Rank Representation
Supports cancer classification from gene expression data. SSC-LRR integrates self-training subspace clustering (SSC) and low-rank representation (LRR). It accounts for three characteristics of gene expression data: high dimensionality, small sample size, and the presence of unlabeled data. Its self-training technique exploits information from unlabeled gene expression data.
iSCHRUNK / in Silico approach to CHaracterization and Reduction of UNcertainty in the Kinetic models
Characterizes uncertainties and uncovers intricate relationships between the parameters of kinetic models and the responses of the metabolic network. iSCHRUNK combines parameter sampling and machine learning techniques. It allows users to identify a small number of parameters that determine the responses of the network regardless of the values of the other parameters.
IUP / Iteratively Updated Priors
Performs successive personalisations of the cases of a population in large databases. IUP carries out each personalisation by maximum a posteriori (MAP) estimation, where the prior probability at an iteration is set from the distribution of personalised parameters in the database at the previous iteration. This drives the parameters onto a linear subspace of reduced dimension in which, for each case of the database, there is a possibly unique parameter value for which the simulation fits the measurements.
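The iterative prior update can be illustrated with a deliberately simple model. Each case has one scalar parameter, the "simulation" is the identity (y = θ plus Gaussian noise), and the prior is Gaussian, which gives the MAP estimate a closed form. After each pass, the prior mean and variance are reset from the population of personalised estimates. These modeling choices and the function names are assumptions for illustration only. IUP itself works with real simulators and multidimensional parameters.

```python
from statistics import fmean, pvariance

def map_estimate(y, noise_var, prior_mean, prior_var):
    """MAP estimate of theta for y ~ N(theta, noise_var) with prior theta ~ N(prior_mean, prior_var)."""
    precision = 1.0 / noise_var + 1.0 / prior_var
    return (y / noise_var + prior_mean / prior_var) / precision

def iteratively_updated_priors(measurements, noise_var, prior_mean=0.0, prior_var=100.0, iterations=5):
    """Hypothetical sketch: re-personalise every case, then set the next prior from the estimates."""
    estimates = list(measurements)
    for _ in range(iterations):
        # Personalise every case under the current population prior.
        estimates = [map_estimate(y, noise_var, prior_mean, prior_var) for y in measurements]
        # The next iteration's prior comes from the distribution of personalised parameters.
        prior_mean = fmean(estimates)
        prior_var = max(pvariance(estimates), 1e-9)  # guard against a degenerate zero-variance prior
    return estimates, prior_mean, prior_var

estimates, mean, var = iteratively_updated_priors([1.0, 2.0, 3.0, 4.0, 5.0], noise_var=0.1)
print(mean, var)
```

The population prior moves from the uninformative starting point toward the distribution of the cases, and each estimate is pulled slightly toward the population mean.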
DuboseEtAl2018
Provides retinal layer-specific statistical models of intensity in optical coherence tomography (OCT) images. These physically derived and empirically validated models were used to calculate the unbiased and biased Cramér-Rao lower bounds (CRLB) for estimating layer boundary locations in retinal OCT images. They can help improve OCT image denoising, reconstruction, and other applications.
CAUSAL-Imp
Computes summary statistics for unobserved single nucleotide polymorphisms (SNPs) by conditioning on the statistics of the observed SNPs and on a given causal status. CAUSAL-Imp combines the principles of fine mapping and summary statistics imputation. It can impute association statistics at untyped variants while taking into account variants in the region that may affect the trait. This method considers all possible causal statuses, in which any subset of SNPs can be causal.
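Summary statistics imputation, one of the two ingredients CAUSAL-Imp combines, is commonly done with a Gaussian conditional mean: the z-score of an untyped SNP is estimated as Σ_uo Σ_oo⁻¹ z_o. Here Σ_oo is the linkage disequilibrium (LD) matrix among observed SNPs and Σ_uo holds the LD between the untyped SNP and the observed ones. The sketch below shows only that baseline. It does not condition on a causal status the way CAUSAL-Imp does. The LD values are assumed to come from a reference panel, and the function names and the `ridge` regularization parameter are hypothetical.

```python
def solve(a, b):
    """Solve a x = b for a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def impute_z(ld_uo, ld_oo, z_obs, ridge=0.0):
    """Hypothetical helper: impute one untyped SNP's z-score as ld_uo . (ld_oo + ridge*I)^-1 z_obs."""
    n = len(ld_oo)
    regularized = [[ld_oo[i][j] + (ridge if i == j else 0.0) for j in range(n)] for i in range(n)]
    weights = solve(regularized, z_obs)
    return sum(c * w for c, w in zip(ld_uo, weights))

# Two observed SNPs in LD (r = 0.5) with each other, each with r = 0.5 to the untyped SNP.
print(impute_z([0.5, 0.5], [[1.0, 0.5], [0.5, 1.0]], [1.0, 1.0]))
```

A small positive `ridge` is a common safeguard when the LD matrix is close to singular.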
VSRFM / Variants Stacked Random Forest Model
Predicts the effect of variants. VSRFM is a stacked meta-learner for deleteriousness classification, built with supervised machine learning on a combined data set of pathogenic and benign variants obtained by selecting unique variants from five benchmark datasets: HumVar, ExoVar, VariBench, predictSNP and SwissVar. This model was constructed for variants not involved in splicing. It can help improve pathogenic mutation detection.
VSRFM-s / Variants Stacked Random Forest Model for splicing
Predicts variant pathogenicity. VSRFM-s is a variants stacked random forest model for variants affected by splicing. It was built with supervised machine learning on a combined data set of pathogenic and benign variants obtained by selecting unique variants from five benchmark datasets: HumVar, ExoVar, VariBench, predictSNP and SwissVar. This program can be used for deleteriousness classification and to improve pathogenic mutation detection.
EDES / Ensemble-Docking with Enhanced-sampling of pocket Shape
Exploits short metadynamics simulations of the apo protein of interest to generate a set of druggable (holo-like) conformations. EDES combines a set of collective variables that sample maximally different shapes of the binding site in a controlled manner with a multi-step clustering strategy that retains a large fraction of holo-like structures within the pool of cluster representatives. This method can be employed in ensemble docking.
POET / Population Outcome Enrichment Technique
Reveals subpopulations in which the pharmacological responses to different compounds agree or diverge. POET is a population segmentation algorithm based on unsupervised machine learning. This method discovers subpopulations of cell lines in which two or more compounds, possibly addressing the same disease state or targeting the same genetic alteration, show a common pharmacological pattern of response. It can integrate multiple measures of drug response to identify subpopulations that differentiate response to inhibitors of the same or different targets.
MCE / Markov Chain Entropy
Estimates the potency of a sample from its single-cell RNA sequencing (scRNA-Seq) or bulk RNA sequencing (RNA-Seq) profile. MCE requires a normalized RNA-Seq profile and a connected signaling interaction network over the genes in the profile. This method can also be used to infer a gene regulatory network from the scRNA-Seq data itself.
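Network-based potency measures of this kind rest on the entropy rate of a random walk over a gene network whose edges are weighted by expression. The sketch below computes that quantity under explicit assumptions. It uses an undirected network, edge weights set to the product of the two genes' expression values, and a walk that moves to neighbors in proportion to edge weight. For such a walk the stationary probability of a gene is proportional to its total edge weight. This weighting, the lack of normalization, and the function name are illustrative choices that may differ from MCE's actual definitions.

```python
import math

def entropy_rate(edges, expression):
    """Entropy rate of a weighted random walk on an undirected gene network (illustrative sketch).

    edges: iterable of (gene_a, gene_b) pairs; expression: gene -> non-negative value.
    Assumed edge weight: w_ab = expression[a] * expression[b].
    """
    weights = {}
    for a, b in edges:
        w = expression[a] * expression[b]
        weights.setdefault(a, {})[b] = w
        weights.setdefault(b, {})[a] = w
    strength = {g: sum(nbrs.values()) for g, nbrs in weights.items()}
    total = sum(strength.values())
    rate = 0.0
    for g, nbrs in weights.items():
        if strength[g] == 0:
            continue  # isolated or unexpressed gene contributes nothing
        pi = strength[g] / total  # stationary probability of an undirected weighted walk
        for w in nbrs.values():
            p = w / strength[g]  # transition probability from g to this neighbor
            if p > 0:
                rate -= pi * p * math.log(p)
    return rate

triangle = [("A", "B"), ("B", "C"), ("A", "C")]
print(entropy_rate(triangle, {"A": 1.0, "B": 1.0, "C": 1.0}))  # equals log(2) for uniform expression
```

Uneven expression concentrates the walk on fewer edges and lowers the entropy rate. This is the intuition behind reading higher signaling entropy as higher potency.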
ChamanzarEtAl2018
Allows users to identify cortical spreading depolarizations (CSDs) using electroencephalography (EEG) signals. This algorithm is designed to detect different types of CSD waves, including narrow and complex CSD patterns, using high-density EEG (HD-EEG) under specific conditions. The analysis is intended to be noninvasive and automated. This approach was tested on simulated EEG signals.
GT-TS / Good-Toulmin like estimator via Thompson Sampling
Provides an approach to experimental design for cell type discovery. GT-TS uses information across tissues to inform subsequent experiments in order to maximize cell type diversity and discovery. This method could also be applied to improve the effectiveness of experimental studies with other goals, such as designing sampling strategies that capture location-dependent tumor cell type heterogeneity.
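GT-TS is built around a Good-Toulmin-like estimator. The classical Good-Toulmin estimator predicts how many previously unseen types would appear if sampling were extended by a factor t. It is computed as U(t) = -Σᵢ (-t)ⁱ Φᵢ, where Φᵢ is the number of types observed exactly i times. The sketch below computes only this classical estimator from per-type counts, for example the number of cells observed for each cell type. The Thompson sampling step that GT-TS uses to choose the next tissue, and any changes GT-TS makes to the estimator, are not reproduced. The function name is hypothetical.

```python
from collections import Counter

def good_toulmin(counts, t):
    """Classical Good-Toulmin estimate of the number of new types in t times more samples.

    counts: how many times each observed type was seen.
    U(t) = -sum_i (-t)^i * Phi_i, with Phi_i = number of types seen exactly i times.
    The alternating series is reliable for t <= 1 and can blow up for larger t.
    """
    phi = Counter(c for c in counts if c > 0)
    return -sum(((-t) ** i) * n for i, n in phi.items())

# Four cell types seen 1, 1, 2 and 3 times.
print(good_toulmin([1, 1, 2, 3], 1.0))  # 2.0
print(good_toulmin([1, 1, 2, 3], 0.5))  # 0.875
```

Types seen only once (Φ₁) dominate the estimate. This matches the intuition that many singletons signal many undiscovered types.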
MRH-SiNeC / Multi Reference Hill-climbing SIgnaling Network Constructor
Enables signaling network construction. MRH-SiNeC is a method for inferring the topology of a signaling network using multiple reference networks, RNA interference (RNAi) data and the phylogenetic distances between these networks. It is based on the conjecture that the topological distances between the signaling networks of different species depend on their evolutionary distance. This method starts by applying SiNeC to each individual reference network and removes any inconsistency imposed by the RNAi constraints.
MR-SiNeC / Multi Reference Signaling Network Constructor
Enables signaling network construction. MR-SiNeC is a method for inferring the topology of a signaling network using multiple reference networks, RNA interference (RNAi) data, and the phylogenetic distances between these networks. It is based on the conjecture that the topological distances between the signaling networks of different species depend on their evolutionary distance. This method reduces running time by combining all the individual reference networks into a single starting point.
SiNeC / Signaling Network Constructor
Furnishes a method for reconstructing the topology of a signaling network. SiNeC works in three steps: (1) it estimates the approximate ordering of the critical genes in the reference network; (2) it removes edges of the reference network that conflict with this ordering; and (3) it inserts the missing edges needed to ensure the flow between consecutive critical genes and the consistency of the remaining genes in the reference network.
S-SiNeC / Scalable Signaling Network Constructor
Enables large-scale signaling network reconstruction. S-SiNeC can construct networks involving hundreds of proteins with minimal loss of optimality. This method has polynomial time complexity but may fail to return a network that satisfies all the constraints enforced by the RNA interference (RNAi) data. It can help biologists construct novel signaling networks from in vivo or in vitro screening experiments.
RMODI / Regression MODelability Index
Estimates how well regression models can perform on datasets of molecules. RMODI is an index that takes into account the nearest neighbors of each molecule and the cardinality of its neighborhood. This algorithm lets users avoid unnecessary tasks or refine the molecular composition of a dataset of interest. It was tested on forty datasets gathered from different sources.
BaoEtAl2018
Predicts potential modification sites with a machine learning method based on amino acid residue features. This algorithm removes redundant potential samples. It builds support vector machine (SVM) and multi-layer neural network models that distinguish modified from non-modified sites based on the selected features. The SVM model also allows identification of post-translationally modified residues in proteomics.
GeRe-ILP
Handles oriented gene orders under three types of weighted rearrangement operations: transposition, inversion, and inverse transposition. GeRe-ILP is an integer linear programming (ILP) approach that computes exact minimum-weight genome rearrangement scenarios for signed gene orders with arbitrary weights. This method accommodates different types of rearrangement operations.
DECtp / Differential Expression Caller by combining tumor purity information
Detects differentially expressed genes (DEGs) between tumor and normal samples. DECtp adjusts for tumor purity in differential expression (DE) calling. This approach models the expression profiles of tumor samples with a Gaussian mixture distribution. The algorithm then applies a generalized least squares procedure and a Wald test to call differential expression.
WangEtAl2018
Provides a deep-learning algorithm, integrated with a multi-threaded processing system, for the automatic detection of polyps during colonoscopy. This algorithm can help assess differences in polyp and adenoma detection performance among endoscopists. It was validated in two image studies and two video studies.
deepMC / deep Matrix completion
Offers an imputation technique for single-cell RNA-seq (scRNA-seq) data. deepMC combines deep matrix factorization and deep dictionary learning and does not assume any distribution for gene expression. This program can be applied to large datasets and has been tested on datasets from four different studies.
NetGO
Aims to improve large-scale automated function prediction (AFP) with massive network information. NetGO is an AFP method that addresses (1) the label side of the multilabel classification problem by using learning to rank (LTR) and (2) the instance (protein) side by incorporating network-based information. This approach enables the incorporation of network information at a large scale. It was validated through comprehensive experiments on large-scale datasets under the Critical Assessment of Functional Annotation (CAFA) settings.
D-GPM / Deep-Gene Promoter Methylation Inference
Predicts genome-wide promoter methylation levels from the methylation profile of landmark genes. D-GPM is a multi-layer deep neural network whose performance was benchmarked against linear regression (LR), regression tree (RT) and support vector machine (SVM) models on Illumina HumanMethylation450 (450k) methylation profiles from The Cancer Genome Atlas (TCGA).
miPrimer
Designs primer pairs with acceptable quantitative polymerase chain reaction (qPCR) efficiency for templates other than microRNA (miRNA). miPrimer is an empirically based method developed by learning from several failed cases during miRNA primer design. It can also distinguish members of the same miRNA family by increasing primer specificity while reducing primer-dimer formation.
Gene Ranker
Identifies key genes in immune diseases. Gene Ranker is an in silico method that first constructs a backbone network based on protein interactions. It can predict key genes even when few genes are already known. It employs semi-supervised learning for gene scoring. This disease-specific method consists of three steps: (1) network construction, (2) network selection and integration, and (3) key gene scoring.
AIDE / Annotation-assisted Isoform Discovery and abundance Estimation
Reduces false isoform discoveries by applying the principle of statistical model selection. AIDE employs a stepwise likelihood-based selection approach that identifies gene and exon boundaries from annotations and borrows information from the annotated isoform structures. This method estimates the abundance of the identified isoforms during isoform reconstruction.
MiMSeg / Mixture Model based Segmentation
Allows automated detection of tumor tissue on nuclear magnetic resonance (NMR) apparent diffusion coefficient maps. MiMSeg reveals tumor heterogeneity by identifying clusters of Gaussian mixture model (GMM) components that correspond to homogeneous areas within the tumor tissue.
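The Gaussian mixture model underlying this kind of segmentation can be fitted to voxel values with expectation-maximization (EM). The sketch below fits a one-dimensional GMM to a list of apparent diffusion coefficient values. It is a generic EM implementation under stated assumptions: a fixed number of components chosen from the initial means, and a variance floor to avoid collapse. MiMSeg's own step of grouping GMM components into tissue clusters is not reproduced. The function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(values, means, iterations=100, min_var=1e-6):
    """Hypothetical helper: fit a 1D Gaussian mixture by EM, starting from the given component means.

    Variances start at the overall data variance and weights start uniform.
    Returns (weights, means, variances).
    """
    k, n = len(means), len(values)
    mu = list(means)
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, min_var)
    var = [overall_var] * k
    weight = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [weight[j] * gaussian_pdf(x, mu[j], var[j]) for j in range(k)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weight[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, values)) / nj, min_var)
    return weight, mu, var

values = [1.0, 1.1, 0.9, 1.05, 0.95, 5.0, 5.1, 4.9, 5.05, 4.95]  # two well-separated tissue classes
print(fit_gmm_1d(values, [0.0, 6.0]))
```

In an imaging setting, each voxel would then be assigned to the component with the highest responsibility, giving a map of homogeneous regions.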
AdResS / Adaptive Resolution Simulation
Allows on-the-fly interchange between the atomistic (AT) and coarse-grained (CG) descriptions of molecules, in either direction, according to their position in space. AdResS lets users control basic thermodynamic and structural properties in the transition region.
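In force-based adaptive resolution schemes, each molecule carries a position-dependent weight w that is 1 in the atomistic region, 0 in the coarse-grained region, and varies smoothly across a hybrid region in between. The force between molecules a and b is then interpolated as w_a·w_b·F_AT + (1 - w_a·w_b)·F_CG. The sketch below uses a cos²-shaped switching function, which is one common choice. It assumes a slab geometry measured by the distance x from the center of the atomistic region and scalar forces for brevity. Function and parameter names are hypothetical, and the exact switching function used in a given AdResS implementation may differ.

```python
import math

def weight(x, d_at, d_hy):
    """Switching function (one common cos^2 form): 1 in the AT region, 0 in the CG region.

    x: distance from the center of the atomistic region; d_at: AT half-width; d_hy: hybrid width.
    """
    r = abs(x)
    if r <= d_at:
        return 1.0
    if r >= d_at + d_hy:
        return 0.0
    return math.cos(math.pi / (2 * d_hy) * (r - d_at)) ** 2

def pair_force(f_at, f_cg, w_a, w_b):
    """Interpolated pair force: w_a*w_b*F_AT + (1 - w_a*w_b)*F_CG."""
    s = w_a * w_b
    return s * f_at + (1 - s) * f_cg

print(weight(2.0, d_at=1.0, d_hy=2.0))  # midway through the hybrid region, close to 0.5
print(pair_force(10.0, 2.0, 1.0, 0.0))  # one molecule fully CG -> pure CG force
```

Making the weight of a pair the product of the individual weights means that any pair involving a fully coarse-grained molecule interacts only through the CG force.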
BurdickEtAl2018
Predicts and detects sepsis. This program consists of a machine learning algorithm (MLA) that determines the risk of sepsis from patient electronic health record data. This tool is designed to improve patient outcomes in a variety of clinical settings. It can also detect severe sepsis using six frequently collected patient measurements.
APD / Advanced Peak Determination
Performs peak detection in survey scans (MS1) to increase the number of precursors selected for unimolecular dissociation (MS2). APD is a peak picking algorithm developed to increase the number of peptides identified in label-free peptide experiments. Its benefit comes from its ability to identify overlapping isotope distributions for MS2 acquisition. This algorithm should not be used in combination with MS2-based quantitative proteomic analyses employing isobaric mass tag labeling.
ConDock
Predicts physically plausible ligand binding sites by combining information from ligand docking and surface conservation. ConDock is a hybrid strategy that combines information from surface conservation with intermolecular interactions from docking calculations. This method was used to predict viable ligand binding sites for four different G protein-coupled estrogen receptor (GPER) ligands.
PREMONition / PREdicting Molecular Networks
Predicts molecular circadian clock associations using functional relationships. PREMONition integrates proteins encoded by known clock genes (when available), rhythmically expressed clock-controlled genes, and non-rhythmically expressed but interacting genes into a cohesive network. The software can be used to identify candidate clock-regulated processes, and thus candidate clock genes, in other organisms.
FuncSFA / Functional Sparse-Factor Analysis
Furnishes a continuous characterization and a functional interpretation of the variation across tumors at the molecular level. FuncSFA is composed of three parts designed to: (1) compute a sparse-factor analysis to obtain the factors, (2) interpret the obtained factors in terms of the biological processes they may represent, and (3) reveal, for each sample, the biological processes likely giving rise to its observed molecular profile.
SMMN / Specific Modules in Multiple Networks
Discovers condition-specific modules by considering multiple networks. SMMN is a heuristic algorithm whose results provide several insights: (i) taking multiple networks into account when characterizing condition-specific modules is effective for guaranteeing their specificity and modularity, (ii) integrative analysis of multiple networks is beneficial, and (iii) the condition-specific modules capture various topological and functional features, providing insights into the mechanisms of cancers.
FuzCav
Allows systematic comparison of protein-ligand binding sites. FuzCav is a generic cavity fingerprint that identifies similarities between ligand-free and ligand-bound active sites. This algorithm does not require prior 3D structural alignment of the proteins being compared and is applicable to any druggable cavity from any protein class. It was used in scenarios such as (1) screening a collection of binding sites for similarity to different queries, (2) classifying protein families by binding site diversity, and (3) discriminating adenine-binding cavities from decoys.
Qiu2018
Allows users to cluster cells based on binarized data. This solution consists of a co-occurrence clustering algorithm that works with binarized single-cell RNA-seq count data. This tool can detect cell populations, as well as cell-type-specific pathways beyond variable genes. It works in two steps: gene pathway identification and cell type discovery.
B-COSFIRE
Performs vessel segmentation in retinal fundus images. B-COSFIRE includes features for identifying patterns in videos. This method optimizes the suppression mechanism for the filter's input and output thresholds. In particular, three parameters are optimized: the preprocessing threshold, the post-processing threshold, and the background artifact size.
FA-VNR / Fragment-Aware Virtual Network Reconfiguration
Allows fragment-aware virtual network reconfiguration. FA-VNR is a heuristic algorithm that selects (i) the set of virtual nodes to be migrated according to the fragment degrees of the physical nodes, and (ii) the best virtual node migration scheme according to the reduction of the fragment degrees of the physical nodes as well as the reduction of the embedding cost of the embedded virtual networks.
EP-DNN / Enhancer Prediction using Deep Neural Network
Allows users to determine enhancers based on chromatin features in different cell types. EP-DNN is a deep neural network-based global enhancer prediction algorithm. It enables researchers to detect enhancers in two distinct cell types: the human embryonic stem cell line H1 and the differentiated primary lung fibroblast cell line IMR90.
Snooker
Allows users to generate pharmacophore hypotheses for compounds binding to the extracellular side of the structurally conserved transmembrane (TM) domain. Snooker can be used to detect receptor-specific ligands and ligand-binding residues in cross-screening. This tool is also suitable for apo proteins and can be applied to all receptors of the G protein-coupled receptor (GPCR) family.
EC-PSI / EC-Pfam Statistical Inferencing
Infers high confidence associations between enzyme commission (EC) numbers and Pfam domains. EC-PSI is designed to directly find associations from existing EC-chain associations from SIFTS and EC-sequence associations from SwissProt and TrEMBL. This algorithm collects and integrates a large number of existing EC-chain/sequence annotations, allowing it to deduce over 8000 direct EC-Pfam associations with respect to the manually curated InterPro database.
LPI-NRLMF / lncRNA-Protein Interactions prediction by Neighborhood Regularized Logistic Matrix Factorization
Predicts potential long non-coding RNA (lncRNA)-protein associations. LPI-NRLMF is a matrix factorization approach for uncovering lncRNA-protein relationships. This method adopts a semi-supervised learning strategy that infers unknown interactions mainly from known interactions and their similarities, so negative samples are not needed. It was assessed by cross-validation on known experimental lncRNA-protein interaction scores.
AbassEtAl2018
Detects the position of the human eye limbus in three dimensions and measures the full 360° visible iris boundary. This approach presents a non-parametric method for limbus detection and a dynamic method for measuring the white-to-white distance along the horizontal line of the eye, which is used as a predictor of the limbus, sulcus, and effective intraocular lens position (ELP) in some important clinical applications.
BARTMAP / Biclustering ARTMAP
Performs biclustering on gene expression data, particularly for cancer class discovery. BARTMAP is a biclustering algorithm adapted and modified from a neural-based classifier, Fuzzy ARTMAP. It consists of two Fuzzy ART modules that communicate through an inter-ART module. This method can detect atypical patterns during learning. It can be applied to high-dimensional data.
SB-CWT / Continuous-Wavelet-Transform-based Sub-Band rPPG
Decomposes RGB signals into sub-bands for remote photoplethysmography (rPPG). SB-CWT applies a weighting function based on the global energy distribution, which serves as an additional filter for undesired signal components. This tool combines the individual sub-band pulse signals into a single output pulse signal. The algorithm was tested on the publicly available MMSE-HR dataset.
DHLP / Distributed Heterogeneous Label Propagation
Performs label propagation in heterogeneous networks. DHLP is an algorithm for discovering potential drug-target, drug-disease, and disease-target interactions. It is available in two versions, DHLP-1 and DHLP-2, both of which can determine interactions among drugs, targets, and diseases. The second version also allows the label value of each vertex to be specified in the code.
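The basic label propagation update can be sketched on a single undirected graph. The update is F ← α·S·F + (1 - α)·Y, where Y holds the initial label values and S = D^-1/2 W D^-1/2 is the symmetrically normalized weight matrix. This is a simplified, assumption-based illustration. DHLP operates on a heterogeneous drug, target and disease network in a distributed setting, and neither aspect is reproduced here. The function name, the damping value `alpha`, and the fixed iteration count are hypothetical choices, and edge weights are assumed to be positive.

```python
import math

def label_propagation(edges, seeds, alpha=0.8, iterations=50):
    """Hypothetical sketch: propagate seed label values over an undirected weighted graph.

    edges: (u, v, weight) triples with positive weights.
    seeds: node -> initial label value (other nodes start at 0).
    Iterates F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2.
    """
    nbrs = {}
    for u, v, w in edges:
        nbrs.setdefault(u, {})[v] = w
        nbrs.setdefault(v, {})[u] = w
    degree = {n: sum(ws.values()) for n, ws in nbrs.items()}
    y = {n: seeds.get(n, 0.0) for n in nbrs}
    f = dict(y)
    for _ in range(iterations):
        # Synchronous update: every node uses the scores from the previous iteration.
        f = {
            n: alpha * sum(w * f[m] / math.sqrt(degree[n] * degree[m]) for m, w in nbrs[n].items())
            + (1 - alpha) * y[n]
            for n in nbrs
        }
    return f

scores = label_propagation([("A", "B", 1.0), ("B", "C", 1.0), ("C", "D", 1.0)], {"A": 1.0})
print(scores)  # scores decay with distance from the labeled node A
```

The `(1 - alpha)` term keeps pulling scores back toward the known labels. Because α < 1, the iteration converges to a fixed point.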
ColonFlag
Identifies individuals at elevated risk of colorectal cancer for intensified screening. ColonFlag is a machine-learning-based algorithm for detecting the presence of high-risk adenomatous polyps at colonoscopy. It incorporates several patient factors such as age, sex, hemoglobin, red blood cell parameters, and white blood cell parameters. ColonFlag was able to detect individuals at elevated risk of having colorectal cancer or a high-risk precancerous polyp using only routinely collected complete blood cell counts and the patient’s age and sex.
GRAM
Predicts the expression-modulating effect of a non-coding variant in a cell-specific manner. GRAM is a generalized model that incorporates selected transcription factor (TF) binding information from in vitro SELEX assays, representing the general binding potential of TFs on the variant’s location, and cell type-specific expression profiles, representing cellular contexts. It can be useful for elucidating the underlying patterns of variants that modulate expression in a cell-type context.
DSFPSO / Dynamic Scale-Free network Particle Swarm Optimization
Extracts features from multi-omics data. DSFPSO is a particle swarm optimization (PSO) algorithm with a dynamic scale-free network. It uses four velocity-updating strategies to fully account for the heterogeneity of particles and the connections between neighbors. This method can be used to extract genes associated with cancers.
RUNES / Rapid Understanding of Nucleotide variant Effect Software
Allows users to study the effect of nucleotide variants. RUNES can perform several steps of variant characterization, annotation, and conversion to Human Genome Variation Society (HGVS) nomenclature. This tool produces a report containing all of the information useful for variant interpretation, together with a cumulative variant allele frequency and a composite American College of Medical Genetics (ACMG) categorization of variant pathogenicity.
ABSR / Autoregressive Bayesian Spectral Regression
Determines the rhythmicity of gene expression profiles from short time series. ABSR is an algorithm that estimates the period of time-course experimental data and simultaneously classifies gene expression profiles into multiple rhythmic categories. It improves the true discovery rate (TDR) and reduces the false discovery rate (FDR) for noisy short time series. It can be useful, for instance, for users who want to maximize the discovery of rhythmic genes from data with 4-hour temporal resolution.
MAPINS / MAPping INSertions
Detects insertion sites created by the integration of an APHVIII (aminoglycoside 3′-phosphotransferase VIII) cassette that confers paromomycin resistance. MAPINS can also identify flanking genomic DNA sequences around different insertion sites. It uses whole-genome sequencing data and retains all the sequencing information provided by paired-end sequencing.
Oncofinder
Allows both quantitative and qualitative analysis of intracellular signaling pathway activation (SPA). Oncofinder is a biomathematical method that lets users quantitatively estimate SPA for individual samples based on large-scale gene expression data. This method processes transcriptomic data at high throughput and can also be applied to proteomic data sets, as advances in proteomics now make it possible to generate expression data at the proteome scale.
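A minimal sketch of a per-sample pathway score follows. It assumes the pathway activation strength formulation commonly attributed to Oncofinder, PAS = Σ ARR · BTIF · lg(CNR), summed over the genes of one pathway. In this formulation, CNR is a gene's case-to-normal expression ratio, ARR is the gene's activator/repressor role in the pathway (for example +1 or -1), and BTIF is a flag set to 1 when the sample's expression lies outside the tolerance interval of the normal samples. The function name and argument layout are hypothetical, and the role values and tolerance-interval test are assumed to be computed upstream.

```python
import math
from typing import Sequence


def pathway_activation_strength(cnr: Sequence[float], arr: Sequence[float],
                                btif: Sequence[int]) -> float:
    """Sum of ARR * BTIF * log10(CNR) over the genes of one pathway (assumed formulation).

    cnr  -- case-to-normal expression ratio per gene (must be > 0)
    arr  -- activator/repressor role per gene (e.g. 1 for activator, -1 for repressor)
    btif -- 1 if the gene lies beyond the normal tolerance interval, else 0
    """
    if not (len(cnr) == len(arr) == len(btif)):
        raise ValueError("cnr, arr and btif must have the same length")
    total = 0.0
    for ratio, role, flag in zip(cnr, arr, btif):
        if ratio <= 0:
            raise ValueError("case-to-normal ratios must be positive")
        total += role * flag * math.log10(ratio)
    return total
```

A positive score suggests net activation of the pathway in that sample, and a negative score suggests net suppression. Genes whose expression stays within the normal tolerance interval contribute nothing.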
SMART / Splitting Merging AwaReness Tactics
Assists users in clustering different types of genes. SMART splits and merges clusters automatically during the clustering process. It does not require a priori knowledge of the number of clusters. The algorithm integrates several tasks, such as cluster validation, into the clustering process. The method is implemented as two algorithms that draw on two distinct clustering paradigms: competitive learning and finite mixture models.
iRSpot-DACC
Enables identification of recombination hot/cold spots in yeast. iRSpot-DACC predicts recombination hot/cold spots across the yeast genome. This method combines support vector machines (SVMs) with a feature called dinucleotide-based auto-cross covariance (DACC), which incorporates global sequence-order information and fifteen local DNA properties into the predictor. It can improve predictive accuracy and reduce computational cost.
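The DACC feature vector can be sketched as follows, assuming the usual definition of dinucleotide covariance. For a sequence of length L, each of the L-1 overlapping dinucleotides is mapped to a property value P_u. For each lag, the features are the averaged products (P_u1(D_i) - mean_u1)(P_u2(D_i+lag) - mean_u2) over the L-lag-1 valid positions. When u1 equals u2 this is an auto-covariance; otherwise it is a cross-covariance. The two "properties" below (G/C count and purine count per dinucleotide) are toy placeholders, not the fifteen physicochemical indices used by iRSpot-DACC, and all names are hypothetical.

```python
from itertools import product
from typing import Dict, List

# Toy dinucleotide properties (placeholders, NOT iRSpot-DACC's physicochemical indices).
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]
PROPERTIES: Dict[str, Dict[str, float]] = {
    "gc_count": {d: float(sum(b in "GC" for b in d)) for d in DINUCS},
    "purine_count": {d: float(sum(b in "AG" for b in d)) for d in DINUCS},
}


def _property_series(seq: str, prop: Dict[str, float]) -> List[float]:
    """Property value of every overlapping dinucleotide in seq (L - 1 values)."""
    seq = seq.upper()
    return [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]


def dinucleotide_covariance(seq: str, lag: int, prop1: str, prop2: str) -> float:
    """Auto-covariance if prop1 == prop2, cross-covariance otherwise, at the given lag."""
    s1 = _property_series(seq, PROPERTIES[prop1])
    s2 = _property_series(seq, PROPERTIES[prop2])
    n_pairs = len(s1) - lag  # equals L - lag - 1
    if lag < 0 or n_pairs <= 0:
        raise ValueError("lag must be non-negative and shorter than the sequence allows")
    m1 = sum(s1) / len(s1)
    m2 = sum(s2) / len(s2)
    return sum((s1[i] - m1) * (s2[i + lag] - m2) for i in range(n_pairs)) / n_pairs


def dacc_features(seq: str, max_lag: int) -> List[float]:
    """Concatenate auto- and cross-covariances for lags 1..max_lag and all property pairs."""
    names = sorted(PROPERTIES)
    return [dinucleotide_covariance(seq, lag, p1, p2)
            for lag in range(1, max_lag + 1)
            for p1 in names
            for p2 in names]
```

Each sequence yields a fixed-length vector of max_lag × (number of properties)² values regardless of its length. That fixed length is what lets the vectors be passed to an SVM classifier, which is not shown here.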
GIGA / Gene tree Inference in the Genomic Age
Aims to construct trees in an agglomerative way from a distance matrix representation of sequences. GIGA is an algorithm that assists users with phylogenetic reconstruction of large gene families and determination of orthologs on a large scale. This method makes use of a conceptualization of gene trees as being composed of orthologous subtrees which are joined by other evolutionary events such as gene duplication or horizontal gene transfer.
kmlShape
Clusters trajectories according to their shapes. kmlShape is a longitudinal data partitioning algorithm based on a variant of the k-means algorithm that uses a “shape-respecting distance” and a “shape-respecting mean”. This method groups individuals whose trajectories have similar forms but are shifted in time. On real datasets, it detects groups of individuals of non-negligible size.
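To our understanding, kmlShape's shape-respecting distance is based on the Fréchet distance. The sketch below computes the discrete Fréchet distance (the Eiter and Mannila dynamic program) between two trajectories given as (time, value) points. It is illustrative only. kmlShape's own implementation, its shape-respecting mean, and the k-means loop around them are not reproduced, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (time, value)


def discrete_frechet(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Discrete Frechet distance between two polylines via dynamic programming."""
    if not p or not q:
        raise ValueError("trajectories must be non-empty")
    n, m = len(p), len(q)
    # ca[i][j] = smallest achievable max distance when coupling p[:i+1] with q[:j+1]
    ca: List[List[float]] = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]
```

Two copies of the same curve shifted by one time unit are only 1.0 apart under this distance, because the coupling can align corresponding points. A pointwise comparison at equal times would instead penalize the shift.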
POPE / Post Optimization Posterior Evaluation
Performs post optimization posterior evaluation (POPE) of simulators. POPE computes and visualizes all simulations that can generate results of the same or better quality than the optimum, subject to constraints. This method differs from standard approximate Bayesian computation (ABC): standard ABC compares simulator outcomes with observations, whereas POPE reasons about an optimization problem. This algorithm enables users to explore and understand the role that constraints, on both the input and the output, have on the optimization posterior.
SOTXTSTREAM
Extends the functionality of the SOSTREAM algorithm. SOTXTSTREAM aims to make the previous algorithm more efficient for clustering streaming text. It reduces the number of micro-clusters produced and is designed mainly for real-world disparate text streams with synthetic concept drift. Compared to SOSTREAM, it decouples the three cluster-maintenance phases (insertion, adjusting, and merging) from their dependence on a single neighborhood size parameter.
FCRC / Filtering features, Clustering variables, Reducing the clusters and the dimension, and Clustering samples
Enables disease classification based on DNA methylation (DNAm) signatures. FCRC is an unsupervised learning method that consists of four steps: (1) preprocessing, (2) filtering features, (3) clustering variables and dimension reduction, and (4) final sample clustering. This algorithm combines pre-filtering of the data to identify the most promising methylation sites, clustering to identify co-varying sites, and an iterative method that further refines the signatures, resulting in a clustering framework. It was applied to several DNA methylation array datasets.
RNA-Specificity Metric
Quantifies the RNA specificity of protein-RNA interface predictions. RNA-Specificity Metric offers metrics for assessing and comparing RNA partner-specific protein-RNA interface prediction models. This algorithm could be adapted to assess the protein partner-specificity of predicted protein-binding ribonucleotides (rNTs).
DI / Directionality Index
Aims to identify topological domains in the genome. DI is a statistical method that quantifies the degree of upstream or downstream interaction bias of a genomic region. It provides a reproducible approach that uses a hidden Markov model (HMM) to identify biased “states” and thereby infer the locations of topological domains in the genome.
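The statistic itself can be sketched in a few lines, using the directionality index as commonly defined for Hi-C. Let A be the number of reads from a bin to the upstream window and B the number to the downstream window, with E = (A + B) / 2. Then DI = sign(B - A) · ((A - E)² / E + (B - E)² / E). The `window` parameter (in bins) and the contact-matrix helper are illustrative assumptions, and the HMM that turns the DI track into domain calls is not shown.

```python
from typing import List, Sequence


def directionality_index(upstream: float, downstream: float) -> float:
    """DI = sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E), with E = (A + B) / 2."""
    a, b = float(upstream), float(downstream)
    if a < 0 or b < 0:
        raise ValueError("read counts must be non-negative")
    if a == b:
        return 0.0  # no bias (also covers a == b == 0)
    e = (a + b) / 2.0
    sign = 1.0 if b > a else -1.0
    return sign * ((a - e) ** 2 / e + (b - e) ** 2 / e)


def di_track(matrix: Sequence[Sequence[float]], window: int) -> List[float]:
    """DI for every bin of a symmetric contact matrix, using `window` bins on each side."""
    n = len(matrix)
    track = []
    for i in range(n):
        a = sum(matrix[i][j] for j in range(max(0, i - window), i))
        b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
        track.append(directionality_index(a, b))
    return track
```

Positive values mark bins that interact mostly downstream, and negative values mark bins that interact mostly upstream. Domain boundaries tend to appear where the track switches from negative to positive.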
CHDF / Clustering based Hi-C Domain Finder
Assists users in identifying Hi-C domains. CHDF is a method based on the difference in interaction intensity inside versus outside domains. The chromatin domains it defines can be validated with higher-resolution local chromatin structure data and with other kinds of experimental datasets. It can also identify smaller chromatin domains.
aCGH-MAS / aCGH MultiAgent System
Performs analysis of array comparative genomic hybridization (aCGH). aCGH-MAS is a multiagent system for managing the information of aCGH arrays, with the aim of providing an extensible system to analyze and interpret the results. It is composed of three layers: analysis, information management, and visualization. This system allows the reuse of functionalities for specific layers and the addition of agents specialized in specific case studies.
LuEtAl2015
Performs allelic RNA expression imbalance (AEI) analysis using RNA-seq data. This method provides a framework for detecting cases of AEI, and hence cis-acting regulatory factors, from RNA-seq data. It can be useful when scanning RNA-seq datasets from multiple tissues for AEI signals, particularly when many genes carry only a small number of heterozygous single nucleotide polymorphisms (SNPs).
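A common building block of AEI detection is an exact binomial test of the reference versus alternative read counts at a heterozygous SNP, with 0.5 as the expected allele fraction. The sketch below implements that generic test. It is not necessarily the statistical model used by this framework, and it ignores reference-mapping bias and overdispersion. The function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k.

    k -- reads supporting one allele at a heterozygous SNP
    n -- total reads covering the SNP
    p -- expected allele fraction under balanced expression
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    pk = probs[k]
    tol = 1e-7 * pk  # treat near-equal probabilities as ties despite rounding
    return min(1.0, sum(q for q in probs if q <= pk + tol))
```

For example, 20 of 20 reads from one allele gives p = 2 × 0.5²⁰, whereas a 10/10 split gives p = 1. With few heterozygous SNPs per gene, the per-SNP evidence is weak, so combining SNPs or tissues matters.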
TCRep 3D
Consists of an automated, systematic approach to TCR-peptide-MHC class I structure prediction based on homology and ab initio modeling. TCRep 3D is a method dedicated to high-quality modeling of T-cell receptor-peptide-major histocompatibility complex (TCRpMHC) complexes, with a focus on the structure of the complementarity-determining region (CDR) loops.
ISFLA / Improved Shuffled Frog Leaping Algorithm
Consists of an extended shuffled frog leaping algorithm (SFLA) for feature selection in high-dimensional biomedical data. ISFLA has been applied both to the multimode resource-constrained project scheduling problem and to feature selection in high-dimensional biomedical data. This algorithm can serve as a preprocessing tool that optimizes the feature selection process for high-dimensional biomedical data, helps mine functional information from biological datasets in the field of disease diagnosis, and improves the efficiency of disease diagnosis.
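The basic SFLA loop that ISFLA extends can be sketched under stated assumptions. Frogs are ranked and dealt round-robin into memeplexes. Within each memeplex, the worst frog leaps a random fraction of the way toward the best one, and the memeplexes are then shuffled back together. This sketch covers only the standard ingredients. ISFLA's improvements, the fallback leaps toward the global best or a random frog, and the binary encoding needed for feature selection are not shown. All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Frog = List[float]
Fitness = Callable[[Frog], float]  # higher is better


def partition_memeplexes(frogs: Sequence[Frog], fitness: Fitness, m: int) -> List[List[Frog]]:
    """Rank frogs best-first and deal them into m memeplexes in round-robin order."""
    if m <= 0:
        raise ValueError("m must be positive")
    ranked = sorted(frogs, key=fitness, reverse=True)
    memeplexes: List[List[Frog]] = [[] for _ in range(m)]
    for rank, frog in enumerate(ranked):
        memeplexes[rank % m].append(frog)
    return memeplexes


def leap_worst(memeplex: List[Frog], fitness: Fitness, rng: random.Random) -> bool:
    """Move the worst frog a random fraction of the way toward the memeplex best.

    The move is kept only if it improves fitness; returns True in that case.
    """
    best = max(memeplex, key=fitness)
    worst_idx = min(range(len(memeplex)), key=lambda i: fitness(memeplex[i]))
    worst = memeplex[worst_idx]
    step = rng.random()
    candidate = [wd + step * (bd - wd) for wd, bd in zip(worst, best)]
    if fitness(candidate) > fitness(worst):
        memeplex[worst_idx] = candidate
        return True
    return False


def shuffle(memeplexes: List[List[Frog]]) -> List[Frog]:
    """Recombine all memeplexes into one population before the next partition."""
    return [frog for mp in memeplexes for frog in mp]
```

Dealing frogs round-robin spreads good and poor solutions evenly across memeplexes. This lets each local search start from a mix of solution qualities.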
SOPRIM / Structure-guided Roadmap-based Protein Transition Modeling
Computes protein transition paths. SOPRIM is a sampling-based algorithm that addresses insufficient sampling by leveraging experimentally determined structures of a protein to restrict sampling to a space of reasonable dimensionality and to regions relevant to transition events. It can compute lowest-cost paths between any given structures and allows investigation of hypotheses about the order of experimentally known structures in a transition event.
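Once a roadmap has been built, computing a lowest-cost path between two structures is a shortest-path query. The sketch below runs Dijkstra's algorithm on a roadmap whose nodes are sampled conformations and whose edges carry non-negative costs. How SOPRIM samples nodes, connects them, and assigns edge costs (for example from energies) is not reproduced. The graph layout and node names are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, non-negative edge cost)]


def lowest_cost_path(graph: Graph, start: str, goal: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm: total cost and node sequence of the cheapest start->goal path."""
    dist: Dict[str, float] = {start: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, start)]
    visited: Set[str] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == goal:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        raise ValueError("goal is not reachable from start")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]
```

Ordering hypotheses could then be probed by checking whether the cheapest path between two end states passes through a given experimental structure.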
DiscMLA / Discriminative Motif Learning via AUC
Discovers motifs in high-throughput datasets. DiscMLA is an approach for learning motifs from high-throughput ChIP-seq datasets. This algorithm directly optimizes the area under the curve (AUC) during the search process, which improves the screening of candidate motifs. The algorithm can identify only the conserved sites of a given transcription factor (TF). It can help users exploit high-throughput datasets and answer fundamental biological questions.
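The objective that DiscMLA optimizes can be made concrete with a short sketch. Each sequence is scored by its best-matching window under a position weight matrix (PWM). The AUC is then the probability that a randomly chosen positive (bound) sequence outscores a randomly chosen negative one, with ties counted as one half. The log-odds PWM values and function names below are hypothetical. The search procedure that adjusts the motif to maximize this AUC is not shown.

```python
from typing import Dict, List, Sequence

PWM = List[Dict[str, float]]  # one column of log-odds scores per motif position


def best_site_score(seq: str, pwm: PWM) -> float:
    """Highest PWM score over all windows of seq (one strand only)."""
    w = len(pwm)
    if len(seq) < w:
        raise ValueError("sequence shorter than motif")
    return max(sum(col[base] for col, base in zip(pwm, seq[i:i + w]))
               for i in range(len(seq) - w + 1))


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Mann-Whitney estimate of the AUC: P(positive > negative), ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

The pairwise loop is quadratic and is meant only to show the definition. Sorting-based rank formulas compute the same value faster on ChIP-seq-sized sets.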
LigandScout
Aims to highlight pharmacophore information from known and unknown protein-ligand complexes. LigandScout is an algorithm that automatically extracts ligands and interprets them, including identification of the relevant amino acids. This method aims to make the interpretation of ligand topology more efficient by generating pharmacophore models.
RNA ISRAEU / RNA Identification of Structured Regions As Evolutionary Unchanged
Identifies RNA regions possessing evolutionarily conserved secondary structures. RNA ISRAEU allows rational design of oligonucleotide cocktails that interfere with multiple computationally predicted structures, so that no single mutation or small number of mutations would produce a resistant viral strain. This tool assists in designing attenuated vaccines and drugs based on disrupting conserved RNA structural elements.
LSHPlace
Performs phylogenetic placement. LSHPlace is an algorithm that uses locality-sensitive hashing and includes inferred ancestral sequences in the hash tables, allowing users to approximately locate new sequences on an existing phylogenetic tree. A local-search procedure then identifies an optimal placement for each new sequence. This algorithm can be useful in metagenomic sampling, where millions of sequence reads are generated and analyzed at the same time to characterize environments.
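One standard locality-sensitive hashing scheme for sequences is MinHash over k-mer sets combined with banding into hash tables. The sketch below uses that scheme to shortlist the reference sequences (which, in LSHPlace's setting, would include inferred ancestral sequences) whose band signatures collide with a query's. This is a generic LSH sketch under assumptions. LSHPlace's actual hash family, k, table layout, and subsequent local search are not reproduced, and all names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

BandKey = Tuple[int, Tuple[int, ...]]


def kmers(seq: str, k: int) -> Set[str]:
    """Set of all overlapping k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    """Seeded hash: SHA-1 of 'seed:kmer' read as an integer."""
    return int(hashlib.sha1(f"{seed}:{kmer}".encode()).hexdigest(), 16)


def minhash_signature(seq: str, k: int, num_hashes: int) -> List[int]:
    """Minimum hash value of the k-mer set under each of num_hashes seeded hashes."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence shorter than k")
    return [min(_hash(x, s) for x in ks) for s in range(num_hashes)]


def _bands(sig: List[int], bands: int) -> List[BandKey]:
    """Split a signature into equal bands; each band becomes one hash-table key."""
    if bands <= 0 or len(sig) % bands:
        raise ValueError("num_hashes must be a positive multiple of bands")
    rows = len(sig) // bands
    return [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]


def build_index(references: Dict[str, str], k: int, num_hashes: int,
                bands: int) -> Dict[BandKey, Set[str]]:
    """Hash every reference sequence into one bucket per band."""
    index: Dict[BandKey, Set[str]] = defaultdict(set)
    for name, seq in references.items():
        for key in _bands(minhash_signature(seq, k, num_hashes), bands):
            index[key].add(name)
    return index


def candidates(query: str, index: Dict[BandKey, Set[str]], k: int,
               num_hashes: int, bands: int) -> Set[str]:
    """References sharing at least one full band with the query's signature."""
    found: Set[str] = set()
    for key in _bands(minhash_signature(query, k, num_hashes), bands):
        found |= index.get(key, set())
    return found
```

Sequences with high k-mer Jaccard similarity are likely to collide in at least one band. This keeps the candidate set small, so that the more expensive placement step only needs to examine those references.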
Cox-MDR / Cox Multifactor Dimensionality Reduction method
Consists of an extension of generalized multifactor dimensionality reduction (GMDR) to survival phenotypes. Cox-MDR is an algorithm that uses the martingale residual of the Cox regression model as a score to classify multi-locus genotype combinations into high- and low-risk groups. It can adjust for covariates and can be extended to some types of high-dimensional data, such as copy number variation (CNV) and next-generation sequencing (NGS) data.
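The classification step can be sketched for the simplest case, with no covariates. Under a null Cox model, the martingale residual of sample i is δ_i − Ĥ(t_i), where δ_i is the event indicator and Ĥ is the Nelson-Aalen estimate of the cumulative hazard. Each multi-locus genotype cell is then labelled high risk when the sum of its residuals is positive, which is our reading of the rule. Covariate adjustment, which would take residuals from a Cox model fitted on the covariates alone, and the cross-validation used to choose the best locus combination are not shown. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]  # multi-locus genotype, e.g. (0, 2) for two SNPs


def null_martingale_residuals(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """delta_i - H(t_i), with H the Nelson-Aalen cumulative hazard (no covariates)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    cumhaz: List[Tuple[float, float]] = []
    h = 0.0
    for s in event_times:
        at_risk = sum(1 for t in times if t >= s)
        deaths = sum(1 for t, e in zip(times, events) if e and t == s)
        h += deaths / at_risk
        cumhaz.append((s, h))

    def hazard_at(t: float) -> float:
        value = 0.0
        for s, hs in cumhaz:  # last event time <= t
            if s > t:
                break
            value = hs
        return value

    return [e - hazard_at(t) for t, e in zip(times, events)]


def classify_cells(genotypes: Sequence[Cell], residuals: Sequence[float]) -> Dict[Cell, str]:
    """Label each genotype cell 'high' if its summed residual is positive, else 'low'."""
    if len(genotypes) != len(residuals):
        raise ValueError("one residual per sample is required")
    sums: Dict[Cell, float] = defaultdict(float)
    for g, r in zip(genotypes, residuals):
        sums[g] += r
    return {g: ("high" if s > 0 else "low") for g, s in sums.items()}
```

A positive summed residual means the cell's samples had more events than the model expected. That is why such cells are treated as the high-risk group.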
AMLGAM / Automated Machine Learning Guided Atom Mapping
Enables estimation of bond stabilities based on the chemical environment of each bond using machine learning techniques. AMLGAM is an automated optimization-based approach that finds the reaction mechanism which favors the breakage/formation of the less stable bonds. It was tested on a manually curated dataset of 382 chemical reactions and run on a large and diverse dataset of more than 7,400 chemical reactions.
MuSSeL / Multi-fingerprint Similarity Search aLgorithm
Identifies drug targets. MuSSeL is a method that reports the targets that can interact with a molecule of interest, together with the corresponding bioactivity measures in terms of Ki and IC50 values. This approach includes two main steps: (i) it selects drug targets biased by the query compound; (ii) it predicts Ki or IC50 values for each selected drug target. It can be used to pair novel compounds with putative drug targets or to repurpose known drugs for apparently unrelated diseases.
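The similarity search behind step (i) can be sketched with Tanimoto similarity computed separately for several fingerprint types. Each fingerprint is represented as the set of its "on" bits. In this sketch the per-type scores are simply averaged, which is an assumption. How MuSSeL actually combines fingerprints and predicts Ki or IC50 values is not reproduced, and the fingerprint type names in the example are placeholders.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprints = Dict[str, FrozenSet[int]]  # fingerprint type -> indices of set bits


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Size of the intersection over size of the union; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def multi_fp_similarity(query: Fingerprints, ref: Fingerprints) -> float:
    """Average Tanimoto similarity over the fingerprint types both molecules share."""
    shared = sorted(query.keys() & ref.keys())
    if not shared:
        raise ValueError("no fingerprint type in common")
    return sum(tanimoto(query[k], ref[k]) for k in shared) / len(shared)


def rank_ligands(query: Fingerprints, library: Dict[str, Fingerprints]) -> List[Tuple[str, float]]:
    """Known ligands sorted by decreasing similarity to the query."""
    scored = [(name, multi_fp_similarity(query, fps)) for name, fps in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The annotated targets of the top-ranked ligands would then serve as candidate targets for the query compound.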
MAIA / Multidimensional Assessment of Interoceptive Awareness
Measures multiple dimensions of interoception by self-report. MAIA is an 8-scale state-trait questionnaire with 37 items that has been translated into 20 other languages and used in numerous studies worldwide. It consists of eight scales corresponding to its 8-factor structure: noticing, not-distracting, not-worrying, attention regulation, emotional awareness, self-regulation, body listening, and trust. It can be useful for interoception research and the evaluation of clinical mind-body interventions.
EBHiC / Empirical Bayes model for peak detection from HiC
Enables peak detection from Hi-C data. EBHiC identifies peaks with improved accuracy, biological interpretability, and consistency across biological replicates. This tool offers principled probability distribution estimates for Hi-C counts and models over-dispersion flexibly by explicitly including the “true” interaction intensities as latent variables, without any restrictive parametric assumptions.
LCP / Locally Consistent Parsing
Aims to improve string processing. LCP starts from a set of patterns and uses a two-step approach that (i) handles each label in the string whose neighbors are not identical to it, and then (ii) handles substrings that consist of a single repeating label. It then generates a data structure that supports insertion, deletion, and search.
SSCMDA / Spy and Super Cluster strategy for MiRNA-Disease Association prediction
Predicts potential miRNA-disease associations. SSCMDA is based on known miRNA-disease associations, integrated disease similarity, and integrated miRNA similarity. This tool adopts a spy strategy to identify reliable negative samples among all unknown miRNA-disease pairs, which contain a mixture of potential associations and real negative samples.
GIMDA / Graphlet Interaction for MiRNA‐Disease Association prediction
Integrates disease semantic similarity, miRNA functional similarity, Gaussian interaction profile kernel similarity, and experimentally confirmed miRNA‐disease associations into a prediction model. GIMDA describes the complex relationship between two nodes based on graphlet interaction, considering both direct and indirect links between the nodes. This method combines the association score of a miRNA‐disease pair calculated in the miRNA graph with the score calculated in the disease graph, which makes it applicable to predicting new diseases without any known related miRNAs, or new miRNAs without any known related diseases.
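The Gaussian interaction profile (GIP) kernel similarity is commonly defined as follows for a binary association matrix. Each row is an interaction profile IP. The kernel is K(i, j) = exp(−γ ‖IP(i) − IP(j)‖²), with γ = γ′ / (mean squared norm of the profiles) and γ′ usually set to 1. The sketch below computes it over the rows of a disease-by-miRNA matrix, and over the columns when miRNA similarity is needed. The graphlet-interaction scoring that GIMDA builds on top of these similarities is not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def gip_kernel(adjacency: Sequence[Sequence[int]], gamma_prime: float = 1.0) -> List[List[float]]:
    """GIP kernel between the rows (interaction profiles) of a binary association matrix."""
    profiles = [[float(x) for x in row] for row in adjacency]
    if not profiles:
        raise ValueError("empty association matrix")
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / len(profiles)
    if mean_sq_norm == 0:
        raise ValueError("association matrix contains no known links")
    gamma = gamma_prime / mean_sq_norm  # bandwidth normalized by average profile size

    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(p, q)) for q in profiles] for p in profiles]


def columns(adjacency: Sequence[Sequence[int]]) -> List[List[int]]:
    """Transpose, so that miRNA profiles become rows."""
    return [list(col) for col in zip(*adjacency)]
```

With rows as diseases, `gip_kernel(A)` gives disease-disease similarity and `gip_kernel(columns(A))` gives miRNA-miRNA similarity. Nodes with identical association profiles receive a similarity of exactly 1.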
CsreHMM / cell type-specific regulatory elements by Hidden Markov Model
Detects cell type-specific regulatory elements (CSREs). CsreHMM is an integrative and comparative method based on a hidden Markov model that systematically reveals CSREs along the whole genome and simultaneously recognizes the histone codes (mark combinations) characterizing them. This method also reveals subclasses of CSREs and labels those shared by a few cell types.