I-Boost / Integrative Boosting
Assists users with variable selection and outcome prediction. I-Boost is a method that combines elastic net with boosting. This model can be appropriate for users who aim to consider multiple genomic and/or proteomic data types simultaneously. It also allows users to build on clinical variables and to extract additional useful information from genomic variables to improve prediction.
EGSCyP / Exhaustive Grid Search for Cyclic Peptides
Assists users in exhaustively exploring the energy landscape of cyclic pentapeptides, possibly involving chemical modifications. EGSCyP is based on a robotic approach and a multi-level representation of the peptide. This approach can be exploited within stochastic exploration-optimization methods, such as variants of Basin Hopping (BH), and is able to provide a global picture of the conformational landscape.
EPAA / Evaluating Pairwise Alignment Algorithms
Assists users in evaluating and choosing a pairwise alignment algorithm. EPAA is based on the construction of a distance matrix coupled with a k-means method to measure algorithm performance. The method generates two plots: one with the original family labels and one with the clusters. It can be used to evaluate alignment quality as well as to deduce evolutionary relationships from clusters.
Offers a method for pairwise protein structure alignment. EDAlign SSE reconstructs a detected structural alignment by applying a rigid transformation after determining both the residue correspondence and the assignment of residues to secondary structure elements. This algorithm can be used with proteins of unequal lengths.
Proposes an approach for pairwise protein structure alignment. EDAlign RES starts from two proteins of equal length to determine the best structural alignment. The method refines a correspondence that is first derived from eigendecomposition.
PCSD / Participation degree of a protein in protein Complexes and Subgraph Density
Ranks all proteins in refined protein-protein interaction (PPI) networks (RPINs) according to computed scores. PCSD is an essential protein prediction method that combines the participation degree in protein complexes with subgraph density. This tool builds a refined PPI network and computes each protein's participation degree in complexes from the weighted RPINs produced with the Edge Clustering Coefficient (ECC) and the Pearson Correlation Coefficient (PCC).
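Both edge weights named above have standard definitions. The sketch below assumes the form of ECC commonly used in essential-protein prediction, ECC(u, v) = z_uv / min(d_u - 1, d_v - 1), where z_uv is the number of triangles containing the edge, and computes PCC from expression profiles. How PCSD combines these weights, refines the network, and scores complexes is not reproduced; all function names and data are hypothetical.

```python
from math import sqrt

# Hypothetical helpers illustrating ECC and PCC edge weights; not PCSD's own code.

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def edge_clustering_coefficient(adj, u, v):
    """ECC(u, v) = z_uv / min(d_u - 1, d_v - 1); z_uv = triangles containing edge (u, v)."""
    z = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return z / denom if denom > 0 else 0.0

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def edge_weights(edges, expression):
    """Map each edge to its (ECC, PCC) pair."""
    adj = build_adjacency(edges)
    return {(u, v): (edge_clustering_coefficient(adj, u, v),
                     pearson(expression[u], expression[v]))
            for u, v in edges}

edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
expression = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 2, 1], "D": [1, 3, 2]}
weights = edge_weights(edges, expression)
```

Here the triangle A-B-C gives its edges an ECC of 1, while the pendant edge C-D gets 0.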
LIDC / Local Interaction Density combined with protein Complexes
Provides a method for identifying essential proteins in a protein interaction network. LIDC combines local interaction density with protein complex information and consists of three components: (1) a centrality measure, named local interaction density (LID), based on the local topological properties of proteins in protein complexes; (2) the biological information of protein complexes, called in-degree centrality of complex (IDC); and (3) an integration strategy that combines LID and IDC.
WDC / Weighted Degree Centrality
Predicts essential proteins from global protein-protein interaction (PPI) and gene expression data. WDC integrates gene expression profiles into the PPI network through the Pearson correlation coefficient to bridge the gap between PPI and gene expression data. This program aims to maximize the identification of essential proteins and reflects their significant inherent modularity.
Classifies 16S rRNA metagenomic profiles of bacterial abundance. This algorithm is based on a Bayesian method and introduces a Poisson-Dirichlet-Multinomial hierarchical model for constructing a prior distribution from sample data, deriving an optimal Bayesian classifier (OBC), and computing the posterior distribution in closed form. Moreover, this method can be applied in different situations, such as in the absence of phylogenetic information, in the presence of multiple classes, or with small ratios of sample size to dimensionality.
Assists users in detecting multi-color fluorescently labeled axons in dense electron microscopy (EM) data. FluoEM consists of a set of experimental and computational tools for the virtual labeling of multiple axonal projection sources in connectomic 3D EM data of mammalian nervous tissue. Moreover, this method can identify as many axonal projection sources in a single connectomic experiment as can be encoded at the light-microscopic level.
Aims to improve nominations of epigenetic driver genes by leveraging quantitative high-throughput proteomic data. ProteoMix includes a methodology for selecting genes whose DNA methylation is predictive of protein abundance. It is able to recapitulate key molecular and high-level disease features. Moreover, this tool can be applied to cancer types for which both transcriptomic and proteomic data are available, such as breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COADREAD), and ovarian serous cystadenocarcinoma (OV).
Deduces modules, transcription factor-binding sites (TFBSs), and motif patterns based on their joint posterior distribution. CisModule consists of a hierarchical mixture (HMx) model within a Bayesian framework. It captures the spatial correlation between different binding sites for a set of transcription factors (TFs). This tool exploits the colocalization of TFBSs to improve de novo motif identification.
Identifies and segments the glomeruli present within digitized images of human kidney biopsies. This method is a framework that can perform image classification. It is built on a deep learning architecture based on convolutional neural networks (CNN). This model can be utilized in the form of a software tool at the point-of-care to assist nephropathologists. It can also be adapted to other images obtained via different histological staining protocols.
CVAE / Convolutional Variational Auto-Encoder
Reduces the high dimensionality of protein folding trajectories. CVAE is an algorithm that automatically clusters conformations from molecular dynamics (MD) simulations into a small number of conformational states that share similar structural and energetic characteristics. This method could potentially be used to augment propagators in time. It also reveals folding intermediates of the Fs-peptide.
Allows users to reveal composite clusters of cis-elements in promoters of eukaryotic genes. ClusterScan is an algorithm with three functionalities: (1) training on representative samples of promoters to reveal cis-elements that tend to cluster; (2) training on samples of functionally related promoters to identify functionally coupled transcription factors; and (3) tools for searching clusters in genomic sequences to identify and functionally characterize regulatory regions in genomes.
Assists users in finding causal genetic biomarkers of adverse drug reactions. HUME is a model for discovering relations between specific Drug-Reactions and the Genetic biomarkers (G-DRs). This multi-phase algorithm is composed of a network scoring system to select candidate relations between the biomarkers and the drug reactions, and a quasi-experimental design test for evaluating causal significance of the candidate relations.
GWASH / GWAS heritability
Allows estimation of heritability for genome-wide association studies (GWAS).
Prioritizes putative neoantigens for experimental investigation in immunotherapy. ForestMHC consists of a random-forest approach that uses a combination of hydropathy, presence of aromatic rings, sparse encoding, and mass (HASM) features. It relies on wild-type peptides rather than neoantigens for training data. This tool also uses artificial neural networks (ANN) and requires, as input, the candidate peptides derived from private mutations.
ARSim / Antibiotic Resistance Simulation
Simulates models of bacteria, antibiotics, enzymes, and their interactions. ARSim relies on four models representing antibiotic resistance processes, bacteria-antibiotic interactions, enzymes, and the environment. It integrates horizontal and vertical transfer mechanisms of antibiotic resistance genes. This tool is useful for the discovery or development of new antibiotics.
Exploits actimetry data to investigate sleep and wake traits. PennZzz is an algorithm that combines temperature and movement patterns to highlight the various phases that can be detected during sleep. It is intended as an alternative to polysomnography (PSG) and can be used to determine (i) sleep within the active phase, (ii) activity during the sleep phase, and (iii) behavioral states such as deep sleep.
Offers an approach that assists users in calculating the history-dependent mechanical damage of axonal fiber tracts in the brain. This algorithm tracks cumulative damage and degrades the mechanical response of the material accordingly. Additionally, it can be used to investigate fiber damage and to assess both the magnitude and the frequency of successive fiber tract strains.
Reports the time evolution of a network based on molecular processes. This algorithm provides a framework that allows users to generate a unified statistical description of the population behavior by linking molecular properties (including bond lifetimes) to global wall mechanics such as growth or stress relaxation. It can be used to depict the dynamics of the tethered network of microfibrils in the cell wall during expansive growth.
Streamlines the assessment of sodium magnetic resonance imaging (NaMRI) measurements in the leg in both research and clinical settings. This algorithm uses a straightforward approach that quantifies sodium levels from sodium magnetic resonance images of the calf. The method is an application-specific automated segmentation pipeline for the lower leg; users can also segment the regions of interest manually.
Determines the presence or absence of breakpoints in single nucleotide polymorphism (SNP) array data. DeepSNP proposes a deep neural network, trained with the stochastic gradient descent (SGD) algorithm and composed of five different modules dedicated to optimized learning. The application can (i) handle long stretches of copy number data, (ii) learn long genomic distance relations, and (iii) pinpoint breakpoints without the use of accurate labels for training.
Classifies submitted proteins as thermostable or mesostable. The algorithm uses three-dimensional structure to perform its categorization and does not require additional information. This application focuses on understanding the specific structural arrangement of energy interactions in thermostable proteins, with the aim of assisting users in investigating their thermal properties.
Aims to reduce uninformative calls in non-invasive prenatal testing (NIPT). This algorithm investigates the influence of fetal cell-free DNA (cfDNA) fragment length on the determination of z-score results. This approach is intended to complement existing methodologies and can also be employed to decrease the proportion of false-positive and false-negative samples through its ability to pinpoint maternal aberrations.
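The z-score referred to above is usually the standardized fraction of reads from a chromosome of interest (for example chromosome 21) relative to euploid reference samples. Assuming that formulation, the sketch below shows how restricting the computation to shorter fragments, which are enriched for fetal cfDNA, changes the measured fraction. The 150 bp cutoff, the fragment data, and the function names are hypothetical and not taken from the method.

```python
from statistics import mean, stdev

# Hypothetical helpers; the length cutoff is illustrative, not the method's parameter.

def chromosome_fraction(fragments, chrom, max_len=None):
    """Fraction of fragments mapping to `chrom`; optionally keep only fragments <= max_len bp."""
    kept = [c for c, length in fragments if max_len is None or length <= max_len]
    return kept.count(chrom) / len(kept) if kept else 0.0

def z_score(test_fraction, reference_fractions):
    """Standard NIPT-style z-score against a panel of euploid reference fractions."""
    return (test_fraction - mean(reference_fractions)) / stdev(reference_fractions)

# Made-up fragments: (chromosome, fragment length in bp).
fragments = [("chr21", 140), ("chr1", 170), ("chr1", 180), ("chr2", 150)]
all_fraction = chromosome_fraction(fragments, "chr21")
short_fraction = chromosome_fraction(fragments, "chr21", max_len=150)
```

In practice the reference fractions would be computed with the same length filter as the test sample, so that the z-score compares like with like.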
ToxPi / Toxicological Priority Index
Allows users to profile and prioritize chemicals by integrating data from diverse sources. ToxPi consists of an algorithm that investigates the effects of missing data and recommends solutions. This method was tested using simulated data motivated by high-throughput screening (HTS) data generated for chemicals in the substance priority list (SPL).
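ToxPi-style scoring typically rescales each data component to [0, 1], aggregates components into weighted slices, and combines the slices into a single score. The sketch below assumes min-max scaling, averaging within a slice, and simply skipping missing values, which is only one of several possible missing-data strategies; it is not the ToxPi implementation, and the component and slice names are hypothetical.

```python
# Simplified ToxPi-like scoring; scaling and missing-data handling are assumptions.

def min_max_scale(values):
    """Scale non-missing values to [0, 1]; None marks missing data and is kept as None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    return [None if v is None else (0.0 if hi == lo else (v - lo) / (hi - lo))
            for v in values]

def toxpi_like_scores(data, slices, weights):
    """data: component -> raw values per chemical; slices: slice -> components; weights: slice -> weight."""
    n_chemicals = len(next(iter(data.values())))
    scaled = {c: min_max_scale(v) for c, v in data.items()}
    total_weight = sum(weights.values())
    scores = []
    for i in range(n_chemicals):
        score = 0.0
        for name, components in slices.items():
            present = [scaled[c][i] for c in components if scaled[c][i] is not None]
            # Assumption: average the available components; a slice with no data contributes 0.
            slice_score = sum(present) / len(present) if present else 0.0
            score += weights[name] * slice_score
        scores.append(score / total_weight)
    return scores

data = {"assay_a": [1, 2, 3], "assay_b": [10, None, 30], "exposure": [5, 5, 5]}
slices = {"bioactivity": ["assay_a", "assay_b"], "exposure": ["exposure"]}
scores = toxpi_like_scores(data, slices, {"bioactivity": 2, "exposure": 1})
```

Swapping the averaging rule for, say, imputation or down-weighting incomplete slices is exactly the kind of choice whose effect the method examines.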
Selects relevant features from a large set of known features instead of combining them using linear classifiers or ignoring their individual coding potential. This algorithm uses machine learning techniques to predict genes in metagenomic samples. It also relies on the minimum redundancy maximum relevance (mRMR) feature selection technique instead of combining features from a single source into a new feature.
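The mRMR criterion mentioned above can be sketched as a greedy search that, at each step, adds the feature with the largest relevance to the target minus its mean redundancy with the already selected features (the "difference" form of mRMR). This sketch assumes discrete features and mutual information in nats; it is not the gene predictor's own feature pipeline, and the feature names are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())

def mrmr(features, target, k):
    """Greedy mRMR (difference form): maximize relevance minus mean redundancy with selected features."""
    selected, remaining = [], sorted(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], target)
            redundancy = (sum(mutual_information(features[name], features[s]) for s in selected)
                          / len(selected)) if selected else 0.0
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical features: "a_copy" duplicates "a"; "b" carries independent information.
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
}
target = [0, 0, 1, 1, 2, 2, 3, 3]  # depends on both a and b
selected = mrmr(features, target, k=2)
```

All three features are equally relevant, but once "a" is chosen the duplicate is penalized for redundancy, so "b" is selected second.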
MGOGP / Module and Gene Ontology-based Gene Prioritization
Supports cancer-related gene prioritization. MGOGP is an algorithm that can be used for genome-wide breast cancer gene ranking. This tool classifies genes using information from both individual genes and their affiliated modules. It classifies modules according to three aspects: module-specific gene importance, differential correlations, and the importance of the module itself.
SCuPhr / Single Cell Urn PHased Read
Estimates a cell distance matrix. SCuPhr provides a site-pair model to search for mutations. This model captures various sources of noise associated with the sequencing of single cells and is implemented in an inference algorithm based on dynamic programming. It contains variables associated with pairs of loci, one homozygous and the other heterozygous, and can perform Bayesian probabilistic read phasing.
POMOC / Partially Overlapping MOtif Counting
Counts novel motifs by employing capacity levels for all interactions. POMOC is based on a method that exposes topological differences between biological networks under different genetic backgrounds and experimental conditions. This algorithm computes the number of partially overlapping instances of a given motif in a given network and scales to large biological networks in practical time.
Computes phase probability distributions for single-wavelength anomalous diffraction (SAD) data. SOLVE/RESOLVE can be used for the recognition of non-crystallographic symmetry (NCS), density modification, and automated model building. It determines the occupancies and positions of the anomalously scattering atoms. This tool uses a library of side-chain templates to match side-chain density in a map with side-chain types and rotamers in a probabilistic fashion.
Simulates the trajectory of an epidemic according to a specific compartmental model using Gillespie's stochastic simulation algorithm. This tool includes variants for simulating a sampled transmission tree from an epidemiological trajectory using the coalescent approach. It was used to simulate epidemics with either a simple susceptible-infected-recovered (SIR) model or a detailed model describing HIV spread in a heterogeneous population.
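For the simple SIR case mentioned above, Gillespie's direct method draws an exponential waiting time from the total event rate and then picks which event occurs in proportion to its rate. The sketch below assumes frequency-dependent transmission (rate beta*S*I/N) and recovery at rate gamma*I with arbitrary parameter values; the transmission-tree and coalescent parts of the tool are not shown, and the function name is hypothetical.

```python
import random

def gillespie_sir(s, i, r, beta, gamma, t_max, seed=None):
    """Exact stochastic simulation of an SIR model; returns a list of (time, S, I, R) states.

    Assumes gamma > 0 so the total event rate is positive while infectives remain."""
    rng = random.Random(seed)
    n = s + i + r
    t = 0.0
    trajectory = [(t, s, i, r)]
    while i > 0:
        infection_rate = beta * s * i / n
        recovery_rate = gamma * i
        total_rate = infection_rate + recovery_rate
        dt = rng.expovariate(total_rate)  # waiting time to the next event
        if t + dt > t_max:
            break
        t += dt
        if rng.random() * total_rate < infection_rate:  # choose the event proportionally to its rate
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        trajectory.append((t, s, i, r))
    return trajectory

trajectory = gillespie_sir(s=99, i=1, r=0, beta=0.3, gamma=0.1, t_max=200.0, seed=1)
```

Each run is one stochastic realization; repeated runs with different seeds give the spread of possible epidemic trajectories.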
SHIN+GO / Self-organizing map Harboring Informative Nodes with Gene Ontology
Creates dynamic genome-wide integrative omics models with two time points. SHIN+GO uses unsupervised machine learning to build models. It can be used to estimate the frequency of gene functional annotations in nodes made of clustered co-regulated genes with their corresponding co-secreted proteins. This tool is useful for comparative transcriptomics of different strains or species.
Enables hierarchical multi-step assembly of large DNA fragments. MetClo uses methylation switching to deliver a one-pot DNA assembly system based on a type IIS restriction enzyme. As a result, a single type IIS restriction enzyme can be used throughout the hierarchical assembly of a large DNA construct.
Simulates the time evolution of well-stirred chemically reacting systems. S-leaping improves on the stochastic simulation algorithm (SSA). It samples the total number of reaction firings within a preselected time interval from a Poisson distribution. This approach can be used to reduce the occurrence of negative species populations and can be extended to an implicit version of the algorithm.
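The leap described above can be sketched as follows, assuming that the Poisson-distributed total number of firings is then distributed among reaction channels with a multinomial draw proportional to their propensities. The fixed leap length, the reject-and-halve rule used to avoid negative populations, and the function name are assumptions of this sketch, not the published step-size or rejection strategy.

```python
import numpy as np

def s_leaping(x0, stoichiometry, propensities, tau, t_max, seed=0):
    """Leap-based simulation: total firings per leap ~ Poisson(a0 * tau), split multinomially."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=np.int64)
    nu = np.asarray(stoichiometry, dtype=np.int64)  # shape: (reactions, species)
    t = 0.0
    while t_max - t > 1e-12:
        a = propensities(x)
        a0 = a.sum()
        if a0 <= 0:
            break  # no reaction can fire any more
        step = min(tau, t_max - t)
        total = rng.poisson(a0 * step)  # total number of firings in this leap
        firings = rng.multinomial(total, a / a0)  # firings per reaction channel
        x_new = x + firings @ nu
        if (x_new < 0).any():
            tau /= 2  # assumption: reject and retry with a shorter leap
            continue
        x, t = x_new, t + step
    return x

# Hypothetical example: first-order decay A -> 0 with rate constant 0.5.
final = s_leaping([1000], [[-1]], lambda x: np.array([0.5 * x[0]], dtype=float),
                  tau=0.1, t_max=5.0)
```

Because one Poisson draw replaces many individual SSA events, each leap advances the system by a fixed time interval at a cost independent of how many reactions fire in it.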
Predicts footprints, with a specific focus on the ATAC-seq protocol. HINT-ATAC is an approach that combines strand-specific cleavage bias and nucleosome number decomposition. This application is able to learn complex sequence cleavage preferences of the transposase enzyme by exploiting a probabilistic framework built around variable-order Markov models.
HFSP / Homology-derived Functional Similarity of Proteins
Infers functional similarity of proteins on the basis of their alignment length and sequence identity. HFSP is a method that was developed to enrich functional annotation analysis on a large scale, and to narrow down the space of proteins of interest for further experimental analysis. This method assists users in improving the quality of existing and newly assigned functional annotations.
Assists users in automating species identification. This algorithm is a machine learning approach that aims to reduce the manual work required to analyze high-throughput collagen peptide mass fingerprint (PMF) data from ancient bone samples. This method reached taxonomic resolution at the family/sub-family level within vertebrates.
Calculates area compressibility moduli of lipid bilayers and their individual leaflets. This algorithm yields elastic moduli that agree with available experimental data for both single- and multi-component bilayers composed of saturated and unsaturated lipids and cholesterol, simulated at different temperatures. This method analyzes the area compressibility of bilayers under tension.
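For reference, the classic fluctuation route to the area compressibility modulus of a simulated bilayer is K_A = k_B T <A> / <(A - <A>)^2>, where A is the projected area. The sketch below implements only that textbook formula, assuming a trajectory of areas in nm^2 is available; the method described above analyzes bilayers under tension and resolves individual leaflets, which this sketch does not attempt. The area values are made up and the function name is hypothetical.

```python
from statistics import fmean, pvariance

BOLTZMANN = 1.380649e-23  # J/K

def area_compressibility_modulus(areas_nm2, temperature_k):
    """K_A = k_B T <A> / <(A - <A>)^2>, returned in mN/m for areas given in nm^2."""
    mean_area = fmean(areas_nm2)
    variance = pvariance(areas_nm2, mu=mean_area)
    # J/nm^2 -> N/m needs a factor 1e18; N/m -> mN/m adds a factor 1e3.
    return BOLTZMANN * temperature_k * mean_area / variance * 1e21

k_a = area_compressibility_modulus([64.0, 65.0, 66.0], 300.0)
```

Real estimates need long, well-equilibrated trajectories, since the area variance converges slowly.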
SAM / Self-Assembling Manifolds algorithm
Allows users to rescale gene expression to amplify differences between cells. SAM is an unsupervised method that prioritizes genes with variable expression across neighborhoods of cells. It improves cell clustering and marker gene identification. This method can uncover novel biology in challenging datasets with only subtle differences between cells.
Assists users in classifying genomic data. This algorithm is a convolutional neural network developed to discover potential biomarkers for each tumor type. It works in several steps: it (i) preprocesses the input data, (ii) classifies tumor types using a convolutional neural network, (iii) generates a heat map for each class and picks the genes corresponding to the top intensities in the heat maps, and (iv) validates the pathways of the selected genes.
TGCN / Temporal Gene Coexpression Network
Provides a “low-rank plus sparse” framework for building time-point-specific gene co-expression networks (GCNs) from time-course gene expression data. TGCN is a network model that jointly models temporal transcriptomic data when the samples at different time points come from distinct subjects. This algorithm generates time-specific gene-gene correlation matrices that can serve as input to gene co-expression network analysis procedures, such as weighted correlation network analysis (WGCNA).
HGLDA / HyperGeometric distribution for LncRNA-Disease Association
Predicts potential long non-coding RNA (lncRNA)-disease associations by integrating known microRNA (miRNA)-disease associations and lncRNA-miRNA interactions. HGLDA obtained an area under the ROC curve (AUC) of 0.7621 in leave-one-out cross-validation (LOOCV), based on experimentally verified lncRNA-disease associations from the LncRNADisease database. It was also applied to predict breast cancer-, lung cancer-, and colorectal cancer-related lncRNAs.
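The core of a hypergeometric association test of this kind asks whether an lncRNA and a disease share more miRNAs than expected by chance. A minimal sketch, assuming the usual upper-tail formulation; the miRNA universe, any multiple-testing correction, and the function name used here are illustrative rather than HGLDA's exact procedure.

```python
from math import comb

def hypergeometric_pvalue(n_total, n_disease, n_lnc, n_shared):
    """P(X >= n_shared), X ~ Hypergeometric(N=n_total, K=n_disease, n=n_lnc).

    n_total: all miRNAs considered; n_disease: miRNAs associated with the disease;
    n_lnc: miRNAs interacting with the lncRNA; n_shared: miRNAs in both sets."""
    upper = min(n_disease, n_lnc)
    hits = sum(comb(n_disease, i) * comb(n_total - n_disease, n_lnc - i)
               for i in range(n_shared, upper + 1))
    return hits / comb(n_total, n_lnc)

# Hypothetical counts: 500 miRNAs, 40 linked to the disease, 30 bound by the lncRNA, 8 shared.
p = hypergeometric_pvalue(500, 40, 30, 8)
```

A small p-value indicates that the observed miRNA overlap is unlikely under random sampling, which is then taken as evidence for a candidate lncRNA-disease association.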
Provides a global function predictor for long non-coding RNAs. lnc-GFP is a method in which a bi-colored biological network is constructed from coding–non-coding co-expression data and protein interaction data. The algorithm was used to predict functions for many long non-coding RNAs (lncRNAs) dynamically expressed at different stages of oligodendrocyte and neuronal differentiation.
Predicts the functions of long non-coding RNAs (lncRNAs). KATZLGO is a global network-based approach that predicts unknown lncRNA-protein associations by measuring the similarities between lncRNAs of interest and proteins in a heterogeneous network. It is based on the KATZ measure, which takes the lengths of all paths between a pair of nodes into account. The method can benefit from the addition of protein interaction data.
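The KATZ measure is usually written as S = sum over k of beta^k A^k, where A^k counts walks of length k between two nodes and beta < 1 damps longer ones. The sketch below truncates the sum at a small maximum length, a common practical choice that is assumed here; building KATZLGO's heterogeneous lncRNA-protein network is not shown, and the function name is hypothetical. When beta is smaller than the reciprocal of the largest eigenvalue of A, the untruncated sum equals (I - beta A)^-1 - I.

```python
import numpy as np

def katz_similarity(adjacency, beta=0.01, max_length=3):
    """S = sum_{k=1..max_length} beta^k A^k: walks of each length, damped by beta^k."""
    a = np.asarray(adjacency, dtype=float)
    power = np.eye(a.shape[0])
    similarity = np.zeros_like(a)
    for k in range(1, max_length + 1):
        power = power @ a  # A^k: number of walks of length k between node pairs
        similarity += beta ** k * power
    return similarity

# Toy path graph 0 - 1 - 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
s = katz_similarity(path, beta=0.1, max_length=3)
```

Nodes 0 and 2 are not adjacent but still receive a score (0.01) through their shared neighbor, which is how indirect evidence propagates in the heterogeneous network.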
GrwLDA / Global network Random Walk model for predicting LncRNA-Disease Associations
Predicts potential long non-coding RNA (lncRNA)-disease associations on a large scale. GrwLDA is a global network random walk model for predicting potential human lncRNA-disease associations. This method integrates disease semantic similarities, lncRNA functional similarities, and known lncRNA-disease associations to discover potential associations. It does not require negative samples. This algorithm can be applied to predict lncRNAs related to isolated diseases (i.e., diseases without any known related lncRNA) and diseases associated with novel lncRNAs (i.e., lncRNAs without any known associated disease).
Detects associations between diseases and long non-coding RNAs (lncRNAs). DisLncPri is a method for prioritizing disease-associated lncRNAs that integrates both competing endogenous RNA (ceRNA) theory and functional genomics data. This algorithm can help improve the understanding of lncRNA regulation at the transcriptional level and support novel biomarker discovery and therapeutic development.
RWRHLD / Random Walking with Restart on the Heterogeneous LncRNA and Disease network
Prioritizes candidate disease-related long non-coding RNAs (lncRNAs) by integrating an lncRNA crosstalk network, a disease similarity network, and an lncRNA-disease association network. RWRHLD is a rank-based method that can predict novel disease-related lncRNAs. The algorithm relies on the topological structure of the heterogeneous network. It was assessed through case studies of two cancer types (ovarian cancer and prostate cancer).
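The random walk with restart at the heart of this kind of prioritization can be sketched on a single graph: probability mass repeatedly spreads to neighbors and, with a fixed probability, jumps back to the seed nodes until the distribution stabilizes. This sketch assumes a column-normalized transition matrix and a restart probability of 0.7 chosen purely for illustration; RWRHLD's heterogeneous network (crosstalk, similarity, and association layers with their jump probabilities) is not modeled, and the function name is hypothetical.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) W p + r p0 until convergence; W is the column-normalized adjacency."""
    w = np.asarray(adjacency, dtype=float)
    col_sums = w.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    transition = w / col_sums  # column j holds the step probabilities out of node j
    p0 = np.zeros(w.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # restart mass spread uniformly over seed nodes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy path graph 0 - 1 - 2 - 3 with node 0 as the seed (e.g., a known disease node).
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
scores = random_walk_with_restart(path, seeds=[0])
```

Candidates are then ranked by their steady-state probability; nodes closer to the seeds in the network topology score higher.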
KRWRH / Kernel based Random Walk with Restart in Heterogeneous network
Infers associations between diseases and long intergenic non-coding RNAs (lincRNAs). KRWRH uses the Gaussian interaction profile kernel for diseases and lincRNAs. It predicts potential disease-lincRNA associations by simulating a random walk with restart that starts from a given set of known disease and lincRNA seed nodes and, at each step, moves from the current node to a random neighbor in the heterogeneous network.
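The Gaussian interaction profile (GIP) kernel is commonly defined as K(i, j) = exp(-gamma * ||y_i - y_j||^2), where y_i is the binary association profile of a disease (or lincRNA) and gamma is a base bandwidth divided by the mean squared profile norm. The sketch assumes that common definition with a base bandwidth of 1; the random walk over the resulting heterogeneous network is not shown, and the function name and data are hypothetical.

```python
import numpy as np

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel: K(i, j) = exp(-gamma * ||y_i - y_j||^2).

    gamma = gamma_prime / mean_i ||y_i||^2 normalizes the bandwidth by the average profile size
    (assumes at least one profile has a known association)."""
    y = np.asarray(profiles, dtype=float)
    gamma = gamma_prime / np.mean(np.sum(y ** 2, axis=1))
    squared_distances = np.sum((y[:, None, :] - y[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * squared_distances)

# Rows: diseases; columns: lincRNAs (1 = known association). Values are made up.
associations = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
k = gip_kernel(associations)
```

Applying the same function to the transposed association matrix gives the corresponding lincRNA-lincRNA similarity.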
LncNetP / LncRNA Network-based Prioritization approach
Identifies disease-related long non-coding RNAs (lncRNAs). LncNetP is a systematic lncRNA prioritization approach. This method (i) detects significant lncRNA-lncRNA interactions according to microRNAs (miRNAs) with competing endogenous RNA (ceRNA) relations, (ii) constructs cancer-specific lncRNA networks associated with different disease phenotypes, and (iii) prioritizes candidate disease lncRNAs by integrating disease phenotype associations.
GeTMM / Gene length corrected Trimmed Mean of M-values
Allows users to perform both inter- and intra-sample comparisons with the same normalized data set. GeTMM is a normalization method that combines gene-length correction with the Trimmed Mean of M-values (TMM) normalization procedure, as implemented in edgeR. The algorithm calculates the reads per kilobase (RPK) for each gene in a sample and uses the total RPK in place of the total read count (RC). It generates a normalized data set directly suited to multiple endpoints.
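The two ingredients named above, RPK and TMM, can be combined roughly as follows. This is a simplified sketch: the TMM factor here only trims extreme log-ratios (M-values) against a reference sample and averages the rest, whereas edgeR's TMM also trims on average expression (A-values) and uses precision weights, so the numbers will not match edgeR exactly. The 30% trim and all function names are assumptions.

```python
from math import log2

def rpk(counts, lengths_bp):
    """Reads per kilobase: counts divided by gene length in kb."""
    return [c / (length / 1000) for c, length in zip(counts, lengths_bp)]

def trimmed_mean(values, trim=0.3):
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    core = ordered[k:len(ordered) - k] or ordered
    return sum(core) / len(core)

def tmm_factor(sample, reference, trim=0.3):
    """Simplified TMM: trimmed mean of log2 ratios of RPK proportions versus a reference sample."""
    total_s, total_r = sum(sample), sum(reference)
    m_values = [log2((s / total_s) / (r / total_r))
                for s, r in zip(sample, reference) if s > 0 and r > 0]
    return 2 ** trimmed_mean(m_values, trim)

def getmm_like(samples, lengths_bp, ref_index=0):
    """Per-million normalized RPK, using total RPK (not total read count) times a TMM-style factor."""
    rpks = [rpk(s, lengths_bp) for s in samples]
    reference = rpks[ref_index]
    normalized = []
    for values in rpks:
        effective_size = sum(values) * tmm_factor(values, reference)
        normalized.append([v / effective_size * 1e6 for v in values])
    return normalized

# Hypothetical counts: the second sample is the first sequenced twice as deeply.
normalized = getmm_like([[10, 20, 30, 40], [20, 40, 60, 80]], [1000, 1000, 2000, 500])
```

Because the values are already length-corrected, they can be compared across genes within a sample as well as across samples for the same gene.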
R2D2 / Reconstructing RNA Dynamics from Data
Aims to uncover details of co-transcriptional folding pathways by predicting RNA secondary and tertiary structures from co-transcriptional SHAPE-Seq data. R2D2 is an algorithm that uses nucleotide-resolution chemical probing data as input to reconstruct models of secondary and tertiary RNA co-transcriptional folding pathways. This sample-and-select method consists of two steps: generating a set of possible structures at each intermediate nascent RNA length by sampling candidate structures from the sequence alone, and computationally selecting the most likely structure using the observed experimental data.
Assists clinicians in selecting an optimal anti-tumor necrosis factor (TNF) alpha biological therapy for rheumatoid arthritis patients. RABIOPRED can predict non-response to TNF alpha blockers. The algorithm was validated in a multi-centric proof-of-performance clinical study.
MI-IPA / Mutual Information-based Iterative Pairing Algorithm
Estimates interaction partners among the paralogous proteins of two interacting families solely from their sequences. MI-IPA approximately maximizes the mutual information between the sequences of the two protein families. This approach does not require a training set of known interacting pairs.
Detects themes of interest in a set of scientific documents. This algorithm relies on a projection algorithm, extended to highlight a recurrent theme, that analyzes a set of documents against a set of key terms intended to describe their contents. This method is designed to be applied to a wide range of studies and can be used to build query platforms dealing with specific terms.
Allows users to understand protein dynamics in crystals. vGNM is a Gaussian network model that takes into account (i) the contribution of rigid-body translation and rotation and (ii) the effect of crystal packing, by allowing the amplitude coefficients of each mode to vary. It hypothesizes that crystal packing causes some modes to be amplified and others to be suppressed.
Allows users to compute the periodicities of exons and introns in eukaryotic genomes based on the Ramanujan Fourier transform (RFT). This algorithm combines the Voss representation with RFT, which improves periodicity detection. This method shows that the codon sizes discussed in the hypothesis of the origin of genetic codons are similar to the significant periodicities of exons and introns found through RFT.
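A minimal sketch of the Voss-plus-RFT idea, assuming the commonly used finite Ramanujan Fourier coefficient x_q = (1 / (phi(q) N)) * sum_n x(n) c_q(n), where c_q(n) is the Ramanujan sum and phi is Euler's totient; the method's exact normalization and significance testing may differ. The Voss representation turns a DNA sequence into four binary indicator sequences, one per nucleotide, and the sketch sums the squared coefficients over them. Function names are hypothetical.

```python
from math import cos, gcd, pi

def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*p*n/q) over 1 <= p <= q with gcd(p, q) = 1."""
    return sum(cos(2 * pi * p * n / q) for p in range(1, q + 1) if gcd(p, q) == 1)

def euler_phi(q):
    return sum(1 for p in range(1, q + 1) if gcd(p, q) == 1)

def voss_indicators(sequence):
    """Voss representation: one binary indicator sequence per nucleotide."""
    return {base: [1 if s == base else 0 for s in sequence] for base in "ACGT"}

def rft_coefficient(x, q):
    """Ramanujan Fourier coefficient of x for period q (positions indexed from 1)."""
    total = sum(value * ramanujan_sum(q, n) for n, value in enumerate(x, start=1))
    return total / (euler_phi(q) * len(x))

def periodicity_spectrum(sequence, max_period):
    """Sum of squared RFT coefficients over the four Voss indicators, for periods 1..max_period."""
    indicators = voss_indicators(sequence.upper())
    return {q: sum(rft_coefficient(x, q) ** 2 for x in indicators.values())
            for q in range(1, max_period + 1)}

spectrum = periodicity_spectrum("ATG" * 30, max_period=4)
```

For this strictly period-3 toy sequence the spectrum peaks at q = 3 relative to q = 2 and q = 4, mirroring the codon-related periodicity the method looks for in exons.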
SAVnet / Splicing-Associated Variant detection by NETwork modeling
Identifies splicing-associated variants (SAVs). SAVnet is an approach for detecting SAVs based on a list of somatic variants in a cohort and its matched RNA sequencing (RNA-seq) data using a rigorous statistical framework. It was used to perform a comprehensive analysis of a large number of primary cancer samples across 31 cancer types from The Cancer Genome Atlas (TCGA).
CID / Chromatin Interaction Discovery
Allows users to detect chromatin interactions from chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) data. CID clusters proximal paired-end tags (PETs) into interactions using a density-based clustering technique. This application aims to be more flexible than existing methods through its approach to resolving anchors. It can also be used to identify chromatin interactions from HiChIP data.
Performs protein function prediction. ProLanGO is based on two languages: ProLan, which converts sequences using protein “words” derived from the UniProtKB database, and GOLan, which converts protein functions based on Alphabet IDs. This program translates protein sequences and protein functions into these target languages and then builds a neural machine translation (NMT) model to perform its predictions.
SINAPS / Simple Non-Bayesian Attribute Prediction Software
Allows users to predict microbial traits from marker gene sequences. SINAPS uses a word-counting algorithm that does not require alignments or trees. The tool was tested with query sequences for traits including energy metabolism, Gram-positive staining, presence of a flagellum, V4 primer mismatches, and 16S copy number.
Detects DNA methylation states from Nanopore sequencing reads. DeepSignal is a deep learning method that captures the sequence information around the methylated site. It uses a bidirectional recurrent neural network (BRNN) to construct features from sequences of signal information. It can detect both 5mC and 6mA methylation states of genome sites.
SymNMF / Symmetric Nonnegative Matrix Factorization
Provides a graph clustering method. SymNMF is a general formulation for clustering based on non-negative matrix factorization (NMF). The method takes a non-negative similarity matrix as input and computes a symmetric non-negative lower-rank approximation. It is suitable for clustering data points embedded in linear and nonlinear manifolds. Two algorithms were developed for SymNMF: a Newton-like algorithm and an alternating non-negative least squares (ANLS)-based algorithm.
MAGAN / Manifold-Aligning Generative Adversarial Networks
Aligns two manifolds such that related points in each measurement space are aligned together. MAGAN is a generative adversarial network (GAN) that discovers relationships between domains by aligning their manifolds rather than just superimposing them. The algorithm can be used when one system is measured in two different ways and thus forms two different manifolds. It facilitates the integration of datasets from multiple biological modalities.
HDMKPRF / High-Dimensional McDonald-Kreitman Poisson random field method
Evaluates polymorphic and divergent sites in genomic sequences of multiple species to pinpoint genes under positive or negative selection. HDMKPRF can detect the time at which selection occurred on a specific lineage of the species phylogeny. This approach uses population genetics models within a Bayesian Poisson random field framework and combines information across all gene loci to improve the detection of selection.
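HDMKPRF generalizes the McDonald-Kreitman (MK) framework. For orientation, the classic single-gene MK comparison contrasts nonsynonymous and synonymous counts between divergence (Dn, Ds) and polymorphism (Pn, Ps), summarized by the neutrality index NI = (Pn/Ps)/(Dn/Ds) and alpha = 1 - NI, with a 2x2 Fisher exact test. The sketch below implements only that classic test; the multi-species Bayesian Poisson random field model is not reproduced, and the function names are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7)))

def mcdonald_kreitman(dn, ds, pn, ps):
    """Classic MK summary from divergence (Dn, Ds) and polymorphism (Pn, Ps) counts."""
    neutrality_index = (pn / ps) / (dn / ds)
    return {"neutrality_index": neutrality_index,
            "alpha": 1 - neutrality_index,
            "p_value": fisher_exact_two_sided(dn, ds, pn, ps)}

neutral = mcdonald_kreitman(dn=10, ds=10, pn=10, ps=10)
adaptive = mcdonald_kreitman(dn=20, ds=5, pn=5, ps=20)
```

An excess of nonsynonymous divergence relative to polymorphism (NI < 1, alpha > 0) points to positive selection, while NI > 1 is consistent with slightly deleterious polymorphisms under negative selection.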
NBP-Seq / Negative Binomial Process
Performs differential expression analysis of high-throughput sequencing count data. NBP-Seq relies on a Bayesian nonparametric framework and avoids the sophisticated ad hoc pre-processing steps usually required by existing algorithms. This algorithm accounts for different sequencing depths by using sample-specific negative binomial probability parameters to identify differentially expressed genes.
GNBP-Seq / Gamma-Negative Binomial Process for RNA-Seq
Models row heterogeneity by using sample-specific negative binomial (NB) probability parameters. GNBP-Seq uses gamma-NB instead of NB distributions to model the counts at previously expressed genes brought by a new sample, and logarithmic mixed sum-logarithmic instead of logarithmic distributions to model the counts at newly expressed genes brought by a new sample.
BNBP-Seq / Beta-Negative Binomial Process for RNA-Seq
Serves for the modeling of RNA-seq samples. BNBP-Seq utilizes the negative binomial (NB) shape and probability parameters to palliate the Poisson rates of the scaled negative binomial process (NBP) and the normalized Poisson rates of the NBP to capture the variations of the gene counts across samples.
GRAND-SLAM / Globally refined analysis of newly transcribed RNA and decay rates using SLAM-seq
Assesses the proportion of old and new RNA. GRAND-SLAM allows users to deduce the proportion, and the corresponding posterior distribution, of new and old RNA for each gene in a single SLAM-seq experiment. This software provides five main features: (1) it does not require a control experiment; (2) a single labeling experiment is sufficient to estimate RNA half-lives; (3) it uses posterior distributions to estimate half-lives; (4) it provides internal quality control for genes and experiments; and (5) it allows users to investigate fast regulatory processes.
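Converting an estimated proportion of new RNA (the new-to-total ratio, NTR) into a half-life relies on a kinetic model. The sketch below assumes steady-state expression and first-order decay over a labeling time t, so that NTR = 1 - exp(-k t); whether GRAND-SLAM uses exactly this model is an assumption here, and the function name is hypothetical. Because the mapping is monotonic, it can also be applied to posterior quantiles or samples of the NTR to obtain half-life intervals (with the order of quantiles reversed, since a higher NTR means a shorter half-life).

```python
from math import log

def half_life_from_ntr(ntr, labeling_hours):
    """Half-life (hours) from the new-to-total RNA ratio, assuming steady state and first-order decay:
    NTR = 1 - exp(-k * t)  =>  k = -ln(1 - NTR) / t,  half-life = ln(2) / k."""
    if not 0.0 < ntr < 1.0:
        raise ValueError("NTR must lie strictly between 0 and 1")
    decay_rate = -log(1.0 - ntr) / labeling_hours
    return log(2.0) / decay_rate

# Hypothetical genes after 2 h of labeling: half of the RNA is new, or three quarters.
slow = half_life_from_ntr(0.5, 2.0)
fast = half_life_from_ntr(0.75, 2.0)
```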
Provides a machine learning method that combines systems biology and genetic models into a single score that ranks the strength of evidence for a gene’s involvement in autism spectrum disorder (ASD). forecASD generates a genome-wide score that can serve as a useful prior, filter, or positive control in molecular studies of autism. This algorithm is suited to examining under-appreciated aspects of the molecular etiology of autism.
TWL / two-way latent structure model
Consists of a Bayesian, unsupervised, integrative clustering model for clustering across data sources. TWL is an integrative clustering model based on two sets of cluster assignment variables for each sample in each dataset: the first set follows, a priori, a multinomial distribution for each dataset independently, and in the second set each cluster assignment variable has the same (sample-dependent) multinomial probability in every dataset. Its scalability and flexibility can help provide insight into the underlying biology.
Predicts values of the first eigenvector of a Hi-C matrix. This algorithm uses both the GC content of the sequence and a single whole-genome bisulfite sequencing (WGBS) experiment to make its predictions. It can also determine the positions of the A and B compartments. This model relies only on methylation and sequence information to produce an efficient approximation.
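Calling A and B compartments from first-eigenvector values can be sketched as follows, assuming the common convention that the arbitrary sign of the eigenvector is oriented so that it correlates positively with GC content, and that positive values mark A compartments. The function name and this calling rule are illustrative assumptions, not this method's published procedure.

```python
import numpy as np

def call_compartments(eigenvector, gc_content):
    """Label genomic bins as A or B from (predicted) first-eigenvector values.

    The sign of a Hi-C eigenvector is arbitrary, so it is first oriented to
    correlate positively with GC content (A compartments are GC-rich);
    positive values are then called A and the rest B.
    """
    ev = np.asarray(eigenvector, dtype=float)
    gc = np.asarray(gc_content, dtype=float)
    if np.corrcoef(ev, gc)[0, 1] < 0:
        ev = -ev
    return ["A" if v > 0 else "B" for v in ev]

print(call_compartments([0.5, -0.2, 0.3], [0.6, 0.4, 0.55]))  # ['A', 'B', 'A']
```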
Classifies splice sites from raw DNA sequence. S-ResNet is built on a shallow version of ResNet and combines the advantages of a shallow architecture and shortcut connections. Unlike ResNet, where a shortcut is placed after a block of two convolution layers, this method uses a shortcut connection at each convolution layer.
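The difference in shortcut placement can be sketched in NumPy on a one-hot encoded DNA sequence. The code uses untrained random weights, a kernel size of 3 and four channels (so that identity shortcuts line up); these choices and the function names are assumptions for illustration, not S-ResNet's actual architecture or training setup.

```python
import numpy as np

def conv1d_same(x, w):
    """'Same'-padded 1D convolution; x is (length, c_in), w is (k, c_in, c_out), k odd."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        out[t] = np.einsum("kc,kcd->d", xp[t:t + k], w)
    return out

def relu(z):
    return np.maximum(z, 0.0)

def per_layer_shortcut(x, weights):
    """Identity shortcut around every convolution layer (placement described for S-ResNet)."""
    for w in weights:
        x = relu(conv1d_same(x, w) + x)  # identity shortcut needs c_in == c_out
    return x

def per_block_shortcut(x, weights):
    """Classic ResNet basic block: one shortcut around each pair of convolution layers."""
    for w1, w2 in zip(weights[0::2], weights[1::2]):
        h = relu(conv1d_same(x, w1))
        x = relu(conv1d_same(h, w2) + x)
    return x

rng = np.random.default_rng(0)
seq = "ACGTTGCA"
x = np.eye(4)[["ACGT".index(base) for base in seq]]  # one-hot, shape (8, 4)
weights = [rng.normal(scale=0.1, size=(3, 4, 4)) for _ in range(4)]
print(per_layer_shortcut(x, weights).shape, per_block_shortcut(x, weights).shape)
```

With the same four weight tensors, the per-layer variant adds four shortcuts while the per-block variant adds two, which is the structural difference the description highlights.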
Assesses the effect of taxon sampling on the accuracy of phylogenetic tree reconstruction. This approach removes a user-defined number of species within a single phylum from a concatenated alignment of orthologous genes. It is based on standard phylogenetic reconstruction steps such as orthology determination, alignment and tree building, and it uses different sampling scenarios to vary the taxon sampling within a chosen phylum.
SN-NeRF / Self-Normalizing Natural Extension Reference Frame
Enables the folding of protein atoms. SN-NeRF is a self-normalizing method that calculates Cartesian coordinates from torsion-space parameters. It generates its three orthonormal vectors before self-normalizing.
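The core step of NeRF-style chain building (placing atom D from the preceding atoms A, B and C, given a bond length, a bond angle and a torsion angle) can be sketched as below. Each local frame vector is explicitly normalized. This shows the classic NeRF placement step under that assumption, not SN-NeRF's own implementation, and `place_atom` is a hypothetical name.

```python
import numpy as np

def place_atom(a, b, c, bond_length, bond_angle, torsion):
    """Place atom D from the three preceding atoms A, B, C (NeRF step).

    bond_length = |CD|, bond_angle = angle B-C-D, torsion = dihedral A-B-C-D,
    with angles in radians. The local frame (bc, m, n) is orthonormal.
    """
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    bc = c - b
    bc /= np.linalg.norm(bc)
    n = np.cross(b - a, bc)  # normal to the plane A-B-C
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)      # unit length because n and bc are orthonormal
    d2 = bond_length * np.array([
        -np.cos(bond_angle),
        np.sin(bond_angle) * np.cos(torsion),
        np.sin(bond_angle) * np.sin(torsion),
    ])
    return c + d2[0] * bc + d2[1] * m + d2[2] * n

d = place_atom([1.0, 1.4, 0.2], [0.0, 0.0, 0.0], [1.5, 0.0, 0.0], 1.33, 2.0, 2.5)
print(d)
```

Repeating this step along a list of internal coordinates builds a whole chain, one atom at a time.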
Offers an automated time structure learning model that reveals longitudinal genotype-phenotype interactions. This approach uses the learned structures to improve the prediction of associations between genetic variations and longitudinal imaging phenotypes. It can simultaneously uncover interrelation structures across different prediction tasks and can be applied to both synthetic and real benchmark data.
MODIFI / Model Of Differential Interactions
Allows users to analyze how pathways and genetic interactions rewire over time. MODIFI identifies and characterizes differential interactions. This two-factor linear model estimates the predictive strength and influence of time and differential compound treatment on the pi-score. This estimate can be described as the slope by which an interaction changes (strengthens or weakens) over time.
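A two-factor linear model of this kind can be sketched as an ordinary least-squares fit of the pi-score on time and a treatment indicator, where the time coefficient is the slope of the interaction over time. The additive model form (no interaction term), the synthetic data and the function name are assumptions; MODIFI's exact specification is not given here.

```python
import numpy as np

def fit_two_factor(pi_scores, time, treated):
    """Least-squares fit of pi = b0 + b_time * time + b_treat * treated.

    Returns (b0, b_time, b_treat); b_time is the change in pi-score per
    unit time, b_treat the shift associated with compound treatment.
    """
    t = np.asarray(time, dtype=float)
    d = np.asarray(treated, dtype=float)
    X = np.column_stack([np.ones_like(t), t, d])
    coef, *_ = np.linalg.lstsq(X, np.asarray(pi_scores, dtype=float), rcond=None)
    return tuple(coef)

# Synthetic, noise-free example: four time points, untreated and treated.
t = [0, 1, 2, 3, 0, 1, 2, 3]
d = [0, 0, 0, 0, 1, 1, 1, 1]
pi = [0.1 + 0.2 * ti - 0.5 * di for ti, di in zip(t, d)]
b0, b_time, b_treat = fit_two_factor(pi, t, d)
print(b0, b_time, b_treat)
```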
Consists of an automated and unbiased methodology for whole-brain investigation of the complex mesoscale laminar architecture of the cortex. This sphere-based approach implements a geometric solution based on cortical volume sampling using a system of virtual spheres dispersed throughout the entire cortex. This method can enable the expansion of studies on the role of cortical thickness in brain function and behavior to the cortical layer level.
RP / Reciprocal Perspective
Estimates a localized threshold for each protein using several rank-order metrics. RP is a modeling framework built on a data-driven approach that leverages the context provided by jointly considering facets of pairwise protein-protein interaction (PPI) relationships. It was designed to be applicable to any weighted complete graph problem and can be used to produce a new assessment of each interaction. RP can therefore be applied to fields other than PPI prediction.
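The idea of judging a pair from both proteins' perspectives can be sketched by ranking each pair's score within each protein's list of partners in a symmetric score matrix. This is a hypothetical illustration of one rank-order view; RP's actual metrics and its per-protein threshold model are not reproduced, and the function name and scores are invented for the example.

```python
import numpy as np

def reciprocal_ranks(scores, i, j):
    """Rank of the pair (i, j) from each protein's own perspective.

    `scores` is a symmetric matrix of predicted PPI scores, i.e. a weighted
    complete graph. Returns (rank of j among i's partners, rank of i among
    j's partners), where rank 1 is the highest-scoring partner.
    """
    s = np.asarray(scores, dtype=float)

    def rank_in_row(row, col):
        partners = [k for k in range(s.shape[0]) if k != row]
        ordered = sorted(partners, key=lambda k: -s[row, k])
        return ordered.index(col) + 1

    return rank_in_row(i, j), rank_in_row(j, i)

scores = [[0.0, 0.9, 0.2, 0.1],
          [0.9, 0.0, 0.3, 0.8],
          [0.2, 0.3, 0.0, 0.7],
          [0.1, 0.8, 0.7, 0.0]]
print(reciprocal_ranks(scores, 2, 1))  # (2, 3)
```

An asymmetric result such as (2, 3) shows why a single global cutoff can be misleading: the same score can rank high for one protein and low for its partner.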
Discovers allele-specific regions in functional genomic datasets. AlleleHMM identifies blocks of allele-specific signal in functional genomic data, assuming that contiguous genomic regions share correlated allele-specific events. It then uses the Viterbi algorithm to find the most likely sequence of hidden states given the data, yielding a series of candidate blocks of signal with allelic bias. This tool supports identification in both coding and non-coding genomic regions.
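The Viterbi step can be sketched for a hidden Markov model with discrete emissions. Here two hypothetical states (0 = no allelic bias, 1 = allelic bias) emit discretized observations (0 = balanced reads, 1 = skewed reads). AlleleHMM's real emission model, states and parameters differ, and all probabilities below are assumptions.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete-emission HMM (log space).

    obs: sequence of symbol indices; start_p: (S,); trans_p: (S, S) with
    trans_p[a, b] = P(next = b | current = a); emit_p: (S, V).
    """
    log_start, log_trans, log_emit = (np.log(np.asarray(p, dtype=float))
                                      for p in (start_p, trans_p, emit_p))
    n, n_states = len(obs), len(log_start)
    score = np.empty((n, n_states))
    back = np.zeros((n, n_states), dtype=int)
    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n):
        cand = score[t - 1][:, None] + log_trans  # cand[a, b]: move from a to b
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]
    path = [int(score[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.9, 0.1], [0.1, 0.9]]  # state 0 mostly emits 0, state 1 mostly emits 1
print(viterbi([0, 0, 1, 0, 0], start, trans, emit))  # [0, 0, 0, 0, 0]
```

A single skewed observation surrounded by balanced ones is absorbed into one block because switching states is costly; this is how the decoding turns noisy per-site signal into contiguous blocks.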
Identifies genotypic loci and covariates that affect phenotypic variance. This method relies on a Bayesian test for heteroskedasticity (BTH) model that can integrate discrete and continuous covariates. It also incorporates uncertainty in the estimates of mean and variance effects of covariates when testing for variance quantitative trait loci (vQTLs) and quantitative trait covariates (QTCs).
IMO / Ion Motion Optimization
Performs optimization by imitating the attraction and repulsion of anions and cations. IMO is a population-based algorithm inspired by the properties of ions in nature. It divides the population of candidate solutions into two sets, negatively charged ions (anions) and positively charged ions (cations), and improves them according to a key property of ions: “ions with the same charge repel each other, while ions with opposite charges attract each other”. It also mimics the liquid and solid states to perform diversification and intensification.
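A simplified, IMO-inspired minimizer can be sketched as follows. The attraction step uses a logistic force of the form 1 / (1 + exp(-0.1 / distance)), which is taken here as an assumption about IMO's update. Repulsion between like charges is not modeled separately, and the solid phase is replaced by random re-sampling of about half of the ions once the best value stalls. Function and parameter names are hypothetical.

```python
import numpy as np

def imo_minimize(f, dim, bounds, n_ions=20, iters=200, patience=10, seed=0):
    """Simplified, IMO-inspired minimizer (illustrative sketch).

    Liquid phase: every anion moves toward the best cation and every cation
    toward the best anion, with a force that grows as the distance shrinks.
    Solid phase (simplified): after `patience` iterations without improvement,
    each ion is re-sampled uniformly with probability 0.5 to restore diversity.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    anions = rng.uniform(lo, hi, size=(n_ions // 2, dim))
    cations = rng.uniform(lo, hi, size=(n_ions - n_ions // 2, dim))
    best_x, best_f, stall = None, np.inf, 0
    for _ in range(iters):
        fa = np.apply_along_axis(f, 1, anions)
        fc = np.apply_along_axis(f, 1, cations)
        a_best = anions[fa.argmin()].copy()
        c_best = cations[fc.argmin()].copy()
        improved = False
        for x, fx in ((a_best, fa.min()), (c_best, fc.min())):
            if fx < best_f:
                best_x, best_f, improved = x.copy(), float(fx), True
        stall = 0 if improved else stall + 1
        # Liquid phase: attraction toward the best ion of opposite charge.
        a_force = 1.0 / (1.0 + np.exp(-0.1 / (np.abs(anions - c_best) + 1e-12)))
        c_force = 1.0 / (1.0 + np.exp(-0.1 / (np.abs(cations - a_best) + 1e-12)))
        anions = anions + a_force * (c_best - anions)
        cations = cations + c_force * (a_best - cations)
        # Solid phase (simplified): random re-sampling after stagnation.
        if stall >= patience:
            for group in (anions, cations):
                mask = rng.random(len(group)) < 0.5
                group[mask] = rng.uniform(lo, hi, size=(int(mask.sum()), dim))
            stall = 0
    return best_x, best_f

def sphere(x):
    return float(np.sum(x ** 2))

x_best, f_best = imo_minimize(sphere, dim=2, bounds=(-5.0, 5.0), seed=1)
print(x_best, f_best)
```

Because the force term lies between 0.5 and 1, each liquid-phase move is a convex step toward the opposite group's best ion, so the population contracts quickly; the re-sampling step is what keeps the search from collapsing onto one point.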
Allows users to predict protein folding reliably at high resolution. IMOG uses the ion motion optimization (IMO) algorithm to perform its analysis. For instance, it can be used to study a variety of amino acid sequence datasets. It can also be applied to determine protein folding structures.
Provides a method that analyzes saturated DNA mixtures to identify their contributors. TranslucentID determines the subset of individuals who contributed DNA to a saturated mixture by performing mixture desaturation. This approach aims to highlight the utility of mixture analysis of forensic samples with mixture single nucleotide polymorphism (SNP) panels.
Identifies high-risk individuals for cancers based on their germline genomic information. eTumorRisk is a network-based algorithm with two components: (1) one that builds network models to discriminate a cancer sample from non-cancer samples, and (2) one that determines which cancer type is most likely for the sample. It can (i) discriminate a cancer type from non-cancer samples and between cancer types, (ii) filter out noise, (iii) identify multiple representative networks and (iv) control false positives.
BEAPR / Binding Estimation of Allele-specific Protein-RNA interaction
Offers a method to estimate allele-specific protein-RNA interactions. BEAPR detects allele-specific binding (ASB) and predicts functional genetic variants (GVs) involved in post-transcriptional gene regulation. It models the normalized read counts with an empirical Gaussian distribution, whose expected variance is estimated with a regression model.
MAGE / Multiscale Adaptive Gabor Expansion
Identifies the amplitude and phase of transient oscillatory bursts. MAGE is an algorithm that uses parameter reassignment to simplify the discovery of a sparse decomposition over a dictionary of parametric time-frequency-scale Gabor atoms.
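Sparse decomposition over a Gabor dictionary can be sketched with plain matching pursuit, which greedily selects the atom most correlated with the residual. MAGE adds parameter reassignment on top of such a decomposition; that step is not shown. The Gabor envelope form, the sampling rate, the parameter grid and the function names below are assumptions.

```python
import numpy as np

def gabor_atom(t, center, scale, freq, phase):
    """Unit-norm Gabor atom: a Gaussian envelope times a cosine carrier."""
    envelope = np.exp(-np.pi * ((t - center) / scale) ** 2)
    g = envelope * np.cos(2 * np.pi * freq * (t - center) + phase)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse decomposition over unit-norm atoms (rows of `atoms`).

    Returns a list of (atom index, coefficient) and the final residual.
    """
    residual = np.asarray(signal, dtype=float).copy()
    picks = []
    for _ in range(n_iter):
        corr = atoms @ residual
        k = int(np.abs(corr).argmax())
        picks.append((k, float(corr[k])))
        residual = residual - corr[k] * atoms[k]
    return picks, residual

fs = 250.0                        # sampling rate in Hz (assumed)
t = np.arange(0.0, 2.0, 1.0 / fs)
grid = [(c, s, f, p) for c in (0.5, 1.0, 1.5) for s in (0.2, 0.4)
        for f in (6.0, 10.0, 20.0) for p in (0.0, np.pi / 2)]
atoms = np.array([gabor_atom(t, *prm) for prm in grid])
burst = 3.0 * atoms[10]           # a synthetic burst built from one atom
picks, residual = matching_pursuit(burst, atoms, n_iter=1)
print(picks, grid[picks[0][0]])
```

The selected atom's parameters give the burst's center, duration, frequency and phase, and its coefficient gives the amplitude; a fixed grid limits this precision, which is the kind of limitation parameter reassignment is meant to address.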
Offers an approach dedicated to polygenic prediction. LDpred-funct is an algorithm that leverages trait-specific functional enrichments and performs an additional regularization step to account for sparsity. This method can be used to perform simulations with real genotypes. It was applied to UK Biobank and 23andMe cohorts to predict height through a meta-analysis.
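How functional information can enter a polygenic score can be sketched with single-SNP Bayesian shrinkage. Under a prior beta ~ N(0, sigma_i^2) on standardized genotypes, the posterior mean of a marginal estimate is beta_hat * sigma_i^2 / (sigma_i^2 + 1/N), so SNPs with a larger (for example, functionally enriched) prior variance are shrunk less. This sketch ignores linkage disequilibrium and LDpred-funct's extra regularization step; the function names and values are hypothetical.

```python
import numpy as np

def shrink_effects(beta_hat, prior_var, n):
    """Posterior-mean SNP effects under beta ~ N(0, prior_var), ignoring LD.

    beta_hat: marginal effects for standardized genotypes; prior_var: per-SNP
    prior variance (e.g. scaled by functional enrichment); n: GWAS sample size.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    prior_var = np.asarray(prior_var, dtype=float)
    return beta_hat * prior_var / (prior_var + 1.0 / n)

def polygenic_score(genotypes, weights):
    """Weighted sum of standardized genotypes, one score per individual."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)

# Two SNPs with equal marginal effects but different prior variances.
weights = shrink_effects([0.02, 0.02], prior_var=[1e-4, 1e-6], n=10_000)
genotypes = [[1.0, -1.0], [0.0, 2.0]]  # 2 individuals x 2 SNPs, standardized
print(weights, polygenic_score(genotypes, weights))
```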
Provides a logic-based framework to reconstruct signaling networks from phosphoproteomic data and prior knowledge about their connectivity. This algorithm allows cells to be interrogated in the presence or absence of drugs or small molecules that inhibit specific interactions. It aims to help researchers design complex experiments and explore dependencies across networks.
Enables network discovery from perturbed expression data. This approach consists of a framework for network inference that relies on temporal gene expression data coupled to genetic or chemical perturbation. It is suitable for the processing of expression measurements from high-resolution time series experiments involving precise genetic or chemical perturbation of a steady state system.
Enables the study of the effects of amyloid beta peptide (Aβ) on intracellular Ca2+. This mathematical model for intracellular Ca2+ regulation consists of a theoretical approach that allows the understanding of the driving mechanisms for various Ca2+ oscillatory patterns within an Alzheimer’s disease (AD) environment. It can be used to understand the impact of Aβ on Ca2+ fluxes through individual regulatory components (such as IP3, RyR, and plasma membrane).
Focuses on solving the target control problem. This method extends an algorithm that searches for the minimal solution of the structural target controllability problem, adding heuristics intended to improve its efficiency. It can be applied to real-life-size networks and helps users design several therapeutic strategies using currently known drugs.
Provides a probabilistic framework for glioma detection and segmentation. This method relies on structure learning of undirected graphical models. It first over-segments MRI images into superpixel regions to reduce computational cost, and each superpixel is then used to build undirected graphical models. The main goal of this approach is to improve the accuracy of glioma segmentation.
Identifies an approximately minimum set of driver nodes to control a specified target set of nodes. This approach consists of a greedy algorithm (GA) based on structural control theory, in which the system parameters are either fixed at zero or are independent free parameters. This algorithm can find the driver nodes for target control when the network structure is fully known.
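Structural control theory derives driver nodes from a maximum matching of the directed network: nodes whose incoming side is left unmatched must be driven directly. The sketch below finds a minimum driver set for full structural control using a simple augmenting-path matching. The greedy target-control algorithm, which restricts control to a target set, is not reproduced here, and the function name is hypothetical.

```python
def driver_nodes(n, edges):
    """Minimum driver-node set for full structural control (maximum matching).

    n: number of nodes labeled 0..n-1; edges: directed (u, v) links.
    Nodes whose incoming side is unmatched in a maximum matching are drivers;
    if every node is matched, a single node (here node 0) suffices.
    """
    out_nbrs = [[] for _ in range(n)]
    for u, v in edges:
        out_nbrs[u].append(v)
    match_in = [-1] * n  # match_in[v] = u if edge (u, v) is in the matching

    def augment(u, seen):
        # Try to match u's outgoing side, re-routing earlier matches if needed.
        for v in out_nbrs[u]:
            if v not in seen:
                seen.add(v)
                if match_in[v] == -1 or augment(match_in[v], seen):
                    match_in[v] = u
                    return True
        return False

    for u in range(n):
        augment(u, set())
    drivers = [v for v in range(n) if match_in[v] == -1]
    return drivers or [0]

print(driver_nodes(3, [(0, 1), (0, 2)]))  # [0, 2]
```

A chain needs only its first node as a driver, whereas a star whose hub points to two leaves needs two drivers, because one hub cannot independently steer both leaves.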
Performs comprehensive in silico analyses of peptide detectability. LeTE-fusion gives an idealized estimate of peptide and variant peptide detection. It derives a realistic estimate of the percentage of detectable genome-annotated variants in shotgun mass spectrometry (MS) experiments using peptides with experimental evidence. This tool is useful for assessing the feasibility of detecting other types of peptides or variations.
Serves for visual knowledge exploration in molecular interaction networks. This algorithm combines distance functions to cluster the contents of a complex visual repository on human disease, and to discover different cluster sets. Furthermore, it assists users in exploring complex biomedical repositories, and in annotating high-level areas of such maps.
Maps reads to a reference using dynamic time warping. This approach can update the reference with insertions and deletions: it localizes, aligns and corrects the reference with indels to simplify subsequent read alignments. This method was created to compute unsupervised clustering of bioacoustic sequences. It can be combined with other techniques to investigate large genomic sequences.
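Dynamic time warping can be sketched with the classic dynamic-programming recurrence, assuming reads and reference have been encoded as numeric signals and using the absolute difference as the local cost. This shows only the distance computation, not the updating of the reference with indels, and `dtw_distance` is a hypothetical name.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Best of: step in x only, step in y only, or step in both.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

The repeated value in the second sequence costs nothing because warping lets one element of the first sequence match several consecutive elements of the second, which is what makes the method tolerant of local stretching.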
Field Sensor
Assigns a field to each token or sequence of tokens in a query. Field Sensor computes a mapping between each query segment and a field, along with the likelihood of that mapping. This tool labels each segment of a query with a PubMed record field: text, title, author, journal, volume, issue, page or date.
RPCA / Relative Principal Components Analysis
Analyzes the energetically relevant conformational changes of a biomolecule upon binding to various ligands. It can compute collective canonical variables that are linear combinations of the original features, and it identifies the conformational changes that are relevant to the macroscopic thermodynamic change.
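One way to compare conformational variability between two states (for example, two ligand-bound ensembles) is a generalized eigenproblem on their covariance matrices, C_A v = lambda C_B v. Its top eigenvectors are collective variables, linear combinations of the original features, with the largest variance in state A relative to state B. Treating RPCA as this generalized eigenproblem is an assumption made for illustration; the published method's exact objective and feature choices are not reproduced, and the names below are hypothetical. The problem is solved by Cholesky whitening of C_B.

```python
import numpy as np

def relative_pca(cov_a, cov_b):
    """Solve cov_a v = lam * cov_b v, with cov_b positive definite.

    Returns eigenvalues in descending order and the matching directions as
    columns; large eigenvalues mark directions with much more variance in
    state A than in state B.
    """
    L = np.linalg.cholesky(cov_b)         # cov_b = L @ L.T
    L_inv = np.linalg.inv(L)
    M = L_inv @ cov_a @ L_inv.T           # equivalent symmetric problem
    vals, W = np.linalg.eigh(M)
    V = L_inv.T @ W                       # map back to original coordinates
    order = np.argsort(vals)[::-1]
    return vals[order], V[:, order]

cov_a = np.array([[2.0, 0.5], [0.5, 1.0]])
cov_b = np.array([[1.0, 0.2], [0.2, 2.0]])
vals, vecs = relative_pca(cov_a, cov_b)
print(vals)
```

Whitening by the reference covariance is what makes the result "relative": with C_B equal to the identity, the procedure reduces to ordinary PCA of C_A.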