Performs blind compressed sensing of gene expression data. CS-SMAF finds a non-negative, sparse module dictionary together with sparse module activity levels, using both fixed and variable measurements. The tool clusters samples based on a subset of composite observations, searches for a relatively small dictionary that explains the samples in each cluster, and then concatenates the small dictionaries into one large dictionary.
Analyzes the reliability of individual probes directly from gene expression data. A major advantage of the proposed approach is its capability to detect unreliable probes independently of physical models or external, constantly updated information such as genomic sequence data. RPA can be useful in many applications, including evaluation of the end results of gene expression analysis and recognition of potentially unknown probe-level error sources. It can also be used to quantify the uncertainty in the measurements and to inform probe design, and it is used by our model to provide robust estimates of differential gene expression.
Provides class infrastructure and associated methods to construct an Illumina analysis workflow pipeline, from raw data through functional analysis. Besides supporting the existing algorithms for microarray data, the lumi package includes several unique parts: (i) a variance-stabilizing transformation that utilizes the technical replicates available on the Illumina microarray; (ii) normalization algorithms designed for Illumina microarray data; and (iii) the nucleotide universal identifier annotation packages.
Provides several unique features in a modular and flexible system for the analysis of microarray data. The design and modular conception of CARMAweb allows the use of the different analysis modules either individually or combined into an analytical pipeline. CARMAweb performs (i) data preprocessing (background correction, quality control and normalization), (ii) detection of differentially expressed genes, (iii) cluster analysis, (iv) dimension reduction and (v) visualization, classification, and Gene Ontology-term analysis.
Allows the automated application of complex analyses to microarray data which can range from single slides to large data sets including replicates and dye-swaps. It handles output from most commonly used quantification software packages for dual-labelled arrays. Application features range from quality assessment of slides through various data visualizations to multi-step analyses including normalization, detection of differentially expressed genes, and comparison and highlighting of gene lists. A highly customizable action set-up facilitates unrestricted arrangement of functions, which can be stored as action profiles. A unique combination of web-based and command-line functionality enables comfortable configuration of processes that can be repeatedly applied to large data sets in high throughput. The output consists of reports formatted as standard web pages and tab-delimited lists of calculated values that can be inserted into other analysis programs. Additional features, such as web-based spreadsheet functionality, auto-parallelization and password protection make this a powerful tool in microarray research for individuals and large groups alike.
A web-based program for processing microarray data. In completely automated fashion, ExpressYourself will correct the background array signal, normalize the Cy5 and Cy3 signals, score levels of differential hybridization, combine the results of replicate experiments, filter problematic regions of the array and assess the quality of individual and replicate experiments. ExpressYourself is designed with a highly modular architecture so various types of microarray analysis algorithms can readily be incorporated as they are developed; for example, the system currently implements several normalization methods, including those that simultaneously consider signal intensity and slide location. The processed data are presented using a web-based graphical interface to facilitate comparison with the original images of the array slides. In particular, ExpressYourself is able to regenerate images of the original microarray after applying various steps of processing, which greatly facilitates identification of position-specific artifacts.
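The normalization and scoring steps described for two-channel arrays can be illustrated with a minimal sketch. It assumes global median centring of log2(Cy5/Cy3) ratios, which is one common choice and not necessarily the method ExpressYourself applies; the function name and the intensities are hypothetical.

```python
import math
from statistics import median

def median_normalized_log_ratios(cy5, cy3):
    """Return log2(Cy5/Cy3) per spot, shifted so that the median ratio is 0."""
    if len(cy5) != len(cy3):
        raise ValueError("both channels must cover the same spots")
    ratios = [math.log2(red / green) for red, green in zip(cy5, cy3)]
    centre = median(ratios)
    return [ratio - centre for ratio in ratios]

# Hypothetical background-corrected intensities for four spots.
cy5 = [200.0, 400.0, 800.0, 1600.0]
cy3 = [100.0, 100.0, 100.0, 100.0]
normalized = median_normalized_log_ratios(cy5, cy3)
print(normalized)
```

After centring, spots with positive values are brighter in Cy5 than the typical spot and spots with negative values are brighter in Cy3. Intensity- or location-dependent methods would replace the single median with a fitted trend.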
Implements Partial Least Squares regression to extract the hidden signals of sample-specific heterogeneity in the data and uses them to find the genes that are actually correlated with the phenotype of interest. svapls can be used to identify several types of unknown sample-specific sources of heterogeneity in a gene expression study and to adjust for them, providing more accurate inference on the original expression pattern of the genes across different varieties of samples.
Implements a unified framework for preprocessing microarray data and interfaces with other Bioconductor tools for downstream analysis. The oligo package provides array coordinates, feature types, sequences, feature names and other relevant information for preprocessing. Developers can use oligo to facilitate the integration of their tools with Bioconductor. They also benefit from the unified model that the package makes available, as the consistency in data delivery and handling improves efficiency.
Identifies features correlating with a phenotype of interest in the presence of potential confounding factors. Using simulated data, we show that ISVA performs well in identifying confounders as well as outperforming methods which do not adjust for confounding. Using four large-scale Illumina Infinium DNA methylation datasets subject to low signal to noise ratios and substantial confounding by beadchip effects and variable bisulfite conversion efficiency, we show that ISVA improves the identifiability of confounders and that this enables a framework for feature selection that is more robust to model misspecification and heterogeneous phenotypes. Finally, we demonstrate similar improvements of ISVA across four mRNA expression datasets. Thus, ISVA should be useful as a feature selection tool in studies that are subject to confounding.
Inspects a large number of p-values in an effort to detect additional positive cases. EBS offers an automatic screening of the p-values a user may obtain from his or her favorite gene-by-gene analysis software. Because the procedure uses the p-values rather than the test statistics, it applies broadly to other types of tests, such as F-tests or rank tests. It screens each p-value not only on its own magnitude but also on the basis of the totality of the p-values (that is, their empirical distribution).
Provides the implementation of distance weighted discrimination (DWD) using an interior point method for the solution of second order cone programming problems. DWD is related to, and has been shown to be superior to, the support vector machine in situations that are fundamental to bioinformatics, such as very high dimensional data. DWD has proven to be very useful for several fundamental bioinformatics tasks, including classification, data visualization and removal of biases, such as batch effects.
Performs high-throughput expression analysis, with accurate and consistent results. Codelink is a single-channel microarray platform that uses 30-bp oligonucleotide probes designed for three different organisms: human, mouse and rat. The package facilitates reading, preprocessing and manipulating Codelink microarray data. The raw data must be exported as a text file using the Codelink software. The tool provides users with an easy-to-use interface for analyzing the data on the R platform.
A module-based prediction (MBP) strategy that uses unsupervised gene clustering to overcome the drawbacks of traditional gene-based prediction (GBP) models. MBP is portable to any test study as long as some of the genes in each module exist in the test study. It provides slightly improved accuracy while being considerably more robust than traditional GBP. The method takes advantage of information from genes sharing similar expression patterns. The results of the current study show that the prediction accuracies of the MBP method are slightly better than those of the GBP method in both within-study and inter-study predictions.
Features multinomial probit regression with Gaussian Process priors and estimates class posterior probabilities employing fast variational approximations to the full posterior. VBMP is an R package for Gaussian Process classification of data over multiple classes. It incorporates feature weighting by means of Automatic Relevance Determination. The vbmp package implements a variational Bayes (VB) approach to classification of multi-class datasets. This non-parametric approach is developed within a probabilistic framework for Bayesian inference, which yields efficient sparse approximations by optimizing a strict lower bound of the marginal likelihood of a multinomial probit regression model.
Automates FASTA file inspection, rendering files compatible with a variety of downstream bioinformatics tools. Fasta-O-Matic reports any issues it detects to the user in logs that are optionally color-coded and either quiet or verbose. It can serve as a general pre-processing tool in bioinformatics workflows and as a sanity check for bioinformatics core facilities. This tool is useful for repeating common analysis steps on FASTA files received from disparate sources.
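The kind of inspection described here can be sketched in a few lines. The three checks below (final newline, unique headers, consistent line wrapping) are assumptions about typical FASTA problems and are not taken from the Fasta-O-Matic source; the function name and the example input are hypothetical.

```python
def inspect_fasta(text):
    """Return a list of human-readable problems found in FASTA text."""
    problems = []
    if not text.endswith("\n"):
        problems.append("missing final newline")

    records = []  # list of (identifier, [sequence line lengths])
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            records.append((fields[0] if fields else "", []))
        elif line.strip():
            if not records:
                problems.append("sequence data before first header")
                records.append(("", []))
            records[-1][1].append(len(line.strip()))

    seen = set()
    for identifier, lengths in records:
        if identifier in seen:
            problems.append(f"duplicate header: {identifier}")
        seen.add(identifier)
        if not lengths:
            problems.append(f"empty record: {identifier}")
        # All lines but the last must share one width; the last may be shorter.
        full = lengths[:-1]
        if full and (len(set(full)) > 1 or lengths[-1] > full[0]):
            problems.append(f"inconsistent wrapping: {identifier}")
    return problems

# Hypothetical input with three deliberate problems.
example = ">seq1 sample\nACGTACGT\nACGT\n>seq1\nACG\nACGTACGT"
report = inspect_fasta(example)
print(report)
```

Returning a list of messages rather than printing them lets a caller decide between a quiet and a verbose log.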
A package for the automatic detection and masking of blemishes in HDONA microarray chips. Harshlight’s algorithm combines image analysis techniques with statistical approaches to recognize three types of defects frequent in Affymetrix microarray chips: extended, compact, and diffuse defects. It provides a way to safely identify blemishes of different nature and correct the intensity values of the batch of chips provided by the user. The corrections made by Harshlight improve the reliability of the expression values when the chips are further analyzed with other programs, such as GCRMA and MAS5.
A package for ranking differentially expressed gene expression time courses through Gaussian process regression. gprege fits two GPs with an RBF (+ noise diagonal) kernel on each profile. One GP kernel is initialised with a short lengthscale hyperparameter, signal variance as the observed variance and a zero noise variance. It is optimised via scaled conjugate gradients (netlab). A second GP has fixed hyperparameters: zero inverse-width, zero signal variance and noise variance as the observed variance. The log-ratio of marginal likelihoods of the two hypotheses acts as a score of differential expression for the profile. Comparison via ROC curves is performed against BATS.
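The log-ratio score can be sketched as follows. This is a simplified illustration and not gprege itself: the hyperparameters of the first GP are fixed instead of being optimised by scaled conjugate gradients, and a small noise fraction (an assumption) replaces the zero noise variance to keep the covariance matrix well conditioned. All function names and both example profiles are hypothetical.

```python
import math

def rbf_covariance(times, signal_var, lengthscale, noise_var):
    """RBF covariance matrix with a noise term added to the diagonal."""
    n = len(times)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = (times[i] - times[j]) / lengthscale
            cov[i][j] = signal_var * math.exp(-0.5 * dist * dist)
        cov[i][i] += noise_var
    return cov

def log_marginal_likelihood(y, cov):
    """log N(y | 0, cov), computed through a Cholesky factorisation."""
    n = len(y)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    # Forward substitution solves low @ z = y, so z.z equals y^T cov^-1 y.
    z = []
    for i in range(n):
        z.append((y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (quad + log_det + n * math.log(2.0 * math.pi))

def differential_expression_score(times, profile, lengthscale=1.0, noise_fraction=0.05):
    """Log-ratio of marginal likelihoods: smooth RBF model vs. pure noise."""
    n = len(profile)
    centre = sum(profile) / n
    y = [v - centre for v in profile]
    var = sum(v * v for v in y) / (n - 1)  # observed variance
    smooth = rbf_covariance(times, (1.0 - noise_fraction) * var,
                            lengthscale, noise_fraction * var)
    flat = rbf_covariance(times, 0.0, lengthscale, var)  # noise-only model
    return log_marginal_likelihood(y, smooth) - log_marginal_likelihood(y, flat)

times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
trend = [0.0, 0.8, 1.5, 2.0, 2.2, 2.0, 1.5, 0.8]           # smooth rise and fall
noisy = [0.1, -0.2, 0.15, -0.1, 0.2, -0.15, 0.1, -0.1]     # no temporal structure
trend_score = differential_expression_score(times, trend)
noisy_score = differential_expression_score(times, noisy)
print(trend_score, noisy_score)
```

The smooth profile receives the higher score because the RBF model explains temporally correlated data better than the noise-only model, which is the ordering a ranking of time courses relies on.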
Combines raw data from different microarray platforms into one virtual array. virtualArray consists of several functions that act in sequence in a semi-automatic way, doing as much of the data combination as possible and letting the user concentrate on analysing the resulting virtual array. Using this software package, researchers can easily integrate their own microarray data with data from public repositories or other sources that are based on different microarray chip types.
Reduces probe hybridization bias from experiments performed on the Affymetrix microarray platform, allowing accurate assessment of germline influence on gene expression. equalizer uses genome variant data to modify annotation files for the commonly used Affymetrix IVT and Gene/Exon platforms. These files can be used by any microarray normalization method for subsequent analysis.
Detects and removes probes with different binding affinity in Affymetrix array expression data. The method implemented in maskBAD performs better than other methods in detecting probes with binding affinity differences (BAD probes). Identifying and removing BAD probes eliminates spurious gene expression differences and helps to reveal real ones. In clustering analysis of gene expression, identification of BAD probes guides interpretation of discriminating probe sets.
Allows management and processing of microarray data obtained from different experimental designs, equipment and platforms. D-MaPs is a web-based system implemented to manage GeneTAC, ScanArray, CodeLink, NimbleGen and Affymetrix microarray platforms. The software provides each user with a separate workspace, called a project, for storing and analyzing microarray data. It allows researchers who are unfamiliar with computation and statistics to obtain results of differential gene expression.
Allows surrogate variable analysis of high-throughput datasets. SVAw is a web and standalone application that enables researchers to utilize Surrogate Variable Analysis (SVA) when analyzing high-throughput genomic data. It aims to capture heterogeneities in the dataset that can potentially lead to biased analysis of the data. The software calculates probe/gene statistics such as the fold change and p-value both before SVA (unadjusted) and after SVA (adjusted with sva).
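The two per-gene statistics mentioned above can be illustrated for a single gene. This minimal sketch covers only the unadjusted case and does not perform any surrogate variable adjustment. The exact permutation test stands in for whatever test SVAw actually uses, which the description does not specify; the function names and expression values are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean

def log2_fold_change(case, control):
    """log2 of the ratio of group means (values on the raw intensity scale)."""
    return math.log2(mean(case) / mean(control))

def permutation_p_value(case, control):
    """Exact two-sided permutation p-value for the difference in group means."""
    pooled = list(case) + list(control)
    observed = abs(mean(case) - mean(control))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling len(case) samples as "case".
    for chosen in combinations(range(len(pooled)), len(case)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance so relabellings that tie with the observed split count.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical expression values for one gene in two groups of three samples.
case = [7.0, 8.0, 9.0]
control = [1.0, 2.0, 3.0]
fold_change = log2_fold_change(case, control)
p_value = permutation_p_value(case, control)
print(fold_change, p_value)
```

With three samples per group there are only 20 relabellings, so the smallest attainable two-sided p-value is 0.1. Full enumeration is practical only for small groups.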
Permits removing electrical stimulation artifacts from multielectrode array recordings. ERAASR can recover the full time-course of neural signals during and after stimulation. It exploits the similarity of artifacts across multiple electrodes, multiple pulses within a pulse train, and multiple trials of repeated stimulation.
An easy-to-use tool for dual-color microarray data. genArise's Tk-based GUI environment lets any inexperienced user perform a basic, but not simple, data analysis just by following a wizard. In addition, it provides some tools for the developer.
Furnishes functions for preprocessing, normalization, interpretation and visualization of raw microarray gene expression data. IBD identifies new biomarkers in gene expression and comparative genomic hybridization (CGH) data. It can be applied to experimental design, quality control investigation, array normalization, differential expression analysis or functional analysis, among others.
Facilitates personalized-medicine workflows, based on a microarray normalization method. Rather than processing microarray samples as groups, which can introduce biases and present logistical challenges, SCAN normalizes each sample individually by modeling and removing probe- and array-specific background noise using only data from within each array. SCAN can be applied to one-channel (e.g., Affymetrix) or two-channel (e.g., Agilent) microarrays. The Universal exPression Codes (UPC) method is an extension of SCAN that estimates whether a given gene/transcript is active above background levels in a given sample. It can be applied to one-channel or two-channel microarrays as well as to RNA-Seq read counts. Because UPC values are represented on the same scale and have an identical interpretation for each platform, they can be used for cross-platform data integration.
A package that extends and improves the functionality of the base affy package. Its routines make heavy use of compiled code for speed. The central focus is on implementing methods for fitting probe-level models and tools that use these models.
A tool that can be used to supplement the annotations provided for probes on the Affymetrix GeneChip platform. Output from this tool characterizes SNP-containing probes on GeneChip microarrays, thus improving our understanding of factors contributing to expression measurements. The SNP detection results from AffyMAPSDetector can be effectively used in the data analysis phase of a microarray experiment.
☛ I am a DBT Certified bioinformatician (BINC, Govt. Of India) with deep interest in Genomics and next generation sequencing technologies. I have enjoyed my industry experience of 7+ years working on design, implementation and analysis of microarray data and high-throughput NGS data.
☛ I love the idea of automation and my passion for programming has motivated me to learn some of the top-notch programming languages like PERL, Python, bash and R. I have developed several programs required to analyze, manipulate or automate key steps in sequencing data analysis. Programming is so much fun!
☛ I am an avid learner and I often enroll in online courses because I strongly believe that in a quickly evolving domain like bioinformatics, the only way to keep pace is to keep learning, both within and outside job hours.
☛ I support the concept of “open-source” and try to share as much knowledge as I can about bioinformatics and its scope with students. I have been formally certified as “lecturer” (CSIR-NET-JRF+LS, Govt. of India) and am involved in providing bioinformatics knowledge-sharing sessions and cross-functional trainings to technical teams and graduate students.
☛ You can connect with me via gmail, LinkedIn or over a call to discuss ideas or if you need any guidance. Here are my contact details.
✉ [email protected]
✆ +91 966 22 55 385
▣ Linkedin: https://in.linkedin.com/in/lakhujanivijay
❝Everybody in this country should learn to program a computer, because it teaches you how to think❞
✍ Steve Jobs