Enables analysis and visualization of the information content of genomic signals. MSR is a method adapted from an image segmentation algorithm and inspired by multiscale approaches for classifying image texture patterns. The software enables global analysis of genomic data in an unbiased manner with respect to the spatial scales on which biological information is encoded. It was used to analyze measurements of transcription factor binding, covalent histone modifications and DNA methylation, as well as genomic annotations and sequence-derived data.
Detects motifs in large-scale chromatin immunoprecipitation (ChIP) data. Trawler can be run in two ways: (i) as a standalone version providing a pipeline that generates position weight matrices (PWMs) by extracting and clustering over-represented motifs; and (ii) as a web application that lets users submit sequences in either FASTA or BED format, rank predicted motifs by conservation score, and produce a set of background sequences.
An interactive visualization tool for functional genomics data. Epiviz supports genome navigation like other genome browsers, but allows multiple visualizations of data within genomic regions using scatterplots, heatmaps and other user-supplied visualizations. Epiviz sets a precedent for genomic data analysis collaborative workflows by enabling reproducible and shareable steps, and allowing custom user code to be dynamically incorporated, while guaranteeing the security and integrity of user data.
Allows users to store, visualize and analyze epigenomics and transcriptomics data using a biologist-friendly web interface, without the need for programming expertise. Predefined pipelines allow users to download data, visualize results on a genome browser, calculate RPKMs (reads per kilobase per million mapped reads) and identify peaks. Advanced capabilities include differential gene expression and binding analysis, and creation of average tag-density profiles and heatmaps.
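As an illustration of the RPKM value mentioned above, here is a minimal sketch of the standard formula (read count scaled by feature length in kilobases and library size in millions). The function name, arguments and example numbers are hypothetical and are not taken from the tool itself.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    # reads / (length / 1e3) / (library / 1e6) == reads * 1e9 / (length * library)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp gene in a 10-million-read library.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```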
Produces profiles of genomic datasets over groups or regions of interest (ROI) by using an absolute or relative method. VAP uses a simple gene list to generate individual or aggregate graphs of the genomic signal via a customizable number of reference points. This software allows up to six reference points to delimit sections of interest to avoid contamination of the signal from adjacent features. The vap_interface module performs the graphical representation whereas the vap_core handles the command line.
Annotates ChIP-seq peaks. ChIPseeker supports annotating ChIP peaks and provides functions to visualize ChIP peak coverage over chromosomes and profiles of peaks binding to TSS regions. Comparison of ChIP peak profiles and annotations is also supported. Moreover, it supports evaluating significant overlap among ChIP-seq datasets. Currently, ChIPseeker contains information on 15,000 BED files from the GEO database. These datasets can be downloaded and compared with the user's own data to find significantly overlapping datasets, which can suggest co-regulation or transcription factor complexes for further investigation.
Allows users to annotate and visualize ChIP-seq peak data. PAVIS is a web-based tool designed for ease of use by biologists and bench scientists. The software offers two primary functions: peak data annotation and peak visualization within the relevant genomic context. The PAVIS annotation report provides summary statistics such as the number of annotated peaks and the relative enrichment level in each genomic feature category. The visualization interface is launched directly from the PAVIS annotation report.
Allows annotation, bulk summarization, and visualization of genomic intervals over predefined genomic annotations. Genomation is an R package designed for interrogating diverse types of genomic intervals with or without scores. The software reduces the time needed for data processing and biological inference by providing several functions. It can be used with a variety of genomic interval file types.
Aggregation plots are frequently used in ChIP-Seq data analysis to evaluate signal distributions at points of interest. agplus enables rapid and flexible generation of text tables tailored for aggregation plots, from which users can easily define multiple groups based on criteria such as regulatory regions or transcription initiation sites.
Analyzes the local epigenetic neighborhood of a set of single nucleotide polymorphisms (SNPs). SNPhood is a package that lets users exploit next-generation sequencing (NGS) data by offering a means to (i) discover allelic bias across regions of interest (ROI); (ii) browse and view genotype-dependent binding patterns; and (iii) make genotype-dependent comparisons and groupings of the binding pattern across ROI and samples.
A cross-platform desktop application developed for interactive visualization, exploration and clustering of epigenomic data such as ChIP-seq experiments. ChAsE is designed and developed in close collaboration with several groups of biologists and bioinformaticians with a focus on usability and interactivity. Data can be analyzed through k-means clustering, specifying presence or absence of signal in epigenetic data, and performing set operations between clusters. Results can be explored in an interactive heat map and profile plot interface and exported for downstream analysis or as high quality figures suitable for publications.
A web tool that allows genome-scale comparison of user-provided high-throughput experiments (ChIP-seq, RNA-seq and CAGE) with data in the public domain. Heat*seq allows users to contextualise their sequencing data with respect to vast amounts of public data in a few minutes without requiring any programming skills. Heat*seq currently contains over 12,000 experiments across diverse tissues and cell types in human, mouse and Drosophila. Heat*seq displays interactive correlation heatmaps, with the ability to dynamically subset datasets to contextualise user experiments. High-quality figures and tables are produced and can be downloaded in multiple formats.
Allows users to visualize, analyze and store genomic positioning data. Podbat is an integrated computational tool developed for use on epigenomics datasets. The software follows a simple flowchart and implements a flexible genome browser at its core. Genomes can easily be imported and updated as the software connects directly to Ensembl, facilitating the use of the latest available annotations. Podbat was optimized using the yeast genomes of S. cerevisiae and S. pombe.
Provides summary statistics as graphs and offers several commonly requested analyses. ChIPseek integrates the HOMER and BEDTools software and provides peak annotation, locations, sequences and statistics, with charts and histograms for visualizing peak properties. Users can explore peaks further via a UCSC genome browser. It contains filter tools to select peak subsets of interest based on peak length and other characteristics.
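The values behind a correlation heatmap can be sketched as a pairwise correlation matrix between experiments. The example below assumes Pearson correlation over per-gene (or per-region) signal vectors; the description does not state which correlation measure the tool uses, and the sample names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(samples):
    """Return {name_a: {name_b: r}} for every pair of named signal vectors."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}


# Hypothetical signal values for three experiments over the same five genes.
samples = {
    "user_experiment": [1.0, 4.0, 2.5, 8.0, 0.5],
    "public_A": [1.2, 3.8, 2.0, 7.5, 0.7],
    "public_B": [6.0, 1.0, 5.5, 0.2, 7.0],
}
matrix = correlation_matrix(samples)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in samples])
```

Each row of the resulting matrix corresponds to one row of cells in a heatmap; subsetting the public datasets amounts to dropping keys from `samples` before computing the matrix.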
A web-based service for evaluating the colocation of genomic features. Users submit genomic regions of interest, for example, a set of locations from a ChIP-seq analysis. ColoWeb compares the submitted regions of interest to the location of other genomic features such as transcription factors and chromatin modifiers. To facilitate comparisons among various genomic features, the output consists of both graphical representations and quantitative measures of the degree of colocalization between user’s genomic regions and selected features. Frequent colocation may indicate a biological relationship.
Generates a positional correlation histogram for two specific genomic features. ChIP-Cor enables users to obtain aggregation plots (APs) for peak lists with a great variety of target features. The software reads ChIP-Seq tag positions and offers several options for scaling the abundance of the target feature. This tool is well suited to educational purposes.
Allows users to generate composition-matched background sequence sets. BiasAway offers six approaches for generating a background for overrepresentation analyses: (1) mononucleotide shuffled sequences; (2) dinucleotide shuffled target sequence; (3) genomic sequences matched to the mononucleotide GC content of each target sequence; (4) sliding windows of mononucleotide shuffled sequence; (5) sliding windows of dinucleotide shuffled target sequence; and (6) genomic sequences matched in windows of internal mononucleotide GC content for each target sequence.
Provides visualization and postprocessing options. MotIV is a program that helps users identify biologically relevant combinations of motifs. It includes functions for visualizing and annotating motifs and identifies motif combinations that have nonrandom spatial relationships. Moreover, this program offers features for plotting the frequency distributions of the distance between motif occurrences and peak centers.
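The idea of a positional correlation histogram can be sketched as follows: for every reference position, count how many target positions fall at each binned offset around it. This is a naive illustration on a single chromosome that ignores strand and scaling; the function name, parameters and coordinates are hypothetical and do not reflect the tool's actual interface.

```python
from collections import Counter


def positional_histogram(reference_positions, target_positions, window=100, bin_size=50):
    """Count target positions by binned offset relative to each reference position.

    Offsets in [-window, window) are kept. The result maps the left edge of each
    offset bin to the number of (reference, target) pairs falling in that bin.
    """
    if bin_size <= 0 or window % bin_size != 0:
        raise ValueError("window must be a positive multiple of bin_size")
    counts = Counter()
    for ref in reference_positions:
        for tgt in target_positions:
            offset = tgt - ref
            if -window <= offset < window:
                # Floor division keeps negative offsets in the correct bin.
                counts[(offset // bin_size) * bin_size] += 1
    return {edge: counts.get(edge, 0) for edge in range(-window, window, bin_size)}


# Hypothetical coordinates: two reference sites and five tag positions.
references = [1000, 5000]
tags = [960, 1010, 1049, 5020, 5200]
print(positional_histogram(references, tags))
# {-100: 0, -50: 1, 0: 3, 50: 0}
```

The nested loop is quadratic in the number of positions; a real implementation would sort both lists and sweep them once, but the simple form makes the binning logic easier to follow.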
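The simplest of these approaches, the mononucleotide shuffle, can be sketched in a few lines: permuting the letters of a target sequence preserves its length and base composition (and therefore its GC content) while destroying any motif. The function names below are hypothetical and are not the tool's interface; the dinucleotide and genomic-matching variants need more involved logic and are not shown.

```python
import random
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def mononucleotide_shuffle(seq: str, rng: random.Random) -> str:
    """Return a random permutation of the sequence, preserving base counts."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


# Hypothetical target sequence; a seeded generator keeps the sketch reproducible.
target = "ACGTGGCCATTAGC"
background = mononucleotide_shuffle(target, random.Random(0))

# The shuffled background has exactly the same composition as the target.
print(Counter(background) == Counter(target))
print(gc_content(background) == gc_content(target))
```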
A simple GUI software tool for visualizing published ChIP-seq raw data. SraTailor automatically converts an SRA into a BigWig-formatted file. Simplicity of use is one of the most notable features of SraTailor: entering an accession number of an SRA and clicking the mouse are the only steps required to obtain BigWig-formatted files and to graphically visualize the extents of reads at given loci. SraTailor is also able to make peak calls, generate files of other formats, process users' own data, and accept various command-line-like options. Therefore, this software makes ChIP-seq data fully exploitable by a wide range of biologists.
Takes bacterial RNA-seq data from raw Illumina reads to differentially expressed genes. SPARTA is a bacterial RNA-seq analysis tool for transcriptional profiling experiments. It simplifies this analysis for microbiologists and makes it possible to incorporate a hands-on approach to next-generation sequencing (NGS) technologies in the classroom. Moreover, it outputs quality analysis reports, gene feature counts, differential gene expression tables and scatterplots.
Enables the broad adoption of next generation sequencing (NGS) based inquiries. dictyExpress is a web application designed for exploration of transcriptomics datasets. It allows users to select among experiments and specify genes to analyze; visualize the expression time courses of those genes; identify gene clusters; examine pre-processed differential expression datasets; and perform Gene Ontology (GO)-term enrichment analysis.
Allows alignment and clustering of epigenomic marks. DGW is based on an algorithm that enables construction of robust speech recognizers undeterred by the variability in pitch and speed of enunciation. The software can align genomic landmarks such as transcription start sites (TSSs) and first splicing sites (FSSs) on real epigenomic data from the ENCODE project. The software can be useful for exploratory data analysis of high-throughput epigenomic data sets.
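The speech-recognition algorithm alluded to above is presumably dynamic time warping (DTW); that identification is an assumption here, since the description does not name it. A minimal sketch of the classic DTW distance between two signal profiles follows, using absolute difference as the local cost. The function name and the example profiles are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] is the cheapest cumulative cost of aligning a[:i] with b[:j],
    where each step may advance in a, in b, or in both.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical profiles: the second is a stretched copy of the first,
# so warping aligns them perfectly.
narrow_peak = [0, 1, 3, 1, 0]
stretched_peak = [0, 0, 1, 3, 3, 1, 0]
print(dtw_distance(narrow_peak, stretched_peak))  # 0.0
```

Because the stretched profile costs nothing to align, profiles that share a shape but differ in width end up close together, which is the property that makes this kind of distance attractive for clustering marks of variable length.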
A software package that allows for simple exploration, clustering and visualization of high-throughput sequencing experiments. fluff contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines.
Extracts key biological features from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) data. ChIPdig aims to help experimentalists process multi-sample ChIP-seq data. It can map reads to a reference genome, normalize and compare ChIP-seq data sets corresponding to different targets or conditions, and display coverage in a set of regions using heatmaps and a comparative metaplot.
Implements a flexible way to plot genome-wide data in many forms. Heatmaps is an R package that provides functions for plotting heatmaps of genome-wide data across genomic intervals, such as ChIP-seq signals at peaks or across promoters. It also offers several functions for investigating sequence features.
A ChIP-seq and RNA-seq software suite for visualizing and analyzing genomic data, including features to support differential expression analysis, heatmap production, principal component, gene ontology and dynamic network analysis. MicroScope allows the user to magnify any portion of a heatmap with a simple click-and-drag to zoom in, within the R programming environment, without requiring any programming skills. MicroScope is designed with large gene expression heatmaps in mind, where individual gene labels overlap and render the text unreadable.
Plots the short-read binding profiles of a ChIP-seq experiment. D-peaks is a ChIP-seq result analysis tool that draws a precise representation of several ChIP-seq experiments along the genome in a few simple steps. Some options, such as the absolute or relative genomic position, the DNA strand and aesthetic settings (labels and scale of the axes, colors of the peaks, etc.), must be specified to obtain the resulting figure.
With a PhD in Neuroscience, I worked for 8 years on the brain and its diseases. I then specialized in bioinformatics (NGS, epigenetics) and worked at CEA and GENETHON before joining OMICX to help the OMICtools community.
Gene fusion detection in Plants
Fusion transcripts (i.e., chimeric RNAs) resulting from gene fusions are well known in humans, but in plants this phenomenon has not yet been explored. We plan to discover fusion transcripts and gene fusions in different types of plants using RNA-Seq datasets. We further plan to understand the mechanism of gene fusion formation and the significance of fusions in plants.
Whole genome and transcriptome sequencing data analysis of Plants
In this era of Next Generation Sequencing (NGS), a huge amount of sequencing data is available in the public domain. Extracting novel findings from these datasets is a major challenge for a computational biologist. We are interested in analyzing whole genome and transcriptome sequencing data from different plants to extract useful information with the help of bioinformatics tools. Currently, we are planning to study the gene clusters of secondary metabolite pathways in different plants.
Development of webservers, databases and computational pipelines for plant research
Developing databases is necessary to compile and share information with the scientific community. We are dedicated to developing useful databases and web servers for plant research.
Another area of interest is the development of automated pipelines and tools for the analysis of high-throughput genomics data generated by NGS technologies.
Professional & Academic Background
Staff Scientist II (May 2017-present): National Institute of Plant Genome Research (NIPGR), New Delhi, India
Postdoctoral Research Associate (2015-2017): University of Virginia, Charlottesville, VA, USA
Research Scientist (2014-2015): Sir Ganga Ram Hospital, New Delhi, India
PhD Bioinformatics (2009-2014): Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh under Jawaharlal Nehru University (JNU), New Delhi, India
M.Sc. Life Sciences (2007-2009): Jawaharlal Nehru University (JNU), New Delhi, India
B.Sc. Biotechnology (2004-2007): Jamia Millia Islamia (JMI), New Delhi, India
Awards and Fellowships
Junior and Senior Research Fellowship (2009-2014): Council of Scientific and Industrial Research (CSIR), New Delhi, India
GATE (Graduate Aptitude Test in Engineering): Qualified in years 2008 and 2009
Scientific Contributions/ Recognitions
Associate Editor: Journal of Translational Medicine.
Editorial Board Member: Theoretical Biology and Medical Modelling.
Reviewer: PLoS ONE, BMC Genomics, BMC Bioinformatics, BMC Biology, BMC Biotechnology, Frontiers in Physiology and several other journals.
Web Resources/ Databases (Developed/ Contributed)
A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer (http://www.imtech.res.in/raghava/cancertope/)
GenomeABC: A webserver for benchmarking of genome assemblers. (http://crdd.osdd.net/raghava/genomeabc/).
Genomics web portal page. (http://crdd.osdd.net/raghava/genomesrs/).
Map/Alignment module of CancerDr: Cancer Drug Resistance Database. (http://crdd.osdd.net/raghava/cancerdr/).
Short reads and contigs alignment module of PCMDB: Pancreatic cancer methylation database. (http://crdd.osdd.net/raghava/pcmdb/).
Burkholderia sp. SJ98 database. (http://crdd.osdd.net/raghava/genomesrs/burkholderia/).
Rhodococcus imtechensis RKJ300 database. (http://crdd.osdd.net/raghava/genomesrs/rkj300/).
Genotrick: A pipeline for whole genome assembly and annotation of Genomes (http://crdd.osdd.net/raghava/genomesrs/genotrick/)
Development of Debian packages in OSDDlinux: A Customized Operating System for Drug Discovery. (http://osddlinux.osdd.net/).
A Web-Based Platform for Designing Vaccines against Existing and Emerging Strains of Mycobacterium tuberculosis. (http://crdd.osdd.net/raghava/mtbveb/).