Manages computational pipelines. Toil tolerates failures of both worker and leader processes, with strong checkpointing that allows interrupted runs to resume. It can run scientific workflows at large scale in cloud or high-performance computing (HPC) environments. Toil was used to compute gene- and isoform-level expression values for 19,952 samples from four studies.
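The checkpoint-and-resume behaviour described above can be illustrated with a minimal sketch. It assumes each step is a named, zero-argument Python callable whose result is JSON-serializable; `run_with_checkpoints` is a hypothetical helper, not part of Toil's API, and Toil's own job store is considerably more elaborate.

```python
import json
import os
from typing import Callable, Dict, List, Tuple


def run_with_checkpoints(steps: List[Tuple[str, Callable[[], object]]],
                         checkpoint_path: str) -> Dict[str, object]:
    """Run named steps in order, persisting each result so a rerun resumes.

    Hypothetical helper for illustration; this is not Toil's API.
    """
    # Load the results of steps that finished in a previous (possibly failed) run.
    done: Dict[str, object] = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)
    for name, func in steps:
        if name in done:
            continue  # already checkpointed: skip on resume
        done[name] = func()
        # Write to a temporary file, then atomically replace the checkpoint,
        # so a crash mid-write never leaves a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(done, fh)
        os.replace(tmp_path, checkpoint_path)
    return done
```

If a worker crashes partway through, rerunning the same call reloads the checkpoint and skips every step already recorded, so only the unfinished work is repeated.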
Processes next-generation sequencing (NGS) data through a semi-automated pipeline. WES comprises four modules: mapping, filtering, realignment and recalibration, and variant calling. It integrates a selection of computational tools to manage and analyze NGS data.
A pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline filters false-positive variants effectively and has high power to detect true variants.
Allows integrated analysis of next-generation sequencing (NGS) data. RUbioSeq is a multi-platform application that uses well-established tools to implement pipelines for DNA-seq, CNA-seq, bisulfite-seq and ChIP-seq experiments. The software includes a graphical user interface (GUI) designed for interdisciplinary research groups in which bioinformaticians and biomedical researchers work together. Its modular structure makes it easy to adapt and extend.
An open-source software tool that can reliably and easily extract mitochondrial genome information from exome and whole genome sequencing data. MitoSeek evaluates mitochondrial genome alignment quality, estimates relative mitochondrial copy numbers, and detects heteroplasmy, somatic mutations and structural variants of the mitochondrial genome. MitoSeek can be set up to run in parallel or serially on large exome sequencing datasets.
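As a rough illustration of two of these quantities, the sketch below assumes that relative mitochondrial copy number is approximated by the ratio of mean mtDNA depth to mean nuclear depth, and that heteroplasmy at a position is the fraction of reads carrying the most common non-reference base. The function names and the depth and fraction thresholds are hypothetical, not MitoSeek's actual method or defaults.

```python
from typing import Dict


def relative_mt_copy_number(mt_mean_depth: float, nuclear_mean_depth: float) -> float:
    """Assumed proxy: mean mtDNA depth divided by mean nuclear depth."""
    if nuclear_mean_depth <= 0:
        raise ValueError("nuclear depth must be positive")
    return mt_mean_depth / nuclear_mean_depth


def heteroplasmy_fraction(base_counts: Dict[str, int], ref_base: str) -> float:
    """Fraction of reads at one mtDNA position carrying the most common non-reference base."""
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    alt = max((n for base, n in base_counts.items() if base != ref_base), default=0)
    return alt / total


def is_heteroplasmic(base_counts: Dict[str, int], ref_base: str,
                     min_depth: int = 50, min_fraction: float = 0.05,
                     max_fraction: float = 0.95) -> bool:
    """Call heteroplasmy when the alternate fraction is neither negligible nor near-fixed.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if sum(base_counts.values()) < min_depth:
        return False  # too few reads to make a call
    return min_fraction <= heteroplasmy_fraction(base_counts, ref_base) <= max_fraction
```

A position where nearly all reads carry the alternate base is treated here as homoplasmic rather than heteroplasmic, which is why an upper bound is applied as well as a lower one.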
Allows users to develop tailored workflows for the analysis of whole-exome (WES), whole-genome (WGS), and transcriptome (RNA-seq) sequencing data. NGS-pipe is an automated framework for designing pipelines that analyze large-scale sequencing data, such as cancer genomics data. It provides building blocks for executing state-of-the-art tools, along with appropriate error handling. It also addresses the common lack of automated procedures for ensuring reproducibility.
A variant detection pipeline designed to process high-throughput sequencing data in order to identify potentially pathogenic mutations. Cpipe offers an industry-standard variant calling pipeline, on top of which it adds a suite of features needed by diagnostic laboratories.
Serves as a scalable composition tool for implementing asynchronous Boolean logic within a multicellular consortium. CALIN works in two main steps: it first decomposes the Boolean function of interest into a disjunction of conjunctive clauses using the Quine-McCluskey algorithm, and then converts each clause into a given computational device in which specific connections are created between integrases and inputs.
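The first CALIN step can be sketched with a textbook Quine-McCluskey implementation: minterms are repeatedly merged into prime implicants, essential prime implicants are selected, and any remaining minterms are covered greedily (a heuristic; Petrick's method would give an exact minimum). The helper names are hypothetical, and the second step, mapping each clause onto an integrase-based device, is not shown.

```python
from itertools import combinations
from typing import List, Optional, Set


def _combine(a: str, b: str) -> Optional[str]:
    """Merge two patterns that differ in exactly one fixed bit ('0' vs '1')."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(minterms: List[int], n_vars: int) -> Set[str]:
    """Repeatedly merge implicants; anything that never merges is prime."""
    current = {format(m, f"0{n_vars}b") for m in minterms}
    primes: Set[str] = set()
    while current:
        merged_next: Set[str] = set()
        used: Set[str] = set()
        for a, b in combinations(sorted(current), 2):
            merged = _combine(a, b)
            if merged is not None:
                merged_next.add(merged)
                used.update((a, b))
        primes |= current - used
        current = merged_next
    return primes


def covers(pattern: str, m: int, n_vars: int) -> bool:
    """True if the pattern ('-' = don't care) matches minterm m."""
    bits = format(m, f"0{n_vars}b")
    return all(p in ("-", b) for p, b in zip(pattern, bits))


def minimal_cover(minterms: List[int], n_vars: int) -> List[str]:
    primes = prime_implicants(minterms, n_vars)
    chosen: List[str] = []
    # Essential prime implicants: the only prime covering some minterm.
    for m in sorted(minterms):
        covering = [p for p in primes if covers(p, m, n_vars)]
        if len(covering) == 1 and covering[0] not in chosen:
            chosen.append(covering[0])
    remaining = {m for m in minterms if not any(covers(p, m, n_vars) for p in chosen)}
    # Cover whatever is left greedily (heuristic, not guaranteed minimal).
    while remaining:
        best = max(sorted(primes), key=lambda p: sum(covers(p, m, n_vars) for m in remaining))
        chosen.append(best)
        remaining = {m for m in remaining if not covers(best, m, n_vars)}
    return chosen


def to_dnf(patterns: List[str], names: List[str]) -> str:
    """Render prime implicants as a disjunction of conjunctive clauses."""
    clauses = []
    for pattern in patterns:
        literals = [n if bit == "1" else f"~{n}"
                    for n, bit in zip(names, pattern) if bit != "-"]
        clauses.append(" & ".join(literals) if literals else "True")
    return " | ".join(f"({c})" for c in clauses)
```

For the 3-input majority function (minterms 3, 5, 6 and 7), `to_dnf(minimal_cover([3, 5, 6, 7], 3), ["a", "b", "c"])` returns `"(b & c) | (a & c) | (a & b)"`: three conjunctive clauses, one per pair of inputs.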
Comprises a fixed implementation of alignment, calling and annotation tools optimised for analysing one or more exomes in clinical or research settings. OpEx can be customised through options in its configuration file, including setting the reference genome manually and changing the transcript database. It uses Platypus for variant calling, which gives equivalent performance irrespective of the number of exomes being analysed.
Analyzes raw sequencing data from several next generation sequencing (NGS) platforms. MutAid is a pipeline that performs six steps: (i) quality control and filtering; (ii) mapping reads to a reference genome; (iii) variant detection; (iv) effect prediction; (v) cross-referencing; and (vi) producing a summary of all the information generated. It can be used to interpret mutational variants in data generated by targeted gene-panel sequencing or whole genome sequencing.
Helps users gain biological insight into genetic events investigated by exome sequencing. Simplex is an automated pipeline for analyzing single-end (SE) and paired-end (PE) exome sequencing data generated by Illumina and ABI SOLiD deep sequencing devices. The pipeline combines published and in-house applications and is tested continuously and automatically. Simplex is provided as a ready-to-use VirtualBox image and as a fully configured Cloud image.
Supports a series of analyses commonly required for targeted resequencing and whole exome sequencing data, including single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses.
Provides a platform for storing, handling and processing next generation sequencing (NGS) data. OTP performs both data management and processing, and allows users to automate the entire process, starting from the import of raw data from storage. The application includes functions for quality control, sequence alignment, and the identification of single-nucleotide and structural genomic events.
Performs a complete whole-exome sequencing analysis and provides easy access, through an interface, to intermediate and final results. Users can perform the whole analysis, on both paired-end and single-end data, without knowing the underlying hardware and software architecture. The interface offers intuitive data submission and user-friendly web pages for visualizing annotated variants.
A whole exome sequencing (WES) data analysis pipeline that integrates both single nucleotide variants and copy number alterations from WES data to identify candidate genes for therapeutic targeting. IMPACT returns a prioritized list of drugs predicted to target these cancer genes, using a drug-target database. The IMPACT pipeline also allows users to explore the tumor heterogeneity of a sample. This helps users understand clonal dynamics over the course of treatment, or compare clinically similar samples with each other.
An integrated analysis pipeline for whole-exome sequencing (WES) data analysis. Fastq2vcf offers improved flexibility, efficiency, and reproducibility. It can generate shell scripts that automate the steps for processing WES data from raw sequence reads to annotated variants. It is also highly configurable and provides users with complete control of the processing procedure, making it easy to submit and track jobs in both single workstation and parallelized computing environments.
Supports the management and analysis of exome sequencing data. SeqBench combines data management, data analysis, and result visualization in a single web-based platform. It embeds the SIMPLEX analysis pipeline and provides a data acquisition module supporting data from the Illumina and SOLiD platforms. The server provides a user interface, facilitates comprehensive data acquisition through a wizard system, and can be installed as an out-of-the-box solution.
An R script that infers the ethnicity of a set of samples with available whole exome sequencing (WES) data from their differential SNP genotype profiles. EthSEQ combines genotype data from the 1000 Genomes Project, used to build reference models for specific WES platforms; ASEQ, used to genotype input samples of unknown ethnicity; and EIGENSTRAT, used to perform principal component analysis on the aggregated genotype data. The tool can be integrated into any WES-based processing pipeline and exploits multi-core capabilities.
Constructs shell scripts that can be submitted automatically to a scheduling system or run locally. PipEngine is particularly useful for next generation sequencing (NGS) pipelines but can work with any job scheduling system. It was developed to increase the reproducibility of scientific results. The tool generates the scripts automatically and offers a way to add extra functionality, such as error control and logging.
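A minimal sketch of script generation, assuming a hypothetical step format (a list of dictionaries with `name` and `command` keys, where commands are Python format templates) rather than PipEngine's actual pipeline definition format. Parameters are shell-quoted with `shlex.quote`, each step is preceded by a log line, and `set -euo pipefail` provides basic error control by stopping at the first failing command.

```python
import shlex
from typing import Dict, List


def render_script(job_name: str, steps: List[Dict[str, str]],
                  params: Dict[str, str]) -> str:
    """Render a bash script from command templates (hypothetical step format)."""
    lines = [
        "#!/usr/bin/env bash",
        "set -euo pipefail  # error control: abort on the first failing command",
        f"# job: {job_name}",
    ]
    # Quote every parameter so paths with spaces or shell metacharacters are safe.
    quoted = {key: shlex.quote(value) for key, value in params.items()}
    for step in steps:
        # Logging: announce each step on stderr before running it.
        message = f"[{job_name}] starting {step['name']}"
        lines.append(f"echo {shlex.quote(message)} >&2")
        lines.append(step["command"].format(**quoted))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    steps = [
        {"name": "align", "command": "bwa mem {ref} {reads} > {sample}.sam"},
        {"name": "sort", "command": "samtools sort -o {sample}.bam {sample}.sam"},
    ]
    print(render_script("S1", steps, {"ref": "ref.fa", "reads": "S1.fq", "sample": "S1"}))
```

The returned text can be saved to a file and run locally with `bash`, or handed to a cluster scheduler (for example with Slurm's `sbatch`).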
Automates the genotyping of microsatellite repeats in Huntington disease (HD) data. ScaleHD is a pipeline designed for large-scale automated genotyping of HTT CAG/CCG repeats from parallel sequencing data. It performs quality control, sequence alignment and genotyping on all file pairs supplied by the user as input. The pipeline consists of three main stages: sequence quality control (SeqQC), sequence alignment (SeqALN) and automated genotyping (GType).
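To make a CAG/CCG genotype concrete, the following sketch counts the longest uninterrupted CAG and CCG tracts in each read and tallies the most frequent combinations. This is not ScaleHD's genotyping method, which works from the sequence alignments produced in its SeqALN stage rather than from pattern counting; the function names are hypothetical and the read layout in the test data is simplified.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def longest_repeat_run(seq: str, unit: str) -> int:
    """Longest uninterrupted run of `unit`, counted in repeat units.

    Non-overlapping regex matches suffice here because CAG and CCG cannot
    overlap a shifted copy of themselves.
    """
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)


def repeat_profile(read: str) -> Tuple[int, int]:
    """(CAG, CCG) repeat lengths observed in one read."""
    return longest_repeat_run(read, "CAG"), longest_repeat_run(read, "CCG")


def most_common_profiles(reads: Iterable[str], n: int = 2) -> List[Tuple[Tuple[int, int], int]]:
    """Most frequent (CAG, CCG) profiles across reads: a crude stand-in for
    the two alleles of a diploid sample."""
    return Counter(repeat_profile(r) for r in reads).most_common(n)
```

Real reads carry sequencing errors and interruptions within the tract, which is one reason alignment-based genotyping is preferred over exact pattern matching.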
Provides analysis, management and visualization tools for next-generation sequencing (NGS) data. Strand NGS supports extensive workflows for alignment, RNA-seq, small RNA-seq, DNA-seq, Methyl-seq, MeDIP-seq and ChIP-seq experiments. It includes standard differential expression analysis across experimental conditions, as well as differential splicing analysis, and it can detect variants in the transcriptome and gene fusion events.
A Python toolkit providing best-practice pipelines for fully automated high-throughput sequencing analysis. You write a high-level configuration file specifying your inputs and analysis parameters. This input drives a parallel pipeline that handles distributed execution, idempotent processing restarts and safe transactional steps. The goal is to provide a shared community resource that handles the data processing component of sequencing analysis, giving researchers more time to focus on the downstream biology.
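The idea of idempotent restarts with transactional steps can be sketched as follows, assuming each step writes one output file: a step is skipped if its output already exists, and otherwise writes to a temporary file that is atomically renamed into place only on success. `transactional_step` is a hypothetical helper written for illustration, not the toolkit's own API.

```python
import os
import tempfile
from typing import Callable


def transactional_step(out_path: str, produce: Callable[[str], None]) -> bool:
    """Create out_path via `produce` unless it already exists.

    Returns True if the step ran, False if it was skipped (idempotent restart).
    """
    if os.path.exists(out_path):
        return False  # finished in an earlier run: skip
    out_dir = os.path.dirname(os.path.abspath(out_path))
    # Temporary file in the same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    os.close(fd)
    try:
        produce(tmp_path)               # the step writes only to the temporary file
        os.replace(tmp_path, out_path)  # atomic rename publishes the finished output
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # never leave partial output behind
        raise
    return True
```

Because a crashed step never leaves a file at `out_path`, rerunning the whole pipeline redoes only the interrupted steps and skips the completed ones.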
Enables users to run bioinformatics workflows locally or in the cloud. Loom is a platform-independent tool for creating, executing, tracking and sharing workflows. It makes analyses repeatable by using Docker to reproduce the runtime environment, recording file hashes to verify analysis inputs, and keeping reproducible records. It is designed for multi-step analyses with large data files passed between steps, but is not specific to bioinformatics.
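Recording and checking input hashes can be sketched with Python's standard `hashlib`. The choice of SHA-256, the chunked reading, and the `record_inputs`/`verify_inputs` helpers are assumptions for illustration; this does not describe how Loom stores or compares its hashes.

```python
import hashlib
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_inputs(paths: List[str]) -> Dict[str, str]:
    """Map each input path to its digest, e.g. when a run starts."""
    return {path: sha256_of(path) for path in paths}


def verify_inputs(recorded: Dict[str, str]) -> List[str]:
    """Return the paths whose current contents no longer match the recorded digest."""
    return [path for path, expected in recorded.items() if sha256_of(path) != expected]
```

An empty result from `verify_inputs` means every input is byte-for-byte identical to what was recorded, so a rerun is operating on the same data.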
Gene fusion detection in Plants
Fusion transcripts (i.e., chimeric RNAs) resulting from gene fusions are well known in humans, but in plants this phenomenon has not yet been explored. We plan to discover fusion transcripts/gene fusions in different types of plants using RNA-Seq datasets. We also plan to study the mechanisms of gene fusion formation and the significance of fusions in plants.
Whole genome and transcriptome sequencing data analysis of Plants
In this era of Next Generation Sequencing (NGS), a huge amount of sequencing data is available in the public domain. Extracting novel findings from these datasets is a major challenge for computational biologists. We are interested in analyzing whole genome and transcriptome sequencing data from different plants to extract useful information from those datasets with the help of bioinformatics tools. Currently, we plan to study the gene clusters of secondary metabolite pathways in different plants.
Development of webservers, databases and computational pipelines for plant research
Databases are necessary for compiling information and sharing it with the scientific community. We are dedicated to developing useful databases and webservers for plant research.
Another area of interest is to develop automated pipelines and tools for the analysis of high throughput genomics data, generated by NGS technologies.
Professional & Academic Background
Staff Scientist II (May 2017- present): National Institute of Plant Genome Research (NIPGR), New Delhi, India
Postdoctoral Research Associate (2015-2017): University of Virginia, Charlottesville, VA, USA
Research Scientist (2014-2015): Sir Ganga Ram Hospital, New Delhi, India
PhD Bioinformatics (2009-2014): Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh under Jawaharlal Nehru University (JNU), New Delhi, India
M.Sc. Life Sciences (2007-2009): Jawaharlal Nehru University (JNU), New Delhi, India
B.Sc. Biotechnology (2004-2007): Jamia Millia Islamia (JMI), New Delhi, India
Awards and Fellowships
Junior and Senior Research Fellowship (2009-2014): Council of Scientific and Industrial Research (CSIR), New Delhi, India
GATE (Graduate Aptitude Test in Engineering): Qualified in years 2008 and 2009
Scientific Contributions/ Recognitions
Associate editor: Journal of Translational Medicine.
Editorial Board Member: Theoretical Biology and Medical Modelling.
Reviewer: PLoS One, BMC Genomics, BMC Bioinformatics, BMC Biology, BMC Biotechnology, Frontiers in Physiology and several other journals.
Web Resources/ Databases (Developed/ Contributed)
A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer (http://www.imtech.res.in/raghava/cancertope/)
GenomeABC: A webserver for benchmarking of genome assemblers. (http://crdd.osdd.net/raghava/genomeabc/).
Genomics web portal page. (http://crdd.osdd.net/raghava/genomesrs/).
Map/Alignment module of CancerDr: Cancer Drug Resistance Database. (http://crdd.osdd.net/raghava/cancerdr/).
Short reads and contigs alignment module of PCMDB: Pancreatic cancer methylation database. (http://crdd.osdd.net/raghava/pcmdb/).
Burkholderia sp. SJ98 database. (http://crdd.osdd.net/raghava/genomesrs/burkholderia/).
Rhodococcus imtechensis RKJ300 database. (http://crdd.osdd.net/raghava/genomesrs/rkj300/).
Genotrick: A pipeline for whole genome assembly and annotation of genomes (http://crdd.osdd.net/raghava/genomesrs/genotrick/)
Development of Debian packages in OSDDlinux: A Customized Operating System for Drug Discovery. (http://osddlinux.osdd.net/).
A Web-Based Platform for Designing Vaccines against Existing and Emerging Strains of Mycobacterium tuberculosis. (http://crdd.osdd.net/raghava/mtbveb/).