Scores the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome. CADD integrates many diverse annotations into a quantitative score. The basis of CADD is to contrast the annotations of fixed or nearly fixed derived alleles in humans with those of simulated variants. CADD can quantitatively prioritize functional, deleterious, and disease-causal variants across a wide range of functional categories, effect sizes and genetic architectures, and can be used to prioritize causal variation in both research and clinical settings.
A sequence-based computational method to predict the effect of regulatory variation, using a classifier (gkm-SVM) that encodes cell type-specific regulatory sequence vocabularies. The induced change in the gkm-SVM score, deltaSVM, quantifies the effect of variants. We show that deltaSVM accurately predicts the impact of SNPs on DNase I sensitivity in their native genomic contexts and accurately predicts the results of dense mutagenesis of several enhancers in reporter assays. deltaSVM provides a powerful computational approach to systematically identify functional regulatory variants.
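The idea of scoring a variant as the change in a sequence classifier's output can be sketched as follows. This is a simplified illustration, not the deltaSVM implementation: the trained gkm-SVM is replaced by a hypothetical table of k-mer weights, and the function names, the toy weights and the choice of k are all assumptions made for the example.

```python
def kmer_score(seq, weights, k):
    """Sum the weights of every k-mer in seq; k-mers without a weight count as 0."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def delta_score(ref_seq, alt_seq, weights, k):
    """Change in sequence score when the reference allele is replaced by the alternate."""
    return kmer_score(alt_seq, weights, k) - kmer_score(ref_seq, weights, k)


# Toy weights invented for illustration; a real model would learn them
# from cell type-specific regulatory sequences.
toy_weights = {"ACG": 0.5, "CGT": 0.25, "ATG": -1.0}

ref = "ACGT"  # reference sequence around the variant
alt = "ATGT"  # same sequence carrying a C>T substitution at position 2

delta = delta_score(ref, alt, toy_weights, k=3)
print(delta)
```

A negative value here means the alternate allele lowers the toy score relative to the reference; only k-mers overlapping the variant position can contribute to the difference.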
Provides a generalized linear model for functional genomic data and genome annotations. LINSIGHT is a computational method that outperforms state-of-the-art prediction methods in the task of prioritizing noncoding disease variants from the Human Gene Mutation Database (HGMD) and the National Center for Biotechnology Information (NCBI) ClinVar database. By integrating a large number of genomic features, LINSIGHT provides a precise, high-resolution description of the fitness consequences of noncoding mutations in the human genome.
Supports prioritization of noncoding variants by integrating various genomic and epigenomic annotations. The GWAVA web server allows users to retrieve precomputed scores from each of the three classifiers for all known germ-line and somatic SNVs found in Ensembl release 70.
A deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single-nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factor binding, DNase I sensitivity and histone marks in multiple cell types. It can further use this capability to predict the chromatin effects of sequence variants and prioritize regulatory variants.
Allows prediction of the functional consequences of non-coding and coding single nucleotide variants (SNVs). FATHMM-XF is an improvement over the predictor FATHMM-MKL. The software was built using supervised machine learning with labeled examples of pathogenic (positive) or benign (neutral) mutations. It assigns a confidence score (a p-score) to every prediction to simplify interpretation and focus analysis on a subset of high-confidence predictions (cautious classification).
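Restricting analysis to high-confidence predictions can be illustrated with a small filter. This is only a sketch under stated assumptions: it treats the p-score as a value in [0, 1] with 0.5 as the decision boundary, and the `margin` parameter, the function name and the example variants are hypothetical rather than taken from the tool.

```python
def cautious_filter(predictions, margin=0.4):
    """Keep only predictions whose p-score lies at least `margin` away from 0.5."""
    kept = []
    for variant, p_score in predictions:
        if abs(p_score - 0.5) >= margin:
            # Assumed convention: scores above 0.5 are called pathogenic.
            label = "pathogenic" if p_score > 0.5 else "benign"
            kept.append((variant, p_score, label))
    return kept


# Invented example scores, not real predictor output.
toy_predictions = [
    ("chr1:1000A>G", 0.97),
    ("chr2:2000C>T", 0.55),
    ("chr3:3000G>A", 0.04),
]

confident = cautious_filter(toy_predictions)
for variant, p_score, label in confident:
    print(variant, p_score, label)
```

Raising `margin` trades coverage for confidence: fewer variants receive a call, but those that do sit closer to the extremes of the score range.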
Prioritizes somatic variants from cancer whole-genome sequencing. FunSeq contains two components: 1) building a data context from various resources; 2) variant prioritization.
Provides context-dependent epigenomic weighting for regulatory variant prioritization. Cepip is a context-dependent model that examines important chromatin features surrounding an eQTL and predicts its regulatory potential. This framework measures the regulatory probability of genetic variants in a context-dependent manner. The model outperforms existing cell type-specific methods and exhibits significant genome-wide association study (GWAS) signal enrichment.
Annotates noncoding variants and allows functional scoring. IW-Scoring gives access to gene-centric data and overlapping regulatory elements from a wide range of annotation resources. It delivers two separate linear weighted functional scoring schemas, one for known and one for novel variations. This tool was used to recognize and prioritize functional variants in genome-wide association study (GWAS), expression quantitative trait locus (eQTL) and cancer studies.
Analyses genome-wide association study (GWAS) signals and determines context-specific regulatory variants. GWAS4D incorporates uniformly processed genomic/epigenomic data, integrated transcription regulator motif data, comprehensive functional annotations, and a prioritization method for regulatory variants. This resource provides almost 3100 motifs for 1480 transcription regulators. It also integrates multiple functional prediction scores of non-coding variants for all possible single nucleotide variants. This provides an integrated compendium for evaluating variant effects on DNA–protein interactions.
Aims to recognize pathogenic variants by annotating genetic variants, especially noncoding variants. DANN is based on a deep neural network (DNN) algorithm consisting of an input layer, a sigmoid function output layer, and three 1000-node hidden layers with hyperbolic tangent activation functions. This tool prioritizes putative causal variants, such as those derived from genome-wide association studies (GWAS).
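The layer structure described above can be sketched as a plain forward pass. This is a minimal illustration and not DANN's code: the number of input features, the random weight initialization and the helper names are assumptions, training is omitted, and the example call uses a small hidden size so that it runs quickly (the description above corresponds to `hidden_size=1000`).

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_network(n_features, hidden_size=1000, n_hidden=3, seed=0):
    """Input layer -> n_hidden tanh layers -> single sigmoid output unit."""
    rng = random.Random(seed)
    sizes = [n_features] + [hidden_size] * n_hidden + [1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network, features):
    """Propagate one feature vector through the network and return a score in (0, 1)."""
    activations = features
    last = len(network) - 1
    for index, (weights, biases) in enumerate(network):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        if index == last:
            activations = [1.0 / (1.0 + math.exp(-z)) for z in pre]  # sigmoid output
        else:
            activations = [math.tanh(z) for z in pre]  # tanh hidden layer
    return activations[0]


# Hypothetical 5-feature annotation vector; hidden size reduced for speed.
network = build_network(n_features=5, hidden_size=16)
score = forward(network, [0.0, 1.0, 0.0, 1.0, 0.5])
print(score)
```

The sigmoid output keeps the score strictly between 0 and 1, which is what allows it to be read as a pathogenicity-style ranking value once the weights have been trained.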
An informatics strategy that integrates several established bioinformatics tools for prioritizing regulatory SNPs, i.e. the SNPs in promoter regions that potentially affect phenotype by changing the transcription of downstream genes. Compared to existing tools, regSNPs has two distinct features. It considers degenerative features of binding motifs by calculating the differences in binding affinity caused by the candidate variants, and it integrates potential phenotypic effects of various transcription factors.
An integrative resource for predictions from eight different tools for the functional annotation of noncoding variants. PRVCS is designed to predict and prioritize regulatory variants by integrating existing prediction algorithms (such as CADD, GWAVA and FunSeq) to estimate the composite likelihood of regulatory potential given the tools' statistics.
A whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations, thereby inferring the functional potential of each position in the human genome. GenoCanyon can predict many of the known functional regions. Its ability to predict functional regions and its generalizable statistical framework make GenoCanyon a unique and powerful tool for whole-genome annotation.
Allows characterization of hotspots of epigenetic variability across different cell types. Haystack can be applied to epigenetic marks and supplies a method to study cell-type identity and the mechanisms underlying epigenetic switches during development. It simplifies biologists' efforts at analyzing epigenetic data without the burden of coding, and enables researchers to integrate their own sequencing data with information from the public domain.
Provides a clinically applicable pathogenicity predictor dedicated to splicing-relevant variants. S-CAP is able to distinguish pathogenic splicing-relevant variants from benign variants. This software reduces the number of splicing-relevant variants of uncertain significance. It can score all rare variants in the predefined splicing region of the human genome and uses region-specific models to lessen the effects of ascertainment biases in curated pathogenic variant databases.
Provides a method to determine functional genetic variants underlying genetic association signals and to characterize their tissue-specific effects on regulatory elements, target genes, and downstream biological processes. INFERNO is a pipeline available both as standalone software that performs the full process and as a web application that provides two default genome-wide association study (GWAS) datasets and runs a lighter analysis.
Gene fusion detection in Plants
Fusion transcripts (i.e., chimeric RNAs) resulting from gene fusions are well known in humans, but in plants this phenomenon has not yet been explored. We are planning to discover fusion transcripts/gene fusions in different types of plants by using RNA-Seq datasets. Further, we are planning to understand the mechanism of gene fusion formation and the significance of fusions in plants.
Whole genome and transcriptome sequencing data analysis of Plants
In this era of Next Generation Sequencing (NGS), a huge amount of sequencing data is available in the public domain. Extracting novel findings from these datasets is a major challenge for a computational biologist. We are interested in analyzing whole genome and transcriptome sequencing data of different plants to extract useful information from those datasets with the help of bioinformatics tools. Currently, we are planning to study the gene clusters of secondary metabolite pathways in different plants.
Development of webservers, databases and computational pipelines for plant research
Developing databases is necessary to compile and share information with the scientific community. We are dedicated to developing useful databases and webservers for plant research.
Another area of interest is to develop automated pipelines and tools for the analysis of high throughput genomics data, generated by NGS technologies.
Professional & Academic Background
Staff Scientist II (May 2017-present): National Institute of Plant Genome Research (NIPGR), New Delhi, India
Postdoctoral Research Associate (2015-2017): University of Virginia, Charlottesville, VA, USA
Research Scientist (2014-2015): Sir Ganga Ram Hospital, New Delhi, India
PhD Bioinformatics (2009-2014): Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh under Jawaharlal Nehru University (JNU), New Delhi, India
M.Sc. Life Sciences (2007-2009): Jawaharlal Nehru University (JNU), New Delhi, India
B.Sc. Biotechnology (2004-2007): Jamia Millia Islamia (JMI), New Delhi, India
Awards and Fellowships
Junior and Senior Research Fellowship (2009-2014): Council of Scientific and Industrial Research (CSIR), New Delhi, India
GATE (Graduate Aptitude Test in Engineering): Qualified in years 2008 and 2009
Scientific Contributions/ Recognitions
Associate editor: Journal of Translational Medicine.
Editorial Board Member of Journal: Theoretical Biology and Medical Modelling.
Reviewer: PLoS One, BMC Genomics, BMC Bioinformatics, BMC Biology, BMC Biotechnology, Frontiers in Physiology and several other journals.
Web Resources/ Databases (Developed/ Contributed)
A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer (http://www.imtech.res.in/raghava/cancertope/)
GenomeABC: A webserver for benchmarking of genome assemblers. (http://crdd.osdd.net/raghava/genomeabc/).
Genomics web portal page. (http://crdd.osdd.net/raghava/genomesrs/).
Map/Alignment module of CancerDr: Cancer Drug Resistance Database. (http://crdd.osdd.net/raghava/cancerdr/).
Short reads and contigs alignment module of PCMDB: Pancreatic cancer methylation database. (http://crdd.osdd.net/raghava/pcmdb/).
Burkholderia sp. SJ98 database. (http://crdd.osdd.net/raghava/genomesrs/burkholderia/).
Rhodococcus imtechensis RKJ300 database. (http://crdd.osdd.net/raghava/genomesrs/rkj300/).
Genotrick: A pipeline for whole genome assembly and annotation of Genomes (http://crdd.osdd.net/raghava/genomesrs/genotrick/)
Development of Debian packages in OSDDlinux: A Customized Operating System for Drug Discovery. (http://osddlinux.osdd.net/).
A Web-Based Platform for Designing Vaccines against Existing and Emerging Strains of Mycobacterium tuberculosis. (http://crdd.osdd.net/raghava/mtbveb/).