Allows users to query, visualize, analyze, and compare plant genome and pathway data across crops and model species. Gramene is a resource that uses information generated from projects supported by public funds to improve the study of cross-species comparisons. The database provides a search interface, and views and functionalities for Plant Reactome. It also shares infrastructure, specialized software components and pre-computed data with Ensembl Plants.
Aims to provide the information and technical resources to support high-throughput positional cloning in Drosophila melanogaster. These resources include a high-density genome-wide map of single nucleotide polymorphisms (SNPs), and inexpensive, high-throughput assays for SNP genotyping.
A dynamic database featuring ciliate genome rearrangement annotations that include macronuclear destined segment, internal eliminated segment, and pointer annotations. MDS_IES_DB provides tools for visualization and comparative analysis of precursor and product genomes. The database currently contains annotations for two completely sequenced ciliate genomes: Oxytricha trifallax and Tetrahymena thermophila.
A leading international network infrastructure for archiving and worldwide provision of mouse mutant strains. The EMMA database gathers and curates extensive data on each line and presents it through a user-friendly website. A BioMart interface allows advanced searching including integrated querying with other resources e.g. Ensembl. Other resources are able to display EMMA data by accessing our Distributed Annotation System server.
The public web site of the Molecular and Functional Diversity in the Maize Genome project. Panzea provides access to the genotype, phenotype and polymorphism data produced by the project through user-friendly web-based database searches and data retrieval/visualization tools, as well as a wide variety of information and services related to maize diversity.
Integrates SNP and gene annotation information with a graphical viewer. AutoSNPdb hosts data for the important crops rice, barley and Brassica. Users may rapidly identify polymorphic sequences of interest through BLAST sequence comparison, keyword searches of annotations derived from UniRef90 and GenBank comparisons, GO annotations or in genes corresponding to syntenic regions of reference genomes. In addition, SNPs between specific varieties may be identified for targeted mapping and association studies.
Develops genomic tools for melon (Cucumis melo) breeding: transcriptome, SSR, SNP collections and high throughput genotyping platforms. Melogene includes a new version of the melon transcriptome with 49,741 unigenes generated at COMAV. These unigenes have been annotated, screened for SSR motifs and used to identify a large SNP collection suited for high-throughput mapping purposes.
A comprehensive web resource developed for bridging soybean translational genomics and molecular breeding research. It provides information for six entities including genes/proteins, microRNAs/sRNAs, metabolites, single nucleotide polymorphisms, plant introduction lines and traits. It also incorporates many multi-omics datasets including transcriptomics, proteomics, metabolomics and molecular breeding data, such as quantitative trait loci, traits and germplasm information. Soybean Knowledge Base has a new suite of tools such as In Silico Breeding Program for soybean breeding, which includes a graphical chromosome visualizer for ease of navigation. It integrates quantitative trait loci, traits and germplasm information along with genomic variation data, such as single nucleotide polymorphisms, insertions, deletions and genome-wide association studies data, from multiple soybean cultivars and Glycine soja.
Provides an online searchable web-based catalog of mouse resources, including inbred, mutant, and genetically engineered mice, cryopreserved embryos and gametes, and embryonic stem (ES) cell lines. IMSR is a dynamic data system that provides, for each strain or cell line, links for ordering, links to the repositories’ strain description, and links to phenotype and disease model data. The database aims to assist investigators in finding the mouse resources needed for their studies.
A database of rice genomic variations. The database provides comprehensive information on 6,551,358 single nucleotide polymorphisms (SNPs) and 1,214,627 insertions/deletions (INDELs) identified from sequencing data of 1,479 rice accessions. It is free and open to the public with comprehensive functions.
Gathers information about zebrafish protein-coding genes and focuses on mutagenic insertions. ZInC intends to allow users to check if a specific gene has been mutagenized by proviral insertion. Searches can be made by ID including Ensembl, GenBank, RefSeq and ZFIN. Users can also make searches by single gene, gene lists or KEGG biological pathways.
Provides a real-time database with full and unrestricted access to all information. The zfishbook site is designed to be a multiuser and dynamic database. It represents a central hub for molecular, expression and mutational information about gene-breaking transposon (GBT) lines from the International Zebrafish Protein Trap Consortium (IZPTC) that includes researchers from around the globe. This resource is open to community-wide contributions including expression and functional annotation.
Stores all the polymorphic sequences for the class Mammalia. MamPol contains polymorphism data, including both nucleotide sequences and their associated diversity estimates. The database provides estimates of both one-dimensional and multi-dimensional measures of nucleotide diversity in polymorphic sets. The website integrates the information from the databases and offers several interfaces for browsing the contents of the database in different ways as well as a set of common analysis tools. It facilitates comprehensive meta-analyses involving both multi-locus and multi-species polymorphic data.
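A common one-dimensional diversity measure is nucleotide diversity (pi), the average number of pairwise nucleotide differences per site across a set of aligned sequences. The following is a minimal sketch of that calculation, not MamPol's own implementation. The function name and the toy alignment are hypothetical, and the sketch assumes gap-free sequences of equal length.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Return pi: mean pairwise differences per site for aligned sequences.

    Assumes every sequence has the same length and contains no gaps.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(
        sum(1 for a, b in zip(s1, s2) if a != b) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment of three sequences, eight sites each.
alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(alignment)
```

In this toy alignment the three pairs differ at 1, 1 and 2 sites, so pi is 4 differences over 3 pairs and 8 sites.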
Furnishes pig gene annotations in all sequenced genomic regions. PigGIS gathers 3.84 million whole genome shotgun (WGS) records generated by the Sino–Danish Pig Genome Project, 870 084 expressed sequence tags (ESTs) from 100 differentiated pig tissues/developmental stages, and 589 996 genomic reads together with 570 773 mRNA sequences extracted from GenBank.
A platform to enable statistical genetics and genomics studies of Caenorhabditis elegans and to connect the results to human disease. CeNDR provides the research community with wild strains, genome-wide sequence and variant data for every strain, and a GWA mapping portal for studying natural variation in Caenorhabditis elegans. CeNDR offers reduced redundancy of data collection (e.g. whole-genome sequencing) along with consistent data collection and organization as a centralized resource. The unification of strain management facilitates studies of natural variation across the wide Caenorhabditis community and beyond.
Collects inherited disorders, other single-locus traits, and genes in animal species, excluding human, mouse and rat. OMIA is a knowledgebase that provides access to publications dating back to the turn of the century describing about 2500 phene-species across more than 200 animal species, with a third of phenes presenting in more than one species. The database covers all animals with an emphasis on domesticated species. OMIA records are associated with data from other resources linked to the same controlled term.
An integrated information system for the storage, retrieval, visualization and analysis of chicken DNA sequence variation. To enhance the discovery of relationships between sequence variation and genes, we mapped each variant onto the RJF reference genome sequence in the context of gene annotations and other relevant features, such as genetic markers and QTLs. Therefore, the ChickVD database provides both a powerful information resource and an analysis workbench for applications in biological research, medicine and agriculture. A graphical MapView shows variants mapped onto the chicken genome in the context of gene annotations and other features, including genetic markers, trait loci, cDNAs, chicken orthologs of human disease genes and raw sequence traces. ChickVD also stores information on quantitative trait loci using data from collaborating institutions and public resources. Our data can be queried by search engine and homology-based BLAST searches.
Provides information for large-scale mutagenesis in mice. PBmice displays, retrieves and stores information derived from piggyBac (PB) insertions (INSERTs) in the mouse genome. It offers several types of data such as genomic locations and flanking genomic sequences, the expression levels of hit genes, and the expression patterns of trapped genes.
A data container for the variation information of dog/wolf genomes. DoGSD was designed and constructed as a SNP detection and visualization tool to provide the research community with a useful resource for the study of dog populations, evolution, phenotypes and life habits. DoGSD integrates closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information.
A rice knowledgebase to achieve data integration through community-contributed modules. IC4R provides a reference genome with standardized and accurate gene annotations based on huge amounts of omics data. IC4R was designed for scalability and sustainability, integrating data from remote resources through APIs. IC4R has the potential to serve as a one-stop knowledgebase to make big data accessible to the rice research community and function as a valuable resource not only for plant researchers in molecular biological studies but also for breeders in rice production and improvement.
A secondary database that provides a collection of all well-annotated polymorphic sequences in Drosophila together with their associated diversity measures and options for reanalysis of the data that greatly facilitate both multi-locus and multi-species diversity studies in one of the most important groups of model organisms. DPDB includes analysis tools for sequence comparison and the estimation of genetic diversity, a page with real-time statistics of the database contents, a help section and a collection of selected links.
Develops genomic tools for squash (Cucurbita spp.) breeding: transcriptome, genetic map, SNP collections and high throughput genotyping platforms applied to the competitive development of new cultivars with improved fruit quality and disease resistance. These tools are essential to assist a competitive breeding process in the species, yielding varieties with resistance to pests and diseases and with improved fruit quality.
A multi-species database to disentangle the SNP chip jungle. Features of SNPchiMp include, but are not limited to, the following functions: 1) referencing the SNP mapping information to the latest genome assembly, 2) extraction of information contained in dbSNP for SNPs present in all commercially available bovine chips, and 3) identification of SNPs in common between two or more bovine chips (e.g. for SNP imputation from lower to higher density). This platform allows easy integration and standardization, and it is aimed at both industry and research. It also enables users to easily link the information available from the array producer with data in public databases, without the need of additional bioinformatics tools or pipelines.
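The third function, finding SNPs shared by two or more chips, comes down to a set intersection over each chip's SNP identifiers. The sketch below illustrates the idea only. The chip names, SNP identifiers and function name are hypothetical and do not reflect SNPchiMp's actual interface or data.

```python
def shared_snps(chips, names):
    """Return the SNP identifiers present on every chip listed in `names`.

    `chips` maps a chip name to an iterable of SNP identifiers.
    """
    if not names:
        raise ValueError("at least one chip name is required")
    sets = [set(chips[name]) for name in names]
    # Intersect the first chip's SNPs with all remaining chips.
    return sets[0].intersection(*sets[1:])


# Hypothetical chip contents, for illustration only.
chips = {
    "low_density": {"rs1", "rs2", "rs3"},
    "mid_density": {"rs3", "rs5"},
    "high_density": {"rs2", "rs3", "rs4", "rs5"},
}

common_two = shared_snps(chips, ["low_density", "high_density"])
common_all = shared_snps(chips, ["low_density", "mid_density", "high_density"])
```

For imputation from a lower to a higher density chip, the shared set is the panel of markers genotyped on both chips.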
Stores bacterial variant data to facilitate reproducible and scalable analysis of bacterial populations. SnapperDB was developed to support national-level whole genome sequencing surveillance of pathogens, covering over 20,000 genomes. It can assist investigations involving salmonellosis or Escherichia coli.
A sorghum genome SNP database. SorGSD covers a diverse collection of 48 sorghum lines that fall into four groups, viz., improved varieties, landraces, wild and weedy sorghums, and a wild relative Sorghum propinquum. SorGSD provides a detailed summary of SNP information and their relevant annotations for all individual accessions, such as allele information, gene information, SNP density and external links to other resources.
Provides genome sequence/variant information for wild Oryza species together with that of several cultivated strains, in close collaboration with Oryzabase, ensuring easy access to information about geographical origins, phenotypic traits, mutants and genetic resources. The current version of OryzaGenome consists of genomic variants from 446 O. rufipogon accessions derived by an imputation method and variants from 17 accessions by imputation-free deeper (up to approximately 90×) sequencing along with the Os-Nipponbare-Reference-IRGSP-1.0 reference genome of O. sativa ssp. japonica cv. Nipponbare. Our goal is to establish a pan-Oryza genomic repository that covers both reference genome sequences and genomic variant information.
Provides an informational resource about salmonid species. SalmoBase contains the Atlantic salmon genome reference, annotation and gene expression data. It offers a genome browser that allows users to search their sequences against the entire reference genome, the repeat-masked genome, predicted protein sequences, as well as transcript sequence databases. The database contains about 37,000 high-confidence protein-coding genes.
Provides access to information on the cDNA clone resources, full-length mRNA sequences, gene structures, expression profiles and functional annotations of genes of Solanum lycopersicum (tomato). TOMATOMICS provides powerful database functions for searching, browsing, retrieving, visualizing, and downloading information through a simple, intuitive and interactive graphical web interface.
A haplotype map database built from validated single nucleotide polymorphism (SNP) information for the world and Japanese rice collections. The association of SNP allele frequencies with quantitative trait loci (QTLs) and functionally characterized genes can be clarified by using two other databases, Q-TARO and OGRO, constructed on the same platform. The allele frequency of each SNP can be visualized in the SNP genome browser, similar to the human HapMap database. To obtain information on SNPs in any genomic region and to design cleaved amplified polymorphic sequence (CAPS) markers, we also constructed a tool for SNP searches and design of primer pairs. We also provide information for a core set of 768 SNPs selected for the analysis of genetic diversity and QTL mapping in the world rice collection.
Provides molecular markers with genome information and single nucleotide polymorphism (SNP) data for mango. MiSNPDb has customized advanced search options for haplotypes, depth and the varieties having common SNPs. It can be used to visualize the position of the respective SNPs, which can be employed for development of SNP discrimination/screening assays. This tool is useful for expediting mango genomic research.
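A CAPS marker relies on a SNP that creates or abolishes a restriction enzyme recognition site, so that only one allele is cut after PCR amplification. The sketch below shows how such candidates could be screened. It is an illustration under simplifying assumptions, not the tool described above: the function name, flanking sequences and alleles are hypothetical, only three well-known enzymes are listed, and primer design is left out.

```python
# Recognition sequences of three commonly used restriction enzymes.
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}


def caps_candidates(flank5, ref, alt, flank3, sites=SITES):
    """List enzymes whose site is present in exactly one allele.

    Returns (enzyme, allele_cut) tuples, where allele_cut is "ref" or "alt".
    Simplification: a site elsewhere in the flanks hides the difference.
    """
    ref_seq = flank5 + ref + flank3
    alt_seq = flank5 + alt + flank3
    hits = []
    for enzyme, site in sites.items():
        in_ref = site in ref_seq
        in_alt = site in alt_seq
        if in_ref != in_alt:
            hits.append((enzyme, "ref" if in_ref else "alt"))
    return hits


# Hypothetical SNP (A/G) whose reference allele completes an EcoRI site.
candidates = caps_candidates("ACGTGA", "A", "G", "TTCGGA")
```

Here the reference allele yields GAATTC and would be cut by EcoRI, while the alternative allele would not.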
A genus-wide collection of transposable elements and repeated sequences across 11 diploid species of the genus Oryza and the closely-related out-group Leersia perrieri. The database consists of more than 170,000 entries divided into three main types: (i) a classified and curated set of publicly-available repeated sequences, (ii) a set of consensus assemblies of highly-repetitive sequences obtained from genome sequencing surveys of 12 species; and (iii) a set of full-length TEs, identified and extracted from 12 whole genome assemblies.
Provides genome-wide DNA polymorphisms of plants, with crop plants being the top priority. DNApod is an integrated database of genome-wide DNA polymorphisms detected under uniform analytical conditions from next-generation sequencing (NGS)-generated whole-genome shotgun (WGS) datasets in Sequence Read Archive (SRA). This online resource describes homozygous single-nucleotide polymorphisms (SNPs), homozygous insertion or deletion (InDel) polymorphisms and known-gene annotations for these polymorphisms in rice, maize, and sorghum.
An online resource containing a range of genomic datasets for wheat (Triticum aestivum) that will assist plant breeders and scientists to select the most appropriate markers for marker assisted selection. CerealsDB includes a database which currently contains in excess of 100,000 putative varietal SNPs, of which several thousand have been experimentally validated. In addition, CerealsDB contains databases for DArT markers and EST sequences, and links to a draft genome sequence for the wheat variety Chinese Spring.
Provides a method for the visualization of single nucleotide polymorphism (SNP) patterns in sequenced Candida albicans genomes. SNPMap is an interactive tool that allows users to map the positions of individual mutations, mutation types, and het/hom tracts across user-defined regions of the 21 genomes that represent different clades, different sites of infection in the host, and different countries of origin.
A publicly and freely available platform that addresses the increasing need of next generation sequencing data analysis in the Drosophila research community. FlyVar is composed of three parts. First, a database that contains 5.94 million DNA polymorphisms found in Drosophila melanogaster derived from whole genome shotgun sequencing of 612 genomes of D. melanogaster. In addition, a list of 1,094 dispensable genes has been identified. Second, a graphical user interface (GUI) has been implemented to allow easy and flexible queries of the database. Third, a set of interactive online tools enables filtering and annotation of genomic sequences obtained from individual D. melanogaster strains to identify candidate mutations. FlyVar permits the analysis of next generation sequencing data without the need of extensive computational training or resources.
Provides a much-needed compendium of genomic variants and their annotations for M. tuberculosis complex (MTBC) and provides the first step toward accelerating genotype–phenotype correlations in the closely related pathogens. tbvar provides a user-friendly interface, closely integrated and interlinked with other major resources in the field. The tool also provides an interface for annotation of known variants and identification of novel variants obtained from genome sequencing data sets and could potentially lead to application in clinical settings.
Provides information about Collaborative Cross (CC) sequences. ISVdb is a database that offers users a list of probabilistic genotype and diplotype data. It also includes functions such as: (1) allowing rapid simulation of F1 populations; (2) providing predicted variant consequence metadata; (3) preserving imputation uncertainty.
Gathers information about tuberculosis. Pakistan whole genome tuberculosis is an online resource storing data on different types of strains associated with tuberculosis. The database is based on drug-resistant tuberculosis caused by Mycobacterium tuberculosis (MTB) strains.
A repository of 1.9 million variations (SNPs and InDels) anchored on eight pseudomolecules in a custom database. CicArVarDB includes an easy interface for users to select variations around specific regions associated with quantitative trait loci, with embedded webBLAST search and JBrowse visualisation. It is useful for the chickpea research community for both advancing genetics research as well as breeding applications for crop improvement.
Provides information about various strains of Trypanosoma cruzi. TcSNP gathers T. cruzi sequences, multiple sequence alignments (MSAs) obtained from these sequences, and single-nucleotide polymorphisms (SNPs) and small indels derived from scanning these alignments. The database offers text-based searches for sequences based on attributes derived from their annotation. The result provides a list of genes matching the specified criteria, containing links to the corresponding MSAs, where users can visualize polymorphic sites in different colors, typefaces and styles.
Provides genotype, phenotype, and variety information for rice. The Rice SNP-Seek Database allows users to quickly retrieve single-nucleotide polymorphism (SNP) alleles for all varieties in a genome region, find different alleles from predefined varieties and query basic passport and morphological phenotypic information about sequenced rice lines. Users can visualize SNPs with the gene structure using JBrowse. Phylogenetic trees or multidimensional scaling plots can be used to explore evolutionary relationships between rice varieties.
Gathers information about rice. IC4R provides a reference genome with standardized and accurate gene annotations based on huge amounts of omics data and large quantities of rice-related literature. This library focuses on integrating expression profiles, genomic variations, plant homologs, post-translational modifications (PTMs), literature as well as community-contributed annotations.
Provides integrated access to genome sequence, expression data and literature curation for tuberculosis. TBDB contains genome sequence data for a range of species relevant to tuberculosis and other sequenced Mycobacteria for comparative analysis. This repository is composed of more than 2600 genes and 45 gene expression datasets. It offers a suite of tools for the visualization, analysis, and download of data.
A free web-based database that allows quick, user-friendly searches to find different types of genomic variations among a group of fully sequenced organisms belonging to M. tuberculosis complex. The searches are based on data generated by pairwise comparison using a tool that has already been described. The types of variations that can be searched are SNPs, indels, tandem repeats and divergent regions. The searches can be designed to find specific variations either in a given gene or any given location of the query genome with respect to any other genome currently available.
Provides a comprehensive repository to store, access and disseminate single nucleotide polymorphisms (SNPs) and spoligotyping profiles of M. tuberculosis. MTCID lets users automatically upload their own information, which is added to the existing database at the backend. It may also aid in maintaining clinical profiles of TB and in the treatment of patients.
Compiles several sources of mouse single nucleotide polymorphism (SNP) data. CGDSNPdb is composed of two different datasets: (i) the Imputed SNP Genotype Resource (IGR), which provides probable genotypes and associated confidence levels for about 8 million SNPs in 74 strains of mice and (ii) data collected from over 140 strains of laboratory mice. Searches can be made by chromosome region, nearby gene annotations or SNP identifiers among 9 686 537 distinct SNPs.
Gene fusion detection in Plants
Fusion transcripts (i.e., chimeric RNAs) resulting from gene fusions are well known in humans, but in plants this phenomenon has not yet been explored. We plan to discover fusion transcripts/gene fusions in different types of plants using RNA-Seq datasets. Further, we plan to understand the mechanism of gene fusion formation and the significance of fusions in plants.
Whole genome and transcriptome sequencing data analysis of Plants
In this era of Next Generation Sequencing (NGS), a huge amount of sequencing data is available in the public domain. Extracting novel findings from these datasets is a major challenge for a computational biologist. We are interested in analyzing whole genome and transcriptome sequencing data of different plants to extract useful information from those datasets with the help of bioinformatics tools. Currently, we are planning to study the gene clusters of secondary metabolite pathways in different plants.
Development of webservers, databases and computational pipelines for plant research
Developing databases is necessary to compile and share information with the scientific community. We are dedicated to developing useful databases and webservers for plant research.
Another area of interest is the development of automated pipelines and tools for the analysis of high-throughput genomics data generated by NGS technologies.
Professional & Academic Background
Staff Scientist II (May 2017- present): National Institute of Plant Genome Research (NIPGR), New Delhi, India
Postdoctoral Research Associate (2015-2017): University of Virginia, Charlottesville, VA, USA
Research Scientist (2014-2015): Sir Ganga Ram Hospital, New Delhi, India
PhD Bioinformatics (2009-2014): Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh under Jawaharlal Nehru University (JNU), New Delhi, India
M.Sc. Life Sciences (2007-2009): Jawaharlal Nehru University (JNU), New Delhi, India
B.Sc. Biotechnology (2004-2007): Jamia Millia Islamia (JMI), New Delhi, India
Awards and Fellowships
Junior and Senior Research Fellowship (2009-2014): Council of Scientific and Industrial Research (CSIR), New Delhi, India
GATE (Graduate Aptitude Test in Engineering): Qualified in years 2008 and 2009
Scientific Contributions/ Recognitions
Associate editor: Journal of Translational Medicine.
Editorial Board Member of Journal: Theoretical Biology and Medical Modelling.
Reviewer: PLoS One, BMC Genomics, BMC Bioinformatics, BMC Biology, BMC Biotechnology, Frontiers in Physiology and several other journals.
Web Resources/ Databases (Developed/ Contributed)
A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer (http://www.imtech.res.in/raghava/cancertope/)
GenomeABC: A webserver for benchmarking of genome assemblers. (http://crdd.osdd.net/raghava/genomeabc/).
Genomics web portal page. (http://crdd.osdd.net/raghava/genomesrs/).
Map/Alignment module of CancerDr: Cancer Drug Resistance Database. (http://crdd.osdd.net/raghava/cancerdr/).
Short reads and contigs alignment module of PCMDB: Pancreatic cancer methylation database. (http://crdd.osdd.net/raghava/pcmdb/).
Burkholderia sp. SJ98 database. (http://crdd.osdd.net/raghava/genomesrs/burkholderia/).
Rhodococcus imtechensis RKJ300 database. (http://crdd.osdd.net/raghava/genomesrs/rkj300/).
Genotrick: A pipeline for whole genome assembly and annotation of Genomes (http://crdd.osdd.net/raghava/genomesrs/genotrick/)
Development of Debian packages in OSDDlinux: A Customized Operating System for Drug Discovery. (http://osddlinux.osdd.net/).
A Web-Based Platform for Designing Vaccines against Existing and Emerging Strains of Mycobacterium tuberculosis. (http://crdd.osdd.net/raghava/mtbveb/).