Next-generation sequencing experiments have generated large amounts of data that are collected in public archives (primary databases). Secondary data extracted from these primary databases are available in the gene expression and disease/disorder sections.
Provides high-throughput microarray and next-generation sequencing (NGS) functional genomic data sets. GEO archives raw data, processed data and metadata submitted by the research community. Its data are indexed, cross-linked and searchable. The database gives access to several tools and graphical renderings that help users explore and interpret the available data, and it can be used to develop and test new hypotheses.
Offers a repository for information related to nucleotide sequencing workflows. ENA provides a data model covering input information, output machine data and interpreted information. The database gathers raw sequence data and derived data, including sequences, assemblies and functional annotation, accompanied by studies and samples that provide experimental context.
Provides a database of functional genomics experiments. ArrayExpress includes data generated by sequencing or array-based technologies. This resource integrates the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries, provided via ontology-enabled interfaces, can be based on technology and on sample attributes such as disease, cell type and anatomy.
Stores raw sequence data from next-generation sequencing (NGS) technologies. SRA serves as core infrastructure for sharing pre-publication sequence data, with the aim of making sequence data available to the research community to enhance reproducibility and to allow new discoveries through comparison of data sets. The database also stores alignment information in the form of read placements on a reference sequence.
A database at EBI that stores information about biological samples used in molecular experiments, such as sequencing, gene expression or proteomics. The goals of the BioSample Database include: (i) recording and linking of sample information consistently within EBI databases such as ENA, ArrayExpress and PRIDE; (ii) minimizing data entry efforts for EBI database submitters by enabling submitting sample descriptions once and referencing them later in data submissions to assay databases and (iii) supporting cross database queries by sample characteristics. Each sample in the database is assigned an accession number. The database includes a growing set of reference samples, such as cell lines, which are repeatedly used in experiments and can be easily referenced from any database by their accession numbers.
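The "describe once, reference by accession" idea can be illustrated with a minimal sketch. Everything here is hypothetical: the `SampleRegistry` class, the `SAMPLE` prefix and the submission records are made up for illustration and do not reflect the actual BioSample Database schema, accession format or API.

```python
class SampleRegistry:
    """Toy in-memory registry: each sample is described once and is
    referenced everywhere else by its accession."""

    def __init__(self, prefix="SAMPLE"):
        # "SAMPLE" is a made-up prefix, not a real accession scheme.
        self.prefix = prefix
        self._samples = {}

    def register(self, attributes):
        """Store a sample description and return its new accession."""
        accession = f"{self.prefix}{len(self._samples) + 1:06d}"
        self._samples[accession] = dict(attributes)
        return accession

    def describe(self, accession):
        """Return the single stored description for an accession."""
        return self._samples[accession]


def submissions_for(submissions, registry, **characteristics):
    """Cross-database query: keep the submissions whose referenced
    sample matches every requested characteristic."""
    return [
        s for s in submissions
        if all(registry.describe(s["sample"]).get(key) == value
               for key, value in characteristics.items())
    ]


registry = SampleRegistry()
hela = registry.register({"organism": "Homo sapiens", "cell_line": "HeLa"})

# Two assay submissions reuse the accession instead of repeating
# the sample description (hypothetical database labels).
submissions = [
    {"database": "sequence archive", "sample": hela},
    {"database": "expression archive", "sample": hela},
]

matches = submissions_for(submissions, registry, cell_line="HeLa")
```

In this sketch the sample attributes live in one place only, so a correction to the description is immediately visible to every submission that cites the accession.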
Gathers raw-level personal genomic data from whole-genome next-generation sequencing (NGS) and ultra-high-resolution comparative genomic hybridization (CGH) arrays. TIARA enables detection of personal genomic variations, such as single nucleotide polymorphisms (SNPs), short insertions/deletions (indels), and structural variants (SVs). The database also holds high-coverage sequencing reads for 13 whole genomes and 16 transcriptomes, produced on high-throughput sequencing machines including the Illumina Genome Analyzer and AB SOLiD.
Collects computational neuroscience models in any simulator format or programming language. ModelDB provides a free platform, both public and private, which allows users to submit the source code associated with a published model. The repository includes over 1000 models organized into more than 130 topics. The database offers several functionalities, such as ModelView, which is used to examine run-time morphology, channel types and parameter values, and an advanced search mode that supports searching by sixteen parameters.
Facilitates the capture and management of structured metadata and data for diverse biological research projects. BioProject constitutes an online resource to describe a project’s scope and objectives. The metadata collected include structured information about the type of research investigation (such as genome sequencing or gene expression), the project title and goals, the submitting group, the organism or environmental sample label, and project data type attributes regarding the sample scope, target and method.
Facilitates the capture and management of structured metadata and data for diverse biological research samples. BioSample provides a dedicated area that presents the collected project metadata, including the project data type (such as genome sequencing, transcriptome or gene expression); attributes concerning the sample scope, target and method; and the project goals, submitting group, title, organism name or environmental sample label, and a brief description.
A database of conformational and thermodynamic dinucleotide properties. The Dinucleotide Property Database is designed to collect and analyse thermodynamic, structural and other dinucleotide properties. The table presenting all the dinucleotide properties can be pruned and rearranged by different criteria. The database contains different export and analysis functions.
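As a rough illustration of how a dinucleotide property table can be applied to a sequence, the sketch below looks up a value for every overlapping dinucleotide and averages the result. The function names are hypothetical, and the numbers in `placeholder_table` are arbitrary placeholders chosen for easy arithmetic, not measured properties taken from the database.

```python
def dinucleotide_profile(sequence, table):
    """Return the property value of each overlapping dinucleotide."""
    sequence = sequence.upper()
    return [table[sequence[i:i + 2]] for i in range(len(sequence) - 1)]


def mean_property(sequence, table):
    """Average the property over all dinucleotide steps of a sequence."""
    profile = dinucleotide_profile(sequence, table)
    if not profile:
        raise ValueError("sequence must contain at least two nucleotides")
    return sum(profile) / len(profile)


# Arbitrary placeholder values, NOT real thermodynamic or structural data.
placeholder_table = {"AC": 1.0, "CG": 3.0, "GT": 2.0}

profile = dinucleotide_profile("ACGT", placeholder_table)  # [1.0, 3.0, 2.0]
average = mean_property("ACGT", placeholder_table)         # 2.0
```

A real analysis would load one exported property column covering all 16 dinucleotides in place of the three-entry placeholder table.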
Provides access to processed and curated NGS experiments, including ChIP-Seq (transcription factors and histones), RNA-Seq and DNase-Seq. The current focus of this database is to unify NGS data for the haematopoietic system and ES cells. It encompasses two specialized compendia: one focused on blood cells (HAEMCODE), and a second focused on data from embryonic stem (ES) cells (ESCODE).
Permits efficient searching of comprehensive information on all public RNA-seq data sets from mice in which genotype is a factor. RNASeqMetaDB contains metadata for a total of 306 experiments targeting 298 different genes. Users can search the database using multiple parameters, such as genes, diseases, tissue types, keywords and associated publications, to find data sets that match their interests. Summary statistics of the metadata are also presented on the web server, showing global patterns of RNA-Seq studies.
Enables users to retrieve and process, with minimal effort, both primary and intrinsic feature data for existing sequences. SeqDepot is a database that combines known primary sequences with predicted feature data and provides a simple and straightforward means to obtain and use these data. This online resource contains more than 28.5 million unique amino acid sequences sourced from major sequence databases and more than 300 million intrinsic features.
Enables the integrative analysis of aggregated collections of tagged gene expression signatures identified and extracted from GEO. GEN3VA can be used to identify, aggregate, and analyze themed collections of gene expression signatures from diverse but related studies. It aggregates 244 studies from young and old tissues in mammalian systems.
Provides a data repository for archiving raw sequence data. GSA is built on INSDC data standards and structures and provides data archival services for scientific communities. This resource accepts raw sequence reads produced by a variety of sequencing platforms, stores both sequence reads and metadata, and provides free and unrestricted access to all publicly available data for scientific communities worldwide.
View sequencing data generated on Illumina sequencers and analyzed in BaseSpace, the Illumina genomics computing environment. See how BaseSpace makes it easy to analyze your sequencing data and generate meaningful reports. View sample data sets and reports for a variety of applications, or test BaseSpace Apps on the sample data, and evaluate the results interactively.
Gene fusion detection in Plants
Fusion transcripts (i.e., chimeric RNAs) resulting from gene fusions are well known in humans, but in plants this phenomenon has not yet been explored. We plan to discover fusion transcripts and gene fusions in different types of plants using RNA-Seq datasets. We also plan to investigate the mechanism of gene fusion formation and the significance of fusions in plants.
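One common idea behind fusion-transcript discovery from RNA-Seq is to look for reads whose two parts align to different genes. The sketch below is only a toy illustration of that idea and not the group's pipeline: the `SplitRead` record, the `candidate_fusions` function, the gene names and the `min_support` threshold are all hypothetical, and real tools apply many further filters (mapping quality, breakpoint consistency, paralog and read-through removal).

```python
from collections import Counter
from typing import NamedTuple


class SplitRead(NamedTuple):
    read_id: str
    gene_5p: str  # gene hit by the 5' part of the read
    gene_3p: str  # gene hit by the 3' part of the read


def candidate_fusions(reads, min_support=2):
    """Count distinct split reads per ordered gene pair and keep the
    pairs supported by at least `min_support` reads."""
    support = Counter()
    seen = set()
    for read in reads:
        if read.gene_5p == read.gene_3p:
            continue  # both parts in the same gene: not chimeric
        if read.read_id in seen:
            continue  # count each read only once
        seen.add(read.read_id)
        support[(read.gene_5p, read.gene_3p)] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


# Made-up alignments for illustration only.
reads = [
    SplitRead("r1", "geneA", "geneB"),
    SplitRead("r2", "geneA", "geneB"),
    SplitRead("r2", "geneA", "geneB"),  # duplicate record of r2
    SplitRead("r3", "geneA", "geneA"),
    SplitRead("r4", "geneC", "geneD"),
]

fusions = candidate_fusions(reads)  # {("geneA", "geneB"): 2}
```

Requiring more than one supporting read is a simple guard against alignment artefacts; the threshold of 2 is an assumption chosen for the example, not a recommended value.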
Whole genome and transcriptome sequencing data analysis of Plants
In this era of next-generation sequencing (NGS), a huge amount of sequencing data is available in the public domain. Extracting novel findings from these datasets is a major challenge for a computational biologist. We are interested in the analysis of whole genome and transcriptome sequencing data from different plants, using bioinformatics tools to extract useful information from those datasets. Currently, we plan to study the gene clusters of secondary metabolite pathways in different plants.
Development of webservers, databases and computational pipelines for plant research
Databases are necessary to compile information and share it with the scientific community. We are dedicated to developing useful databases and webservers for plant research.
Another area of interest is the development of automated pipelines and tools for the analysis of high-throughput genomics data generated by NGS technologies.
Professional & Academic Background
Staff Scientist II (May 2017-present): National Institute of Plant Genome Research (NIPGR), New Delhi, India
Postdoctoral Research Associate (2015-2017): University of Virginia, Charlottesville, VA, USA
Research Scientist (2014-2015): Sir Ganga Ram Hospital, New Delhi, India
PhD Bioinformatics (2009-2014): Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh under Jawaharlal Nehru University (JNU), New Delhi, India
M.Sc. Life Sciences (2007-2009): Jawaharlal Nehru University (JNU), New Delhi, India
B.Sc. Biotechnology (2004-2007): Jamia Millia Islamia (JMI), New Delhi, India
Awards and Fellowships
Junior and Senior Research Fellowship (2009-2014): Council of Scientific and Industrial Research (CSIR), New Delhi, India
GATE (Graduate Aptitude Test in Engineering): Qualified in years 2008 and 2009
Scientific Contributions/ Recognitions
Associate editor: Journal of Translational Medicine.
Editorial Board Member of Journal: Theoretical Biology and Medical Modelling.
Reviewer: PLoS ONE, BMC Genomics, BMC Bioinformatics, BMC Biology, BMC Biotechnology, Frontiers in Physiology and several other journals.
Web Resources/ Databases (Developed/ Contributed)
A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer (http://www.imtech.res.in/raghava/cancertope/)
GenomeABC: A webserver for benchmarking of genome assemblers. (http://crdd.osdd.net/raghava/genomeabc/).
Genomics web portal page. (http://crdd.osdd.net/raghava/genomesrs/).
Map/Alignment module of CancerDr: Cancer Drug Resistance Database. (http://crdd.osdd.net/raghava/cancerdr/).
Short reads and contigs alignment module of PCMDB: Pancreatic cancer methylation database. (http://crdd.osdd.net/raghava/pcmdb/).
Burkholderia sp. SJ98 database. (http://crdd.osdd.net/raghava/genomesrs/burkholderia/).
Rhodococcus imtechensis RKJ300 database. (http://crdd.osdd.net/raghava/genomesrs/rkj300/).
Genotrick: A pipeline for whole genome assembly and annotation of Genomes (http://crdd.osdd.net/raghava/genomesrs/genotrick/)
Development of Debian packages in OSDDlinux: A Customized Operating System for Drug Discovery. (http://osddlinux.osdd.net/).
A Web-Based Platform for Designing Vaccines against Existing and Emerging Strains of Mycobacterium tuberculosis. (http://crdd.osdd.net/raghava/mtbveb/).