Provides aligned and annotated ribosomal RNA (rRNA) gene sequence data, along with tools that allow researchers to analyze their own rRNA gene sequences. RDP offers tools for browsing and searching the data collections, for taxonomic classification and nearest neighbor search, for primer-probe testing and for tree building. RDP data and tools are used in fields as diverse as human health, microbial ecology, environmental microbiology, nucleic acid chemistry, taxonomy and phylogenetics.
A manually maintained and curated database of rRNA-targeted oligonucleotide probes and primers. Contextual information and multiple options for evaluating in silico hybridization performance against the most recent rRNA sequence databases are provided for each oligonucleotide entry, which makes probeBase an important and frequently used resource for microbiology research and diagnostics. To facilitate the identification of complementary probe sets for organisms represented by short rRNA sequence reads generated by amplicon sequencing or metagenomic analysis with next generation sequencing technologies such as Illumina and IonTorrent, we introduce a novel tool that recovers surrogate near full-length rRNA sequences for short query sequences and finds matching oligonucleotides in probeBase.
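To illustrate what an in silico hybridization check of the kind probeBase describes involves, the sketch below scans a target rRNA gene sequence for the binding site of an oligonucleotide probe and allows a limited number of mismatches. This is not probeBase's implementation: the function names, the mismatch threshold and the toy sequences are assumptions made for illustration only.

```python
# Minimal sketch of in silico probe matching (illustrative, not probeBase code).
# A probe binds the rRNA, so its binding site in the gene's sense strand is
# the reverse complement of the probe sequence.

COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA or RNA sequence, written as DNA."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_probe_sites(probe, target, max_mismatches=0):
    """Return (start, mismatches) for every window of the target that matches
    the probe's binding site with at most max_mismatches substitutions."""
    site = reverse_complement(probe)
    target = target.upper().replace("U", "T")
    hits = []
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(site, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    probe = "GCTGCCTCCCGTAGGAGT"                  # toy probe sequence
    target = "TTGGACTCCTACGGGAGGCAGCAGTG"         # toy target fragment
    print(find_probe_sites(probe, target))        # [(4, 0)]
```

A real evaluation would also consider mismatch position and hybridization conditions, which this sketch ignores.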
Provides up-to-date, quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. SILVA contains 3 194 778 small subunit and 288 717 large subunit rRNA gene sequences. Its Browser implements a hierarchical view of the database contents, similar to a file browser, visualizing any of the taxonomies included with SILVA.
A database dedicated to nucleotide sequences of bacterial, archaeal and eukaryotic (cytoplasmic and organellar) 5S ribosomal RNAs and their genes. The sequences for particular organisms can be retrieved as single files using a taxonomic browser or in multiple sequence structural alignments. All data that have been used to create the database can be downloaded from the web page and from interactive windows, and can be used in subsequent data-mining applications.
A curated collection of chaperonin sequence data collected from public databases or generated by a network of collaborators exploiting the cpn60 target in clinical, phylogenetic and microbial ecology studies.
A searchable database documenting variation in ribosomal RNA operons in Bacteria and Archaea. The redesigned rrnDB brings a substantial increase in the number of genomes described, improved curation, mapping of genomes to both NCBI and RDP taxonomies, and refined tools for querying and analyzing these data. With these changes, the rrnDB is better positioned to remain a comprehensive resource under the torrent of microbial genome sequencing.
Compiles ribosomal protein (RP) genes from various species. RPG aims to improve comparative studies of gene evolution. The database includes chromosomal positions, accession numbers, gene and CDS sizes, orthologs, snoRNAs and links to other public databases. Each record is linked to an orthologous gene classification table, except for human records, which are linked to a gene table or a chromosomal map position.
Gathers information about cryo-electron microscopy (cryo-EM) density maps and atomic coordinates of ribosomal particles from the Protein Data Bank (PDB) and the Electron Microscopy Data Bank (EMDB). Users can search by EMDB or PDB accession number (ID) or use the drop-down menu to restrict the search to source database or ID, title, author, abstract or PubMed ID, as well as method, organism, ligand, classification or particle type.
Explores, evaluates and monitors the diversity of photosynthetic eukaryotes in aquatic and terrestrial ecosystems. PhytoREF is a database built through the compilation of all of the publicly available plastidial 16S rDNA sequences (amplicons and sequences extracted from plastidial genomes), as well as novel Sanger amplicons. This database could also facilitate the development of a range of applications in biomonitoring photosynthetic eukaryotes in various habitats, palaeoecological studies of primary producers in past environments and dietary studies in unicellular and multicellular herbivores.
Gathers information about the morphology, ecophysiology, abundance and distribution of genus members in full-scale treatment systems with phylotype identity. MiDAS was primarily a taxonomic database curated for abundant and process-important phylotypes in activated sludge wastewater treatment systems with biological nutrient removal. The repository also includes the organisms of the anaerobic digestion community and the most abundant influent wastewater organisms.
An open-access, curated reference barcoding database for diatoms, developed in the framework of R-Syst, the network of systematics supported by INRA (French National Institute for Agricultural Research). R-Syst::diatom links DNA barcodes to their taxonomic identifications and is dedicated to identifying barcodes from natural samples. The data come from two sources: a culture collection of freshwater algae maintained at INRA, in which new strains are regularly deposited and barcoded, and the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database because of their efficiency: 18S (18S ribosomal RNA) and rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase). Data are curated using innovative (Declic) and classical bioinformatic tools (BLAST, classical phylogenies) and up-to-date taxonomy (catalogues and peer-reviewed papers). In addition to this information, morphological features (e.g. biovolumes, chloroplasts), life-forms (mobility, colony type) and ecological features (taxa preferenda with respect to pollution) are indicated in R-Syst::diatom.
Helps researchers to explore publicly available sequences harvested from GenBank and assigned to a specific collection of gene families. FGR employs a reference/model-based, comparative analysis strategy to build the reference database and helps to study functional and phylogenetic diversities of specific gene families. This strategy relies on the use of HMMER3 and Hidden Markov Models (HMM). The FGR currently contains 77 gene families organized into seven categories: Antibiotic resistance, Biodegradation, Biogeochemical Cycles, Metal Cycling, Phylogenetic Markers, Plant Pathogenicity, and “Other” for gene families not in the listed categories.
Collects 16S rRNA sequences from a large number of datasets. MetaMetaDB is a comprehensive (“meta-”) and compact database that contains a collection of 16S rRNA sequences associated with diverse environments. Users can submit the 16S rRNA sequences of certain prokaryotes and thus investigate microbial habitability to analyze the ecology and evolution of prokaryotes. The database provides a reverse perspective of the environments in which each prokaryotic group exists, opening the door to the investigation of “meta-metagenomics”.
Allows the exploration of multiple datasets generated by 16S rRNA gene amplicon high-throughput sequencing (HTS) studies of food bacterial communities. FoodMicrobionet is a database and visualisation tool based on network analysis. This online resource can also be used to obtain further information on the distribution of taxa in different food groups by filtering and recalculation from nodes and edges tables. It allows researchers in food microbiology to benefit from the significant advances that HTS is providing in this key field of research.
Allows users to detect contaminated prokaryotic genomes. ContEst16S is an online repository that lets researchers consult information about contaminated genomes. In this database, 16S rRNA gene fragments from the query genome assemblies are screened to determine whether the genome assembly is contaminated. Its online interface notably provides access to a range of GenBank data.
Provides a data and analytics portal focused on taxonomy, ecology, genomics and metagenomics. EzBioCloud is an integrated database with a complete taxonomic hierarchy of the Bacteria and Archaea represented by 16S rRNA gene and genome sequences. All genomes were identified taxonomically at the genus, species or subspecies levels using a combination of gene-based searches.
Describes information linked to oral microbe species. HOMD is a body site-specific public database that provides the scientific community with information on prokaryote species that are present in the human oral cavity. The database also includes BLAST tools for identifying unknown isolates or clones based on their 16S rRNA sequence, as well as phenotypic, bibliographic, clinical and genomic information for each taxon. It can serve as a model for microbiome data from other human body sites.
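The screening idea behind ContEst16S can be sketched as follows, assuming each 16S rRNA gene fragment found in an assembly has already been given a taxonomic label (for example a genus) by some upstream step. The function name, the input format and the rule "more than one distinct taxon means possible contamination" are illustrative assumptions, not the published ContEst16S algorithm.

```python
# Hypothetical sketch: flag an assembly when its 16S fragments disagree on taxonomy.
from collections import Counter


def screen_assembly(fragment_taxa):
    """fragment_taxa maps a 16S fragment identifier to its assigned taxon.

    Returns a dict with a status string and the number of fragments per taxon.
    """
    counts = Counter(fragment_taxa.values())
    if not counts:
        status = "no 16S fragments"
    elif len(counts) == 1:
        status = "consistent"
    else:
        status = "possibly contaminated"
    return {"status": status, "taxa": dict(counts)}


if __name__ == "__main__":
    # Toy input: two fragments from one genus and one from another.
    example = {
        "contig_1:16S": "Escherichia",
        "contig_7:16S": "Escherichia",
        "contig_9:16S": "Staphylococcus",
    }
    print(screen_assembly(example)["status"])
```

A production screen would compare the fragment sequences themselves rather than rely on pre-assigned labels.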
Gathers annotated, chimera-checked, full-length 16S rRNA gene sequences in standard alignment formats. Greengenes distributes relationships of taxonomies from multiple curators and multiple sequences from a single study. It can serve to assess the validity of prokaryotic candidate phyla. This database was constructed with more than 90 000 public 16S small-subunit rRNA gene sequences aligned and chimera checked.
Offers a plankton symbiosis and Radiolaria database. The Renkan application is used by the planktonic symbiosis research team at the Station Biologique de Roscoff to organize and share information on radiolarian specimens available in the lab. Samples are composed mainly of single cells collected and isolated worldwide before being processed in the lab. Information on collection procedures, location, morphology and molecular markers is provided for each specimen.
Collects a list of reported post-transcriptionally modified nucleosides and sequence sites in small subunit rRNAs from bacteria, eukarya and archaea. The Small Subunit rRNA Modification Database is divided into four main parts: (1) the overview part offers a description of the repository, sources of modification data and comments; (2) the map part displays modifications for selected organisms; (3) the align part provides modification-annotated aligned sequences for comparison; and (4) the browse part gives users access to two search functions for specific modifications of interest and retrieval of literature citations.
An integrative database for salt-tolerant poplar genome biology. Currently the STPD contains the Populus euphratica genome and its related genetic resources. P. euphratica, with its preference for salty habitats, has become a valuable genetic resource for the exploitation of tolerance characteristics in trees. This database contains curated data including genomic sequence, genes and gene functional information, non-coding RNA sequences, transposable elements, simple sequence repeats and single nucleotide polymorphism information of P. euphratica, gene expression data between P. euphratica and Populus tomentosa, and whole-genome alignments between Populus trichocarpa, P. euphratica and Salix suchowensis. The STPD provides useful searching and data mining tools, including the GBrowse genome browser, BLAST servers and a genome alignment viewer, which can be used to browse genome regions, identify similar sequences and visualize genome alignments. Datasets within the STPD can also be downloaded to perform local searches.
Developed with the goal of having a one-stop genomic resource platform for the scientific community to access, retrieve, download, browse, search, visualize and analyse the staphylococcal genomic data and annotations.
Supplies a collection of 16S rRNA gene sequences from ruminal methanogens and from various other intestinal environments where methanogens are known to be important hydrogen consumers. It is composed primarily of sequences longer than 1,200 bp that cover large parts of the almost 1,540 bp long gene. The database includes shorter sequences if they originated from isolates or enrichment cultures. It is suitable for the analysis of amplicon data that have been generated for variable regions other than the V6–V8 region.
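The inclusion rule described for this methanogen collection (sequences longer than 1,200 bp, plus shorter ones from isolates or enrichment cultures) can be expressed as a simple filter. The sketch below is an assumption-laden illustration: the record layout, the source labels and the function names are hypothetical and are not taken from the database itself.

```python
# Hypothetical sketch of the length/source inclusion rule described above.
from typing import NamedTuple

MIN_LENGTH = 1200                               # threshold in bp, from the description
CULTURED_SOURCES = {"isolate", "enrichment"}    # assumed labels for cultured origins


class Record(NamedTuple):
    accession: str
    sequence: str
    source: str  # e.g. "isolate", "enrichment" or "environmental" (assumed labels)


def is_included(record, min_length=MIN_LENGTH):
    """Keep sequences longer than min_length, or shorter ones from cultures."""
    if len(record.sequence) > min_length:
        return True
    return record.source in CULTURED_SOURCES


def select_records(records, min_length=MIN_LENGTH):
    """Return the records that satisfy the inclusion rule, in input order."""
    return [r for r in records if is_included(r, min_length)]


if __name__ == "__main__":
    records = [
        Record("SEQ_A", "A" * 1300, "environmental"),
        Record("SEQ_B", "A" * 800, "environmental"),
        Record("SEQ_C", "A" * 800, "isolate"),
    ]
    print([r.accession for r in select_records(records)])  # ['SEQ_A', 'SEQ_C']
```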
Gene fusion detection in Plants
Fusion transcripts (i.e., chimeric RNAs) resulting from gene fusions are well known in humans, but in plants this phenomenon has not yet been explored. We plan to discover fusion transcripts and gene fusions in different types of plants using RNA-Seq datasets. Further, we plan to understand the mechanism of gene fusion formation and the significance of fusions in plants.
Whole genome and transcriptome sequencing data analysis of Plants
In this era of Next Generation Sequencing (NGS), a huge amount of sequencing data is available in the public domain. Extracting novel findings from these datasets is a major challenge for a computational biologist. We are interested in analyzing whole genome and transcriptome sequencing data of different plants to extract useful information from those datasets with the help of bioinformatics tools. Currently, we are planning to study the gene clusters of secondary metabolite pathways in different plants.
Development of webservers, databases and computational pipelines for plant research
Developing databases is necessary to compile and share information with the scientific community. We are dedicated to developing useful databases and web servers for plant research.
Another area of interest is the development of automated pipelines and tools for the analysis of high-throughput genomics data generated by NGS technologies.
Professional & Academic Background
Staff Scientist II (May 2017- present): National Institute of Plant Genome Research (NIPGR), New Delhi, India
Postdoctoral Research Associate (2015-2017): University of Virginia, Charlottesville, VA, USA
Research Scientist (2014-2015): Sir Ganga Ram Hospital, New Delhi, India
PhD Bioinformatics (2009-2014): Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh under Jawaharlal Nehru University (JNU), New Delhi, India
M.Sc. Life Sciences (2007-2009): Jawaharlal Nehru University (JNU), New Delhi, India
B.Sc. Biotechnology (2004-2007): Jamia Millia Islamia (JMI), New Delhi, India
Awards and Fellowships
Junior and Senior Research Fellowship (2009-2014): Council of Scientific and Industrial Research (CSIR), New Delhi, India
GATE (Graduate Aptitude Test in Engineering): Qualified in years 2008 and 2009
Scientific Contributions/ Recognitions
Associate editor: Journal of Translational Medicine.
Editorial board member: Theoretical Biology and Medical Modelling.
Reviewer: PLoS One, BMC Genomics, BMC Bioinformatics, BMC Biology, BMC Biotechnology, Frontiers in Physiology and several other journals.
Web Resources/ Databases (Developed/ Contributed)
A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer (http://www.imtech.res.in/raghava/cancertope/)
GenomeABC: A webserver for benchmarking of genome assemblers. (http://crdd.osdd.net/raghava/genomeabc/).
Genomics web portal page. (http://crdd.osdd.net/raghava/genomesrs/).
Map/Alignment module of CancerDr: Cancer Drug Resistance Database. (http://crdd.osdd.net/raghava/cancerdr/).
Short reads and contigs alignment module of PCMDB: Pancreatic cancer methylation database. (http://crdd.osdd.net/raghava/pcmdb/).
Burkholderia sp. SJ98 database. (http://crdd.osdd.net/raghava/genomesrs/burkholderia/).
Rhodococcus imtechensis RKJ300 database. (http://crdd.osdd.net/raghava/genomesrs/rkj300/).
Genotrick: A pipeline for whole genome assembly and annotation of genomes. (http://crdd.osdd.net/raghava/genomesrs/genotrick/).
Development of Debian packages in OSDDlinux: A Customized Operating System for Drug Discovery. (http://osddlinux.osdd.net/).
A Web-Based Platform for Designing Vaccines against Existing and Emerging Strains of Mycobacterium tuberculosis. (http://crdd.osdd.net/raghava/mtbveb/).