Adapts the popular X!Tandem peptide search engine to work with Hadoop MapReduce for reliable parallel execution of large searches. MR-Tandem is designed to drop in wherever X!Tandem is already in use and requires no modification to existing X!Tandem parameter files, and only minimal modification to X!Tandem-based workflows. It runs on any Hadoop cluster but offers special support for Amazon Web Services for creating inexpensive on-demand Hadoop clusters.
Provides the bioinformatics community with annotated Web services descriptions in diverse formats. BioSWR is a Web services registry that provides standard Resource Description Framework (RDF) based Web services descriptions along with the traditional Web Service Definition Language (WSDL) based ones. The registry provides a Web-based interface for Web services registration, querying and annotation, and is also accessible programmatically via a Representational State Transfer (REST) API or using the SPARQL Protocol and RDF Query Language (SPARQL).
Helps researchers isolate interesting papers from the "tsunami" of literature, in which, on average, more than one biomedical paper is added to MEDLINE every minute. The TogoDoc system is a client-server literature management system specialized for the life science domain. The server and the client program cooperate closely over the Internet to provide life scientists with an effective literature recommendation service and efficient literature management.
Fills the gap between the two ends of the spectrum by allowing users to pose context-free queries and by returning a structured set of results. Furthermore, BOSS scales well, just as conventional document search engines do, because it is designed to use a standard document-indexing model with minimal modifications. Given these features, BOSS raises the technological level of traditional solutions for searching biomedical information.
Allows the maintenance of publication lists assigned to various topics. litDB allows researchers to store records of their own publications on a central server. It supports the calculation of impact factors for the papers in the system, which is also helpful when applying for funding for further research. The tool provides a table with statistics on the number of published papers and the impact factors for each year.
Eases the investigation of Human Protein Atlas (HPA) data. HPAanalyze permits users to import data tables from HPA, export and display data, and download all staining images of interest. It provides a collection of functionalities complementary to hpar. This tool returns files in which data can be subset by normal tissue type, normal cell type, cancer type and subcellular location.
Offers an intuitive way to read, manage and discover new literature. ReadCube offers features for optimizing PDF viewing, clicking on inline references, navigating to related articles, adding inline notes and highlighting text. The tool provides a figure browser, a full-screen viewer and a customizable interface.
Gives access to the research performance of about 8500 research institutions. SciVal facilitates new discoveries, collaboration and access to knowledge in order to find funding. It is a support platform that helps scientists, physicians, doctors and nurses throughout their careers. The tool can display and benchmark research performance and analyse research trends.
Searches a DNA database for matches with a set of sequence-tagged site (STS) primers. stssearch reports expected matches. For each pair of primers, it looks for matches between the primers and the query sequence in either orientation. Details of any matches are written to the output file. Only one primer of a pair needs to match for the match to be reported. The tool is intended for all kinds of users, from naive users to embedded scripts.
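The either-orientation search described for stssearch can be illustrated with a minimal Python sketch. It is not the tool's own implementation: it assumes exact, case-insensitive substring matching, and the function names (`reverse_complement`, `find_all`, `search_primer_pair`) are hypothetical, chosen only for this illustration.

```python
# Illustrative sketch only: exact matching of a primer pair against one
# sequence in both orientations. All names here are hypothetical.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_all(haystack, needle):
    """Return every 0-based start position of needle in haystack."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def search_primer_pair(sequence, primer_a, primer_b):
    """Report hits for either primer, as given or reverse-complemented.

    A hit is reported even if only one primer of the pair matches.
    Each hit is a tuple (primer_name, orientation, position).
    """
    sequence = sequence.upper()
    hits = []
    for name, primer in (("A", primer_a), ("B", primer_b)):
        primer = primer.upper()
        for pos in find_all(sequence, primer):
            hits.append((name, "forward", pos))
        rc = reverse_complement(primer)
        if rc != primer:  # skip palindromic primers to avoid duplicate hits
            for pos in find_all(sequence, rc):
                hits.append((name, "reverse", pos))
    return hits


if __name__ == "__main__":
    print(search_primer_pair("ACGTTTGACCA", "GTTTG", "TGGTC"))
    # prints [('A', 'forward', 2), ('B', 'reverse', 6)]
```

A real search would typically tolerate mismatches and scan every entry of a database. The sketch covers only the orientation logic and the rule that a single matching primer is enough to report a hit.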
Allows the exploration of life sciences from a big data point of view. SciCurve permits users to explore a subject in great detail. It is a systematic literature review method and offers an interactive and comprehensible environment. The tool offers a way to visualize thematic relationships between publications and authors.
Supports the scientific community and connects members as part of a global research community. SJFinder offers a way to collect, extract and analyze data. It can suggest collaborators worldwide to a user in seconds through an interactive Lab Finder, and lets users share their work.
Automates search queries against a literature database. SciFeed is based on SciLit, which contains more than 85 million articles. It automatically sends an email to the user when a newly published article matches their query.
Permits users to get access to a large number of research papers. Kopernio gives direct links to full-text articles without redirection or a login step. It finds papers on university servers and pre-print servers. This tool lets users save PDFs for later and offers students a way to access their university library from anywhere.
A semantic search engine to answer questions in the biomedical domain. GoWeb combines classical keyword-based web search with text-mining and ontologies to navigate large result sets and facilitate question answering. The user submits a query through the search form; the server preprocesses the query and sends a search request to the search service. The search service returns the first results, which are then annotated, highlighted, rendered and sent to the user. Compared to traditional search engines, GoWeb bridges the semantic gap with the limited amount of available semantic annotations by employing text-mining to extract ontology concepts from text. In a nutshell, GoWeb exploits the fact that keywords and ontology terms co-occurring in snippets often represent facts.
Aims to collate any relevant data pertaining to any PE2-4 protein. Missing ProteinPedia makes it possible to define, summarize and discuss all available data for the so-called missing proteins, emphasizing why they may currently be difficult to observe with standard proteomics mass spectrometry (MS) and antibody (Ab)-based techniques. It supports the generation of high-confidence MS evidence for as many PE2-4 proteins as possible.
Provides information on over 190 published online cellular pathway and network databases. Pathguide is organised around approximate categories and a database can be stored in multiple categories if it contains multiple data types. The main categories for databases are: Protein–protein interaction (PPI), Metabolic pathway, Signaling pathway, Gene regulation network (GRN), Genetic pathway and Pathway diagram. The database includes a number of protein-sequence databases that store pathway information as secondary information.
Provides clinically relevant somatic mutations. DIRECT enables a genetically informed approach to cancer medicine by providing clinicians access to tumor gene therapy-response information based on individual patient data published in the literature. It catalogs only patients with non-small cell lung cancers (NSCLCs) and published individual patient data on EGF receptor (EGFR) mutations.
Allows exploration of biological databases using semantic web methods. BioCarian is a search engine that browses several selected public databases. The databases comprise a sample collection of DNA-level data (dbSNP, GWAS, Ensembl), protein data (UniProt), pathway data (KEGG, Reactome), disease data (OMIM, DisGeNET) and a private database containing viral integration sites in liver cancer patients. BioCarian can be used for a primary analysis of data and can act as a means of independent, orthogonal verification of an analysis result.
Supplies data concerning molecular biology. LinkDB is a collection of database links. It contains the original links provided by each database as well as computed indirect and reverse links. This database works jointly with DBGET.
Assists with large-scale data-mining related to gene function in Caenorhabditis elegans. GExplore contains information about the domain organization of the proteomes of 9 nematode species, can display the location of Caenorhabditis elegans mutations with respect to the domain organization of the proteins, and includes stage-specific RNAseq gene expression data generated by the modENCODE project. The database has been organized to facilitate independent updates of the different parts of the database and to allow the addition of novel data sets in the future.
Provides information about molecular biology topics. DBGET is a database that allows users to search and extract entries from a diverse range of databases. In this resource, a database is considered a collection of entries that can be stored in a single file or in multiple files. It is a conceptual extension of IDEAS (Integrated Database and Extended Analysis System) for nucleic acids and proteins. This database works jointly with LinkDB.
Models biological diversity of scale insects through 300 years of published research. ScaleNet is a manually-curated, web-accessible database that manages information about the systematics, ecological associations (host plants, natural enemies and mutualists), geographic distributions, life histories, economic importance and morphology of each scale insect species. This resource is used heavily by insect identifiers as a diagnostic tool.
Displays an interactive map that represents the arXiv repository for scientific research papers. Paperscape represents each paper by a circle, with the size of a circle related to how often that paper has been cited.
Contains bibliographic information and abstracts of health-related articles, reports, surveys, and other resource documents pertaining to the health and health care of American Indians, Alaska Natives, and Canadian First Nations. The Native Health Database provides information for the benefit, use, and education of organizations and individuals with an interest in health-related issues, programs, and initiatives regarding North American indigenous peoples.
Provides a biomedical text-mining and relationship-exploring knowledgebase. DESHCV is a comprehensive online resource on Hepatitis C Virus (HCV). Users can retrieve information using specified concepts, keywords and phrases. It also offers the option of viewing the retrieved abstracts either with or without embedded tagged concepts. It can lead to the identification of possible new discoveries and augments efforts in the search for diagnostic or even therapeutic targets.
Mines hundreds of millions of pages of complex, dense and unstructured life-sciences papers, and then uses Natural Language Processing and Machine Learning to clean, tag and structure the data. Bioz's platform incorporates a semantic analysis engine and relevancy ranking of results, research-relevant correlations, reagent product ratings, and researcher collaboration optimization. By using this database, scientists are able to reduce the cost of reagents and equipment, shorten the time required for successful experimentation, and most importantly, improve the quality of their research.
Scans daily updates of medical databases. PubCrawler offers a database that helps keep scientists informed of the current contents of Medline and GenBank by listing new database entries that match their research interests. PubCrawler searches the NCBI PubMed (Medline) and Entrez (GenBank) databases daily using search parameters (keywords, author names, etc.) specified by the user. It looks for matches to text (Medline and GenBank annotation), not sequences, and writes its output as a WWW page.
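A daily keyword search of the kind PubCrawler performs could be expressed as an NCBI E-utilities `esearch` request. The sketch below only builds the request URL and does not contact NCBI. It is not PubCrawler's own code, and the helper name `build_daily_query_url` and the example search term are illustrative assumptions.

```python
# Illustrative sketch only: build (but do not send) an NCBI E-utilities
# esearch URL restricted to recently added records. The helper name and
# the example search term are assumptions made for this illustration.
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_daily_query_url(term, db="pubmed", days=1):
    """Return an esearch URL for records added within the last `days` days."""
    params = {
        "db": db,            # e.g. "pubmed" for Medline, "nuccore" for GenBank
        "term": term,        # user-specified keywords, author names, etc.
        "reldate": days,     # only records from the last `days` days
        "datetype": "edat",  # filter on the date the record entered Entrez
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_daily_query_url("proteomics[Title] AND Smith J[Author]"))
```

Running such a query once a day for each stored search profile, and formatting the returned identifiers as a web page, would approximate the alerting behaviour described above.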
Offers the capability to integrate, aggregate, analyze and visualize biomedical data from a wide variety of structured and unstructured information repositories. Bio-In combines its extensive capabilities as a life science R&D informatics services provider. The platform offers data normalization and linking, customizable workflows, an integrated data browser, a flexible and scalable approach, and domain and technology expertise.
Gathers open access journals and articles. Paperity is multi-disciplinary and aims to facilitate access to literature. The goal of the platform is to bring together all open access literature. It helps disseminate discoveries more effectively and maximize research impact.
Allows semantic searching of public literature sources. XTractor provides association information for various biomedical entities. The metadata is linked to more than 20 external databases and provides outputs for more than 13 million relationships.
Documents publicly available and experimentally determined polyanion-binding proteins (PABPs). DB-PABP is a database that provides information collected from original literature reports of experimentally verified polyanion-PABP interactions. The main search form generates tables containing specific information and hyperlinks. In addition to this main search form, the resource provides a set of utilities that allow string searches for author surnames.
An online tool to explore established and potential gene-disease associations in the context of other biomedical relations. Knowledge.bio provides access to a network of interconnected concepts mined from more than 20,000,000 PubMed abstracts. We integrate explicit connections identified by the SemRep system from the US National Library of Medicine with the Implicitome from the Leiden University Medical Center. The Implicitome provides predicted relationships between genes and diseases based on 'concept profile' technology.
Gathers information about stem cell research. SyStemCell is a manually curated database that contains data extracted from more than 280 PubMed publications. It aims to provide a basic platform for extracting relationships suggested by the multisource data. Moreover, it includes an analysis tool to mine potential relationships among different regulation levels and possibly formulate new hypotheses.
Gene fusion detection in Plants
Fusion transcripts (i.e., chimeric RNAs) resulting from gene fusions are well known in humans, but in plants this phenomenon has not yet been explored. We plan to discover fusion transcripts and gene fusions in different types of plants using RNA-Seq datasets. Further, we plan to understand the mechanism of gene fusion formation and the significance of fusions in plants.
Whole genome and transcriptome sequencing data analysis of Plants
In this era of Next Generation Sequencing (NGS), a huge amount of sequencing data is available in the public domain. Extracting novel findings from these datasets is a major challenge for a computational biologist. We are interested in analyzing whole genome and transcriptome sequencing data from different plants to extract useful information from those datasets with the help of bioinformatics tools. Currently, we are planning to study the gene clusters of secondary metabolite pathways in different plants.
Development of webservers, databases and computational pipelines for plant research
Developing databases is necessary to compile information and share it with the scientific community. We are dedicated to developing useful databases and webservers for plant research.
Another area of interest is the development of automated pipelines and tools for the analysis of high-throughput genomics data generated by NGS technologies.
Professional & Academic Background
Staff Scientist II (May 2017-present): National Institute of Plant Genome Research (NIPGR), New Delhi, India
Postdoctoral Research Associate (2015-2017): University of Virginia, Charlottesville, VA, USA
Research Scientist (2014-2015): Sir Ganga Ram Hospital, New Delhi, India
PhD Bioinformatics (2009-2014): Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh under Jawaharlal Nehru University (JNU), New Delhi, India
M.Sc. Life Sciences (2007-2009): Jawaharlal Nehru University (JNU), New Delhi, India
B.Sc. Biotechnology (2004-2007): Jamia Millia Islamia (JMI), New Delhi, India
Awards and Fellowships
Junior and Senior Research Fellowship (2009-2014): Council of Scientific and Industrial Research (CSIR), New Delhi, India
GATE (Graduate Aptitude Test in Engineering): Qualified in years 2008 and 2009
Scientific Contributions/ Recognitions
Associate editor: Journal of Translational Medicine.
Editorial board member: Theoretical Biology and Medical Modelling.
Reviewer: PLoS One, BMC Genomics, BMC Bioinformatics, BMC Biology, BMC Biotechnology, Frontiers in Physiology and several other journals.
Web Resources/ Databases (Developed/ Contributed)
A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer (http://www.imtech.res.in/raghava/cancertope/)
GenomeABC: A webserver for benchmarking of genome assemblers. (http://crdd.osdd.net/raghava/genomeabc/).
Genomics web portal page. (http://crdd.osdd.net/raghava/genomesrs/).
Map/Alignment module of CancerDr: Cancer Drug Resistance Database. (http://crdd.osdd.net/raghava/cancerdr/).
Short reads and contigs alignment module of PCMDB: Pancreatic cancer methylation database. (http://crdd.osdd.net/raghava/pcmdb/).
Burkholderia sp. SJ98 database. (http://crdd.osdd.net/raghava/genomesrs/burkholderia/).
Rhodococcus imtechensis RKJ300 database. (http://crdd.osdd.net/raghava/genomesrs/rkj300/).
Genotrick: A pipeline for whole genome assembly and annotation of Genomes (http://crdd.osdd.net/raghava/genomesrs/genotrick/)
Development of Debian packages in OSDDlinux: A Customized Operating System for Drug Discovery. (http://osddlinux.osdd.net/).
A Web-Based Platform for Designing Vaccines against Existing and Emerging Strains of Mycobacterium tuberculosis. (http://crdd.osdd.net/raghava/mtbveb/).