Allows quantification and visualization of CRISPR-Cas9 outcomes, as well as evaluation of effects on coding sequences, noncoding elements and selected off-target sites. CRISPResso is a suite of computational tools that offers several features, including batch sample analysis via command line interface, integration with other pipelines, tunable parameters of sequence quality and alignment fidelity, discrete measurement of insertions, deletions, and nucleotide substitutions, and distinction between non-homologous end joining (NHEJ), homology-directed repair (HDR), and mixed mutation events.
Investigates clustered regularly interspaced short palindromic repeats (CRISPRs) from genomic sequences. CRISPRFinder recognizes CRISPR arrays using CRISPR repeats. It simplifies the assessment of tentative CRISPR and cas loci. This tool can characterize conserved regions to single base pair resolution. It enables the extraction of flanking sequences or spacers and is available as a desktop version as well as a web application.
Proposes a set of bioinformatic tools that assist biologists in developing and setting up a CRISPR genotyping scheme. In the pre-processing phase, CRISPRs must be compared; this can be done with the CRISPRcomparison tool, which helps in selecting the most appropriate CRISPR loci and associated primers for PCR amplification. CRISPRcomparison allows the identification of families of strains that share a CRISPR within species of high genetic diversity, or the identification of homologous CRISPRs within species containing multiple CRISPR loci. In the post-processing phase, the CRISPRtionary program allows the user to easily compare multiple alleles of a CRISPR locus investigated in a collection of strains and to obtain pre-calculated files that can be used directly in clustering analysis.
Serves for similarity search-based prediction. CRISPRAlign is a program that allows users to identify clustered regularly interspaced short palindromic repeats (CRISPRs) in a target sequence (a genome or a contig) that have repeats similar to those of a given CRISPR (the query CRISPR). It works by detecting substrings in the target sequence (or its reverse complement) that are similar to the repeat sequence of the query CRISPR.
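The similarity search described above can be illustrated with a minimal sketch. It is not CRISPRAlign's actual algorithm: it assumes that "similar" means a Hamming distance below a threshold, and the function names and the max_mismatches parameter are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_similar_repeats(target, query_repeat, max_mismatches=2):
    """Scan both strands of the target for windows similar to the repeat.

    Returns (strand, start, mismatches) tuples. Start positions on the
    "-" strand are relative to the reverse-complemented target.
    Similarity is an assumption here: Hamming distance <= max_mismatches.
    """
    query_repeat = query_repeat.upper()
    k = len(query_repeat)
    hits = []
    for strand, seq in (("+", target.upper()), ("-", reverse_complement(target))):
        for i in range(len(seq) - k + 1):
            d = hamming(seq[i:i + k], query_repeat)
            if d <= max_mismatches:
                hits.append((strand, i, d))
    return hits
```

A real tool would also allow insertions and deletions and would group nearby hits into arrays; this sketch only shows the substring-scanning step on both strands.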
Predicts the strand of the resulting crRNAs. The method uses as input CRISPR repeat predictions. CRISPRDirection uses parameters that are calculated from the CRISPR repeat predictions and flanking sequences, which are combined by weighted voting. The prediction may use prior coding sequence annotation but this is not required. CRISPRDirection correctly predicted the orientation of 94% of a reference set of arrays.
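The weighted voting mentioned above can be sketched as follows. This is only an illustration of the general technique: the individual predictors and their weights are not given in this description, so the values passed in are hypothetical.

```python
def weighted_vote(votes):
    """Combine strand predictions by weighted voting.

    votes: iterable of (direction, weight) pairs, where direction is
    +1 (forward), -1 (reverse) or 0 (abstain) and weight is a
    non-negative number. Both are illustrative assumptions.
    """
    score = sum(direction * weight for direction, weight in votes)
    if score > 0:
        return "forward"
    if score < 0:
        return "reverse"
    return "undetermined"
```

For example, weighted_vote([(1, 0.6), (-1, 0.3), (0, 0.9)]) sums to a positive score, so the forward strand is chosen.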
Helps users determine clustered regularly interspaced short palindromic repeats (CRISPR) leader boundaries. CRISPRleader works by focusing on leader sequence conservation within groupings based on the similarity of the repeats in the adjacent CRISPR arrays. It supplies annotation of the CRISPR array, its strand orientation as well as conserved core leader boundaries that can be uploaded to any genome browser.
Provides annotation of CRISPR-Cas systems including (i) CRISPR arrays of repeat-spacer units and cas genes, (ii) type (and subtype) of predicted system(s) and (iii) anti-repeats (part of tracrRNA genes in type II CRISPR-Cas systems). The CRISPRone website also provides online prediction of CRISPR-Cas systems given genomic sequences, using a pipeline with integrated checking of false CRISPRs. It can be used to submit sequences to the server for prediction, look up pre-calculated CRISPR-Cas systems or check out mock CRISPRs (elements that superficially resemble CRISPRs).
Aims to identify target-specific guide RNAs for CRISPR-Cas9 genome-editing systems. CRISPRseek is a program that simplifies the design of target-specific guide RNAs (gRNAs) for any CRISPR-Cas9 system with a characterized PAM sequence. It also includes functionality for scoring target sites in two related input sequences in order to identify gRNAs.
Provides a quick and detailed insight into repeat conservation and diversity of both bacterial and archaeal systems. CRISPRmap comprises the largest dataset of CRISPRs to date and enables comprehensive independent clustering analyses to determine conserved sequence families, potential structure motifs for endoribonucleases, and evolutionary relationships.
Allows deconvolution of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) screen data. CRISPRcloud is an analysis platform that enables users to confidentially deposit raw sequencing files, extract and cluster customizable statistical analysis from a cloud-based system. Moreover, the software can generate and prioritize hit lists and export datasets for downstream validation and output various end-point results in a personalized manner.
Predicts the most likely targets of CRISPR RNAs. This can be used to discover targets in newly sequenced genomic or metagenomic data. The inputs into CRISPRTarget are predicted CRISPR arrays or spacer sequences. The output is displayed visually in HTML format, but can also be saved as text and opened in a spreadsheet. The target sequence is typically displayed as an R-loop, depicting a specified part of the crRNA, as well as both the target and non-target strand of the double-stranded target DNA. The target sequence R-loop can be fully reverse complemented when users suspect that transcription of the CRISPR array starts from the downstream end instead.
Resolves and localizes individual mutant alleles with respect to the endonuclease cut site. CrispRVariants quantifies and visualizes individual variant alleles from either traditional Sanger sequencing or high-throughput CRISPR-Cas9 mutagenesis sequencing experiments. CrispRVariants was designed with interactivity in mind, explicitly allowing users to detect problems and filter sequences appropriately before estimating mutation efficiency. This toolkit can be easily used to create a variant allele summary plot and accompanying table of counts. CrispRVariants enables immediate comparison of variant spectra between target locations.
Provides a de novo clustered regularly interspaced short palindromic repeats (CRISPRs) annotation program. CRISPRdigger focuses on detecting weak Direct Repeat (DR) signals. It uses RepeatScout for the de novo screening of repeats, and searches for the consecutively distributed repeat copies detected by RepeatMasker. Because all parameters have default values, a user can annotate CRISPRs in a query genome by supplying only a genome sequence in FASTA format.
Serves as a calculation and visualization tool for high-throughput CRISPR genome-editing data analysis. CRISPRMatch integrates analysis steps such as mapping reads, measuring mutation frequency (deletion and insertion), evaluating the accuracy and efficiency of genome-editing systems, and outputting tables and figures. The software is suited to genome-editing data from protoplasts transformed with CRISPR nucleases, where it can assess the targeted mutation efficiency of DNA endonucleases and of regions of guide RNAs.
Consists of a web server for selecting rational CRISPR/Cas targets from an input sequence. CRISPRdirect enables selection of rational target sequences with minimized off-target sites by performing exhaustive searches against genomic sequences. It is able to investigate the entire genome for perfect matches with each candidate target sequence. Users can also browse the detailed list of potential off-target sites that have partial complementarity with the selected sequence.
Permits users to efficiently predict the on-target activity of in-silico sgRNAs using a Support Vector Machine (SVM) model. This tool provides two key factors that improve the in-silico prediction of single-guide RNA (sgRNA) activity in the CRISPR/Cas9 system. First, all possible position-specific and position-independent features of single nucleotides, di-nucleotides, tri-nucleotides and tetra-nucleotides are incorporated. Second, active sgRNAs are enriched in “A” but depleted in “T”.
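The two feature families named above can be illustrated with a short sketch. The encoding below (k-mer counts for position-independent features, sparse "k-mer@position" indicators for position-specific ones) is an assumption chosen for clarity, not the tool's documented encoding, and all function names are hypothetical. The SVM itself is not shown.

```python
from collections import Counter


def position_independent_features(seq, k):
    """Count each k-mer regardless of where it occurs in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def position_specific_features(seq, k):
    """Mark which k-mer starts at each position (sparse one-hot encoding)."""
    seq = seq.upper()
    return {f"{seq[i:i + k]}@{i}": 1 for i in range(len(seq) - k + 1)}


def sgrna_features(seq, ks=(1, 2, 3, 4)):
    """Combine both feature families for k = 1..4 into one sparse dict."""
    features = {}
    for k in ks:
        for kmer, n in position_independent_features(seq, k).items():
            features[f"count:{kmer}"] = n
        features.update(position_specific_features(seq, k))
    return features
```

A sparse dictionary like this can be turned into a fixed-length vector by any standard dictionary vectorizer before being passed to a classifier.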
Allows identification and visualization of CRISPR loci. CRISPRviz detects and extracts repeats and spacers and serves the data via a local web server for additional manipulation. This software contains two main components: an extraction pipeline/conversion engine and a web-based front-end. It facilitates swift implementation and can serve as an epidemiological tool by enhancing tracking of micro-evolution in diverging pathogenic strains and as a genomic tool for phylogenetic reconstruction.
Provides a generalized and simplified approach to generate null alleles in mouse using CRISPR/Cas9 by designing deletions that remove internal coding regions or critical exons. CRISPRtools is a platform that supports alternative sgRNA scoring options and can be easily extended to other organisms. It supports two general design strategies depending on the gene structure: (i) internal exon deletions and (ii) whole exon deletions.
Assists users in obtaining mechanistic insights into genetic dependencies. CRISPRO is a computational pipeline that was developed to elucidate functional residues and predict phenotypic outcome of genome editing. It uses CRISPR tiling screens, protein and nucleotide sequence level annotations, and 3D visualization of protein structure. It can also be used for calculation of functional scores per guide RNA by using next generation sequencing (NGS) data as input.
Serves for the processing of pooled genome-wide CRISPR-KO screens. CRISPRcleanR allows users to identify biased genomic regions from CRISPR-KO screen datasets. It also corrects both the read count and the log fold change (logFC) of individual single guide RNAs (sgRNAs) in such regions. It can be used to reduce false positive calls while keeping the true positive rate of known essential genes largely unchanged. Moreover, it assists in detecting essential genes, even within focally amplified regions.
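For readers unfamiliar with the quantity being corrected, the following sketch shows one common way to compute a per-sgRNA logFC from raw read counts. It does not implement CRISPRcleanR's correction; the normalization to counts per million, the pseudocount, and the function name are assumptions made for illustration.

```python
import math


def log_fold_changes(control, treated, pseudocount=0.5):
    """Compute log2 fold change per sgRNA between two samples.

    control, treated: dicts mapping sgRNA id -> raw read count.
    Counts are scaled to counts per million, then a pseudocount is
    added to avoid division by zero (both choices are assumptions).
    """
    total_c = sum(control.values())
    total_t = sum(treated.values())
    lfc = {}
    for guide, count_c in control.items():
        c = count_c / total_c * 1e6 + pseudocount
        t = treated.get(guide, 0) / total_t * 1e6 + pseudocount
        lfc[guide] = math.log2(t / c)
    return lfc
```

A guide whose normalized count halves between control and treated samples gets a logFC close to -1; strongly negative values are what an essentiality screen looks for.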
Utilizes the CRISPRFinder program to identify putative CRISPRs and additional tests to further screen for the smallest CRISPRs in a polyphasic approach. The CRISPRFinder program is designed to admit the largest possible number of candidate CRISPRs, especially the shortest ones, containing one or two spacers. The main idea of the program is to first find possible CRISPR locations in a genomic sequence and then check whether these regions contain a cluster that possesses the characteristics of an "obvious" CRISPR, i.e. one containing at least three repeats.
A database of CRISPR/Cas9 target sequences that have been experimentally validated in zebrafish. CRISPRz can be searched using multiple inputs such as ZFIN IDs, accession number, UniGene ID, or gene symbols from zebrafish, human and mouse. CRISPRz was developed in an effort to provide a comprehensive list of validated CRISPR targets from published sources as well as from an ongoing genome-wide knockout project in the zebrafish genome. Data will be added as more validated CRISPR targets are published or contributed from unpublished, in-house projects. The database is also open for data submission from the research community.
Provides a single platform to integrate the growing information being generated by genome editing approaches. CrisprGE is an online database that contains over 4680 genes edited by the CRISPR/Cas approach. It also includes more than 220 unique genes targeted in about 30 model and other organisms, along with the different modifications induced by repair mechanisms.
Offers a collection of information and links for scientists interested in using targetable clustered regularly interspaced short palindromic repeats (CRISPR)/Cas systems for genome engineering and other applications.
Offers Pooled In vitro CRISPR Knockout Library Essentiality Screens. PICKLES allows exploration of gene essentiality profiles of users’ favourite genes across a large set of CRISPR knockout and shRNA knockdown fitness screens, mostly in cancer cell lines. It can display how gene-specific essentiality varies across tissue types and, in many cases, the relationship with gene expression levels in the same cells.
Allows users to search over 100,000 genomes and 9,000 species in order to make a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) guide for gene knockout. CRISPR Knockout Guide Designer lets users visualize recommendations for knockouts with fewer off-target effects. It is also possible to validate guides created with other tools or to check the location of a sequence within the gene.
Aims to facilitate users' research using CRISPR technology. ‘Genome-wide gRNA databases for CRISPR genome editing and transcription activation’ provides genome-wide databases containing pre-validated gRNA sequences targeting genes in the human and mouse genomes. It includes two resources: a gRNA database in which SpCas9 gRNA sequences are targeted to constitutive exons and designed for minimal off-target effects, and a SAM database that targets the first 200 bp upstream of each transcription start site.
Uses methods to compute, visualize and select optimal CRISPR sites in a genome browser environment. The WGE database currently stores single and paired CRISPR sites and pre-calculated off-target information for CRISPRs located in the mouse and human exomes. Scoring and display of off-target sites is simple and intuitive, and filters can be applied to identify high-quality CRISPR sites rapidly. WGE also provides a tool for the design and display of gene targeting vectors in the same genome browser, along with gene models, protein translation and variation tracks.
Provides a universal CRISPR annotation system. grID is an extensive compilation of gRNA properties including sequence and variations, thermodynamic parameters, off-target analyses, and alternative PAM sites, among others. The database is designed to keep up with the rapidly evolving CRISPR technology. Users can search in the database by NCBI reference sequence ID, Gene Symbol or any valid 23-bp targeting sequence in the form N20NGG.
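The N20NGG form mentioned above (a 20-nt protospacer followed by an NGG PAM, 23 bp in total) can be checked or scanned for with a short sketch. This is only an illustration of the pattern, not the database's search code; the function names are hypothetical and only the forward strand is scanned.

```python
import re

# Lookahead so that overlapping 23-bp sites are all reported.
SITE = re.compile(r"(?=([ACGT]{20}[ACGT]GG))")
FULL = re.compile(r"[ACGT]{20}[ACGT]GG")


def is_valid_target(seq):
    """True if seq is exactly 23 bp in the form N20NGG."""
    return FULL.fullmatch(seq.upper()) is not None


def find_n20ngg_sites(seq):
    """Return (start, 23-bp site) for every N20NGG site on the forward strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in SITE.finditer(seq)]
```

Sites on the opposite strand can be found by running the same scan on the reverse complement of the input.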
A database for high-throughput CRISPR/Cas9 screening experiments. GenomeCRISPR contains data on the performance of more than 550 000 single guide RNAs (sgRNAs) which were used in >80 different experiments performed in 48 different human cell lines. It provides several data mining options and tools allowing users to easily investigate and compare the results of different screens. An API can be used for automated data access.
Allows rapid identification of sgRNA target sequences in the Chinese hamster ovary (CHO-K1) genome. The CRISPy tool identified 1,970,449 CRISPR targets divided into 27,553 genes and lists the number of off-target sites in the genome. The proven functionality of Cas9 to edit CHO genomes, combined with the CRISPy database, has the potential to accelerate genome editing and synthetic biology efforts in CHO cells.
Provides a platform to create guide RNAs (gRNAs). Cpf1-Database is a web application that allows users to select the targeted genes and the optimized gRNAs through a graphic interface with flexible filtering parameters. Users can choose targets for editing from a repository of predetermined Cpf1 endonuclease targets with 5’-TTTN-3’ PAM sequences in all coding sequence (CDS) regions of the complete genomes of 12 organisms.
Provides access to data about anti-CRISPR proteins. Anti-CRISPRdb is an online resource that contains more than 400 anti-CRISPR proteins tested by experimental and bioinformatics methods. The database allows users to browse, search, blast, screen, and download data on their anti-CRISPR proteins of interest, as well as sharing data on validated/potential anti-CRISPR proteins with other related scientific communities.
Facilitates the use of the CRISPR/Cas9 system as a genome editing tool for functional studies and molecular breeding of grapes. Among other functions, the Grape-CRISPR database allows users to identify and select multi-protospacers for editing similar sequences in grape genomes simultaneously. The database contains two main sections: Search and Design. In the Search section, users can identify appropriate protospacer and protospacer-adjacent motif (PAM) sites of a gene by providing certain inquiry information such as locus location, gene ID or Pfam ID. The Design section is for protospacer design. Users can detect and design protospacers and PAMs in the sequences of interest by using the Perl scripts provided.
Focuses on integrating experimentally and computationally identified super-enhancers and annotating their potential roles in the regulation of cell identity gene expression in a cell type-specific manner. The current release of SEA incorporates 83 996 super-enhancers computationally or experimentally identified in 134 cell types/tissues/diseases, including human (75 439, three of which were experimentally identified), mouse (5879, five of which were experimentally identified), Drosophila melanogaster (1774) and Caenorhabditis elegans (904). To facilitate data extraction, SEA supports multiple search options, including species, genome location, gene name, cell type/tissue and super-enhancer name. The response provides detailed (epi)genetic information, incorporating cell type specificity, nearby genes, transcriptional factor binding sites, CRISPR/Cas9 target sites, evolutionary conservation, SNPs, H3K27ac, DNA methylation, gene expression and TF ChIP-seq data. Moreover, analytical tools and a genome browser were developed for users to explore super-enhancers and their roles in defining cell identity and disease processes in depth.
Gene fusion detection in Plants
Fusion transcripts (i.e., chimeric RNAs) resulting from gene fusions are well known in humans, but in plants this phenomenon has not yet been explored. We plan to discover fusion transcripts/gene fusions in different types of plants using RNA-Seq datasets. We also plan to study the mechanism of gene fusion formation and the significance of fusions in plants.
Whole genome and transcriptome sequencing data analysis of Plants
In this era of Next Generation Sequencing (NGS), a huge amount of sequencing data is available in the public domain. Extracting novel findings from these datasets is a major challenge for a computational biologist. We are interested in analysing whole genome and transcriptome sequencing data of different plants with bioinformatics tools to extract useful information from those datasets. Currently, we plan to study the gene clusters of secondary metabolite pathways in different plants.
Development of webservers, databases and computational pipelines for plant research
Databases are necessary to compile information and share it with the scientific community. We are dedicated to developing useful databases and webservers for plant research.
Another area of interest is to develop automated pipelines and tools for the analysis of high throughput genomics data, generated by NGS technologies.
Professional & Academic Background
Staff Scientist II (May 2017- present): National Institute of Plant Genome Research (NIPGR), New Delhi, India
Postdoctoral Research Associate (2015-2017): University of Virginia, Charlottesville, VA, USA
Research Scientist (2014-2015): Sir Ganga Ram Hospital, New Delhi, India
PhD Bioinformatics (2009-2014): Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh under Jawaharlal Nehru University (JNU), New Delhi, India
M.Sc. Life Sciences (2007-2009): Jawaharlal Nehru University (JNU), New Delhi, India
B.Sc. Biotechnology (2004-2007): Jamia Millia Islamia (JMI), New Delhi, India
Awards and Fellowships
Junior and Senior Research Fellowship (2009-2014): Council of Scientific and Industrial Research (CSIR), New Delhi, India
GATE (Graduate Aptitude Test in Engineering): Qualified in years 2008 and 2009
Scientific Contributions/ Recognitions
Associate editor: Journal of Translational Medicine.
Editorial Board Member of Journal: Theoretical Biology and Medical Modelling.
Reviewer: PLoS One, BMC Genomics, BMC Bioinformatics, BMC Biology, BMC Biotechnology, Frontiers in Physiology and several other journals.
Web Resources/ Databases (Developed/ Contributed)
A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer (http://www.imtech.res.in/raghava/cancertope/)
GenomeABC: A webserver for benchmarking of genome assemblers. (http://crdd.osdd.net/raghava/genomeabc/).
Genomics web portal page. (http://crdd.osdd.net/raghava/genomesrs/).
Map/Alignment module of CancerDr: Cancer Drug Resistance Database. (http://crdd.osdd.net/raghava/cancerdr/).
Short reads and contigs alignment module of PCMDB: Pancreatic cancer methylation database. (http://crdd.osdd.net/raghava/pcmdb/).
Burkholderia sp. SJ98 database. (http://crdd.osdd.net/raghava/genomesrs/burkholderia/).
Rhodococcus imtechensis RKJ300 database. (http://crdd.osdd.net/raghava/genomesrs/rkj300/).
Genotrick: A pipeline for whole genome assembly and annotation of Genomes (http://crdd.osdd.net/raghava/genomesrs/genotrick/)
Development of Debian packages in OSDDlinux: A Customized Operating System for Drug Discovery. (http://osddlinux.osdd.net/).
A Web-Based Platform for Designing Vaccines against Existing and Emerging Strains of Mycobacterium tuberculosis. (http://crdd.osdd.net/raghava/mtbveb/).