Allows quantification and visualization of CRISPR-Cas9 outcomes, as well as evaluation of effects on coding sequences, noncoding elements and selected off-target sites. CRISPResso is a suite of computational tools that offers several features, including batch sample analysis via command line interface, integration with other pipelines, tunable parameters of sequence quality and alignment fidelity, discrete measurement of insertions, deletions, and nucleotide substitutions, and distinction between non-homologous end joining (NHEJ), homology-directed repair (HDR), and mixed mutation events.
Proposes a set of bioinformatic tools assisting biologists in the development and the setting up of a CRISPR genotyping scheme. In the pre-processing phase, CRISPRs must be compared; this can be done with the CRISPRcomparison tool, which helps in selecting the most appropriate CRISPR loci and associated primers for the PCR amplification. CRISPRcomparison allows the identification of families of strains that share a CRISPR inside species with high genetic diversity, or the identification of homologous CRISPRs within species containing multiple CRISPR loci. In the post-processing phase, the CRISPRtionary program is particularly useful because it allows the user to easily compare multiple alleles of a CRISPR locus investigated in a collection of strains and to obtain pre-calculated files that may be directly used in clustering analysis.
Predicts the strand of the resulting crRNAs. The method uses as input CRISPR repeat predictions. CRISPRDirection uses parameters that are calculated from the CRISPR repeat predictions and flanking sequences, which are combined by weighted voting. The prediction may use prior coding sequence annotation but this is not required. CRISPRDirection correctly predicted the orientation of 94% of a reference set of arrays.
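The weighted-voting step mentioned above can be illustrated with a minimal sketch. This is not CRISPRDirection's implementation: the predictor names and weights below are hypothetical, chosen only to show how several independent strand calls could be combined into one prediction.

```python
from typing import Dict, Tuple


def weighted_vote(votes: Dict[str, Tuple[str, float]]) -> Tuple[str, float]:
    """Combine per-predictor strand calls by weighted voting.

    votes maps a predictor name to (call, weight), where call is
    'F' (forward) or 'R' (reverse). Returns (strand, margin); the
    strand is 'undetermined' when both sides tie.
    """
    totals = {"F": 0.0, "R": 0.0}
    for name, (call, weight) in votes.items():
        if call not in totals:
            raise ValueError(f"predictor {name!r} gave an unknown call {call!r}")
        totals[call] += weight
    margin = abs(totals["F"] - totals["R"])
    if margin == 0:
        return ("undetermined", 0.0)
    strand = "F" if totals["F"] > totals["R"] else "R"
    return (strand, margin)


# Hypothetical predictors and weights, for illustration only.
example_votes = {
    "repeat_motif": ("F", 0.5),
    "flank_at_richness": ("R", 0.25),
    "repeat_degeneracy": ("F", 0.25),
}
print(weighted_vote(example_votes))
```

The margin gives a rough sense of how strongly the predictors agree; a real tool would derive its weights from a reference set of arrays with known orientation.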
An interface to extract with precision and to further analyse clustered regularly interspaced short palindromic repeats (CRISPRs) from genomic sequences. Four main advantages may be cited: (i) short CRISPR-like structures are detected and labelled questionable, but may be of great interest if later confirmed; (ii) conserved regions are accurately defined to single base pair resolution; (iii) summary files may be uploaded (CRISPR properties summary and spacers file in FASTA format) and (iv) flanking sequences or spacers can be easily extracted and blasted against different databases.
A software tool for identifying CRISPRs in a target sequence (a genome or a contig) that has repeats similar to a given CRISPR. CRISPRAlign works by first detecting substrings in the target sequence (or its reverse complement) that are similar to the repeat sequence of a query CRISPR, and then checking for other requirements, as in metaCRT.
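The first step described above, finding substrings of the target (or its reverse complement) that resemble a query repeat, can be sketched as follows. This is an illustrative assumption rather than CRISPRAlign's actual code: similarity is measured here by Hamming distance with a user-chosen mismatch limit, and the further checks performed as in metaCRT are not modelled.

```python
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_similar_repeats(target: str, repeat: str,
                         max_mismatches: int = 1) -> List[Tuple[str, int, int]]:
    """Scan both strands of target for windows similar to repeat.

    Returns (strand, start, mismatches) tuples. Start positions on the
    '-' strand are given in reverse-complement coordinates.
    """
    hits = []
    n = len(repeat)
    for strand, seq in (("+", target), ("-", reverse_complement(target))):
        for i in range(len(seq) - n + 1):
            d = hamming(seq[i:i + n], repeat)
            if d <= max_mismatches:
                hits.append((strand, i, d))
    return hits


print(find_similar_repeats("AAGTTCCAAGTACCTT", "GTTCC", max_mismatches=1))
```

Candidate hits from such a scan would then need to be checked for regular spacing and spacer length before being reported as a CRISPR array.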
Provides annotation of CRISPR–Cas systems including (i) CRISPR arrays of repeat-spacer units, and cas genes, (ii) type (and subtype) of predicted system(s) and (iii) anti-repeats (part of tracrRNA genes in type II CRISPR–Cas systems). The CRISPRone website also provides online prediction of CRISPR–Cas systems given genomic sequences, using a pipeline with integrated checking of false CRISPRs. It can be used to submit sequences to the server for prediction, look up pre-calculated CRISPR–Cas systems or check out mock CRISPRs (elements that superficially resemble CRISPRs).
An efficient approach to determining CRISPR leader boundaries by focusing on leader sequence conservation within groupings based on the similarity of the repeats in the adjacent CRISPR arrays. CRISPRleader utilizes a string-kernel technique that can capture more information than traditional sequence alignments and is especially capable of detecting a collection of local motifs.
Provides a quick and detailed insight into repeat conservation and diversity of both bacterial and archaeal systems. CRISPRmap comprises the largest dataset of CRISPRs to date and enables comprehensive independent clustering analyses to determine conserved sequence families, potential structure motifs for endoribonucleases, and evolutionary relationships.
Allows deconvolution of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) screen data. CRISPRcloud is an analysis platform that enables users to confidentially deposit raw sequencing files, extract and cluster customizable statistical analysis from a cloud-based system. Moreover, the software can generate and prioritize hit lists and export datasets for downstream validation and output various end-point results in a personalized manner.
A central hub of CRISPR/Cas-based genome editing. Presently, this database holds a total of 4680 entries of 223 unique genes from 32 model and other organisms. It encompasses information about the organism, gene, target gene sequences, genetic modification, modification length, genome editing efficiency, cell line, assay, etc. This repository was developed using the open source LAMP (Linux, Apache, MySQL, PHP) server. User-friendly browsing and search facilities are integrated for easy data retrieval. It also includes useful tools such as BLAST CrisprGE, BLAST NTdb and CRISPR Mapper.
Predicts the most likely targets of CRISPR RNAs. This can be used to discover targets in newly sequenced genomic or metagenomic data. The inputs into CRISPRTarget are predicted CRISPR arrays or spacer sequences. The output is displayed visually in HTML format, but can also be saved as text and opened in a spreadsheet. The target sequence is typically displayed as an R-loop, depicting a specified part of the crRNA, as well as both the target and non-target strand of the double-stranded target DNA. The target sequence R-loop can be fully reverse complemented when users suspect that the direction of transcription of the CRISPR array starts from the downstream end instead.
Provides a de novo clustered regularly interspaced short palindromic repeats (CRISPRs) annotation program. CRISPRdigger focuses on detecting weak Direct Repeat (DR) signals. It uses RepeatScout for the de novo screening of repeats, and searches for the consecutively distributed repeat copies detected by RepeatMasker. Because all parameters have default values, a user can annotate CRISPRs in a query genome by supplying only a genome sequence in FASTA format.
A simple and functional web server for selecting rational CRISPR/Cas targets from an input sequence. The CRISPR/Cas system is a promising technique for genome engineering which allows target-specific cleavage of genomic DNA guided by Cas9 nuclease in complex with a guide RNA (gRNA), that complementarily binds to a ~20 nt targeted sequence. The target sequence requirements are twofold. First, the 5'-NGG protospacer adjacent motif (PAM) sequence must be located adjacent to the target sequence. Second, the target sequence should be specific within the entire genome in order to avoid off-target editing. CRISPRdirect enables users to easily select rational target sequences with minimized off-target sites by performing exhaustive searches against genomic sequences. The server currently incorporates the genomic sequences of human, mouse, rat, marmoset, pig, chicken, frog, zebrafish, Ciona, fruit fly, silkworm, C. elegans, Arabidopsis, rice, Sorghum, and budding yeast.
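The first requirement above, a ~20 nt target immediately followed by an NGG PAM, can be sketched as a simple scan of both strands. This is only an illustration and not CRISPRdirect's code; the function names are hypothetical, and the second requirement (genome-wide specificity) is not modelled because it needs a search against the whole genome.

```python
import re
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_cas9_targets(seq: str,
                      protospacer_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """List candidate targets followed directly by an NGG PAM.

    Returns (strand, start, protospacer, pam) tuples. Start positions on
    the '-' strand are given in reverse-complement coordinates.
    """
    seq = seq.upper()
    # The lookahead lets overlapping candidates be reported as well.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


for hit in find_cas9_targets("ACGTACGTACGTACGTACGTTGG"):
    print(hit)
```

Each candidate returned by such a scan would still have to be checked for off-target matches elsewhere in the genome before being considered a rational target.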
Resolves and localizes individual mutant alleles with respect to the endonuclease cut site. CrispRVariants quantifies and visualizes individual variant alleles from either traditional Sanger sequencing or high-throughput CRISPR-Cas9 mutagenesis sequencing experiments. CrispRVariants was designed with interactivity in mind, explicitly allowing users to detect problems and filter sequences appropriately before estimating mutation efficiency. This toolkit can be easily used to create a variant allele summary plot and accompanying table of counts. CrispRVariants enables immediate comparison of variant spectra between target locations.
A highly flexible, open source software package to identify gRNAs that target a given input sequence while minimizing off-target cleavage at other sites within any selected genome. CRISPRseek will identify potential gRNAs that target a sequence of interest for CRISPR-Cas9 systems from different bacterial species and generate a cleavage score for potential off-target sequences utilizing published or user-supplied weight matrices with position-specific mismatch penalty scores. Identified gRNAs may be further filtered to only include those that occur in paired orientations for increased specificity and/or those that overlap restriction enzyme sites.
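The idea of scoring an off-target site with position-specific mismatch penalties can be sketched as below. This is a simplified illustration, not the scoring formula used by CRISPRseek or any published weight matrix: the multiplicative rule and the weights (a lower penalty for PAM-distal positions, a higher one for PAM-proximal positions) are assumptions made for the example.

```python
from typing import Sequence


def offtarget_score(guide: str, site: str, weights: Sequence[float]) -> float:
    """Score a candidate off-target site against a guide.

    Starts at 100 (perfect match) and multiplies by (1 - weight) for
    every mismatched position, so mismatches at heavily penalised
    positions lower the score the most.
    """
    if not (len(guide) == len(site) == len(weights)):
        raise ValueError("guide, site and weights must have the same length")
    score = 100.0
    for g, s, w in zip(guide.upper(), site.upper(), weights):
        if g != s:
            score *= (1.0 - w)
    return score


# Hypothetical weights: positions 1-10 (PAM-distal) 0.25, positions 11-20 0.5.
example_weights = [0.25] * 10 + [0.5] * 10
guide = "A" * 20
print(offtarget_score(guide, "C" + "A" * 19, example_weights))  # distal mismatch
print(offtarget_score(guide, "A" * 19 + "C", example_weights))  # proximal mismatch
```

Supplying a different weight list is the analogue of using a user-supplied weight matrix; the ranking of sites changes with the penalties chosen.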
Permits users to efficiently predict the on-target activity of in silico-designed sgRNAs using a Support Vector Machine (SVM) model. This tool relies on two key factors that improve the in silico prediction of single-guide RNA (sgRNA) activity in the CRISPR/Cas9 system. First, all possible position-specific and position-independent single-, di-, tri- and tetra-nucleotide features are incorporated. Second, active sgRNAs are enriched in “A” but depleted in “T”.
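To make the two feature families concrete, the sketch below extracts position-specific and position-independent k-mer features (k = 1 to 4) from an sgRNA sequence. It is an assumption about how such features could be encoded, not the tool's actual feature set; the feature names are hypothetical, and the resulting dictionary would still need to be vectorised before training an SVM.

```python
from typing import Dict


def kmer_features(sgrna: str, k_max: int = 4) -> Dict[str, int]:
    """Build k-mer features for an sgRNA sequence.

    'pos<i>_<kmer>' is a position-specific indicator (1 if the k-mer
    starts at 1-based position i); 'count_<kmer>' is a position-
    independent count of that k-mer anywhere in the sequence.
    """
    seq = sgrna.upper()
    feats: Dict[str, int] = {}
    for k in range(1, k_max + 1):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            feats[f"pos{i + 1}_{kmer}"] = 1
            feats[f"count_{kmer}"] = feats.get(f"count_{kmer}", 0) + 1
    return feats


features = kmer_features("AAT", k_max=2)
print(sorted(features.items()))
```

Features absent from the dictionary are implicitly zero, which keeps the representation sparse even though the full space of tetra-nucleotide position features is large.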
Provides a generalized and simplified approach to generate null alleles in mouse using CRISPR/Cas9 by designing deletions that remove internal coding regions or critical exons. CRISPRtools is a platform that supports alternative sgRNA scoring options and can be easily extended to other organisms. It supports two general design strategies depending on the gene structure: (i) internal exon deletions and (ii) whole exon deletions.
Assists users in obtaining mechanistic insights into genetic dependencies. CRISPRO is a computational pipeline that was developed to elucidate functional residues and predict phenotypic outcome of genome editing. It uses CRISPR tiling screens, protein and nucleotide sequence level annotations, and 3D visualization of protein structure. It can also be used for calculation of functional scores per guide RNA by using next generation sequencing (NGS) data as input.
Serves for the processing of pooled genome-wide CRISPR-KO screen data. CRISPRcleanR allows users to identify biased genomic regions from CRISPR-KO screen datasets. It also corrects both read count and log fold change (logFC) of individual single guide RNAs (sgRNAs) in such regions. It can be used for reducing false positive calls while keeping the true positive rate of known essential genes largely unchanged. Moreover, it assists in detecting essential genes, even within focally amplified regions.
A web-based and command-line tool that enables accurate identification of CRISPR arrays in genomes, their direction, repeat-spacer boundaries, and substitutions, insertions or deletions in repeats and spacers, and that lists cas genes annotated in the genome. These data are combined into a searchable database, CRISPRBank, currently version 1.0. Spacer outputs from CRISPRDetect can then be directly used to search for targets in viral and other sequence databases using the linked tool, CRISPRTarget. CRISPRDetect enables more accurate detection of arrays and spacers and its gff output is suitable for inclusion in genome annotation pipelines and visualisation. It has been used to analyse all complete bacterial and archaeal reference genomes.
Utilizes the CRISPRFinder program to identify putative CRISPRs and additional tests to further screen for the smallest CRISPRs in a polyphasic approach. The CRISPRFinder program is designed to admit the largest possible number of candidate CRISPRs, especially the shortest ones, containing one or two spacers. The main idea of the program is to first find possible CRISPR localizations in a genomic sequence and then check whether these regions contain a cluster that possesses the characteristics of an "obvious" CRISPR, i.e. containing at least three repeats.
A database of CRISPR/Cas9 target sequences that have been experimentally validated in zebrafish. CRISPRz can be searched using multiple inputs such as ZFIN IDs, accession number, UniGene ID, or gene symbols from zebrafish, human and mouse. CRISPRz was developed in an effort to provide a comprehensive list of validated CRISPR targets from published sources as well as from an ongoing genome-wide knockout project in the zebrafish genome. Data will be added as more validated CRISPR targets are published or contributed from unpublished, in-house projects. The database is also open for data submission from the research community.
Allows users to search over 100,000 genomes and 9,000 species in order to design a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) guide for gene knockout. CRISPR Knockout Guide Designer lets users visualize recommended knockout guides with fewer off-target effects. It is also possible to validate guides created with other tools or to consult the location of a sequence within the gene.
Offers Pooled In vitro CRISPR Knockout Library Essentiality Screens. PICKLES allows exploration of gene essentiality profiles of users’ favourite genes across a large set of CRISPR knockout and shRNA knockdown fitness screens, mostly in cancer cell lines. It can display how gene-specific essentiality varies across tissue types and, in many cases, the relationship with gene expression levels in the same cells.
Uses methods to compute, visualize and select optimal CRISPR sites in a genome browser environment. The WGE database currently stores single and paired CRISPR sites and pre-calculated off-target information for CRISPRs located in the mouse and human exomes. Scoring and display of off-target sites is simple, and intuitive, and filters can be applied to identify high-quality CRISPR sites rapidly. WGE also provides a tool for the design and display of gene targeting vectors in the same genome browser, along with gene models, protein translation and variation tracks.
Provides a universal CRISPR annotation system. grID is an extensive compilation of gRNA properties including sequence and variations, thermodynamic parameters, off-target analyses, and alternative PAM sites, among others. The database is designed to keep up with the rapidly evolving CRISPR technology. Users can search in the database by NCBI reference sequence ID, Gene Symbol or any valid 23-bp targeting sequence in the form N20NGG.
A database for high-throughput CRISPR/Cas9 screening experiments. GenomeCRISPR contains data on the performance of more than 550 000 single guide RNAs (sgRNAs) which were used in >80 different experiments performed in 48 different human cell lines. It provides several data mining options and tools allowing users to easily investigate and compare the results of different screens. An API can be used for automated data access.
Allows rapid identification of sgRNA target sequences in the Chinese hamster ovary (CHO-K1) genome. The CRISPy tool identified 1,970,449 CRISPR targets divided into 27,553 genes and lists the number of off-target sites in the genome. The proven functionality of Cas9 to edit CHO genomes, combined with the CRISPy database, has the potential to accelerate genome editing and synthetic biology efforts in CHO cells.
A database for analysis and annotation of genome and metagenome datasets in a comprehensive comparative context. IMG/M includes archaea, bacteria, eukarya, plasmids, viruses, genome fragments (partially sequenced genomes), as well as metagenomes and metatranscriptome datasets. IMG performs feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements.
Provides a platform to create guide RNAs (gRNAs). Cpf1-Database is a web application that allows users to select the targeted genes and select the optimized gRNAs through a graphic interface with flexible filtering parameters. Users can perform its editing from a repository of determined targets of Cpf1 endonucleases comprising 5’-TTTN-3’ PAM sequences in all coding sequence (CDS) regions in the complete genome of 12 organisms.
Provides access to data about anti-CRISPR proteins. Anti-CRISPRdb is an online resource that contains more than 400 anti-CRISPR proteins tested by experimental and bioinformatics methods. The database allows users to browse, search, blast, screen, and download data on their anti-CRISPR proteins of interest, as well as sharing data on validated/potential anti-CRISPR proteins with other related scientific communities.
Facilitates the use of the CRISPR/Cas9 system as a genome editing tool for functional studies and molecular breeding of grapes. Among other functions, the Grape-CRISPR database allows users to identify and select multi-protospacers for editing similar sequences in grape genomes simultaneously. The database contains two main sections: Search and Design. In the Search section, users can identify appropriate protospacer and protospacer-adjacent motif (PAM) sites of a gene by providing certain inquiry information such as locus location, gene ID or Pfam ID. The Design section is for protospacer design. Users can detect and design protospacers and PAMs in the sequences of interest by using the Perl scripts provided.
Focuses on integrating experimentally and computationally identified super-enhancers and annotating their potential roles in the regulation of cell identity gene expression in a cell type-specific manner. The current release of SEA incorporates 83 996 super-enhancers computationally or experimentally identified in 134 cell types/tissues/diseases, including human (75 439, three of which were experimentally identified), mouse (5879, five of which were experimentally identified), Drosophila melanogaster (1774) and Caenorhabditis elegans (904). To facilitate data extraction, SEA supports multiple search options, including species, genome location, gene name, cell type/tissue and super-enhancer name. The response provides detailed (epi)genetic information, incorporating cell type specificity, nearby genes, transcriptional factor binding sites, CRISPR/Cas9 target sites, evolutionary conservation, SNPs, H3K27ac, DNA methylation, gene expression and TF ChIP-seq data. Moreover, analytical tools and a genome browser were developed for users to explore super-enhancers and their roles in defining cell identity and disease processes in depth.
Gene fusion detection in Plants
Fusion transcripts (i.e., chimeric RNAs) resulting from gene fusions are well known in humans, but in plants this phenomenon has not yet been explored. We are planning to discover fusion transcripts/gene fusions in different types of plants by using RNA-Seq datasets. Further, we are planning to understand the mechanism of gene fusion formation and the significance of fusions in plants.
Whole genome and transcriptome sequencing data analysis of Plants
In this era of Next Generation Sequencing (NGS), a huge amount of sequencing data is available in the public domain. Extracting novel findings from these datasets is a major challenge for a computational biologist. We are interested in analysing whole genome and transcriptome sequencing data of different plants to extract useful information from those datasets with the help of bioinformatics tools. Currently, we are planning to study the gene clusters of secondary metabolite pathways in different plants.
Development of webservers, databases and computational pipelines for plant research
Databases are necessary to compile information and share it with the scientific community. We are dedicated to developing useful databases and webservers for plant research.
Another area of interest is the development of automated pipelines and tools for the analysis of high-throughput genomics data generated by NGS technologies.
Professional & Academic Background
Staff Scientist II (May 2017-present): National Institute of Plant Genome Research (NIPGR), New Delhi, India
Postdoctoral Research Associate (2015-2017): University of Virginia, Charlottesville, VA, USA
Research Scientist (2014-2015): Sir Ganga Ram Hospital, New Delhi, India
PhD Bioinformatics (2009-2014): Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh under Jawaharlal Nehru University (JNU), New Delhi, India
M.Sc. Life Sciences (2007-2009): Jawaharlal Nehru University (JNU), New Delhi, India
B.Sc. Biotechnology (2004-2007): Jamia Millia Islamia (JMI), New Delhi, India
Awards and Fellowships
Junior and Senior Research Fellowship (2009-2014): Council of Scientific and Industrial Research (CSIR), New Delhi, India
GATE (Graduate Aptitude Test in Engineering): Qualified in years 2008 and 2009
Scientific Contributions/ Recognitions
Associate editor: Journal of Translational Medicine.
Editorial Board Member of Journal: Theoretical Biology and Medical Modelling.
Reviewer: PLoS ONE, BMC Genomics, BMC Bioinformatics, BMC Biology, BMC Biotechnology, Frontiers in Physiology and several other journals.
Web Resources/ Databases (Developed/ Contributed)
A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer (http://www.imtech.res.in/raghava/cancertope/)
GenomeABC: A webserver for benchmarking of genome assemblers. (http://crdd.osdd.net/raghava/genomeabc/).
Genomics web portal page. (http://crdd.osdd.net/raghava/genomesrs/).
Map/Alignment module of CancerDr: Cancer Drug Resistance Database. (http://crdd.osdd.net/raghava/cancerdr/).
Short reads and contigs alignment module of PCMDB: Pancreatic cancer methylation database. (http://crdd.osdd.net/raghava/pcmdb/).
Burkholderia sp. SJ98 database. (http://crdd.osdd.net/raghava/genomesrs/burkholderia/).
Rhodococcus imtechensis RKJ300 database. (http://crdd.osdd.net/raghava/genomesrs/rkj300/).
Genotrick: A pipeline for whole genome assembly and annotation of Genomes (http://crdd.osdd.net/raghava/genomesrs/genotrick/)
Development of Debian packages in OSDDlinux: A Customized Operating System for Drug Discovery. (http://osddlinux.osdd.net/).
A Web-Based Platform for Designing Vaccines against Existing and Emerging Strains of Mycobacterium tuberculosis. (http://crdd.osdd.net/raghava/mtbveb/).