Computational protocol: Function of the CRISPR-Cas System of the Human Pathogen Clostridium difficile


Protocol publication

[…] CRISPRdb tools (), the CRISPRTarget program (), or BLASTN () were used for spacer homology searches in the available sequences (as of April 2015). Several potential CRISPR arrays predicted by CRISPRdb within the tcdA coding region were excluded from further analysis for the following reasons: (i) in all analyzed C. difficile strains, they are located within the toxin-encoding genes, as part of the cell wall-binding repeat regions of the TcdA amino acid sequence; (ii) CRISPRdb classified the corresponding arrays as "questionable sequences" for several of the analyzed C. difficile strains; (iii) no characteristic CRISPR RNA-seq profile was observed within the tcdA coding region; (iv) the corresponding spacers had no potential targets among the known sequences; and (v) the associated direct repeats differed in length and sequence from those of active CRISPR arrays. CRISPRdb predictions within repeated coding regions should therefore be treated with caution and require experimental confirmation.

For the general CRISPR spacer homology search, sequences with ≤7 single nucleotide polymorphisms (SNPs) (80% match, or ≥30/37 nucleotides) were considered positive hits.

The raw sequencing read data of published genome sequences from 2,207 C. difficile strains () were used to search for cas loci homologous to the CD2982-CD2975 and CD2455-CD2451 cas operons of strain 630 and the CDR20291_2998-2994 operon of strain R20291. For each strain, the sequencing reads were mapped onto the sequence of the corresponding cas locus using Bowtie (). Coverage values of ≥80% were considered positive hits for the presence of the corresponding cas locus in a given strain. Multilocus sequence types, according to the MLST scheme of Lemee et al. (), were also inferred from the raw sequencing read data. [...]
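The two hit thresholds above (≤7 SNPs for a spacer match, ≥80% coverage for a cas locus) can be illustrated with a minimal Python sketch. It assumes an ungapped, equal-length alignment between a spacer and its candidate target (the published analysis relied on CRISPRdb, CRISPRTarget, or BLASTN, which produce their own alignments), and it interprets "coverage" as breadth of coverage, i.e., the fraction of locus positions covered by at least one mapped read. The function names and the per-position depth input (which could, for example, be derived from the Bowtie alignment with `samtools depth -a`) are hypothetical and not part of the original protocol.

```python
"""Minimal sketch of the spacer-hit and cas-locus coverage thresholds.

Assumptions (not from the original protocol): ungapped equal-length
alignments, and "coverage" meaning breadth of coverage over the locus.
All function names are hypothetical.
"""

MAX_SNPS = 7          # <=7 mismatches (>=30/37 nt) counts as a spacer hit
MIN_COVERAGE = 0.80   # >=80% of the cas locus covered counts as present


def count_mismatches(spacer: str, target: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(spacer) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(spacer.upper(), target.upper()) if a != b)


def is_spacer_hit(spacer: str, target: str, max_snps: int = MAX_SNPS) -> bool:
    """Return True if the aligned target differs from the spacer at <= max_snps positions."""
    return count_mismatches(spacer, target) <= max_snps


def breadth_of_coverage(depths: list[int]) -> float:
    """Fraction of locus positions with read depth >= 1 (0.0 for an empty locus)."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d > 0) / len(depths)


def has_cas_locus(depths: list[int], min_cov: float = MIN_COVERAGE) -> bool:
    """Return True if reads cover at least min_cov of the cas locus positions."""
    return breadth_of_coverage(depths) >= min_cov


if __name__ == "__main__":
    spacer = "A" * 37
    print(is_spacer_hit(spacer, "C" * 7 + "A" * 30))  # True: 7 mismatches
    print(is_spacer_hit(spacer, "C" * 8 + "A" * 29))  # False: 8 mismatches
    print(has_cas_locus([1] * 8 + [0] * 2))           # True: 80% covered
    print(has_cas_locus([1] * 7 + [0] * 3))           # False: 70% covered
```

For a 37-nt spacer, the ≤7-mismatch rule is equivalent to requiring ≥30 matching nucleotides (30/37 ≈ 81%), which is why the criterion can be stated either as a SNP count or as a match fraction.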
The complete genome sequences of phiCD24-1, phiCD111, phiCD146, phiCD211, phiCD481-1, phiCD505, phiCD506, phiMMP01, phiMMP03, and phiCD52 were deposited in the EMBL-EBI database under accession nos. LN681534, LN681535, LN681536, LN681537, LN681538, LN681539, LN681540, LN681541, LN681542, and PRJEB7856, respectively. RNA-seq coverage visualizations of the CRISPR loci are available for strain 630 through and for strain R20291 through (). […]

Pipeline specifications

Software tools: CRISPRTarget, BLASTN, Bowtie, COV2HTML
Databases: CRISPRdb
Applications: Genome annotation, RNA-seq analysis
Organisms: Escherichia coli, Clostridioides difficile, Homo sapiens