Computational protocol: Metagenomic Analysis of Bacterial Communities of Antarctic Surface Snow

Protocol publication

[…] Reads produced by sequencing of 16S rRNA amplicons were subjected to basic trimming (Schloss et al., ). First, sequences were demultiplexed and trimmed by quality (Phred score ≥ 20, no ambiguous bases allowed) using CLC Genomics Workbench 7.0 software (CLC Bio, Aarhus, Denmark), and sequences longer than 100 bp were retained for further processing. Homopolymers longer than 8 nt were removed using the HomoPolymerTrimming.pl Perl script of the NGS QC Toolkit (Patel and Jain, ), and chimeric sequences were removed using the Ribosomal Database Project (RDP) chimera check pipeline (Edgar et al., ). Phylotyping and statistical analysis were performed using the RDP classifier via the taxonomy-supervised method with an 80% confidence threshold (Cole et al., ), as this approach allows rapid and extensive community comparison (Sul et al., ).

Raw reads from shotgun metagenomic sequencing were trimmed by quality (Phred score ≥ 20, no ambiguous bases allowed). Adapters were trimmed using CLC Genomics Workbench software (CLC Bio, Aarhus, Denmark); reads longer than 50 bp were subjected to further analysis. Trimmed sequences were submitted to MG-RAST (Meyer et al., ). Reads were taxonomically and functionally annotated by similarity searching against the M5NR and Subsystems databases, respectively, with default parameters (maximum e-value cutoff of 10⁻⁵, minimum identity cutoff of 60%, and minimum alignment length cutoff of 15).

To search specifically for viral sequences in the metagenomic libraries, sequences were submitted to the Metavir online tool (Roux et al., ), where they were compared by BLAST against the Viral RefSeq database (NCBI). The resulting affiliated sequences were filtered to remove bacterial homologs using a supplementary pipeline: first, they were searched against the nucleotide (nt) database using the standalone blastn application, and then viral sequences were extracted using MEGAN 5.10.1 software (Huson et al., ). [...]
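The 16S read filtering criteria described above (Phred score ≥ 20, no ambiguous bases, length above 100 bp, no homopolymer longer than 8 nt) can be illustrated with a minimal Python sketch. It is not the CLC Genomics Workbench or NGS QC Toolkit implementation: the 3'-end trimming rule, the choice to discard (rather than trim) reads containing long homopolymers, and the names `trim_3prime` and `filter_read` are all assumptions made for illustration.

```python
import re

# A run of 9 or more identical bases, i.e. a homopolymer longer than 8 nt.
LONG_HOMOPOLYMER = re.compile(r"(.)\1{8,}")


def trim_3prime(seq, quals, min_phred=20):
    """Drop trailing bases whose Phred score is below min_phred.

    Assumption: a simple 3'-end trim; the commercial tool uses its own rule.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_phred:
        end -= 1
    return seq[:end], quals[:end]


def filter_read(seq, quals, min_len=100, min_phred=20):
    """Return the trimmed read if it passes all filters, otherwise None."""
    seq, quals = trim_3prime(seq.upper(), quals, min_phred)
    if len(seq) <= min_len:          # keep only reads longer than min_len
        return None
    if "N" in seq:                   # no ambiguous bases allowed
        return None
    if LONG_HOMOPOLYMER.search(seq):  # assumption: discard rather than trim
        return None
    return seq


# Hypothetical 120 bp read whose last 10 bases are low quality.
read = "ACGT" * 30
scores = [30] * 110 + [10] * 10
kept = filter_read(read, scores)
print(len(kept) if kept else "discarded")
```

For the shotgun reads, the same sketch would apply with `min_len=50` and without the homopolymer check, which the text describes only for the amplicon data.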
To construct a set of CRISPR arrays for each metagenomic dataset, we used the CRASS algorithm (Skennerton et al., ) with default parameters: repeat lengths of 23–47 bp, spacer lengths of 26–50 bp, and a minimum of three spacers per array. Spacer and repeat sequences were compared with the nucleotide (nt) database using the BLAST+ tool installed on the Galaxy platform with default parameters for short input sequences (word size 7, gapopen 5, gapextend 2, reward 2, penalty -3, e-value 0.01). Repeat sequences from identified CRISPR arrays were classified using the CRISPRmap tool (Lange et al., ). The search for cas genes was performed using the MG-RAST Subsystems annotation tool (Meyer et al., ).

To amplify CRISPR arrays of Flavobacterium psychrophilum from total DNA samples, primers Flavo_F (CAAAATTGTATTTTAGCTTATAATTACCAAC) and Flavo_R (TACAATTTTGAAAGCAATTCACAAC) were used. Amplification reactions were carried out with Taq DNA polymerase under the following conditions: initial denaturation for 5 min at 95°C, followed by 28 cycles of 30 s at 95°C, 30 s at 55°C, and 40 s at 72°C, and a final extension at 72°C for an additional 2 min. Amplicons were visualized on 1% agarose gels stained with ethidium bromide, and DNA fragments of 200–1000 bp in length were purified from the gel and sequenced on the Illumina MiSeq platform as described above. Raw reads were demultiplexed and trimmed by quality (Phred score ≥ 20, no ambiguous bases allowed) using CLC Genomics Workbench 7.0 software (CLC Bio, Aarhus, Denmark).

Spacers from amplified CRISPR arrays were extracted bioinformatically using the DNAStringSet function of the IRanges package in R. To reduce the number of spacers and to avoid overestimating diversity because of sequencing errors, spacers were clustered using a k-means algorithm (MacQueen, ). The maximum number of substitutions allowed between spacers considered biologically similar within one cluster was 5.
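The effect of the 5-substitution threshold used in the spacer clustering step can be shown with a small Python sketch. The analysis described above used a k-means algorithm in R. The sketch below deliberately swaps in a simpler greedy grouping by Hamming distance, so it shows only how the threshold merges near-identical spacers and does not reproduce the original clustering. The function names and the example spacers are hypothetical.

```python
def hamming(a, b):
    """Number of substituted positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_spacers(spacers, max_subs=5):
    """Greedy grouping: a spacer joins the first cluster whose first member
    has the same length and differs by at most max_subs substitutions;
    otherwise it starts a new cluster.

    Assumption: this stands in for k-means and is order-dependent.
    """
    clusters = []
    for spacer in spacers:
        for members in clusters:
            rep = members[0]
            if len(rep) == len(spacer) and hamming(rep, spacer) <= max_subs:
                members.append(spacer)
                break
        else:
            clusters.append([spacer])
    return clusters


# Hypothetical spacers: the second differs from the first at two positions,
# so the two are grouped; the third is unrelated and forms its own cluster.
example = [
    "ACGT" * 7,
    "TCGT" + "ACGT" * 5 + "ACGA",
    "TTTT" * 7,
]
for members in cluster_spacers(example):
    print(len(members), members[0])
```

Because the grouping is greedy, the result can depend on input order. A centroid-based method such as k-means avoids that at the cost of having to choose the number of clusters.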
Coverage and diversity estimates S_chao and S_ace for the total number of spacers or clusters in each sample were calculated with the estimateD function of the vegan package in R. Centers of spacer clusters (sequences of the arithmetic mean value for each nucleotide position, calculated from all spacers within a cluster) were compared against the nucleotide collection (nt) and environmental collection (env_nt) databases, as well as against a custom-made database containing sequences from the Antarctic shotgun metagenomic libraries of the present work, with the BLASTn algorithm using the default parameters for short input sequences mentioned above and an e-value cutoff of 0.01. Sequences with < 5 mismatches were considered positive hits. Metagenomic sequences containing protospacers were searched against the nt and nr databases with default parameters for the BLASTn algorithm and an e-value cutoff of 0.001 using the BLAST+ tool installed on the Galaxy platform. PAM searches were performed with the CRISPRTarget online tool (Biswas et al., ). Eight nucleotides upstream and downstream of each protospacer were extracted and used for PAM logo search with the WebLogo online tool (http://weblogo.berkeley.edu/logo.cgi). […]
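As an illustration of the Chao richness estimate mentioned above, the following Python sketch computes the bias-corrected Chao1 estimator from per-cluster spacer counts: the observed number of clusters plus f1(f1 - 1) / (2(f2 + 1)), where f1 and f2 are the numbers of clusters seen exactly once and exactly twice. That the original analysis used this exact form is an assumption, and the function name and the example counts are hypothetical.

```python
from collections import Counter


def chao1(abundances):
    """Bias-corrected Chao1 richness estimate from a list of counts."""
    counts = [n for n in abundances if n > 0]  # ignore unobserved clusters
    s_obs = len(counts)                        # observed richness
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]                  # singletons and doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical sample: six spacer clusters, three of them singletons.
print(chao1([1, 1, 1, 2, 2, 5]))  # 6 + 3*2 / (2*3) = 7.0
```

When a sample has many singletons relative to doubletons, the estimate rises well above the observed count, which signals that sequencing has not yet saturated spacer diversity.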

Pipeline specifications