Computational protocol: The distribution, diversity, and importance of 16S rRNA gene introns in the order Thermoproteales

[…] Intron sequences were identified by querying (blastn/blastx) the NCBI nr and the DOE Joint Genome Institute (DOE-JGI) Integrated Microbial Genomes with Microbiome Samples (IMG/M) databases with previously identified 16S gene introns (Additional file : Table S1). Approximately 100 16S rRNA genes were identified that contained over 180 intron sequences (Additional file : Table S2), resulting in a total dataset of ~230 intron sequences (in ~115 16S rRNA genes) distributed at 13 different loci across the entire length of the 16S rRNA gene (Fig. , Table ). Homing endonucleases were identified and secondary structures were predicted using CLC Main Workbench (v6.9.1; Qiagen). Longer intron sequences (> ~100 nt) were translated to amino acid sequence (if possible) and searched against the Pfam database (v. 27.0; []) to identify LAGLIDADG motif(s) that are indicative of homing endonucleases []. Intron sequences were grouped into the following categories: (i) introns encoding a homing endonuclease, (ii) introns without an obvious open reading frame (remnant), (iii) introns forming small (< ~50 nt) predictable hairpin structures that maintain the bulge-helix-bulge motif, or (iv) partial, truncated, or uncharacterized intron sequences.

Sequences were aligned (manually and/or with ClustalW) before phylogenetic analysis and/or WebLogo 3 analysis []. Phylogenetic trees of 16S rRNA genes and intron sequences were constructed using neighbor-joining or maximum-likelihood methods (MEGA 5.2.2; []). Over 50 “universal” archaeal primers were manually aligned to 16S rRNA genes to identify those that were interrupted by, or that spanned, intron loci (Additional file : Table S3). […]
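The translation step for longer introns can be illustrated with a minimal sketch. It is not the procedure used in the study, which relied on CLC Main Workbench and Pfam searches. The sketch assumes that "translated (if possible)" can be approximated by taking the longest stop-free stretch across all six reading frames with the standard genetic code. The function names (`longest_orf`, `candidates_for_pfam`) and the `min_nt` parameter are hypothetical. The 100 nt cutoff mirrors the approximate threshold stated above.

```python
from itertools import product

# Standard genetic code, listed in NCBI "TCAG" codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq):
    """Longest stop-free peptide over all six reading frames (assumption:
    no start codon is required, since an intron ORF may be partial)."""
    seq = seq.upper().replace("U", "T")
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) > len(best):
                    best = segment
    return best


def candidates_for_pfam(introns, min_nt=100):
    """Map intron id -> longest peptide for introns longer than min_nt.
    `introns` is a dict of id -> nucleotide sequence (hypothetical input)."""
    return {
        name: longest_orf(seq)
        for name, seq in introns.items()
        if len(seq) > min_nt
    }
```

The resulting peptides would then be submitted to a Pfam search to look for LAGLIDADG domains. An intron that yields only short peptides is a candidate for the "remnant" category, although the length at which an open reading frame counts as "obvious" is a judgment call that the text does not specify.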

Pipeline specifications

Software tools BLASTN, BLASTX, CLC Main Workbench, Clustal W, WebLogo, MEGA
Databases Pfam, IMG/M
Applications Phylogenetics, Genome data visualization
Chemicals Sulfur