Protocol publication

[…] Sequence reads obtained from the MiSeq platform were checked for poorly sequenced regions, and Illumina adapter sequences were trimmed using cutadapt (Martin, ), as implemented in Trim Galore! 0.4.0 (with default parameters). After trimming, filtered reads were assembled into contigs using the de novo assembly program SPAdes version 3.6.0 (Nurk et al., ), employing built-in error correction and default parameters. To determine average sequencing coverage, reads were mapped back to the assembled genome using BWA 0.7.12-r1039 (Li and Durbin, ), and depth of sequencing for each contig was plotted using BEDTools (Quinlan and Hall, ). Genome annotations and gene predictions for contigs larger than 200 bp were performed with Prokka 1.11 (Seemann, ), using the provided Staphylococcus database. Quality of the assembled genomes and assembly metrics were determined using QUAST (Gurevich et al., ). The entire genome assembly process was automated using the Snakemake workflow engine (Köster and Rahmann, ). The data were submitted to NCBI under BioProject ID PRJNA342349.

[...] Phylogenetic analyses were performed on 24 highly conserved housekeeping proteins (Supplementary Table ). Sequences of each of these 24 proteins for all NAS species were retrieved from the genomes using BLAST+ 2.2.31 (Camacho et al., ). The sequence for Macrococcus caseolyticus, an outgroup species, was downloaded from NCBI's GenBank database. Multiple sequence alignments for these proteins were created using MUSCLE v3.8.31 (Edgar, ). The resulting alignments were used for phylogenetic analysis. Phylogenetic trees, based on 100 bootstrap replicates, were constructed by employing Maximum-Likelihood (ML), Maximum-Parsimony (MP), and Neighbor-Joining (NJ) methods using MEGA 6.0 (Tamura et al., ). Evolutionary distances for ML and NJ methods were computed using a JTT matrix-based model (Jones et al., ).
Maximum-Parsimony trees were obtained using the Subtree-Pruning-Regrafting (SPR) algorithm (Nei and Kumar, ).

[...] Phylogenetic trees were constructed based on full-length sequences of the 16S rRNA, hsp60, rpoB, sodA, and tuf genes. Full-length sequences of these genes for NAS species were obtained using BLAST+ 2.2.31 (Camacho et al., ). Multiple sequence alignments for each gene were created using MUSCLE v3.8.31 (Edgar, ). Maximum-Likelihood, MP, and NJ trees based on these sequence alignments were created using 100 bootstrap replicates in MEGA 6.0 (Tamura et al., ). Multilocus sequence analysis was performed on nucleotide sequences of the 16S rRNA, hsp60, rpoB, sodA, and tuf genes. Individual gene alignments were manually concatenated to create a combined dataset of these five genes. Poorly aligned regions of this concatenated alignment were removed using Gblocks 0.92 (Castresana, ) with default settings, except for the allowed gap position parameter, which was changed to 0.5 (50%). Maximum-Likelihood, MP, and NJ trees based on 100 bootstrap replicates of this dataset were constructed using MEGA 6.0 (Tamura et al., ). For all trees, evolutionary distances for ML, MP, and NJ methods were computed using the General Time Reversible model (Nei and Kumar, ), Subtree-Pruning-Regrafting (Nei and Kumar, ), and the Kimura 2-parameter model (Kimura, ), respectively. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated from the final alignments. All trees were rooted using Macrococcus caseolyticus, and gene sequences for this species were downloaded from the NCBI GenBank database.

[...] A genome-based phylogenetic tree of 441 NAS isolates was constructed using the published pipeline PhyloPhlAn (Segata et al., ). Briefly, the PhyloPhlAn approach is based on the use of 400 ubiquitous and phylogenetically informative proteins. Orthologs of these proteins in NAS genomes were detected using USEARCH v5.2.32 (Edgar, ).
Multiple sequence alignments of these proteins were generated using MUSCLE v3.8.31 (Edgar, ). A final concatenated dataset containing 4231 aligned amino acid positions was generated. Final tree construction was performed using FastTree version 2.1 (Price et al., ). To determine the effects of various evolutionary models on the resulting tree, Maximum-Likelihood, MP, and NJ trees based on 100 bootstrap replicates of the final alignment were also constructed using MEGA 6.0 (Tamura et al., ). For ML and NJ trees, evolutionary distances were computed using a JTT matrix-based model (Jones et al., ), whereas the Subtree-Pruning-Regrafting algorithm (Nei and Kumar, ) was used for MP tree construction.

[...] A phylogenetic tree of all NAS isolates, rooted using Macrococcus caseolyticus, was created based on the core genome of the bovine NAS group. The core set of NAS proteins was identified using the UCLUST algorithm (Edgar, ). Protein families with at least 30% sequence identity and 50% sequence length in 441 NAS isolates were considered core. However, protein families present in <80% of the input genomes were excluded from further analysis. Protein families that contained potential paralogous sequences (duplicated sequences in the same genome) were also excluded from further analysis. Each protein family was individually aligned using MAFFT 7 (Katoh and Standley, ). Aligned amino acid positions that contained gaps in more than 50% of genomes were excluded from further analysis. The remaining amino acid positions were concatenated to create a combined dataset consisting of 128,080 aligned amino acids. A maximum-likelihood tree based on this alignment was constructed using FastTree 2.1 (Price et al., ), using the Whelan and Goldman substitution model (Whelan and Goldman, ) and JTT matrix-based model (Jones et al., ).

[...] Genome-based phylogenetic trees (PhyloPhlAn, ML, MP, NJ) and the Core-Genome-Tree (CGT) were visually inspected and compared.
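The core-genome filtering rules described in the excerpt above (keep protein families found in at least 80% of genomes, discard families with paralogs, drop alignment columns with gaps in more than 50% of genomes) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the dictionary-based data layout, and the assumption that each alignment holds exactly one row per genome (with absent genomes padded by gaps) are all hypothetical choices made for illustration.

```python
from collections import Counter


def select_core_families(families, n_genomes, min_presence=0.80):
    """Keep families present in >= min_presence of genomes and free of paralogs.

    `families` maps a family name to a list of (genome_id, protein_id) pairs
    (a hypothetical layout, not the output format of any clustering tool).
    """
    core = {}
    for name, members in families.items():
        per_genome = Counter(genome for genome, _protein in members)
        has_paralogs = any(count > 1 for count in per_genome.values())
        presence = len(per_genome) / n_genomes
        if not has_paralogs and presence >= min_presence:
            core[name] = members
    return core


def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns whose gap fraction exceeds max_gap_fraction.

    `alignment` maps genome_id to an aligned sequence; all sequences are
    assumed to have the same length, one row per genome.
    """
    seqs = list(alignment.values())
    n_rows = len(seqs)
    keep = [
        i for i in range(len(seqs[0]))
        if sum(seq[i] == "-" for seq in seqs) / n_rows <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


families = {
    "famA": [("g1", "a1"), ("g2", "a2"), ("g3", "a3"), ("g4", "a4"), ("g5", "a5")],
    "famB": [("g1", "b1"), ("g1", "b2"), ("g2", "b3"), ("g3", "b4"), ("g4", "b5")],
    "famC": [("g1", "c1"), ("g2", "c2"), ("g3", "c3")],
}
core = select_core_families(families, n_genomes=5)

alignment = {"g1": "MK-A", "g2": "M--A", "g3": "MR-A", "g4": "M-TA"}
filtered = drop_gappy_columns(alignment)
```

In this toy input, `famB` is rejected because genome `g1` contributes two proteins, and `famC` is rejected because it covers only three of five genomes. The third alignment column is removed because three of four rows carry a gap, while the second column (exactly 50% gaps) is retained, matching the "more than 50%" wording.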
Single gene and protein trees were compared with each other and with the CGT. Topological differences among them were computed using the Robinson-Foulds (RF) distance matrix (Robinson and Foulds, ; Makarenkov and Leclerc, ), as implemented in the T-REX web server (Tree and reticulogram REConstruction; Boc et al., ). To facilitate comparisons of our results, RF distance scores were normalized by the maximum possible distance between two trees, calculated as 2(N-3), where N represents the total number of taxa in a tree (Kupczok et al., ; Bernard et al., ). We designated these as normalized-Robinson-Foulds (nRF) scores, with values ranging from 0 (0%) to 1 (100%), and nRF = 0 indicating that the topologies of the two trees under investigation are congruent. Consequently, a higher nRF score indicates a lower level of congruence (or a higher level of incongruence) between two tree topologies. […]
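To make the nRF normalization concrete, the following Python sketch computes the RF distance between two small trees and divides it by 2(N-3). It is an illustration only and not the T-REX implementation: trees are represented as nested tuples of leaf names (an assumed toy format), both trees are assumed to share the same taxa, and the RF distance is taken as the number of non-trivial bipartitions found in one tree but not the other.

```python
def leaf_set(node):
    """Return the frozenset of leaf names below a nested-tuple tree node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def bipartitions(tree):
    """Collect the non-trivial bipartitions (splits) induced by internal edges."""
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaf_set(node)
        rest = all_leaves - clade
        # A split is informative only if both sides hold at least two taxa.
        if len(clade) > 1 and len(rest) > 1:
            splits.add(frozenset([clade, rest]))
        for child in node:
            visit(child)

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: splits present in exactly one of the trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


def normalized_rf(tree_a, tree_b):
    """nRF = RF / (2 * (N - 3)), where N is the number of taxa."""
    taxa = leaf_set(tree_a)
    if taxa != leaf_set(tree_b):
        raise ValueError("both trees must contain the same taxa")
    n = len(taxa)
    if n < 4:
        raise ValueError("normalization requires at least four taxa")
    return rf_distance(tree_a, tree_b) / (2 * (n - 3))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = ((("A", "D"), "B"), ("C", "E"))

print(normalized_rf(t1, t1))  # identical topologies
print(normalized_rf(t1, t2))  # one of two splits shared
print(normalized_rf(t1, t3))  # no splits shared
```

With five taxa the maximum RF distance is 2(5-3) = 4, so the three comparisons give nRF values of 0.0, 0.5, and 1.0 respectively.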

Pipeline specifications