Protocol publication

[…] The completely sequenced genome of Pseudomonas syringae pv. actinidiae NZ13 was used as a reference for variant calling. A near-complete version of this genome was used as a reference in our previous publication and was subsequently finished in later work, where it is referred to as ICMP 18884. Variant calling was performed on all P. syringae pv. actinidiae isolates for which read data were available.

Read data were corrected using the SPAdes correction module, and Illumina adapter sequences were removed with Trimmomatic, allowing two seed mismatches, with palindrome and simple clip thresholds of 30 and 10, respectively. Quality-based trimming was also performed using a sliding window approach, clipping the first ten bases of each read as well as leading and trailing bases with quality scores under 20, and filtering out all reads shorter than 50 bases. PhiX and other common sequence contaminants were filtered using the UniVec database.

Reads were mapped to the complete reference genome Psa NZ13 with Bowtie2 (default settings for paired-end read mapping) and duplicates were removed with SAMtools. FreeBayes was used to call variants, retaining variants if they had a minimum alternate allele count of ten reads and at least 95% of reads supporting the alternate call (--ploidy 1 --min-alternate-fraction 0.95 --min-alternate-count 10 --report-monomorphic). The average coverage was calculated with SAMtools and used as a guide to exclude overrepresented SNPs (defined here as threefold higher coverage than the average), which may be caused by mapping to repetitive regions. BCFtools filtering and masking were used to generate final reference alignments including SNPs falling within the quality and coverage thresholds described above and excluding SNPs within 3 bp of an insertion or deletion (indel) event or indels separated by 2 or fewer base pairs.
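The SNP retention rules described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which relied on FreeBayes and BCFtools; the `Snp` record, the `keep_snp` function and its parameter names are hypothetical, indels are reduced to single positions, and "threefold higher coverage" is assumed to mean a depth strictly greater than three times the mean.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """Hypothetical SNP record; not a FreeBayes or BCFtools data structure."""
    pos: int        # 1-based reference position
    depth: int      # total reads covering the site
    alt_count: int  # reads supporting the alternate allele


def keep_snp(snp, mean_depth, indel_positions,
             min_alt_count=10, min_alt_fraction=0.95,
             max_depth_fold=3.0, indel_window=3):
    """Return True if the SNP passes all thresholds named in the text."""
    # At least ten reads must support the alternate allele.
    if snp.alt_count < min_alt_count:
        return False
    # At least 95% of the reads at the site must support the alternate allele.
    if snp.depth == 0 or snp.alt_count / snp.depth < min_alt_fraction:
        return False
    # Assumption: "overrepresented" means depth above 3x the mean coverage.
    if snp.depth > max_depth_fold * mean_depth:
        return False
    # Assumption: each indel is represented by one position; SNPs within
    # 3 bp of any such position are excluded.
    if any(abs(snp.pos - p) <= indel_window for p in indel_positions):
        return False
    return True


# Toy usage with made-up numbers.
candidates = [Snp(100, 40, 39), Snp(250, 40, 37), Snp(400, 200, 200)]
kept = [s.pos for s in candidates if keep_snp(s, mean_depth=40, indel_positions=[252])]
```

In this toy input only the SNP at position 100 survives: the second fails the alternate-fraction test and the third exceeds the coverage ceiling.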
Invariant sites with a minimum coverage of ten reads were also retained in the alignment; areas of low (fewer than ten reads) or no coverage are represented as gaps relative to the reference.

FreeBayes variant calling includes indels and multiple nucleotide insertions as well as single nucleotide insertions; however, only SNPs were retained for downstream phylogenetic analyses. ClonalFrameML, an implementation of ClonalFrame suitable for use with whole genomes, was employed to identify recombinant regions using default settings and a maximum likelihood starting tree generated by RAxML. All substitutions occurring within regions identified by ClonalFrameML as having been introduced by recombination were removed from the alignments. The reference alignments were manually curated to exclude substitutions in positions mapping to mobile elements such as plasmids, integrative and conjugative elements, and transposons. [...]

The maximum likelihood phylogenetic tree of 80 Psa strains, comprising new Chinese isolates and strains reflecting the diversity of all known clades, was built with RAxML (version 7.2.8) using a 1,062,844 bp core genome alignment excluding all positions for which one or more genomes lacked coverage of ten reads or higher. Removal of 3,122 recombinant positions produced a 1,059,722 bp core genome alignment including 2,953 variant sites. A starting tree was used to compute 100 bootstrap replicates (-m GTRGAMMA -p $RANDOM -b $RANDOM -# 100), in turn used to draw bipartitions on the best scoring tree (-m GTRCAT -p $RANDOM -f b). Phylogenies were visualized with FigTree (v.1.4.2, http://tree.bio.ed.ac.uk/software/figtree/). Membership within each phylogenetic clade corresponds to a minimum average nucleotide identity of 99.70%. The average nucleotide identity was determined using a BLAST-based approach in JSpeciesWS (ANIb), using a subset of 32 Psa genome assemblies spanning all clades.
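The core genome alignment described above keeps only positions covered in every genome, with low-coverage positions represented as gaps. A minimal sketch of that column filter follows, assuming the alignment is held as a dictionary of equal-length strings with `-` marking gaps; the function names and the toy sequences are illustrative and not taken from the original pipeline.

```python
def core_columns(alignment):
    """Return indices of columns in which no sequence has a gap ('-')."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [i for i in range(length) if all(s[i] != "-" for s in seqs)]


def variant_columns(alignment, columns):
    """Return the subset of columns in which more than one base is observed."""
    seqs = list(alignment.values())
    return [i for i in columns if len({s[i] for s in seqs}) > 1]


# Toy alignment (made-up sequences): column 6 has a gap, column 1 is variable.
toy = {
    "ref": "ACGTACGT",
    "a":   "ACGTAC-T",
    "b":   "ATGTACGT",
}
core = core_columns(toy)
variable = variant_columns(toy, core)

# The alignment lengths reported in the text are consistent with each other:
# removing 3,122 recombinant positions from 1,062,844 bp leaves 1,059,722 bp.
remaining = 1_062_844 - 3_122
```

On real data the same logic would run over millions of columns, where a vectorised or streaming implementation would be preferable to these list comprehensions.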
In order to fully resolve the relationships between the more closely related recent outbreak strains, a phylogeny was constructed using only the 62 Psa-3 strains. This was determined using a 4,853,155 bp core genome alignment (excluding 258 recombinant SNPs), comprising 1,948 nonrecombinant SNPs and invariant sites. Trees were built with the generalized time-reversible model and gamma distribution of site-specific rate variation (GTR + Γ) and 100 bootstrap replicates. Psa C16 was used to root the tree, as this was shown to be the most divergent member of Psa-3 when including strains from multiple clades. Nodes shown have minimum bootstrap support values of 50. [...]

Genomes were assembled with SPAdes (v.3.6.2) using the filtered, trimmed and corrected reads and the --careful flag to reduce mismatches and indels. Subsequent to quality improvement with Pilon, assemblies were annotated using Prokka. The pangenome of Psa-3 was calculated using the Roary pipeline (v.3.6.1). The assembly quality (contig number, N50, coverage) of two assemblies (NZ60 and J39) was judged too weak for inclusion in final calculations of the pangenome. Thus orthologs present in 99% (59 out of a total of 60) of genomes were considered core in order to account for assembly errors; presence in 57–59, 9–57, and 1–8 genomes was considered soft-core, shell and cloud, respectively. BLASTn searches (E value cutoff 10^-5) were used to confirm the identity of predicted virulence or pandemic-clade-restricted genes in genome assemblies. Integrative and conjugative elements (ICEs) were identified using the Psa NZ13 ICE as a query for BLASTn (E value cutoff 10^-5). In cases where hits spanned multiple contigs, sequences were concatenated according to their position relative to the query. Identical ICEs share the same color in the corresponding figure.

Fig. 1. […]
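The ICE reconstruction step, in which hits spanning several contigs are concatenated according to their position relative to the query, could be sketched roughly as below. The `Hit` tuple and `concatenate_hits` function are hypothetical; the sketch assumes each hit sequence has already been extracted and oriented to match the query strand, and it makes no attempt to resolve overlapping hits.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical summary of one BLASTn hit against the ICE query."""
    contig: str       # contig on which the hit was found
    query_start: int  # start coordinate on the query
    query_end: int    # end coordinate on the query
    sequence: str     # hit sequence, assumed already in query orientation


def concatenate_hits(hits):
    """Join hit sequences in order of their leftmost query coordinate."""
    ordered = sorted(hits, key=lambda h: min(h.query_start, h.query_end))
    return "".join(h.sequence for h in ordered)


# Toy usage: two contigs covering adjacent parts of the query (made-up data).
hits = [
    Hit("contig_7", 501, 900, "GGGTTT"),
    Hit("contig_2", 1, 500, "AAACCC"),
]
ice = concatenate_hits(hits)
```

Here the hit from `contig_2` is placed first because it covers the start of the query, regardless of the order in which the hits were reported.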

Pipeline specifications

Software tools: SPAdes, Trimmomatic, Bowtie2, SAMtools, FreeBayes, bcftools, ClonalFrame, RAxML, FigTree, JSpeciesWS, Pilon, Prokka, Roary, BLASTN
Databases: UniVec
Applications: Genome annotation, Phylogenetics, Nucleotide sequence alignment
Organisms: Pseudomonas syringae, Actinidia deliciosa