Computational protocol: Full genome SNP-based phylogenetic analysis reveals the origin and global spread of Brucella melitensis

[…] The Ion Torrent reads were assembled against B. melitensis 16 M (GenBank: NC003317 and GenBank: NC003318) using MIRA 3, as implemented in Torrent Suite V4.0.2. Assemblies of the two isolates were ordered and aligned against that of B. melitensis 16 M with Mauve v2.3.1 []. Potential mis-assembly points identified from the alignment were verified by PCR and by DNA sequencing using capillary electrophoresis. Scaffolds with mis-assembly points were manually broken and rejoined using Gap5 v1.2.14 []. Putative tRNA and rRNA genes were identified with tRNAscan-SE v1.23 [] and RNAmmer v1.2 [], respectively. Protein-coding genes were predicted with GeneMarkS v4.10 []. Putative identities were assigned to these genes by Basic Local Alignment Search Tool (BLAST) [] searches against the NCBI non-redundant, Swiss-Prot [] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [] databases. [...]

SNPs among all 53 genomes (Table , comprising the 2 draft genomes from this study, 5 complete and 44 draft publicly available genomes of B. melitensis, and 2 genomes of B. abortus as outgroups) were identified using the web-based programme SNPs Finder []. The SNPs were evaluated in homologous regions of 600 bp that shared at least 99% sequence similarity. Repetitive and paralogous sequences in the alignment were eliminated by the algorithm in the SNPs Finder software [] to reduce false-positive SNP calls. The resulting SNP set was then filtered with in-house scripts to remove SNPs lying less than 8 bp from one another, as previously reported [].

The phylogenetic relationships of the 53 genomes were inferred using MrBayes v3.2.1 []. Bayesian MCMC analysis was conducted by sampling across the entire general time reversible (GTR) model space. One million generations were run with a sampling frequency of 500, and diagnostics were calculated every 5000 generations. A burn-in setting of 25% was used, which discarded the first 500 trees.
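The SNP proximity filter mentioned above (removal of SNPs lying less than 8 bp apart) was implemented with in-house scripts that are not reproduced here. The following Python sketch shows one plausible reading of that step. It assumes that every SNP with a neighbour closer than 8 bp on the same sequence is discarded (so all members of a cluster are removed), and that SNPs are given as (sequence ID, position) pairs. The function name `filter_clustered_snps` and the input layout are hypothetical and are not taken from the original pipeline.

```python
from collections import defaultdict


def filter_clustered_snps(snps, min_distance=8):
    """Drop SNPs that have a neighbour closer than `min_distance` bp.

    `snps` is an iterable of (sequence_id, position) tuples. Positions are
    compared only within the same sequence (chromosome or contig).
    Assumption: every member of a close pair or cluster is removed.
    """
    # Group positions by the sequence they sit on.
    by_sequence = defaultdict(list)
    for sequence_id, position in snps:
        by_sequence[sequence_id].append(position)

    kept = []
    for sequence_id in sorted(by_sequence):
        positions = sorted(set(by_sequence[sequence_id]))
        for i, position in enumerate(positions):
            # Distance to the previous and next SNP on the same sequence.
            close_left = i > 0 and position - positions[i - 1] < min_distance
            close_right = (
                i + 1 < len(positions)
                and positions[i + 1] - position < min_distance
            )
            if not (close_left or close_right):
                kept.append((sequence_id, position))
    return kept


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    example = [("chr1", 100), ("chr1", 105), ("chr1", 113),
               ("chr1", 200), ("chr2", 103)]
    print(filter_clustered_snps(example))
```

In this sketch a SNP exactly 8 bp from its neighbour is kept, because the text specifies a distance of "less than 8 bp". A stricter or looser reading would only require changing the comparison.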
Convergence was assessed manually: the standard deviation of split frequencies fell below 0.01, the plot of generation versus log probability of the data (the log likelihood values) showed no obvious trend, and the potential scale reduction factor (PSRF) was reasonably close to 1.0 for all parameters. […]
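To make the burn-in and PSRF criteria concrete, the sketch below discards the first 25% of samples from each chain and then computes a potential scale reduction factor for a single parameter. With one million generations sampled every 500, there are 1,000,000 / 500 = 2000 sampled generations, and 25% of these is the 500 discarded trees stated above. The PSRF here is the textbook Gelman-Rubin form. It is an illustration only and may differ in detail from the value MrBayes reports. The function names `discard_burn_in` and `psrf` are hypothetical.

```python
import math
import statistics


def discard_burn_in(samples, fraction=0.25):
    """Return `samples` without the leading `fraction` (the burn-in)."""
    n_discard = int(len(samples) * fraction)
    return samples[n_discard:]


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for one parameter.

    `chains` is a list of equally long lists of post-burn-in samples, one
    list per independent run. Values close to 1.0 suggest that the runs
    are sampling the same distribution.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")

    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: mean of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    if within == 0:
        raise ValueError("within-chain variance is zero; PSRF is undefined")
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


if __name__ == "__main__":
    n_samples = 1_000_000 // 500           # sampled generations per run
    kept = discard_burn_in(list(range(n_samples)))
    print(n_samples - len(kept))           # number of samples discarded
```

Chains that agree give a value near 1.0, whereas chains stuck in different regions of parameter space inflate the between-chain term and push the value well above 1.0.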

Pipeline specifications

Software tools: Mauve, tRNAscan-SE, GeneMarkS, BLASTN, MrBayes
Databases: UniProt, KEGG
Applications: Genome annotation, Phylogenetics, Nucleotide sequence alignment
Organisms: Brucella melitensis, Homo sapiens, Brucella abortus