Computational protocol: Evolutionary Effects of Translocations in Bacterial Genomes

Protocol publication

[…] Beginning with a list of genes in the reference HI2424 genome, a reciprocal BLAST procedure was used to identify orthologs shared by three Burkholderia genomes (B. cenocepacia HI2424, B. ambifaria AMMD, and B. multivorans ATCC17616). Orthologs across three genomes from the genus Bordetella (Bor. bronchiseptica [RB50], Bor. petrii DSM 12804 [petrii], and Bor. avium 197N [197N]) were also identified to assess whether expression and substitution rate gradients were present in a related taxon with only a single chromosome. This procedure used Perl scripts, Perl DBI, and Bioperl; sequences were retrieved from the MySQL database and compared using the BLAST executables (version 2.2.23) with protein databases created using the formatdb tool.
To be designated as likely orthologs, all members of a gene family needed to be reciprocal best hits in all target genomes and to occur on the same chromosome. In addition, orthologs were required to share similar chromosome positions (within ±15% of the position, normalized for genome length, and synchronized relative to the origin of replication) among all genomes to properly assess effects of chromosome location; this filter also reinforces the inference of orthology. However, the screen for conservation of gene position did not discriminate between positive and negative DNA strands or between leading and lagging strands relative to the origin; only the distance from the origin was considered. This method of ortholog identification is similar to, yet simpler than, previous methods because the bacterial genomes are less divergent and are generally syntenic. Greater divergence would likely cause more true orthologs to be discarded by the requirement for synteny because gene order erodes with phylogenetic distance. [...] The amino acid sequences for each ortholog triad were aligned using ClustalW 2.1 via Bioperl modules.
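The ortholog criteria described above (reciprocal best hits, plus conserved distance from the replication origin) can be illustrated with a minimal Python sketch. It is not the authors' Perl/Bioperl implementation. The function names, the gene identifiers, and the input format (precomputed best-hit tables) are hypothetical, and two points are assumptions: that the ±15% window is an absolute difference in normalized units measured against the reference gene, and that the chromosome is treated as circular so that distance is taken along the shorter arc.

```python
from typing import Dict, Sequence


def reciprocal_best_hits(best_ab: Dict[str, str], best_ba: Dict[str, str]) -> Dict[str, str]:
    """Return pairs a -> b where b is a's best hit and a is b's best hit."""
    return {a: b for a, b in best_ab.items() if best_ba.get(b) == a}


def origin_distance(start: int, origin: int, chrom_len: int) -> float:
    """Strand-agnostic distance from the origin on a circular chromosome,
    expressed as a fraction of chromosome length (0.0 to 0.5)."""
    offset = abs(start - origin) % chrom_len
    return min(offset, chrom_len - offset) / chrom_len


def positions_conserved(reference: float, others: Sequence[float],
                        tolerance: float = 0.15) -> bool:
    """True when every target-genome distance lies within `tolerance` of the
    reference distance. Assumption: 15% is an absolute difference in
    normalized units, not a relative one."""
    return all(abs(d - reference) <= tolerance for d in others)


# Hypothetical best-hit tables between two genomes (gene IDs are made up).
best_ab = {"a1": "b1", "a2": "b2", "a3": "b9"}
best_ba = {"b1": "a1", "b2": "a7", "b9": "a3"}
pairs = reciprocal_best_hits(best_ab, best_ba)  # a2 is dropped: b2's best hit is a7
```

For three genomes, the same pairwise check would be repeated for each target genome against the reference, and a gene retained only if it passes in all of them and lies on the same chromosome in each.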
The two ClustalW alignments, along with the nucleotide sequences mapped to the protein alignments, were passed to the PAML module codeml to calculate the nucleotide substitution rates. The default settings for a pairwise calculation in codeml were used. Any calculated synonymous (dS) or nonsynonymous (dN) rate of nucleotide substitution for a pair of genes exceeding 2.0 was discarded because such values are unreliable owing to saturation. The evolutionary distances of the chosen Burkholderia species caused many inferred values of dS to exceed 2.0; on the other hand, most orthologs exhibited dN > 0. We focus our analysis on the B. cenocepacia HI2424 and B. ambifaria AMMD ortholog pairs, but nearly identical results were found using ortholog pairs from HI2424 and B. multivorans ATCC17616 (Supplementary Material online). Substitution rates for Bordetella orthologs were calculated using the same method. [...]
A prior study of expression in B. cenocepacia using RNA-seq evaluated the transcriptional response under two conditions. The specific strains studied were HI2424 and AU1054, which were grown in both synthetic CF sputum medium and soil medium (SE); the normalized results for SE conditions for both strains were obtained because SE likely represents a more natural growth condition. Genes in the HI2424 genome were linked to the AU1054 genes used as a reference in this study using MUMmer 3.0, which provided data that were read into R to link and cross-reference expression values between genomes.
Expression data from microarrays were obtained for Bor. bronchiseptica (RB50) as part of a study of growth-phase-dependent gene regulation. The bacteria were cultured on Bordet–Gengou agar, and the RNA of interest in our study was isolated during mid-logarithmic phase (15 h). The cDNA was fluorescently labeled and hybridized to an RB50-specific long-oligonucleotide microarray with reporters linked to RB50 gene loci. After background subtraction, the mean values for six replicates were calculated. Subsequent filtering for loci with expression values greater than zero facilitated use of the cube-root transformation. […]
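Two of the numeric filters described above, the saturation cutoff on dN and dS and the averaging, filtering, and cube-root transformation of microarray values, can be sketched in Python as follows. This is an illustration under stated assumptions, not the original analysis code (the text mentions Perl and R). The function names, locus labels, and input layout are hypothetical; each rate is assumed to be filtered independently, and the values passed in are assumed to be already background-subtracted.

```python
from statistics import mean
from typing import Dict, List, Optional

SATURATION_CUTOFF = 2.0  # from the text: dN or dS values exceeding 2.0 are discarded


def keep_rate(value: float, cutoff: float = SATURATION_CUTOFF) -> Optional[float]:
    """Return the substitution rate, or None when it exceeds the cutoff."""
    return value if value <= cutoff else None


def cube_root_expression(replicates: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the replicates for each locus, drop loci whose mean is not
    greater than zero, and cube-root transform the remaining means."""
    transformed: Dict[str, float] = {}
    for locus, values in replicates.items():
        avg = mean(values)
        if avg > 0:
            transformed[locus] = avg ** (1.0 / 3.0)
    return transformed


# Hypothetical background-subtracted intensities, six replicates per locus.
example = {
    "locus_a": [8.0] * 6,
    "locus_b": [-1.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # mean is 0, so it is dropped
    "locus_c": [27.0] * 6,
}
result = cube_root_expression(example)
```

Filtering before the transformation keeps the sketch aligned with the order given in the text. Exponentiation with `1/3` is only safe here because non-positive means have already been removed.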

Pipeline specifications