Computational protocol: Non-Random Inversion Landscapes in Prokaryotic Genomes Are Shaped by Heterogeneous Selection Pressures

Protocol publication

[…] To establish whether there are biases for symmetric versus asymmetric inversions in different prokaryotic clades, we first aligned all possible intra-phylum genome pairs using the “mummer” program from the MUMmer package () to detect maximal unique matches (MUMs) between genome pairs. The length of a MUM ranges from a pre-set minimum threshold up to the length of the longest exact and unique sequence match detected in both genomes. A minimal MUM length of 20 bp was chosen based on the recommendations of the authors of the MUMmer package, as sufficiently large to avoid spurious matches and sufficiently small to detect a sufficient number of matches in phylogenetically distant genome pairs. We obtained almost identical results for a trial dataset (Proteobacteria with experimentally determined origins) when applying a substantially larger minimum threshold (100 bp), suggesting that spurious matches for shorter MUMs are rare and/or do not unduly affect results (, online). Only genome alignments with at least 40 MUMs in one direction and at least 20 MUMs in the other direction were retained for analysis.

We then considered the residual deviation of individual MUMs from the forward or reverse arm of an imaginary X spanning the alignment plot, as described in the main text, scanning the alignment space at a resolution of 10 kb and assessing X-type symmetry (Xi,j) for each point i,j according to . To place greater confidence in longer homologous blocks, all residuals were weighted by the ratio of MUM length to minimal MUM length (20 bp), and each Xi,j value was normalized by the sum of the weights of all MUMs detected, so as to allow comparisons between genome pairs. Density plots were generated using the logspline function in R with default smoothing parameters.

[…] We obtained 16S rRNA alignments from the SILVA database v. 119.1 () and mapped GenBank identifiers to DoriC RefSeq genomes using the NCBI Genome Browser.
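The alignment filter and the length-based weighting of residuals described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the reading of "direction" as forward versus reverse-complement matches, and in particular the residual definition (distance to the nearer arm of an X centred on the scan point, ignoring genome circularity) are assumptions made for illustration; the exact Xi,j formula is not reproduced in this excerpt.

```python
MIN_MUM_LEN = 20  # minimal MUM length (bp) stated in the protocol


def passes_mum_filter(n_forward, n_reverse):
    """Keep an alignment with >= 40 MUMs in one direction and >= 20 in the other.

    ASSUMPTION: 'direction' is read as forward vs reverse-complement matches.
    """
    return max(n_forward, n_reverse) >= 40 and min(n_forward, n_reverse) >= 20


def mum_weight(length, min_len=MIN_MUM_LEN):
    """Weight a MUM by the ratio of its length to the minimal MUM length."""
    return length / min_len


def arm_residual(x, y, i, j):
    """Distance of a MUM at (x, y) from the nearer arm of an X centred at (i, j).

    ASSUMPTION: hypothetical residual used only for illustration; the arms are
    the lines y - j = +(x - i) and y - j = -(x - i), with no wrap-around.
    """
    dx, dy = x - i, y - j
    return min(abs(dy - dx), abs(dy + dx))


def weighted_mean_residual(mums, i, j):
    """Length-weighted residual, normalized by the sum of all MUM weights.

    `mums` is a list of (x, y, length) tuples for one genome pair.
    """
    weights = [mum_weight(length) for _, _, length in mums]
    total = sum(weights)
    if total == 0:
        raise ValueError("no MUMs supplied")
    weighted = sum(w * arm_residual(x, y, i, j)
                   for w, (x, y, _) in zip(weights, mums))
    return weighted / total
```

Dividing by the sum of the weights keeps the score on the same scale for genome pairs with very different numbers of MUMs, which is the stated purpose of the normalization. Scanning at 10 kb resolution would amount to calling `weighted_mean_residual` for each candidate centre (i, j) on a 10 kb grid.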
In cases where the same taxon was associated with multiple 16S rRNA sequences, a single sequence (belonging to the largest genome element) was chosen at random. Two bacteria from the dataset with experimentally determined origins of replication were not found in the SILVA alignments (Rickettsia prowazekii and Vibrio harveyi). For these bacteria, the first 16S rRNA sequence in the GenBank file of the largest genome element was added to the alignment using the SINA aligner available on the SILVA website (https://www.arb-silva.de/; last accessed April 19, 2017). Based on the full 16S rRNA alignment, we calculated pairwise phylogenetic distances between taxa using the dnadist program from the PHYLIP package v. 3.695 with default settings. We used NCBI Taxonomy (https://www.ncbi.nlm.nih.gov/taxonomy; last accessed April 19, 2017) to divide species into phylogenetic clades of interest (phyla and classes). To render X-type symmetry values comparable between clades, we randomly subsampled from genome pairs within each clade of interest to match a common template of phylogenetic distances. As the distance template, we used the distribution of phylogenetic distances within the Thaumarchaeota–Aigarchaeota–Crenarchaeota–Korarchaeota (TACK) superphylum, a clade with a relatively small number of genome pairs. […]
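One plausible way to subsample genome pairs so that their phylogenetic distances match a template distribution is to bin the template distances and draw the same number of pairs per bin from the clade of interest. The Python sketch below illustrates that idea only. The binning approach, the bin width, sampling without replacement, and the function names are all assumptions; the excerpt does not state how the matching was actually performed.

```python
import random
from collections import defaultdict


def bin_index(distance, bin_width):
    """Map a phylogenetic distance to a histogram bin (ASSUMED binning scheme)."""
    return int(distance // bin_width)


def subsample_to_template(clade_pairs, template_distances, bin_width=0.05, seed=0):
    """Draw genome pairs from a clade so their distances follow the template.

    clade_pairs: list of (pair_id, distance) for the clade of interest.
    template_distances: distances from the template clade (TACK in the protocol).
    bin_width and seed are hypothetical parameters chosen for illustration.
    """
    rng = random.Random(seed)

    # Count how many template pairs fall into each distance bin.
    template_counts = defaultdict(int)
    for d in template_distances:
        template_counts[bin_index(d, bin_width)] += 1

    # Group the clade's genome pairs by the same bins.
    by_bin = defaultdict(list)
    for pair_id, d in clade_pairs:
        by_bin[bin_index(d, bin_width)].append((pair_id, d))

    # For each template bin, sample without replacement as many pairs as the
    # template holds, capped by what the clade can supply.
    sampled = []
    for b, n in sorted(template_counts.items()):
        candidates = by_bin.get(b, [])
        sampled.extend(rng.sample(candidates, min(n, len(candidates))))
    return sampled
```

If a clade has fewer pairs in a bin than the template does, this sketch simply takes all of them, so the match is only approximate in that case. Choosing a template clade with relatively few genome pairs, as the protocol does with TACK, makes such shortfalls less likely.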

Pipeline specifications

Software tools: MUMmer, PHYLIP
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Bacteria