Computational protocol: Adaptation of the short intergenic spacers between co-directional genes to the Shine-Dalgarno motif among prokaryote genomes


Protocol publication

[…] The complete genome sequences of 530 prokaryotic chromosomes were downloaded from the NCBI FTP site, and Perl scripts were used to extract the overlaps and spacers between adjacent genes. We then excluded overlapping genes and considered only adjacent genes separated by 0 bp or more. For these pairs, the spacing length is defined as the distance between the end of the upstream gene and the start of the downstream gene. Although we calculated all spacing lengths between the genes in the 530 prokaryotic chromosomes analyzed, we focused our study on co-directional spacing lengths.

Unfortunately, in prokaryotes all analyses of intergenic regions are hampered by annotation errors such as incorrect initiation codon predictions, falsely predicted genes and frameshifts [-]. To check whether incorrect initiation codon predictions affect our conclusions, we compared the distribution of distances between co-directional genes calculated with the NCBI annotations with the distribution calculated with annotations refined by triTISA [], a post-processor for refining translation initiation site annotations. We compared the two distributions (Additional File ) using a paired-samples t-test and found a significant difference (P < 0.0005). However, both followed the same trend and were strongly linearly correlated (R² = 0.990) (Additional File ). In both distributions, the number of gene pairs decreases as the spacing length increases, although among the short spacers between co-directional genes some distances are more frequent than others (Figure and Additional File ). We therefore concluded that, although incorrect gene annotations exist, they do not influence our results or conclusions. Note that, in general, we observed slightly longer intergenic distances in the refined set, which points to the tendency of RefSeq to annotate the longest open reading frame as the gene []. [...]
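The spacer extraction and the comparison of the two spacing-length distributions can be sketched in Python. This is not the authors' Perl code; it is a minimal illustration that assumes each gene is given as 1-based inclusive start/end coordinates plus a strand, and it assumes the paired test pairs the two distributions by spacing length (the number of gene pairs at each length), which the text does not state explicitly. The names `Gene`, `co_directional_spacers`, `length_counts`, `paired_t_statistic` and `r_squared` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass(frozen=True)
class Gene:
    start: int   # 1-based leftmost coordinate (assumed input format)
    end: int     # 1-based rightmost coordinate, inclusive
    strand: str  # "+" or "-"


def co_directional_spacers(genes):
    """Spacer lengths (bp) between adjacent co-directional genes.

    Pairs with a negative distance (overlapping genes) are skipped;
    a spacer of 0 means the two genes abut.
    """
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    spacers = []
    for left, right in zip(ordered, ordered[1:]):
        if left.strand != right.strand:
            continue  # divergent or convergent pair: not co-directional
        distance = right.start - left.end - 1  # bases strictly between the genes
        if distance >= 0:
            spacers.append(distance)
    return spacers


def length_counts(spacers, max_len):
    """Number of gene pairs at each spacing length 0..max_len."""
    counts = Counter(spacers)
    return [counts[n] for n in range(max_len + 1)]


def paired_t_statistic(a, b):
    """Paired-samples t statistic for equal-length sequences a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def r_squared(a, b):
    """Squared Pearson correlation (R^2 of a simple linear fit)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov * cov / (var_a * var_b)


# Example: two "+" genes 10 bp apart, an overlapping gene, then "-" genes.
genes = [Gene(1, 300, "+"), Gene(311, 600, "+"), Gene(590, 900, "+"),
         Gene(950, 1200, "-"), Gene(1206, 1500, "-"), Gene(1501, 1800, "-")]
print(co_directional_spacers(genes))  # [10, 5, 0]
```

Only the t statistic is computed here; the P-value would come from Student's t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`. The spacing length is symmetric in coordinates, so the same subtraction serves both (->->) and (<-<-) pairs.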
Because we studied co-directional spacing lengths, we considered only pairs of genes with orientation (->->) or (<-<-). For (->->) pairs, read in the 5' to 3' direction, we examined the stop codon of the upstream gene; for (<-<-) pairs, read in the 3' to 5' direction, we examined the stop codon of the downstream gene. In both cases this is the stop codon that borders the spacer. The region spanning the upstream gene's stop codon, the putative downstream SD motif and the downstream gene's start codon (positions -20 to 2) was visualized with WebLogo [] for E. coli K12 spacing lengths of 1 to 12 bp (Figure ), for the B. subtilis spacing length of 10 bp and for the T. kodakarensis spacing length of 6 bp (Additional File ). The numbers of genes with a predicted SD sequence used to build each E. coli K12 WebLogo are shown in Table , and those for B. subtilis and T. kodakarensis are given in the legend of Additional File . […]
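The selection of sequence windows for the logos can be sketched as follows. This is a hypothetical Python illustration, not the authors' pipeline, and it does not reproduce the SD prediction step used to select genes. It assumes 1-based inclusive top-strand coordinates and takes position 0 as the first base of the start codon, so that -20 to +2 covers 20 upstream bases plus the whole start codon (the source does not state its numbering convention). Windows are returned in the gene's own 5'->3' orientation: for (<-<-) pairs the window is reverse-complemented, so, when the spacer is short enough, the bordering stop codon appears upstream of the SD region exactly as in (->->) pairs. All function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def start_codon_window(genome, start, end, strand, upstream=20, downstream=2):
    """Window from -upstream to +downstream around a gene's start codon,
    returned in the gene's own 5'->3' orientation (None if off the sequence).

    Assumed convention: position 0 is the first base of the start codon, and
    start/end are 1-based inclusive top-strand coordinates.
    """
    if strand == "+":
        first = start - 1              # 0-based index of the start codon's first base
        lo, hi = first - upstream, first + downstream + 1
    else:
        first = end - 1                # minus-strand genes start at their right end
        lo, hi = first - downstream, first + upstream + 1
    if lo < 0 or hi > len(genome):
        return None
    window = genome[lo:hi]
    return window if strand == "+" else reverse_complement(window)


def windows_for_spacer(genome, genes, spacer_len, upstream=20, downstream=2):
    """Start-codon windows of the transcriptionally downstream gene in every
    co-directional pair separated by exactly spacer_len bp.

    genes: iterable of (start, end, strand) tuples.
    """
    ordered = sorted(genes)
    windows = []
    for left, right in zip(ordered, ordered[1:]):
        if left[2] != right[2] or right[0] - left[1] - 1 != spacer_len:
            continue
        # (->->): the right gene is transcribed second; (<-<-): the left gene is.
        target = right if left[2] == "+" else left
        window = start_codon_window(genome, *target, upstream, downstream)
        if window is not None:
            windows.append(window)
    return windows


def position_counts(windows):
    """Per-position base counts: the matrix a sequence logo summarizes."""
    return [Counter(column) for column in zip(*windows)]


# Toy example: padding, gene A (21..29), a 5-bp spacer "AGGAG", gene B (35..43).
genome = "T" * 20 + "ATGAAATAA" + "AGGAG" + "ATGCCCTAA"
print(windows_for_spacer(genome, [(21, 29, "+"), (35, 43, "+")], 5))
# ['TTTTTTATGAAATAAAGGAGATG']
```

The aligned windows (for example written out as FASTA) are what WebLogo takes as input; `position_counts` only shows the per-position tallies that the logo's letter heights are derived from.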

Pipeline specifications