Computational protocol: A Journey across Genomes Uncovers the Origin of Ubiquinone in Cyanobacteria

Protocol publication

[…] Experimentally, the biosynthetic pathways for Q and menaquinone have been deduced using very detailed BLAST searches of proteins known (or suspected) to participate in such pathways. Unlike previous studies, which have systematically used potentially arbitrary cut-off values (; ; ), I have extended the protein searches to all bacterial organisms for which partial or complete genomes are available, initially without applying cut-off values. A cut-off of <1e−10 was applied, as in previous works (), only when the searched protein showed a high degree of conservation across the wide phylogenetic span of the proteobacterial organisms that are currently available. However, several proteins involved in the biosynthetic pathways of membrane quinones and their metabolites show low levels of conservation, sometimes even in related taxa. This is particularly the case for chorismate lyase, or UbiC, generally the first enzyme of Q biosynthesis (; ; ; ). UbiC often returns few hits in extended BLAST searches with any given protein query because its level of amino acid identity is very limited; only the signature conserved domain (CDD; ) can be recognized across unrelated taxa. Multiple UbiC proteins maintaining the signature CDD of chorismate lyase (cl01230) were thus used in iterative BLAST searches extended to partially overlapping sets of taxa, so as to progressively cover the whole phylogenetic span of proteobacteria and also uncover potential instances of Lateral Gene Transfer (LGT). Clear cases of LGT were inferred when homologous proteins were absent from related taxa and the protein instead clustered with proteins from another class of proteobacteria having similar ecological properties. 
For instance, the UbiC proteins present in various strains of the pathogen Bartonella, which belongs to the Rhizobiales order of alpha proteobacteria, did not cluster together with UbiC proteins present in other Rhizobiales, namely those from organisms of the genus Methylobacterium, but were closely related to those of pathogenic taxa of gamma Enterobacterales, such as Serratia and Salmonella. A manually curated alignment of diverse UbiC proteins was produced to assist the protein searches and refine the results of phylogenetic affinity. Comparable iterative searches and sequence analyses were undertaken for UbiA, UbiD, UbiX, UbiH, and MenF of MQ biosynthesis, as well as all CoQ proteins identified so far in the eukaryotic pathway of Q biosynthesis ().
Conversely, the fundamental steps of ring methylation and hydroxylation are carried out by members of two large superfamilies of proteins ( and ): S-adenosylmethionine-dependent methyltransferases (SAM or AdoMet-MTase) and flavin mono-oxygenases with the common Rossmann-fold of NAD(P)(+)-binding proteins, frequently defined as Ubi-OHases (; ). Consequently, it is often difficult to distinguish a true homologue within such superfamilies from paralogues having close structural resemblance. To narrow the BLAST searches to the proteins that genuinely have the closest structural homology to a protein query, Neighbour Joining (NJ) distance trees were derived from the BLAST searches and then carefully examined (). Proteins were considered close homologues when they clustered in the same monophyletic branch as the query and that branch showed the signature CDD of the query in the majority of its leaves. Whenever NJ trees were insufficiently resolved to allow such a selection, manually curated alignments and Maximum Likelihood (ML) trees were produced and examined as previously described (; ). See the legend of and and , online, for more details. 
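The clade-based selection of close homologues described above can be illustrated with a minimal sketch. It is not the author's implementation: the tree is represented as nested tuples rather than a real NJ tree file, the function names (`leaves`, `close_homologues`) and the example labels are hypothetical, and "majority" is assumed to mean more than half of the leaves. The sketch also assumes that the largest clade containing the query and meeting that threshold is the one of interest.

```python
from typing import Dict, List, Union

# A tree is either a leaf name (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> List[str]:
    """Return all leaf names under a node."""
    if isinstance(tree, str):
        return [tree]
    names: List[str] = []
    for child in tree:
        names.extend(leaves(child))
    return names


def close_homologues(tree: Tree, query: str, has_cdd: Dict[str, bool],
                     min_fraction: float = 0.5) -> List[str]:
    """Walk from the root towards the query and return the leaves of the
    first (largest) clade in which the fraction of leaves carrying the
    signature CDD exceeds min_fraction. Returns [] if no clade qualifies."""
    node = tree
    while True:
        names = leaves(node)
        if query not in names:
            return []
        fraction = sum(has_cdd.get(n, False) for n in names) / len(names)
        if fraction > min_fraction:
            return sorted(names)
        if isinstance(node, str):
            return []  # reached the query leaf itself without success
        # Descend into the child subtree that contains the query.
        node = next(c for c in node if query in leaves(c))


# Hypothetical example: five BLAST hits, two of which lack the CDD.
example_tree = ((("query", "A"), "B"), ("C", "D"))
example_cdd = {"query": True, "A": True, "B": False, "C": False, "D": False}
selected = close_homologues(example_tree, "query", example_cdd)
```

In this toy tree the root fails the threshold (2 of 5 leaves carry the domain), while the clade containing `query`, `A` and `B` passes (2 of 3), so those three leaves are retained. With real data the tree would come from the NJ output of the BLAST search and the CDD flags from a conserved-domain annotation step.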
The taxonomic position and relationships of unclassified proteobacterial organisms were evaluated by the reciprocal BLAST approach applied to all coded proteins, as previously used to define genome chimaerism (; ; ). The taxonomic affiliation at the family level was evaluated by statistical analysis of the top hits (5–10) of all the proteins of either unclassified or classified organisms against the whole nonredundant (NR) database. A simplified version of this method was developed and showed high correlation with the results of genomic chimaerism obtained previously (; ). This simplified method is based on the computation of the top five hits obtained with smartBLAST [] for about 100 proteins essential for membrane and energy metabolism (the list of these proteins will be made available upon request). The results obtained with the above approaches were then compared with those derived from the analysis of ribosomal proteins or 16S rRNA, as frequently undertaken in metagenomic studies (; ; ). Genome completeness was evaluated with the program BUSCO applied to proteins (). […]
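As a rough illustration of the simplified top-hit method, the sketch below tallies the family assignments of the top five hits per protein and reports each family's share of all hits. The source does not specify the statistic used, so the simple proportion computed here is an assumption, as are the function name `family_profile`, the input layout, and the made-up example data.

```python
from collections import Counter
from typing import Dict, List, Tuple


def family_profile(top_hits: Dict[str, List[str]],
                   n_top: int = 5) -> List[Tuple[str, float]]:
    """Tally the family of the first n_top hits of every query protein and
    return (family, fraction of all tallied hits), most frequent first."""
    counts: Counter = Counter()
    for families in top_hits.values():
        counts.update(families[:n_top])  # hits are assumed ranked best-first
    total = sum(counts.values())
    if total == 0:
        return []
    return [(family, n / total) for family, n in counts.most_common()]


# Hypothetical input: query protein -> family of each ranked hit.
example_hits = {
    "protA": ["Rhodospirillaceae", "Rhodospirillaceae", "Acetobacteraceae"],
    "protB": ["Rhodospirillaceae", "Acetobacteraceae"],
}
profile = family_profile(example_hits)
```

Here the first family accounts for three of the five tallied hits and would be taken as the most likely affiliation. In practice the input would hold about 100 proteins, and the resulting profile would be compared against the ribosomal protein or 16S rRNA placement.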

Pipeline specifications

Software tools: SmartBlast, BUSCO
Applications: Phylogenetics, Metagenomic sequencing analysis, Nucleotide sequence alignment
Organisms: Escherichia coli, Bacteria
Chemicals: Oxygen, Ubiquinone