Computational protocol: Genomic reconstruction of transcriptional regulatory networks in lactic acid bacteria

Protocol publication

[…] Thirty complete genomes of Lactobacillales (Additional file ) were downloaded from the MicrobesOnline database []. Primary TF sets for each studied genome were extracted from the P2TF database. Sigma factors and RNA-binding proteins were excluded from the collections.

Groups of orthologs were precomputed as follows. Initially, groups of orthologous proteins were constructed for every pair of TF sets. All pairwise comparisons were done using BLASTP, and bidirectional best hits (BBHs) were identified if the protein sequence identity was more than 50% and the aligned region was longer than 2/3 of the length of the shorter protein. If two paralogous genes from one genome were more similar to each other than to a BBH partner from another genome, both paralogs were added to the same orthology cluster. Finally, all orthologous clusters containing common genes were joined together. The clusters were formed using ad hoc software written for Oracle RDBMS Express Edition (PL/SQL code is available on request). TF families were assigned by analysis of protein domain structure using the following databases: CDD [], Pfam [], SMART [], and MicrobesOnline Domain and Families [].

For regulon reconstruction we used the previously established comparative genomics approach (reviewed in []) implemented in the RegPredict Web server []. The approach includes inference of TFBSs, construction of nucleotide positional weight matrices (PWMs) for TFBS motifs, and reconstruction of regulons in complete genomes on the basis of prediction of putative TFBSs in gene promoter regions.
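The ortholog grouping procedure described earlier in this section (BBH filtering followed by joining clusters that share genes) can be illustrated with a minimal Python sketch. It is not the authors' PL/SQL implementation: the dictionary keys (`query`, `subject`, `identity`, `aln_len`, `query_len`, `subject_len`, `bitscore`), the `genome_of` mapping, and the use of bit score to rank hits are assumptions made for illustration, and the paralog rule is omitted for brevity.

```python
from collections import defaultdict

MIN_IDENTITY = 50.0        # percent identity cut-off stated in the text
MIN_COVERAGE = 2.0 / 3.0   # aligned fraction of the shorter protein


def passes_cutoffs(hit):
    """Apply the identity and alignment-length cut-offs to one BLASTP hit."""
    shorter = min(hit["query_len"], hit["subject_len"])
    return hit["identity"] > MIN_IDENTITY and hit["aln_len"] > MIN_COVERAGE * shorter


def bidirectional_best_hits(hits, genome_of):
    """Return protein pairs that are each other's best hit across two genomes.

    Ranking hits by bit score is an assumption; the text does not say
    which score was used to pick the best hit.
    """
    best = {}  # (query protein, subject genome) -> best passing hit
    for hit in hits:
        query, subject = hit["query"], hit["subject"]
        if genome_of[query] == genome_of[subject] or not passes_cutoffs(hit):
            continue
        key = (query, genome_of[subject])
        if key not in best or hit["bitscore"] > best[key]["bitscore"]:
            best[key] = hit

    pairs = set()
    for (query, _), hit in best.items():
        back = best.get((hit["subject"], genome_of[query]))
        if back is not None and back["subject"] == query:
            pairs.add(tuple(sorted((query, hit["subject"]))))
    return pairs


def merge_clusters(pairs):
    """Join all pairwise groups that share a protein (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for protein in list(parent):
        clusters[find(protein)].add(protein)
    return sorted(sorted(members) for members in clusters.values())
```

Union-find is used here only because it makes the "join clusters containing common genes" step a single pass over the pairs; any connected-components routine would give the same grouping.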
To take into account possible lineage-specific changes in TFBS motifs, we constructed individual PWMs for the Streptococcaceae and Lactobacillaceae taxonomic groups.

The three major workflows used for TF regulon reconstruction are (1) projection and expansion of previously known regulons from model Lactobacillales organisms, (2) projection of known regulons from model organisms belonging to other taxa, and (3) ab initio prediction of novel regulons (Figure ).

In workflow 1, the projection and expansion of previously known TF regulons includes two slightly different sub-workflows. In workflow 1a, both the set of regulated genes and the TFBS motif are known, whereas in workflow 1b, only a set of co-regulated genes is known from the collected experimental data. For previously known TFBS motifs, a PWM was built and used to identify additional sites in the analyzed genomes using the Run Profile tool in the RegPredict Web server. All novel true-positive TFBSs were added to the training set, and an updated PWM was constructed and used for the final regulon reconstruction. For regulons with originally unknown TFBS motifs, we collected a set of upstream regions of known TF-regulated genes and their orthologs and used this set for TFBS identification with the Discover Profile tool in RegPredict. The TFBS motif discovery tool uses an expectation-maximization algorithm to cluster all potential motifs with a specified symmetry (palindrome, direct or inverted repeat) and then optimizes the inferred PWM. In ambiguous cases, putative regulatory elements were validated by phylogenetic footprinting [] using multiple alignments of upstream non-coding regions of orthologous genes.

In workflow 2, previously experimentally studied regulons from other model organisms of the Firmicutes phylum (B. subtilis or S. aureus) were projected to the Lactobacillaceae genomes.
For TFBS identification, we used training sets of upstream regions of genes considered orthologous to the TF-regulated genes from model species outside the Lactobacillales lineage.

Workflow 3 was used for ab initio prediction of novel TF regulons. Initially, presumably co-regulated genes were predicted by analysis of conserved gene neighborhoods around a putative TF gene. Upstream regions of these presumably co-regulated genes, extracted from multiple Lactobacillales genomes, were used for identification of TFBSs and PWM construction as described above.

The obtained PWMs for known or predicted TFBS motifs were used for comparative genomics reconstruction of regulons in two groups of genomes, the Lactobacillaceae and Streptococcaceae, using the RegPredict Web server []. Each studied genome was scanned with the constructed PWMs using the Run Profile tool in RegPredict. The threshold for site search was defined as the lowest score observed in the training set. The consistency check approach [,] and/or functional relatedness of candidate target operons were used to eliminate false-positive TFBS predictions.

A community of annotators consisting of 18 undergraduate students from Hope College (Holland, MI) performed the initial reconstruction of multiple TF regulons in the analyzed groups of genomes. This was a coordinated annotation effort with expert curators in the context of a microbiology course. The resulting draft regulons underwent strict quality control to ensure accuracy of the reconstructions.

Functional gene annotations were uploaded from SEED [], UniProt [] and MicrobesOnline []. Multiple alignments of protein and DNA sequences were built with MUSCLE []. Phylogenetic trees were constructed using the maximum likelihood algorithm implemented in the PHYLIP package (v 3.69) [] and visualized with the Dendroscope tool [].
Complete descriptions of the reconstructed regulons, including TFs, their target genes and operons, and associated TFBSs, were uploaded to the RegPrecise database []. […]
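The genome scanning step described in this protocol (build a PWM from training sites, then search with a threshold equal to the lowest training-set score) can be illustrated with a minimal Python sketch. This is a generic log-odds PWM, not RegPredict's own scoring formula: the pseudocount, the uniform background frequency, the single-strand scan, and all function names are assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds PWM from equal-length training sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all training sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pwm


def score(pwm, window):
    """Sum the per-position weights for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, window))


def training_threshold(pwm, sites):
    """Lowest score observed among the training sites."""
    return min(score(pwm, site) for site in sites)


def scan(pwm, sequence, threshold):
    """Return (start, site, score) for every window scoring at or above threshold.

    Only the given strand is scanned; windows with non-ACGT symbols are skipped.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if not set(window) <= set(ALPHABET):
            continue
        site_score = score(pwm, window)
        if site_score >= threshold:
            hits.append((start, window, site_score))
    return hits
```

Under this threshold rule every training site is recovered by construction; candidate sites found this way would still need the consistency check or functional filtering described above to remove false positives.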

Pipeline specifications

Software tools: BLASTP, RegPredict, MUSCLE, PHYLIP, Dendroscope
Databases: Pfam, MicrobesOnline, P2TF
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Homo sapiens
Chemicals: Lactic Acid