Computational protocol: The Non-Flagellar Type III Secretion System Evolved from the Bacterial Flagellum and Diversified into Host-Cell Adapted Systems

Protocol publication

[…] The NF-T3SS clusters identified in this study can be queried using different criteria (including taxonomy and NF-T3SS family name) and visualized along with the results of our Hmmer profile searches at http://secreton.web.pasteur.fr. The profiles of NF-T3SS and flagellum proteins can be queried at http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::T3SSscan-FLAGscan. The list of NF-T3SSs and flagella included in phylogenetic analyses can be found in . [...] We selected one sequenced model organism from each described NF-T3SS family (genomes marked in red on , list in ). We extracted NF-T3SS protein sequences according to their genome sequence annotations and the literature. We performed similarity searches between these sequences with a Blast “all against all” search, kept the hits with an e-value lower than 10^-3, and clustered the sequences with MCL on the transformed e-value (-log(e-value)) using stringent parameters (inflation parameter I = 1.5). We obtained nine families found in all model systems, corresponding to the nine previously described NF-T3SS core proteins: SctC, SctJ, SctN, SctQ, SctR, SctS, SctT, SctU, SctV. We aligned these nine protein families with Muscle, manually edited the alignments with Seaview, and built sequence profiles with Hmmer. A similar approach was applied to flagella from phylogenetically distinct model organisms (MCL clustering, I = 1.8) (List in ). Out of 14 protein families widely conserved in flagella, eight were homologous to NF-T3SS core proteins (, clustering of protein families obtained from NF-T3SS and flagellar model systems, MCL parameter I = 2.5), and were extracted to build Hmmer sequence profiles. We also selected three widely conserved flagellar families with no NF-T3SS homolog (confirmed by the clustering above and by Hmmer profile searches), the rod proteins FliE, FlgB and FlgC, and built sequence profiles to identify other occurrences of these proteins.
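The e-value filtering and transformation that precede the MCL clustering can be sketched as follows. This is a minimal illustration, not the authors' script. It assumes BLAST tabular output (12 standard columns, e-value in the 11th), a base-10 logarithm (the text only says -log(e-value)), a cap of 200 for hits reported with an e-value of 0, and the removal of self hits. The function name `blast_to_abc` and the cap value are hypothetical.

```python
import math

E_VALUE_CUTOFF = 1e-3  # from the text: keep hits with an e-value lower than 10^-3
MAX_SCORE = 200.0      # assumption: cap used when BLAST reports an e-value of 0


def blast_to_abc(blast_lines, cutoff=E_VALUE_CUTOFF, max_score=MAX_SCORE):
    """Turn BLAST tabular hits into weighted edges (query, subject, weight).

    The weight is -log10(e-value), capped at max_score. The base-10 logarithm
    and the cap are assumptions; the source only states -log(e-value).
    """
    edges = []
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query == subject:
            continue  # assumption: self hits carry no clustering information
        if evalue >= cutoff:
            continue  # keep only hits strictly below the cutoff
        if evalue == 0.0:
            weight = max_score  # log of zero is undefined, so use the cap
        else:
            weight = min(-math.log10(evalue), max_score)
        edges.append((query, subject, weight))
    return edges
```

Written out as tab-separated lines (query, subject, weight), these edges form a label-format input that MCL can read with its `--abc` option, for example `mcl hits.abc --abc -I 1.5 -o clusters.txt`, where the file names are placeholders.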
Additional profiles were also built for FlgDEKL, FliG, MotA and MotB (MCL parameter I = 1.5), which are essential flagellar-specific genes. [...] We extracted from the genomes the genes encoding proteins homologous to T3SS core genes that were detected as part of a NF-T3SS or flagellum system. In a given system, when multiple Hmmer hits were available for a single gene, we kept the one with the lowest e-value and the greatest length. Many flagellar systems had multiple hits for the same genes scattered in the genome. We manually curated a subset of these flagella (357 out of 699 detected; the list of strains is in ). We aligned sequences with Muscle (default parameters) and selected informative sites with BMGE (BLOSUM30 similarity matrix, gap rate cut-off = 0.20, sliding window size = 3, entropy score cut-off = 0.5). We built phylogenetic trees with RAxML (Le and Gascuel matrix + 4-category discretized Gamma distribution for rate variation among sites + empirical amino-acid frequencies): we selected the best maximum likelihood tree among 200 different starting tree inferences, and computed 1000 bootstrap trees (i.e. trees built from bootstrap alignments, each consisting of sites drawn at random with replacement from the original alignment and of the same size as the original alignment). In the case of the ATPase SctN, we built an extra dataset that we extended with previously described outgroup sequences (see ) and built a tree as described above. We also ran an extra phylogenetic analysis in a similar way on a subset of these sequences (see , ). We built a tree as indicated above with a secretin dataset that included i) sequences identified in a previously described dataset that were retrieved using their accession numbers; ii) SctC of detected NF-T3SSs; iii) all the secretins we found in Myxo and Chlamy genomes.
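The rule for choosing one hit per gene within a system can be illustrated with a short sketch. The text does not say how e-value and length are combined, so this example assumes the lowest e-value wins and ties are broken by the longer sequence. The `Hit` record and its field names are hypothetical and do not correspond to any Hmmer output format.

```python
from collections import namedtuple

# Hypothetical record for one profile hit; field names are illustrative only.
Hit = namedtuple("Hit", ["system", "gene", "protein_id", "evalue", "length"])


def best_hit_per_gene(hits):
    """Keep one hit per (system, gene) pair.

    Assumed ranking: lowest e-value first, then greatest length as tie-breaker.
    """
    best = {}
    for hit in hits:
        key = (hit.system, hit.gene)
        current = best.get(key)
        # Comparing (evalue, -length) tuples ranks lower e-values first and,
        # for equal e-values, longer sequences first.
        if current is None or (hit.evalue, -hit.length) < (current.evalue, -current.length):
            best[key] = hit
    return best
```

A different reading of the rule, such as ranking by length first, only requires swapping the two elements of the comparison tuple.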
Sequences displaying branch lengths longer than 1 substitution per site were excluded from phylogenetic analyses, and the phylogenetic reconstruction was run again with the cleansed dataset. This led to the exclusion of several flagellar systems and of five potential NF-T3SSs. Some of these systems are probably undergoing degradation (). [...] We used the Scriptree program to draw annotated trees (; , , ) and Figtree (http://tree.bio.ed.ac.uk/software/figtree) to draw trees (, , ). Graphics on , , and were drawn with R (http://www.r-project.org). All figures were modified with Inkscape (http://www.inkscape.org). […]
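The long-branch filter described above can be approximated with a few lines of code. This is only a sketch under stated assumptions: it treats "branch length" as the terminal branch leading to each sequence, and it reads leaf lengths from a simple Newick string (unquoted labels, no comments) with a regular expression rather than a full parser. The function names are hypothetical.

```python
import re

# A leaf is a label that directly follows "(" or "," and is followed by
# ":length". Internal node labels follow ")" and are therefore not matched.
LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;\s]+)\s*:\s*([0-9.eE+-]+)")


def terminal_branch_lengths(newick):
    """Map each leaf label to its terminal branch length."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}


def split_long_branches(newick, max_length=1.0):
    """Split leaves into (kept, excluded) using a branch-length threshold.

    max_length is in substitutions per site; 1.0 is the threshold in the text.
    """
    lengths = terminal_branch_lengths(newick)
    kept = sorted(name for name, value in lengths.items() if value <= max_length)
    excluded = sorted(name for name, value in lengths.items() if value > max_length)
    return kept, excluded
```

For the toy tree `((A:0.1,B:1.7)90:0.05,(C:0.3,D:0.2)100:0.4,E:2.5);`, `split_long_branches` returns `(['A', 'C', 'D'], ['B', 'E'])`. The excluded sequences would then be removed from the alignment before the tree is rebuilt.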

Pipeline specifications