Computational protocol: Comparative Genomic Analysis of Acanthamoeba Endosymbionts Highlights the Role of Amoebae as a “Melting Pot” Shaping the Rickettsiales Evolution

Protocol publication

[…] For each gene in the six genomes of Rickettsiales endosymbionts of amoebae (Cca, Cpa, Cps, Eau, Cot, and Cja), a BLASTP search was performed against 2,461 bacterial, 144 archaeal, and 109 viral genomes. To include lateral gene transfers (LGTs) that happened prior to and during the divergence of the Rickettsiales lineages, we excluded BLASTP hits in Rickettsiales and identified query sequences whose best remaining hit was non-α-proteobacterial as candidate LGT genes (E-value cutoff 1e-7). We performed the same analysis for 13 other representative Rickettsiales species that are not amoeba endosymbionts (Rickettsia prowazekii str. Madrid E, Rickettsia conorii str. Malish 7, Rickettsia massiliae MTU5, Rickettsia bellii RML369-C, Rickettsia felis URRWXCal2, Rickettsia felis LSU-Lb, Rickettsia buchneri ISO7, Wolbachia wMel, Orientia tsutsugamushi str. Boryong, Anaplasma phagocytophilum str. HZ, Ehrlichia canis str. Jake, Neorickettsia risticii str. Illinois, and Candidatus Midichloria mitochondrii). For functional annotation, the candidate genes were classified into COGs by hidden Markov model search using HMMer3.

To focus on bacterial species that are more likely engaged in LGTs with Rickettsiales, we ranked the non-α-proteobacterial species by the number of times they appeared as the best hit in the BLASTP search and then performed phylogenetic analysis for candidate genes with a best hit in the top 10 ranked species. Genes were clustered into families using the Markov Cluster Algorithm with an all-against-all BLASTP E-value cutoff of 1e-15. For each gene family, its homologs from all complete bacterial and archaeal genomes were retrieved by BLASTP search (E-value cutoff 1e-15). Protein sequences were aligned using MAFFT, and ambiguously aligned columns were trimmed using ZORRO with the suggested probability cutoff of 0.4. Phylogenetic trees were constructed using RAxML with the best model selected by the program. Gene families with spurious alignments (e.g., families with Ankyrin or Sel1 repeats) were excluded from the phylogenetic analysis.

When possible, each individual tree was rooted using three different methods: rooting with Archaea as the outgroup, rooting with Deinococcus as the outgroup, and midpoint rooting. Each of the rooted trees was scanned for a bipartition in which the Rickettsiales genes clustered with homologs in the 10 amoeba-associated bacteria using a custom Perl script. The species tree of Rickettsiales and the non-α-proteobacterial LGT partners was reconstructed using the 31 bacterial universal marker genes in AMPHORA2. LGT events were inferred by reconciling the gene tree with the species tree to minimize the number of LGT events (maximum parsimony). The same phylogenetic analysis pipeline was applied to identify LGTs between the 19 Rickettsiales genomes and 109 viral genomes. […]
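The best-hit screen and the ranking of candidate LGT partner species described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Hit` record, its field names, the function names, and the toy data are hypothetical. The sketch also assumes that the "best hit" is the hit with the highest bit score and that each hit already carries its taxonomic order and class, neither of which the protocol specifies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTP hit with taxonomy attached (hypothetical record layout)."""
    query: str       # query gene ID
    species: str     # subject species
    tax_order: str   # taxonomic order of the subject
    tax_class: str   # taxonomic class of the subject
    evalue: float
    bitscore: float


def best_non_rickettsiales_hits(hits, evalue_cutoff=1e-7):
    """Return the best hit per query after dropping Rickettsiales subjects.

    Assumption: "best" means highest bit score among hits passing the cutoff.
    """
    best = {}
    for h in hits:
        if h.tax_order == "Rickettsiales" or h.evalue > evalue_cutoff:
            continue
        current = best.get(h.query)
        if current is None or h.bitscore > current.bitscore:
            best[h.query] = h
    return best


def lgt_candidates(best):
    """Keep queries whose best remaining hit is not alpha-proteobacterial."""
    return {q: h for q, h in best.items()
            if h.tax_class != "Alphaproteobacteria"}


def top_partner_species(candidates, n=10):
    """Rank species by how often they are the best hit; return the top n."""
    counts = Counter(h.species for h in candidates.values())
    return [species for species, _ in counts.most_common(n)]


# Toy data, invented purely for illustration.
toy_hits = [
    Hit("g1", "rickettsia_R", "Rickettsiales", "Alphaproteobacteria", 1e-50, 200),
    Hit("g1", "gammaproteobacterium_X", "order_X", "Gammaproteobacteria", 1e-30, 150),
    Hit("g1", "alphaproteobacterium_A", "order_A", "Alphaproteobacteria", 1e-20, 100),
    Hit("g2", "alphaproteobacterium_A", "order_A", "Alphaproteobacteria", 1e-40, 180),
    Hit("g2", "gammaproteobacterium_X", "order_X", "Gammaproteobacteria", 1e-10, 60),
    Hit("g3", "gammaproteobacterium_X", "order_X", "Gammaproteobacteria", 1e-3, 30),
    Hit("g4", "chlamydia_Y", "order_Y", "Chlamydiia", 1e-12, 70),
    Hit("g5", "gammaproteobacterium_X", "order_X", "Gammaproteobacteria", 1e-9, 55),
]

best = best_non_rickettsiales_hits(toy_hits)
candidates = lgt_candidates(best)
partners = top_partner_species(candidates, n=1)
print(sorted(candidates), partners)
```

In this toy run, g2 is not a candidate because its best remaining hit is α-proteobacterial, and g3 is dropped because its only hit fails the E-value cutoff. In practice the hits would come from parsed BLASTP output joined to a taxonomy table, a step left out here.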

Pipeline specifications

Software tools: BLASTP, HMMER, MAFFT, ZORRO, RAxML, AMPHORA
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Escherichia coli, Bacteria