Computational protocol: Evolution of Trypanosoma cruzi: clarifying hybridisations, mitochondrial introgressions and phylogenetic relationships between major lineages

Protocol publication

[…] Analysed sequences - In a previous paper on multilocus sequence typing (MLST) for T. cruzi, we analysed 13 housekeeping gene fragments by simple neighbour-joining (NJ) analysis with the goal of obtaining a standardised MLST method for DTU assignment. These sequences were reanalysed in the current work. The GenBank accessions are as follows: JN129501-JN129502, JN129511-JN129518, JN129523-JN129524, JN129534-JN129535, JN129544-JN129551, JN129556-JN129557, JN129567-JN129568, JN129577-JN129584, JN129589-JN129590, JN129600-JN129601, JN129610-JN129617, JN129622-JN129623, JN129633-JN129634, JN129643-JN129650, JN129655-JN129656, JN129666-JN129667, JN129676-JN129683, JN129688-JN129689, JN129699-JN129700, JN129709-JN129716, JN129721-JN129722, JN129732-JN129733, JN129742-JN129749, JN129754-JN129755, JN129765-JN129766, JN129775-JN129782, JN129787-JN129788, JN129798-JN129799, JN129808-JN129815, JN129820-JN129821 and KF889442-KF889646. Additionally, we used T. cruzi marinkellei as an outgroup. Sequence data of the selected targets for T. cruzi marinkellei were obtained from TriTrypDB (available from: tritrypdb.org) under the following accessions: TcMARK_CONTIG_2686, TcMARK_CONTIG_670, TcMARK_CONTIG_1404, Tc_MARK_2068, Tc_MARK_3409, Tc_MARK_5695, Tc_MARK_9874, Tc_MARK_515, Tc_MARK_4984, Tc_MARK_5926, Tc_MARK_8923, TcMARK_CONTIG_1818 and Tc_MARK_2666. In addition, previously analysed sequences corresponding to the loci 1F8 calcium-binding protein, histone H1, histone H3 and heat-shock protein 60 (HSP60) were downloaded from GenBank.
The accessions for these sequences are the following: 1F8 (AF545071, AF545072, AF545074, AY540692, AY540693, AY540698, AY540699, AY540700, AY540703, AY540704, AY540705 and AY540706), H1 (AF545075, AF545076, AF545077, AF545078, AY540672, AY540673, AY540675, AY540676, AY540677, AY540678, AY540679 and AY540680), H3 (AF545087, AF545088, AF545089, AF545090, AY540681, AY540682, AY540683, AY540684, AY540686, AY540687, AY540688, AY540689 and AY540690) and HSP60 (AY540716, AY540717, AY540718, AY540719, AY540720, AY540721, AY540722, AY540723, AY540724, AY540725, AY540726, AF545091, AF545092 and AF545093). Additionally, we analysed 97 previously published cytochrome b (CytB) sequences. The accessions are as follows: AJ130927, AJ130928, AJ130929, AJ130930, AJ130931, AJ130932, AJ130933, AJ130934, AJ130935, AJ130936, AJ130937, AJ130938, AJ439719, AJ439720, AJ439721, AJ439722, AJ439723, AJ439724, AJ439725, AJ439726, AJ439727, EU856367, EU856368, EU856369, EU856370, EU856371, EU856372, EU856373, EU856374, EU856374, EU856375, EU856376, EU856377, EU856378, EU856379, EU856380, FJ002253, FJ002254, FJ002255, FJ002256, FJ002257, FJ002258, FJ002259, FJ002260, FJ002261, FJ002262, FJ002263, FJ156759, FJ168768, FJ183398, FJ183399, FJ183400, FJ183401, FJ549386, FJ549387, FJ549388, FJ549389, FJ549390, FJ549391, FJ549392, FJ549393, FJ549394, FJ549395, FJ549396, FJ549397, FJ549398, FJ549399, FJ549400, FJ549401, FJ555631, FJ555631, FJ555632, FJ555633, FJ555633, FJ555634, FJ555635, FJ555636, FJ555637, FJ555638, FJ555639, FJ555640, FJ555641, FJ555642, FJ555643, FJ555644, FJ555645, FJ555646, FJ555647, FJ555648, FJ555649, FJ555650, FJ555651, FJ900246, FJ900247, FJ900248, JN543701 and JN543702. Finally, the previously analysed cytochrome c oxidase subunit II-NADH dehydrogenase 1 (COII-Nd1) sequences were as follows: HQ604870, AF359053, HQ604875, AF359032, HQ604873, AF359030, HQ604877, AF359046, AF359041, HQ604909, HQ604911 and HQ604907. For analyses requiring an outgroup, sequences from T. cruzi marinkellei strain TcMB7 were downloaded from the TriTrypDB database (available from: tritrypdb.org) using a BLAST search strategy.

Data analysis - Alignments were produced with MEGA 6.0 software using default parameters. Regions with gaps in the alignment were excluded from the analyses. CytB and COII-Nd1 fragments were concatenated using MLSTest 1.0. A five-nucleotide gap present in the sequences of three strains in the COII-Nd1 alignment was coded as "G" for present and "A" for absent so that it would be considered in the phylogenetic analysis. Sequences obtained in our previous paper were concatenated before performing most of the phylogenetic analyses. To evaluate congruence among different loci and suitability for concatenation, we performed a BioNJ-ILD test with 1,000 random permutations. NJ analyses were made with MLSTest software using uncorrected p-distances and considering heterozygous sites as average states. One thousand bootstrap replications were used to evaluate branch support. Maximum likelihood (ML) analyses were conducted with MEGA 6.0 software. The best model for each analysis was selected using the corrected Akaike information criterion implemented in jModelTest software. Bayesian analyses were run in MrBayes v.3.1. Metropolis-coupled Markov chain Monte Carlo (MCMC) analyses were run until likelihoods remained stationary and the two independent runs converged after one million generations. By sampling every 100th generation from the two independent runs in MrBayes and discarding the first 25% of the trees as burn-in, 50% majority-rule consensus phylograms were constructed. Molecular clock and species tree inference were implemented in the BEAST package v.2.1. First, strict, relaxed lognormal and exponential clock models were analysed for each locus considering a model of coalescent constant population.
The Bayesian inference was made with MCMC chains of 4 × 10^7 states (or 1 × 10^8 states if convergence was not reached), sampling trees every 5,000 states. Relaxed exponential and strict clocks were compared using the Bayes factor (BF), which was calculated using Tracer software with 1,000 random bootstrap replications to estimate marginal likelihood. Second, a Bayesian co-estimation of the species tree and molecular clock parameters was made for the loci analysed by using a STAR-BEAST analysis. Third, a calibration point was considered in the analysis for those loci whose homologous sequences were present in the Trypanosoma brucei strain TREU427 genome and that were informative about DTU relationships. To calibrate the clock-rate estimations, a normally distributed prior of the divergence time between T. brucei and T. cruzi sequences with a mean of 100 million years ago and standard deviation of 2.0 was imposed as previously suggested. Clock models were unlinked and the implemented model for each locus was selected according to the BF analysis for each gene fragment. The population function in multispecies coalescent parameters was set to linear with a constant root. An MCMC chain of 250 million iterations was run, with parameters and trees sampled every 5,000 iterations and removal of the first 10% of states as burn-in. Log-files were checked for sufficient effective sampling sizes using Tracer v.1.5.

Because the inclusion of genotypic data of hybrid DTUs (TcV and TcVI) can lead to bias in the phylogenetic analyses, we first obtained patterns for non-hybrid lineages (TcI to TcIV) based on the MLST allelic profiles of previously analysed sequences. Next, six hypothetical TcII/TcIII hybrid strains with heterozygous profiles were included in the analysis. A distance matrix was generated based on the number of different alleles between strains.
In addition, the distance between heterozygous and homozygous genotypes at each locus was considered 1 if no alleles were shared and 0.5 if one allele was shared. When two heterozygous genotypes were identical, the distance was considered 0. NJ analyses using the PHYLIP package were performed based on the distance matrices.

The NJ method was also implemented to evaluate the phylogeny of CytB sequences available online. In addition, the same method was used to analyse previously published sequences and an outgroup sequence. Branch support was evaluated using 1,000 bootstrap replications.

The allele sequences for previously published TcV and TcVI strains were inferred for each of the 13 loci with the PHASE algorithm implemented in DnaSP. We ran 10,000 iterations, sampling every 100 states and discarding the first 1,000 as burn-in. […]

Pipeline specifications

Software tools MEGA, MLSTest, jModelTest, MrBayes, BEAST, PHYLIP, DnaSP
Databases TriTrypDB
Applications Phylogenetics, WGS analysis
Organisms Trypanosoma cruzi