Computational protocol: Plastid genomes of two brown algae, Ectocarpus siliculosus and Fucus vesiculosus: further insights on the evolution of red-algal derived plastids

Protocol publication

[…] For E. siliculosus, several scaffolds corresponding to plastid DNA were detected by similarity to other plastid genomes in an assembly of shotgun-sequenced total genomic DNA produced by Genoscope (http://www.genoscope.cns.fr/spip/-Ectocarpus-siliculosus-.html). These scaffolds were separated from the rest of the sequence data, and the sequence of the circular genome was completed by manual assembly and PCR amplification of gap regions. The plastid genome was annotated using the GenDB interface, available through the bioinformatics facilities of the Marine Genomics Europe Network of Excellence.

For F. vesiculosus, two main strategies were used to obtain the full genome sequence: 1) Plastid-enriched DNA (cpDNA) was digested with HindIII and cloned into pBluescript II (SK-) (Stratagene). Positive colonies were randomly picked, and those with inserts > 1 kb after digestion were end-sequenced. 2) Plastid DNA was used to make uncloned, adaptor-ligated libraries for a genome-walking approach using long-distance PCR (GenomeWalker kit, Clontech, Palo Alto, USA). Gaps in the genome were filled by PCR, based on the predicted gene organization in red-lineage plastids. The F. vesiculosus plastid genome was assembled using CodonCode Aligner (CodonCode Corp., USA). Protein-coding genes and putative open reading frames (ORFs) were identified by database comparison (Blastx) and online tools (ORF Finder, NCBI). Ribosomal and tRNA genes were identified using RNAmmer (http://www.cbs.dtu.dk/services/RNAmmer/) and ARAGORN (http://130.235.46.10/ARAGORN/), respectively.

The two plastid sequences are available under the following EMBL accession numbers: E. siliculosus (FP102296) and F. vesiculosus (FM957154). The physical maps of the circular genomes were drawn using GenomeVx (freely available at wolfe.gen.tcd.ie/GenomeVx/). [...]

For global gene content comparisons, the two brown algal plastid genomes were analysed together with those of the xanthophyte V. litorea and the raphidophyte H. akashiwo, plus the 15 algal sequences and the two reference cyanobacterial genomes analysed by Khan et al. The phylogenetic analyses were conducted with a total of two cyanobacterial and 18 plastid genomes, including four complete genomes from red algae and nine from chromist species (see additional file, Table S4). Three concatenated protein datasets were constructed from these genomes (additional file, Table S3). The first dataset corresponded to the 44 plastid protein-coding genes shared by all 20 species. In addition, a larger dataset of 83 proteins was built using all the plastid proteins common to the 13 red, cryptophyte, haptophyte and heterokont algae. A list of gene synonyms used during this study is provided in the additional file (Table S5), together with complementary gene annotation information. Single and concatenated protein sequences were aligned using MUSCLE, and each alignment was further optimised using GBlocks. Datasets for individual genes were first analysed using maximum likelihood in order to eliminate genes derived from horizontal transfer. Only the rpl36 protein phylogeny suggested a non-red-algal origin for the haptophyte and cryptophyte genes, which grouped far outside the red algal and heterokont cluster, as previously reported. This gene was therefore eliminated from the full 83-protein dataset. The average distance was calculated for each protein with Tree-Puzzle. We excluded 50 "fast-evolving" protein sequences to produce a dataset of 33 "slowly-evolving" proteins, each with an average distance below the threshold of 0.6. This value was chosen in order to conserve at least half of the analysed positions for the 33-protein dataset.

Phylogenetic analyses of concatenated protein data were carried out on 8,652, 16,738 and 8,404 amino acids, corresponding, respectively, to the 44-, 83- and 33-protein datasets.
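
The selection of "slowly-evolving" proteins by an average-distance threshold can be illustrated with a minimal Python sketch. It assumes that the "average distance" of a protein is the mean of all pairwise distances in its distance matrix; the protocol does not state exactly how the average was derived from the Tree-Puzzle output, so this definition, the function names and the toy matrices are all hypothetical.

```python
from itertools import combinations


def mean_pairwise_distance(matrix):
    """Mean of the upper-triangle entries of a square, symmetric distance matrix.

    Assumption: "average distance" means the mean over all taxon pairs.
    """
    n = len(matrix)
    if n < 2:
        raise ValueError("need at least two taxa to compute a pairwise distance")
    pairs = list(combinations(range(n), 2))
    return sum(matrix[i][j] for i, j in pairs) / len(pairs)


def select_slow_proteins(distance_matrices, threshold=0.6):
    """Return the sorted names of proteins whose average distance is below the threshold.

    distance_matrices maps a protein name to its pairwise distance matrix.
    The default of 0.6 is the threshold quoted in the protocol.
    """
    return sorted(
        name
        for name, matrix in distance_matrices.items()
        if mean_pairwise_distance(matrix) < threshold
    )


if __name__ == "__main__":
    # Toy three-taxon matrices with invented values, for illustration only.
    toy = {
        "protein_a": [[0.0, 0.1, 0.2], [0.1, 0.0, 0.3], [0.2, 0.3, 0.0]],
        "protein_b": [[0.0, 0.8, 0.9], [0.8, 0.0, 1.0], [0.9, 1.0, 0.0]],
    }
    print(select_slow_proteins(toy))
```

The comparison is strict, so a protein whose average distance equals the threshold is treated as fast-evolving, in line with the wording "under the threshold".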
A Maximum Likelihood (ML) approach was used to reconstruct phylogenetic trees using PHYML under both the cpREV and JTT amino acid substitution matrices, with four gamma-distributed rate categories and estimated invariable sites. The neighbor-joining (NJ) method was applied with the JTT amino acid substitution matrix using the Phylip software package. For both the ML and NJ methods, bootstrap analyses of 1,000 replicates were used to provide confidence estimates for the phylogenetic tree topologies. Finally, Bayesian inference (BI) analyses were performed with PhyloBayes 3.1d using four gamma-distributed rate categories. PhyloBayes was run using the site-heterogeneous CAT model as described in Lartillot et al., with two independent chains of a total length of up to 25,000 cycles; the first 25% of cycles were discarded as burn-in before the posterior consensus tree was calculated. Furthermore, a saturation test was performed on the different datasets to calculate the observed and predicted homoplasy rates as described in the PhyloBayes user manual.

To statistically test the topologies of the trees, approximately unbiased (AU) and Shimodaira-Hasegawa (SH) analyses were performed on four topologies. These were selected to reflect the relative positions of haptophyte, cryptophyte and heterokont plastids and were generated by rearrangement of ML and NJ trees (if required). Site likelihoods for each topology were calculated using Tree-Puzzle on the two different concatenated datasets, and the AU/SH tests were performed using CONSEL 0.1. […]
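
The burn-in arithmetic for the Bayesian chains can be made explicit with a short Python sketch. It only illustrates how many cycles are discarded and kept for a given chain length; PhyloBayes provides its own tools for summarising chains, and the function names below are hypothetical. Since the chains ran "up to" 25,000 cycles, the actual length of a given chain may be shorter than the value used in the example.

```python
def split_burn_in(n_cycles, burn_in_fraction=0.25):
    """Return (discarded, kept) cycle counts for a chain of n_cycles.

    The default fraction of 0.25 is the burn-in quoted in the protocol.
    """
    if n_cycles < 0:
        raise ValueError("n_cycles must be non-negative")
    if not 0 <= burn_in_fraction < 1:
        raise ValueError("burn_in_fraction must be in [0, 1)")
    discarded = int(n_cycles * burn_in_fraction)  # round down to whole cycles
    return discarded, n_cycles - discarded


def discard_burn_in(samples, burn_in_fraction=0.25):
    """Drop the leading burn-in portion of a list of sampled states."""
    discarded, _ = split_burn_in(len(samples), burn_in_fraction)
    return samples[discarded:]


if __name__ == "__main__":
    discarded, kept = split_burn_in(25_000)
    print(f"discarded={discarded}, kept={kept}")
```

For a full-length chain of 25,000 cycles, a 25% burn-in removes the first 6,250 cycles and leaves 18,750 for the consensus.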

Pipeline specifications