Computational protocol: Proteome Adaptation to High Temperatures in the Ectothermic Hydrothermal Vent Pompeii Worm

Protocol publication

[…] Alvinellid specimens were collected from the surface of deep-sea hydrothermal vent chimneys along the East Pacific Rise (EPR) using the telemanipulated arm of either the manned submersible Nautile or the ROV Victor6000. They were brought back to the surface inside an insulated basket during the oceanographic cruises Phare2002 (13°N/EPR) and BioSpeedo2004 (18°25′S/EPR). Once on board, individuals were dissected in RNAlater and then frozen in liquid nitrogen. Although the samples were not subject to specific property regulations, the authors obtained permission from both chief scientists to use them for genomic analyses.

EST sequencing of the A. pompejana and P. grasslei cDNA libraries yielded 15,858 transcripts encoding 9,221 proteins for the former (accession numbers FP489021 to FP539727 and FP539730 to FP565142) and 1,653 unigenes for the latter (including the 335 orthologous genes used in the comparative analysis; see Dataset S2). Of these, 259 and 80 ORFs, respectively, encode ribosomal proteins. A set of orthologous ribosomal protein genes was searched for across our own libraries and across complete genomes and mollusc/annelid EST libraries in the GenBank/EMBL and JGI databases, using reciprocal tBLASTx (p-value threshold of 1e−20). The species from which the concatenated set of ribosomal protein transcripts was obtained, either from complete genomes (*) or from EST libraries (**), are listed as follows with their abbreviations: Hsapiens: Homo sapiens (*), Dmelano: Drosophila melanogaster (*), Celegans: Caenorhabditis elegans (*), Strong: Strongylocentrotus purpuratus (*), Capite: Capitella teleta (*), Helobd: Helobdella robusta (*), Bfloridae: Branchiostoma floridae (*), Lumbri: Lumbricus rubellus (**), Areni: Arenicola marina (**), Crasso: Crassostrea gigas (**), Argope: Argopecten irradians (**). Sequences were translated using Se-Al v2.0 and then aligned with ClustalW.
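The reciprocal tBLASTx search can be illustrated with a minimal Python sketch of reciprocal best-hit filtering. It assumes that both searches (library A against B, and B against A) have already been run and parsed into hypothetical (query, subject, score) tuples, and it treats the protocol's 1e−20 cutoff as an E-value threshold. The function names and sequence identifiers are illustrative only and are not part of the original pipeline.

```python
from typing import Dict, List, Tuple

# One parsed hit: (query_id, subject_id, evalue). Hypothetical input format,
# e.g. obtained from BLAST tabular output.
Hit = Tuple[str, str, float]

E_THRESHOLD = 1e-20  # cutoff quoted in the protocol, treated here as an E-value


def best_hits(hits: List[Hit], max_evalue: float = E_THRESHOLD) -> Dict[str, str]:
    """Return the lowest-E-value subject for each query, ignoring hits above the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

For example, with a_vs_b = [("Apomp_L5", "Pgras_L5", 1e-80), ("Apomp_L5", "Pgras_L7", 1e-25)] and b_vs_a = [("Pgras_L5", "Apomp_L5", 1e-78)], only the pair ("Apomp_L5", "Pgras_L5") is kept as a putative ortholog pair.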
Amino-acid and codon alignments were then checked by eye, concatenated and exported in PHYLIP format from a series of partial ribosomal protein genes (L5, L7, L7A, L9, L10, L10A, L12, L13, L13A, L14, L17, L17A, L18, L18A, L19, L21, L23A, L26, L27, L27A, L30, L31, L32, L35, L36, L37, L37A, L39, P2, S2, S3A, S4, S6, S7, S8, S13, S15, S15A, S16, S17, S18, S21, S23, S24, S25, S27, S27A). Regions containing gaps, misalignments or uncertainties were excluded from the analysis. [...] Observed amino-acid and codon frequencies were estimated from 46 concatenated ribosomal protein genes (5,991 codons) using the codeml program of PAML v3.14 with the 'universal' genetic code. Standard deviations of the frequencies were obtained from 100 bootstrap resamplings of the dataset. The user tree was obtained with the ProML program of PHYLIP v3.65, using the JTT model of amino-acid substitution, the hidden Markov model (HMM) option and a Gamma+I distribution of rates among sites. Sequence hydrophobicity was calculated with a hydrophobicity index based on Sweet & Eisenberg's optimal matching hydrophobicity (OMH) scale, i.e. a weighted sum of the residue values D: −1.31, E: −1.22, N: −0.92, Q: −0.91, G: −0.67, K: −0.67, H: −0.64, R: −0.59, S: −0.55, P: −0.49, A: −0.40, T: −0.28, C: +0.17, W: +0.50, V: +0.91, M: +1.02, L: +1.22, I: +1.25, Y: +1.67, F: +1.92. This index takes into account the ability of an amino acid to be replaced by another during the course of evolution. The regression between GARP (Gly, Ala, Arg, Pro) and FYMINK (Phe, Tyr, Met, Ile, Asn, Lys) residue contents and the corresponding 99% confidence intervals were computed with Statistica 8. Branches and codons under positive selection were searched for with PAML v3.14, using likelihood-ratio tests (LRTs) between codon models and the lophotrochozoan group as the phylogenetic background.
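As a rough illustration of the composition and hydrophobicity calculations, the Python sketch below computes amino-acid frequencies, GARP and FYMINK contents, and an OMH hydrophobicity index with a bootstrap standard deviation. It assumes that the index is the frequency-weighted sum of the OMH values listed above (i.e. the mean OMH value per residue) and that bootstrapping resamples alignment columns with replacement. It does not reproduce codeml's frequency estimates, and all function names are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Tuple

# Sweet & Eisenberg OMH values, as listed in the protocol.
OMH: Dict[str, float] = {
    "D": -1.31, "E": -1.22, "N": -0.92, "Q": -0.91, "G": -0.67,
    "K": -0.67, "H": -0.64, "R": -0.59, "S": -0.55, "P": -0.49,
    "A": -0.40, "T": -0.28, "C": 0.17, "W": 0.50, "V": 0.91,
    "M": 1.02, "L": 1.22, "I": 1.25, "Y": 1.67, "F": 1.92,
}


def aa_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequency of each standard amino acid; gaps and unknowns are ignored."""
    residues = [aa for aa in seq.upper() if aa in OMH]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    n = len(residues)
    return {aa: residues.count(aa) / n for aa in OMH}


def garp_fymink(seq: str) -> Tuple[float, float]:
    """Summed frequencies of GARP (G, A, R, P) and FYMINK (F, Y, M, I, N, K) residues."""
    freqs = aa_frequencies(seq)
    return sum(freqs[aa] for aa in "GARP"), sum(freqs[aa] for aa in "FYMINK")


def omh_index(seq: str) -> float:
    """Assumed form of the index: frequency-weighted sum of OMH values."""
    freqs = aa_frequencies(seq)
    return sum(freqs[aa] * value for aa, value in OMH.items())


def bootstrap_sd(alignment: Dict[str, str], replicates: int = 100,
                 seed: int = 1) -> Dict[str, float]:
    """Per-taxon SD of the OMH index over bootstrap resamplings of alignment columns."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    length = lengths.pop()
    rng = random.Random(seed)
    samples: Dict[str, List[float]] = {taxon: [] for taxon in alignment}
    for _ in range(replicates):
        # The same resampled columns are used for every taxon in a replicate.
        cols = [rng.randrange(length) for _ in range(length)]
        for taxon, seq in alignment.items():
            samples[taxon].append(omh_index("".join(seq[c] for c in cols)))
    return {taxon: statistics.stdev(values) for taxon, values in samples.items()}
```

For instance, omh_index("DF") is the mean of −1.31 and +1.92, about 0.305, while a sequence made only of alanines always scores −0.40 and therefore has a bootstrap standard deviation of zero.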
We tested whether the ribosomal proteins evolved under different selective constraints across lineages by comparing the one-ratio model M0 (a single dN/dS ratio, ω, for all branches) with the free-ratio branch model (one ω per branch) implemented in codeml. We then identified sites that experienced positive selection during annelid evolution using a branch-site model of selection, with A. pompejana, P. grasslei, the alvinellid internal branch and the polychaete internal branch each specified in turn as the foreground lineage. This model includes four site classes: ω < 1 and ω = 1 on all branches, plus ω > 1 on the foreground lineage against either a purifying (ω < 1) or a neutral (ω = 1) background. The significance of this model was evaluated with an LRT against a null model, the 'nearly neutral' site model M1a, which has two classes of dN/dS (0 < ω < 1 and ω = 1). When the LRT was significant, the Bayes empirical Bayes (BEB) method was used to compute posterior probabilities of the ω classes and to identify sites under selection. Finally, we determined whether the inferred positively selected sites could be associated with a specific amino-acid 'category' in the foreground lineage or internal branch. This allowed us to assess whether thermal/depth adaptation is a recently derived or an ancestral character. [...] 3D models of the 49 complete ribosomal proteins were built by comparative modelling with MODELLER 9v8. The structures of the Saccharomyces cerevisiae or Tetrahymena thermophila homologues of the large (29 proteins; PDB entry 1S1I) and small (20 proteins; PDB entries 1S1H or 2XZM) ribosomal subunits were used as templates, following a series of parameters. The alignments used for homology modelling were refined manually, taking into account the predicted secondary structures of both the query proteins and the templates.
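The LRTs described above compare twice the log-likelihood difference between nested models with a chi-square distribution. The hedged sketch below computes that statistic and its p-value from log-likelihoods such as those reported by codeml; the chi-square upper-tail probability is implemented directly for integer degrees of freedom so that the example needs only the Python standard library. The function names are hypothetical, and the degrees of freedom must be supplied by the user: for branch-site model A against M1a this is commonly taken as 2, and for the free-ratio model against M0 it is the number of branches minus one.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df = 2m + 1: Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{m} y^(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def lrt(lnl_null: float, lnl_alt: float, df: int) -> Tuple[float, float]:
    """LRT statistic 2 * (lnL_alt - lnL_null), floored at 0, and its chi-square p-value."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)
```

With purely illustrative log-likelihoods, lrt(-10000.0, -9995.0, 2) gives a statistic of 10.0 and a p-value of exp(−5), about 0.0067.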
Fifteen models were generated for each subunit and their quality was assessed using the MODELLER objective function; the best model was retained and its side chains were repositioned in optimized conformations with SCWRL. The ribosomal subunit models were finally subjected to energy minimization with the GROMOS96 force field implemented in DeepView 4.0 (Swiss-PdbViewer). The accessible surface area (ASA) of each residue was then determined with the ANAREA program, using a 1.4 Å probe radius and Shrake and Rupley's atomic van der Waals radii. A residue was defined as buried if its ASA was below 20 Å², intermediate if it was between 20 and 60 Å², and exposed if it was 60 Å² or more. The twenty amino acids were classified as hydrophilic, charged or hydrophobic, and replacements were sorted as structurally neutral, negative or positive according to their buried/exposed status and their ability to create or remove electrostatic or hydrogen bonds. In short, a replacement between Alvinella and Paralvinella was considered conservative if both residues belonged to the same category. Conservative replacements were considered neutral if they did not markedly change the residue volume and did not form additional ionic or hydrogen bonds. Non-conservative replacements were those that changed the hydrophilic/hydrophobic status of the residue. Replacements of a hydrophilic residue by a hydrophobic one at a buried position, and of a hydrophobic residue by a hydrophilic one at an exposed position, were considered positive because they usually stabilize the protein. Conversely, replacements of a hydrophilic residue by a hydrophobic one at an exposed position, or of a hydrophobic residue by a hydrophilic one at a buried position, were considered destabilizing.
This criterion was, however, refined by considering replacements involving charged residues separately. Polar and charged residues at buried positions were considered negative unless they could form an additional electrostatic or hydrogen bond, since charges do not always have a destabilizing or stabilizing effect when placed in a buried or exposed environment, respectively. The 3D models were also used to examine the positions of alanine replacements in the ribosomal proteins of the two alvinellid species. […]
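The ASA thresholds and replacement rules above can be condensed into a small decision function. This Python sketch is a simplified, hypothetical reading of the criteria, not the authors' code. The assignment of residues to the hydrophilic, charged and hydrophobic classes is an assumption (the protocol does not list it), volume changes are not checked, and whether a replacement forms an additional ionic or hydrogen bond must be supplied from inspection of the 3D model. Treating a buried polar or charged residue that gains a bond as positive is also an assumption, and cases the quoted rules leave open (intermediate sites, hydrophilic/charged swaps outside the core) are returned as "inspect".

```python
# HYPOTHETICAL residue classes: the protocol does not state which residues
# were placed in each category.
CHARGED = set("DEKR")
HYDROPHILIC = set("STNQHYGP")
HYDROPHOBIC = set("AVLIMFWC")


def category(aa: str) -> str:
    """Assign a residue to one of the three (assumed) categories."""
    if aa in CHARGED:
        return "charged"
    if aa in HYDROPHILIC:
        return "hydrophilic"
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    raise ValueError(f"unknown residue {aa!r}")


def burial(asa: float) -> str:
    """Burial class from per-residue ASA in square angstroms (protocol thresholds)."""
    if asa < 0:
        raise ValueError("ASA cannot be negative")
    if asa < 20:
        return "buried"
    if asa < 60:
        return "intermediate"
    return "exposed"


def replacement_effect(old: str, new: str, asa: float, new_bond: bool = False) -> str:
    """Classify old -> new as 'neutral', 'positive', 'negative' or 'inspect'.

    new_bond: whether the 3D model shows the new residue forming an extra
    ionic or hydrogen bond (assessed separately, e.g. by eye).
    """
    site = burial(asa)
    cat_old, cat_new = category(old), category(new)
    if cat_old == cat_new:
        # Conservative; the volume criterion for 'neutral' is not checked here.
        return "inspect" if new_bond else "neutral"
    polar_old = cat_old != "hydrophobic"
    polar_new = cat_new != "hydrophobic"
    if site == "buried" and polar_new:
        # Refinement: buried polar/charged residues are destabilizing unless
        # they add a bond (treated as stabilizing here, an assumption).
        return "positive" if new_bond else "negative"
    if polar_old == polar_new:
        # Hydrophilic <-> charged swap outside the core: not covered by the rules.
        return "inspect"
    if site == "buried":
        return "positive"  # hydrophilic -> hydrophobic in the core
    if site == "exposed":
        return "positive" if polar_new else "negative"
    return "inspect"  # intermediate sites are not covered by the rules
```

For example, replacement_effect("S", "L", 5.0) classifies a serine-to-leucine change at a buried site as positive, whereas replacement_effect("L", "S", 5.0) classifies the reverse change as negative.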

Pipeline specifications