Computational protocol: Endocytosis mediated siderophore uptake as a strategy for Fe acquisition in diatoms

Protocol publication

[…] The ISIP1 protein sequence (Phatr3_J55031) was used as the seed sequence for a BLASTp search against a curated library comprising all protein sequences in UniRef, the MMETSP, and all additional sequences hosted on the JGI portal. This composite database, described by Dorrell et al., is more extensive than any individual resource. Sequences retrieved with a threshold e value of 1 × 10^−5 were verified to be ISIP1 by a BLASTp best hit against version 3 of the P. tricornutum genome. The resulting set of sequences was aligned first with 20 iterations of MAFFT v8 and then with the inbuilt alignment builder in Geneious v.4.76 under the default criteria. Where multiple sequences were retrieved for an individual species, a species consensus was generated. Only 36 species were found to contain ISIP1, of which 28 were from diatom transcriptomes and 8 were from dinoflagellates, pelagophytes, and haptophytes (table S1). These sequences were realigned using Geneious and trimmed at the N terminus to the first methionine present and at the C terminus to the last residue of the protein (with 70% identity). Trees were inferred using the MrBayes and RAxML programs inbuilt into the CIPRES webserver (table S1). Bayesian trees were inferred using three substitution models (GTR, Jones, and WAG), a minimum of 600,000 generations, and an initial burn-in discard value of 0.5. Trees were used only if the final convergence statistic between the two chains was <0.1, and tree calculation was stopped automatically once this statistic fell below 0.01. RAxML trees were inferred using three substitution models (GTR, JTT, and WAG) with automatic bootstrapping, as previously described by Dorrell et al. […]
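The retrieval and verification step can be illustrated with a minimal sketch. It assumes both BLASTp searches were exported in the standard 12-column tabular format (`-outfmt 6`), which the protocol does not state. It also assumes that "best hit" means the highest bit score. The subject identifiers (`seqA`, `seqB`, `seqC`, `Phatr3_J00000`) and all function names are hypothetical; only the seed identifier and the e-value threshold come from the text.

```python
EVALUE_CUTOFF = 1e-5       # threshold stated in the protocol
SEED_ID = "Phatr3_J55031"  # ISIP1 seed sequence named in the protocol


def parse_tabular(text):
    """Yield (query, subject, evalue, bitscore) from BLAST tabular text.

    Assumes the default 12-column layout of -outfmt 6, where column 11
    is the e-value and column 12 is the bit score.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        yield cols[0], cols[1], float(cols[10]), float(cols[11])


def forward_hits(text, cutoff=EVALUE_CUTOFF):
    """Subjects hit by the seed with an e-value at or below the cutoff."""
    return {s for q, s, e, _ in parse_tabular(text)
            if q == SEED_ID and e <= cutoff}


def best_hits(text):
    """Map each query to its single highest-bit-score subject."""
    best = {}
    for q, s, _, bits in parse_tabular(text):
        if q not in best or bits > best[q][1]:
            best[q] = (s, bits)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_candidates(forward_text, reverse_text):
    """Keep forward hits whose best hit in the reverse search is the seed."""
    back = best_hits(reverse_text)
    return sorted(c for c in forward_hits(forward_text)
                  if back.get(c) == SEED_ID)


def row(query, subject, evalue, bitscore):
    """Build one tabular line; the alignment columns are dummy values."""
    return "\t".join([query, subject, "50.0", "100", "0", "0",
                      "1", "100", "1", "100", str(evalue), str(bitscore)])


# Hypothetical forward search: seed against the composite library.
FORWARD_DEMO = "\n".join([
    row(SEED_ID, "seqA", 1e-50, 200.0),
    row(SEED_ID, "seqB", 1e-8, 60.0),
    row(SEED_ID, "seqC", 0.01, 30.0),   # fails the e-value cutoff
])

# Hypothetical reverse search: candidates against the P. tricornutum proteome.
REVERSE_DEMO = "\n".join([
    row("seqA", SEED_ID, 1e-48, 190.0),
    row("seqA", "Phatr3_J00000", 1e-5, 45.0),
    row("seqB", "Phatr3_J00000", 1e-30, 120.0),  # best hit is not the seed
    row("seqB", SEED_ID, 1e-10, 70.0),
])

print(reciprocal_candidates(FORWARD_DEMO, REVERSE_DEMO))
```

In this toy input, `seqC` is dropped by the e-value cutoff and `seqB` is dropped because its best hit in the reverse search is a different protein, so only `seqA` is retained.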
We curated the Tara Oceans data set to consider only data collected from surface samples, which were obtained consistently from the top 10 m of the ocean (excluding samples collected from the deep chlorophyll maximum) across 68 stations in the global ocean transect, for which both metagenome and metatranscriptome data are available. We pooled the data from four different size fractions (0.8 to 5 μm, 5 to 20 μm, 20 to 180 μm, and 180 to 2000 μm) to get near-exhaustive recovery of total eukaryotic biodiversity in each sample. A detailed description of the Tara Oceans field sampling strategy and protocols is available in the study of Carradec et al. Next, because ISIP1 does not belong to a particular protein family in any profile hidden Markov model (pHMM) database, such as Pfam, we built a pHMM from the most conserved homologs retrieved by the reciprocal BLAST best-hit search (described in the previous section); that is, we kept only sequences fully covered by the multiple sequence alignment produced by the phylogenetic analysis. We used the pHMM along with the software package HMMER to retrieve all sequences from the Tara Oceans metagenome and metatranscriptome that matched our model of ISIP1. Only sequences taxonomically assigned as diatoms and not previously associated with any function (all Tara Oceans sequences were functionally annotated using Pfam and HMMER) were considered true positives. For the true positives retrieved, taxonomic assignment was available and is accessible at […]. We have provided the full catalog of hits and their phylogenetic assignment, which was used to build […], in table S1. The expression values and genomic occurrences were computed in C++ as RPKM, normalized against the 20 most stable diatom ribosomal proteins in the Tara Oceans samples (a robust data set of 30,000 sequences).
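The RPKM normalization can be made concrete with a short sketch. The RPKM formula itself is the standard one (reads per kilobase of sequence per million mapped reads). How the ribosomal-protein reference enters the normalization is not specified in the text, so dividing by the median RPKM of the reference set is an assumption made here purely for illustration. The authors' implementation was in C++, and the function names and counts below are hypothetical.

```python
from statistics import median


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads (standard definition)."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def normalized_abundance(target, reference_genes, total_mapped_reads):
    """RPKM of a target sequence relative to a reference gene set.

    target and each entry of reference_genes are (read_count, length_bp)
    pairs. ASSUMPTION: the reference is summarized by its median RPKM;
    the protocol only says values were normalized against the 20 most
    stable diatom ribosomal proteins.
    """
    target_rpkm = rpkm(target[0], target[1], total_mapped_reads)
    reference_rpkm = median(
        rpkm(count, length, total_mapped_reads)
        for count, length in reference_genes
    )
    return target_rpkm / reference_rpkm


# Hypothetical counts for one sample.
TOTAL_READS = 10_000_000
ISIP1_HIT = (500, 1000)                                 # (reads, length in bp)
RIBOSOMAL = [(1000, 500), (2000, 1000), (300, 600)]     # reference set

print(rpkm(ISIP1_HIT[0], ISIP1_HIT[1], TOTAL_READS))
print(normalized_abundance(ISIP1_HIT, RIBOSOMAL, TOTAL_READS))
```

Under this assumed scheme the sample's total read count cancels in the ratio, so the normalized value depends only on length-scaled read counts of the target and the reference set. This makes values comparable across samples sequenced at different depths.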
Statistical analyses were conducted in R (v 3.1.2), and the results were plotted with the R package ggplot2 (v 2.1.0) onto a reference National Center for Biotechnology Information (NCBI) diatom tree. […]

Pipeline specifications

Software tools: BLASTp, MAFFT, Geneious, MrBayes, RAxML, HMMER, ggplot2
Databases: Pfam, UniRef, MMETSP
Applications: Miscellaneous, Phylogenetics
Organisms: Phaeodactylum tricornutum
Chemicals: Iron