Computational protocol: Comprehensive EST analysis of the symbiotic sea anemone, Anemonia viridis

Protocol publication

[…] EST sequences were processed using the SURF analysis pipeline tools (SURF: SeqUence Repository and Feature detection, developed by the SIGENAE team, Dehais Patrice and Eddie Iannucelli, INRA, Toulouse). SURF provided an integrated solution, from chromatogram data storage to cloned insert detection, by combining several dedicated bioinformatic programs (sequence base calling, vector detection, etc.) to produce relevant nucleotide sequences according to base quality and feature detection. The chromatogram files were exported to PHRED for base calling. Cloned inserts were detected from the features identified (vector, adaptor, poly(A) or poly(T) tails, and repeats) and their respective positions, using third-party programs (Crossmatch and RepeatMasker). Only inserts longer than 100 bp, with a Phred score >20, and not belonging to a low-complexity area were exported in FASTA format, each with its corresponding quality file. Sequence ends were further trimmed using the "trimseq" command (EMBOSS package). Low-complexity regions and repeats were masked using the RepeatMasker program. For this purpose, two different libraries were used: the RepeatMaskerLib (RepBase Update of 2007.09.24) and a custom library of A. viridis. This custom library was built both by using CENSOR to retrieve publicly available repeats and by running a BlastN of all ESTs against themselves to identify the most abundant repeat regions. High-quality ESTs (39,939) were then assembled into contigs using the TIGR-TGICL tool. […]

Putative transposable elements (46 sequences) were first identified by homology search, using BlastX against UniProt KB (E-value < 1 × 10⁻²⁰). An additional local BlastN search was performed using the EST dataset both as the query sequence file and as the target database. Repeated motif sequences, i.e. repeats occurring more than twice in non-overlapping ESTs, were selected as a first-screen repeat dataset. These were used, together with the Repbase repeat library, to mask our EST sequences before clustering and assembly (TGICL). The same first-screen dataset was then compared by BlastN with the assembled database, and repeated motif sequences occurring in more than two different UniSeqs (E-value < 1 × 10⁻²⁰) were considered to be A. viridis repeat sequences. […]

Gene functions were automatically assigned to 39% of the predicted proteins (5,652 UniSeqs). This assignment was based on the identification of InterPro (IPR) domains using InterProScan with the following command line: iprscan -cli -i unisequences.fa -o unisequences.ipr.raw -seqtype n -goterms -iprlookup -format raw. For comparative analysis of the IPR domains found in the A. viridis dataset, we also ran InterProScan on the predicted N. vectensis proteome dataset. To homogenize the granularity of annotation between organisms, we kept only the root domain for each non-overlapping set of domains found. We used the hierarchical organization of domains given in the "Parent-Child" description available on the EBI public FTP server. For example, all CYP proteins that have a P450 domain ([InterPro:IPR002949], [InterPro:IPR002397], [InterPro:IPR008070]) were counted at their root domain (IPR001128). […]

All contig and singleton sequences were compared with several databases using Blast: the Nematostella vectensis draft genome (predicted proteins), SwissProt (2008.03), TrEMBL (2008.03), and other ESTs from symbiotic cnidarian species (Acropora millepora, Acropora palmata, Aiptasia pallida, Montastrea faveolata) or from non-symbiotic species (Metridium senile).

To confirm the origin of some selected genes, amplifications were performed on 10 ng of genomic DNA from A. viridis epidermal cells (non-symbiotic cells, animal origin), cultured Symbiodinium cells extracted from A. viridis tentacles (non-symbiotic cells, symbiont origin), and whole tentacle extracts (symbiotic cells, both animal and symbionts). Primers designed in the experiment are presented in Additional file and were used in 40-cycle PCR reactions. Elongation factor 1 alpha and Elongation factor 2 were used as positive controls for nuclear-encoded genes from A. viridis and Symbiodinium spp., respectively, while psbA (photosystem II protein D1) was used as a positive control for chloroplast-encoded Symbiodinium spp. genes. Sequence alignment was done using Multalin. Signal peptide prediction was performed using SignalP. Phylogenetic analyses were done using both MEGA 4.0 and PHYML software. […]
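The second repeat screen described above (motifs hitting more than two different UniSeqs below the E-value threshold) can be illustrated with a minimal sketch. It is not the authors' code. It assumes the BlastN results were saved in the standard 12-column tab-separated tabular format (query ID in column 1, subject ID in column 2, E-value in column 11), and it takes the threshold from the text as 1e-20. The function name select_repeats and the constants are hypothetical.

```python
from collections import defaultdict

# Assumption: the E-value threshold given in the text, read as 1e-20.
EVALUE_CUTOFF = 1e-20
# "More than two different UniSeqs" means at least three distinct subjects.
MIN_UNISEQS = 3


def select_repeats(blast_lines, evalue_cutoff=EVALUE_CUTOFF, min_uniseqs=MIN_UNISEQS):
    """Return the sorted IDs of candidate motifs that hit enough distinct UniSeqs.

    blast_lines: iterable of tab-separated BLAST tabular lines, assumed to
    follow the standard 12-column layout (query, subject, ..., evalue, bitscore).
    """
    subjects_per_query = defaultdict(set)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue < evalue_cutoff:
            # A set keeps each UniSeq once, even with several HSPs per pair.
            subjects_per_query[query].add(subject)
    return sorted(
        query
        for query, subjects in subjects_per_query.items()
        if len(subjects) >= min_uniseqs
    )
```

Counting distinct subject IDs rather than raw hits is a design choice here: several high-scoring pairs between one motif and one UniSeq would otherwise inflate the count.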
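The root-domain step of the InterPro annotation described above can also be sketched in a few lines. This is an illustration under stated assumptions, not the published implementation. The PARENT_OF table contains only the CYP example quoted in the text, entered as direct child-to-parent links (the real "Parent-Child" file may contain intermediate levels, which the loop would follow). Counting each root domain once per sequence is likewise an assumption about how "non-overlapping sets of domains" were tallied. All function and variable names are hypothetical.

```python
from collections import Counter

# Illustrative child -> parent links, limited to the example given in the text.
# Assumption: entered as direct links; a real table would be parsed from the
# InterPro "Parent-Child" file.
PARENT_OF = {
    "IPR002949": "IPR001128",
    "IPR002397": "IPR001128",
    "IPR008070": "IPR001128",
}


def root_domain(ipr, parent_of):
    """Follow parent links until an entry with no recorded parent is reached."""
    seen = set()
    while ipr in parent_of and ipr not in seen:
        seen.add(ipr)  # guard against accidental cycles in the table
        ipr = parent_of[ipr]
    return ipr


def count_root_domains(hits, parent_of):
    """Count root domains across sequences.

    hits: mapping of sequence ID -> iterable of IPR accessions found on it.
    Each root domain is counted once per sequence (an assumption).
    """
    counts = Counter()
    for domains in hits.values():
        roots = {root_domain(d, parent_of) for d in domains}
        counts.update(roots)
    return counts
```

Applying the same mapping to both the A. viridis and the N. vectensis results would put the two datasets at the same annotation granularity before comparison, which is the purpose stated in the text.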

Pipeline specifications

Software tools: InterPro, InterProScan, MultAlin, SignalP, MEGA, PhyML
Applications: Phylogenetics, Protein sequence analysis
Organisms: Nematostella vectensis
Diseases: Ataxia Telangiectasia