Computational protocol: Functional Annotation, Genome Organization and Phylogeny of the Grapevine (Vitis vinifera) Terpene Synthase Gene Family Based on Genome Assembly, FLcDNA Cloning, and Enzyme Assays

[…] The predicted proteome of the 12-fold coverage grapevine genome sequence assembly (GenBank, NCBI project ID 18785; Genoscope website) was screened with two HMM profiles of the PFAM motifs PF01397 (N-terminal TPS domain) and PF03936 (TPS, metal binding domain). In addition, the 12-fold genome sequence assembly was screened (TBLASTN) with known TPS sequences from Swiss-Prot so that the search did not depend on the automatic annotation. All BLAST hits with an e-value lower than 1e-4 were individually evaluated. The 152 loci exhibiting significant similarities with known TPS were manually annotated to correct erroneous automatic annotation and to discriminate between complete, partial and pseudo-TPS. Genomic regions whose similarity to TPS spanned fewer than 50 amino acids were not considered. The manual annotation was based on the results of the EuGène predictor-combiner software, which was specifically trained for Vitis vinifera; sequence alignments with previously characterized TPS proteins and related PFAM motifs; spliced alignments of cognate EST and cDNA sequences; and knowledge of TPS gene structure and protein sequences. Data and other related information were imported and merged in the ARTEMIS tool to evaluate each resource and produce the final annotation. The EuGène predictions, the manual structural annotation of the 152 loci and the corresponding sequences are available in the FLAGdb++ database. Protein sequences deduced from the 69 full VvTPS genes were analyzed with ChloroP for prediction of N-terminal plastidial targeting peptides [...] Amino acid alignments were made using Dialign with a threshold value of 10. Manual adjustments such as aligning conserved motifs and manual trimming were performed using GeneDoc. For all analyses, sequence information upstream of the partially conserved RR(X)8W motif was trimmed.
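The trimming step at the RR(X)8W motif can be illustrated with a minimal Python sketch. It is not the authors' procedure, since the protocol describes manual trimming in GeneDoc. The function name and the regular expression are assumptions made for illustration: the motif is read literally as two arginines, any eight residues, then a tryptophan, and the input is assumed to be an ungapped protein sequence. Because the motif is only partially conserved, sequences without an exact match return None so they can be inspected by hand.

```python
import re

# Literal reading of RR(X)8W: Arg-Arg, any eight residues, then Trp.
# Assumption: variants of this partially conserved motif are not matched.
RRX8W = re.compile(r"RR[A-Z]{8}W")


def trim_upstream_of_rrx8w(protein_seq):
    """Return the sequence from the first RR(X)8W match onward.

    Returns None when no exact match is found, so that sequences with a
    divergent motif can be handled manually.
    """
    match = RRX8W.search(protein_seq.upper())
    if match is None:
        return None
    # Keep the motif itself and everything downstream of it.
    return protein_seq[match.start():]


if __name__ == "__main__":
    # Toy sequence, not a real VvTPS protein.
    toy = "MASTLL" + "RR" + "AAAAAAAA" + "W" + "DDLK"
    print(trim_upstream_of_rrx8w(toy))
```

Returning None instead of the untrimmed sequence makes a missing motif visible, which matters when some family members carry a degenerate version of it.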
Maximum likelihood analyses were completed using PhyML. For each analysis, the LG amino acid substitution model and four substitution rate categories were used, the proportion of invariable sites and the gamma distribution parameter were estimated, and the branch lengths and tree topology were optimized from the data set. The estimated values for the proportion of invariable sites and the gamma shape parameter were then used when performing 100 bootstrap replicates. Phylogenetic trees were visualized using TreeView […]
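The two-pass scheme described above (estimate the rate parameters first, then fix them for the bootstrap run) can be sketched as a pair of PhyML command lines built in Python. This assumes a local command-line PhyML 3 installation invoked as `phyml`, which the protocol does not state. The file name `tps_trimmed.phy` and the numeric values in the second call are hypothetical placeholders, not results from the study.

```python
def phyml_command(alignment, bootstrap=0, pinv="e", alpha="e", optimize="tlr"):
    """Build a PhyML 3 argument list for an amino acid alignment.

    pinv and alpha accept "e" (estimate from the data) or a fixed value.
    """
    return [
        "phyml",
        "-i", alignment,       # PHYLIP-format alignment file
        "-d", "aa",            # amino acid data
        "-m", "LG",            # LG substitution model
        "-c", "4",             # four substitution rate categories
        "-v", str(pinv),       # proportion of invariable sites
        "-a", str(alpha),      # gamma shape parameter
        "-o", optimize,        # t = topology, l = branch lengths, r = rates
        "-b", str(bootstrap),  # number of bootstrap replicates
    ]


if __name__ == "__main__":
    # Pass 1: estimate pinv and alpha, no bootstrap.
    first = phyml_command("tps_trimmed.phy")
    # Pass 2: reuse the estimates (placeholder values shown here)
    # for 100 bootstrap replicates.
    second = phyml_command("tps_trimmed.phy", bootstrap=100,
                           pinv=0.05, alpha=1.2, optimize="tl")
    print(" ".join(first))
    print(" ".join(second))
```

Each list can be passed to `subprocess.run` once the real estimates from the first run have been read from the PhyML statistics output. Fixing the two parameters avoids re-estimating them in every replicate.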

