Computational protocol: Characterization of UGT716A1 as a Multi-substrate UDP:Flavonoid Glucosyltransferase Gene in Ginkgo biloba

[…] To obtain clean reads for de novo assembly and further analyses, all raw reads from RNA-seq were assembled with Trinity. The EMBOSS toolbox was used to find the amino acid sequences of the contigs. These amino acid sequences were then searched with blastp against the GenBank Nr (NCBI non-redundant protein sequences), GO (Gene Ontology), KEGG (Kyoto Encyclopedia of Genes and Genomes), and KOG (euKaryotic Ortholog Groups)/COG (Clusters of Orthologous Groups) databases, with an E-value < 1e-5. GO predictions were performed with blastp against the Swiss-Prot and TrEMBL databases (E-value < 1e-5); the blastp results were then passed to GoPipe, following the gene2go program, to obtain GO information for the top-matching predicted proteins. A keyword search for “glycosyltransferase” or “glucosyltransferase” yielded 121 GT unigenes, 25 of which were annotated as UDP:flavonoid glucosyltransferase. [...] Multiple sequence alignments of the target GbUGTs were performed using CLUSTAL W, and phylogenetic trees were constructed with MEGA 6.0 using the neighbor-joining method with 1,000 bootstrap replications. Distances were calculated with the Poisson correction, and branch lengths were shown only when values were above 50%. […]
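
Two steps of the protocol above can be illustrated with a minimal Python sketch: selecting glycosyltransferase candidates from blastp hits (E-value < 1e-5 plus keyword match), and the Poisson-corrected distance d = -ln(1 - p) that underlies the neighbor-joining tree. This is not the authors' code. The four-column tab-separated layout (`qseqid sseqid evalue stitle`), the function names, and the choice to skip gapped alignment columns pairwise are all assumptions made for illustration; MEGA's own gap treatment depends on the options selected there.

```python
import math

# Search terms and E-value threshold as stated in the protocol.
KEYWORDS = ("glycosyltransferase", "glucosyltransferase")
EVALUE_CUTOFF = 1e-5


def filter_gt_hits(blast_lines, keywords=KEYWORDS, cutoff=EVALUE_CUTOFF):
    """Return query IDs with a hit below the E-value cutoff whose subject
    title contains one of the keywords (case-insensitive).

    Assumed input: tab-separated lines with the columns
    qseqid, sseqid, evalue, stitle (a hypothetical layout).
    """
    selected, seen = [], set()
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, _sseqid, evalue, stitle = line.split("\t")[:4]
        if float(evalue) >= cutoff:
            continue  # hit is not significant enough
        title = stitle.lower()
        if qseqid not in seen and any(k in title for k in keywords):
            seen.add(qseqid)
            selected.append(qseqid)
    return selected


def p_distance(seq_a, seq_b):
    """Proportion of differing residues between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected amino acid distance, d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Made-up example hits, for demonstration only.
    hits = [
        "unigene1\tsubjA\t1e-20\tUDP-glucose flavonoid 3-O-glucosyltransferase",
        "unigene2\tsubjB\t1e-3\tglycosyltransferase family 1 protein",
        "unigene3\tsubjC\t1e-30\tchalcone synthase",
    ]
    print(filter_gt_hits(hits))
    print(poisson_distance("ACDEFGHIKL", "ACDEFGHIKV"))
```

The pairwise distances computed this way would form the matrix that a neighbor-joining implementation takes as input; tree construction and bootstrapping themselves are left to MEGA, as in the protocol.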

Pipeline specifications

Software tools: Trinity, EMBOSS, BLASTP, Clustal W, MEGA
Databases: UniProt, KEGG
Applications: Phylogenetics, RNA-seq analysis
Organisms: Ginkgo biloba, Escherichia coli, Homo sapiens, Arabidopsis thaliana, Vitis vinifera, Medicago truncatula
Chemicals: Flavonoids, Uridine Diphosphate