Computational protocol: Bioinformatics Analysis of Phylogeny and Transcription of TAA/YUC Auxin Biosynthetic Genes

Protocol publication

[…] For all proteins of interest, searches were run with the basic local alignment search tool BLASTp (NCBI, Bethesda, MD, USA) using default parameters and with the hidden Markov model-based sequence alignment tool HMMER (EMBL-EBI, Hinxton, UK, http://hmmer.org). Best hits were retained and used for phylogenetic analysis []. The proteome of each species was used as reference for the two analyses (), and the YUC, TAR and TAA proteins of Arabidopsis were used as queries (). Analysis results are shown in for YUC proteins and for TAR/TAA proteins.
InterProScan (EMBL-EBI, Hinxton, UK) [] was applied for domain predictions (transmembrane domains (TMD), flavin-dependent monooxygenase domains, tryptophan aminotransferase domains). InterProScan is a platform that integrates multiple, diverse databases for domain prediction, allowing a comparative but integrated approach while reducing redundancy and capitalizing on the individual strengths of the different databases. Reciprocal BLASTp (NCBI, Bethesda, MD, USA) and domain analysis with InterProScan [] were used to verify the relevance of all identified orthologues.
The protein data used in this analysis were downloaded from three databases: PLAZA (http://bioinformatics.psb.ugent.be/plaza), Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html) and ConGenIE (http://congenie.org/); nomenclature follows these databases. [...]
Selected sequences were first aligned with MUSCLE (EMBL-EBI, Hinxton, UK), a multiple sequence alignment tool [], using default parameters. The phylogenetic tree was then constructed with FastTree using default parameters []. FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments. Finally, phylogenetic trees were drawn using the Interactive Tree Of Life, iTOL (EMBL, Heidelberg, Germany) [].
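
The reciprocal BLASTp check mentioned above can be illustrated with a minimal sketch. The protocol does not state the exact criterion used, so this example assumes the common "reciprocal best hit" rule: a query/subject pair is kept only if each is the other's highest-bit-score hit. It assumes both searches were saved in BLAST tabular format (-outfmt 6, 12 tab-separated columns with the bit score last). The function names, the make_row helper and the sequence IDs are hypothetical and used only for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(tabular_lines: Iterable[str]) -> Dict[str, str]:
    """Map each query ID to the subject ID with the highest bit score.

    Assumes BLAST tabular output (-outfmt 6): query ID in column 1,
    subject ID in column 2, bit score in column 12.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    forward: Iterable[str], reverse: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward)   # e.g. Arabidopsis queries vs. target proteome
    rev = best_hits(reverse)   # e.g. target hits vs. Arabidopsis proteome
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


def make_row(query: str, subject: str, bitscore: float) -> str:
    """Example-only helper: build a 12-column row with placeholder values."""
    return "\t".join([query, subject] + ["0"] * 9 + [str(bitscore)])


# Hypothetical IDs, for illustration only.
forward_rows = [
    make_row("query_A", "subject_1", 500.0),
    make_row("query_A", "subject_2", 300.0),
    make_row("query_B", "subject_2", 450.0),
]
reverse_rows = [
    make_row("subject_1", "query_A", 500.0),
    make_row("subject_2", "query_A", 300.0),
]
print(reciprocal_best_hits(forward_rows, reverse_rows))
```

In this toy input, query_B finds subject_2, but subject_2 points back to query_A, so only the query_A/subject_1 pair passes. In the protocol, such a check is combined with InterProScan domain analysis rather than used on its own.
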
The protein domains shown in the different phylogenetic clades were detected with InterProScan (EMBL-EBI, Hinxton, UK) [] [...]
Data used for the RNAseq analysis were obtained from the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/browse/). Five different tissues (leaves, roots, flowers, flower buds, and seeds/siliques) as well as total seedling were chosen for the analysis of the expression patterns of the genes of interest in A. thaliana (). Reads from RNAseq libraries were mapped onto the candidate gene sequences, allowing no mismatches, using TopHat v2.0.14 (University of Maryland, MD, USA) [] with standard settings except that the maximum number of multihits was set to 1, the minimum intron length to 15 bp, and the maximum intron length to 6000 bp. Reads were summed for each gene using htseq-count with the overlap resolution mode set to intersection-nonempty and no strand-specific protocol []. Transcription levels in reads per kilobase of transcript per million mapped reads (RPKM) were normalised to AtSAND (At2g28390). This gene was chosen because of its constant expression levels across different tissues and developmental stages in A. thaliana []. […]
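
The RPKM calculation and normalisation to AtSAND described above can be sketched as follows. The protocol does not give the exact normalisation formula, so this example assumes a simple ratio of each gene's RPKM to the RPKM of the reference gene At2g28390 within the same library. The function names and the input numbers are hypothetical; real inputs would be the per-gene counts from htseq-count, transcript lengths in bp, and the library's total mapped reads.

```python
from typing import Dict


def rpkm(read_count: float, transcript_length_bp: float,
         total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    # reads / (length / 1e3) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def normalise_to_reference(
    counts: Dict[str, float],
    lengths_bp: Dict[str, float],
    total_mapped_reads: float,
    reference_gene: str = "At2g28390",  # AtSAND, as named in the protocol
) -> Dict[str, float]:
    """Express each gene's RPKM relative to the reference gene's RPKM.

    Assumption: normalisation is a plain ratio within one library.
    """
    values = {
        gene: rpkm(counts[gene], lengths_bp[gene], total_mapped_reads)
        for gene in counts
    }
    reference = values[reference_gene]
    if reference == 0:
        raise ValueError("reference gene has no mapped reads in this library")
    return {gene: value / reference for gene, value in values.items()}


# Hypothetical counts and lengths, for illustration only.
example_counts = {"At2g28390": 200, "gene_of_interest": 100}
example_lengths = {"At2g28390": 2000, "gene_of_interest": 500}
print(normalise_to_reference(example_counts, example_lengths, 1_000_000))
```

Under this ratio assumption the total number of mapped reads cancels out within a library, so the normalised value depends only on read counts and transcript lengths of the gene and the reference.
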

Pipeline specifications

Software tools: BLASTN, BLASTP, HMMER, InterProScan, MUSCLE, iTOL, TopHat, HTSeq
Applications: Phylogenetics, RNA-seq analysis, Nucleotide sequence alignment
Organisms: Arabidopsis thaliana, Zea mays
Chemicals: Tryptophan