Computational protocol: Lineage-Specific Evolutionary Histories and Regulation of Major Starch Metabolism Genes during Banana Ripening

Protocol publication

[…] Members of starch metabolism gene families were identified using the predicted proteomes of twelve plant species (Supplementary Table): M. acuminata (Musaceae, order Zingiberales); rice, Brachypodium, sorghum and maize (Poaceae, order Poales); date palm (Arecaceae, order Arecales); and Arabidopsis, grapevine, tomato, potato, peach and woodland strawberry. Protein sequences were identified by combining a BLASTP clustering strategy based on a list of reference proteins (Supplementary Table) with information from Pathway Tools databases, MusaCyc, the Greenphyl database and InterProScan to confirm and complement the clustering results.
Protein sequences were aligned with MAFFT version 6.717b. Maximum-likelihood phylogenetic analysis was performed with PhyML version 3.0 using the LG evolution model and gamma-distributed substitution rates. The WAG model was used for the BAM family. Tree topologies were searched using the best of the nearest-neighbor interchange (NNI) and subtree pruning and regrafting (SPR) methods. Branch supports were estimated with an approximate likelihood-ratio test using a Shimodaira–Hasegawa–like procedure.
Sequences from Chlamydomonas reinhardtii (Chlorophyta, green alga), Physcomitrella patens (Bryophyta, moss) and Selaginella moellendorffii (Lycophyta, lycophyte) were retrieved from the Greenphyl database v.4.02. They were used to identify ancestral groups that originated before the angiosperm radiation and to root trees. The global phylogenetic tree of AGPases and the phylogenetic trees of the SBE and ISA gene families were rooted using sequences from the cyanobacterium Anabaena cylindrica. Short sequences that disrupted the phylogenetic analyses were excluded. Trees were visualized with FigTree v.1.3.1. AGPase subunit types were identified using the global phylogenetic tree of AGPase proteins and the annotation of A. thaliana, rice and maize AGPases. A separate phylogenetic tree was constructed for each subunit type. [...]
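The alignment and tree-building settings described above can be sketched as command lines. The following Python snippet only assembles the argument lists and runs nothing. The function names are hypothetical, and several settings are assumptions not stated in the protocol: the `--auto` strategy for MAFFT, four gamma rate categories (`-c 4`), an estimated gamma shape parameter (`-a e`), and the conversion of the alignment to PHYLIP format before PhyML is called.

```python
from typing import List


def mafft_command(fasta_in: str) -> List[str]:
    """Build a MAFFT call; MAFFT writes the alignment to standard output.

    The --auto strategy is an assumption; the protocol does not name one.
    """
    return ["mafft", "--auto", fasta_in]


def phyml_command(alignment_phylip: str, family: str) -> List[str]:
    """Build a PhyML 3.0 call for one gene family.

    The protocol uses LG for all families except BAM, which uses WAG.
    """
    model = "WAG" if family.upper() == "BAM" else "LG"
    return [
        "phyml",
        "-i", alignment_phylip,  # alignment in PHYLIP format (assumed converted)
        "-d", "aa",              # amino-acid data
        "-m", model,             # substitution model
        "-c", "4",               # gamma rate categories (assumption)
        "-a", "e",               # estimate gamma shape (assumption)
        "-s", "BEST",            # best of NNI and SPR topology searches
        "-b", "-4",              # SH-like approximate likelihood-ratio supports
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(mafft_command("BAM_proteins.fasta")))
    print(" ".join(phyml_command("BAM_aligned.phy", "BAM")))
```

Keeping the model choice in one function makes the BAM exception explicit instead of scattering it across per-family scripts.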
Duplicated gene pairs resulting from banana whole-genome duplications (WGD) were identified based on the Musa ancestral blocks available at http://banana-genome.cirad.fr/dotplot and in the Plant Genome Duplication Database (PGDD). WGD gene pairs were assigned to the α or β Musa WGD using the available α and β blocks. Additional paralogous relationships were detected using SynMap with default parameters and a 3:3 quota-align ratio for banana. Gene-scale duplications, namely tandem duplications (two consecutive duplicated genes) and proximal duplications (duplicated genes separated by twenty or fewer gene loci), were detected from the order of gene identifiers along the chromosomes using an in-house script. Syntenic blocks from the PGDD and published data were used to identify WGD-derived genes for plant species other than banana. SynFind was used to search for synteny in specific regions between species. [...]
Banana RNA-Seq data correspond to fourteen libraries produced using a method detailed in the supplementary material of a previous publication. They are available at NCBI SRA under accession number ERP0116302. Illumina reads were mapped onto banana gene models. Expression levels for banana starch metabolism genes were normalized as RPKM (reads per kilobase of exon per million mapped reads on banana gene models) and visualized using the MeV application (version 4.9) from the TM4 software suite. […]
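The in-house script for gene-scale duplications is not published with the protocol, so the following Python sketch only illustrates the stated rule: tandem pairs are consecutive genes, and proximal pairs are separated by twenty or fewer gene loci. It assumes that gene order per chromosome is already available as ordered lists and that candidate duplicate pairs come from a separate homology step. All function and gene names are hypothetical, and reading "separated by" as the number of intervening genes is an interpretation.

```python
from typing import Dict, List, Tuple

MAX_INTERVENING = 20  # "twenty or fewer gene loci", as stated in the protocol


def build_positions(gene_order: Dict[str, List[str]]) -> Dict[str, Tuple[str, int]]:
    """Map each gene to (chromosome, rank along that chromosome)."""
    positions = {}
    for chrom, genes in gene_order.items():
        for rank, gene in enumerate(genes):
            positions[gene] = (chrom, rank)
    return positions


def classify_pair(gene_a: str, gene_b: str,
                  positions: Dict[str, Tuple[str, int]]) -> str:
    """Classify a duplicated gene pair as 'tandem', 'proximal' or 'other'."""
    chrom_a, rank_a = positions[gene_a]
    chrom_b, rank_b = positions[gene_b]
    if chrom_a != chrom_b:
        return "other"
    # Number of gene loci lying strictly between the two genes.
    intervening = abs(rank_a - rank_b) - 1
    if intervening == 0:
        return "tandem"
    if 1 <= intervening <= MAX_INTERVENING:
        return "proximal"
    return "other"


if __name__ == "__main__":
    # Toy chromosome with 30 genes named g0..g29 (illustrative data only).
    order = {"chr1": [f"g{i}" for i in range(30)]}
    pos = build_positions(order)
    for pair in [("g0", "g1"), ("g0", "g21"), ("g0", "g22")]:
        print(pair, classify_pair(pair[0], pair[1], pos))
```

Pairs on different chromosomes, or further apart than the threshold, fall into "other" here; in the protocol such pairs would be candidates for the WGD and synteny analyses instead.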

Pipeline specifications

Software tools: BLASTP, Pathway Tools, InterProScan, MAFFT, PhyML, FigTree, SynMap, QUOTA-ALIGN, SynFind, TM4
Databases: SRA, PGDD
Applications: Genome annotation, Phylogenetics, RNA-seq analysis
Organisms: Musa acuminata
Diseases: Metabolism, Inborn Errors
Chemicals: Adenosine Diphosphate, Carbohydrates, Glucose