Pipeline publication

[…] n-coding gene repertoires from 14 plant genomes, including G. elata, A. comosus, A. trichopoda, O. sativa (Phytozome v10), Z. mays (Phytozome v10), D. officinale, P. equestris, Elaeis oleifera, A. thaliana (Phytozome v10), V. vinifera (Phytozome v10), Populus trichocarpa (JGI), Glycine max (Phytozome v10), Picea abies, and Physcomitrella patens (ASM242v1), were used to construct a global gene family classification. To remove redundancy caused by alternative splicing, we retained only the gene model at each locus that encoded the longest protein sequence. To exclude putatively fragmented genes, genes encoding proteins shorter than 50 amino acids were filtered out. All-against-all BLASTp was used to identify similarities between the filtered protein sequences of these species, with an E-value cut-off of 1e−7. The OrthoMCL method was used to cluster genes from the different species into gene families with the parameter “-inflation 1.5”.

Protein sequences from 74 single-copy gene families were used for phylogenetic tree reconstruction. MUSCLE was used with default parameters to generate a multiple sequence alignment of the protein sequences in each single-copy family. The alignments of all families were then concatenated into a super alignment matrix, which was used for phylogenetic tree reconstruction by maximum likelihood (ML). Before ML reconstruction, we used ProtTest to select the best substitution model. The JTT + I + G + F model was selected as the best-fit model, and RAxML was used to reconstruct the phylogenetic tree.

Divergence times among the 14 species were estimated using MCMCTree in PAML with the ‘correlated molecular clock’ option and the ‘JC69’ model. A Markov chain Monte Carlo analysis was run for 20,000 generations, with a burn-in of 1000 iterations. Five calibration points were applied in the present study (Fig. ): the P. equestris and D. officinale divergence time (47–52.9 million years ago), the O. sativa and Z. mays divergence time (24–84 million years ago), the A. thaliana and P. trichocarpa divergence time (65–89 million years ago), the P. trichocarpa and G. max divergence time (56–89 million years ago), and the root of land plants (407–557 million years ago).

Expansions and contractions of orthologous gene families were determined using CAFE 2.2 (Computational Analysis of gene Family Evolution). The program uses a birth and death process to model gene gain and loss over a phylogeny. Large changes in gene family size in a phylogeny were tested by calculating p-values on each branch using the Viterbi method with a randomly generated likelihood distribution. This method calculates exact p-values for transitions between the parent and child family sizes for all branches of the phylogenetic tree. Enrichment of Gene Ontology terms for the expanded G. elata gene families was summarized and visualized using REVIGO (small list, similarity 0.5, SimRel similarity measure).

The expanded and contracted families focused on in this study were confirmed using Fisher’s exact test. For each gene family, we compared its frequency in G. elata (copy number of the tested family as numerator, total number of genes in the whole genome as denominator) with its frequency in D. officinale, P. equestris, A. comosus, and Arabidopsis thaliana (Phytozome v10). In addition, phylogenetic trees were constructed for each family to confirm gene gain or loss events. The extreme case of gene loss was the complete absence of a gene from the G. elata genome. To avoid false positive gene absence events cause […]
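The Fisher's exact test comparison described above (family copy number against the total gene count of each genome) can be sketched as follows. This is a minimal illustration and not the authors' script: it assumes a two-sided test on a 2x2 table, which the text does not specify, and the function names and gene counts are hypothetical placeholders.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of every table that is no more likely than the
    # observed one; the small relative tolerance guards against float ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


def family_size_test(family_a: int, total_a: int, family_b: int, total_b: int) -> float:
    """Compare a family's share of genome A with its share of genome B.

    Rows are genomes; columns are genes inside and outside the tested family.
    """
    return fisher_exact_two_sided(
        family_a, total_a - family_a,
        family_b, total_b - family_b,
    )


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not taken from the study).
    p = family_size_test(family_a=60, total_a=18000, family_b=10, total_b=30000)
    print(f"two-sided p-value: {p:.3g}")
```

In practice the test would be repeated for each reference genome and each candidate family, so some correction for multiple testing may be appropriate; the text does not say whether one was applied.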

Pipeline specifications

Software tools: BLASTP, MUSCLE, ProtTest, PAML, CAFE, REViGO
Databases: Phytozome