[…] ertain orientation.’

To predict genes, four approaches were used: de novo prediction, homology-based prediction, EST-based prediction and transcript-to-genome alignment. For de novo prediction, Augustus, GENSCAN and GlimmerHMM were used with parameters trained on A. thaliana. For the homology search, we mapped the protein sequences of four sequenced plants (Cucumis sativus, Carica papaya, F. vesca and A. thaliana) onto the P. mume genome using TBLASTN with an E-value cutoff of 1e-5, and the homologous genomic sequences were then aligned against the matching proteins using GeneWise to obtain accurate spliced alignments.

In the EST-based prediction, 4,699 ESTs of P. mume were aligned against the P. mume genome using BLAT (identity ≥0.95 and coverage ≥0.90) to generate spliced alignments. The de novo sets (28,610 to 36,095), the four homology-based sets (24,277 to 29,586) and the EST-based gene set (2001) were combined by GLEAN into a consensus gene set. Short genes (CDS length <150 bp) and low-quality genes (gaps in more than 10% of the coding region) were filtered out. To finalize the gene set, we aligned RNA-Seq data from bud, fruit, leaf, root and stem to the genome using TopHat (version 1.2.0, run with Bowtie 1 version 0.12.5), and the alignments were used as input for Cufflinks (version 0.93) with default parameters. Open reading frames of the resulting transcripts were predicted using structure parameters trained on perfect genes from the homology-based prediction. Finally, based on their coordinates on the genome sequences, we manually combined the GLEAN gene set with the open reading frames of the transcripts to form the final gene set, which contains 31,390 genes.

Paralogous and orthologous genes were identified by BLASTP search (E-value cutoff of 1e-5). After removing self-matches, syntenic blocks (≥5 genes per block) were identified using MCScan.
The aligned results were used to generate dot plots. For self-aligned results, each block represents a paralogous region that arose from genome duplication; for inter-aligned results, each block represents an orthologous region derived from a common ancestor. We calculated 4DTv (the transversion rate at fourfold degenerate sites) for each gene pair in a block and plotted the distribution of 4DTv values to estimate the timing of speciation between species or of whole-genome duplication (WGD) events.

Three new parameters, defined in Salse et al., were used to identify paralogous and orthologous relationships among P. mume, M. × domestica, F. vesca and V. vinifera by BLASTN. Paralogous gene pairs that were identified during duplication analysis in P. mume and M […]
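
As an illustration of the 4DTv statistic used above, the following minimal Python sketch computes the uncorrected proportion of transversions at fourfold degenerate third-codon positions for one pair of codon-aligned coding sequences. It is not the authors' script: the function names (`raw_4dtv`, `is_transversion`) are hypothetical, the standard genetic code is assumed, the input is assumed to be already codon-aligned, and any correction for multiple substitutions that may have been applied in the original analysis is not stated in the text and is not reproduced here.

```python
# Codon prefixes whose third position is fourfold degenerate in the
# standard genetic code (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN,
# Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(a, b):
    """True if a -> b swaps a purine for a pyrimidine or vice versa."""
    return (a in PURINES and b in PYRIMIDINES) or (
        a in PYRIMIDINES and b in PURINES
    )


def raw_4dtv(cds_a, cds_b):
    """Uncorrected 4DTv for two codon-aligned CDS strings (hypothetical helper).

    Returns (value, n_sites); value is None when no comparable
    fourfold degenerate site exists.
    """
    if len(cds_a) != len(cds_b):
        raise ValueError("sequences must be codon-aligned to the same length")
    cds_a, cds_b = cds_a.upper(), cds_b.upper()

    sites = 0
    transversions = 0
    for i in range(0, len(cds_a) - 2, 3):
        codon_a, codon_b = cds_a[i:i + 3], cds_b[i:i + 3]
        # Both codons must share the same fourfold degenerate prefix,
        # which also guarantees that they encode the same amino acid.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        x, y = codon_a[2], codon_b[2]
        # Skip gaps and ambiguous bases at the third position.
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if is_transversion(x, y):
            transversions += 1

    if sites == 0:
        return None, 0
    return transversions / sites, sites
```

In a workflow like the one described, such a function would be applied to every gene pair within a syntenic block, and the resulting values would be binned into a histogram; peaks in that distribution are what are read as speciation or WGD events.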

Software tools: BLAT, TopHat, Cufflinks, BLASTP, MCScan