Computational protocol: Whole genome comparisons of Fragaria, Prunus and Malus reveal different modes of evolution between Rosaceous subfamilies


Protocol publication

[…] To detect orthologous regions between the peach and grape genomes, the whole-genome sequence and annotation data of grape were downloaded from Genoscope []. The whole-genome sequence of Prunus persica v1.0, the primary assembly of Malus domestica and the Fragaria vesca beta version FvH4 pseudochromosomes were downloaded from GDR, the Genome Database for Rosaceae [,]. The annotation data, including exon and gene predictions, were also downloaded from these databases. All sequence and annotation files used in this study are available from GDR. The whole-genome sequences of peach and grape were masked for repeats using RepeatMasker [], together with the nmerge (WU-BLAST distribution) and faSoftMask (Mercator distribution) utilities [].
Mercator identifies orthologous regions with one-to-one orthology relationships. It does not report syntenic regions in which one region can correspond to many others. Mercator uses pairs of anchors that BLAT finds to be similar and assembles them into orthologous segments with a modified k-way reciprocal best hit algorithm []. Translated exon sequences, taken from the annotation data, were used as anchors in these analyses. An exon from one genome and an exon from the other were considered similar if the BLAT [] E-value of the pair was below 1e-10. BLAT alignments were computed in protein space.
To select optimal criteria for assessing conservation of synteny between Rosaceous genomes, the Mercator parameters were varied between two extremes. The most stringent setting was a minimum of 30 exons with a maximum distance of 300 kbp between exons. The least stringent was a minimum of two exons with a maximum distance of 3 Mbp between exons. As the parameters became less stringent, the number of orthologous regions increased sharply, with no accompanying increase in the percentage of genome covered.
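The anchor-similarity step can be sketched as follows. The sketch assumes BLAT was asked for BLAST-style tabular output (`-out=blast8`), in which column 11 holds the E-value and column 12 the bit score. It keeps exon pairs with an E-value below 1e-10 and reduces them to plain pairwise reciprocal best hits from a single genome-A-versus-genome-B hit table. This is a simplification: Mercator's modified k-way reciprocal best hit algorithm considers several genomes at once. The exon identifiers and helper names below are hypothetical.

```python
"""Hypothetical sketch: filter BLAT exon-exon hits by E-value, then keep
pairwise reciprocal best hits. Assumes BLAST tabular (blast8) layout:
12 tab-separated columns, E-value in column 11, bit score in column 12."""
from typing import Dict, Iterable, List, Set, Tuple

EVALUE_CUTOFF = 1e-10  # similarity threshold stated in the protocol
Hit = Tuple[str, str, float, float]  # (exon_a, exon_b, evalue, bitscore)


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Return hits whose E-value is below the cutoff."""
    hits: List[Hit] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue < EVALUE_CUTOFF:
            hits.append((cols[0], cols[1], evalue, bitscore))
    return hits


def best_partner(hits: List[Hit], key_index: int) -> Dict[str, str]:
    """Map each exon (column key_index) to its highest-scoring partner."""
    best: Dict[str, Tuple[float, str]] = {}
    for hit in hits:
        key, partner, score = hit[key_index], hit[1 - key_index], hit[3]
        if key not in best or score > best[key][0]:
            best[key] = (score, partner)
    return {key: partner for key, (_, partner) in best.items()}


def reciprocal_best_hits(hits: List[Hit]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    a_to_b = best_partner(hits, 0)
    b_to_a = best_partner(hits, 1)
    return {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}


# Toy blast8 lines with hypothetical peach (pp_) and grape (vv_) exon IDs.
EXAMPLE_HITS = [
    "pp_e1\tvv_e1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t200",
    "pp_e1\tvv_e2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t120",
    "pp_e2\tvv_e2\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-30\t150",
    "pp_e3\tvv_e3\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-3\t30",
]
```

In the toy input, the `pp_e3` hit fails the 1e-10 cutoff. `pp_e1`/`vv_e2` is discarded because `vv_e2` scores higher against `pp_e2`, so only two reciprocal pairs remain.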
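The following sketch, which is not Mercator's implementation, illustrates what the two parameters constrain. It groups exon anchor pairs by chromosome pair and splits a run whenever consecutive anchors are more than `max_gap` bp apart in either genome. Runs with fewer than `min_exons` anchors are discarded. The sketch then reports the number of retained regions and the fraction of genome A they cover, so sweeping the settings tabulates the trade-off between region count and coverage described above. The anchor tuple layout and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical anchor layout: (chrom_a, pos_a, chrom_b, pos_b) for one exon pair.
Anchor = Tuple[str, int, str, int]


def segment_runs(anchors: List[Anchor], min_exons: int, max_gap: int) -> List[List[Anchor]]:
    """Split anchors into collinear runs; keep runs with >= min_exons anchors."""
    by_pair: Dict[Tuple[str, str], List[Anchor]] = defaultdict(list)
    for anchor in anchors:
        by_pair[(anchor[0], anchor[2])].append(anchor)

    runs: List[List[Anchor]] = []
    for group in by_pair.values():
        group.sort(key=lambda a: a[1])  # order along genome A
        current = [group[0]]
        for prev, anchor in zip(group, group[1:]):
            # Break the run if the gap exceeds max_gap in either genome.
            if anchor[1] - prev[1] > max_gap or abs(anchor[3] - prev[3]) > max_gap:
                runs.append(current)
                current = []
            current.append(anchor)
        runs.append(current)
    return [run for run in runs if len(run) >= min_exons]


def coverage_a(runs: List[List[Anchor]], genome_size_a: int) -> float:
    """Fraction of genome A spanned by the runs, merging overlapping spans."""
    spans: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for run in runs:
        positions = [a[1] for a in run]
        spans[run[0][0]].append((min(positions), max(positions)))

    covered = 0
    for intervals in spans.values():
        intervals.sort()
        start, end = intervals[0]
        for next_start, next_end in intervals[1:]:
            if next_start > end:  # disjoint: close the current span
                covered += end - start
                start, end = next_start, next_end
            else:  # overlapping: extend the current span
                end = max(end, next_end)
        covered += end - start
    return covered / genome_size_a


def sweep(anchors: List[Anchor], genome_size_a: int,
          settings: List[Tuple[int, int]]) -> List[Tuple[int, int, int, float]]:
    """Return (min_exons, max_gap, n_regions, coverage) for each setting."""
    results = []
    for min_exons, max_gap in settings:
        runs = segment_runs(anchors, min_exons, max_gap)
        results.append((min_exons, max_gap, len(runs), coverage_a(runs, genome_size_a)))
    return results
```

A call such as `sweep(anchors, genome_size, [(30, 300_000), (10, 300_000), (2, 3_000_000)])` covers the two extremes and the selected setting. Intermediate settings are not listed in the protocol.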
The parameters selected for further analysis were a minimum of ten exons and a maximum distance of 300 kbp between exons. These settings gave high genome coverage while reducing the number of small syntenic regions that are potentially artefactual. With the exception of the analysis shown in Figure , the Malus genome was split into two arbitrary 'sub-genomes' based on the data of Velasco et al. []. Sub-genome 1 consisted of chromosomes 1, 2, 3, 4, 5, 8, 9, 13 and 14, whilst sub-genome 2 comprised chromosomes 6, 7, 10, 11, 12, 15, 16 and 17. Each sub-genome was supplied to Mercator as a separate input, so that orthologous regions could be detected in each of the homeologous Malus chromosomes. The anchored positions of RosCOS markers in the peach genome were downloaded from GDR [,]. A RosCOS marker was assigned to an orthologous region (OR) when its anchored position in peach fell within the peach coordinates of that OR. […]
We used the Multiple Genome Rearrangements and Ancestors (MGRA) algorithm [] to predict Contiguous Ancestral Regions (CARs) that existed in a common ancestor. The orthology map of the Prunus, Fragaria and Vitis genomes produced by Mercator was used as input to MGRA, with the Vitis genome included as an outgroup. The hypothetical ancestral genome was then constructed manually from the CARs generated by MGRA, as described in the Results and discussion section. […]
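The Malus sub-genome split amounts to a lookup table over chromosome numbers. This sketch assumes chromosome FASTA headers end in the chromosome number (for example `>chr6`), which is a hypothetical naming convention. Records whose names carry no number, or a number outside 1 to 17, are left out of both sub-genomes. That is also an assumption, not something the protocol specifies.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Chromosome assignments as listed in the protocol (after Velasco et al.).
SUBGENOME_1 = {1, 2, 3, 4, 5, 8, 9, 13, 14}
SUBGENOME_2 = {6, 7, 10, 11, 12, 15, 16, 17}


def chromosome_number(name: str) -> Optional[int]:
    """Trailing integer of a sequence name, or None (hypothetical convention)."""
    match = re.search(r"(\d+)$", name)
    return int(match.group(1)) if match else None


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal FASTA reader: first header word -> concatenated sequence."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def split_subgenomes(records: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition chromosome records into the two arbitrary sub-genomes."""
    sub1: Dict[str, str] = {}
    sub2: Dict[str, str] = {}
    for name, seq in records.items():
        number = chromosome_number(name)
        if number in SUBGENOME_1:
            sub1[name] = seq
        elif number in SUBGENOME_2:
            sub2[name] = seq
        # Anything else (e.g. unanchored scaffolds) is skipped here.
    return sub1, sub2
```

Each returned dictionary could then be written to its own FASTA file and passed to Mercator as a separate genome.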
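The rule for assigning RosCOS markers to orthologous regions is an interval-containment test on peach coordinates. In this sketch, the record types, the identifiers and the inclusive `start <= pos <= end` convention are all assumptions. A marker that falls in more than one OR, for example ORs from different genome comparisons, is reported with every matching OR.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class OrthologousRegion(NamedTuple):
    or_id: str   # hypothetical OR identifier
    chrom: str   # peach chromosome
    start: int   # inclusive peach start coordinate
    end: int     # inclusive peach end coordinate


class Marker(NamedTuple):
    marker_id: str  # hypothetical RosCOS identifier
    chrom: str      # peach chromosome of the anchored position
    pos: int        # anchored peach position


def assign_markers(markers: Iterable[Marker],
                   regions: Iterable[OrthologousRegion]) -> Dict[str, List[str]]:
    """Map each marker to the ORs whose peach interval contains its position."""
    by_chrom: Dict[str, List[OrthologousRegion]] = defaultdict(list)
    for region in regions:
        by_chrom[region.chrom].append(region)

    assignment: Dict[str, List[str]] = {}
    for marker in markers:
        hits = [r.or_id for r in by_chrom.get(marker.chrom, [])
                if r.start <= marker.pos <= r.end]
        if hits:  # markers outside every OR are left unassigned
            assignment[marker.marker_id] = hits
    return assignment
```

For genome-scale inputs, sorting each chromosome's ORs by start and searching with `bisect` would avoid scanning every OR per marker. The linear scan is kept here for clarity.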

Pipeline specifications

Software tools: RepeatMasker, WU-BLAST, BLAT, Mercator, MGRA
Databases: GDR
Applications: Phylogenetics, WGS analysis
Organisms: Prunus persica, Malus domestica, Fragaria vesca, Vitis vinifera