Computational protocol: The Organelle Genomes of Hassawi Rice (Oryza sativa L.) and Its Hybrid in Saudi Arabia: Genome Variation, Rearrangement, and Origins

Protocol publication

[…] Both Hassawi rice cultivars were collected from Al-Hassa, Kingdom of Saudi Arabia. We extracted genomic DNA from 50 g of young green leaves using a CTAB-based method and constructed libraries following the GS FLX Titanium general library preparation protocol, starting with 5 µg of purified DNA. The ssDNA libraries were amplified by emulsion PCR and enriched, and the samples were sequenced on the Roche/454 GS FLX platform. In addition, two mate-pair libraries with different insert sizes (500–1000 bp and 1000–3000 bp) were constructed for both cultivars following the SOLiD Library Preparation Guide (SOLiD 4.0), and 20 µg or more of genomic DNA was used for sequencing on the SOLiD 4.0 instrument.

We extracted cp and mt genome sequence reads from the whole-genome sequencing data generated on both the 454 GS FLX and SOLiD 4.0 platforms and assembled the 454 GS FLX reads following a protocol we developed recently. For the cp genome assembly, we filtered cp reads from the raw data against the three known rice cp genome sequences and assembled the clean cp reads into contigs with Newbler (v2.6). For the mt genome assembly, we first assembled the raw data with Newbler and then used BLAST to select the contigs that aligned to the known rice mt genomes. These mt contigs are usually not clean, because they often contain cp genome sequences, so we also used information on unknown mt contigs and read coverage to remove cp sequence contamination. Finally, we validated the organellar genome assemblies with the SOLiD sequencing data.

[...] We used DOGMA (Dual Organellar GenoMe Annotator) for cp genome annotation and manually corrected start and stop codons. We annotated the mt genome by aligning its sequences to the known rice mt genomes with the NCBI BlastX and BlastN tools. All BlastN and BlastX searches were run with the blastall executable (version 2.2.25) using default settings and an E-value cutoff of 1e-10.
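The removal of cp contamination from the mt contigs, described in the assembly step above, can be illustrated with a small sketch. The protocol states only that read coverage and contig information were used; the rule below is an assumption made for illustration: a contig that aligns to a cp reference and is also far deeper than the typical mt contig is treated as cp-derived. The `Contig` fields, the function name `select_mt_contigs`, and both thresholds are hypothetical and are not taken from the protocol.

```python
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class Contig:
    name: str
    length: int        # contig length in bp (assumed > 0)
    depth: float       # mean read depth reported by the assembler
    mt_aligned: int    # bases aligned to reference mt genomes (e.g. from BlastN)
    cp_aligned: int    # bases aligned to reference cp genomes (e.g. from BlastN)


def select_mt_contigs(contigs: List[Contig],
                      min_mt_fraction: float = 0.5,   # hypothetical threshold
                      max_depth_ratio: float = 3.0    # hypothetical threshold
                      ) -> List[Contig]:
    """Keep contigs that look mitochondrial and drop likely cp contamination."""
    # Step 1: candidates are contigs that mostly align to known mt genomes.
    candidates = [c for c in contigs
                  if c.mt_aligned / c.length >= min_mt_fraction]
    if not candidates:
        return []

    # Step 2: estimate the typical mt depth from candidates without any
    # cp similarity (fall back to all candidates if none qualify).
    pool = [c.depth for c in candidates if c.cp_aligned == 0]
    baseline = statistics.median(pool or [c.depth for c in candidates])

    # Step 3: a candidate with cp similarity AND unusually high depth is
    # assumed to be assembled from cp reads, so it is removed.
    return [c for c in candidates
            if not (c.cp_aligned > 0 and c.depth > max_depth_ratio * baseline)]


# Toy input: values are invented for illustration only.
example = [
    Contig("mt_1", 10000, 40.0, 10000, 0),
    Contig("mt_2", 5000, 50.0, 4000, 300),     # mt contig with a cp-derived insert
    Contig("cp_like", 3000, 900.0, 2800, 3000),
    Contig("nuclear", 2000, 5.0, 100, 0),
]
kept = [c.name for c in select_mt_contigs(example)]
```

In this toy input, `mt_2` is retained even though it has cp similarity, because its depth is in line with the other mt contigs; this reflects the fact that genuine mt genomes carry cp-derived insertions, so sequence similarity alone is not a sufficient reason to discard a contig.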
Protein-coding genes, rRNAs, and tRNAs were identified using the plastid/bacterial genetic code. We also used tRNAscan-SE to corroborate the tRNA boundaries identified by BlastN.

[...] SSRs were identified and localized with the Simple Sequence Repeat Identification Tool (SSRIT), which identifies perfect repeats of mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs; we collected those with three or more repeat units, excluding mononucleotide repeats. Intersubspecific polymorphisms were first identified with the MUMmer package (v3.06); the results were then extracted with a custom-designed Perl script and confirmed by careful visual inspection. Intravarietal polymorphisms were identified with Newbler (v2.6) for the 454 data and Bioscope (v1.3) for the SOLiD data. We carried out repeat sequence analysis with the REPuter web-based interface, covering forward, palindromic, reverse, and complemented repeats with a minimal length of 50 bp. Cp-derived sequences were identified by BlastN searches of the mt genomes against the annotated cp genomes (identity ≥80%, E-value ≤1e-5, and length ≥50 bp) and were then aligned to all known plant mt genomes with BlastN (identity ≥80%, E-value ≤1e-5, and coverage ≥50%). Syntenic regions of the cp and mt genomes between cultivars were detected with NUCmer from the MUMmer package (v3.06), using a 50-bp minimal exact match. The annotated cp and mt genome features, including gene coordinates, genome structures in the cp genomes, repeats in the mt genomes, and the different genome variations, were used to draw genome maps with Circos.

[...] The whole cp genomes of five rice cultivars were aligned with MAFFT version 6 and adjusted manually where necessary. The unambiguously aligned DNA sequences were used for phylogenetic tree construction.
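The SSR selection criteria given earlier in this section (perfect repeats, di- to hexa-nucleotide motifs, at least three repeat units, mononucleotide runs excluded) can be sketched in a few lines. This is not SSRIT itself and makes no claim about how SSRIT is implemented; it is a regular-expression stand-in, and the names `SSR` and `find_ssrs` are hypothetical. One assumption beyond the text: motifs that are themselves repeats of a shorter unit (for example `ATAT` or `AAA`) are skipped so that each repeat is reported once, under its shortest motif.

```python
import re
from typing import List, NamedTuple


class SSR(NamedTuple):
    start: int   # 0-based start of the repeat
    end: int     # end of the repeat (exclusive)
    motif: str   # repeated unit
    units: int   # number of consecutive copies


def _is_primitive(motif: str) -> bool:
    # "ATAT" or "AAA" can be written as copies of a shorter unit, so they
    # are not primitive; "AT" and "CAG" are.
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq: str, min_motif: int = 2, max_motif: int = 6,
              min_units: int = 3) -> List[SSR]:
    """Report perfect repeats of min_motif..max_motif bp with >= min_units copies."""
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # The lookahead lets every position be tried as a repeat start,
        # so a skipped mononucleotide run cannot hide a real SSR.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (k, min_units - 1))
        last_end = 0
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if not _is_primitive(motif) or m.start() < last_end:
                continue  # shorter-unit repeat, or inside one already reported
            last_end = m.start() + len(m.group(1))
            hits.append(SSR(m.start(), last_end, motif, len(m.group(1)) // k))
    return sorted(hits)


# Toy sequence, invented for illustration.
for ssr in find_ssrs("GGATATATCCCAGCAGCAGCAGTT"):
    print(ssr)
```

Because the mononucleotide case is excluded simply by starting at a motif length of 2, changing `min_motif` to 1 would bring monomer runs back in if a different criterion were wanted.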
A maximum likelihood analysis was performed with PhyML v3.05 under the GTR (General Time Reversible) model of nucleotide substitution to construct the phylogenetic tree, and 1,000 bootstrap replicates were used to estimate the confidence of branch points. We obtained the best tree after a heuristic search with the help of Modelgenerator. […]
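For readers unfamiliar with the bootstrap step, the idea is that each replicate is a new alignment of the same length, built by sampling columns of the original alignment with replacement; a tree is then inferred from every replicate, and the support for a branch is the fraction of replicate trees that contain it. PhyML performs this internally, so the sketch below only illustrates the resampling and is not part of the published pipeline. The function name `bootstrap_alignment` and the toy sequences are hypothetical.

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap replicate: columns resampled with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw n_cols column indices with replacement; the same indices are
    # applied to every sequence so that columns stay intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


# Toy alignment, invented for illustration.
toy = {"cultivar_A": "ACGTACGT", "cultivar_B": "ACGAACGA", "cultivar_C": "TCGTACGG"}
rng = random.Random(42)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
```

Each replicate would then be passed to the tree-building program; the 1,000 in the loop mirrors the number of replicates stated in the protocol.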

Pipeline specifications