Computational protocol: The rice endophyte Harpophora oryzae genome reveals evolution from a pathogen to a mutualistic endophyte

Protocol publication

[…] Six loci were used in this phylogenetic analysis. All sequences were aligned, concatenated and manually adjusted using Geneious Pro v4.8.3. The GTR + G + I model was selected as the best-fit model for the datasets using jModelTest v2. The phylogenetic analyses were performed using the maximum-likelihood (ML) criterion implemented in RAxML, run with the best-fit model through the RAxML-HPC BlackBox web server at CIPRES. Maximum parsimony (MP) analyses were performed using PAUP v4.0b10. Bayesian inference (BI) analyses were performed with the best-fit model using the Markov chain Monte Carlo method in MrBayes v3.2.1.

Information about the characters studied in the ancestral state reconstruction (ASR) was retrieved from Luo and Zhang and from Yuan et al. MP- and ML-based ASRs were conducted in Mesquite v2.75. To account for phylogenetic mapping uncertainty, we also employed a BI approach to analyze ancestral states using the 'Multistate' option in BayesTraits v2.0. More details are provided in the […].

[...] The genome of H. oryzae R5-6-1 was sequenced with the Roche/454 Pyrosequencing Platform and the Illumina HiSeq 2000 sequencing platform. Low-quality data, i.e. reads with a QUAL value of less than 30 and short reads (length < 50 bp), were filtered from the raw data. The high-quality reads underwent primary assembly using the GS De Novo Assembler (Newbler v2.9; Roche) and ALLPATHS-LG (version allpathslg-43984) and were then scaffolded using SSPACE v2.0. Finally, gaps were filled using SOAP GapCloser v1.0. The completeness of the H. oryzae genome was assessed using CEGMA v2.0. The H. oryzae R5-6-1 genome sequence has been deposited in GenBank under accession number JNVV00000000. De novo analysis was employed to examine the repetitive sequences.
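The read-filtering criteria given for the genome assembly (QUAL below 30, length below 50 bp) can be illustrated with a minimal sketch. The source does not name the filtering tool or say how the QUAL value was computed, so the code below rests on assumptions: it treats QUAL as the mean Phred score of a read, assumes Phred+33 encoding, and discards a read if it fails either criterion. The function names (`parse_fastq`, `passes_filter`, `filter_reads`) are hypothetical and not taken from the original pipeline.

```python
from statistics import mean

MIN_QUAL = 30      # quality threshold stated in the protocol
MIN_LEN = 50       # minimum read length in bp, stated in the protocol
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def parse_fastq(lines):
    """Yield (header, sequence, quality_string) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def mean_quality(qual):
    """Mean Phred score of a read (assumed interpretation of 'QUAL value')."""
    return mean(ord(c) - PHRED_OFFSET for c in qual)


def passes_filter(seq, qual, min_qual=MIN_QUAL, min_len=MIN_LEN):
    """Keep a read only if it is long enough and its mean quality is high enough."""
    return len(seq) >= min_len and mean_quality(qual) >= min_qual


def filter_reads(lines):
    """Return the records that survive both the length and the quality filter."""
    return [(h, s, q) for h, s, q in parse_fastq(lines) if passes_filter(s, q)]
```

For real data the same logic would be applied while streaming a FASTQ file, for example `filter_reads(open(path))`, where `path` is a placeholder for a reads file.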
The repetitive elements were identified and classified using a de novo repetitive sequence search with RepeatModeler v1.07. For repeat annotation, the repeat library produced by RepeatModeler was used directly with RepeatMasker v4-0-3. The repeat-induced point mutation (RIP) indices were determined with the software RIPCAL by comparison with the non-repetitive genome.

[...] Roots infected with H. oryzae were harvested in liquid nitrogen 2, 6 and 20 days after infection (DAI); the same procedure was applied to M. oryzae, except that the harvesting occurred 20 DAI. Roots from 12 independent rice plants were considered an experimental replicate. Total RNA was extracted using TRIzol reagent (Invitrogen) according to the manufacturer's protocol. The RNA integrity of all the samples was verified on an Agilent 2100 Bioanalyzer. Nine H. oryzae libraries (three developmental stages and three biological replicates) and six M. oryzae libraries (two developmental stages and three biological replicates) were prepared with the Illumina TruSeq RNA Sample Preparation Kit and were sequenced on the Illumina HiSeq 2000 as 100 bp paired-end reads. The insert size of all the libraries was 180 bp for both H. oryzae and M. oryzae.

All the clean reads were mapped to the genome sequence using TopHat v2.0.9, and an expression profile was created using Cufflinks v2.0.2. The abundances were reported as normalized fragments per kb of transcript per million mapped reads (FPKM). Transcripts with a significant P value (<0.05) and a greater than twofold change (log2) in transcript level were considered differentially expressed. All the P values were corrected for false discoveries resulting from multiple hypothesis testing using the Benjamini-Hochberg procedure. Heatmaps of gene expression profiles were generated in R based on significant expression changes (log10 of FPKM plus 1). […]
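The selection of differentially expressed transcripts (Benjamini-Hochberg correction, corrected P < 0.05, more than twofold change) can be sketched as follows. This is an illustration under stated assumptions, not the statistical test performed by Cufflinks: the per-transcript P values are taken as given inputs, the fold-change cutoff is read as |log2 fold change| > 1, and a pseudocount of 1 is added to FPKM values to avoid division by zero. The function names and the record layout are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log2_fold_change(fpkm_a, fpkm_b, pseudocount=1.0):
    """log2 ratio of condition B over condition A (pseudocount is an assumption)."""
    return math.log2((fpkm_b + pseudocount) / (fpkm_a + pseudocount))


def differentially_expressed(records, alpha=0.05, min_abs_log2fc=1.0):
    """Select transcripts passing both cutoffs.

    records: list of (transcript_id, fpkm_a, fpkm_b, raw_p_value) tuples.
    Returns a list of (transcript_id, log2_fold_change, adjusted_p_value).
    """
    adjusted = benjamini_hochberg([r[3] for r in records])
    hits = []
    for (tid, a, b, _), q in zip(records, adjusted):
        lfc = log2_fold_change(a, b)
        if q < alpha and abs(lfc) > min_abs_log2fc:
            hits.append((tid, lfc, q))
    return hits
```

If the authors' "(log2)" cutoff is instead meant as |log2 fold change| > 2, only the `min_abs_log2fc` argument needs to change.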

Pipeline specifications

Software tools: Geneious, jModelTest, RAxML, PAUP*, MrBayes, Mesquite, BayesTraits, Newbler, ALLPATHS-LG, SSPACE, CEGMA, RepeatModeler, RepeatMasker, RIPCAL
Applications: Phylogenetics, De novo sequencing analysis
Organisms: Oryza sativa, Magnaporthe oryzae, Bipolaris oryzae