Computational protocol: Gene body CG methylation and divergent expression of duplicate genes in rice

Protocol publication

[…] Raw RNA-seq data for the OsMet1-2 mutant and wild type (WT) were produced previously (Hu et al.) and retrieved from the Sequence Read Archive (SRA) database under accession SRP043448. Low-quality reads (Phred < 30) were removed from the raw data using the FASTX-Toolkit. All reference sequences (FASTA) and annotation files (GFF3) were from the latest rice genome release, MSU7.0 (ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_7.0/all.dir/). Cleaned reads of each genotype were mapped to the rice reference genome using TopHat2, with one mismatch allowed. Differential expression analysis was performed using Cuffdiff, and differentially expressed genes (DEGs) were defined at a q value < 0.05. A duplicate gene pair was defined as expression-affected if one copy of the pair was significantly changed in expression between WT and the OsMet1-2 mutant. We also defined differentially expressed duplicates within each genotype using the exact condition test (q value < 0.05) reported previously in soybean. [...]
This was done based on criteria defined previously in rice. In brief, non-TE-related genes were extracted from the rice reference genome (MSU7). Then, an all-vs-all BLASTP search was used to identify candidate duplicates: a gene pair that was among the top five matches and had an E-value < 10^-10 was considered a candidate duplicate. MCScanX was then run with default parameters to categorize the different types of duplicates, including WGD, tandem, proximal and transposed duplicates. Finally, only duplicates with methylation information in both the OsMet1-2 mutant and WT were selected for further analysis. [...]
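The candidate-duplicate filter (top five matches, E-value < 10^-10) can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes the all-vs-all BLASTP output is in the standard 12-column tabular format (`-outfmt 6`), that self-hits are ignored, that hits are ranked by bit score after the E-value filter, and that only the best-scoring alignment per query-subject pair is kept. The function name `candidate_duplicates` and its parameters are hypothetical.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-10  # threshold stated in the protocol
TOP_N = 5               # "top 5 matched" hits per query gene


def candidate_duplicates(blast_lines, evalue_cutoff=E_VALUE_CUTOFF, top_n=TOP_N):
    """Return a set of unordered gene pairs that pass the filter.

    blast_lines: iterable of tab-separated rows in BLAST tabular format
    (qseqid, sseqid, ..., evalue in column 11, bitscore in column 12).
    """
    hits = defaultdict(dict)  # query -> {subject: (evalue, bitscore)}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query == subject:
            continue  # assumption: a gene is not its own duplicate
        if evalue >= evalue_cutoff:
            continue  # keep only hits with E-value below the cutoff
        best = hits[query].get(subject)
        # assumption: keep the best-scoring alignment per query-subject pair
        if best is None or bitscore > best[1]:
            hits[query][subject] = (evalue, bitscore)

    pairs = set()
    for query, subjects in hits.items():
        # rank subjects by descending bit score, then ascending E-value
        ranked = sorted(subjects.items(), key=lambda kv: (-kv[1][1], kv[1][0]))
        for subject, _ in ranked[:top_n]:
            pairs.add(tuple(sorted((query, subject))))
    return pairs
```

The resulting pairs would still need to be passed to MCScanX for classification; that step is not reproduced here.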
Synonymous (dS) and non-synonymous (dN) substitution rates were calculated as follows: all coding sequences and protein sequences of duplicates were pairwise aligned using the default options in MUSCLE, and the alignments were used to calculate dS and dN values with the ‘seqinr’ package in R. As in the previous study in rice, duplicates with dS > 3 were excluded. […]
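The dS > 3 exclusion can be sketched in a few lines of Python, assuming the per-pair dS and dN values have already been computed (in the protocol, with seqinr in R) and loaded as tuples. The function name `filter_saturated` and the tuple layout are hypothetical. Dropping pairs with missing values is an assumption of this sketch; the protocol does not mention it.

```python
import math

DS_MAX = 3.0  # duplicates with dS > 3 are excluded, per the protocol


def filter_saturated(pairs, ds_max=DS_MAX):
    """Keep duplicate pairs whose dS is defined and does not exceed ds_max.

    pairs: iterable of (gene_a, gene_b, dS, dN) tuples.
    """
    kept = []
    for gene_a, gene_b, ds, dn in pairs:
        if ds is None or dn is None:
            continue  # assumption: drop pairs without an estimate
        if math.isnan(ds) or math.isnan(dn):
            continue  # assumption: drop pairs with undefined estimates
        if ds > ds_max:
            continue  # synonymous sites likely saturated
        kept.append((gene_a, gene_b, ds, dn))
    return kept
```

A pair with dS exactly equal to 3 is retained here, because the protocol excludes only pairs with dS > 3.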

Pipeline specifications

Software tools: BLASTP, MCScanX, MUSCLE, seqinr
Applications: Genome annotation, Nucleotide sequence alignment, Genome data visualization
Organisms: Oryza sativa