Computational protocol: Identification and characterization of large DNA deletions affecting oil quality traits in soybean seeds through transcriptome sequencing analysis

Protocol publication

[…] Seed transcriptome sequencing data from nine genotypes previously generated in our laboratory were used in this study (Goettel et al. ). RNA-seq data for the nine soybean genotypes are available under NCBI-GEO series accession no. GSE56297. Transcript accumulation for each gene was normalized and expressed as Fragments Per Kilobase of transcript per Million mapped reads (FPKM) as previously described (Goettel et al. ). A gene was considered transcribed if its mean FPKM across all examined genotypes was higher than 0.5, or if its FPKM was higher than 0 in every examined genotype. The normalized accumulation values of each gene were used to calculate Z scores as follows:

$$Z_s = \frac{x_s - \mu}{\sigma}, \qquad x_s = \log_2(\mathrm{FPKM}_s + 1), \qquad \mu = \frac{1}{n}\sum_{s=1}^{n} \log_2(\mathrm{FPKM}_s + 1)$$

where x_s is the log2-transformed FPKM of sample s, n is the total number of samples, and σ is the standard deviation of the x_s values across the n samples.

A custom Perl script was developed to identify co-regulated genome regions within a genotype that contained four or more adjacent transcribed genes, each with a Z score less than or equal to −2 or greater than or equal to +2. The regions identified were classified as putative large deletions or amplifications for further validation. Z scores of all differentially transcribed genes were displayed as a heat map in chromosomal gene order.

The deleted genome sequences were used as queries in BLASTN searches to identify their homoeologous regions in the soybean genome. FASTA sequence files, GFF annotation files and comparison files generated by BLASTN were used as input files for the Artemis Comparison Tool (ACT) (Carver et al. ) to compare the duplicated soybean sequences and analyze their syntenic relationship.

MEGA5 (Tamura et al. ) was used for the evolutionary comparison of all homoeologous genes in the duplicated regions. Coding sequences of homoeologous genes were first aligned with ClustalW. The numbers of synonymous and non-synonymous substitutions per site were then calculated using the Nei–Gojobori model ().
All positions containing gaps and missing data were eliminated. […]
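The transcription filter and Z-score formula described above can be illustrated with a short Python sketch. It is not the authors' code. Two choices are assumptions because the text does not state them: σ is taken as the sample standard deviation (n − 1 denominator, `statistics.stdev`), and a gene whose values are identical in every genotype (σ = 0) is assigned Z = 0. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def is_transcribed(fpkms):
    """Transcribed if mean FPKM > 0.5, or FPKM > 0 in every examined genotype."""
    return mean(fpkms) > 0.5 or all(v > 0 for v in fpkms)


def z_scores(fpkms):
    """Z score of each sample's log2(FPKM + 1) value for one gene."""
    x = [math.log2(v + 1) for v in fpkms]
    mu = mean(x)
    sigma = stdev(x)  # assumption: sample SD (n - 1 denominator)
    if sigma == 0:
        return [0.0] * len(x)  # assumption: constant genes get Z = 0
    return [(xi - mu) / sigma for xi in x]


def z_score_table(fpkm_by_gene):
    """Map gene ID -> per-genotype Z scores, keeping transcribed genes only."""
    return {gene: z_scores(values)
            for gene, values in fpkm_by_gene.items() if is_transcribed(values)}
```

With nine genotypes, a gene silent in eight and at FPKM 255 in one has x values of eight 0s and one 8, giving Z ≈ +2.67 for the outlier and −1/3 for the rest. That is enough to pass the |Z| ≥ 2 cut-off. Switching to the population SD (`statistics.pstdev`) would raise the outlier's score to √8 ≈ 2.83.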
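The region-detection step performed by the custom Perl script can be approximated in Python. This is a minimal sketch under stated assumptions, not a reconstruction of the original script. The input is assumed to be the Z scores of one genotype's transcribed genes on one chromosome, already in chromosomal order, so "adjacent" means adjacent among transcribed genes. A run is also assumed to contain genes of a single direction only (all ≤ −2 or all ≥ +2). `find_regions` and `classify` are hypothetical names.

```python
def classify(z, threshold=2.0):
    """Label one gene's Z score (assumed labels mirror the text's categories)."""
    if z <= -threshold:
        return "putative deletion"
    if z >= threshold:
        return "putative amplification"
    return None


def find_regions(genes, min_run=4, threshold=2.0):
    """Return (first_gene, last_gene, label) for runs of >= min_run genes.

    genes: list of (gene_id, z) for transcribed genes of ONE genotype,
    in chromosomal order on one chromosome.
    """
    regions = []
    start = 0
    while start < len(genes):
        label = classify(genes[start][1], threshold)
        end = start
        # extend the run while the next gene has the same label
        while end + 1 < len(genes) and classify(genes[end + 1][1], threshold) == label:
            end += 1
        if label is not None and end - start + 1 >= min_run:
            regions.append((genes[start][0], genes[end][0], label))
        start = end + 1
    return regions
```

For example, four consecutive genes with Z scores of −2.5, −2.0, −3.1 and −2.2 are reported as one putative deletion, whereas three consecutive genes at or above +2 are not reported.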
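ACT comparison files are commonly produced as BLASTN tabular output (`-outfmt 6` in BLAST+), which has twelve default columns. Before loading such a file into ACT, the hits from a deleted-region query can be summarized per subject sequence to see where the homoeologous copy lies. The sketch below shows one way to do this. The e-value and length cut-offs, the hypothetical `self_region` parameter used to drop hits to the query's own locus, and the `summarize_hits` name are illustrative assumptions and not part of the published protocol.

```python
import csv
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def summarize_hits(lines, max_evalue=1e-10, min_length=100, self_region=None):
    """Group filtered BLASTN hits by subject sequence.

    Returns (sseqid, min_start, max_end, total_bitscore) tuples, best first.
    self_region: optional (sseqid, start, end) of the query's own locus;
    hits overlapping it are dropped (hypothetical parameter).
    """
    summary = defaultdict(lambda: [float("inf"), 0, 0.0])
    for row in csv.reader(lines, delimiter="\t"):
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) > max_evalue or int(hit["length"]) < min_length:
            continue
        # minus-strand hits report sstart > send, so normalize the order
        lo, hi = sorted((int(hit["sstart"]), int(hit["send"])))
        if self_region:
            chrom, start, end = self_region
            if hit["sseqid"] == chrom and lo <= end and hi >= start:
                continue  # hit overlaps the query's own locus
        entry = summary[hit["sseqid"]]
        entry[0] = min(entry[0], lo)
        entry[1] = max(entry[1], hi)
        entry[2] += float(hit["bitscore"])
    return sorted(((sid, e[0], e[1], e[2]) for sid, e in summary.items()),
                  key=lambda t: t[3], reverse=True)
```

This deliberately simple design collapses all hits on one subject sequence into a single span. A real analysis would cluster hits by proximity before inspecting synteny in ACT.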
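To make the Nei–Gojobori calculation concrete, the following is a minimal pure-Python sketch of the counting method for one pair of aligned coding sequences. It is not MEGA's implementation and may differ from it in edge cases. Assumptions: the standard genetic code is used, point changes to a stop codon count as non-synonymous sites, mutational pathways passing through an internal stop codon are excluded when averaging differences, and codons containing a gap, missing or ambiguous base, or a stop codon are skipped. The optional Jukes–Cantor correction mirrors the "proportion" versus "Jukes–Cantor" variants of the method.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop)
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites of one codon (changes to stop count as non-synonymous)."""
    s = 0.0
    for pos in range(3):
        for base in "ACGT":
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """Average (synonymous, non-synonymous) differences over all pathways."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    paths = []
    for order in permutations(diff_pos):
        sd = nd = 0
        current, valid = c1, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":  # pathway through an internal stop codon
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        if valid:
            paths.append((sd, nd))
    if not paths:  # assumption: count all differences as non-synonymous
        return 0.0, float(len(diff_pos))
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def nei_gojobori(seq1, seq2, jukes_cantor=False):
    """Return (pS, pN), or Jukes-Cantor corrected (dS, dN), for aligned CDSs."""
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # skip codons with gaps, missing/ambiguous bases, or stop codons
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s_avg = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s_avg
        N += 3 - s_avg
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    pS, pN = Sd / S, Nd / N  # assumes at least one comparable codon
    if jukes_cantor:  # undefined when p >= 0.75
        return tuple(-0.75 * math.log(1 - 4 * p / 3) for p in (pS, pN))
    return pS, pN
```

For `CTGAAA` versus `CTAAAA`, the single CTG→CTA change is synonymous. With 5/3 synonymous sites in total, this gives pS = 0.6 and pN = 0. The sketch skips gapped codons pair by pair. For a multiple alignment, eliminating every position that contains gaps or missing data means first removing any codon column that is gapped in any sequence.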

Pipeline specifications

Software tools: BLASTN, ACT, MEGA, ClustalW
Application: RNA-seq analysis
Organisms: Glycine max
Chemicals: Glycine, Palmitic Acid