Computational protocol: Analysis of conglutin seed storage proteins across lupin species using transcriptomic, protein and comparative genomic approaches

Protocol publication

[…] RNA was isolated using the Trizol method [] as follows: for each of the eight lupin samples, approximately 150 mg of maturing seed (20-26 DAA) was ground to a fine powder under a stream of liquid nitrogen. The ground powder was homogenised in 1.5 mL Eppendorf tubes using 2 × 500 μL of Trizol reagent (Invitrogen, Carlsbad, CA, USA), followed by a 15 min incubation at room temperature. After centrifugation at 12000 g and 4°C for 10 min, the supernatant was mixed with 200 μL chloroform, followed by another centrifugation step at 12000 g and 4°C for 15 min. The upper phase was mixed with 300 μL of high salt precipitation buffer (0.8 M sodium citrate/1.2 M NaCl) and 300 μL of isopropanol and incubated on ice for at least 10 min to selectively precipitate total RNA. Precipitated RNA was centrifuged (10 min, 12000 g, 4°C) and washed twice in 75% ethanol. RNA was dissolved in diethylpyrocarbonate-treated water. Quality and quantity were assessed by both BioAnalyzer (Agilent, Santa Clara, CA, USA) and Qubit assays (Invitrogen).
One μg of total RNA was used to generate TruSeq RNA libraries (Illumina, San Diego, CA, USA) according to the manufacturer’s recommendations. The eight lupin TruSeq RNA libraries were pooled evenly and sequenced on a single lane of an Illumina HiSeq 1000 in a 2 × 100 bp paired-end run.
Raw RNA-Seq data for the libraries were trimmed for low quality using Cutadapt 1.1 [] (overlap 10, times 3, minimum length 25). Reads trimmed to fewer than 25 bp were discarded, and reads whose mate had been discarded were retained as singleton reads. Transcriptome sequencing and subsequent trimming resulted in quality-controlled RNA-Seq data ranging from 1.38 to 2.69 Gb per library (Additional file : Table S1). The RNA-Seq and conglutin EST sequences used in this study were submitted to the Sequence Read Archive and dbEST at NCBI under BioProject accession PRJNA271721. [...]
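The handling of read pairs after trimming (discard reads shorter than 25 bp, keep the surviving mate as a singleton) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `sort_trimmed_pairs`, the tuple layout and the `/1`, `/2` name suffixes are assumptions made for illustration, and only the 25 bp threshold comes from the protocol.

```python
MIN_LENGTH = 25  # minimum read length stated in the protocol


def sort_trimmed_pairs(pairs, min_length=MIN_LENGTH):
    """Split trimmed read pairs into intact pairs and singleton reads.

    `pairs` is an iterable of (name, read1_seq, read2_seq) tuples holding
    the sequences left after trimming (hypothetical layout).
    """
    paired, singletons = [], []
    for name, read1, read2 in pairs:
        keep1 = len(read1) >= min_length
        keep2 = len(read2) >= min_length
        if keep1 and keep2:
            # Both mates survived trimming: keep them as a pair.
            paired.append((name, read1, read2))
        elif keep1:
            # Mate 2 was too short: retain mate 1 as a singleton.
            singletons.append((name + "/1", read1))
        elif keep2:
            # Mate 1 was too short: retain mate 2 as a singleton.
            singletons.append((name + "/2", read2))
        # If both mates are too short, the whole pair is dropped.
    return paired, singletons
```

Keeping singletons separate from intact pairs matters downstream, because aligners generally take paired and unpaired reads as separate inputs.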
The conglutin gene sequences identified [] were used to search the Tanjil survey genome and transcriptome assembly [], using BLASTN and tBLASTX in CLC Genomics Workbench 6.0 (CLC Bio, Aarhus, Denmark), to confirm the presence of the conglutins in the genome and transcriptome assembly and to identify whether any additional conglutin family members were present.
The trimmed Illumina sequencing reads were sorted into paired and unpaired groups and aligned to the 16 NLL conglutin reference genes [] using CLC Genomics Workbench 6.0. Following alignment, the homologous consensus sequence was extracted for each conglutin gene from each variety. The extracted consensus sequences were aligned within each family using k-mer based tree construction in CLC Genomics Workbench 6. Reads per kilobase per million mapped reads (RPKM) values were determined for each of the 16 consensus sequences for each lupin variety using the RNA-Seq datasets and the RNA-Seq analysis function in CLC Genomics Workbench 6.0. […]
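For reference, the standard RPKM normalisation can be written as a short Python sketch. It uses the textbook definition (read count scaled by gene length in kilobases and by total mapped reads in millions); how CLC Genomics Workbench counts reads internally is not described in the protocol, so the function `rpkm` and its inputs are assumptions for illustration only.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2000 bp gene in a library
# with 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
```

With these made-up inputs the result is 25 RPKM: 500 reads over 2 kb gives 250 reads per kilobase, divided by 10 million mapped reads.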

Pipeline specifications

Software tools cutadapt, BLASTN, TBLASTX, CLC Genomics Workbench
Databases SRA, dbEST
Applications Phylogenetics, RNA-seq analysis, Nucleotide sequence alignment
Chemicals Carbon, Nitrogen