Computational protocol: Transcriptomic profiling of hemp bast fibres at different developmental stages

Protocol publication

[…] The raw sequences were imported into CLC Genomics Workbench 9.0.1 and filtered as follows: only sequences > 35 bp were retained, the quality score limit was left at its default value (0.05), and the maximum number of ambiguities was set to 0. Adaptor trimming was performed using the Illumina adaptor sequences, followed by a hard trim of 14 bp at the 5′ end and 2 bp at the 3′ end, resulting in a final average sequence length of 59 bp. We had previously published a de novo assembly for the variety Santhica 27 and validated it by comparing the results generated with our de novo assembly with those generated with the Finola transcriptome. To obtain a better assembly of the transcriptome of the variety under study, we decided to merge the reads generated in this study with those previously obtained from the hemp hypocotyl. We therefore imported into CLC Genomics Workbench 9.0.1 the reads obtained previously for the hypocotyls together with those obtained in the present study for the fibres from adult plants. The assembly parameters were: word size 20, bubble size 50 and minimum contig length 300. The reads were mapped back to the assembly with a mismatch, insertion and deletion cost of 3 (stringent criteria), and a length and similarity fraction of 0.95. The assembly was then annotated using Blast2GO PRO version 3.0 against the Viridiplantae and A. thaliana non-redundant databases. However, in Suppl. Dataset only the annotation against the Arabidopsis database is shown, as it was used for the subsequent Gene Ontology term Enrichment analysis (GOE) in Cytoscape (vide infra). For each library, the mapping was performed with a maximum of 3 hits per read, a similarity and length fraction of 0.95, and a mismatch, insertion and deletion cost of 3. Mapping was also performed using the transcriptome of the variety Finola, as previously described.
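
The length filter, ambiguity filter and hard trim described above can be illustrated with a minimal Python sketch. It is not the CLC Genomics Workbench implementation: the function name `filter_and_trim` is hypothetical, adaptor removal and the quality-score trimming (limit 0.05) are left out, and applying the length check before the hard trim is an assumption, since the text does not state the order.

```python
from typing import Optional

# Values taken from the protocol text.
MIN_LENGTH = 35      # reads must be longer than 35 bp
MAX_AMBIGUOUS = 0    # no ambiguous bases (e.g. N) allowed
HARD_TRIM_5P = 14    # bases removed from the 5' end
HARD_TRIM_3P = 2     # bases removed from the 3' end


def filter_and_trim(read: str) -> Optional[str]:
    """Return the hard-trimmed read, or None if the read is discarded.

    Assumption: the length filter is applied to the untrimmed read.
    Adaptor and quality trimming are deliberately omitted.
    """
    read = read.upper()
    if len(read) <= MIN_LENGTH:
        return None
    ambiguous = sum(1 for base in read if base not in "ACGT")
    if ambiguous > MAX_AMBIGUOUS:
        return None
    # Remove 14 bases from the 5' end and 2 bases from the 3' end.
    return read[HARD_TRIM_5P:len(read) - HARD_TRIM_3P]


if __name__ == "__main__":
    example = "ACGT" * 18 + "ACG"          # a made-up 75-base read
    trimmed = filter_and_trim(example)
    print(len(example), len(trimmed))
```

With a hypothetical 75-base input read, the hard trim alone leaves 75 - 14 - 2 = 59 bases, which matches the average length reported above if the reads were about 75 bases long after adaptor trimming.
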
The expression values were then calculated using the RPKM method. They were subjected to an ANOVA statistical test with three groups (TOP, MID, BOT), each composed of four biological replicates, and subsequently to a false discovery rate (FDR) correction. Only the genes showing a corrected p-value < 0.05 were retained for downstream analysis. The obtained data were further filtered by removing those genes showing a maximum value of the means < 1 RPKM (this was done with the purpose of removing those contigs showing negligible changes in gene expression) and a maximum fold change (FC) > 4 in absolute value. A total of 3268 contigs were obtained (Suppl. Dataset File). [...] Primers were designed using Primer3Plus (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi/) and verified with the OligoAnalyzer 3.1 tool from Integrated DNA Technologies (http://eu.idtdna.com/calc/analyzer). Primer efficiencies were checked via qPCR using a serial dilution of cDNA (from 10 ng to 0.0032 ng/µl). The primer sequences, amplicon length and Tm, amplification efficiencies and R² are indicated in Suppl. Dataset File. [...] The annotation of the putative transcription factors (TFs) in the de novo assembly was carried out with PlantTFcat (http://plantgrn.noble.org/PlantTFcat/), which gave a total of 2484 TFs (Suppl. Dataset File). The independent component analysis (ICA) was performed with the on-line program MetaGeneAlyse v1.7.1 (http://metagenealyse.mpimp-golm.mpg.de/). The Gene Ontology term Enrichment analysis (GOE) was performed as previously described using Cytoscape (v3.4.0) with the ClueGO v2.3.2 plugin (p-value < 0.05, Benjamini-Hochberg enrichment, gene ontology from level 3 to 8, kappa score set at 0.6). RNA-Seq RPKMs were log2 transformed and loaded for clustering and expression profile analysis in a data analytics software developed in-house.
The software includes a Web-based user interface providing interactive data visualisation in the form of a parallel coordinates plot synchronised with 2D scatter plots of PCA projections; the user interface is backed by an R server providing the necessary statistical analyses, in particular correlation clustering and PCA projection of multidimensional data. The software allowed us to configure, execute and visually analyse the RNA-Seq RPKMs; notably, with it we were able to identify the clusters of genes shown in Fig. . […]
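
The RPKM calculation and the FDR-based selection of contigs described earlier can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the CLC Genomics Workbench procedure: the function names are hypothetical, the per-contig ANOVA p-values are taken as given, and the Benjamini-Hochberg procedure is assumed for the FDR correction, since the text does not name the method used at this step. The fold-change criterion is not implemented because the text does not make its direction unambiguous.

```python
from typing import Dict, List


def rpkm(read_count: int, contig_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of contig per million mapped reads."""
    return read_count * 1e9 / (contig_length_bp * total_mapped_reads)


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_contigs(
    group_means: Dict[str, Dict[str, float]],  # contig -> {"TOP": .., "MID": .., "BOT": ..}
    anova_p: Dict[str, float],                 # contig -> uncorrected ANOVA p-value
    alpha: float = 0.05,                       # corrected p-value threshold from the text
    min_rpkm: float = 1.0,                     # low-expression threshold from the text
) -> List[str]:
    """Keep contigs with corrected p < alpha whose highest group mean is >= min_rpkm."""
    contigs = list(anova_p)
    corrected = benjamini_hochberg([anova_p[c] for c in contigs])
    return [
        c for c, q in zip(contigs, corrected)
        if q < alpha and max(group_means[c].values()) >= min_rpkm
    ]


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    means = {
        "contig_a": {"TOP": 5.0, "MID": 2.0, "BOT": 0.5},
        "contig_b": {"TOP": 0.4, "MID": 0.2, "BOT": 0.1},
        "contig_c": {"TOP": 8.0, "MID": 7.5, "BOT": 7.9},
    }
    pvals = {"contig_a": 0.001, "contig_b": 0.002, "contig_c": 0.5}
    print(select_contigs(means, pvals))
```

In this made-up example, `contig_b` passes the significance threshold but is dropped because none of its group means reaches 1 RPKM, which is the purpose of the low-expression filter described in the text.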

Pipeline specifications

Software tools: CLC Genomics Workbench, Blast2GO, Primer3, OligoAnalyzer, ClueGO
Databases: PlantTFcat
Applications: RNA-seq analysis, qPCR
Organisms: Cannabis sativa, Corchorus capsularis