Computational protocol: Development and validation of a mixed-tissue oligonucleotide DNA microarray for Atlantic bluefin tuna, Thunnus thynnus (Linnaeus, 1758)

Similar protocols

Protocol publication

[…] Low-quality reads (Phred score < 30) were filtered out and PRINSEQ v0.20 [] was used to remove PCR duplicates and low complexity sequences. Two complementary sequence read assembly methods were chosen for this study: WGS-assembler 7.0 [] and Mosaik-aligner v2.1 [] to assemble the contigs based on the Genbank archived T. thynnus EST database []. Subsequently, to lower the redundancy resulting from de-novo assemblies, TRF 4.07b [] was used to remove all transcripts shorter than 500 bases or exhibiting repetition, with entropy above 1. [...] The longest coding DNA sequences were determined for each transcript using the command-line program getorf from the EMBOSS v6.6.0 package []. ESTScan v2 [] was then used to confirm transcript coding regions and determine sequence orientation. The BLAST algorithm [] was used for sequence similarity searches against public databases. The coding sequences of the predicted transcripts were annotated using BLASTP searches against the GenBank Reference Proteins database [], with an expectation value (e-value) cut-off of 10−4, minimum similarity of at least 60 %, and minimum alignment length of 33 amino acids. Additionally, the transcripts were annotated using BLASTN searches against the annotated EST T. thynnus database []. The inferred annotations were used to retrieve Gene Ontology (GO) annotation for molecular function, biological process and cellular component []. To avoid redundant functional assignments, the best-rated similarity hit with at least one GO annotation was chosen. A custom pipeline converted GO terms to GO Slim terms, using the Protein Information resource and Generic GO Slim files []. Kyoto Enyclopedia of Genes and Genomes (KEGG) database [] was used to infer functional annotation of the sequences through KAAS (KEGG Automatic Annotation Server) [] using single-directional best hit method for ESTs. Sequences with assigned KO (KEGG Orthology) identifiers were categorised into functional groups according to the KEGG BRITE hierarchy, excluding human diseases. [...] Transcripts with a significant match to a reference protein were chosen for microarray probe design. In addition, 57 publicly available full mRNA sequences from the T. thynnus nucleotide (NCBI) database were also included. Sixty-mer oligonucleotide probes were designed using the eArray (Agilent Technologies, UK) online probe design tool with Base Composition and Best Probe Methodologies, 3’bias and sense orientation. Two probes were designed for each target transcript. After initial screening, unique probes showing no cross-hybridisation potential were selected to produce an 8 × 15 K Agilent custom oligo-DNA microarray design format (Agilent Design ID = 038391), comprising 15,208 user defined features and 536 Agilent positive and negative controls. Target transcripts (N = 6,439) with best annotation were represented with two probes on the array and the remainder (N = 2,190) with a single probe. In all, 6,287 unique proteins were annotated. For calculation of the multiplicative detrending step implemented within the Agilent Feature extraction software 35 probes were replicated five times. [...] Scanning was conducted using an Axon Genepix 4200A scanner with Genepix Pro 6.1 image acquisition software (Molecular Devices, UK) with 60 % red and 90 % green laser power at 5 μm resolution. Saturation tolerance was set to 0.05 % and automatic photo-multiplier tube (auto-PMT) gain used to achieve similar mean intensities of Cy3 and Cy5 signals. Acquired data were exported and processed using the Agilent Feature Extraction software (v9.5.3.1) to obtain background-subtracted signals, as well as other spot statistics and quality metrics. Scan data were analysed using GeneSpring GX v12 (Agilent Technologies, UK). Baseline transformation was not employed and data were normalised using a Lowess model. Principal component analysis (PCA) was conducted to visually inspect the distribution of gene expression variance among arrays within and between experimental conditions. Following removal of Agilent control features, stringent quality filtering involved removal of saturated and non-uniform features, population outliers and features that were not significantly positive with respect to the local background.One-way ANOVA unequal variance (Welch) (p < 0.05) was applied to preselect features showing potential differences in gene expression between tissues. Unsupervised network analysis was conducted on identified features using Biolayout Express3D [], using a Pearson coefficient of correlation as a similarity measure and threshold over 0.94. The network graph was clustered into distinct patterns of gene expression using a Markov clustering algorithm (MCL) with inflation value set to 2.0 and smallest cluster allowed being 10, to achieve optimal granularity of clusters and focus only on larger areas of high connectivity [].Functional gene set analyses were performed on the entire list of quality filtered entities based on KEGG KO identifiers. During the analyses, gene expression profiles were collapsed into 3,222 unique KOs using the median method. Tests were run to identify significantly up- and down- regulated pathways for each tissue and their pairwise combinations against all other samples using Generally Applicable Gene-set Enrichment (GAGE) analyses [], implemented as R/Bioconductor package [], which is robust to small datasets with different sample sizes. A default FDR q-value of 0.1 was used as a cut-off. Pathways were subsequently grouped into KEGG functional categories and the difference between the counts of up- and down- regulated pathways in each category hierarchically clustered and displayed as a heatmap. [...] Ten transcripts from different Biolayout derived clusters showing tissue-specific and diverse expression profiles, and spanning a wide range of fold change (FC) values, were chosen for RT-qPCR validation. Three additional genes were tested to serve as internal controls (reference genes): elongation factor-1α (elf-1α) [], beta-actin (actb) [] and a transcript showing a stable expression profile across the experimental conditions, annotated as FtsJ methyltransferase domain containing 2 (ftsjd2), from the microarray. Primer3 [] software was used to design primers for the selected sequences with at least one in the pair overlapping the target hybridisation area of the microarray 60-mer probe. Complementary DNA (cDNA) was synthesised from 2 μg of the same total RNA extractions used for the microarray experiment employing the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems, UK). The synthesis was primed with 1.5 μl of random hexamer primers supplied with the kit and 2.5 μM anchored-oligo (dT)20 (Eurofins, Germany) per 20 μl reaction. DNase treatment was not performed prior to cDNA synthesis as it may result in RNA degradation []. Instead, minus RT (RT-) controls (reverse transcriptase replaced with water) were created to control for the presence of genomic DNA in the extractions.Real-time PCR assays (20 μl) were performed on a Mastercycler ep realplex2 (Eppendorf, Germany) using Luminaris Color HiGreen qPCR Master Mix (Thermo Scientific, UK), in duplicate with 25 × cDNA template dilutions and 0.3 μM of each primer, according to the following thermal profile: 50 °C for 2 min (Uracil-DNA Glycosylase pre-treatment), 95 °C for 10 min, followed by 40 cycles of denaturation at 95 °C for 15 s, annealing at 60 °C for 30 s and elongation at 72 °C for 30 s. After amplification a melting curve analysis was performed from 55 °C to 95 °C at 0.5 °C increments for 15 s each to verify single product amplification. The presence of the correct size of PCR product was also verified by agarose gel electrophoresis. Six-point standard curves of a 4 × dilution series of a pool of all starting cDNAs were run with every assay to determine primer-specific efficiencies. A standard curve was used to extrapolate non-normalised relative copy numbers for each gene and the BestKeeper application [] was used to inspect the suitability of reference genes for normalisation.To compare the fit between qPCR and microarray fold changes (FCs), we calculated the concordance correlation coefficient (CCC) as well as other regression metrics previously described by Miron et al. []. The statistical significance of individual targets’ expression profiles was examined on the combined qPCR and microarray data for each gene, coded by their respective percentile ranks, using a nonparametric Kruskal-Wallis test implemented in R software. […]

Pipeline specifications

Software tools EMBOSS, ESTScan, BLASTP, BLASTN, KAAS, Agilent Feature Extraction, GenePix Pro, GeneSpring GX, GAGE, Primer3, BestKeeper
Databases KEGG
Applications Gene expression microarray analysis, qPCR, Transcription analysis
Organisms Thunnus orientalis, Oreochromis niloticus
Chemicals Nucleotides