Computational protocol: Brain transcriptome variation among behaviorally distinct strains of zebrafish (Danio rerio)

Protocol publication

[…] CEL files containing raw data were then processed and analyzed using R software and Bioconductor packages [,]. These CEL files have been deposited with the NCBI Gene Expression Omnibus (GEO, Accession: GSE38729). Microarray hybridization data were examined for physical anomalies on the chip using pseudo-chip images and residual error visualizations. Quality assurance of the microarray data was completed using the affyQAReport function from the Bioconductor package affyQCReport. Hybridization and housekeeping controls, RNA degradation, sample clustering, NUSE plots, LPE plots, and RLE plots all indicated high-quality data (not shown), and no chips were removed. The arrays were then pre-processed using the Robust Multi-array Average (RMA) procedure [-] as implemented in the affy package [].

Next, unexpressed and low-variability genes were removed by unbiased filtering. Affymetrix present-marginal-absent (PMA) calls were made with a P-value cutoff of 0.04: probe sets with P < 0.04 were called present and those with P > 0.04 absent, and marginal calls were treated as absent. Unexpressed genes were then defined as those with a signal below the expression value at which 99% of genes were called absent across all samples. A filter on interquartile range (IQR) was also applied to remove genes with low variability: genes with an IQR of less than 0.5 across all chips in the experiment were excluded, reducing the dataset further to 7,958 genes.

Signal intensities were also examined at the probe level to identify single feature polymorphisms (SFPs): probe-level differences caused by genetic polymorphisms rather than by differences in expression, which can also bias the computed expression values. Briefly, following previously described methods [], the RMA-normalized expression estimate for each probe set was subtracted, in R, from the background-corrected and normalized expression levels of the individual probes within that probe set.
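The expression and variability filters, and the probe-level residuals used to screen for SFPs, can be sketched in plain Python. This is not the study's code, which ran in R/Bioconductor, and all function names are hypothetical. Two readings of the text are assumptions: that the cutoff is the highest signal at or below which at least 99% of all calls are absent, and that a gene counts as expressed if it reaches that cutoff in at least one sample.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = genes (or probes), columns = samples


def quantile(values: Sequence[float], p: float) -> float:
    """Linear-interpolation quantile, matching R's default (type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def expression_threshold(signal: Matrix, absent: List[List[bool]], target: float = 0.99) -> float:
    """Highest signal t such that at least `target` of all calls with signal <= t are absent.

    ASSUMPTION: cumulative reading of "the expression value at which 99% of
    genes were called absent"; marginal calls must already be counted as absent.
    """
    obs = sorted(
        ((s, a) for row_s, row_a in zip(signal, absent) for s, a in zip(row_s, row_a)),
        key=lambda pair: pair[0],
    )
    threshold = float("-inf")  # no cutoff if the criterion is never met
    n_absent = 0
    for i, (s, is_absent) in enumerate(obs, start=1):
        n_absent += is_absent
        if n_absent / i >= target:
            threshold = s
    return threshold


def filter_genes(signal: Matrix, absent: List[List[bool]],
                 iqr_min: float = 0.5, target: float = 0.99) -> List[bool]:
    """True for genes that pass both the expression filter and the IQR filter."""
    t = expression_threshold(signal, absent, target)
    keep = []
    for row in signal:
        # ASSUMPTION: "expressed" = at or above the cutoff in at least one sample.
        expressed = max(row) >= t
        iqr = quantile(row, 0.75) - quantile(row, 0.25)
        keep.append(expressed and iqr >= iqr_min)
    return keep


def probe_residuals(probe_values: Matrix, probeset_estimate: Sequence[float]) -> Matrix:
    """Probe-level value minus the probe set's RMA estimate, per sample (input to the SFP test)."""
    return [[v - e for v, e in zip(row, probeset_estimate)] for row in probe_values]
```

In the protocol, these residuals are then tested for a strain effect with SAM; that test is not reproduced here.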
Normalized residuals were analyzed using significance analysis of microarrays (SAM) [] within the siggenes package to detect features with a significant effect of strain (FDR-adjusted α < 0.01). A total of 3,199 genes with significant SFPs were then removed from the analysis.

The Linear Models for Microarray Data (limma) package was then used to perform differential expression analysis on the filtered gene list, fitting a linear model to log2 signal values with an empirical Bayes correction to the variance [,]. Comparisons of interest were extracted through contrasts, and P-values were corrected for multiple comparisons using the Benjamini and Hochberg method (FDR = 5%) []. The data were initially analyzed with a fully crossed factorial model of Strain and Sex. However, because the Sex and interaction terms were not significant for the vast majority of genes (see Results), we also tested for differential expression among strains using a model that excluded sex. The effect of domestication history was tested using contrasts between wild and domesticated strains.

Functional analyses of the resulting gene lists were performed using DAVID version 6.7 [,]. Each functional category was evaluated against the probability that it would appear in a list of differentially expressed genes at random, based on its representation in the full list of genes classified as expressed in the zebrafish brain in our analysis. Functional classifications considered in the analysis included gene ontologies (GOs, []), COG ontologies, protein domains from the PIR superfamily, InterPro, and SMART databases, KEGG pathways, SP PIR keywords, and UP Seq features. Analysis of gene ontologies used the “FAT” option, which filters the output by removing upper-level GO terms, primarily from the top five levels of the hierarchy, in favor of more specific terms that are often more informative. Categories were considered overrepresented if the EASE score was less than 0.05.
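The EASE criterion can be illustrated with a small Python sketch. DAVID was used in the study and this is not DAVID's code; the function names are hypothetical. The sketch assumes that the EASE score is the one-tailed Fisher exact (hypergeometric upper-tail) probability recomputed after removing one gene from the list's count in the category, with the background being the genes classified as expressed in the brain. DAVID's exact table construction may differ in detail.

```python
from math import comb


def upper_tail(x: int, pop_total: int, pop_hits: int, list_total: int) -> float:
    """Hypergeometric P(X >= x): chance of at least x category genes when
    list_total genes are drawn without replacement from pop_total genes,
    pop_hits of which belong to the category (= one-tailed Fisher exact P)."""
    x = max(x, 0)
    upper = min(pop_hits, list_total)
    num = sum(
        comb(pop_hits, i) * comb(pop_total - pop_hits, list_total - i)
        for i in range(x, upper + 1)
    )
    return num / comb(pop_total, list_total)


def fisher_p(list_hits: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """One-tailed Fisher exact P for over-representation of a category."""
    return upper_tail(list_hits, pop_total, pop_hits, list_total)


def ease_score(list_hits: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    # ASSUMPTION: the EASE penalty is modeled as removing one gene from the
    # list's category count before computing the same upper-tail probability.
    return upper_tail(list_hits - 1, pop_total, pop_hits, list_total)
```

With toy numbers (3 of 3 list genes in a category that holds 3 of 10 background genes), the Fisher P is 1/120 but the EASE score is 22/120. Categories supported by only a few genes are therefore penalized before the 0.05 cutoff is applied.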
We also report P-values adjusted for multiple tests using the Benjamini-Hochberg correction []. The Functional Annotation Clustering tool was used to identify categories with overlapping lists of differentially expressed genes. […]
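The Benjamini-Hochberg adjustment, applied both to the limma contrasts (FDR = 5%) and to the DAVID category P-values, can be sketched in a few lines of Python. This illustrates the standard step-up procedure and is not the code used in the study, which relied on R/Bioconductor and DAVID. The function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, following the same step-up
    definition as R's p.adjust(p, method = "BH")."""
    m = len(pvalues)
    # Visit P-values from largest to smallest so a running minimum keeps
    # the adjusted values monotone in the raw P-values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` returns approximately `[0.02, 0.04, 0.04, 0.02]`. At FDR = 5%, every test with an adjusted value of at most 0.05 would be called significant.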

Pipeline specifications

Software tools: affyQCReport, affy, SAM, siggenes, limma, DAVID
Databases: GEO
Application: Gene expression microarray analysis
Organisms: Danio rerio, Homo sapiens, Oncorhynchus mykiss
Chemicals: Acyl Coenzyme A, Iron, Synaptophysin