Computational protocol: Masculinization of Gene Expression Is Associated with Exaggeration of Male Sexual Dimorphism

Similar protocols

Protocol publication

[…] Two-year-old wild turkeys were obtained in the breeding season of their first reproductive year, after social dominance was established, from Vicvet Farms (Yorkshire, UK). Although the population is natural in that is has not been subject to selection for domestication traits, it is kept under controlled semi-natural conditions, allowing us to control for age, diet and many environmental influences that can potentially affect gene expression. All samples were collected under permission from institutional ethical review committees and in accordance with national guidelines. In each case, the telencephalon, spleen and left gonad were collected separately, homogenized and stored in RNAlater. RNA was prepared from the same volume of starting material with the Animal Tissue RNA Kit (Qiagen). Library and RNA-Sequence samples were prepared and barcoded by the Wellcome Trust Centre for Human Genetics, University of Oxford, using standard methods and sequenced on an Illumina HiSeq 2000 as paired-end 100 bp reads.The resulting data was assessed for quality using FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc). Trimmomatic was used to remove read pairs with residual adaptor sequence and conduct quality filtering. Reads were trimmed if the leading or trailing bases had a Phred score <4, and were also trimmed if a sliding window average Phred score over four bases was <15. Post filtering, reads where either pair was <25 bases in length were removed from subsequent analyses, leaving on average more than 26 million mappable paired-end reads per sample.The genome of Meleagris gallopavo version 2.01 (GCA_000146605.1), was obtained from Ensembl release 67 . Filtered reads were mapped to the genome (excluding rRNA regions) using RSEM, version 1.1.20 , which leverages the short-read aligner bowtie, version 0.12.8 . To remove non- and lowly-expressed genes, a minimum expression filter of four reads per million mappable reads was applied to the raw counts, as we have previously implemented for deep RNA-Seq datasets –. All genes expressed lower than this threshold in less than half the female, dominant male or subordinate male individuals were removed from further analysis to prevent our results being biased by the noise inherent in very lowly expressed genes. Fragments per kilobase per million mappable reads (FPKM), which corrects for variations in contig length and read depth between samples was calculated from these raw counts for each sample .To explore the expression differences among the three sexual phenotypes in the gonad, we calculated average log2 expression for all females, dominant males and subordinate males for each gene, and tested for sex-bias in several ways using the R package, DESeq , which calculates differential expression in a pairwise fashion by negative binomial modelling and adjusts for multiple testing using the Benjamini-Hochberg method. For the gonad, we first tested for sex-bias by identifying significant expression differences (>2-fold difference, p<0.05) between females and dominant males. However, in order to verify that our results were not artefacts of how we defined sex-bias, and regression toward the mean, we also identified those genes with significant expression differences between females and subordinate males, and between females and all males. Due to the reduced level of transcriptional dimorphism in the soma, we reduced our fold-change thresholds considerably for the spleen (Supplemental Materials).We performed hierarchical clustering using Euclidean distance with complete linkage, as implemented in Cluster 3.0 and visualized in TreeView (v.1.1.6) . Heat maps were separately constructed for male-biased, female-biased and unbiased autosomal genes and Z-linked genes. The reliabilities of the inferred trees were tested by bootstrap resampling (1000 replicates) using the R package, Pvclust .We separated autosomal and Z-linked genes for two reasons. First, the sex chromosomes in birds show incomplete dosage compensation , therefore they exhibit an overall male-bias due to gene dose effects. Additionally, the unbalanced sex-specific selection acting on the sex chromosomes has been shown in chicken to masculinize Z chromosome expression –. These patterns mean that although the Z chromosome is interesting in its own right, it cannot be directly compared in terms of sex-bias to the autosomes. Therefore, sex-bias for autosomal genes was defined as those genes expressed two-fold higher in dominant males or females, with an adjusted p-value<0.05 (unpaired t-test, Benjamini-Hochberg correction for multiple comparisons ). Unbiased genes were all those not classified as either male- or female-biased. When average log2 expression values for quartiles based on sex-bias were calculated, the fold change criteria was dropped so as to include genes with a lower fold change than 2. This prevented restriction of the quartile analysis to solely the most sex-biased genes but allowed comparison to genes differentially expressed between the sexes but sex-biased to a lesser degree.GO term enrichment analysis was performed by taking mouse Ensembl gene IDs for those genes with a 1∶1 mouse ortholog, identified via Biomart. The target list (i.e. 21 significantly differentially expressed genes between dominant and subordinate males, or genes shared between two morphs) were compared to a background list (either all expressed autosomal genes or all expressed genes) using Gorilla –. P-values were calculated using a hypergeometric model and corrected for multiple testing.In order to investigate dN and dS, the turkey genome was compared to the genomes of chicken (Gallus gallus) and zebra finch (Taeniopygia guttata), obtaining 16,496, 22,194 and 18,204 peptides and corresponding cDNA sequence for each species respectively from Ensembl. Proteinortho , with default parameters, was used to identify single copy orthologs held in all three species. These 7,854 orthologous groups were aligned with PRANK using a guide tree obtained from Superfamily 1.75 . This orthologous set was filtered with Repeatmasker (http://www.repeatmasker.org) to remove seven retrotransposons and perl scripts were used to remove two genes with in frame stop codons and 13 genes with less than 100 bp in aligned gapless length. PAML, version 4.4b , was used to analyse the remaining 6,839 one-to-one orthologs, utilizing the phylogeny used for PRANK above. Alignments where dS>2 were removed as this represents the point of mutational saturation in avian sequence data . For those alignments that passed filtering, the number of potential nonsynonymous substitutions (NdN), the number of nonsynonymous substitutions (N), the number of potential synonymous substitutions (SdS) and the number of synonymous substitutions (S) were extracted for each orthologous group for the turkey-specific branch of the three-species phylogeny. These values were summed for each expression category in order to calculate average dN and dS for male-biased, female-biased and unbiased genes. This has the advantage of simultaneously avoiding the problem of infinitely high dN/dS values for genes lacking synonymous substitutions while weighting the data by alignment length .The location of androgen transcription factor binding sites (tfbs) in the turkey genome were predicted using amniote androgen tfbs motifs . The predicted tfbs locations were then compared to the start sites of all turkey genes in 2 kb, 5 kb and 10 kb upstream windows and matching hits recorded. […]

Pipeline specifications