Computational protocol: Polymorphisms in the yeast galactose sensor underlie a natural continuum of nutrient-decision phenotypes

Protocol publication

[…] To perform linkage analysis, we aligned raw reads for the parent strains (from the literature) and the segregant pools (from our experiments) to the sacCer3 (S288C) reference genome using BWA-MEM on the Harvard Medical School Orchestra cluster (http://rc.hms.harvard.edu; see the Orchestra High Performance Compute Cluster note below). We identified SNPs between cross parents and determined allele counts at each SNP in the segregant pools using samtools mpileup and bcftools call -c. Using custom MATLAB code, we removed SNPs with a read depth below 2 or above 1000 to avoid alignment artifacts. After filtering, the average sequencing depth per pool ranged from 25x to 71x, with a median of 48x.

To calculate LOD scores for allele frequency differences between the OFF and ON pools, we input the filtered allele counts to the mp_inference.py script (MULTIPOOL version 0.10.2; []) with the options -m contrast -r 100 -c 2200 -n 1000, following previous practice []. A value of n = 1000 likely underestimates our segregant pool size and should therefore yield conservative LOD estimates. The exception is the I14xYJM421 cross, which displayed unusually low spore viability (~20%, and thus presumably a smaller effective pool), possibly due to a Dobzhansky-Muller incompatibility []; for this cross we used n = 200.

We defined significant loci as LOD peaks with LOD > 10 (). Previous bulk segregant analyses using MULTIPOOL applied a less stringent cutoff of LOD > 5 [,]. That cutoff corresponded to a false discovery rate of 5% in one study [] but led to a much higher number of unreplicated locus calls in another []. Because our segregant pools underwent multiple rounds of meiosis (and potentially diversity-reducing selection), we chose the more conservative LOD > 10. The choice of LOD threshold does not affect our main conclusions about GAL3: even the lowest LOD for the chrIV:460 locus (in YJM978 x Y12) is 24 and thus highly significant ().
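The read-depth filter and the LOD > 10 peak threshold described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' custom MATLAB code: the record type `SnpCounts` and the functions `filter_by_depth` and `call_lod_peaks` are hypothetical, read depth is approximated as reference plus alternate read counts, and a "peak" is assumed to be the highest point of each contiguous run of positions above the threshold. The LOD values themselves would come from MULTIPOOL and are not computed here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class SnpCounts:
    """Hypothetical per-SNP record for one segregant pool."""
    chrom: str
    pos: int
    ref_count: int
    alt_count: int

    @property
    def depth(self) -> int:
        # Assumption: depth is approximated as ref + alt reads at this SNP.
        return self.ref_count + self.alt_count


def filter_by_depth(snps: Iterable[SnpCounts], min_depth: int = 2,
                    max_depth: int = 1000) -> List[SnpCounts]:
    """Drop SNPs with depth < min_depth or depth > max_depth."""
    return [s for s in snps if min_depth <= s.depth <= max_depth]


def call_lod_peaks(positions: Iterable[int], lods: Iterable[float],
                   threshold: float = 10.0) -> List[Tuple[int, float]]:
    """Return the (position, LOD) maximum of each contiguous run with LOD > threshold.

    Assumes positions are sorted and belong to a single chromosome.
    """
    peaks: List[Tuple[int, float]] = []
    best: Optional[Tuple[int, float]] = None
    for pos, lod in zip(positions, lods):
        if lod > threshold:
            # Inside a run above the threshold: track its highest point.
            if best is None or lod > best[1]:
                best = (pos, lod)
        elif best is not None:
            # The run just ended: record its maximum.
            peaks.append(best)
            best = None
    if best is not None:
        # A run that extends to the last position.
        peaks.append(best)
    return peaks
```

For example, `call_lod_peaks([1, 2, 3, 4, 5], [2, 12, 30, 4, 15])` returns `[(3, 30), (5, 15)]`; passing `threshold=5.0` applies the more permissive LOD > 5 cutoff instead.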
Besides the chrIV:460 locus, other moderately significant loci may still be biologically relevant, so we provide a list of LOD peaks and their corresponding support intervals at LOD > 5 (). We grouped peaks from different crosses into a single locus if they occurred within 20 kb of each other (, ).

[…] Sequences for the SGRP strains were downloaded from the SGRP website. Sequences for the strains in the Liti library [] were downloaded from https://www.sanger.ac.uk/research/projects/genomeinformatics/sgrp.html. For the remaining strains, where multiple distinct isolates have been reported in the literature, we selected the single sequence that matched the strain in our collection. Using these sequence databases, we extracted the GAL3 region and aligned the sequences using MUSCLE (, ). Based on the identified SNPs, we used mutfunc (http://mutfunc.com/) to predict the consequences of nonsynonymous SNPs in the GAL3 variants (). These sequences were used for the McDonald-Kreitman analysis in DnaSP [] (). A neighbor-joining phylogenetic tree was generated using the seqneighjoin function in MATLAB () and genetic distances []. […]
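One way to implement the 20 kb clustering rule is single-linkage merging along each chromosome, sketched below. The `(cross, chromosome, position)` peak format and the `cluster_peaks` function are hypothetical. Note two simplifying assumptions: this sketch merges nearby peaks regardless of which cross they came from (the text only specifies the rule for peaks from different crosses), and because merging chains from peak to peak, a locus can span more than 20 kb in total.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical peak format: (cross, chromosome, position in bp).
Peak = Tuple[str, str, int]


def cluster_peaks(peaks: Iterable[Peak], max_gap: int = 20_000) -> List[Dict]:
    """Merge LOD peaks lying within max_gap bp of each other into loci.

    Single-linkage along each chromosome: a peak joins the current locus
    if it is on the same chromosome and at most max_gap bp beyond the
    locus's rightmost peak; otherwise it starts a new locus.
    """
    loci: List[Dict] = []
    # Sort by chromosome name, then position (name order, not genome order).
    for cross, chrom, pos in sorted(peaks, key=lambda p: (p[1], p[2])):
        current = loci[-1] if loci else None
        if (current is not None and current["chrom"] == chrom
                and pos - current["end"] <= max_gap):
            current["end"] = pos
            current["members"].append((cross, pos))
        else:
            loci.append({"chrom": chrom, "start": pos, "end": pos,
                         "members": [(cross, pos)]})
    return loci
```

With peaks at chrIV positions 460,000 and 470,000 (from two crosses) and another at 500,000, the first two form one locus and the third starts a new one, because it lies 30 kb from the nearest merged peak.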

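The McDonald-Kreitman analysis compares the ratio of nonsynonymous to synonymous differences fixed between lineages (Dn, Ds) with the same ratio for polymorphisms within a lineage (Pn, Ps), typically via a 2x2 contingency test and the neutrality index NI = (Pn/Ps)/(Dn/Ds). The sketch below is a generic illustration of that test using a two-sided Fisher's exact test; it is not DnaSP's implementation, the function names `fisher_exact_two_sided` and `mk_test` are hypothetical, and classifying SNPs as fixed or polymorphic is left to the caller.

```python
from math import comb
from typing import Optional, Tuple


def fisher_exact_two_sided(table: Tuple[Tuple[int, int], Tuple[int, int]]) -> float:
    """Two-sided Fisher's exact test for a 2x2 table ((a, b), (c, d)).

    Sums the probabilities of all tables with the same margins that are no
    more likely than the observed one (hypergeometric null).
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x: int) -> int:
        # Unnormalized hypergeometric weight of the table whose top-left cell is x.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Integer weights make the "no more likely" comparison exact.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)


def mk_test(dn: int, ds: int, pn: int, ps: int) -> Tuple[Optional[float], float]:
    """McDonald-Kreitman test: neutrality index and Fisher p-value.

    dn/ds: fixed nonsynonymous/synonymous differences between lineages;
    pn/ps: nonsynonymous/synonymous polymorphisms within a lineage.
    NI = (pn/ps) / (dn/ds); NI is None when dn, ds, or ps is zero.
    """
    ni = (pn * ds) / (ps * dn) if (dn and ds and ps) else None
    # Rows: nonsynonymous, synonymous; columns: fixed, polymorphic.
    p_value = fisher_exact_two_sided(((dn, pn), (ds, ps)))
    return ni, p_value
```

For instance, `mk_test(dn=2, ds=8, pn=10, ps=5)` gives NI = 8.0, meaning an excess of nonsynonymous polymorphism relative to divergence; these counts are illustrative only, and DnaSP's reported statistics and options may differ from this sketch.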
Pipeline specifications

Software tools: BWA, SAMtools, bcftools, MULTIPOOL, mutfunc, DnaSP
Applications: Phylogenetics, Pool-seq analysis
Organisms: Saccharomyces cerevisiae
Diseases: Galactosemias
Chemicals: Galactose, Glucose