Computational protocol: Cell periphery-related proteins as major genomic targets behind the adaptive evolution of an industrial Saccharomyces cerevisiae strain to combined heat and hydrolysate stress

Protocol publication

[…] ER and ISO12 were sequenced by Genewiz Inc. (South Plainfield, NJ, USA) using Illumina MiSeq 2 × 250 bp reads (Illumina Inc., San Diego, CA, USA). The de novo assembly of ER and ISO12, and the reference mapping and variant calling of both strains relative to S288c, were performed by Genewiz Inc. using the CLC Genomics Server (v6.5.1; CLC Bio, http://www.clcbio.com). The assemblies and raw reads were deposited online (see Availability of supporting data). Variant calling was performed with the following thresholds: minimum coverage (read mapping) = 10×; minimum count of a variant = 4; minimum frequency (count/coverage) = 25.0 %. Sequence variants that did not satisfy these criteria were discarded. [...]

To identify the variants that were unique to either ER or ISO12, all variants that were common to both strains according to the reference mapping results were disregarded. This filtering was provided by Genewiz Inc. To further predict the protein-level effects of the unique variants, the positions of the coding-region SNPs and INDELs in both ER and ISO12 were converted from their mapped positions in S288c ORFs to the corresponding chromosomal coordinates with a custom Perl (v5.10.1) script. The variants were then analysed for their non-synonymous effects on S288c ORFs using the Variant Annotation Integrator tool at the UCSC genome browser. A list of the ORFs affected by each detected non-synonymous variant in ISO12 was compiled and analysed for enrichment in Gene Ontology (GO) terms with the YeastMine tool at SGD (accessed 2015-04-23). Complementary GO analyses were performed with the AmiGO 2 database.

The non-synonymous to synonymous substitution rate (Ka/Ks) was calculated between the ER and ISO12 strains. Every SNP was extracted from the variant calling dataset by a custom Perl script and was then applied to the sequences of the corresponding S288c ORFs with VCFtools (v0.1.12).
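The variant-calling thresholds and the unique-variant filter described earlier in this section can be illustrated with a minimal Python sketch. It is not the CLC Genomics Server or Genewiz implementation: the `Variant` record layout, the function names, and the rule that two variants are "common" when chromosome, position, and both alleles match are all assumptions made for illustration. Only the three threshold values come from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the protocol text.
MIN_COVERAGE = 10      # minimum read-mapping coverage at the site
MIN_COUNT = 4          # minimum number of reads supporting the variant
MIN_FREQUENCY = 0.25   # minimum count / coverage (25.0 %)


@dataclass(frozen=True)
class Variant:
    # Hypothetical record layout; the actual export format is not
    # described in the protocol.
    chrom: str
    pos: int
    ref: str
    alt: str
    count: int
    coverage: int

    @property
    def key(self):
        # Assumption: variants are "the same" if site and alleles match.
        return (self.chrom, self.pos, self.ref, self.alt)


def passes_thresholds(v):
    """Return True if the variant satisfies all three thresholds."""
    if v.coverage < MIN_COVERAGE or v.count < MIN_COUNT:
        return False
    return v.count / v.coverage >= MIN_FREQUENCY


def unique_variants(strain_a, strain_b):
    """Filter both call sets, then drop variants present in both."""
    kept_a = [v for v in strain_a if passes_thresholds(v)]
    kept_b = [v for v in strain_b if passes_thresholds(v)]
    keys_a = {v.key for v in kept_a}
    keys_b = {v.key for v in kept_b}
    only_a = [v for v in kept_a if v.key not in keys_b]
    only_b = [v for v in kept_b if v.key not in keys_a]
    return only_a, only_b
```

Calling `unique_variants(er_calls, iso12_calls)` on two lists of `Variant` records would return the calls unique to each strain. The sketch applies the thresholds before comparing the strains; the text does not say in which order the original pipeline performed these steps.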
The Ka/Ks ratio was calculated on the modelled ER and ISO12 ORFs using the MA model of the KaKs_calculator (v1.2). The results were quality-filtered to retain only values in the range 0.01 < Ka/Ks < 5. It should be stressed that the Ka/Ks model only accounts for polymorphisms and cannot accurately assess sequences that have changed in length during the evolutionary timeframe; therefore, the INDELs were not considered in this analysis.

Non-reference material in the de novo assembled contigs was detected with the ABACAS software (v1.3.1) coupled to the nucmer component of MUMmer (v3.22). The unmapped contigs from both strains were aligned to each other with Blat (v34) in order to detect non-reference sequences that were present in both assemblies. The contigs were searched with BLAST against the NCBI database (http://blast.ncbi.nlm.nih.gov) to find regions with similarity to non-reference S. cerevisiae strains.

Copy number variation was assessed for both the reference material and the non-reference material. The ER and ISO12 reads were aligned to the reference genome and to the unmapped contigs with BWA (v0.7.12), compressed and sorted with SAMtools (v1.1), and finally analysed for variation in copy number with CNV-seq (version of 2014/08/12) and R (v3.1.0). For CNV-seq, the genome size parameter was set according to the size of the reference data. […]
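The Ka/Ks quality filter described earlier in this section (0.01 < Ka/Ks < 5) can be sketched in Python as follows. This is an illustration only, not the KaKs_calculator code: the input layout (a dictionary of per-ORF Ka and Ks values), the function name, and the decision to skip ORFs with Ks = 0 are assumptions. Only the two bounds are taken from the text.

```python
# Bounds quoted in the protocol text (both exclusive).
KAKS_MIN = 0.01
KAKS_MAX = 5.0


def filter_kaks(rates):
    """Keep ORFs whose Ka/Ks ratio lies strictly inside the bounds.

    rates: dict mapping a (hypothetical) ORF identifier to a
           (Ka, Ks) tuple of floats.
    Returns a dict mapping the retained ORFs to their Ka/Ks ratio.
    """
    kept = {}
    for orf, (ka, ks) in rates.items():
        if ks <= 0:
            # Assumption: the ratio is undefined without synonymous
            # substitutions, so such ORFs are skipped.
            continue
        ratio = ka / ks
        if KAKS_MIN < ratio < KAKS_MAX:
            kept[orf] = ratio
    return kept
```

A ratio above 1 among the retained ORFs is conventionally read as a sign of positive selection and a ratio below 1 as purifying selection; how the filtered values were interpreted here is not stated in this excerpt.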

Pipeline specifications