Computational protocol: Draft Genome Sequence of Yarrowia lipolytica Strain A-101 Isolated from Polluted Soil in Poland

[…] Yarrowia lipolytica is one of the most extensively studied oleaginous yeasts used as a cell factory, with a great capacity to produce biotechnologically important metabolites such as organic acids, enzymes, polyols, fatty acids, and aromas. Owing to its ability to degrade organic compounds, including aliphatic and aromatic hydrocarbons, Y. lipolytica is also used in bioremediation and environmental protection.

To date, whole-genome sequences are available for three Y. lipolytica strains, E150/CLIB122, WSH-Z06 (BioProject PRJEB5051), and PO1f, together with a draft genome sequence of W29. W29 and its derivative PO1f originate from France, whereas E150 derives from a cross between W29 and the American strain CBS6124-2. WSH-Z06 is a Chinese strain used for alpha-ketoglutaric acid production. Here, we present the genome sequence of strain A-101, isolated from polluted soil at a car wash in Wrocław, Poland. A-101 has been investigated for citrate biosynthesis from various substrates in different bioreactor systems and has been used for soil bioremediation. It has been extensively mutated and genetically engineered to produce strains improved for citrate and erythritol biosynthesis.

A-101 DNA was sequenced on the Illumina HiSeq platform (paired-end [PE], 2 × 100 bp), using a shotgun library with 280-bp inserts and a mate-pair library averaging 7.2 kb. The raw reads were trimmed with Trimmomatic version 0.32 and cutadapt version 1.8.3. The assembly was done using SOAPdenovo2 version 2.04 with a k-mer size of 59, as estimated with kmergenie version 1.67. Two successive runs of GapCloser from the Short Oligonucleotide Analysis Package were used to close gaps, and manual curation was performed in overlapping regions.
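To make the k-mer parameter concrete: de Bruijn graph assemblers such as SOAPdenovo2 work on overlapping substrings of length k taken from the reads, and kmergenie chooses k from k-mer abundance statistics. The following is a toy sketch of k-mer counting only, not how either tool is implemented. The function name `count_kmers` and the example reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contributes n - k + 1 k-mers (none if n < k).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Hypothetical toy reads, far shorter than real 100-bp Illumina reads.
reads = ["ACGTACGT", "CGTACGTA"]
counts = count_kmers(reads, k=5)

# With the settings reported in the text (100-bp reads, k = 59),
# a single read yields 100 - 59 + 1 = 42 k-mers.
kmers_per_read = sum(count_kmers(["A" * 100], k=59).values())
```

A larger k resolves more repeats but yields fewer k-mers per read and so needs deeper coverage, which is why k is usually estimated from the data rather than fixed in advance.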
The current draft genome sequence is composed of 29 scaffolds larger than 5 kb (28 nuclear and 1 mitochondrial), for a total nuclear DNA size of 20,581,016 bp, with an N50 of 2,267,247 bp (4 scaffolds) and a G+C content of 49.03%. Coding sequence (CDS) prediction was essentially performed using the Amadea automatic annotation transfer software, with E150 as the reference genome. A total of 6,576 putative CDS (including 15 alternative isoforms) were identified after manual curation. tRNA genes were determined using tRNAscan-SE version 1.4, and 5S rRNA genes were identified by similarity to E150. Transposable elements (TE) were manually annotated by similarity to yeast TE, including those of strain E150. In addition to copies of Tyl6, Ylli, and Mutyl, a relic of Fotyl, and a single long terminal repeat (LTR) of Ylt1, a new full-length TE of the Ty1/Copia superfamily, named Tyl5, was identified, with LTRs corresponding to the previously known LTRyl1.

By mapping the reads to the E150 genome using BWA version 0.7.10 and analyzing them using Samtools version 1.2, a total of 38,802 single nucleotide polymorphisms and 3,353 short indels were called in the A-101 genome sequence (nucleotide frequency of the reference [QS], <0.1; read depth [DP], ≥30), i.e., 2,048 nucleotide variations per Mb. Further comparison of this genome against other strains of Y. lipolytica and species of the Yarrowia clade will bring additional insights into gene functions and evolutionary events. […]
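The summary statistics reported above follow from simple arithmetic. The sketch below is an illustration, not the authors' code: it computes N50 (with the number of scaffolds needed to reach it) from a list of scaffold lengths, and the variant density from the reported counts. The function names and the toy scaffold lengths are hypothetical; only the SNP count, indel count, and nuclear assembly size come from the text.

```python
def n50(lengths):
    """Return (N50, L50): the length of the scaffold at which the running
    total of the longest scaffolds reaches half the assembly size, and the
    number of scaffolds needed to get there."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


def variants_per_mb(snps, indels, genome_bp):
    """Variant density: (SNPs + indels) per million base pairs."""
    return (snps + indels) / (genome_bp / 1_000_000)


# Hypothetical toy assembly; the real scaffold lengths are not in the text.
toy_n50, toy_l50 = n50([40, 30, 20, 10])

# Values reported in the text: 38,802 SNPs, 3,353 indels, 20,581,016 bp.
density = variants_per_mb(38_802, 3_353, 20_581_016)
print(round(density))  # 2048
```

Dividing the 42,155 called variants by the A-101 nuclear assembly size reproduces the reported figure of 2,048 per Mb. The "(4 scaffolds)" given next to the N50 presumably corresponds to the second value returned by `n50`, that is, the number of largest scaffolds that together cover half of the assembly.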

