Computational protocol: The transcriptome of Candida albicans mitochondria and the evolution of organellar transcription units in yeasts

Protocol publication

[…] RNA-seq libraries were prepared using the Ion Total RNA-Seq Kit v2 (Life Technologies), starting with 400–500 ng of either mitochondrial or total RNA, according to the manufacturer’s protocol. For mitochondrial RNA libraries, the RNA fragmentation step was shortened to 4 min to preserve intact tRNAs. After the RNA fragmentation step, each sample was divided into two equal aliquots, and one was treated with 0.5 U (mitochondrial RNA) or 0.75 U (total RNA) of tobacco acid pyrophosphatase (TAP) (Epicentre Technologies) at 37 °C for 30 min. RNA quality and library construction were monitored using a BioAnalyzer 2100 (Agilent Technologies) according to the manufacturer’s protocol.
The libraries were sequenced on the Ion Torrent Proton™ NGS System on a P1 chip using the Template OT2 200 Kit for template preparation and the Ion PI™ Sequencing 200 Kit for sequencing (Life Technologies), all according to the manufacturer’s instructions. This method produces single-end, strand-specific reads. Raw sequencing data were processed using the Torrent Suite™ Software (Life Technologies). Barcode removal and quality trimming were performed in Torrent Suite™ using default parameters (30 % QC threshold, reads <25 nt rejected). The resulting reads range from 25 nt to a maximum of 351–367 nt, with a mean of about 70 nt. The processed reads were exported as FASTQ files and imported into CLC Genomics Workbench 8 for the mapping, analysis and visualization of sequencing results. The RNA-seq workflow used in this software is based on the methodology of Mortazavi et al. []. The complete mtDNA sequence of C. albicans strain SC5314 [GenBank:AF285261.1], with additional feature annotations from the Candida Genome Database [], was used as the reference for read mapping. For analyses involving the nuclear genome, Assembly 22 of the C. albicans SC5314 genome sequence [] was used as the reference.
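As an illustration of the read-length figures quoted above (reads shorter than 25 nt rejected, mean of about 70 nt), the following minimal Python sketch summarizes read lengths in an exported FASTQ file. It is not part of the published protocol, which used Torrent Suite™ for trimming. The function names `read_fastq` and `length_summary` are hypothetical, the parser assumes the simple four-lines-per-record FASTQ layout, and the toy reads are invented for the example.

```python
import io
from statistics import mean

MIN_LENGTH = 25  # protocol setting: reads shorter than 25 nt were rejected


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) ends the iteration
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def length_summary(handle, min_length=MIN_LENGTH):
    """Return count, min, max and mean length of reads that pass the length cutoff."""
    lengths = [len(seq) for _, seq, _ in read_fastq(handle) if len(seq) >= min_length]
    if not lengths:
        return None
    return {
        "n": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
    }


# Toy input (invented): three reads of 30, 10 and 50 nt; the 10 nt read is dropped.
toy_fastq = io.StringIO(
    "@r1\n" + "A" * 30 + "\n+\n" + "I" * 30 + "\n"
    "@r2\n" + "C" * 10 + "\n+\n" + "I" * 10 + "\n"
    "@r3\n" + "G" * 50 + "\n+\n" + "I" * 50 + "\n"
)
summary = length_summary(toy_fastq)
print(summary)
```

For a real file, the same function could be called on an opened handle, for example `with open("reads.fastq") as fh: length_summary(fh)`, where the file name is a placeholder.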
Reads were mapped to both strands of the entire reference sequence, including intergenic regions, using default parameters (mismatch cost 2, indel cost 3, length and similarity fractions 0.8). Following the removal of the second copy of the inverted repeat region from the reference sequence, only uniquely mapping reads were counted. Expression values for annotated genes were calculated as RPKM []. [...]
The mtDNA sequences used in the comparative analysis, with accession numbers and references, are listed in Additional file : Table S6. Concatenated amino acid sequences encoded by 14 mitochondrial protein-coding genes were aligned with MUSCLE version 3.8.31 [] (3911 amino acid sites after manual removal of regions containing gaps in the alignment), and the tree was inferred using PhyloBayes (MPI version 1.5a) [, ] with the CAT-GTR model. Two MCMC chains were run in parallel for 5000 cycles, at which point the maximum discrepancy (maxdiff) value reached 0.009 (maxdiff <0.1 is considered sufficient). The first 1000 trees in each chain were discarded as burn-in, and every second remaining tree was sampled for the posterior consensus. The tree was rooted using the Y. lipolytica sequence as the outgroup. […]
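The RPKM expression values mentioned above follow the standard definition of reads per kilobase of transcript per million mapped reads: the read count for a gene is divided by the gene length in kilobases and by the total number of mapped reads in millions. The sketch below shows that arithmetic in Python. It is only an illustration under stated assumptions: the function name `rpkm` and the gene names and counts are hypothetical, and CLC Genomics Workbench may differ in how it counts reads and measures gene length.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical toy values, not data from the study:
# gene name -> (uniquely mapped read count, gene length in bp)
genes = {"geneA": (500, 1000), "geneB": (250, 2000)}
total_mapped = 1_000_000

values = {
    name: rpkm(count, length, total_mapped)
    for name, (count, length) in genes.items()
}
print(values)  # {'geneA': 500.0, 'geneB': 125.0}
```

Because the value is normalized by both gene length and library size, geneB ends up at a quarter of geneA's RPKM: it has half the reads spread over twice the length.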

Pipeline specifications

Software tools: CLC Genomics Workbench, CLC Assembly Cell, MUSCLE, PhyloBayes
Databases: CGD
Applications: Phylogenetics, RNA-seq analysis, Nucleotide sequence alignment
Organisms: Candida albicans, Saccharomyces cerevisiae, Homo sapiens, Schizosaccharomyces pombe