Dataset features


Application: RNA-seq analysis
Release date: Sep 30 2016
Last update date: Sep 30 2016
Access: Public
Taxon: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Macaca mulatta, Pan troglodytes, Mus musculus, Rattus norvegicus
Dataset link: Initiation of mtDNA transcription is followed by pausing, and diverge across human cell types and during evolution

Experimental Protocol

We re-mapped GRO-seq and PRO-seq data. We calibrated the parameters required to identify sites of transcription initiation, termination and pausing; these parameters were first established for human samples and then applied to diverse non-human organisms.

Fastq files of the tested samples were trimmed with Trim-galore, keeping a minimum read length of 30 nucleotides. The trimmed Fastq files were mapped against the entire revised mitochondrial genome using BWA-aln (-q=5, -l=20, -k=2, -t=1). BWA was used to convert the SAI output into SAM format, which in turn was converted into a BAM file and sorted using Samtools. Samtools was used to generate VCF files for each sample (mpileup (-uf) command). Then, a sample-specific mtDNA sequence was reconstructed for each of the analyzed samples using bcftools call (-c) (Samtools) in combination with vcf2fq from the package. The Fastq files were uniquely re-mapped against the reconstructed sample-specific mtDNA using BWA-aln (-q=5, -l=32, -k=2, -t=1), and BAM files were generated again. Low-MAPQ reads were removed using the Samtools ‘view’ command (-F=1804, -q=30). Coverage per base was calculated for a given sequence interval (separately for each strand) using Bedtools (version 2.25) with the ‘genomecov’ command ('-d' and '-strand' options).

Since the mtDNA is a circular molecule, some reads may have been erroneously excluded, so we re-analyzed the Fastq files. To this end, we remapped the reads to the sample-specific mtDNA sequence, rearranged such that the last 500 nucleotides of the standard mtDNA sequence were cut and pasted at the beginning of the sequence.
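The rearrangement of the circular reference can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `rotate_circular` and `rotated_to_original` are hypothetical, and the sketch assumes the reference is held as a plain string and that positions are 1-based.

```python
def rotate_circular(seq: str, shift: int = 500) -> str:
    """Cut the last `shift` bases of a circular sequence and paste them at the start."""
    if not 0 <= shift <= len(seq):
        raise ValueError("shift must be between 0 and the sequence length")
    return seq[len(seq) - shift:] + seq[:len(seq) - shift]


def rotated_to_original(pos: int, length: int, shift: int = 500) -> int:
    """Map a 1-based position on the rotated sequence back to the original sequence."""
    if not 1 <= pos <= length:
        raise ValueError("pos must lie within the sequence")
    # The first `shift` rotated positions come from the end of the original.
    return (pos - 1 - shift) % length + 1


if __name__ == "__main__":
    toy = "ABCDEFGHIJ"  # toy 10-base "genome", shift of 3 instead of 500
    rotated = rotate_circular(toy, shift=3)
    print(rotated)
    print([rotated_to_original(p, len(toy), shift=3) for p in range(1, 11)])
```

After this rotation, the original circle junction sits in the interior of the rearranged sequence (between rotated positions `shift` and `shift + 1`), so reads spanning it can align as ordinary linear reads.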
Mapping was performed, and read coverage at the former circle junction of the rearranged sequence was calculated and added to the previous mapping results.

Genome build:
Homo sapiens: NC_012920.1; Revised Cambridge Reference Sequence ("rCRS")
Pan troglodytes: NC_001643.1
Macaca mulatta: NC_005943.1
Rattus norvegicus: NC_001665.2
Mus musculus: NC_005089.1
Drosophila melanogaster: NC_024511.2
Caenorhabditis elegans: NC_001328.1
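The junction step of the protocol, in which coverage around the former circle junction is taken from the rearranged mapping and added to the first-pass results, could be sketched as follows. This is one possible reading, not the authors' implementation. The function name `merge_junction_coverage` and the `window` parameter are hypothetical, the protocol does not state how wide a region around the junction was used, and the purely additive rule assumes that the reads counted near the junction in the second pass were missing from the first pass.

```python
from typing import List


def merge_junction_coverage(
    linear: List[int],
    rotated: List[int],
    shift: int = 500,
    window: int = 100,  # hypothetical: half-width of the region around the junction
) -> List[int]:
    """Add per-base coverage near the former junction (rotated mapping) to the linear mapping.

    `linear` and `rotated` are 0-indexed per-base coverage vectors of equal length;
    `rotated` refers to the sequence whose last `shift` bases were moved to the start.
    """
    n = len(linear)
    if len(rotated) != n:
        raise ValueError("coverage vectors must have the same length")
    if window > shift or window > n - shift:
        raise ValueError("window must fit on both sides of the junction")

    merged = list(linear)
    # In rotated coordinates the former junction lies between indices shift-1 and shift.
    for j in range(shift - window, shift + window):
        i = (j - shift) % n  # back to original 0-based coordinates
        merged[i] += rotated[j]
    return merged


if __name__ == "__main__":
    linear = [1] * 10
    rotated = [0, 5, 6, 7, 8, 0, 0, 0, 0, 0]
    print(merge_junction_coverage(linear, rotated, shift=3, window=2))
```

In practice the two input vectors would come from per-base coverage tables, one per strand, such as those produced by the 'genomecov' step.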








Dan Mishmar