Computational protocol: Ancient mitogenomes of Phoenicians from Sardinia and Lebanon: A story of settlement, integration, and female mobility

Similar protocols

Protocol publication

[…] Ancient DNA sequences were also processed following the protocols described previously []. Specifically, ancient sequences were subjected to AdapterRemoval [], where sequencing adaptors, reads with stretches of Ns, bases with quality scores of <30 and short reads (<25 bp) were removed. Additionally, where paired end fragments overlapped by at least 11 base pairs, reads were merged. Merged and unmerged reads were aligned separately with Burrows-Wheeler Aligner (BWA) using recommended ancient DNA settings in which seeding was disabled (-l 1024), gap opens was set to 2 (-o 2) and maximum edit distance was set to 0.03 (-n0.03) []. To test for contamination with laboratory reagents, all reads were mapped to a composite reference genome constructed using the revised Cambridge Reference Sequence (rCRS) for humans [], along with the reference mitochondrial genomes for cow (Bos taurus), pig (Sus scrofa) and chicken (Gallus gallus) (GenBank refs NC_006853, NC_0012095.1 and NC_001323.1). The contamination ratio was determined by calculating the ratio of all reads with mapping quality > = 20 for each reference. We determined the contamination ratio by calculating the ratio of all reads with a mapping quality > = 20 for each reference, compared to those for the human CRS.PCR duplicates were marked and removed from unmerged reads using Picard tools 1.92 (, and from merged reads using a python script developed by Fu et al. []. MapDamage (v2.0.2–9) [] was used to identify characteristic aDNA damage patterns, with the ‘-rescale’ option to lower the quality score of likely damaged sites. Thiamines (T) found at the 5’end of a read and guanines (G) found at the 3’ ends within the first two bases were rescaled to 0 for their quality scores. Modern human contamination levels were assessed using the ContamMix package with a set of 311 representative modern human mtDNA sequences [].Variant call files (VCFs) were generated for all reads that mapped to the human reference genome using the GATK Haplotype Caller (v3.5) [] with settings specific for haploid genomes and were filtered for mapping quality (<20) and bases that were covered by fewer than 3 reads. We only analysed sequences having a minimum of 96% coverage. VCF files were converted to whole-sequence FASTA files using in-house commands and GATK’s FastaAlternateReferenceMaker. Coverage plots were created for all reads using R software [], and plots of fragment length were created from merged reads with Picard’s CollectInsertSizeMetrics (Picard tools 1.92, Consensus sequences were created (including indels) and these were deposited in GenBank. Sequences were assigned to haplogroup using Haplogrep [] with Phylotree build 17. All reads generated have been submitted to the NCBI Sequence Read Archive (SRP123440), identified by lab sample number. [...] Median-Joining network analysis was applied to all ancient sequences using POPART [] with default settings. Median-Joining networks were constructed by combining minimum-spanning trees within a single network.Discriminant analysis of principal components (DAPC) was used to investigate clustering of related individuals. DAPC is a multivariate analysis used to extract information of genetic relationships for between- and within-species/population sequence data []. The DAPC approach is integrated into the R package adegenet []. We used the fasta2DNAbin function provided by adegenet to load the sequence data into R. We then transformed the data to a genind file using DNAbin2genind (DAPC input format). The principal components (PC) and discriminant functions to retain were chosen using the plots created by the DAPC function (dapc). All R packages used for the cluster analysis can be freely downloaded at: investigate the phylogenetic relationships between the ancient mitogenomes that we obtained and modern population data, Maximum likelihood (ML) trees were constructed using PhyML []. Modern sequences for each haplogroup identified in our ancient samples were downloaded from Genbank. The most appropriate nucleotide substitution model was determined using jModeltest []. […]

Pipeline specifications