Computational protocol: Complex signatures of genomic variation of two non-model marine species in a homogeneous environment

Protocol publication

[…] The quality of raw reads from the MiSeq facility was first assessed with the FastQC toolkit []. The reads were then trimmed with Trim Galore! (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) to remove adapter and overrepresented sequences, as well as sections containing bases with a Phred quality score below 20. Because optimizing k-mer length for RAD sequences produces the highest-quality assemblies [], we conducted de novo assemblies with SPAdes v.3.5.0 [], testing multiple k-mer lengths, and determined optimal k-mer lengths of 81 for the Cape urchin and 91 for the Granular limpet. Assembly statistics, such as assembly length, longest contig, N50 and L50, were calculated with QUAST v4.1.1 []. As semi-global alignment and realignment of unmapped reads are recommended for pooled samples [], we used BWA-MEM [], with the same parameters as in Toonen et al. [], to map the filtered reads onto the de novo reference sequences. Mapping results (number of mapped versus unmapped reads) were calculated using the ‘stats.idx’ command in SAMtools v.1.3 []. The resulting SAM files were converted to BAM files with SAMtools and further filtered to discard all reads not mapped in a proper pair, reads not in a primary alignment and reads with a mapping quality score under 20. The BAM files were sorted and indexed, and then used to call variants with the ‘mpileup’ command in SAMtools, using a minimum quality score of 20 and a maximum depth of 1000 reads per locus. […]
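
Two of the steps above have simple definitions that a short sketch can make concrete: the N50 and L50 assembly statistics, and the read filter applied to the BAM files. The Python below is illustrative only and is not the authors' code. The function names `n50_l50` and `keep_read` are hypothetical, and treating supplementary alignments as "not primary" is an assumption, since the excerpt does not state which SAM flags were used.

```python
from typing import Iterable, Tuple

# SAM FLAG bits, as defined in the SAM format specification.
FLAG_PROPER_PAIR = 0x2      # read mapped in a proper pair
FLAG_SECONDARY = 0x100      # secondary (not primary) alignment
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment


def n50_l50(contig_lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly.
    L50 is the number of contigs in that set.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: cumulative sum always reaches half")


def keep_read(flag: int, mapq: int, min_mapq: int = 20) -> bool:
    """Decide whether a read passes the three filters described in the text.

    Assumption: "not in a primary alignment" is read here as either the
    secondary or the supplementary flag being set.
    """
    if not flag & FLAG_PROPER_PAIR:
        return False  # not mapped in a proper pair
    if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return False  # not a primary alignment
    return mapq >= min_mapq  # mapping quality threshold


if __name__ == "__main__":
    print(n50_l50([100, 80, 60, 40, 20]))  # toy contig lengths
    print(keep_read(flag=99, mapq=60))     # properly paired primary read
```

Under the same assumption, this filter selects roughly what `samtools view -b -f 2 -F 2304 -q 20` would keep, although the excerpt does not give the exact command that was run.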

Pipeline specifications