Pipeline publication

[…] BluePippin quantitative electrophoresis unit (Sage Science, Beverly, MA, USA). Single-end, 100-base reads were generated on one lane of an Illumina HiSeq 2500 at the University of Texas Genomic Sequencing and Analysis Facility (UTGSAF, Austin, TX, USA).
The 100 bp reads generated on the HiSeq run were first filtered to remove contaminant DNA (e.g. Escherichia coli; PhiX) and low-quality reads. A Perl script was then used to identify individual barcodes, correct barcodes with errors, and remove reads containing sequences associated with Illumina adaptors or PCR primers. After this step, fragments were 86–88 bases in length. A random subset of 25 million reads was assembled de novo using the SeqMan NGen software (DNASTAR Inc.), specifying a minimum match percentage of 95 and a gap penalty of 30 (full details of parameter settings are available from the authors by request). Contigs were removed from the reference if they contained fewer than 10 reads, were over-assembled, or were not 84–90 bp in length. This step produced a reference of genomic regions sampled with our GBS approach, providing a template for subsequent reference-guided assembly. DNA sequences from each chickadee were subsequently aligned to the reference with bwa v7.5 using the aln and samse algorithms and an edit distance of 4. Because all sampled genomic regions begin with the EcoRI cut site and all HiSeq reads contained 100 bases of sequence, these alignments produced consistently rectangular contigs with even positional coverage.
Variant sites (i.e. SNPs) were called and quantified using samtools v.0.1.19 and bcftools v.0.1.19. SNPs were retained if at least 90% of individual birds had at least one read at the position, the site was biallelic, and the minor allele frequency was greater than 5%. For reference contigs containing multiple SNPs, a single SNP was randomly selected to increase independence of SNPs and to decrease the effect of linkage disequilibrium on subsequent analyses.
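
The SNP retention rules described above (coverage in at least 90% of birds, biallelic sites, minor allele frequency above 5%, one randomly chosen SNP per contig) can be illustrated with a minimal Python sketch. This is not the authors' code: the `Snp` record, its field names, and the functions `passes_filters` and `select_snps` are hypothetical, and the sketch assumes the per-site summaries have already been extracted from the samtools/bcftools output.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snp:
    """Hypothetical per-site summary; not a format used by the authors."""
    contig: str
    position: int
    n_alleles: int            # number of distinct alleles observed at the site
    minor_allele_freq: float  # estimated minor allele frequency
    depths: List[int]         # read depth at this site, one entry per bird


def passes_filters(snp: Snp, min_call_rate: float = 0.90,
                   min_maf: float = 0.05) -> bool:
    """Apply the three site-level criteria stated in the text."""
    if not snp.depths:
        return False
    # Fraction of birds with at least one read at the position.
    covered = sum(1 for depth in snp.depths if depth >= 1)
    call_rate = covered / len(snp.depths)
    return (snp.n_alleles == 2
            and call_rate >= min_call_rate
            and snp.minor_allele_freq > min_maf)


def select_snps(snps: List[Snp], seed: Optional[int] = None) -> List[Snp]:
    """Keep passing SNPs, then pick one at random per reference contig."""
    rng = random.Random(seed)
    by_contig = defaultdict(list)
    for snp in snps:
        if passes_filters(snp):
            by_contig[snp.contig].append(snp)
    # Sorting by contig name only makes the output order reproducible.
    return [rng.choice(group) for _, group in sorted(by_contig.items())]
```

Passing a fixed `seed` makes the random per-contig choice repeatable, which is convenient when the same SNP set has to be regenerated for later analyses.
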
For each bird, genotype likelihoods were calculated for each SNP using bcftools. Genotype likelihoods were initially stored in Variant Call Format (.vcf) and then converted to a composite genotype likelihood format. Genotype likelihood matrices and assembly-related files are available at Dryad, and additional information regarding parameter settings is available from the authors upon request.
To account for uncertainty associated with variation in coverage depth, a hierarchical Bayesian model was employed to estimate genotype probabilities based on the genotype likelihoods estimated above. This model treats population allele frequencies as priors and simultaneously estimates both allele frequencies and genotype probabilities after accounting for variation in […]
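
To make the idea of combining genotype likelihoods with an allele-frequency prior concrete, here is a deliberately simplified Python sketch. It is not the hierarchical Bayesian model used in the source: it swaps in a plain expectation-maximization (EM) point estimate for a single SNP and assumes a Hardy-Weinberg prior on genotypes. All function names are hypothetical, and the only file-format fact relied on is that VCF `PL` values are Phred-scaled genotype likelihoods.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[float, float, float]


def phred_to_likelihood(pl: Sequence[float]) -> Triple:
    """Convert Phred-scaled likelihoods (VCF PL field) to a linear scale."""
    return tuple(10.0 ** (-value / 10.0) for value in pl)  # type: ignore


def hwe_prior(p: float) -> Triple:
    """Assumed prior: Hardy-Weinberg proportions for 0, 1, 2 alternate copies."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)


def genotype_posteriors(likelihoods: Sequence[Triple], p: float) -> List[Triple]:
    """Posterior genotype probabilities per bird, given allele frequency p."""
    prior = hwe_prior(p)
    posteriors = []
    for lik in likelihoods:
        weights = [l * pr for l, pr in zip(lik, prior)]
        total = sum(weights)
        if total == 0.0:
            posteriors.append(prior)  # no usable data: fall back to the prior
        else:
            posteriors.append(tuple(w / total for w in weights))
    return posteriors


def em_allele_frequency(likelihoods: Sequence[Triple], p_init: float = 0.5,
                        n_iter: int = 100, tol: float = 1e-8):
    """Alternate between genotype posteriors (E) and allele frequency (M)."""
    eps = 1e-6
    p = p_init
    for _ in range(n_iter):
        p = min(max(p, eps), 1.0 - eps)  # keep the prior away from 0 and 1
        post = genotype_posteriors(likelihoods, p)
        # Expected number of alternate alleles across all birds.
        expected_alt = sum(g1 + 2.0 * g2 for _, g1, g2 in post)
        new_p = expected_alt / (2.0 * len(post))
        converged = abs(new_p - p) < tol
        p = new_p
        if converged:
            break
    p = min(max(p, eps), 1.0 - eps)
    return p, genotype_posteriors(likelihoods, p)
```

In this sketch a bird with low coverage has nearly flat likelihoods, so its posterior stays close to the population prior, while a deeply sequenced bird is dominated by its own data. That is the intuition behind using allele frequencies as priors; a full hierarchical model would additionally propagate uncertainty in the allele frequency itself rather than fixing it at a point estimate.
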

Pipeline specifications

Software tools: DNASTAR Genomics Suite, BWA, SAMtools, bcftools
Organisms: Ilex paraguariensis
Chemicals: Nucleotides