Computational protocol: Optimized double-digest genotyping by sequencing (ddGBS) method with high-density SNP markers and high genotyping accuracy for chickens

Protocol publication

[…] All sequencing experiments were performed on an Illumina NextSeq 500 sequencer at the State Key Laboratory for Agro-biotechnology, China Agricultural University. The BCL files produced as primary sequencing output were converted into FASTQ files using the bcl2fastq2 conversion software (version 2.16.0); the sequencing adapters were masked and trimmed during this conversion step. After trimming, the Illumina 91-bp single-end reads were filtered in two steps: first, reads contaminated by adapter sequence were discarded; then, reads containing more than 50% low-quality bases or more than 5% N bases were removed. A quality control report for the filtered reads was generated with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
We used the TASSEL GBS analysis pipeline (version 4.0), in which reads were aligned to the chicken reference genome Gallus_gallus-4.0 (released 2011) using Bowtie2. The TASSEL filtering options were: "-c 3", the minimum number of times a tag must be present to be output; "-mnTCov 0.01", the minimum SNP call rate for a taxon to be included in the output; "-mnSCov 0.6", the minimum sample call rate for a SNP to be included in the output; and "-mnMAF 0.05", the minimum minor allele frequency.
The raw SNP sites were then filtered with VCFtools according to the following criteria: 1) minor allele frequency (MAF) > 5%; 2) genotype quality of at least 98 (GQ ≥ 98) and depth ≥ 5; 3) biallelic markers only. Ungenotyped markers were imputed using Beagle 4.0 with a pedigree file describing the F8-F9 family relationships. To annotate mutations from the GBS output, we used SnpEff with the chicken reference genome sequence and GTF annotation files downloaded from Ensembl (http://www.ensembl.org/info/data/ftp/index.html).
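The read-level filter described above (discard adapter-contaminated reads, then discard reads with more than 50% low-quality bases or more than 5% N bases) can be sketched in Python as follows. This is only an illustration, not the authors' script: the text does not state which Phred score counts as "low quality", so the cutoff LOW_QUAL_PHRED = 5 is an assumption, as are the Phred+33 encoding, the Read type, the function name passes_filter, and the adapter string used in the usage example.

```python
from typing import NamedTuple, Optional


class Read(NamedTuple):
    """One single-end read (hypothetical container for this sketch)."""
    name: str
    seq: str
    qual: str  # quality string, assumed to be Phred+33 encoded


# Assumption: the source does not give the low-quality cutoff.
LOW_QUAL_PHRED = 5
MAX_LOW_QUAL_FRACTION = 0.50  # "more than 50% low quality bases" -> removed
MAX_N_FRACTION = 0.05         # "more than 5% N bases" -> removed


def passes_filter(read: Read, adapter: Optional[str] = None) -> bool:
    """Return True if the read survives the two-step filter."""
    length = len(read.seq)
    if length == 0 or length != len(read.qual):
        return False

    # Step 1: drop reads that still contain adapter sequence.
    if adapter and adapter in read.seq.upper():
        return False

    # Step 2: drop reads with too many N bases or low-quality bases.
    n_fraction = read.seq.upper().count("N") / length
    low_qual = sum(1 for ch in read.qual if ord(ch) - 33 <= LOW_QUAL_PHRED)
    low_qual_fraction = low_qual / length

    return (n_fraction <= MAX_N_FRACTION
            and low_qual_fraction <= MAX_LOW_QUAL_FRACTION)


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # placeholder adapter prefix, for illustration only
    reads = [
        Read("clean", "ACGT" * 5, "I" * 20),
        Read("many_N", "NN" + "ACGT" * 4 + "AC", "I" * 20),
        Read("low_quality", "ACGT" * 5, "#" * 11 + "I" * 9),
        Read("with_adapter", "ACGTACG" + adapter, "I" * 20),
    ]
    for read in reads:
        print(read.name, passes_filter(read, adapter))
```

Both thresholds are applied as strict "more than" cutoffs, so a read with exactly 50% low-quality bases is kept; change the comparisons if a different convention is intended.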
The Circos software package (http://circos.ca/) was used to visualize the distribution of fragments, GC islands, repeat regions, and SNPs in the chicken genome. The genome-wide linkage disequilibrium (LD) pattern was assessed using the squared allelic correlation coefficient (r²) as a function of the distance between SNPs. To visualize the LD pattern, the r² values were plotted against the pairwise SNP distances. […]
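As a rough illustration of the LD assessment, the sketch below computes r² for every SNP pair on one chromosome and returns (distance, r²) points ready for plotting. It is not the authors' implementation, and the tool they used is not named in the text. It assumes unphased genotypes coded as 0/1/2 copies of the alternate allele (None for missing) and approximates the allelic r² by the squared Pearson correlation of those genotype codes; the function names and the optional max_dist argument are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Genotypes = Sequence[Optional[int]]  # 0/1/2 allele counts, None if missing


def r_squared(g1: Genotypes, g2: Genotypes) -> Optional[float]:
    """Squared Pearson correlation between two SNPs' genotype codes.

    Returns None when fewer than two samples are jointly genotyped or
    when either SNP is monomorphic among them (r² is undefined).
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return None
    return (cov * cov) / (var_a * var_b)


def ld_decay_points(
    positions: Sequence[int],
    genotypes: Sequence[Genotypes],
    max_dist: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Return (pairwise distance in bp, r²) for SNP pairs on one chromosome."""
    points = []
    for i, j in combinations(range(len(positions)), 2):
        dist = abs(positions[j] - positions[i])
        if max_dist is not None and dist > max_dist:
            continue
        r2 = r_squared(genotypes[i], genotypes[j])
        if r2 is not None:
            points.append((dist, r2))
    return points


if __name__ == "__main__":
    positions = [100, 200, 1000]
    genotypes = [
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, None],
        [2, 2, 1, 0, 0, 1],
    ]
    for dist, r2 in ld_decay_points(positions, genotypes):
        print(dist, round(r2, 3))
```

The returned points can be passed to any plotting library, typically after averaging r² within distance bins, to draw the LD decay curve. Looping over all pairs scales quadratically with the number of SNPs, so for a high-density marker set a max_dist window would normally be set.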

Pipeline specifications