Computational protocol: Whole-genome de novo sequencing reveals unique genes that contributed to the adaptive evolution of the Mikado pheasant

Protocol publication

[…] The quality of the raw reads was examined using FastQC (FastQC, RRID:SCR_014583), version 0.10.1. Trimmomatic (Trimmomatic, RRID:SCR_011848), version 0.30 (parameters: “ILLUMINACLIP:TruSeq3-PE.fa:2:30:15 SLIDINGWINDOW:4:20 MINLEN:100”), and NextClip (version 1.3.1) with default parameters were used to trim sequencing reads. Genome assembly into contigs was performed using MaSuRCA (version 2.3.2) with settings based on the instruction manual. Contigs were also assembled using ALLPATHS-LG (ALLPATHS-LG, RRID:SCR_010742, version 49722) and Newbler (version 2.9), both with default parameters, as well as JR (version 1.0.4; parameters: “-minOverlap 60 -maxOverlap 90 -ratio 0.3”), SGA (version 0.10.13; parameters: “assemble -m 125 -d 0.4 -g 0.1 -r 10 -l 200”), and SOAPdenovo (version 2.04; parameters: “-K 47 -R”). We employed SSPACE (SSPACE, RRID:SCR_005056, version 3.0; parameter: “-z 300”) to construct scaffolds for the draft genome. In this step, the first 35 bases from the 5' end of both reads in each mate-pair library were used for scaffolding. Scaffold sequences shorter than 300 bp were then excluded from the final assembly. Assembly statistics were computed using QUAST (version 3.2).

To examine sequencing reads for potential contamination, we used Kraken (version 1.0) with the standard Kraken database to check the paired-end DNA libraries. Classified reads reported by Kraken were further examined using our proposed pipeline. Briefly, we employed Bowtie 2 (Bowtie, RRID:SCR_005476; version 2.3.0) to align these classified reads against the chicken reference genome (Galgal 5.0) downloaded from Ensembl (release 90). We collected the unmapped reads and used Bowtie 2 again to align them against the assembled genome of the Mikado pheasant.
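Two of the scaffolding steps described above, truncating mate-pair reads to their first 35 bases and discarding scaffolds shorter than 300 bp, are simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names (`read_fasta`, `filter_scaffolds`, `truncate_read`) are hypothetical, and the protocol does not say which tool performed either step.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_SCAFFOLD_LEN = 300  # from the text: scaffolds shorter than 300 bp are excluded
MATE_PAIR_PREFIX = 35   # from the text: 35 bases from the 5' end of each read


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_scaffolds(records: Iterable[Tuple[str, str]],
                     min_len: int = MIN_SCAFFOLD_LEN) -> List[Tuple[str, str]]:
    """Keep scaffolds whose length is at least min_len bases."""
    return [(name, seq) for name, seq in records if len(seq) >= min_len]


def truncate_read(seq: str, qual: str,
                  n: int = MATE_PAIR_PREFIX) -> Tuple[str, str]:
    """Keep only the first n bases (and quality values) of a read."""
    return seq[:n], qual[:n]
```

Applied to a scaffold FASTA file, `filter_scaffolds(read_fasta(open(path)))` would return the records that survive the length cutoff, where `path` is a placeholder for the scaffold file.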
We then took the reads that mapped onto the Mikado pheasant genome and aligned them with BLASTN (Basic Local Alignment Search Tool, nucleotide) against the nonredundant nucleotide sequence database, downloaded from NCBI's FTP site (on Nov. 16, 2017), using the parameters “-outfmt "6 std staxids" -max_target_seqs 1 -evalue 1E-10”. Next, we collected reads with an alignment length ≥100 bp (i.e., two thirds of the read length), discarding reads that matched an avian species as well as reads assigned to any species with a read count <50. The remaining reads were counted per scaffold, and a scaffold was considered contaminated if its read count was >20. Finally, we removed 31 contaminated scaffolds totaling 12,587 bp (∼0.001% of the total length) from the assembled genome. […]
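The filtering cascade described above can be sketched as follows. The three cutoffs (alignment length ≥100 bp, ≥50 reads per species, >20 reads per scaffold) come from the text; everything else is an assumption made for illustration. In particular, the function names are hypothetical, the first taxonomy ID in the `staxids` column is treated as a stand-in for the species, the set of avian taxonomy IDs is supplied by the caller, and the read-to-scaffold mapping is assumed to have been extracted from the Bowtie 2 alignments beforehand.

```python
from collections import Counter
from typing import Dict, Iterable, Set

MIN_ALIGN_LEN = 100          # alignment length cutoff from the text (>= 100 bp)
MIN_READS_PER_SPECIES = 50   # species with fewer reads are discarded
SCAFFOLD_READ_CUTOFF = 20    # scaffolds with more reads than this are flagged


def best_hits(blast_lines: Iterable[str],
              min_len: int = MIN_ALIGN_LEN) -> Dict[str, str]:
    """Parse '-outfmt "6 std staxids"' rows into {read_id: taxid}.

    Columns: 12 standard fields (alignment length is the 4th) plus staxids.
    For each read, the first row meeting the length cutoff is kept.
    """
    hits: Dict[str, str] = {}
    for line in blast_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        read_id, aln_len = fields[0], int(fields[3])
        taxid = fields[12].split(";")[0]  # assumption: use the first taxid listed
        if aln_len >= min_len and read_id not in hits:
            hits[read_id] = taxid
    return hits


def contaminated_scaffolds(hits: Dict[str, str],
                           avian_taxids: Set[str],
                           read_to_scaffold: Dict[str, str]) -> Set[str]:
    """Return scaffolds supported by more than SCAFFOLD_READ_CUTOFF reads."""
    # Drop reads whose best hit is an avian species.
    non_avian = {r: t for r, t in hits.items() if t not in avian_taxids}
    # Drop reads from species with too few supporting reads.
    per_species = Counter(non_avian.values())
    kept = [r for r, t in non_avian.items()
            if per_species[t] >= MIN_READS_PER_SPECIES]
    # Count the surviving reads on each scaffold and apply the cutoff.
    per_scaffold = Counter(read_to_scaffold[r] for r in kept
                           if r in read_to_scaffold)
    return {s for s, n in per_scaffold.items() if n > SCAFFOLD_READ_CUTOFF}
```

Using the taxonomy ID directly as the species key is a simplification: hits reported at strain or subspecies level would need to be mapped up to species rank (for example, with the NCBI taxonomy dump) before counting.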

Pipeline specifications

Software tools: FastQC, Trimmomatic, NextClip, MaSuRCA, ALLPATHS-LG, Newbler, SOAPdenovo, SSPACE, QUAST, Kraken, Bowtie, BLASTN
Application: De novo sequencing analysis
Organisms: Gallus gallus
Chemicals: Oxygen