Computational protocol: Comparative genomics of non-pseudomonal bacterial species colonising paediatric cystic fibrosis patients

Protocol publication

[…] Reads were trimmed using Trimmomatic v0.30 and assembled using SOAPdenovo v1.05. Contigs were ordered against reference genomes using CONTIGuator v2.7.3, and gaps were closed where possible using IMAGE within the PAGIT wrapper v1.64. Genomes were submitted to RAST for annotation. Draft assemblies were compared with reference genomes using progressiveMauve. Contamination and completeness of each draft assembly were estimated using CheckM v0.9.7. Average nucleotide identity (ANI) was calculated using the ANI calculator (http://enve-omics.ce.gatech.edu/ani/index, last accessed 2 June 2015).

For isolates of the same species, each strain was also mapped to a reference genome selected based on the lowest percentage of unmapped reads. Mapping to reference genomes was performed using BWA 0.6.2-r126 with default settings. Duplicate reads were marked using Picard 1.41 (http://www.picard.sourceforge.net), and base quality score recalibration and realignment around indels were performed using GATK 2.4-9-g532efad. SNPs and indels were identified using GATK 2.4-9-g532efad following best practice guidelines. Larger variation was detected using BreakDancer 1.1.2 and CREST. Mappings were visualised and SNP-associated amino acid changes were determined using CLC Genomics Workbench (Qiagen, Venlo, Netherlands). Unmapped reads were collected using SAMtools, and regions of zero coverage were identified using BEDTools.

[...] Antibiotic resistance and virulence-related genes were identified using BLAST with the Comprehensive Antibiotic Resistance Database (CARD) and the Virulence Factor Database (VFDB), in addition to examining RAST output and manual comparison with various annotated reference genomes. For comparison with the CARD, a cutoff of ≥40% identity over ≥60% of the query sequence was required. For comparison with the VFDB, a cutoff of over 80% identity and over 60% coverage of the target was required.
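The CARD and VFDB cutoffs can be illustrated with a short filtering sketch. This is not the authors' code: it assumes the BLAST hits were exported in tabular form with the columns `qseqid sseqid pident qstart qend qlen sstart send slen` (a hypothetical column choice), it computes coverage from alignment coordinates, and it reads "target" in the VFDB criterion as the database (subject) sequence. The names `Hit`, `parse_hits`, `passes_card` and `passes_vfdb` are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    """One tabular BLAST hit (assumed column order, see the text above)."""
    query: str
    subject: str
    pident: float  # percent identity of the alignment
    qstart: int
    qend: int
    qlen: int      # full length of the query sequence
    sstart: int
    send: int
    slen: int      # full length of the subject (database) sequence


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated hit lines, skipping blank and comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]),
                        int(f[5]), int(f[6]), int(f[7]), int(f[8])))
    return hits


def query_coverage(hit: Hit) -> float:
    """Percentage of the query sequence spanned by the alignment."""
    return 100.0 * (abs(hit.qend - hit.qstart) + 1) / hit.qlen


def subject_coverage(hit: Hit) -> float:
    """Percentage of the subject sequence spanned by the alignment."""
    return 100.0 * (abs(hit.send - hit.sstart) + 1) / hit.slen


def passes_card(hit: Hit) -> bool:
    """CARD criterion: >=40% identity over >=60% of the query."""
    return hit.pident >= 40.0 and query_coverage(hit) >= 60.0


def passes_vfdb(hit: Hit) -> bool:
    """VFDB criterion: over 80% identity and over 60% target coverage."""
    return hit.pident > 80.0 and subject_coverage(hit) > 60.0
```

Each hit would be tested with `passes_card` or `passes_vfdb` according to the database searched. If a gene aligns in several fragments, a per-hit filter like this may underestimate coverage, so fragments may need to be merged first.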
BLAST was used to identify candidate instances of horizontal gene transfer using a cutoff of ≥95% identity over 500 bp. Genomic islands and prophage regions were detected using IslandViewer and PHAST. Maximum likelihood “genome” trees were inferred using MEGA with 500 bootstrap replicates, based on concatenated alignments of 83 single-copy marker genes from isolates and reference strains obtained using CheckM. […]
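The horizontal gene transfer screen can be sketched in the same way. The sketch below makes two assumptions: it reads the cutoff as at least 95% identity over an alignment of at least 500 bp, and it drops self-hits, which the protocol does not mention. `Alignment` and `hgt_candidates` are hypothetical names, not part of any tool listed here.

```python
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """Minimal view of one BLAST alignment between two sequences."""
    query: str      # e.g. a contig from one isolate
    subject: str    # e.g. a contig from an isolate of another species
    pident: float   # percent identity
    length: int     # alignment length in bp


def hgt_candidates(alignments: Iterable[Alignment],
                   min_identity: float = 95.0,
                   min_length: int = 500) -> List[Alignment]:
    """Keep alignments meeting the identity and length cutoffs.

    Self-hits (query == subject) are dropped; this is an assumption of
    the sketch rather than a step stated in the protocol.
    """
    return [
        a for a in alignments
        if a.query != a.subject
        and a.pident >= min_identity
        and a.length >= min_length
    ]
```

Alignments that pass are only candidates. Highly conserved loci such as rRNA operons can also meet these thresholds, so each region would still need manual inspection.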

Pipeline specifications

Software tools: Trimmomatic, CONTIGuator, PAGIT, RAST, Mauve, CheckM, ANI Calculator, BWA, Picard, GATK, BreakDancer, CLC Genomics Workbench, SAMtools, BEDTools, PHAST, MEGA
Applications: De novo sequencing analysis, Nucleotide sequence alignment
Organisms: Homo sapiens, Pseudomonas aeruginosa, Staphylococcus aureus, Achromobacter xylosoxidans, Stenotrophomonas maltophilia
Diseases: Cystic Fibrosis; Genetic Diseases, Inborn