Computational protocol: Whole Genome Sequencing Based Characterization of Extensively Drug-Resistant Mycobacterium tuberculosis Isolates from Pakistan


Protocol publication

[…] DNA was extracted by the cetyl-trimethyl ammonium bromide (CTAB) method []. One microgram of DNA was used for sequencing. All samples underwent WGS using Illumina HiSeq2000 paired-end technology with 76-base reads. The raw sequence data are available in the European Nucleotide Archive (http://www.ebi.ac.uk/ena/data/view/PRJEB7798).

For each sample, Trimmomatic software (http://www.usadellab.org/cms/?page=trimmomatic) was used to remove low-quality reads and to trim low-quality 3′ ends of reads. Nucleotide positions with a quality score below Q20 were removed. The remaining high-quality reads were mapped to the H37Rv reference genome (GenBank accession: AL123456.3) using BWA-MEM software (http://bio-bwa.sourceforge.net). SAMtools (http://samtools.sourceforge.net) and GATK (https://www.broadinstitute.org/gatk) were both used to call single nucleotide polymorphisms (SNPs) and small indels. Only variants with a quality of at least Q30 that were called by both programs were retained, forming the intersection of the two call sets.

Sample genotypes were called using the majority allele (minimum frequency 75%) at positions supported by at least 20-fold total coverage. Otherwise they were classified as missing. Samples with more than 15% missing genotype calls were filtered out []. Similarly, genome positions with more than 15% missing genotypes across samples were excluded.

Larger indels were called using a consensus from paired-end mapping distance or split-read approaches [,,,]. The data processing pipeline has been described in more detail elsewhere []. Spoligotypes were inferred from the sequencing reads using SpolPred software []. Variation density maps were generated using Circos software (http://www.circos.com). […] Phylogenetic analysis was performed using PHYLIP and RAxML (maximum likelihood phylogenetic software) with 1,000 bootstrap replicates []. The phylogenetic tree was visualized using Dendroscope software []. […]
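The 3′-end quality trimming step can be illustrated with a minimal Python sketch. It assumes Phred+33 quality encoding. It mimics the idea behind Trimmomatic's TRAILING step rather than reproducing Trimmomatic itself. The protocol does not give the exact Trimmomatic settings, so here the Q20 cutoff is applied only at the 3′ end. `MIN_LEN` is a hypothetical minimum read length, included only to show how reads that become too short could be discarded.

```python
from __future__ import annotations

PHRED_OFFSET = 33  # assumption: Phred+33 (Illumina 1.8+) quality encoding
MIN_QUAL = 20      # Q20 threshold from the protocol
MIN_LEN = 36       # hypothetical minimum read length; not stated in the protocol


def phred_scores(qual: str) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def trim_trailing(seq: str, qual: str, min_qual: int = MIN_QUAL) -> tuple[str, str]:
    """Drop bases from the 3' end while their quality is below min_qual."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_qual:
        end -= 1
    return seq[:end], qual[:end]


def process_read(seq: str, qual: str, min_len: int = MIN_LEN):
    """Trim one read; return None if it is shorter than min_len afterwards."""
    tseq, tqual = trim_trailing(seq, qual)
    return (tseq, tqual) if len(tseq) >= min_len else None
```

With Phred+33, `I` encodes Q40, `5` encodes Q20 and `#` encodes Q2. A read ending in `5#` therefore loses only its final base, because Q20 itself meets the cutoff.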
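Retaining only variants with a quality of at least Q30 that were called by both SAMtools and GATK is a set intersection. The sketch below makes three assumptions. First, both callers wrote plain-text VCF records. Second, variants are compared by chromosome, position, reference and alternate allele. Third, both tools represent indels identically. In practice, differing indel representations would need to be normalized before comparison. The helper names are hypothetical.

```python
from __future__ import annotations

MIN_QUAL = 30.0  # Q30 threshold from the protocol


def high_quality_calls(vcf_lines, min_qual: float = MIN_QUAL) -> set:
    """Return (chrom, pos, ref, alt) keys whose VCF QUAL is >= min_qual.

    Header lines starting with '#' and blank lines are skipped. Multi-allelic
    ALT fields are split so each alternate allele is compared separately.
    Records with QUAL '.' (unknown) are treated as failing the cutoff.
    """
    calls = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual = fields[:6]
        if qual == "." or float(qual) < min_qual:
            continue
        for allele in alt.split(","):
            calls.add((chrom, int(pos), ref, allele))
    return calls


def consensus_calls(samtools_vcf, gatk_vcf, min_qual: float = MIN_QUAL) -> set:
    """Keep only variants that pass the quality cutoff in BOTH call sets."""
    return high_quality_calls(samtools_vcf, min_qual) & high_quality_calls(gatk_vcf, min_qual)
```

In this design a variant that passes Q30 in only one caller, or that one caller misses entirely, is dropped. This matches the "intersection" wording of the protocol.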
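The per-sample genotype rule can be sketched as follows: take the majority allele if its frequency is at least 75% at a position with at least 20-fold total coverage, otherwise record a missing call. The input format is an assumption for illustration. Each call receives a mapping from each observed allele to its supporting read count at one position in one sample.

```python
from __future__ import annotations

from collections import Counter

MIN_DEPTH = 20       # minimum total coverage (20-fold) from the protocol
MIN_FRACTION = 0.75  # minimum majority-allele frequency (75%) from the protocol
MISSING = None       # marker for a missing genotype call


def call_genotype(allele_counts: dict, min_depth: int = MIN_DEPTH,
                  min_fraction: float = MIN_FRACTION):
    """Return the majority allele, or MISSING if depth or frequency is too low."""
    depth = sum(allele_counts.values())
    if depth < min_depth:
        return MISSING  # insufficient coverage at this position
    allele, count = Counter(allele_counts).most_common(1)[0]
    if count / depth < min_fraction:
        return MISSING  # no allele reaches the majority-frequency threshold
    return allele
```

Because the frequency threshold is above 50%, at most one allele can pass. A tie between alleles therefore always produces a missing call rather than an arbitrary choice.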
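The two missingness filters (samples, then genome positions, each with more than 15% missing calls) can be sketched as below. The protocol does not state the order in which the filters were applied. This sketch assumes samples are removed first and that position missingness is then computed over the retained samples only. The genotype-matrix layout (a dict from sample name to a list of calls, with `None` marking a missing call) is hypothetical.

```python
from __future__ import annotations

MAX_MISSING = 0.15  # maximum tolerated fraction of missing calls (15%)


def missing_fraction(calls: list) -> float:
    """Fraction of calls that are missing (None); empty input counts as fully missing."""
    if not calls:
        return 1.0
    return sum(c is None for c in calls) / len(calls)


def filter_missingness(genotypes: dict, max_missing: float = MAX_MISSING):
    """Drop samples, then positions, whose missing-call fraction exceeds max_missing.

    genotypes maps sample name -> list of calls, one per position, in the same
    position order for every sample. Returns (kept_samples, kept_position_indices).
    """
    kept_samples = [s for s, calls in genotypes.items()
                    if missing_fraction(calls) <= max_missing]
    n_positions = len(next(iter(genotypes.values()), []))
    kept_positions = [
        i for i in range(n_positions)
        if missing_fraction([genotypes[s][i] for s in kept_samples]) <= max_missing
    ]
    return kept_samples, kept_positions
```

For example, a sample missing 2 of 10 calls (20%) is removed, while one missing 1 of 10 (10%) is kept. Any position that is still missing in half of the remaining samples is then dropped.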

Pipeline specifications

Software tools: Trimmomatic, BWA, SAMtools, GATK, Circos, PHYLIP, RAxML, Dendroscope
Databases: ENA
Applications: Phylogenetics, WGS analysis, Genome data visualization
Organisms: Mycobacterium tuberculosis
Diseases: Tuberculosis
Chemicals: Amikacin, Capreomycin, Ethambutol, Isoniazid, Kanamycin, Rifampin, Streptomycin