Computational protocol: Comparative genomic analysis reveals distinct genotypic features of the emerging pathogen Haemophilus influenzae type f

Protocol publication

[…] Full genome alignment and comparison of genomic rearrangement patterns between Hif KR494 and the reference strains (Table  ) were performed using MUMmer [], the Artemis Comparison Tool (ACT) [] or mVISTA [], with BLASTn set to a minimum identity of 95% and an expect threshold of 1e-5 unless otherwise indicated. For the initial pairwise genome sequence alignment, the default minimum conservation identity of 70% was used; this identity had to be maintained over a window size of 11 for a region (>50 bp) to be considered conserved. Thereafter, among all identified conserved genomic blocks, a minimum sequence identity of 95% was applied to identify highly conserved regions (>50 bp). Comparative genomic maps were visualized using ACT, whereas Artemis [] was used for data management. GenBank accession numbers of the genomes used in the present study are listed in Table  . [...]

To find unique and common genes in Hif KR494 and the reference strains/species, we performed extensive comparative analyses of open reading frames (ORFs) from whole genome sequences. MUMmer was used in these analyses at a window size of 11. Briefly, the total ORFs from KR494 and a selected reference genome, or from a pair of reference genomes, were analyzed with tBLASTx using an e-value cutoff of ≤ 1e-5 and protein sequence similarity of ≥85%. Proteins with the best hits from the reciprocal BLAST were then collected and grouped as (i) common CDSs shared between genomes and (ii) CDSs unique to each genome. Results were formatted in BLAST m8 tabular format. Thereafter, we used Perl scripts to retrieve the accessory genome of Hif KR494 under different criteria, that is, genes absent from (i) all H. influenzae reference genomes used in the present study (Table  ), (ii) related Haemophilus spp. reference genomes (Table  ) or (iii) all H. influenzae genome sequences available in the current databases. A similar approach was used to obtain common CDSs using the same parameters as outlined above. DNAPlotter [] and ACT were used for visualization of genomic features. In the present study, protein sequence homology (over the complete protein length) between Hif KR494 and the reference strains is presented as percentage similarity and identity. […]
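
The reciprocal best-hit grouping described above can be illustrated with a minimal Python sketch. It is not the authors' Perl code, which is not shown in the source. It assumes the standard 12-column BLAST m8 tabular layout and, because that layout reports percent identity rather than similarity, it uses the identity column as a stand-in for the ≥85% similarity cutoff. The function names (best_hits, classify, absent_from_all) are hypothetical and chosen only for illustration.

```python
import csv
import io

# Standard column order of BLAST m8 (tabular) output.
M8_FIELDS = ["query", "subject", "pident", "length", "mismatch", "gapopen",
             "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(m8_text, max_evalue=1e-5, min_pident=85.0):
    """Map each query to its highest-bitscore subject among hits passing both cutoffs.

    Assumption: percent identity (column 3) approximates the similarity cutoff.
    """
    best = {}
    for row in csv.reader(io.StringIO(m8_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(M8_FIELDS, row))
        if float(hit["evalue"]) > max_evalue or float(hit["pident"]) < min_pident:
            continue
        score = float(hit["bitscore"])
        if hit["query"] not in best or score > best[hit["query"]][1]:
            best[hit["query"]] = (hit["subject"], score)
    return {query: subject for query, (subject, _) in best.items()}


def classify(orfs_a, orfs_b, a_vs_b, b_vs_a):
    """Group ORFs into reciprocal best-hit pairs and ORFs unique to each genome."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    # A pair is "common" only if each ORF is the other's best hit.
    common = {(a, b) for a, b in forward.items() if reverse.get(b) == a}
    unique_a = set(orfs_a) - {a for a, _ in common}
    unique_b = set(orfs_b) - {b for _, b in common}
    return common, unique_a, unique_b


def absent_from_all(unique_sets):
    """Intersect per-reference 'unique' sets: ORFs with no reciprocal hit in any reference."""
    sets = [set(s) for s in unique_sets]
    return set.intersection(*sets) if sets else set()
```

In this sketch, classify would be run once per KR494/reference pair, and absent_from_all would then intersect the KR494-unique sets across references to approximate criterion (i) or (ii) for the accessory genome. Ties in bitscore are resolved by keeping the first hit encountered, which is a simplification.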

Pipeline specifications

Software tools: MUMmer, ACT, mVISTA, BLASTN, TBLASTX
Applications: WGS analysis, Nucleotide sequence alignment
Organisms: Haemophilus influenzae, Haemophilus parainfluenzae
Diseases: Haemophilus Infections
Chemicals: Iron, Kanamycin