Computational protocol: A Novel Interpretation of Structural Dot Plots of Genomes Derived from the Analysis of Two Strains of Neisseria meningitidis

Protocol publication

[…] The genomes were compared using MUMmer3 (MUMmer version 3.0) (http://mummer.sourceforge.net/). We used only the part concerned with the identification and visualization of maximal unique matches (MUMs), ignoring the clustering algorithm and the alignment generator. A maximal unique match was defined in Delcher et al. as “a subsequence that occurs exactly once in Genome A and once in Genome B, and is not contained in any longer such sequence”.

The file AL157959.fna (Z2491) was used as the reference file and the file AE002098.fna (MC58) as the query file. MUMmer3 was run with the options -b (compute both forward and reverse-complement matches); -c (report the query position of a reverse-complement match relative to the forward strand of the query); -mum (compute matches that are unique in both the reference and query files); and the default minimum match length (20 bp).

Following the MUMmer manual, we refer to “forward strands” and “reverse strands”, and to “forward matches” and “reverse-complement matches”. “Match regions” (forward and reverse-complement) were defined as visible regions of high-density MUMs. Precise boundaries of match regions are not relevant and were not defined. Match regions were identified from a dot plot of MUMs using a MUMmer utility based on Gnuplot (http://www.gnuplot.info/).

When a forward match region is located at approximately the same position in each genome and has unit slope in a dot plot standardized to the origins of replication (oriC), it is referred to as a “collinear region”. When a match region appears to have been “deleted from one location (in an assumed common genome) and inserted elsewhere”, it would be referred to as a “transposition” in Delcher et al. However, we prefer the terms “translocation”/“translocated region”, following the definition by Deonier et al. of a translocation as a “placement of a chromosomal segment into a new sequence context elsewhere in the genome”.

A “simple-insertion match region” is defined as an area along the collinear region or a translocated region that has a substantially lower density of MUMs in one dimension; a “polymorphic match region” is such an area with a substantially lower density of MUMs in two dimensions. A simple-insertion match region is visible in the dot plot as a gap in a region of high MUM density in one genome that is not matched by a corresponding gap in the other genome. A polymorphic match region is visible in the dot plot as a gap in a region of high MUM density in one genome that is matched by a corresponding gap in the other genome (i.e., gaps of approximately the same length in each strain).

Simple-insertion and polymorphic match regions were identified first by visual inspection, then by calculating the ranked sizes of the gaps between adjacent sub-regions of high-density MUMs in each genome, and finally by zooming in on the dot plot to better identify the larger gaps. To keep the number of simple insertions and polymorphisms to a tractable set, only those with a gap greater than 5 kb were considered; this threshold should be interpreted in light of the 2,000 kb genomes being studied. […]
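The gap-ranking step described above can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes the three-column text output that `mummer` prints for a single reference sequence (reference position, query position, match length, with a `>` header line that ends in `Reverse` for reverse-complement matches). It also simplifies "sub-regions of high-density MUMs" to individual MUMs, and it ignores the wrap-around gap of a circular genome. The names `mummer_command`, `parse_mums`, `ranked_gaps` and `SAMPLE_OUTPUT` are hypothetical, and the sample coordinates are invented for illustration.

```python
from typing import List, Tuple

# (reference start, query start, match length, is reverse-complement match)
Mum = Tuple[int, int, int, bool]

# Invented example of mummer-style output; not real data from the two strains.
SAMPLE_OUTPUT = """\
> AE002098
     100       120        50
     200       230        40
    9000      9100        30
> AE002098 Reverse
   20000     15000        25
"""


def mummer_command(reference: str, query: str) -> List[str]:
    """Argument list matching the options named in the protocol.

    -l 20 only spells out the default minimum match length.
    """
    return ["mummer", "-mum", "-b", "-c", "-l", "20", reference, query]


def parse_mums(text: str) -> List[Mum]:
    """Parse mummer-style text output into a list of MUM tuples."""
    mums: List[Mum] = []
    reverse = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: reverse-complement blocks have a header ending in "Reverse".
            reverse = line.endswith("Reverse")
            continue
        # Take the last three columns so an optional leading name column is tolerated.
        ref_start, qry_start, length = (int(x) for x in line.split()[-3:])
        mums.append((ref_start, qry_start, length, reverse))
    return mums


def ranked_gaps(
    intervals: List[Tuple[int, int]], min_gap: int = 5000
) -> List[Tuple[int, int, int]]:
    """Return (gap size, gap start, gap end) along one genome, largest first.

    `intervals` are inclusive (start, end) positions covered by MUMs.
    Only gaps strictly larger than `min_gap` bases are kept.
    """
    gaps: List[Tuple[int, int, int]] = []
    covered_end = None
    for start, end in sorted(intervals):
        if covered_end is None:
            covered_end = end
            continue
        gap = start - covered_end - 1
        if gap > min_gap:
            gaps.append((gap, covered_end + 1, start - 1))
        covered_end = max(covered_end, end)
    return sorted(gaps, reverse=True)


# Gaps along the reference genome for the invented sample.
reference_intervals = [(r, r + n - 1) for r, _, n, _ in parse_mums(SAMPLE_OUTPUT)]
reference_gaps = ranked_gaps(reference_intervals)
```

The same `ranked_gaps` function could be applied to the query genome once the query coordinates of reverse-complement matches have been converted to forward-strand intervals. A gap that appears in only one genome's list would be a candidate simple insertion, while gaps of similar size in both lists would be candidate polymorphic regions, to be confirmed on the dot plot.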

Pipeline specifications

Software tools: MUMmer, Gnuplot
Applications: Miscellaneous, Nucleotide sequence alignment
Diseases: Meningitis, Meningococcal Infections, Sprains and Strains