Computational protocol: Determination of a Screening Metric for High Diversity DNA Libraries


Protocol publication

[…] We developed a proprietary read-mapping method, the “graphaligner”, and compared it with two standard methods: (1) an implementation of Smith-Waterman alignment and (2) Bowtie 2, an aligner based on the Burrows-Wheeler transform. The graphaligner builds a finite state machine (FSM) for each library reference. Because each variant library is a collection of very similar sequences, all of the sequences in a library can be expressed compactly as a single regular expression, which serves as a compressed reference. The resulting regular expression has no loops, and its alternatives are of fixed length, which simplifies its interpretation. Whereas Smith-Waterman alignment scores a read against reference base positions, the graphaligner aligns a read to the states of the resulting FSM.

The graphaligner can compute either local or global alignments, depending on how the dynamic programming matrix is initialized and used. As with the Gotoh modification of Smith-Waterman alignment, the graphaligner allows gaps in either the reference or the read, with affine gap penalties. Tracking the best traversal through the FSM is complex because each state may have multiple predecessors; the dynamic programming matrix therefore keeps the maximum score over all predecessors. As with other alignment methods such as Smith-Waterman, the graphaligner guarantees that no alignment scores higher than the one it reports, but it does not guarantee that the reported alignment is the only one with that maximum score. A figure in the original publication (not reproduced here) gives a graphical representation of how the graphaligner works. The state model allows for variant positions where the model can be traversed through one option or another, e.g. model states 1–2 versus states 3–4 in that figure.
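The compression of a library into a loop-free regular expression with fixed-length alternatives can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' proprietary code: it assumes every variant has the same length, and the function names `compress_library` and `in_compressed_reference` are hypothetical. Columns where all variants agree are emitted literally, and each run of variable columns becomes one group of fixed-length alternatives.

```python
import re
from typing import List


def compress_library(seqs: List[str]) -> str:
    """Collapse equal-length variant sequences into a loop-free regex.

    Hypothetical helper. Constant columns are emitted literally; each maximal
    run of variable columns becomes one non-capturing group whose
    alternatives all have the same length.
    """
    if not seqs:
        raise ValueError("library must contain at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("this sketch assumes all variants have equal length")

    # True where the bases in this column differ across the library.
    variable = [len({s[i] for s in seqs}) > 1 for i in range(length)]

    parts: List[str] = []
    i = 0
    while i < length:
        j = i
        # Extend the run while columns keep the same constant/variable status.
        while j < length and variable[j] == variable[i]:
            j += 1
        if variable[i]:
            # Distinct substrings observed in this run, sorted for stable output.
            options = sorted({s[i:j] for s in seqs})
            parts.append("(?:" + "|".join(options) + ")")
        else:
            parts.append(seqs[0][i:j])
        i = j
    return "".join(parts)


def in_compressed_reference(pattern: str, seq: str) -> bool:
    """True if `seq` is one of the sequences the compressed reference encodes."""
    return re.fullmatch(pattern, seq) is not None
```

For example, `compress_library(["ACGTTC", "ACGGAC"])` returns `ACG(?:GA|TT)C`. Because each variable run is compressed independently, a pattern with several groups also matches combinations that were not in the input: `["AAACCC", "AAGCCT"]` yields `AA(?:A|G)CC(?:C|T)`, which also matches `AAGCCC`. For a combinatorial variant library that superset is usually the intended design space.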
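Dynamic programming over FSM states, taking the maximum over all predecessors of each state and using Gotoh-style affine gaps, can be sketched as below. This is an illustrative local-alignment sketch, not the graphaligner itself. It makes several assumptions: the FSM is a loop-free graph with one base per state, built from an ordered list of segments of alternatives; the helper names `build_fsm` and `align_local` and all scoring values are hypothetical; and only the best score is returned (no traceback). A gap of length k costs `gap_open + (k - 1) * gap_extend`, with both penalties given as negative numbers.

```python
from typing import List, Tuple

NEG = float("-inf")


def build_fsm(segments: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a loop-free FSM with one base per state (hypothetical helper).

    `segments` lists the alternatives at each region in order, e.g.
    [["ACG"], ["TT", "GA"], ["C"]]. The first state of every alternative takes
    the last state of every alternative in the previous segment as its
    predecessors, so states are created in topological order.
    """
    labels: List[str] = []
    preds: List[List[int]] = []
    prev_ends: List[int] = []
    for alternatives in segments:
        ends: List[int] = []
        for alt in alternatives:
            if not alt:
                raise ValueError("alternatives must be non-empty")
            incoming = list(prev_ends)
            for base in alt:
                labels.append(base)
                preds.append(incoming)
                incoming = [len(labels) - 1]  # next base follows this state
            ends.append(len(labels) - 1)
        prev_ends = ends
    return labels, preds


def align_local(read: str, labels: List[str], preds: List[List[int]],
                match: int = 2, mismatch: int = -3,
                gap_open: int = -5, gap_extend: int = -1) -> float:
    """Best local alignment score of `read` against any path through the FSM."""
    n, s = len(read), len(labels)
    # M: read[i-1] aligned to state v; D: state v against a gap in the read;
    # I: read[i-1] against a gap in the reference, after state v.
    M = [[NEG] * (n + 1) for _ in range(s)]
    D = [[NEG] * (n + 1) for _ in range(s)]
    I = [[NEG] * (n + 1) for _ in range(s)]
    H = [[NEG] * (n + 1) for _ in range(s)]
    best = 0
    for v in range(s):  # predecessors always have smaller indices
        for i in range(n + 1):
            if i > 0:
                # Keep the maximum over all predecessors; 0 lets a local
                # alignment start fresh at this state.
                start = max([0] + [H[u][i - 1] for u in preds[v]])
                sub = match if read[i - 1] == labels[v] else mismatch
                M[v][i] = start + sub
                I[v][i] = max(M[v][i - 1] + gap_open,
                              D[v][i - 1] + gap_open,
                              I[v][i - 1] + gap_extend)
            for u in preds[v]:
                D[v][i] = max(D[v][i], M[u][i] + gap_open,
                              I[u][i] + gap_open, D[u][i] + gap_extend)
            H[v][i] = max(M[v][i], D[v][i], I[v][i])
            best = max(best, H[v][i])
    return best
```

With `labels, preds = build_fsm([["ACG"], ["TT", "GA"], ["C"]])`, both `ACGTTC` and `ACGGAC` score 12 (six matches), and the final `C` state has two predecessors. A linear reference (one segment with one alternative) gives every state at most one predecessor, so the same recurrences reduce to Gotoh-style Smith-Waterman. A global variant would change the initialization (no restart at score 0, paths beginning only at initial states) and would read the best score only at the end of the read and at final states.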
This design allows each state in the dynamic programming matrix to have multiple predecessors, so the aligner considers multiple paths through the FSM before deciding whether a read matches a particular instance of the state model.

To evaluate graphaligner performance, we implemented a Smith-Waterman routine with flexible scoring parameters and affine gap penalties. We also ran Bowtie 2 (version 2.2.5) in local alignment mode with affine gap penalties, requiring both reads of a pair to map to the same reference. Errors present in both reads of a paired-end read are considered “confirmed” and attributed to the variant library. Errors present in only one of the two reads are considered “unconfirmed” and attributed to the sequencer as sequencing errors. The overall sequencing error rate is within the normal range for the Illumina® MiSeq instrument. […]
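The confirmed/unconfirmed rule for paired-end reads maps naturally onto set operations. In this sketch each mate's differences from the reference are given as a mapping from reference position to observed base, already projected onto reference coordinates by an aligner. The function names, the `overlap` argument, and the choice to compare only positions covered by both mates are our assumptions; the excerpt does not say how positions outside the mate overlap are handled.

```python
from typing import Dict, Set, Tuple

Diff = Tuple[int, str]  # (reference position, observed base)


def classify_pair(mate1: Dict[int, str], mate2: Dict[int, str],
                  overlap: range) -> Tuple[Set[Diff], Set[Diff]]:
    """Split one read pair's differences into confirmed and unconfirmed sets.

    Hypothetical helper. Assumption: only reference positions inside
    `overlap` (covered by both mates) are compared.
    """
    d1 = {(pos, base) for pos, base in mate1.items() if pos in overlap}
    d2 = {(pos, base) for pos, base in mate2.items() if pos in overlap}
    confirmed = d1 & d2    # same change in both mates: attributed to the library
    unconfirmed = d1 ^ d2  # change in only one mate: attributed to the sequencer
    return confirmed, unconfirmed


def sequencing_error_rate(n_unconfirmed: int, n_bases_compared: int) -> float:
    """Unconfirmed differences per compared base (0.0 if nothing compared)."""
    return n_unconfirmed / n_bases_compared if n_bases_compared else 0.0
```

For example, `classify_pair({10: "T", 15: "G", 40: "A"}, {10: "T", 22: "C"}, range(5, 30))` confirms `(10, "T")`, marks `(15, "G")` and `(22, "C")` as unconfirmed, and ignores position 40 because it lies outside the assumed overlap. In this convention, a position where the two mates report different bases contributes both bases to the unconfirmed set. Summing unconfirmed counts and compared bases over all pairs before calling `sequencing_error_rate` gives a run-level rate.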

Pipeline specifications

Software tools: GraphAligner, Bowtie
Application: Nucleotide sequence alignment