Computational protocol: An In silico approach for the evaluation of DNA barcodes

Protocol publication

[…] An in silico PCR consists of selecting from a database the sequences that match (i.e., exhibit similarity with) two PCR primers. The regions matching the two primers must be located on the selected sequence in a way that allows PCR amplification, which constrains both the relative orientation of the matches and the distance between them. To simulate real PCR conditions, the in silico PCR algorithm should allow some mismatches between the primers and the target sequences. Standard sequence similarity programs such as BLAST [] are not suitable for this kind of analysis because the heuristic search they use is not efficient on short sequences. Moreover, BLAST output would have to be post-processed to verify the constraints stated above. We have developed a program named ecoPCR that is based on the very efficient pattern matching algorithm Agrep []. This algorithm allows the user to specify the maximum number of mismatched positions between each primer and the target sequence, and to use the full IUPAC code (e.g., R for purines or Y for pyrimidines). It also allows the user to specify the primer positions at which mismatches are not tolerated, which is useful for forcing an exact match at the 3' end of the primers to simulate real PCR conditions. Moreover, to facilitate further analysis, ecoPCR output contains the taxonomic information for each sequence selected from the database. For the analyses presented in this article, we allowed two mismatches between each primer and the template, except on the last 3 bases of the 3' end of the primer. Analyses performed with 0, 1 or 3 mismatches led to similar conclusions (results not shown), although the detailed results sometimes differed (see discussion). This software was developed for Unix platforms and is freely available at http://www.grenoble.prabi.fr/trac/ecoPCR. [...]
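The matching rules described above (a bounded number of mismatches per primer, IUPAC-aware comparison, no mismatch on the last bases of the 3' end, and a compatible orientation and distance between the two sites) can be illustrated with a minimal Python sketch. This is not ecoPCR and does not use the Agrep algorithm: it is a naive sliding-window scan, and the function names and parameters (`in_silico_pcr`, `max_mismatch`, `strict_3prime`, `min_len`, `max_len`) are hypothetical names chosen for this illustration.

```python
# Naive illustration of in silico PCR; NOT ecoPCR's Agrep-based implementation.
# All names and parameters here are hypothetical.

# IUPAC nucleotide codes: each primer symbol maps to the bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer, site, strict_3prime):
    """Count mismatches between a primer and a site of equal length.

    Returns None if a mismatch falls within the last `strict_3prime`
    bases of the primer (its 3' end), where none is tolerated.
    """
    count = 0
    for i, (p, b) in enumerate(zip(primer, site)):
        if b not in IUPAC[p]:
            if i >= len(primer) - strict_3prime:
                return None
            count += 1
    return count


def find_sites(primer, strand, max_mismatch, strict_3prime):
    """Start positions on `strand` where the primer matches within the limits."""
    k = len(primer)
    hits = []
    for start in range(len(strand) - k + 1):
        n = mismatches(primer, strand[start:start + k], strict_3prime)
        if n is not None and n <= max_mismatch:
            hits.append(start)
    return hits


def in_silico_pcr(seq, forward, reverse, max_mismatch=2, strict_3prime=3,
                  min_len=0, max_len=1000):
    """Return the regions lying between a forward and a reverse primer site."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    amplicons = []
    for f in find_sites(forward, seq, max_mismatch, strict_3prime):
        f_end = f + len(forward)
        # The reverse primer is searched on the opposite strand, which
        # enforces the relative orientation of the two matches.
        for s in find_sites(reverse, rc, max_mismatch, strict_3prime):
            r_start = len(seq) - s - len(reverse)  # coordinate on `seq`
            if r_start >= f_end and min_len <= r_start - f_end <= max_len:
                amplicons.append(seq[f_end:r_start])
    return amplicons
```

For example, `in_silico_pcr("TTACGTACGGGAAATTTGACTAGAA", "ACGTAC", "CTAGTC")` returns the region between the two primer sites, while a template carrying a substitution in the last three bases of a primer site yields no amplicon. The default values of two mismatches and three protected 3' bases mirror the settings reported in the text; the length bounds are arbitrary placeholders.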
The resolution capacity of a barcode was estimated with an index defined as the ratio of the number of unambiguously identified taxa at a given taxonomic level to the total number of tested taxa. A taxon is unambiguously identified by a primer pair when its barcode sequences for this pair are not shared with any other taxon of the same taxonomic rank. To be computed, this definition can be formalized using the mappings $E$, $Img$ and $E'$ between four concept sets: taxon ($T$), individual ($I$), barcode ($B$) and region ($R$) (for a full definition see figure). Considering a taxon $t \in T$ and a primer pair (barcode region) $r \in R$, and using the mappings $E$, $Img$ and $E'$, we define $\Omega(t,r)$ as the set of all barcodes belonging to the taxon for that region:

$$\Omega(t,r) = Img(E(t)) \cap E'(r)$$

From the above description, we denote the set of all individuals owning a barcode in $\Omega$ as:

$$Img^{-1}(\Omega) \equiv \bigcup_i Img^{-1}(b_i \mid b_i \in \Omega)$$

A taxon $t$ is then unambiguously identified by a barcode region $r$ if and only if:

$$Img^{-1}(\Omega(t,r)) = E(t)$$

This defines a mapping $\varepsilon$ of $T$ to R, with $t \,\varepsilon\, r$ when $t$ is unambiguously identified by $r$, and allows the specificity index $B_s$ to be defined as:

$$B_s(r) = \frac{|\{t \mid t \,\varepsilon\, r\}|}{|T|}$$

$B_s$ can be computed from an ecoPCR output file using the ecoTaxSpecificity script. The ecoTaxSpecificity and ecoTaxStat scripts are part of the OBITools Python package, freely available at http://www.grenoble.prabi.fr/trac/OBITools.

In a few cases, especially for Chondrichthyes, ecoPCR run over the entire GenBank yielded only a small number of sequences. We therefore calculated the resolution capacity of a barcode only when the primer pair amplified more than 10 species. […]
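The specificity index can be illustrated with a short Python sketch. It is not the ecoTaxSpecificity script and does not parse ecoPCR output: it assumes the amplified sequences for one primer pair have already been reduced to (taxon, barcode) pairs, and it takes the set of tested taxa to be the taxa present in that input. The function name `barcode_specificity` is hypothetical.

```python
# Illustrative computation of the specificity index Bs for one primer pair.
# NOT the ecoTaxSpecificity script; names and input format are assumptions.
from collections import defaultdict


def barcode_specificity(records):
    """Compute Bs from an iterable of (taxon, barcode) pairs.

    A taxon counts as unambiguously identified when every barcode it
    carries is carried by individuals of that taxon only.
    Returns (Bs, set of unambiguously identified taxa).
    """
    taxa_by_barcode = defaultdict(set)
    barcodes_by_taxon = defaultdict(set)
    for taxon, barcode in records:
        taxa_by_barcode[barcode].add(taxon)
        barcodes_by_taxon[taxon].add(barcode)
    if not barcodes_by_taxon:
        raise ValueError("no amplified sequences: Bs is undefined")
    identified = {
        taxon
        for taxon, barcodes in barcodes_by_taxon.items()
        if all(taxa_by_barcode[b] == {taxon} for b in barcodes)
    }
    return len(identified) / len(barcodes_by_taxon), identified


# Toy input: sp2 and sp3 share the barcode "CCC", so neither is identified.
records = [
    ("sp1", "AAA"), ("sp1", "AAT"),
    ("sp2", "CCC"),
    ("sp3", "CCC"), ("sp3", "GGG"),
    ("sp4", "TTT"),
]
bs, identified = barcode_specificity(records)
```

On this toy input, two of the four taxa (sp1 and sp4) carry only unshared barcodes, so the index is 2/4. Note that sp3 is not counted even though one of its barcodes is unique, because another of its barcodes is shared with sp2.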

Pipeline specifications

Software tools: ecoPCR, OBITools
Application: qPCR