Computational protocol: DNA barcoding as a complementary tool for conservation and valorisation of forest resources

Protocol publication

[…] Sequences were aligned with MEGA5 and checked by eye. Haplotypes were defined with BLASTClust v2.2.20 (http://toolkit.tuebingen.mpg.de/blastclust) using the following command line: blastclust -i infile -o outfile -p F -L1 -bT -S100, so that only sequences with 100% identity and full length coverage were clustered together. Species whose haplotypes were all exclusive to them were considered efficiently discriminated; those sharing at least one haplotype with another species were considered not discriminated.

The species discrimination power of the investigated loci was also assessed with a genetic distance approach, to evaluate whether the variation displayed was sufficient to discriminate sister species without intraspecific variation compromising their correct assignment. This approach underlies the “barcoding gap” definition, i.e. the assumption that sequence divergence within species is smaller than that between species. Uncorrected p-distance matrices of sequence divergence within and among congeneric species were calculated with MEGA5 for each gene fragment and for the two markers combined (rbcL + trnH-psbA). Species whose minimum interspecific distance was higher than their maximum intraspecific distance were considered successfully discriminated.

Finally, we simulated a barcode identification scenario using each sequence as an unknown query and GenBank (http://www.ncbi.nlm.nih.gov) as the global reference database. The NCBI Taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy) was screened to assess whether the investigated species were represented in GenBank for the markers under study. The identification ability of each marker was evaluated using the megaBLAST algorithm (http://blast.ncbi.nlm.nih.gov) with default parameters, except that the maximum number of retrieved sequences was set to 5000.
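
The distance-based criterion described earlier in this section (minimum interspecific distance higher than maximum intraspecific distance) can be illustrated with a minimal Python sketch. It is not the MEGA5 implementation. The function names, the toy sequences and the species labels sp_A, sp_B and sp_C are hypothetical. The sketch also makes three assumptions that the protocol does not state: sites with gaps or ambiguous bases are ignored pairwise, a species with a single sequence has a maximum intraspecific distance of 0, and a species with no congener in the input is reported as discriminated.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance: proportion of differing sites between two
    aligned sequences. Assumption: sites where either sequence has a gap
    or an ambiguous base are skipped (pairwise deletion)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def barcoding_gap(records):
    """records: list of (species, aligned_sequence) pairs for one genus.
    Returns {species: (max_intra, min_inter, discriminated)}."""
    max_intra = {}
    min_inter = {}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra.get(sp1, 0.0), d)
        else:
            for sp in (sp1, sp2):
                min_inter[sp] = min(min_inter.get(sp, float("inf")), d)
    result = {}
    for sp in {sp for sp, _ in records}:
        intra = max_intra.get(sp, 0.0)          # assumption: 0 for singletons
        inter = min_inter.get(sp, float("inf"))  # inf when no congener exists
        result[sp] = (intra, inter, inter > intra)
    return result


# Hypothetical toy alignment, for illustration only.
records = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "ACGTTCGTAC"),
    ("sp_C", "TTGTTCGAAC"),
]
result = barcoding_gap(records)
for species in sorted(result):
    print(species, result[species])
```

In this toy data set sp_A is not discriminated, because its closest congener is no further away than its own two sequences are from each other, whereas sp_B and sp_C are. The comparison is strict, following the "higher than" wording of the criterion.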
A query sequence was considered successfully identified if the hit with the top bit score in GenBank matched the name of the query species. Identification success was only assessed for species/sequences represented in GenBank. When more than one species shared the top bit score, or the correct species scored lower, the result was considered an identification failure. […]
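
The decision rule above can be expressed as a short Python sketch. It assumes that the BLAST hits for a query have already been reduced to (species name, bit score) pairs. The function name classify_query, the represented flag and the example hits are hypothetical and are not part of the published pipeline. Treating a represented species that returns no hits as a failure is also an assumption.

```python
def classify_query(true_species, hits, represented=True):
    """Classify one simulated identification.

    true_species: species name of the query sequence.
    hits: iterable of (species_name, bit_score) pairs from the search.
    represented: whether the species is present in the reference
        database for this marker (hypothetical flag).
    Returns "not assessed", "success" or "failure".
    """
    if not represented:
        return "not assessed"
    hits = list(hits)
    if not hits:
        return "failure"  # assumption: no hit counts as a failure
    top_score = max(score for _, score in hits)
    top_species = {name for name, score in hits if score == top_score}
    # Success only when the top bit score belongs to the query species alone.
    if top_species == {true_species}:
        return "success"
    return "failure"


# Hypothetical hit lists, for illustration only.
unique_top = [("sp_A", 950.0), ("sp_B", 900.0)]
shared_top = [("sp_A", 950.0), ("sp_B", 950.0)]
lower_rank = [("sp_B", 950.0), ("sp_A", 900.0)]

outcomes = [
    classify_query("sp_A", unique_top),
    classify_query("sp_A", shared_top),
    classify_query("sp_A", lower_rank),
    classify_query("sp_A", unique_top, represented=False),
]
print(outcomes)
```

Several hits from the same species at the top score still count as a success, because the rule compares species names rather than individual accessions.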

Pipeline specifications

Software tools: MEGA, BLASTClust, BLASTN
Databases: NCBI Taxonomy Database
Application: Protein sequence analysis
Organisms: Homo sapiens