Computational protocol: Genomic Mining of Phylogenetically Informative Nuclear Markers in Bark and Ambrosia Beetles

Similar protocols

Protocol publication

[…] We included 18 species of bark and ambrosia beetles and 8 additional weevils from other subfamilies for primer screening ( and ). These beetles were collected by one of the authors (BHJ) during fieldwork in tropical forests (1998–2012). Collection permits were requested from authorities in Uganda, Tanzania, Cameroun, South Africa and Madagascar. Ethical guidelines were followed. Voucher specimens are deposited in the Coleoptera collection of the University Museum of Bergen, University of Bergen, Norway. All weevils, Platypodinae and Scolytinae species used in this study were previously described in other phylogenetic studies [, , ].The procedure for primer selection can be summarized as follows: 1) putatively single copy expressed sequence tags (ESTs) longer than 800 base pairs were selected in GenBank for two different beetle species, Tribolium castaneum and Dendroctonus ponderosae; 2) preliminary BLAST searches were performed to discard unsuitable markers, based on the evidence for multiple paralogous copies (e.g. large gene families) or ambiguous genomic characterization (e.g. similar matching values for different proteins); 3) available sequences for each selected gene were aligned, including annotated genomic and transcriptomic sequences from model organisms (e.g. Drosophila melanogaster, Apis mellifera and Bombyx mori) to determine intron-exon structure; 4) degenerate primers were designed; 5) a PCR screening was run and products with the expected correct size (albeit highly variable due to presence of introns) were sequenced; 6) markers reaching a minimum PCR and sequencing success of 20% were used to reconstruct single gene phylogenies (Bayesian) and trees were compared to previously established and well-supported clades [, , , ].DNA was extracted from individual specimens using DNeasy Blood & Tissue kit (Qiagen) following the manufacturer’s instructions. The PCR reaction mixture contained 2.5 μl 10x PCR buffer (Qiagen), in which the final concentration of MgCl2 was 2.0 mM, 200 μM of each dNTP (Sigma Aldrich), 0.5 μM of each primer, 0.125 units Hot Start Taq® DNA polymerase (Qiagen), 2 μl DNA, with water added to a final volume of 25 μl. A negative control (sterile water) was included in each test. The PCR was performed using a S1000™ Thermal Cycler (BIO-RAD Laboratories, Inc.). Three standard cycle programs were used for the initial screening: denaturation step at 95°C for 5 minutes, 35 cycles of 30 seconds at 95°C, 30 seconds at 48, 52 and 58°C, 60 seconds at 72°C, and finally 5 minutes extension at 72°C. Further optimization included a gradient of annealing temperatures in the range of 44–62°C, modulating the extension time depending on the expected PCR product length, and MgCl2 concentration. We also considered two different touch-down PCR protocols for two of these genes (see for details).PCR products were sequenced with the same primers as those used for amplification. DNA sequences of both strands were obtained using the BigDye Terminator cycle sequencing ready reaction kit (Applied Biosystems Inc.) using an automated DNA sequencer (Applied Biosystems Prism 3700) following the manufacturer’s instructions.All obtained sequences were submitted to BLAST analyses, accepting a correct gene target if the cutoff value was below 1E-4. All sequences for each gene were aligned with other insect sequences for a preliminary NJ analysis in PAUP* 4.0 [] to detect deviant sequences. The sequences were checked by eye and using Bioedit 7.2.5 [] and MAFFT [] to align gene fragments with complex structure, caused either by to the presence of indels of coding triplets, or less frequently by long introns marked by unusual exon-intron borders such as the most common alternative splice site GC—AG [].Introns were trimmed and the coding fragments were translated into amino acid sequences using Bioedit 7.2.5 to check for translational errors (stop codons). All these preliminary analyses had the purpose of detecting pseudogenes or early signs of possible paralogs (e.g. high degree of amino acid substitutions). In addition, the amino acid sequences of the selected markers were examined in OrthoDB v9 to assess gene orthology [, ]. The orthology for each gene was confirmed by cluster of orthologous groups (COGs) comparison among arthropod sequences in the database. Ambiguous nucleotide positions in the coding region that were difficult to align were tentatively excluded (in Arr2 and Iap2) to create an alternative alignment for comparisons (see and ).Phylogenetic analyses were performed on unambiguously aligned sequences obtained from a minimum of 5 species. Phylogenetic inference was based on Bayesian and maximum parsimony analyses, the latter as implemented in PAUP* 4.0. Node support in the parsimony analyses was estimated by bootstrap analyses using 20 random additions of heuristic searches for each of 200 bootstrap replicates. Bayesian phylogenetic analyses were performed in MrBayes 3.2 []. The most appropriate model for base frequencies and substitution rates was determined by jModelTest [], using the Akaike information criterion (AIC). MrBayes searches were run for each gene separately and for concatenated datasets (8109 bp– 2702 aa) using the suggested models for each gene partition and a mixed model for amino acid substitution. In both cases, the search consisted of 2000000 generations with two independent runs, each with four simultaneous chains, and trees sampled every 1000 generations. The convergence diagnostics (SDSF, PSRF) and parameter sample plots were evaluated using the software Tracer 1.6 [].An indirect measure of the phylogenetic signal in each marker was assessed through topological congruence with previously well documented clades [–, , , ] which were used to derive a scheme of the current classification of Curculionoidea (). These clades belong to six tribes of Scolytinae (A = Dryocoetini including Xyleborini, B = Ipini, C = Hylurgini + Hylesinini, D = Scolytini) and the subfamily Platypodinae (E). Rooting of the trees was dependent on the sequences available, and used in the following order: 1) Brentidae, 2) Platypodinae, 3) Cossoninae, Molytinae and Lixinae, 4) Scolytini [, ].Basic properties of each gene, including the overall mean divergence of sequences (p-distance) and the variation in first, second and third positions, were calculated for each gene fragment using MEGA 6.0 []. Parsimony informative sites were calculated together with the homoplasy and retention indices (respectively HI and RI–) using PAUP* 4.0. A phylogenetic informativeness profile (PI) was obtained for each gene using PhyDesign [], an on-line program developed from a previous study []. Substitution rates for each position were calculated using HyPhy implemented in PhyDesign, selecting a K2P model (base frequencies = 0.25, transitions = 2, transversions = 1). The input time tree was obtained using Beast v1.8.2 [], with topology constraints following previously published phylogenies of weevils and Scolytinae [, , ]. The tree was reconstructed using a concatenated dataset of 16 genes, using a GTR+I+ Γ model for each gene partition, and a Yule speciation process. We selected an uncorrelated lognormal relaxed molecular clock and used default priors as suggested by the authors (see XML in Supplementary information). Two calibration points were used: 116 Ma for the node subtending Scolytinae and other weevil subfamilies, and 30 Ma for clade A (Dryocoetini+Xyleborini). […]

Pipeline specifications

Software tools BioEdit, MAFFT, MrBayes, jModelTest, MEGA, PhyDesign, HyPhy, BEAST
Application Phylogenetics