Computational protocol: Homoplastic microinversions and the avian tree of life

Protocol publication

[…] We primarily used published data [-], although some novel CLTCL1 sequences were generated using the primers and PCR conditions from Kimball et al. [] (for details, see Additional file ). For this study, we focused on shorter sequences with extensive taxon sampling (Table ) rather than complete genomic sequences [-]. Sequences were aligned manually, in some cases starting from an automated alignment (i.e., one produced using Clustal [] or MAFFT []). Alignments were refined iteratively with input from at least two different individuals. Careful examination of the alignments during this process led to the identification of a number of microinversions "by eye" (Additional file , Table S2).
Microinversions were also identified by a computational method that combined the multiple sequence alignments with the results of complementary strand alignments for all pairs of sequences (Additional file , Figure S1). The pairwise complementary strand alignments were generated using bl2seq [] and YASS [] and mapped onto the multiple sequence alignments using a program written by ELB. This program saved a table that listed the first and last positions of each pairwise complementary strand alignment in the multiple sequence alignment and highlighted overlapping pairwise complementary strand alignments (an example is presented in Additional file , along with a description of the algorithm in pseudocode). Microinversions are expected to produce complementary strand alignments that either overlap or are located near each other in the sequence alignment. Each significant complementary strand hit involving sequences that overlapped or were located near each other in the multiple sequence alignment was therefore inspected visually to confirm the presence or absence of a microinversion.
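The mapping step described above can be illustrated with a minimal sketch. This is not the program written by ELB, whose pseudocode is in the Additional file; it only shows, under stated assumptions, how ungapped hit coordinates might be converted to alignment columns and how overlapping or nearby hits might be flagged. The function names, the 0-based inclusive coordinates, and the `max_gap` parameter are all hypothetical.

```python
from itertools import combinations


def ungapped_to_column(aligned_seq, gap_chars="-"):
    """List whose i-th entry is the alignment column of the i-th ungapped residue."""
    return [col for col, ch in enumerate(aligned_seq) if ch not in gap_chars]


def map_hit(aligned_seq, start, end):
    """Map a 0-based, inclusive hit on the ungapped sequence to alignment columns.

    Assumption: hit coordinates have already been converted to 0-based positions
    on the forward strand of the ungapped sequence.
    """
    cols = ungapped_to_column(aligned_seq)
    return cols[start], cols[end]


def flag_nearby(hits, max_gap=0):
    """Return index pairs of hits whose column intervals overlap or lie within
    max_gap columns of each other (max_gap is a hypothetical tolerance)."""
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(hits), 2):
        if a[0] <= b[1] + max_gap and b[0] <= a[1] + max_gap:
            flagged.append((i, j))
    return flagged


if __name__ == "__main__":
    aligned = "AC--GTTA-C"                      # toy aligned sequence
    print(map_hit(aligned, 2, 5))               # columns spanned by residues 2..5
    hits = [(4, 7), (6, 12), (20, 25)]          # toy hits in alignment columns
    print(flag_nearby(hits))                    # strictly overlapping pairs
    print(flag_nearby(hits, max_gap=8))         # also pairs within 8 columns
```

Flagged pairs would still need visual inspection, as in the protocol; the sketch only narrows down which regions of the alignment to examine.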
Microinversion endpoints were assigned based upon the length of the complementary strand alignments, although there were some cases where inversion endpoints were difficult to identify (e.g., Figure ). Microinversions shorter than 5 bp were difficult to validate, so 5 bp was the minimum size considered.
The DNA mfold server (http://mfold.bioinfo.rpi.edu/cgi-bin/dna-form1.cgi; []) was used to search for stem-loop structures, and the MEME server (http://meme.sdsc.edu/meme4_4_0/intro.html) was used to search for sequence motifs that might be associated with inversions. [...] Phylogenetic analyses of the CLTC alignment, conducted to provide an estimate of the CLTC gene tree, used RAxML 7.0.4 []. Microinversions and sites with gaps and/or missing data in more than 50% of taxa were excluded before conducting the RAxML search. See Additional file for details. […]
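The site-exclusion step before the RAxML search can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `filter_columns`, the dictionary input format, and the choice to treat `-`, `?`, and `N` as gap or missing-data characters are assumptions, and microinversion regions are supplied as hypothetical 0-based, inclusive column ranges.

```python
def filter_columns(alignment, max_missing_fraction=0.5,
                   missing_chars="-?Nn", excluded=()):
    """Drop alignment columns before tree inference.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    excluded:  iterable of (start, end) column ranges, 0-based inclusive,
               e.g. microinversion regions identified beforehand.
    A column is removed if it lies in an excluded range or if MORE than
    max_missing_fraction of taxa have a gap/missing character there.
    Returns the filtered alignment and the list of retained columns.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    excluded_cols = set()
    for start, end in excluded:
        excluded_cols.update(range(start, end + 1))

    keep = []
    for col in range(length):
        if col in excluded_cols:
            continue
        missing = sum(s[col] in missing_chars for s in seqs)
        if missing / len(seqs) > max_missing_fraction:
            continue
        keep.append(col)

    filtered = {taxon: "".join(seq[c] for c in keep)
                for taxon, seq in alignment.items()}
    return filtered, keep


if __name__ == "__main__":
    toy = {"a": "ACGT-A", "b": "A-GT-A", "c": "A-GTCA", "d": "A-G?-A"}
    filtered, kept = filter_columns(toy)
    print(kept)
    print(filtered)
```

Note that the comparison is strict, so a column with gaps or missing data in exactly 50% of taxa is retained, matching the "more than 50%" wording of the protocol.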

Pipeline specifications