Computational protocol: Novel insights into mitochondrial gene rearrangement in thrips (Insecta: Thysanoptera) from the grass thrips, Anaphothrips obscurus

[…] The two long PCR amplicons, 7,409 bp and 7,188 bp in size, were obtained at different times and were sequenced separately. The 7,409-bp amplicon was sequenced on the Illumina HiSeq 2000 platform at BGI, Hong Kong. The 7,188-bp amplicon was sequenced on the Illumina HiSeq 2500 platform at Berry Genomics, Beijing. Illumina sequence reads obtained from the long PCR amplicons were checked for quality and then assembled into contigs with Geneious 6.0.6; the assembly parameters were: minimum overlap identity 98%; no gaps; maximum mismatches per read 2%; maximum ambiguity 2; and minimum overlap 100 bp.
We identified tRNA genes using tRNAscan-SE 1.2.1 and ARWEN. A few tRNA genes that could not be identified by these programs were found by manual inspection for the predicted anticodon sequences and secondary structures found in other thrips. Protein-coding and rRNA genes were identified by BLAST searches of GenBank. The annotated mt genome sequence of A. obscurus has been deposited in GenBank under accession number KY498001. [...]
We inferred the phylogenetic relationship of A. obscurus with four other species of thrips whose mt genomes have been reported (see Supplementary Table). The damsel bug, Alloeorhynchus bakeri, which has retained the ancestral mt genome organization of insects, was used as the outgroup. Each protein-coding gene was aligned individually by codons using the MAFFT algorithm implemented in TranslatorX with the L-INS-i strategy and default settings. Poorly aligned sites were removed from the amino acid alignment using GBlocks in TranslatorX with default settings before back-translation to nucleotides. The rRNA genes were individually aligned using the MAFFT 7.0 online server with the G-INS-i strategy. Ambiguous positions in the rRNA gene alignments were filtered using GBlocks v0.91 with default settings.
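The codon-based alignment described above aligns the translated proteins, filters poorly aligned amino acid columns, and then maps the result back onto the nucleotides. The following is a minimal sketch of that idea only, not TranslatorX's or GBlocks's actual implementation; the function names back_translate and keep_columns and the toy sequences are hypothetical and chosen for illustration.

```python
from typing import Sequence


def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread the codons of an unaligned coding sequence onto one gapped
    row of a protein alignment ('-' marks a gap)."""
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein row")
    # One codon per residue, consumed in order as residues are encountered.
    codons = (cds[i:i + 3] for i in range(0, 3 * n_residues, 3))
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


def keep_columns(codon_row: str, kept_columns: Sequence[int]) -> str:
    """Keep only the codons at amino acid columns that survived filtering
    (the column indices stand in for the output of a filter such as GBlocks)."""
    return "".join(codon_row[3 * c:3 * c + 3] for c in kept_columns)


# Toy example: a four-column protein row with one gap.
row = back_translate("M-KL", "ATGAAATTA")
filtered = keep_columns(row, [0, 2, 3])  # drop the gapped column
```

Because filtering is done on amino acid columns and whole codons are kept or dropped together, the reading frame is preserved in the resulting nucleotide alignment.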
Alignments of individual genes were concatenated into two datasets: 1) the PCGR dataset, containing all three codon positions of the 13 protein-coding genes and the two rRNA genes (11,767 bp in total); and 2) the PCG12R dataset, which is the same as the PCGR dataset except that the third codon positions of the protein-coding genes are excluded (8,646 bp in total).
The two concatenated datasets were analyzed using the maximum likelihood (ML) method implemented in RAxML-HPC2 8.1.11 and the Bayesian inference (BI) method implemented in MrBayes 3.2.3. Multiparametric bootstrapping analysis of 1,000 replicates was performed in RAxML based on the optimal tree with the best likelihood score and the GTRGAMMA model. For the MrBayes analyses, two simultaneous runs of 10 million generations were conducted for each dataset using the GTR + I + G model; trees were sampled every 1,000 generations, with the first 25% discarded as burn-in. Stationarity was considered to be reached when the average standard deviation of split frequencies was below 0.01. […]
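The construction of the two datasets and the arithmetic implied by the reported numbers can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper drop_third_positions and the toy sequences are hypothetical, and the per-partition lengths and sample counts are derived here from the reported totals rather than taken from the paper.

```python
def drop_third_positions(codon_alignment: str) -> str:
    """Return an in-frame codon alignment row with every third site removed."""
    if len(codon_alignment) % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    return "".join(codon_alignment[i:i + 2]
                   for i in range(0, len(codon_alignment), 3))


# Toy rows for a single taxon: two protein-coding genes and one rRNA gene.
pcg_rows = ["ATGAAA", "ATGTTT"]
rrna_rows = ["GGCC"]

pcgr = "".join(pcg_rows + rrna_rows)  # all codon positions + rRNA
pcg12r = "".join([drop_third_positions(g) for g in pcg_rows] + rrna_rows)

# Lengths reported in the text.
PCGR_LEN = 11_767
PCG12R_LEN = 8_646

# Derived (not reported): the difference is the number of third codon positions.
third_positions = PCGR_LEN - PCG12R_LEN   # sites removed
pcg_len = 3 * third_positions             # protein-coding sites in PCGR
rrna_len = PCGR_LEN - pcg_len             # rRNA sites in both datasets

# Derived sampling arithmetic for each MrBayes run (the sample taken at
# generation 0, if counted, shifts these by one).
GENERATIONS = 10_000_000
SAMPLE_EVERY = 1_000
samples_per_run = GENERATIONS // SAMPLE_EVERY
burn_in_samples = samples_per_run // 4    # first 25% discarded
```

Under these assumptions the two reported totals are mutually consistent: removing one site in three from the protein-coding partition accounts exactly for the difference between the PCGR and PCG12R lengths.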

