Computational protocol: The mitochondrial genome of Angiostrongylus mackerrasae as a basis for molecular, epidemiological and population genetic studies

Protocol publication

[…] Short-insert libraries (100 bp) were constructed from the purified products and then sequenced using MiSeq technology (Illumina platform; Yourgene, Taiwan). FastQC (Babraham Bioinformatics: www.bioinfomatics.babraham.ac.uk) was used to assess the quality of the sequence data, and the paired-end reads were filtered using Trimmomatic (http://www.usadellab.org/cms). De novo assembly of the sequences was performed using SPAdes 3.0.0 Genome Assembler (http://bioinf.spbau.ru/en/spades). The program was run for all odd k-mer sizes between 21 and 125 (inclusive). The k-mer size providing the largest scaffold was selected for further analysis.

Following assembly, the mt genome of A. mackerrasae was annotated using a semi-automated bioinformatic pipeline. Each protein-coding mt gene was identified by local alignment comparison (performed in all six frames) using amino acid sequences of the corresponding genes from the mt genomes of A. vasorum, A. cantonensis and A. costaricensis (accession nos. NC_018602, GQ398121 and GQ398122, respectively). The large and small subunits (rrnL and rrnS) of the mt ribosomal RNA genes were identified by local alignment, and all transfer RNA (tRNA) genes were predicted and annotated based on available data from selected nematode superfamilies (the Metastrongyloidea, Trichostrongyloidea, Ancylostomatoidea and Strongyloidea). Annotated sequence data were imported using the program SEQUIN (available via http://www.ncbi.nlm.nih.gov/Sequin/) for the final verification of the mt genome organisation and subsequent submission to the GenBank database.

The amino acid sequences translated from individual genes of the mt genome of A. mackerrasae were then concatenated and aligned to sequences for 18 species for which mt genomic data sets were available using the program MUSCLE. Phylogenetic analysis of the amino acid sequence data was conducted by Bayesian inference (BI) using Markov chain Monte Carlo analysis in the program MrBayes v.3.2.2.
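The k-mer sweep described in the assembly step above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' script: the helper names, the read file names, the output directory layout and the tie-breaking rule (smaller k wins) are all hypothetical. The sketch builds one SPAdes command per odd k-mer size without running it, and picks the k whose assembly contains the longest scaffold.

```python
def odd_kmers(lo=21, hi=125):
    """Return every odd k-mer size in [lo, hi], inclusive."""
    return [k for k in range(lo, hi + 1) if k % 2 == 1]


def spades_command(k, reads_1, reads_2, out_root="spades_runs"):
    """Build (but do not run) a SPAdes command line for one k-mer size.

    The output directory layout (out_root/k<k>) is an assumption of this sketch.
    """
    return ["spades.py", "-k", str(k),
            "-1", reads_1, "-2", reads_2,
            "-o", f"{out_root}/k{k}"]


def longest_sequence(fasta_text):
    """Length of the longest record in FASTA-formatted text."""
    longest = current = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A new header closes the previous record.
            longest = max(longest, current)
            current = 0
        else:
            current += len(line.strip())
    return max(longest, current)


def best_k(longest_by_k):
    """Pick the k with the largest scaffold; ties go to the smaller k (assumption)."""
    return min(longest_by_k, key=lambda k: (-longest_by_k[k], k))


if __name__ == "__main__":
    # Hypothetical read file names, for illustration only.
    for k in odd_kmers():
        print(" ".join(spades_command(k, "reads_R1.fastq", "reads_R2.fastq")))
```

In practice each command would be executed, `longest_sequence` applied to the resulting scaffolds file for that run, and the per-k lengths passed to `best_k`.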
Bayesian inference is widely used in phylogenetics; its Markov chain Monte Carlo sampling approximates the posterior distribution of trees and provides posterior probabilities as a measure of nodal support. The optimal model of sequence evolution was assessed using a mixed amino acid substitution model; the analysis was run with four chains for 200,000 generations, sampling every 100th generation, and the first 25% of the generations sampled were removed from the analysis as burn-in.

In addition, a sliding window analysis was performed on the aligned, complete mt genome sequences of the three Angiostrongylus species using the program DnaSP v.5 (http://www.ub.edu/dnasp/). A sliding window of 300 bp (steps of 10 bp) was used to estimate nucleotide diversity (π) over the entire alignment; indels were excluded using DnaSP. Nucleotide diversity for the entire alignment was plotted against the midpoint position of each window, and gene boundaries were defined. Pairwise analyses were also performed using amino acid sequences predicted from the protein-coding genes of the four Angiostrongylus species to identify regions of different magnitudes of amino acid diversity. […]
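The sliding window estimate of nucleotide diversity can be illustrated with a short Python sketch. It is not DnaSP's implementation. It assumes that π is the mean proportion of differing sites over all sequence pairs, that every alignment column containing a gap is dropped before windowing, and that midpoints are reported as 1-based positions in the resulting gap-free alignment; the function names are hypothetical.

```python
from itertools import combinations


def drop_gapped_columns(seqs, gap_chars="-"):
    """Remove every alignment column in which any sequence has a gap."""
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in gap_chars for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of sequences (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)


def sliding_pi(seqs, window=300, step=10):
    """Return (midpoint, pi) for each window over the gap-free alignment."""
    clean = drop_gapped_columns(seqs)
    n_sites = len(clean[0])
    results = []
    for start in range(0, n_sites - window + 1, step):
        chunk = [s[start:start + window] for s in clean]
        # 1-based midpoint of sites start+1 .. start+window.
        midpoint = start + (window + 1) / 2
        results.append((midpoint, nucleotide_diversity(chunk)))
    return results


if __name__ == "__main__":
    # Toy alignment, for illustration only.
    toy = ["AAAAAA", "AAAAAT", "AAAATT"]
    for midpoint, pi in sliding_pi(toy, window=4, step=2):
        print(midpoint, round(pi, 4))
```

The returned pairs can be plotted directly, with the midpoint on the x-axis and π on the y-axis.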

Pipeline specifications

Software tools: FastQC, Trimmomatic, MUSCLE, MrBayes, DnaSP
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Angiostrongylus cantonensis, Caenorhabditis elegans, Rattus fuscipes, Homo sapiens, Pteropus alecto
Diseases: Meningitis
Chemicals: Amino Acids