Computational protocol: Evolution and Phylogeny of Large DNA Viruses, Mimiviridae and Phycodnaviridae Including Newly Characterized Heterosigma akashiwo Virus

Protocol publication

[…] Viral genomes and encoded amino acid sequences were downloaded from […]. The NCVOG database was downloaded from […]. Amino acid sequences coded by the genomes of the following viruses were incorporated into the NCLDV database used in this study: HaV53, Aureococcus anophagefferens virus MM 2014 (AaV, NC_024697), Acanthocystis turfacea Chlorella virus 1 (AtCV, NC_008724), Bathycoccus RCC1105 virus (BpV, NC_014765), Haptolina ericina virus CeV-01B (formerly Chrysochromulina ericina virus, CeV, KT820662), Cafeteria roenbergensis virus (CroV, NC_014637), Emiliania huxleyi virus 86 (EhV, NC_007346), Ectocarpus siliculosus virus 1 (EsV, NC_002687), Feldmannia species virus (FsV, NC_011183), Megavirus chilensis (MegaV, NC_016072), Acanthamoeba polyphaga mimivirus (MimiV, NC_014649), Acanthamoeba polyphaga moumouvirus (MoumouV, NC_020104.1), Micromonas RCC1109 virus (MpV, NC_014767), Ostreococcus lucimarinus virus 5 (OlV5, NC_020852), Ostreococcus tauri virus 1 (OtV1, NC_013288), Ostreococcus tauri virus 5 (OtV5, NC_010191), Paramecium bursaria Chlorella virus 1 (PBCV1, NC_000852), Phaeocystis globosa virus (PgV, NC_021312), Autographa californica nucleopolyhedrovirus (AcNPV, NC_001623), African swine fever virus (ASFV, NC_001659), Melanoplus sanguinipes entomopoxvirus (MsEV, NC_001993), Amsacta moorei entomopoxvirus (AEPV, NC_002520), Culex nigripalpus NPV (CnNPV, NC_003084), Heliothis virescens ascovirus 3e (HvaV, NC_009233), Infectious spleen and kidney necrosis virus (ISKNV, NC_003494), Mamestra configurata NPV-A (McNPV, NC_003529), Lymphocystis disease virus China (LDV, NC_005902), Spodoptera litura granulovirus (SlGV, NC_009503), Marseillevirus (MarV, NC_013756), Rodent herpesvirus Peru (RHV, NC_015049), Lausannevirus (LausV, NC_015326), Wiseana iridescent virus (WiV, NC_015780), Pithovirus sibericum (PithoV, NC_023423), Pandoravirus dulcis (PandraV, NC_021858), Mollivirus (MolliV, NC_027867), and Human herpesvirus 3 (HHV3, NC_001348).

For the NCLDV CP (NCVOG0022), D5-like helicase primase (NCVOG0023), and DNA polymerase B (NCVOG0038) phylogenetic analyses (Supplementary Figure), the orthologs were determined by choosing the best-hit target sequences obtained by BLASTP search (E-value < 10⁻²⁰) using the NCVOG orthologs as queries. The databases were created from the amino acid sequences coded by the genomes of the above-mentioned viruses and of Ambystoma tigrinum virus (AtV, NC_005832), Bovine papular stomatitis virus (BpsV, NC_005337), Fowlpox virus (FpV, NC_002188), Frog virus (FrogV, NC_005946), Invertebrate iridescent virus 6 (InvIV6, NC_003038), Lymphocystis disease virus 1 (LDV1, NC_001824), Molluscum contagiosum virus subtype 1 (McV, NC_001731), Sheeppox virus (ShpV, NC_004002), Singapore grouper iridovirus (SgiV, NC_006549), Swinepox virus (SwpV, NC_003389), Trichoplusia ni ascovirus (TnaV, NC_008518), Vaccinia virus (VaccinaV, NC_006998), and Yaba monkey tumor virus (YmtV, NC_005179). [...]

BLAST+ (version 2.2.31) executables were downloaded from […]. Databases for BLASTP and PSI-BLAST searches were constructed according to the provided manual. For phyletic studies, 18 viruses, namely AaV, AtCV, BpV, CeV, CroV, EhV86, EsV, FsV, HaV53, MegaV, MimiV, MoumouV, MpV1, OlV5, OtV1, OtV5, PBCV1, and PgV, were selected. Proteins of 100 aa or more encoded by each virus were extracted and used as queries. When a single open reading frame (ORF) hit multiple target sequences in the databases, the hit with the highest bit score was selected for further study. Similarly, when multiple ORFs in a viral genome hit the same target sequence in NCVOG, the ORF with the highest bit score was selected for further study, in order to identify a true ortholog rather than paralogs.

Multiple sequence alignments and phylogenetic reconstructions by neighbor-joining were performed in ClustalX version 2.1. Poorly conserved regions and positions including gaps were removed prior to phylogenetic analysis.
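The two best-hit rules described above (highest bit score per ORF, then highest bit score per target within a genome) can be sketched in a few lines of Python. This is only an illustration, not the authors' script: the function name, the tuple layout of the hits, the ORF labels, and the example scores are all hypothetical, and the E-value cut-off is left as a parameter because the text states a threshold (10⁻²⁰) only for the three marker-gene analyses.

```python
def select_orthologs(hits, max_evalue=None):
    """Pick one ORF per (genome, target) pair from BLASTP hits.

    hits: iterable of (genome, orf, target, evalue, bitscore) tuples.
          This layout is an assumption made for the sketch.
    max_evalue: optional cut-off; hits with evalue >= max_evalue are dropped.
    """
    # Rule 1: when one ORF hits several targets, keep its highest bit score hit.
    best_per_orf = {}
    for genome, orf, target, evalue, bits in hits:
        if max_evalue is not None and evalue >= max_evalue:
            continue
        kept = best_per_orf.get((genome, orf))
        if kept is None or bits > kept[1]:
            best_per_orf[(genome, orf)] = (target, bits)

    # Rule 2: when several ORFs of one genome hit the same target,
    # keep the ORF with the highest bit score (candidate ortholog).
    best_per_target = {}
    for (genome, orf), (target, bits) in best_per_orf.items():
        kept = best_per_target.get((genome, target))
        if kept is None or bits > kept[1]:
            best_per_target[(genome, target)] = (orf, bits)

    return {key: orf for key, (orf, _bits) in best_per_target.items()}


# Hypothetical hits: ORF names and scores are invented for illustration.
hits = [
    ("HaV53", "orf1", "NCVOG0038", 1e-50, 200.0),
    ("HaV53", "orf1", "NCVOG0022", 1e-30, 120.0),
    ("HaV53", "orf2", "NCVOG0038", 1e-25, 95.0),
    ("HaV53", "orf3", "NCVOG0023", 1e-5, 40.0),
]
orthologs = select_orthologs(hits, max_evalue=1e-20)
```

In this toy input, orf1 and orf2 both hit NCVOG0038, so only the higher-scoring orf1 is retained, and orf3 is discarded by the E-value cut-off.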
Neighbor-joining phylogenetic inferences were conducted, and the confidence of the branching was assessed using 1,000 bootstrap resampling replicates of the analyzed dataset.

Pan-genome analysis was conducted with PGAP software, using cut-off values of 20% identity and E-value < 10⁻⁵. In this analysis, orthologs in each virus in the dataset were determined by all-to-all BLASTP search followed by MCL, and the phyletic inference was calculated by neighbor-joining based on the presence/absence matrix of the orthologs in each combination of the viruses.

Gain and loss of gene families during evolution were mapped on a guide tree based on the concatenated sequence of nine preserved genes (Figure) using COUNT software. For each gene family, Wagner parsimony with gene gain penalties of 1 and 5 was used to infer the most parsimonious ancestral gene sets under different gain/loss pressures. We chose Wagner parsimony, rather than other protocols, because it allows multiple gains with penalties and infers gene family expansion and contraction. For both PGAP and COUNT analyses, we selected genes coding for proteins of 100 aa or more. The resulting trees from all the analyses were visualized using Geneious 9.0.5. […]
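To make the effect of the gain penalty concrete, the following is a minimal sketch of Wagner parsimony with asymmetric costs on a toy tree. It is not COUNT's implementation. It assumes that a gain of one gene copy costs `gain_penalty` and a loss costs 1, and it uses a hypothetical three-leaf tree with invented family sizes; ties are broken toward the smaller ancestral count.

```python
def wagner_parsimony(tree, leaf_counts, gain_penalty=1.0, root="root"):
    """Infer ancestral gene family sizes by Sankoff-style dynamic programming.

    tree: dict mapping each internal node to its list of children.
    leaf_counts: dict mapping each leaf to its observed family size.
    Assumed cost model: gain of one copy = gain_penalty, loss of one copy = 1.
    """
    states = range(max(leaf_counts.values()) + 1)
    inf = float("inf")

    def edge_cost(parent, child):
        if child > parent:
            return gain_penalty * (child - parent)  # gains
        return parent - child                       # losses (or no change)

    cost = {}    # node -> list: minimal subtree cost for each state
    choice = {}  # (child, parent_state) -> best state for that child

    def down(node):
        if node in leaf_counts:
            cost[node] = [0.0 if s == leaf_counts[node] else inf for s in states]
            return
        total = [0.0 for _ in states]
        for child in tree[node]:
            down(child)
            for s in states:
                best = min(states, key=lambda t: cost[child][t] + edge_cost(s, t))
                choice[(child, s)] = best
                total[s] += cost[child][best] + edge_cost(s, best)
        cost[node] = total

    down(root)
    root_state = min(states, key=lambda s: cost[root][s])

    ancestral = {}

    def up(node, state):
        ancestral[node] = state
        for child in tree.get(node, []):
            up(child, choice[(child, state)])

    up(root, root_state)
    return cost[root][root_state], ancestral


# Hypothetical guide tree ((A,B),C) and family sizes for one gene family.
tree = {"root": ["n1", "C"], "n1": ["A", "B"]}
counts = {"A": 1, "B": 1, "C": 0}
cost_g1, anc_g1 = wagner_parsimony(tree, counts, gain_penalty=1)
cost_g5, anc_g5 = wagner_parsimony(tree, counts, gain_penalty=5)
```

With a gain penalty of 1, this toy family is placed as a single gain on the branch leading to A and B (the root has no copy). With a penalty of 5, the same data are explained more cheaply by presence at the root and one loss in C, which illustrates how the two penalty settings in the text model different gain/loss pressures.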

Pipeline specifications

Software tools: BLASTP, Clustal W, PGAP, Geneious
Applications: Genome annotation, Phylogenetics
Organisms: Heterosigma akashiwo