Computational protocol: The Most Developmentally Truncated Fishes Show Extensive Hox Gene Loss and Miniaturized Genomes

Protocol publication

[…] To estimate divergence times between the two Paedocypris species and six other otophysan taxa for which genomic resources were available (D. rerio, A. mexicanus, C. carpio, Pi. promelas, S. grahami, and L. waleckii), we followed a previously published pipeline for phylogenetic marker selection, with a few modifications. Marker selection began with a set of 3,238 cave fish (A. mexicanus) exons for which at least five orthologs were known among the seven species D. rerio, Gadus morhua, Gasterosteus aculeatus, Oreochromis niloticus, Oryzias latipes, Poecilia formosa, and T. rubripes, according to version 87 of the Ensembl database. This marker set was then used to identify potential orthologs in the genomes of the eight selected otophysan taxa, based on tblastn searches followed by a strict filtering procedure. Compared with the original pipeline, we applied a lower dN/dS threshold of 0.25 to exclude markers potentially affected by positive selection, and we removed all markers for which no homologs could be detected in one or more of the eight otophysan genomes. We also applied stricter thresholds on clock-like evolution of candidate markers: all genes with an estimated coefficient of rate variation >0.8, as well as those with a mean mutation rate >0.0004 per site per million years, were excluded. This procedure yielded 138 genes with a total alignment length of 135,286 bp, which were subsequently analyzed with BEAST 2. Since the topology of otophysan taxa has previously been resolved with a more comprehensive phylogenetic data set, we focused on the inference of divergence times only, using that previously inferred topology as a starting tree and excluding all of BEAST 2's operators on the tree topology.
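
The marker filtering thresholds described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Marker` record, its field names, and the assumption that per-marker statistics have already been computed are all hypothetical, and only the threshold values come from the text. Markers lying exactly on a threshold are kept here, which is an assumption.

```python
from dataclasses import dataclass

# Threshold values quoted in the protocol text.
MAX_DNDS = 0.25          # markers above this may be under positive selection
MAX_RATE_CV = 0.8        # coefficient of rate variation (clock-likeness)
MAX_MEAN_RATE = 0.0004   # mean rate per site per million years
N_TAXA = 8               # two Paedocypris species plus six other otophysans


@dataclass
class Marker:
    """Hypothetical per-marker summary; field names are illustrative only."""
    name: str
    dnds: float
    n_taxa_with_homolog: int
    rate_cv: float
    mean_rate: float


def passes_filters(marker: Marker) -> bool:
    """Return True if the marker survives all four filters."""
    return (
        marker.dnds <= MAX_DNDS
        and marker.n_taxa_with_homolog == N_TAXA
        and marker.rate_cv <= MAX_RATE_CV
        and marker.mean_rate <= MAX_MEAN_RATE
    )


def select_markers(markers):
    """Keep only the markers that pass every filter, preserving input order."""
    return [m for m in markers if passes_filters(m)]
```

Calling `select_markers` on a list of `Marker` records returns the retained subset. A marker with a homolog missing from even one of the eight genomes is dropped regardless of its other statistics.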
Divergence times were estimated by calibrating the most recent common ancestor of Cypriniformes and Characiformes with a lognormal distribution centered at 121 Ma (standard deviation on log scale: 0.1), following previously published results. We performed two replicate BEAST 2 analyses with 800 million MCMC iterations, of which the first 100 million were discarded as burn-in. Convergence was assessed based on the similarity of parameter traces between replicate runs and on effective sample sizes (ESS) >200. A maximum clade credibility (MCC) summary tree with node heights set to mean age estimates was produced with TreeAnnotator (v. 2.1.2). [...]

To determine the directionality of changes in repeat content and genome size, the time-calibrated phylogeny was used to reconstruct ancestral states at each internal node. This was done with the ML algorithm "phylopars" implemented in the R package Rphylopars, assuming a Brownian motion model of trait evolution. Ancestral state reconstruction was performed separately for the numbers of DNA transposons, LINEs, SINEs, LTRs, and other repetitive elements, and for genome size (additional file 2, online). [...]

We investigated the presence of PIWI-like genes in the genomes of the two Paedocypris species and the other cyprinids using Exonerate with the longest transcripts available for the two PIWI-like homologs from zebrafish: PIWIl1 (ENSDARG00000041699) and PIWIl2 (ENSDARG00000062601). Regions containing sequences spanning more than three introns were aligned to the zebrafish exons using mafft as implemented in AliView (v. 1.17.1). Intron sequences were aligned manually based on the established exon structure, using the full-length scaffold sequences. Local gene synteny to zebrafish chromosome 8, surrounding the putative PIWIl1 copies, was confirmed through reciprocal BLAST searches using both the MAKER2-annotated proteins and proteins predicted by GeneScan (online version) as queries. [...]

As P. carbunculus has been shown to have a haploid chromosome count of 15, potential chromosomal fusions were investigated on the basis of disrupted synteny of zebrafish chromosomes in relation to Paedocypris. We identified putative homologous regions between the zebrafish genome assembly and each of the Paedocypris genome assemblies using MCScanX. In short, the predicted proteins of each Paedocypris species were merged with the predicted proteins from zebrafish into a single file, and BLASTP was executed with this file as both query and target, thus identifying putative homologs both within and between species. The genomic positions of the proteins were extracted from the annotation files, and the BLASTP results and the genomic positions were provided to MCScanX to identify the putative homologous regions. […]
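
The step of extracting genomic positions for MCScanX can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it assumes the annotation is in GFF3 with one `mRNA` feature per predicted protein and that the `ID` attribute matches the protein identifier used in the BLASTP run. The function name, the species prefix, and the choice of feature type are hypothetical. The four-column, tab-delimited output (sequence name, gene ID, start, end) follows the position-file layout that MCScanX reads alongside tabular BLASTP output.

```python
def gff3_to_mcscanx(gff3_lines, species_prefix, feature_type="mRNA"):
    """Convert GFF3 records to MCScanX-style position rows.

    Each returned row is 'prefix+seqid <TAB> gene_id <TAB> start <TAB> end'.
    The species prefix (an assumption here) keeps sequence names from
    different species distinct once the files are merged.
    """
    rows = []
    for line in gff3_lines:
        # Skip blank lines and GFF3 directives/comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid GFF3 record has nine columns; keep only the chosen feature.
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        seqid, start, end, attrs = fields[0], fields[3], fields[4], fields[8]
        # Parse 'key=value;key=value' attributes into a dictionary.
        attributes = dict(
            item.split("=", 1) for item in attrs.split(";") if "=" in item
        )
        gene_id = attributes.get("ID")
        if gene_id is None:
            continue
        rows.append(f"{species_prefix}{seqid}\t{gene_id}\t{start}\t{end}")
    return rows
```

Rows produced for zebrafish and for one Paedocypris species would then be concatenated into a single position file, mirroring the merged protein file used for the all-against-all BLASTP search.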

Pipeline specifications