Computational protocol: The Small Noncoding RNA Processing Machinery of Two Living Fossil Species, Lungfish and Coelacanth, Gives New Insights into the Evolution of the Argonaute Protein Family

Protocol publication

[…] AGO, PIWI, DGCR8, Dicer, Drosha, PLD6, SETDB1, and Mael transcripts were obtained from the L. menadoensis and P. annectens transcriptomes (Supplementary Material online; accession numbers LT674425 to LT674451). The raw sequence reads of these transcriptomes were deposited in the NCBI BioProject and SRA databases under the accessions PRJNA175365 and PRJNA282925, respectively. Given the high sequence identity within the genus Latimeria, the genome of the congeneric species L. chalumnae was used to obtain synteny information for the members of the AGO family in coelacanths. The syntenic positions of the corresponding genes from the other vertebrates were collected from ENSEMBL (Supplementary Material online) and checked through Genomicus.

The correct orthology of the transcripts obtained in both species was assessed by homology using NCBI BLAST, while for Dicer and SETDB1, given the presence of several paralogous genes, orthology was assessed by phylogenetic analyses (Supplementary Material online). The PAZ, MID, and PIWI domains of the members of the AGO family were inferred through InterPro.

The phylogenetic analysis of the AGO family was performed on amino acid sequences using MrBayes (version 3.1). Substitution models, posterior probabilities, stationarity, generations, sampling, burn-in, specific tree-building parameters, and rooting details are reported in the tree legend. In addition, a maximum likelihood (ML) analysis was performed using MEGA7 with the Jones–Taylor–Thornton model, and the bootstrap support for the ML tree was determined using 1000 replicates. The topology of the ML tree (data not shown) is similar to that obtained with the Bayesian analysis. The Ago and Piwi orthologous sequences were collected from the ENSEMBL or NCBI databases. Callorhinchus milii sequences were obtained from ().
The Ago subfamily of the little skate, Leucoraja erinacea, was inferred from the transcriptome at SkateBase: Argonaute RISC catalytic component 1 or EIF2C1 (AGO1): contig 19580, contig 18487, contig 349, contig 28154; Argonaute RISC catalytic component 2 or EIF2C2 (AGO2): contig 89915; Argonaute RISC catalytic component 3 or EIF2C3 (AGO3): contig 22246, contig 15246, contig 11106; Argonaute RISC catalytic component 4 or EIF2C4 (AGO4): contig 90274, contig 89949. Accession numbers of the sequences used in the phylogenetic analysis are reported in the Supplementary Material online. The accession number of the Caenorhabditis elegans WAGO sequence used as an outgroup is Q21770. Clustal Omega was used to build the alignments. The assignment of the Ago and Piwi genes to ohnolog families was checked using the ohnolog database and previously published data.

The expression values in L. menadoensis male liver, testis, and muscle; in P. annectens brain, liver, and gonads of male and female specimens; and in Danio rerio brain, liver, muscle, and gonads of a female specimen and gonads of a male specimen (BioProject PRJNA255848) are reported as transcripts per million (TPM). Expression levels were calculated following a previously described procedure to allow inter-species comparison.

Omega (dN/dS) rates were calculated with CODEML, included in the PAML 4.8 package, starting from codon-based alignments of the coding nucleotide sequences of the target genes obtained with MUSCLE. Coding sequences were retrieved from ENSEMBL for Mus musculus, Loxodonta africana, Monodelphis domestica, Pelodiscus sinensis, Gallus gallus, Xenopus tropicalis, D. rerio, Lepisosteus oculatus, and Tetraodon nigroviridis, from () for C. milii, and from SkateBase for L. erinacea and Scyliorhinus canicula.
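The TPM unit used for the expression values above can be illustrated with a minimal Python sketch. It shows only the standard TPM definition (length-normalised read rates rescaled to sum to one million per sample); it is not the inter-species procedure the authors followed, whose details are not given here. The function name, the input dictionaries, and the toy counts and lengths are all hypothetical.

```python
def tpm(read_counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    read_counts: dict mapping transcript id -> number of mapped reads
    lengths_bp:  dict mapping transcript id -> transcript length in bases
    (both inputs are hypothetical structures chosen for this sketch)
    """
    # Length-normalised rate for every transcript (reads per base).
    rates = {}
    for tid, count in read_counts.items():
        length = lengths_bp[tid]
        if length <= 0:
            raise ValueError(f"non-positive length for {tid}")
        rates[tid] = count / length

    total = sum(rates.values())
    if total == 0:
        # No reads at all: every transcript gets a TPM of zero.
        return {tid: 0.0 for tid in rates}

    # Rescale so that the values of one sample sum to one million.
    return {tid: rate / total * 1_000_000 for tid, rate in rates.items()}


# Toy sample with invented counts and lengths (not data from the study).
example = tpm(
    {"AGO1": 100, "AGO2": 300, "ACTB": 600},
    {"AGO1": 1000, "AGO2": 3000, "ACTB": 2000},
)
```

Because every sample is rescaled to the same total, TPM values describe relative abundance within a sample; comparing them across species still requires a shared normalisation strategy such as the one the protocol refers to.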
The accession IDs of the sequences used for each of the 15 genes selected (AGO1, AGO2, AGO3a, AGO3b, Piwi-like RNA-mediated gene silencing 1 (PIWIL1), Piwi-like RNA-mediated gene silencing 2 (PIWIL2), Piwi-like RNA-mediated gene silencing 4 (PIWIL4), Dicer, Drosha, PLD6, Mael, SETDB1, and DGCR8) are reported in the Supplementary Material online. Missing data (gaps) were not considered, and incomplete sequences (corresponding to less than 75% of the expected length) were discarded. Only informative codons were retained with Gblocks, and the resulting alignments were converted to Phylip format. We used a previously published species tree topology to test the null (one-ratio model) and the alternative (multiple-ratio model) hypotheses for each gene. The alternative hypothesis assumed different omega rates for the tetrapod, Actinopterygii, lungfish, and coelacanth lineages. A likelihood ratio test was used to assess significance by comparing 2ΔlogL with a χ2 distribution. The likelihoods of the two models were considered significantly different for P-values lower than 0.05. β-actin (ACTB) was used as a control gene.

The same data set was subjected to Tajima's Relative Rate Test (RRT), using the sequences of the three chondrichthyan species (whenever available) as outgroups. Differences in the rate of evolution of lungfish sequences (ingroup I) compared to other vertebrate species (ingroup II) were considered significant at P < 0.05. In parallel, a Maximum Likelihood Molecular Clock analysis was performed with MEGA7 to test the null hypothesis of an equal evolutionary rate throughout the tree. This analysis was based on a neighbor-joining (NJ) tree topology, under the general time reversible model of evolution with a discrete Gamma distribution of rates across sites (Supplementary Material online). […]
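The likelihood ratio test described above, which compares 2ΔlogL between the one-ratio and multiple-ratio models with a χ2 distribution, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' script. The log-likelihood values are invented, and the degrees of freedom are passed in by the caller, because the source does not state how many free omega parameters the alternative model adds. The helper names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable, integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting point: Q(1, y) for even df, Q(1/2, y) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, df, alpha=0.05):
    """Return (2*delta_lnL, p_value, significant) for two nested models."""
    # Negative statistics can arise from optimiser noise; clamp them to zero.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    p_value = chi2_sf(stat, df)
    return stat, p_value, p_value < alpha


# Invented log-likelihoods for one gene; df=3 is an assumption of this sketch
# (for example, four omega classes in the alternative model versus one).
stat, p_value, significant = likelihood_ratio_test(-1000.0, -995.0, df=3)
```

In a real run, the two log-likelihoods would be read from the CODEML output of each model, and the degrees of freedom would be set to the difference in the number of free parameters reported there.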

Pipeline specifications