Computational protocol: Evidence of positive selection associated with placental loss in tiger sharks

Similar protocols

Protocol publication

[…] The transcriptomes of all nine species were clustered separately using the Uclust function of Usearch (version 7.0.1090) with a similarity threshold of 98 % to breakdown and remove putative splice variants []. The Trinity ORF predictor was employed using TransDecoder (version 2013-05-08) to calculate the longest and most probable translated region for each sequence and to remove multiple ORF sequences. A custom Perl script was used to remove all stop codons from the end of ORF sequences.Pairwise reciprocal Blast searches were employed using Blastn (version 2.2.29) to identify putative orthologous sequences shared between each pair of species []. Pairwise putative orthologues were then collated using R (version 3.1.1) in order to find sequences shared between all nine species. The multiple sequence alignment program M-Coffee (version 11.00.9103146) was used to align orthologous sequences between all species using Mafft, Muscle, T-Coffee, and Kalign methods []. Gblocks was also used to remove poorly aligned and divergent regions to further reduce alignment errors and gaps []. The Gblocks parameters used were: minimum of seven sequences for a conserved position, minimum of seven sequences for a flank position, maximum of six contiguous non-conserved positions, minimum block length of nine, and 50 % or more of sequences with a gap were treated as a gap position. M-Coffee translated nucleotide sequences into amino acid sequences, which were aligned for all species and then back-translated to nucleotide sequences. Alignments were then trimmed and graded. Only alignments with quality grades of nine, the highest score, were retained for further analysis. [...] The aligned orthologues were concatenated and used to construct a phylogenetic tree using RAxML (version 8.0.0) []. We utilised the nucleotide substitution model GTRGAMMA, as determined by jModelTest (version 2.1.4) using the lowest value of the Akaike Information Criterion [, ]. Bootstrap values were calculated using 1,000 replicates. CodeML calculated the number of substitutions which alter the amino acid sequence (nonsynonymous (dN )) and the number of substitutions which do not alter the amino acid sequence (synonymous (dS)) []. The ratio of these substitutions can be used to detect genes exhibiting signatures of positive selection. Positive selection, the favouring of distinct phenotypes, can be indicated by a dN / dS ratio (ω) > 1. ω < 1 is indicative of negative selection, the removal of deleterious alleles, and ω = 1 can indicate neutral selection, drift of alleles not affecting an individual’s ability to pass on their genes []. It is suggested positive selection can only be detected if the average ω across all codon sites, calculated by the one ratio model (M0), is greater than 1. This is conservative, however, considering most codon sites will be highly conserved to maintain protein structure and function []. Also, estimates of ω can be artificially decreased due to partial sequences produced by high-throughput sequencing technologies [].A maximum-likelihood site test based on a comparison between the neutral model 7 (M7) and the non-neutral model 8 (M8) was therefore employed. M7 assumes a β distribution of ω between 0 and 1, not allowing ω > 1 at any sites []. M8 also assumes a β distribution of ω but allows an additional category of sites were ω > 1 []. A CodeML likelihood ratio test (LRT) was used to test if M8 fits the data significantly better than M7. The natural log likelihood (lnL) values of M7 were contrasted with those of M8. The lnL ratios were then compared to a chi-squared distribution with two degrees of freedom. False positive results are possible when implementing positive selection analyses on a genomic scale, therefore the Benjamini-Hochberg false discovery rate (FDR) correction was applied to all p values [, ]. The BEB method was employed for sequences with significant M8:M7 likelihood ratios and used to identify codon sites exhibiting signatures of positive selection []. Codon sites were considered to be showing evidence of positive selection if the probability of ω > 1 was more than 95 %.We concatenated the 1,102 orthologues not showing signatures of positive selection for each species to construct a phylogeny in RAxML []. We used the nucleotide substitution model GTRGAMMA, as determined by jModelTest []. Bootstrap values for this phylogeny were calculated using 1,000 replicates. Fossil data were used to produce a time-calibrated phylogenetic tree (Table ). This was accomplished using four independent runs of the MCMCTree function of PAML with 50,000 iterations, a burn in of 10,000 iterations, a sample frequency of three, an independent molecular clock, and the nucleotide substitution model HKY85, determined using jModelTest [].The branch-site test was utilised to detect signatures of positive selection at specific codon sites in the tiger shark lineage. The branch-site test is considered more powerful than the site test as signals of positive selection are not averaged over all branches of the phylogeny []. For the branch-site test, an alternative hypothesis was contrasted to the null hypothesis using an LRT where the lnL values were compared to a chi-squared distribution with one degree of freedom. Estimates of ω were not determined using M0 as this model has been shown to be unreliable when detecting positive selection in specific branches [, ]. The branch-site test has also been known to experience convergence problems when calculating likelihoods, leading to artificial lnL ratios []; thus, three independent runs of this model were performed for both the alternative and null hypotheses, with the highest lnL values kept to calculate the lnL ratios. The Benjamini-Hochberg FDR correction was applied to all p values []. The BEB method was employed to identify codon sites exhibiting signatures of positive selection []. Codon sites were considered to be showing evidence of positive selection if the probability of ω > 1 was more than 95 %. […]

Pipeline specifications