Computational protocol: Phylogenetic and structural analysis of the phospholipase A2 gene family in vertebrates

Protocol publication

[…] We used the EBI web tool MUSCLE to align the sequences of the phospholipase and PLA2 family proteins. The corresponding coding sequences were then rearranged to follow the new amino acid alignment: the amino acid alignment was converted into an aligned CDS FASTA file using the EMBL web tool PAL2NAL, which builds multiple codon alignments from matching amino acid sequences. The file format was converted using MEGA 4.0.

[...] The full sequence alignment was used for the phylogenetic analysis. The Akaike Information Criterion in PAUP* version 4.0 was applied to select the most appropriate model of amino acid substitution for the initial tree-building analyses. ML optimizations and distance methods were evaluated with the PhyML program in PAUP* version 4.0. The best-fit evolutionary model for the PLA2 gene family, GTR+I+G, was determined using Modeltest version 3.7. Phylogenetic trees were reconstructed from the DNA alignment by the Bayesian method using MrBayes version 3.1.2 under the best-fit model. The parameters for tree generation were as follows: 2×10⁶ generations were run for the PLA2 gene family, with sampling every 1,000 generations and four chains (three cold, one heated); the first 250,000 generations (250 trees) were discarded as burn-in from every run for the two families (phospholipase and PLA2). Analyses with the NJ, ME and MP methods were performed using MEGA 4.0.

[...] Selective pressures of HA and NA genes were detected with CODEML in the PAML package version 4.4. Three codon-based likelihood approaches were run: branch, site and branch-site models. An alternative hypothesis was considered significant at P<0.05.
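The codon-alignment step described above (threading each coding sequence onto its aligned protein row) can be illustrated with a minimal Python sketch. This is not PAL2NAL's implementation: the function name `thread_codons`, the toy sequences, and the handling of a trailing stop codon are assumptions made for illustration, and the sketch does not check that each codon actually translates to the aligned residue.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map an ungapped CDS onto a gapped protein alignment row, codon by codon.

    Hypothetical helper for illustration; it only checks lengths, not translation.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    # Accept a CDS with or without one trailing stop codon (an assumption).
    if len(cds) not in (3 * residues, 3 * residues + 3):
        raise ValueError("CDS length does not match the protein sequence")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)           # one protein gap becomes three nucleotide gaps
        else:
            out.append(cds[pos:pos + 3])  # copy the codon encoding this residue
            pos += 3
    return "".join(out)


# Toy input (invented, not from the PLA2 data set).
protein_alignment = {"seqA": "MK-L", "seqB": "MKTL"}
cds = {"seqA": "ATGAAACTG", "seqB": "ATGAAGACCCTG"}

codon_alignment = {
    name: thread_codons(row, cds[name]) for name, row in protein_alignment.items()
}
for name, row in codon_alignment.items():
    print(f">{name}\n{row}")
```

Because gaps are inserted in whole-codon units, every row of the resulting alignment stays in frame, which is what codon-based programs such as CODEML require.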
In these analyses, ML estimates of the selection pressure were based on the ratio dN/dS (ω), where dN and dS are the non-synonymous and synonymous substitution rates, respectively. Because ω varies across codons, the probability of each codon being under positive selection was estimated. During the evolution of duplicated genes, positive selection (ω > 1) can act in very short episodes or on only a few sites. All alignments were produced with the PAL2NAL web tool. The parameter estimates (ω) and likelihood scores were calculated for three pairs of models: M0 (one ratio) vs. M3 (discrete); M1a (nearly neutral) vs. M2a (positive selection); and M7 (β) vs. M8 (β + ω). The likelihood ratio test (LRT) was used to compare the fit of two nested models to the data, assuming that twice the log-likelihood difference between the two models (2ΔL) follows a χ² distribution with degrees of freedom equal to the difference in the number of free parameters. The naive empirical Bayes (NEB) and Bayes empirical Bayes (BEB) criteria implemented in PAML4 were used to identify sites under positive selection or relaxed purifying selection in foreground groups with significant LRTs. Each branch group was also labeled as a foreground group. The flow of positive selective site detection is presented in . [...] The linear protein sequence and 3D structure of PLA2, based on PLA2G7_Homo, were generated with the online tools PredictProtein (www.predictprotein.org) and I-TASSER. Functional areas were marked in and . […]
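The likelihood ratio test described above can be sketched in a few lines of Python using only the standard library. The log-likelihood values below are invented for illustration and are not results from the study. The degrees of freedom listed for each model pair follow the usual CODEML parameter counts and assume M3 is run with three site classes. The helper names `chi2_sf` and `likelihood_ratio_test` are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    y = x / 2.0
    # Closed-form starting points: Q(1, y) for even df, Q(1/2, y) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2*deltaL, p-value) for a nested null vs. alternative model."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # clamp small negative numerical noise
    return stat, chi2_sf(stat, df)


# Assumed degrees of freedom (difference in free parameters) for each pair.
MODEL_PAIR_DF = {("M0", "M3"): 4, ("M1a", "M2a"): 2, ("M7", "M8"): 2}

# Hypothetical log-likelihoods, for illustration only.
example_lnl = {"M7": -1000.0, "M8": -995.0}

stat, p_value = likelihood_ratio_test(
    example_lnl["M7"], example_lnl["M8"], MODEL_PAIR_DF[("M7", "M8")]
)
print(f"2*deltaL = {stat:.2f}, p = {p_value:.4g}, significant at 0.05: {p_value < 0.05}")
```

Only when the LRT for a pair such as M7 vs. M8 is significant would the per-site posterior probabilities from the alternative model be inspected for candidate positively selected sites.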

Pipeline specifications

Software tools: MUSCLE, PAL2NAL, MEGA, PAUP*, PhyML, ModelTest-NG, MrBayes, PAML, PP, I-TASSER
Applications: Phylogenetics, Protein structure analysis, Nucleotide sequence alignment
Organisms: Danio rerio, Homo sapiens, Mus musculus, Rattus norvegicus, Sus scrofa, Canis lupus familiaris, Gallus gallus, Bos taurus, Xenopus laevis, Pongo abelii