Computational protocol: Evolutionary history of histone demethylase families: distinct evolutionary patterns suggest functional divergence

[…] The amino acid sequences of the AOD domain in reported KDM1 histone demethylases were retrieved from the National Center for Biotechnology Information (NCBI). They were used as queries to search the NCBI, TAIR, TIGR and JGI databases for all possible AOD domain-containing proteins in selected eukaryotic organisms, using TBLASTN with an e-value below 1e-5 as the cut-off. All new hits were then used as queries for further rounds of BLAST searches until no new sequence was found. The collected protein sequences were analyzed with SMART and Pfam for domain architecture. Proteins that lacked the AOD domain, or whose AOD domain had an e-value greater than 1e-10 in both the SMART and Pfam results, were excluded from further analyses. The prokaryotic sequences were retrieved from the NCBI database through BLASTP, using the eukaryotic AOD domain-containing proteins as queries and 1e-5 as the cut-off. The same procedure was followed for the retrieval of JmjC domain-containing proteins.
Common names for the following species are shown in the figures: Arabidopsis, Arabidopsis thaliana; Poplar, Populus trichocarpa; Rice, Oryza sativa; Moss, Physcomitrella patens; Human, Homo sapiens; Cow, Bos taurus; Mouse, Mus musculus; Zebrafish, Danio rerio; Fruitfly, Drosophila melanogaster; Mosquito, Anopheles gambiae; Honey bee, Apis mellifera; Beetle, Tribolium castaneum; Sea squirt, Ciona intestinalis; Sea urchin, Strongylocentrotus purpuratus; and Sea anemone, Nematostella vectensis.
[...] A preliminary multiple sequence alignment (MSA) was generated using MUSCLE 3.6 with the default settings, and a Neighbor-Joining (NJ) tree was constructed from this MSA using MEGA 4.0. The sequences were divided into several subgroups according to the tree topology. Each subgroup was aligned separately with MUSCLE 3.6, followed by manual adjustment in GeneDoc. These subgroup alignments were then combined using the profile alignment function of ClustalX 1.83.
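The iterative retrieval and the domain e-value filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the names `collect_homologs`, `passes_domain_filter` and `toy_search` are hypothetical, the `search` argument stands in for whatever wrapper runs TBLASTN or BLASTP and returns hits with their e-values, and the filter encodes one reading of the exclusion rule (a protein is kept if at least one of SMART or Pfam reports the domain at 1e-10 or better).

```python
from typing import Callable, Dict, Iterable, Optional, Set

E_SEARCH = 1e-5   # BLAST e-value cut-off quoted in the text
E_DOMAIN = 1e-10  # domain e-value cut-off quoted in the text


def collect_homologs(seeds: Iterable[str],
                     search: Callable[[str], Dict[str, float]]) -> Set[str]:
    """Re-query with every new hit until no new sequence is found.

    `search` is a hypothetical callable mapping a query ID to
    {hit ID: e-value}; in practice it would wrap a BLAST run.
    """
    found: Set[str] = set(seeds)
    frontier = list(found)
    while frontier:
        query = frontier.pop()
        for hit, evalue in search(query).items():
            if evalue < E_SEARCH and hit not in found:
                found.add(hit)
                frontier.append(hit)  # new hit becomes a query later
    return found


def passes_domain_filter(smart_e: Optional[float],
                         pfam_e: Optional[float]) -> bool:
    """Assumed reading of the rule: keep a protein if SMART or Pfam
    reports the domain with e-value <= E_DOMAIN. None means the tool
    did not detect the domain at all."""
    def ok(e: Optional[float]) -> bool:
        return e is not None and e <= E_DOMAIN
    return ok(smart_e) or ok(pfam_e)


# Toy hit table standing in for real database searches (made-up values).
TOY_HITS = {
    "seedA": {"p1": 1e-30, "p2": 1e-3},
    "p1": {"seedA": 1e-30, "p3": 1e-8},
    "p3": {"p1": 1e-8},
}


def toy_search(query: str) -> Dict[str, float]:
    return TOY_HITS.get(query, {})


candidates = collect_homologs(["seedA"], toy_search)
```

In the toy table, `p2` is never collected because its e-value is above the search cut-off, while `p3` is reached only through the second-round query `p1`, which is the point of repeating the search until it converges.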
The codeml program from the PAML 4.1 package was used for the Ka/Ks analyses.
[...] Both NJ and maximum likelihood (ML) methods were used for the phylogenetic analyses. NJ trees were constructed using MEGA 4.0 with the "pairwise deletion" option and the "Poisson correction" model. A bootstrap test with 1000 replicates was carried out to evaluate the reliability of internal branches. ML trees were generated using PHYML 2.4.4 with 100 nonparametric bootstrap replicates. ProtTest 1.4 was used to select the model and parameters for the ML analysis. The WAG amino acid substitution model was used, and both the proportion of invariable sites and the gamma distribution parameter were estimated from the data. Only the NJ trees are presented, with bootstrap values from both the NJ and ML analyses. […]
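To make the NJ distance settings concrete, the sketch below computes a Poisson-corrected distance with pairwise deletion for two aligned protein sequences. It follows the standard definition d = -ln(1 - p), where p is the proportion of differing residues among the sites at which neither sequence of the pair has a gap. The function name `poisson_distance` and the example sequences are hypothetical, treating only "-" as a gap is an assumption, and this is not a reimplementation of MEGA.

```python
import math


def poisson_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Poisson-corrected amino acid distance with pairwise deletion.

    Sites where either sequence of this pair has a gap are skipped;
    sites gapped only in other sequences of the alignment still count.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion: drop this site for this pair only
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites shared by the two sequences")
    p = differing / compared  # uncorrected proportion of differences
    if p >= 1.0:
        raise ValueError("distance undefined: every compared site differs")
    return -math.log(1.0 - p)


# Made-up aligned fragments: 5 comparable sites, 1 difference, p = 0.2.
d = poisson_distance("MKV-LA", "MRVQLA")
```

With complete deletion, a column containing a gap in any sequence would instead be removed from every pairwise comparison; pairwise deletion retains more sites when gaps are scattered across a large, divergent family.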

Pipeline specifications