Computational protocol: Evolutionary history of the iroquois/Irx genes in metazoans

Protocol publication

[…] Irx and Mkx gene sequences were retrieved using the TBLASTN and BLASTP algorithms [] against the current genome assemblies and, where available, the predicted proteins of the species indicated in Figure , using the BLAST servers dedicated to these species (DOE Joint Genome Institute, Baylor College of Medicine, FlyBase, Genome Sequencing Center, and Ensembl) or the National Center for Biotechnology Information (NCBI) BLAST server (Genomic BLAST databases) [-]. Additional BLAST searches were performed against the NCBI protein and EST databases to identify Irx and Mkx genes in further species whose genomes are not completely sequenced. Amino acid sequences were subsequently predicted using GeneID, GENSCAN, and TBLASTN against the NCBI nr protein database [,,]. All the sequences we have isolated are available upon request.

Species abbreviations used in the present article are: Acypis = Acyrthosiphon pisum (pea aphid – insect); Aedaeg = Aedes aegypti (yellow fever mosquito – insect); Ampque = Amphimedon queenslandica (demosponge); Anogam = Anopheles gambiae (mosquito – insect); Apimel = Apis mellifera (honey bee – insect); Bommor = Bombyx mori (silkworm – insect); Braflo = Branchiostoma floridae (amphioxus – cephalochordate); Brarer = Brachydanio rerio (zebrafish – vertebrate); Caeele = Caenorhabditis elegans (nematode); Calvic = Calliphora vicina (blue blowfly – insect); Capsp1 = Capitella sp. I (annelid); Culpipqui = Culex pipiens quinquefasciatus (mosquito – insect); Dappul = Daphnia pulex (water flea – crustacean); Dromel = Drosophila melanogaster (fruit fly – insect); Galgal = Gallus gallus (chick – vertebrate); Helera = Heliconius erato (red passion flower butterfly – insect); Homsap = Homo sapiens (human – vertebrate); Hydmag = Hydra magnipapillata (cnidarian); Lotgig = Lottia gigantea (limpet – mollusk); Musmus = Mus musculus (mouse – vertebrate); Mytcal = Mytilus californianus (mussel – mollusk); Nasvit = Nasonia vitripennis (parasitoid wasp – insect); Nemvec = Nematostella vectensis (sea anemone – cnidarian); Pedhumcor = Pediculus humanus corporis (human body louse – insect); Petmar = Petromyzon marinus (sea lamprey – vertebrate); Sackow = Saccoglossus kowalevskii (hemichordate); Schmed = Schmidtea mediterranea (planarian – platyhelminth); Spofru = Spodoptera frugiperda (fall armyworm – insect); Strpur = Strongylocentrotus purpuratus (purple sea urchin – echinoderm); Subdom = Suberites domuncula (demosponge); Tetnig = Tetraodon nigroviridis (pufferfish – vertebrate); Triadh = Trichoplax adhaerens (placozoan); Tricas = Tribolium castaneum (red flour beetle – insect); Xentro = Xenopus tropicalis (western clawed frog – vertebrate).

[…] Multiple alignments were performed with ClustalW [] using the ClustalW web server at the Bioinformatics Center of Kyoto University [] and were subsequently improved manually. The alignments were handled with SeaView []. Unweighted maximum-parsimony (MP) and neighbour-joining (NJ) reconstructions were performed with PAUP* 4.0 []. NJ analyses used the BioNJ algorithm [] with 10,000 bootstrap replicates. MP analyses used a heuristic search over 250 bootstrap replicates, with MAXTREES set to 3000 and all other parameters left at their default values. Maximum likelihood (ML) analyses were performed with PhyML [] using the WAG amino acid substitution model [], with amino acid frequencies estimated from the data set and rate heterogeneity across sites modelled by one constant (invariable) rate category plus eight γ-distributed rate categories. The amino acid substitution model was chosen using ModelGenerator []. Statistical support for the internal branches was assessed by bootstrap resampling (500 replicates), as implemented in PhyML []. Bayesian inference was performed using the Markov chain Monte Carlo (MCMC) method as implemented in MrBayes (version 3) [,].
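The bootstrap percentages reported for the MP, NJ and ML trees all rest on the same idea: resample alignment columns with replacement, rebuild the tree for each pseudo-replicate, and count how often each grouping recurs. The Python sketch below illustrates this for unweighted parsimony scored with Fitch's algorithm. It is a toy, not the authors' PAUP* analysis: it handles exactly four taxa so that the three possible unrooted topologies can be scored exhaustively (PAUP* used heuristic searches with MAXTREES = 3000), it treats gaps as an extra state, and it splits a replicate's credit equally among tied trees. These choices, the function names and the toy alignment are assumptions made for illustration.

```python
import random
from collections import Counter

# Toy sketch (assumption: not the authors' PAUP* run). Unweighted parsimony via
# Fitch's algorithm plus a nonparametric bootstrap over alignment columns.
# Restricted to four taxa so all three unrooted topologies are scored
# exhaustively instead of by heuristic search.


def quartet_topologies(taxa):
    """The three unrooted four-taxon trees, each rooted on its central edge
    (root placement does not change the Fitch parsimony score)."""
    a, b, c, d = taxa
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def fitch(node, states):
    """Return (candidate state set, number of changes) for one site."""
    if isinstance(node, str):  # leaf: the observed residue
        return {states[node]}, 0
    left_set, left_cost = fitch(node[0], states)
    right_set, right_cost = fitch(node[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def parsimony_score(tree, alignment, columns):
    """Unweighted parsimony length of `tree` over the given column indices.
    Gaps ('-') are treated as an ordinary extra state (an assumption)."""
    total = 0
    for col in columns:
        states = {taxon: seq[col] for taxon, seq in alignment.items()}
        total += fitch(tree, states)[1]
    return total


def bootstrap_support(alignment, n_replicates=250, seed=1):
    """Percentage of bootstrap replicates in which each topology is most
    parsimonious; equally parsimonious trees share a replicate's credit."""
    rng = random.Random(seed)
    taxa = sorted(alignment)
    n_sites = len(alignment[taxa[0]])
    topologies = quartet_topologies(taxa)
    credit = Counter()
    for _ in range(n_replicates):
        # Resample columns with replacement, keeping the alignment length.
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        scores = [parsimony_score(t, alignment, columns) for t in topologies]
        best = min(scores)
        winners = [t for t, s in zip(topologies, scores) if s == best]
        for t in winners:
            credit[t] += 1 / len(winners)
    return {t: 100 * credit[t] / n_replicates for t in topologies}


if __name__ == "__main__":
    # Hypothetical toy alignment (not data from the study).
    toy = {"A": "MKLLVVE", "B": "MKLLVVE", "C": "MRIIAAE", "D": "MRIIAAD"}
    for tree, pct in bootstrap_support(toy).items():
        print(tree, f"{pct:.1f}%")
```

Exhaustive scoring keeps the sketch short and exact; with realistic taxon numbers the tree space explodes, which is why PAUP* relies on heuristic searches and a MAXTREES cap.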
For the Bayesian analyses, we used the WAG amino acid substitution matrix [] with among-site rate variation modelled by a discrete γ distribution with four equally probable categories. Two independent Markov chains were run, each for 1,500,000 to 3,000,000 Monte Carlo steps, depending on the number of steps required for the chains to converge. One tree was saved every 250 steps. The trees obtained in the two runs were pooled, and the first 25% of the trees were discarded as burn-in. The marginal posterior probability of each internal branch was taken as a measure of statistical support. All the alignments and trees are available upon request. The phylogenetic relationships between the species used in this study (as depicted in Figure ) are based on [-]. […]
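The post-processing of the MCMC samples can be sketched in Python as follows. Sampling one tree every 250 steps gives roughly 6,000 trees per run at 1,500,000 steps and 12,000 at 3,000,000 steps; after burn-in, a branch's marginal posterior probability is simply the fraction of retained trees that contain it. This is a minimal sketch, not MrBayes' own `sumt` implementation: it assumes each tree has already been reduced to a set of clades (frozensets of taxon names), it does not parse MrBayes tree files, and it applies the 25% burn-in to each run before pooling, which is our reading of the text rather than something the protocol states. The function names are hypothetical.

```python
from collections import Counter

# Sketch, assuming each sampled tree has already been reduced to the set of
# clades (frozensets of taxon names) it contains; parsing MrBayes tree files
# is not shown. Burn-in is applied to each run before pooling (an assumption).


def samples_per_run(n_steps, sample_freq=250):
    """Approximate number of trees saved by one run (ignores any tree
    written at the starting step)."""
    return n_steps // sample_freq


def pool_after_burnin(runs, burnin_fraction=0.25):
    """Discard the first `burnin_fraction` of each run's samples, then pool."""
    pooled = []
    for samples in runs:
        n_discard = int(len(samples) * burnin_fraction)
        pooled.extend(samples[n_discard:])
    return pooled


def clade_posteriors(trees):
    """Fraction of post-burn-in trees that contain each clade."""
    counts = Counter()
    for clades in trees:
        counts.update(set(clades))  # count each clade at most once per tree
    return {clade: n / len(trees) for clade, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical two-run sample, not data from the study.
    ab, ac = frozenset({"A", "B"}), frozenset({"A", "C"})
    run1 = [[ac]] * 4 + [[ab]] * 12
    run2 = [[ac]] * 4 + [[ab]] * 6 + [[ac]] * 6
    for clade, p in clade_posteriors(pool_after_burnin([run1, run2])).items():
        print(sorted(clade), round(p, 2))
```

Discarding burn-in per run matters because the early samples of every chain, not just the first, reflect the starting tree rather than the posterior.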

Pipeline specifications

Software tools: TBLASTN, BLASTP, BLASTN, GeneID, GENSCAN, Clustal W, SeaView, PAUP*, BioNJ, PhyML, ModelGenerator, MrBayes
Databases: FlyBase, NCBI Protein
Applications: Phylogenetics, amino acid sequence alignment
Organisms: Drosophila melanogaster, Petromyzon marinus