Computational protocol: The impact of RNA structure on coding sequence evolution in both bacteria and eukaryotes

Protocol publication

[…] We obtained genomic sequences from the following sources: the Comprehensive Microbial Resource (http://cmr.tigr.org/) for E. coli, the Saccharomyces Genome Database (ftp://genome-ftp.stanford.edu/) for S. cerevisiae, the Eisen Lab (http://rana.lbl.gov/drosophila/) for D. melanogaster, and Ensembl (http://www.ensembl.org/) for M. musculus.

For E. coli, we obtained orthologs in Shigella sonnei, Shigella flexneri, Shigella boydii, Shigella dysenteriae, Klebsiella pneumoniae, Salmonella typhimurium, Salmonella enterica, Photorhabdus luminescens, and Sodalis glossinidius from TIGR’s Comprehensive Microbial Resource (http://cmr.tigr.org/). For S. cerevisiae, we obtained orthologs in Saccharomyces paradoxus, Saccharomyces mikatae, Saccharomyces bayanus, Saccharomyces kudriavzevii, Saccharomyces castellii, and Saccharomyces kluyveri from the Saccharomyces Genome Database (ftp://genome-ftp.stanford.edu/). For D. melanogaster, we obtained orthologs in Drosophila simulans, Drosophila sechellia, Drosophila yakuba, Drosophila erecta, Drosophila ananassae, Drosophila pseudoobscura, Drosophila persimilis, Drosophila willistoni, Drosophila mojavensis, Drosophila virilis, and Drosophila grimshawi from the Drosophila 12-genome project AAAWiki at http://rana.lbl.gov/drosophila/. For mouse, we obtained orthologs in human, chimp, macaque, rat, cow, dog, and horse from BioMart through the Ensembl Homology track (http://www.ensembl.org/).

We built multiple alignments of the orthologous sequences from their peptide sequences with MUSCLE []. We excluded from our data set any ortholog pair in which less than 80% of either sequence could be aligned to the other sequence. We retained only those alignments in which every species was represented by its corresponding ortholog. This step yielded 1,156, 1,164, 3,047, and 6,324 alignments for E. coli, yeast, fly, and mouse, respectively. [...]
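The 80% alignment filter described above can be illustrated with a minimal Python sketch. The protocol does not state exactly how the aligned fraction was measured, so this sketch assumes it is the number of columns in which both sequences carry a residue, divided by the ungapped length of each sequence. The function names `aligned_fraction` and `passes_coverage` and the gap character `-` are hypothetical choices for illustration, not part of the original pipeline.

```python
def aligned_fraction(aln_a, aln_b, gap="-"):
    """Return the fraction of each sequence that is paired with a residue
    in the other sequence, given two rows of the same alignment.

    Assumption: "aligned" means a column where neither row has a gap.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have the same length")

    # Columns in which both sequences carry a residue.
    paired = sum(1 for a, b in zip(aln_a, aln_b) if a != gap and b != gap)

    # Ungapped lengths of the two sequences.
    len_a = sum(1 for a in aln_a if a != gap)
    len_b = sum(1 for b in aln_b if b != gap)

    if len_a == 0 or len_b == 0:
        return 0.0, 0.0
    return paired / len_a, paired / len_b


def passes_coverage(aln_a, aln_b, min_fraction=0.8):
    """Keep a pair only if at least min_fraction of BOTH sequences is aligned."""
    frac_a, frac_b = aligned_fraction(aln_a, aln_b)
    return frac_a >= min_fraction and frac_b >= min_fraction
```

For example, `passes_coverage("MKTAYIAKQR", "MKTA--AKQR")` keeps the pair, because 8 of 10 residues in the first sequence and all 8 residues in the second are paired, whereas a pair with one more gapped column in the second row would be excluded.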
For each species group, the phylogenetic tree was inferred by RAxML [] from the concatenated amino acid sequences (Additional file : Figure S6). Based on the topology and branch lengths of the tree, Branch Manager [] was used to calculate a weight for each species in the alignment, controlling for the phylogenetic relationships among the orthologous sequences. The conservation level of a particular nucleotide site in the alignment can then be expressed as the weighted entropy ($E_w$):

$$E_w = -\sum_{i \in N} p_i \log_2 p_i$$

Here, $N$ is the set of unique nucleotides in the column and $p_i$ is the weighted fraction of sequences carrying a particular nucleotide $i$. A lower $E_w$ indicates a higher conservation level, and $E_w = 0$ means that there is no variation in nucleotide type in the column. We considered sites with $E_w < 0.5$ as conserved.

We also applied the phyloP program [] to compute a conservation score for each nucleotide site. The conservation P-values were computed using the likelihood ratio test (LRT) method with the “--wig-scores” option. The phylogenetic model was produced by the phyloFit program [] using the “REV” nucleotide substitution model. The site-specific conservation score was computed as “-log(P)”. A higher phyloP score indicates a higher conservation level. […]
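As an illustration of the weighted entropy defined above, the following Python sketch computes the score for one alignment column and applies the 0.5 cutoff. It assumes the per-species weights are already available (for instance, as produced by a tool such as Branch Manager) and normalizes them to sum to 1. The function names are hypothetical, and gap handling is not specified in the protocol, so the sketch expects columns containing nucleotides only.

```python
import math
from collections import defaultdict


def weighted_entropy(column, weights):
    """Weighted Shannon entropy (base 2) of one alignment column.

    column  -- one nucleotide per species, e.g. "AAGA"
    weights -- one non-negative weight per species, in the same order
    """
    if len(column) != len(weights):
        raise ValueError("need exactly one weight per sequence")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # p_i: weighted fraction of sequences carrying nucleotide i.
    fractions = defaultdict(float)
    for nucleotide, weight in zip(column, weights):
        fractions[nucleotide.upper()] += weight / total

    # -sum(p * log2(p)), written as sum(p * log2(1/p)) to avoid returning -0.0.
    return sum(p * math.log2(1.0 / p) for p in fractions.values() if p > 0)


def is_conserved(column, weights, threshold=0.5):
    """Apply the cutoff stated in the text: conserved if the entropy is below 0.5."""
    return weighted_entropy(column, weights) < threshold
```

With equal weights, a column such as `"AAAC"` has an entropy of about 0.81 and is not called conserved. If the species carrying `C` is down-weighted, for example with weights `[0.4, 0.3, 0.25, 0.05]`, the entropy drops to about 0.29 and the site is called conserved, which shows how the phylogenetic weights change the outcome.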

Pipeline specifications

Software tools: MUSCLE, RAxML, PHAST
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Saccharomyces cerevisiae, Mus musculus, Escherichia coli, Drosophila melanogaster
Chemicals: Nucleotides