Pipeline publication

[…] ological problem and will be discussed later. The transition rates can be estimated by the standard maximum likelihood (ML) approach. Here we used Bayesian inference of the posterior probability distribution of the model parameters, sampled with the Metropolis-Hastings Markov chain Monte Carlo (MCMC) technique, following […].

Protein sequences of Gram-negative bacteria were downloaded from GenBank release 175 (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/). Orthologous protein groups were downloaded from the NCBI Protein Clusters database, January 2010 release (ftp://ftp.ncbi.nih.gov/genomes/CLUSTERS/). The data set contained 717,455 proteins from 593 genomes. Multiple alignments were constructed with MUSCLE. Protein phylogenetic trees were built with the protdist and neighbor programs of the PHYLIP package. Signal peptide scores were calculated with SignalP 3.0-NN. In the evolutionary analysis, we considered only the subset of orthologous clusters whose members received different discrete signal peptide predictions.

Orthologous groups and phylogenetic trees were downloaded from the MicrobesOnline resource. Transcription factor binding scores were downloaded from the RegPrecise database.

To test whether tHMM improves the accuracy of state reconstruction, and to define the range of its applicability, we performed computer simulations. We compared the performance of tHMM itself, dumbtHMM and the BayesTraits Multistate software on a set of different simulated datasets.

The simulation parameters were the rates of state changes (transition rates) and the score distributions of the states. Sixteen sets of simulated data (phylogenies and scores at the leaves) were generated (see […]), each consisting of 400 trees.

Phylogenies were generated by sampling branch lengths from the distribution obtained from the signal peptide dataset. Speciation was terminated, and a leaf was created, once the distance from the root to the current node exceeded 0.5.
This constraint restricted the tree sizes to the range of […] leaves. For each new node, one of the two states was assigned according to […]
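The tree-generation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulator: the branch-length pool is a hypothetical stand-in for the empirical signal peptide distribution, leaf scores are assumed Gaussian per state, the root state is drawn from the stationary distribution, and because the source's rule for assigning states to new nodes is truncated, child states here are assumed to follow a two-state continuous-time Markov chain with rates a (0 to 1) and b (1 to 0). Only the 0.5 root-to-leaf cutoff comes from the text.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    state: int                      # hidden state (0 or 1) at this node
    depth: float                    # distance from the root
    branch: float = 0.0             # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)
    score: Optional[float] = None   # observed score, set only at leaves


def transition_probs(a, b, t):
    """P(t) for a two-state CTMC with rate a for 0->1 and rate b for 1->0."""
    r = a + b
    e = math.exp(-r * t)
    return [[(b + a * e) / r, a * (1 - e) / r],
            [b * (1 - e) / r, (a + b * e) / r]]


def simulate_tree(branch_pool, a, b, score_params, rng, max_depth=0.5):
    """Grow a binary tree; a node becomes a leaf once its root distance exceeds max_depth.

    branch_pool: positive branch lengths to resample from (assumed stand-in for
    the empirical distribution). score_params: {state: (mean, sd)} (assumed Gaussian).
    """
    pi1 = a / (a + b)                            # stationary probability of state 1
    root = Node(state=int(rng.random() < pi1), depth=0.0)
    stack, leaves = [root], []
    while stack:
        node = stack.pop()
        if node.depth > max_depth:               # stop speciating: emit a leaf score
            mu, sd = score_params[node.state]
            node.score = rng.gauss(mu, sd)
            leaves.append(node)
            continue
        for _ in range(2):                       # bifurcate
            t = rng.choice(branch_pool)
            row = transition_probs(a, b, t)[node.state]
            child_state = 0 if rng.random() < row[0] else 1
            child = Node(state=child_state, depth=node.depth + t, branch=t)
            node.children.append(child)
            stack.append(child)
    return root, leaves


rng = random.Random(42)
pool = [0.08, 0.12, 0.2, 0.3]                    # hypothetical branch lengths
root, leaves = simulate_tree(pool, a=2.0, b=1.0,
                             score_params={0: (-1.0, 1.0), 1: (1.0, 1.0)}, rng=rng)
print(len(leaves), sum(leaf.state for leaf in leaves))
```

Because every sampled branch length is positive, each lineage eventually crosses the cutoff, so the loop always terminates; the shortest length in the pool bounds the maximum tree depth and hence the tree size.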
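The Bayesian estimation of transition rates mentioned at the start of this section can be illustrated with a random-walk Metropolis-Hastings sampler over a tree-HMM likelihood. This is a hedged sketch, not the tHMM implementation: it assumes two hidden states, Gaussian per-state score distributions, a stationary root prior, an Exponential(1) prior on each rate, and a hypothetical nested-tuple tree format (("leaf", score) or ("node", [(child, branch_length), ...])). The likelihood is the usual upward (pruning) pass, with emission densities of the observed leaf scores in place of the 0/1 state indicators.

```python
import math
import random


def transition_probs(a, b, t):
    """P(t) for a two-state CTMC with rate a for 0->1 and rate b for 1->0."""
    r = a + b
    e = math.exp(-r * t)
    return [[(b + a * e) / r, a * (1 - e) / r],
            [b * (1 - e) / r, (a + b * e) / r]]


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def log_likelihood(tree, a, b, emit):
    """log P(leaf scores | a, b) via an upward pass (no rescaling, so small trees only)."""
    def partial(node):
        kind, payload = node
        if kind == "leaf":
            # emission densities replace the indicator vector of ordinary pruning
            return [normal_pdf(payload, *emit[s]) for s in (0, 1)]
        lik = [1.0, 1.0]
        for child, t in payload:
            c = partial(child)
            p = transition_probs(a, b, t)
            for s in (0, 1):
                lik[s] *= p[s][0] * c[0] + p[s][1] * c[1]
        return lik

    root = partial(tree)
    r = a + b
    return math.log((b / r) * root[0] + (a / r) * root[1])  # stationary root prior


def mh_sample_rates(tree, emit, n_iter=5000, step=0.3, seed=0):
    """Random-walk Metropolis-Hastings on (log a, log b); Exponential(1) prior on a and b."""
    rng = random.Random(seed)

    def log_post(la, lb):
        a, b = math.exp(la), math.exp(lb)
        # prior terms -a - b, plus the log-Jacobian la + lb of the log transform
        return log_likelihood(tree, a, b, emit) - a - b + la + lb

    la, lb = 0.0, 0.0
    current = log_post(la, lb)
    samples = []
    for _ in range(n_iter):
        pa, pb = la + rng.gauss(0.0, step), lb + rng.gauss(0.0, step)
        proposed = log_post(pa, pb)
        # symmetric proposal, so the Hastings ratio reduces to the posterior ratio
        if math.log(1.0 - rng.random()) < proposed - current:
            la, lb, current = pa, pb, proposed
        samples.append((math.exp(la), math.exp(lb)))
    return samples


def leaf(x):
    return ("leaf", x)


tree = ("node", [(("node", [(leaf(1.2), 0.1), (leaf(0.8), 0.2)]), 0.15),
                 (("node", [(leaf(-0.9), 0.1), (leaf(-1.1), 0.25)]), 0.2)])
emit = {0: (-1.0, 1.0), 1: (1.0, 1.0)}          # hypothetical score distributions
samples = mh_sample_rates(tree, emit, n_iter=2000, seed=1)
kept = samples[500:]                             # discard burn-in
print(sum(a for a, _ in kept) / len(kept), sum(b for _, b in kept) / len(kept))
```

Sampling on the log scale keeps both rates positive without a constrained proposal; the extra la + lb term corrects the posterior density for that change of variables. A production version would rescale the partial likelihoods at each node to avoid underflow on trees of realistic size.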

Pipeline specifications

Software tools: MUSCLE, PHYLIP, BayesTraits