Computational protocol: Faster computation of exact RNA shape probabilities

Similar protocols

Protocol publication

[…] A TDM folds a sequence only to a restricted set of structures. For RapidShapes, such a restricted set of structures comprises all structures of a particular shape (and no other). A TDM can compute the partition function value Qp(s) of its restricted folding space Fp(s) in ????(n3) time, just as the unrestricted RNA folding program does for the complete folding space F(s). Since Qp(s) is the sum of all structures in Fp(s), and Fp(s) is a precise subset of F(s), we have to ensure that a TDM folds exactly those structures constituting the shape class p. Our strategy is to generate such programs on demand. [...] RNA structures are conveniently described by context-free grammars (Durbin et al., ), where the parses of an RNA sequence s indicate all its possible foldings F(s). Parses implicitly assign a score with each structure, be it probabilities from a stochastic model, base pair counts or thermodynamic energies. Here, we use tree grammars to describe structures. Tree grammars make explicit the semantics of each grammar rule, and can be compiled directly into executable code using the ADP technology (Giegerich et al., ).An expository simplification of the tree grammar that we use to compute F(s) is given in . The left part of shows one of the candidate structures, tree t1, which this grammar assigns to the input sequence ACCAAAGG. The operators root, last, stack and hairpin will be used to compute all relevant properties of this structure candidate. The actual grammar used in RNAshapes as well as in RapidShapes uses 26 rules and 35 operators to accomodate the thermodynamic energy model, including dangling bases. Fig. 1. Fig. 2. We can imagine the progress of an RNA folding program based on tree grammars in two phases. Phase one is the generation of all candidate structures (trees), while in the second phase a score is assigned to each candidate. The score of a candidate depends on the scheme that describes how the tree operators must be evaluated in phase two. For example, if we want to assign a Vienna Dot-Bracket representation as a score to t1, we can apply ????dotBracket to get the string .((…)) (). By exchanging ????dotBracket with ????shape5, we get [] for t1. Other schemes are used to calculate thermodynamic energies, probabilities or base pair counts. […]

Pipeline specifications

Software tools RapidShapes, RNAshapes
Application RNA structure analysis