Computational protocol: A sensitivity analysis of RNA folding nearest neighbor parameters identifies a subset of free energy parameters with the greatest impact on RNA secondary structure prediction

Protocol publication

[…] Calculations were performed using the RNAstructure package. Specifically, the partition function (program partition), stochastic sampling (program stochastic), ProbKnot, secondary structure comparison (program scorer) and the folding free energy calculator (program efn2) were used. [...]

The sensitivity analysis was performed by perturbing each independent parameter from −3σ to +3σ, in increments of one σ, where σ is either the standard error for the parameter or a flat value of 0.5 kcal/mol. Using the standard error reveals those parameters that have a large impact on structure prediction relative to how well defined that parameter is, suggesting parameter classes that can be the focus of future experiments. Using a flat value allows a comparison of the impact of different parameters, identifying those parameters whose precise values are the most important to determine for non-standard nucleotides.

The standard error for a parameter is the estimate of the magnitude of the error for the mean of the parameter, and it scales with the reciprocal of the square root of the number of measurements. The standard error is the proper estimate of the error for a parameter because the major source of error is random experimental error; taking multiple measurements therefore reduces the error in the parameter estimate. Standard deviation, in contrast, is an estimate of the width of the distribution of a parameter and reflects the magnitude of the random errors. As such, standard error is used throughout the sensitivity analysis.

Using the perturbed parameter sets, new data tables for RNAstructure were generated following the rules outlined in the NNDB. This ensured that symmetric parameters for base pairs and internal loops always had equivalent values.
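The perturbation scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the example parameter value and the measurement counts are all hypothetical, and the standard error is computed with the textbook formula s/√n, which may differ from how the errors of regression-derived parameters were actually obtained.

```python
import math

# Flat perturbation size from the text, in kcal/mol.
FLAT_SIGMA = 0.5


def standard_error(std_dev, n_measurements):
    """Textbook standard error of a mean: s / sqrt(n) (assumed form)."""
    return std_dev / math.sqrt(n_measurements)


def perturbed_values(value, sigma, max_steps=3):
    """Return (k, value + k * sigma) for k = -max_steps .. +max_steps.

    k = 0 reproduces the unperturbed (reference) value.
    """
    return [(k, value + k * sigma) for k in range(-max_steps, max_steps + 1)]


if __name__ == "__main__":
    # Hypothetical parameter: -2.0 kcal/mol, standard deviation 0.3
    # kcal/mol over 9 measurements. These numbers are made up.
    reference_value = -2.0
    sigma_se = standard_error(0.3, 9)

    for label, sigma in (("standard error", sigma_se), ("flat", FLAT_SIGMA)):
        print(label)
        for k, value in perturbed_values(reference_value, sigma):
            print(f"  {k:+d} sigma: {value:.2f} kcal/mol")
```

Each perturbed value would then be written into a full set of data tables, one table set per parameter and perturbation step, before any folding calculation is run.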
Additionally, the precalculated approximations, such as those for unmeasured 1 × 1, 2 × 1 and 2 × 2 internal loop parameters, were updated to reflect the perturbed parameter values. The perturbed data tables were then used to calculate the pair probability of each possible base pair of each sequence in the archive using the programs partition and ProbabilityPlot. The program ProbabilityPlot outputs the probability of all possible base pairs, which are those base pairs that can form an allowed pair (A-U, G-C, G-U) and can form a run of two or more base pairs.

RMSDs of the pair probabilities were calculated for each sequence, comparing pair probabilities calculated from each of the perturbed data tables to the probabilities calculated with unperturbed data tables (the reference parameter set):

$$\mathrm{RMSD} = \sqrt{\frac{1}{N_{BP}} \sum_{\text{possible pairs}} \left(P_N - P_R\right)^2}$$

where $N_{BP}$ is the number of possible base pairs, $P_N$ is the base pair probability calculated with the perturbed data tables and $P_R$ is the base pair probability calculated with the reference data tables. For each sequence, $N_{BP}$ is the total number of possible canonical (A-U, G-C and G-U) pairs, where pairs are also required to be able to form a helix with at least two stacked base pairs.

Structures were predicted from the pair probabilities (both perturbed and reference parameter sets) using ProbKnot. ProbKnot is a method to predict maximum expected accuracy structures. It assembles structures with base pairs of nucleotides that are mutually maximal base pairing partners.
Thus, i is paired with j if and only if the nucleotide with the highest pairing probability with i is j and the nucleotide with the highest pairing probability with j is i.

To quantify the difference in predicted structures between a perturbed data set and the reference data set, a sensitivity defect and a positive predictive value (PPV) defect were calculated for the secondary structures predicted using perturbed parameter tables as compared to secondary structures predicted using the reference parameter tables. Sensitivity defect and PPV defect were defined as a measure of the difference in the two predicted structures:

$$\text{Sensitivity defect} = 1 - \frac{N_{BP,\ \text{both tables}}}{N_{BP,\ \text{reference tables}}}$$

$$\text{PPV defect} = 1 - \frac{N_{BP,\ \text{both tables}}}{N_{BP,\ \text{perturbed tables}}}$$

where $N_{BP,\ \text{both tables}}$ is the number of pairs that appear in both predicted structures, $N_{BP,\ \text{perturbed tables}}$ is the number of pairs in the structure predicted with the perturbed tables and $N_{BP,\ \text{reference tables}}$ is the number of base pairs predicted with the standard nearest neighbor rules. A sensitivity defect of 0 indicates that all pairs predicted by the reference parameters are also predicted by the perturbed parameters. A PPV defect of 0 indicates that all the pairs predicted by the perturbed parameters are also predicted by the reference parameters.

Base pairs were considered identical even if one of the nucleotides in the pair was shifted by up to one nucleotide in either direction. Therefore, pair i-j for one set of parameters would be considered the same pair as i-j, (i + 1)-j, (i − 1)-j, i-(j + 1) or i-(j − 1) for the other set. This is because thermal energies are sufficient for pairs to fluctuate in this manner. […]
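The comparison steps described above (RMSD of pair probabilities, the mutually maximal pairing rule, and the two defects with a one-nucleotide tolerance) can be sketched in a few functions. This is an illustrative sketch under stated assumptions, not RNAstructure code: it does not call partition, ProbabilityPlot, ProbKnot or scorer, all function names are hypothetical, pair probabilities are assumed to be given as a dictionary mapping (i, j) with i < j to a probability, ties in the maximum are broken by first occurrence, and the count of shared pairs is taken from the reference side for the sensitivity defect and from the perturbed side for the PPV defect.

```python
import math


def pair_probability_rmsd(p_perturbed, p_reference):
    """RMSD over all possible pairs listed in the reference dictionary."""
    if not p_reference:
        return 0.0  # assumption: no possible pairs means no difference
    squared = sum((p_perturbed.get(pair, 0.0) - p_ref) ** 2
                  for pair, p_ref in p_reference.items())
    return math.sqrt(squared / len(p_reference))


def mutual_max_pairs(probs):
    """Pairs (i, j) where j is i's most probable partner and vice versa."""
    best = {}  # nucleotide -> (probability, partner)
    for (i, j), p in probs.items():
        for a, b in ((i, j), (j, i)):
            if p > best.get(a, (0.0, None))[0]:
                best[a] = (p, b)
    return {(a, b) for a, (_, b) in best.items()
            if a < b and best.get(b, (0.0, None))[1] == a}


def has_match(pair, others, slip=1):
    """True if `pair`, or the pair with one end shifted by <= slip, is in `others`."""
    i, j = pair
    return any((i + d, j) in others or (i, j + d) in others
               for d in range(-slip, slip + 1))


def defects(reference, perturbed, slip=1):
    """Return (sensitivity defect, PPV defect) for two sets of pairs."""
    ref_found = sum(has_match(p, perturbed, slip) for p in reference)
    pert_found = sum(has_match(p, reference, slip) for p in perturbed)
    # Assumption: an empty structure gives a defect of 0 rather than an error.
    sens = 1 - ref_found / len(reference) if reference else 0.0
    ppv = 1 - pert_found / len(perturbed) if perturbed else 0.0
    return sens, ppv


if __name__ == "__main__":
    # Made-up pair probabilities for illustration only.
    p_ref = {(1, 10): 0.9, (2, 9): 0.8, (2, 10): 0.3, (3, 9): 0.1}
    p_pert = {(1, 10): 0.5, (2, 9): 0.8, (2, 10): 0.6, (3, 9): 0.1}
    ref_pairs = mutual_max_pairs(p_ref)
    pert_pairs = mutual_max_pairs(p_pert)
    print(pair_probability_rmsd(p_pert, p_ref))
    print(sorted(ref_pairs), sorted(pert_pairs))
    print(defects(ref_pairs, pert_pairs))
```

Keeping the structures as sets of (i, j) tuples makes the shifted-pair test a handful of set lookups, which is enough for a sketch; a full comparison over a sequence archive would repeat these calls per sequence and per perturbed table set.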

Pipeline specifications

Software tools: RNAstructure, ProbKnot
Application: RNA structure analysis