Computational protocol: Secondary structure in the target as a confounding factor in synthetic oligomer microarray design

Protocol publication

[…] Based on the results of this in silico experiment, secondary structure prediction in the target is being used to develop a new criterion for oligonucleotide probe design. Our modeling results show that the implicit assumption used until now, namely that eliminating probe secondary structure by avoiding self-complementarity also eliminates target secondary structure, holds only when the target and probe are the same length. Using target secondary structure as an explicit criterion will make it possible to mask, or preferentially avoid, the regions of the target sequence whose bases are directly involved in secondary structure formation, removing those regions from the search for the optimal probe.
In this study we assigned accessibility scores to sites in the target sequence based only on the fraction of predicted structures within 5% of the energy optimum in which a residue is found in a single-stranded conformation. This measure is not computationally intensive and can be applied to genome-scale problems using readily available software (Mfold), but it is not the most physically rigorous definition of accessibility. Because every structure in the ensemble of optimal and suboptimal structures that a molecule can form is weighted equally, secondary structure at some positions may be overcounted: base pairs that form only in rare conformations count as much as base pairs present in the lowest-energy structure. The program Sfold [-] assigns accessibility based on an ensemble-weighted average of secondary structure. The program RNAfold [], part of the Vienna RNA package, implements McCaskill's partition function approach [] to compute pairing probabilities for every pair of bases in the sequence, from which a per-base accessibility summary can be derived.
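The fractional accessibility score described above, and the ensemble-weighting alternative it is contrasted with, can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline. It assumes the suboptimal structures have already been exported as dot-bracket strings with free energies in kcal/mol, and it interprets "within 5% of the energy optimum" as E ≤ E_min + 0.05·|E_min|. All function names, the toy structures, the Boltzmann-reweighting helper and the probe-window filter (with its threshold) are hypothetical. Note that `boltzmann_accessibility` only reweights the structures it is given; it is not McCaskill's partition function algorithm, which sums over all possible structures.

```python
import math
from typing import List, Tuple

Structure = Tuple[str, float]  # (dot-bracket string, free energy in kcal/mol)

# Gas constant in kcal/(mol*K); energies are assumed to be in kcal/mol.
R_KCAL = 0.0019872

def unpaired_mask(dot_bracket: str) -> List[bool]:
    """True where a residue is single-stranded ('.') in a dot-bracket string."""
    return [c == "." for c in dot_bracket]

def filter_window(structs: List[Structure], pct: float = 5.0) -> List[Structure]:
    """Keep structures whose energy is within pct% of the optimum (assumed reading)."""
    mfe = min(e for _, e in structs)
    cutoff = mfe + abs(mfe) * pct / 100.0
    return [(s, e) for s, e in structs if e <= cutoff]

def fractional_accessibility(structs: List[Structure], pct: float = 5.0) -> List[float]:
    """Equal-weight fraction of retained structures in which each residue is unpaired."""
    kept = filter_window(structs, pct)
    counts = [0] * len(kept[0][0])
    for s, _ in kept:
        for i, free in enumerate(unpaired_mask(s)):
            counts[i] += int(free)
    return [c / len(kept) for c in counts]

def boltzmann_accessibility(structs: List[Structure], temp_k: float = 310.15) -> List[float]:
    """Hypothetical contrast: Boltzmann-weighted unpaired probability over the
    listed structures only, so rare high-energy structures contribute little."""
    mfe = min(e for _, e in structs)
    # Shift by the MFE before exponentiating to avoid overflow.
    weights = [math.exp(-(e - mfe) / (R_KCAL * temp_k)) for _, e in structs]
    z = sum(weights)
    acc = [0.0] * len(structs[0][0])
    for (s, _), w in zip(structs, weights):
        for i, free in enumerate(unpaired_mask(s)):
            if free:
                acc[i] += w / z
    return acc

def candidate_probe_starts(acc: List[float], probe_len: int, min_acc: float = 0.5) -> List[int]:
    """Hypothetical masking step: start positions of windows in which every
    residue has accessibility >= min_acc."""
    return [i for i in range(len(acc) - probe_len + 1)
            if all(a >= min_acc for a in acc[i:i + probe_len])]

if __name__ == "__main__":
    # Toy structures, not real Mfold output.
    toy = [("((...))..", -2.0), ("..((...))", -1.95), (".........", 0.0)]
    frac = fractional_accessibility(toy)
    print(frac)
    print(candidate_probe_starts(frac, probe_len=1, min_acc=0.75))  # → [4]
    print(boltzmann_accessibility(toy))
```

In the toy example the open chain at 0.0 kcal/mol falls outside the 5% window, so only the two hairpins are counted: residue 4 is unpaired in both and scores 1.0, while every other residue is paired in one of them and scores 0.5. The Boltzmann variant keeps the open chain but gives it a small weight, which illustrates why equal weighting can overstate the influence of rare conformations.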
These methods are more rigorous than Mfold, and we expected that they might produce somewhat different results, although it has also been shown that binding states predicted from Mfold optimal structures perform almost as well as Sfold and RNAfold predictions when applied to molecules of known 3D structure [].
When we compared Mfold-based accessibility predictions for an individual transcript with those generated by Sfold and RNAfold, we found that the difference in average predicted accessibility over the entire transcript is small. We computed accessibility for the transcript of human ICAM-1, which has been mapped experimentally to determine its accessibility []. The average fractional accessibility derived from Mfold results is about 3–4% greater than that predicted by RNAfold or Sfold. Because Mfold predicts slightly more, not less, accessibility than the other methods, using this fractional accessibility measure will not impose an unnecessary constraint on the design process relative to the other predictive approaches. The accessibility profiles calculated for ICAM-1 with each method are shown in Fig. . In each panel of the figure, antipeak locations (positions with lower pairing probability, and therefore likely to be more accessible) can be compared with the extendable sites detected by Allawi et al. [], which are indicated by green dots at the bottom of the plot. Each method produces a number of apparently correct predictions as well as obvious errors, and it is not clear which method yields the best results at the residue level. A systematic, competitive test of these predictions against solution accessibility data gathered on various experimental platforms is needed, although data sets suitable for validation are still rare. In the absence of such validation, the Mfold accessibility predictions are sufficient to estimate the scope of the secondary structure problem in a genome-based array design, even if some details of the prediction are incorrect.
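The profile comparison described above (per-base accessibility from pairing probabilities, mean accessibility over a transcript, and antipeaks checked against experimentally detected sites) can be sketched as follows. This is a hypothetical illustration under stated assumptions: base-pair probabilities are assumed to be available as a dictionary mapping 0-based (i, j) pairs to P(i, j), for example after parsing the partition-function output of RNAfold (parsing is not shown); per-base accessibility is taken as 1 − Σ_j P(i, j); an antipeak is taken to be a strict local maximum of accessibility; and the function names, matching tolerance and toy profiles are invented for the example. The text does not say whether "3–4% greater" is an absolute or a relative difference, so both are computed.

```python
from typing import Dict, List, Sequence, Tuple

def unpaired_from_bpp(n: int, bpp: Dict[Tuple[int, int], float]) -> List[float]:
    """Per-base unpaired probability, 1 - sum_j P(i, j), from pair probabilities."""
    paired = [0.0] * n
    for (i, j), p in bpp.items():
        # Each pair contributes its probability to both partners.
        paired[i] += p
        paired[j] += p
    return [1.0 - p for p in paired]

def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)

def mean_difference(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """How much higher mean(a) is than mean(b):
    (difference in percentage points, difference as percent of mean(b))."""
    d = mean(a) - mean(b)
    return 100.0 * d, 100.0 * d / mean(b)

def antipeaks(acc: Sequence[float]) -> List[int]:
    """Interior positions where accessibility is a strict local maximum,
    i.e. where pairing probability is a local minimum."""
    return [i for i in range(1, len(acc) - 1)
            if acc[i] > acc[i - 1] and acc[i] > acc[i + 1]]

def matched_sites(peaks: Sequence[int], sites: Sequence[int], tol: int = 1) -> List[int]:
    """Experimental sites lying within tol residues of at least one predicted antipeak."""
    return [s for s in sites if any(abs(s - p) <= tol for p in peaks)]

if __name__ == "__main__":
    # Toy inputs only; these are not ICAM-1 data.
    print(unpaired_from_bpp(6, {(0, 5): 0.9, (1, 4): 0.5}))
    profile = [0.1, 0.4, 0.2, 0.8, 0.3, 0.3, 0.9, 0.5]
    peaks = antipeaks(profile)
    print(peaks)                          # → [1, 3, 6]
    print(matched_sites(peaks, [3, 10]))  # → [3]
    print(mean_difference([0.55, 0.45], [0.5, 0.4]))
```

A strict local maximum ignores plateaus of equal values (positions 4 and 5 in the toy profile); a real comparison might smooth the profile or use a windowed definition first, a design choice the text does not specify.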
An experimental approach will eventually be required to determine which method best represents the conditions of the microarray experiment. […]

Pipeline specifications

Software tools: Mfold, Sfold, RNAfold
Application: RNA structure analysis
Organisms: Brucella suis 1330
Chemicals: Nucleotides