Computational protocol: Detecting riboSNitches with RNA folding algorithms: a genome-wide benchmark

Protocol publication

[…] Structure prediction programs were tested on the sequences containing each allele for every riboSNitch and non-riboSNitch. Non-riboSNitch sets were matched in size to each riboSNitch set to reduce computational costs. To match non-riboSNitches and riboSNitches in terms of their experimental validation, a non-riboSNitch set was matched to a riboSNitch set of size n by selecting the top n non-riboSNitches according to their false discovery rate (FDR)-adjusted P-values from PARS comparisons. Since each RNA has P-values for three potential comparisons (mother versus father, mother versus child and father versus child), the P-value used here is the average over the three comparisons.

The Unix commands used for each algorithm are listed in Supplementary Table S4. The ‘specialized’ algorithms directly score the distance between sequence pairs: SNPfold scores with a Pearson correlation coefficient, RNAsnp returns a P-value on Euclidean distance, remuRNA measures the relative entropy between two RNAs and RNAmute measures the edit distance between MFE structures.

For algorithms that do not intrinsically compare the structures of two RNA variants, the predicted dot bracket structures or base pair probability matrices (BPPMs) of the two alleles were compared. All the general algorithms except CONTRAfold and CentroidFold return an MFE structure as their dot bracket structure, so for simplicity dot bracket structures are referred to as MFE structures in this benchmark. CentroidFold, CONTRAfold, RNAfold, RNAstructure and UNAFold are capable of returning both MFE structures and BPPMs. MC-Fold and RNAmutants return only MFE structures.

MFE structures were compared with the RNAdistance function from ViennaRNA 2.1.1 and BPPMs were compared with RNApdist. The RNApdist function used here is a modified version of the RNApdist function implemented by ViennaRNA.
Essentially, base pairing probability differences are summed without performing an alignment of the BPPMs. The distance between the base pair probability matrices of sequences 1 and 2 is given by […], where n is the number of bases and the pairwise term is the probability of base i being paired with base j. The remaining terms are the probabilities of base i being upstream paired, downstream paired and unpaired, respectively, for BPPMs 1 and 2. Note that this modification of RNApdist assumes that the sequences being compared have the same length. The unmodified ViennaRNA implementation of RNApdist was used for benchmarking RNAfold (the main folding algorithm in the ViennaRNA package). […]
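
The alignment-free comparison described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's implementation: the distance formula itself is not preserved in this excerpt, so the per-base term used here (2 minus twice the summed square roots of the matching probabilities) is an assumption. The function names, the dictionary input format and the labels `p_open` (partner at a higher index) and `p_close` (partner at a lower index) are likewise hypothetical stand-ins for the upstream and downstream paired probabilities.

```python
from math import sqrt


def pairing_profile(pair_probs, n):
    """Collapse pair probabilities into one (p_open, p_close, p_unpaired)
    triple per base.

    pair_probs maps 0-based index pairs (i, j) to the probability that
    bases i and j are paired (hypothetical input format).
    """
    p_open = [0.0] * n   # probability of pairing with a base at a higher index
    p_close = [0.0] * n  # probability of pairing with a base at a lower index
    for (i, j), p in pair_probs.items():
        lo, hi = min(i, j), max(i, j)
        p_open[lo] += p
        p_close[hi] += p
    profile = []
    for i in range(n):
        # Whatever probability is left over is the unpaired probability.
        p_unpaired = max(0.0, 1.0 - p_open[i] - p_close[i])
        profile.append((p_open[i], p_close[i], p_unpaired))
    return profile


def profile_distance(profile_a, profile_b):
    """Sum per-base differences position by position, with no alignment.

    ASSUMPTION: the per-base cost 2 - 2 * sum(sqrt(a_k * b_k)) is chosen
    for illustration only; it is not taken from the protocol text.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("alignment-free comparison needs equal-length sequences")
    total = 0.0
    for a, b in zip(profile_a, profile_b):
        overlap = sum(sqrt(x * y) for x, y in zip(a, b))
        total += 2.0 - 2.0 * overlap
    return total


# Toy example: allele 1 pairs bases 0 and 3 with certainty, allele 2 is unpaired.
allele_1 = pairing_profile({(0, 3): 1.0}, n=4)
allele_2 = pairing_profile({}, n=4)
distance = profile_distance(allele_1, allele_2)
```

Because positions are compared index by index, a single nucleotide substitution (which leaves the length unchanged) fits this scheme, whereas an insertion or deletion would not.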

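The non-riboSNitch matching step from the protocol text (average the three FDR-adjusted P-values per RNA, then keep the top n) might look like the sketch below. The function name and input format are hypothetical. The text does not say in which direction "top" is ranked; this sketch assumes the largest averaged P-values, meaning the least evidence of a structural change, are kept.

```python
from statistics import mean


def select_matched_non_ribosnitches(candidates, n):
    """Return the ids of the n candidates with the highest averaged P-value.

    candidates maps an RNA id to its three FDR-adjusted P-values
    (mother vs father, mother vs child, father vs child).
    ASSUMPTION: "top n" means the largest averaged P-values.
    """
    averaged = {rna_id: mean(p_values) for rna_id, p_values in candidates.items()}
    ranked = sorted(averaged, key=averaged.get, reverse=True)
    return ranked[:n]


# Toy data with made-up identifiers and P-values.
toy_candidates = {
    "rna_a": (0.9, 0.8, 1.0),
    "rna_b": (0.2, 0.3, 0.1),
    "rna_c": (0.6, 0.6, 0.6),
}
matched = select_matched_non_ribosnitches(toy_candidates, n=2)
```

Here n would be the size of the riboSNitch set being matched.
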
Pipeline specifications

Software tools: SNPfold, remuRNA, RNAmute, CONTRAfold, RNAfold, UNAFold, MC-Fold, RNAmutants, ViennaRNA
Application: RNA structure analysis
Organisms: Homo sapiens
Chemicals: RNA