Computational protocol: Predicting MicroRNA Biomarkers for Cancer Using Phylogenetic Tree and Microarray Analysis

Protocol publication

[…] A phylogenetic tree is a widely used tool for investigating evolutionary relationships between DNA sequences. In this study, we adopt the phylogenetic tree approach to cluster miRNAs by their phylogenetic structure. Phylogenetic analysis operates on sequence data only, so it cannot analyze microarray data by itself. However, combined with a microarray analysis, it can serve as an ancillary tool for finding high-confidence miRNA biomarkers.

miRBase is a database that provides most discovered miRNA sequences for many species. In this study, the miRNA stem-loop sequences are retrieved from miRBase and used to build a phylogenetic tree. For example, the accession number of Homo sapiens miR-211 (hsa-mir-211) is MI0000287, and its stem-loop sequence is UCACCUGGCCAUGUGACUUGUGGGCUUCCCUUUGUCAUCCUUCGCCUAGGGCUCUGAGCAGGGCAGGGACAGCAAAGGGGUGCUCAGUUGUCACUUCCCACAGCACGGAG. The stem-loop sequence of a precursor miRNA includes a 5p mature miRNA sequence and a 3p mature miRNA sequence, so it carries more information about a miRNA than a mature sequence alone.

To plot the phylogenetic tree of these 90 miRNAs, we need a substitution model to calculate pairwise distances between sequences and a hierarchical clustering method to build the tree. We first calculate the pairwise distances between sequences using the Jukes–Cantor substitution model. We then use the nearest distance method (single linkage) to build the tree. The MATLAB code for plotting the phylogenetic tree is:

    distances = seqpdist(seqs, 'Alphabet', 'NT');
    phylotree = seqlinkage(distances, 'single', seqs);

where seqs holds the 90 miRNA stem-loop sequences. The resulting phylogenetic tree of the miRNAs, based on their stem-loop sequences, is plotted in the accompanying figure. miRNAs in the same clade are considered to have a similar phylogenetic structure.
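The two stages described above (Jukes-Cantor pairwise distances, then single-linkage clustering) can be sketched in plain Python for readers without the MATLAB toolbox. This is a minimal illustration under stated assumptions, not a reimplementation of seqpdist or seqlinkage: the alignment scores (match 1, mismatch -1, gap -2), the helper names, and the three toy sequences are all hypothetical, and the toy sequences are not real miRNAs.

```python
import math
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment (assumed scoring, not MATLAB's)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def jukes_cantor(a, b):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3) over ungapped columns."""
    al_a, al_b = global_align(a, b)
    pairs = [(x, y) for x, y in zip(al_a, al_b) if x != "-" and y != "-"]
    if not pairs:
        return math.inf
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:  # the correction is undefined at saturation
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def single_linkage(names, dist):
    """Agglomerative clustering; cluster distance = nearest pair of members."""
    clusters = [frozenset([name]) for name in names]
    merges = []

    def link(c1, c2):
        return min(dist[frozenset((x, y))] for x in c1 for y in c2)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pr: link(*pr))
        merges.append((sorted(c1), sorted(c2), link(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical toy sequences (NOT real miRNA stem-loops).
seqs = {"toyA": "ACGUACGUAC", "toyB": "ACGUACGUAU", "toyC": "UUGGCCAAUU"}
dist = {frozenset((a, b)): jukes_cantor(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)}
merges = single_linkage(list(seqs), dist)
for left, right, height in merges:
    print(left, right, height)
```

Each entry of merges records which two clusters were joined and at what distance. The earliest merges correspond to the tightest clades in the tree. Real stem-loop sequences from miRBase would replace the toy dictionary.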
For example, in the phylogenetic tree, miR-30a, miR-30b and miR-30e are in a clade, and miR-30a and miR-30d are in a nested clade. [...] To select high-confidence miRNA biomarkers for cancers, we propose a method that combines the results of cancer target prediction with the phylogenetic tree. The first step is to find miRNAs that fall in the same sub-branch of the tree. For example, the 5 miRNAs miR-30a, miR-30b, miR-30c, miR-30d and miR-30e are in a clade. According to the cancer target prediction, these 5 miRNAs are also classified into the same group. With both results in agreement, we have more confidence that these 5 miRNAs are involved in the pathological mechanisms of particular cancers. Combining the two methods, we develop a method to classify miRNAs with respect to cancer development. The steps are as follows.

Procedure of the proposed method:

Step 1. Use the stem-loop sequences of miRNAs to build a phylogenetic tree.
Step 2. Use RRSM or another microarray data analysis to select cancer-related miRNAs. Cluster the miRNAs into groups according to the cancer target prediction result.
Step 3. Collect the miRNAs in the same clade of the phylogenetic tree from Step 1. If miRNAs in the same clade also belong to the same group from Step 2, they are selected as high-confidence miRNA biomarkers for particular cancers.

Although the steps are illustrated with cancer-related miRNAs, the approach can be used generally to find miRNA biomarkers for other diseases. A flowchart of the method is provided in the accompanying figure. […]
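Step 3 of the procedure amounts to intersecting tree clades with the groups from Step 2. The sketch below shows one possible reading of that rule, assuming a clade/group pair counts when at least two miRNAs are shared. The function name, the min_size threshold, the group labels, and the miR-X / miR-Y entries are hypothetical. Only the miR-30 family membership comes from the text.

```python
def high_confidence_biomarkers(clades, groups, min_size=2):
    """Return (group label, shared miRNAs) for each clade/group overlap.

    clades   : list of collections of miRNA names taken from the tree (Step 1)
    groups   : dict mapping a group label to miRNA names (Step 2)
    min_size : assumed threshold; an overlap of one miRNA is not a clade match
    """
    selected = []
    for clade in clades:
        for label, members in groups.items():
            shared = set(clade) & set(members)
            if len(shared) >= min_size:
                selected.append((label, sorted(shared)))
    return selected


# The miR-30 clade is described in the text; miR-X and miR-Y are placeholders.
clades = [
    ["miR-30a", "miR-30b", "miR-30c", "miR-30d", "miR-30e"],
    ["miR-X", "miR-Y"],
]
groups = {
    "group-1": ["miR-30a", "miR-30b", "miR-30c", "miR-30d", "miR-30e", "miR-X"],
    "group-2": ["miR-Y"],
}
result = high_confidence_biomarkers(clades, groups)
print(result)
```

Here the miR-30 clade is retained because all five members share a group, while the placeholder clade is dropped because its two members fall into different groups.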

Pipeline specifications

Software tools: SEQLinkage, Find-miRNA
Databases: nextstrain
Application: Phylogenetics
Diseases: Breast Neoplasms, Neoplasms