PPI_RF is a method based on the random forests algorithm and the discrete wavelet transform (DWT) for the timely prediction of protein-protein interactions (PPIs). The PPI_RF pipeline consists of three main steps: i) the protein sequences are converted into numerical signals using the physicochemical properties of amino acids, and these signals are then analyzed with the DWT; ii) the salient frequency-band features of the DWT are extracted, and a series of statistical features is computed from them to construct the feature vector that represents each protein sequence; iii) the random forests algorithm is applied to the classification problem of PPI identification, using these statistical feature vectors as inputs.
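The feature-extraction part of this pipeline (steps i and ii) can be illustrated with a minimal sketch. It is not the PPI_RF implementation: the description does not say which physicochemical properties, wavelet, decomposition depth, or statistics are used, so the Kyte-Doolittle hydropathy index, the Haar wavelet, three decomposition levels, and the mean/standard deviation/maximum/minimum per band are all assumptions made for illustration. The function names and the concatenation of the two per-protein vectors into one pair vector are likewise hypothetical.

```python
import math
import statistics

# Assumption: Kyte-Doolittle hydropathy index stands in for the
# (unspecified) physicochemical properties used by PPI_RF.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Step i: map a protein sequence to a numerical signal.

    Residues outside the 20 standard amino acids are skipped.
    """
    return [HYDROPATHY[aa] for aa in sequence.upper() if aa in HYDROPATHY]


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        # Pad odd-length signals by repeating the last sample.
        signal = signal + [signal[-1]]
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_bands(signal, levels=3):
    """Return the frequency bands [detail_1, ..., detail_n, approx_n]."""
    if len(signal) < 2 ** levels:
        raise ValueError("signal too short for the requested number of levels")
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def band_stats(band):
    """Step ii: summary statistics of one band (an assumed feature set)."""
    return [statistics.fmean(band), statistics.pstdev(band), max(band), min(band)]


def sequence_features(sequence, levels=3):
    """Fixed-length feature vector: 4 statistics for each of levels + 1 bands."""
    bands = dwt_bands(encode(sequence), levels)
    return [value for band in bands for value in band_stats(band)]


def pair_features(seq_a, seq_b, levels=3):
    """Assumed pair representation: concatenate the two per-protein vectors."""
    return sequence_features(seq_a, levels) + sequence_features(seq_b, levels)


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    vector = pair_features("ACDEFGHIKLMNPQRSTVWY", "MKTAYIAKQRQISFVKSHFSRQ")
    print(len(vector))
```

For step iii, vectors produced this way, together with interaction labels, could be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`. Because every sequence yields the same number of features regardless of its length, the classifier receives fixed-size inputs.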
School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China; School of Computer Science, University of Birmingham, Birmingham, UK; Gordon Life Science Institute, Boston, MA, USA
PPI_RF funding source(s)
This work was supported by the National Natural Science Foundation of China (Nos. 61261027, 61262038, 31260273, and 61202313); the Natural Science Foundation of Jiangxi Province, China (Nos. 20122BAB211033, 20122BAB201044, and 20132BAB201053); the Scientific Research Plan of the Department of Education of Jiangxi Province (GJJ14640); and the Young Teacher Development Plan of the Visiting Scholars Program, University of Jiangxi Province.