iDNAPro-PseAAC is a predictor that identifies DNA-binding proteins from protein sequence information alone. It extends the classic pseudo amino acid composition (PseAAC) approach by incorporating evolutionary information in the form of a profile-based protein representation. Experimental results on an updated benchmark dataset showed that iDNAPro-PseAAC outperformed some state-of-the-art approaches and achieved stable performance on an independent dataset. Its performance improved further when an ensemble learning approach was used to incorporate more negative samples (non-DNA-binding proteins) into the training process.
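The ensemble idea mentioned in the description can be illustrated with a minimal sketch. It assumes one common scheme: the larger negative set is split into chunks roughly the size of the positive set, one base classifier is trained per balanced subset, and predictions are combined by majority vote. Whether iDNAPro-PseAAC uses exactly this scheme is not stated here. The nearest-centroid base learner and all function names (`balanced_subsets`, `train_ensemble`, `predict_majority`) are hypothetical stand-ins chosen to keep the example self-contained, and feature vectors are assumed to be precomputed.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], int]  # returns 1 (DNA-binding) or 0 (non-binding)


def balanced_subsets(
    positives: List[Vector], negatives: List[Vector]
) -> List[Tuple[List[Vector], List[Vector]]]:
    """Pair all positives with successive chunks of negatives of about equal size."""
    size = len(positives)
    if size == 0 or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    return [
        (list(positives), list(negatives[start:start + size]))
        for start in range(0, len(negatives), size)
    ]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train_nearest_centroid(positives: List[Vector], negatives: List[Vector]) -> Classifier:
    """Hypothetical base learner: assign the class whose centroid is closer."""
    pos_c, neg_c = centroid(positives), centroid(negatives)

    def predict(x: Vector) -> int:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
        return 1 if d_pos <= d_neg else 0

    return predict


def train_ensemble(positives: List[Vector], negatives: List[Vector]) -> List[Classifier]:
    """Train one base classifier per balanced subset."""
    return [train_nearest_centroid(p, n) for p, n in balanced_subsets(positives, negatives)]


def predict_majority(models: List[Classifier], x: Vector) -> int:
    """Combine base predictions by majority vote; ties go to the positive class."""
    votes = Counter(model(x) for model in models)
    return 1 if votes[1] >= votes[0] else 0
```

With this layout every negative sample contributes to some base classifier, while each individual classifier still sees a roughly balanced training set. In practice the base learner would be replaced by a stronger classifier trained on the actual sequence-derived features.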
School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
iDNAPro-PseAAC funding source(s)
This work was supported by the National Natural Science Foundation of China (Nos. 61300112 and 61272383), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, the Natural Science Foundation of Guangdong Province (2014A030313695), the Strategic Emerging Industry Development Special Funds of Shenzhen (JCYJ20140508161040764), and the National High Technology Research and Development Program of China (863 Program) (2015AA015405).