A web server for identifying DNA-binding proteins by applying ensemble learning. enDNA-Prot first encodes each protein sequence as a 188-dimensional feature vector, using features extracted only from the protein sequence, and then feeds this vector into an ensemble classifier built from 20 different machine learning classifiers. Experimental results showed that the proposed method outperforms most existing state-of-the-art methods, indicating that enDNA-Prot is effective for DNA-binding protein identification on both balanced and unbalanced datasets. Furthermore, enDNA-Prot trained on the expanded benchmark dataset performed better than enDNA-Prot trained on the original benchmark dataset, which indicates that expanding the training dataset with negative samples can improve its predictive performance.
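The encode-then-vote pipeline described above can be illustrated with a minimal Python sketch. It is not the enDNA-Prot implementation. The description does not say how the 20 classifiers are combined, so simple majority voting is assumed here. The 20-dimensional amino acid composition stands in for the 188-dimensional feature vector, and the three threshold rules on lysine/arginine content are hypothetical toy classifiers rather than trained models. All function names are illustrative.

```python
from collections import Counter
from typing import Callable, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# A base classifier maps a feature vector to 1 (DNA-binding) or 0 (non-binding).
Classifier = Callable[[Sequence[float]], int]


def composition_features(sequence: str) -> List[float]:
    """Fraction of each standard amino acid in the sequence.

    Illustrative stand-in for the 188-dimensional encoding (assumption).
    Non-standard characters are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def make_threshold_classifier(indices: Sequence[int], threshold: float) -> Classifier:
    """Hypothetical toy classifier: positive if the summed features exceed a threshold."""
    def classify(features: Sequence[float]) -> int:
        return 1 if sum(features[i] for i in indices) > threshold else 0
    return classify


def majority_vote(classifiers: Sequence[Classifier], features: Sequence[float]) -> int:
    """Return 1 only if a strict majority of classifiers vote 1; ties resolve to 0."""
    positives = sum(clf(features) for clf in classifiers)
    return 1 if positives * 2 > len(classifiers) else 0


# Toy ensemble of three rules on the combined lysine (K) and arginine (R) fraction.
K, R = AMINO_ACIDS.index("K"), AMINO_ACIDS.index("R")
toy_ensemble = [make_threshold_classifier([K, R], t) for t in (0.10, 0.15, 0.20)]


def predict(sequence: str) -> int:
    """Encode the sequence, then let the ensemble vote."""
    return majority_vote(toy_ensemble, composition_features(sequence))
```

Calling `predict("KRKRKRAAAA")` passes all three thresholds, whereas a sequence with no lysine or arginine is rejected by every rule. In a realistic setting each ensemble member would be a model trained on labelled sequences, and the voting rule could be weighted rather than a plain majority.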
School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China; Gordon Life Science Institute, Belmont, MA, USA; PKU-HKUST ShenZhen-Hong Kong Institution, Shenzhen, Guangdong, China; Peking University Shenzhen Graduate School, Shenzhen, Guangdong, China; School of Engineering & Applied Science, Aston University, Birmingham, UK; School of Information Science and Technology, Xiamen University, Xiamen, Fujian, China
enDNA-Prot funding source(s)
This work was supported by the National Natural Science Foundation of China (nos. 61300112, 61370165), the Natural Science Foundation of Guangdong Province (nos. S2012040007390, S2013010014475), the Scientific Research Innovation Foundation in Harbin Institute of Technology (HIT.NSRIF.2013103), the Shanghai Key Laboratory of Intelligent Information Processing, China (no. IIPL-2012-002), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, the MOE Specialized Research Fund for the Doctoral Program of Higher Education (20122302120070), the Open Projects Program of National Laboratory of Pattern Recognition, Shenzhen Foundational Research Funding (JCYJ20120613152557576), Shenzhen International Co-Operation Research Funding (GJHZ20120613110641217), the Strategic Emerging Industry Development Special Funds of Shenzhen (ZDSY20120613125401420 and JCYJ20120613151940045), and the Key Basic Research Foundation of Shenzhen (JC201005260118A).