DFLpred is a sequence-based predictor of disordered flexible linkers (DFLs). It outputs the propensity to form a DFL for each residue in the input sequence. DFLpred uses a small set of empirically selected features that quantify propensities to form certain secondary structures, disordered regions and structured regions; these features are processed by a fast linear model. Our high-throughput predictor can be used on the whole-proteome scale: it needs <1 h to predict an entire proteome on a single CPU. When assessed on an independent test dataset of low-sequence-identity proteins, it achieves an area under the receiver operating characteristic curve of 0.715 and outperforms existing alternatives, which include methods for the prediction of flexible linkers, flexible residues, intrinsically disordered residues and various combinations of these methods.
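As a rough illustration of what per-residue scoring with a linear model looks like, the sketch below averages a residue-level propensity over a sliding window and passes it through a linear function. Everything in it is an assumption made for illustration: the propensity values, the window size, the single feature, the weight and bias, and the logistic squashing are hypothetical and are not DFLpred's actual features or parameters.

```python
import math

# Hypothetical per-residue propensity scale (illustrative values only,
# not the features used by DFLpred).
TOY_PROPENSITY = {"G": 0.9, "P": 0.8, "S": 0.7, "A": 0.2, "L": 0.1}
DEFAULT_PROPENSITY = 0.4  # fallback for residues missing from the toy scale


def window_mean(values, center, half_width):
    """Average the values in a window around `center`, truncated at the ends."""
    lo = max(0, center - half_width)
    hi = min(len(values), center + half_width + 1)
    return sum(values[lo:hi]) / (hi - lo)


def predict_propensities(sequence, weight=4.0, bias=-2.0, half_width=2):
    """Return one score in (0, 1) per residue of `sequence`.

    `weight`, `bias` and `half_width` are made-up parameters; a real
    predictor would fit them on annotated training data.
    """
    raw = [TOY_PROPENSITY.get(aa, DEFAULT_PROPENSITY) for aa in sequence.upper()]
    scores = []
    for i in range(len(raw)):
        feature = window_mean(raw, i, half_width)   # single windowed feature
        linear = weight * feature + bias            # linear model
        scores.append(1.0 / (1.0 + math.exp(-linear)))  # squash to (0, 1)
    return scores


if __name__ == "__main__":
    for aa, score in zip("MALGSPGSLA", predict_propensities("MALGSPGSLA")):
        print(aa, round(score, 3))
```

Because each residue needs only a window average and one weighted sum, the cost grows linearly with sequence length, which is the kind of property that makes proteome-scale runs practical for a linear model.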
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada; Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
DFLpred funding source(s)
This work was supported in part by the Discovery grant (298328) from the Natural Sciences and Engineering Research Council (NSERC) of Canada, by the Qimonda Endowed Chair, and by a scholarship from the China Scholarship Council.