Proteins are organic compounds made of amino acids arranged in a linear chain and folded into a globular form; a protein sequence is the order of the amino acids along that chain. The amino acids in the polymer chain are joined by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. Sequence-based protein classifiers assign class labels to proteins based on a set of features: real numbers that each capture some property of the sequence. This process entails three distinct steps: (i) feature extraction, which maps protein sequences to points in a feature space; (ii) training a classifier to construct a decision boundary that optimally separates the protein classes in this feature space, using a set of proteins with known class labels; and (iii) using the trained classifier to predict class labels for new proteins. Software tools are available for each of these three steps. Feature extraction is available both as software packages and through web services, and an extensive range of classification software has been developed, some of which includes feature visualization.
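
The three steps can be illustrated with a minimal sketch. It assumes amino acid composition as the feature set and a nearest-centroid rule as the classifier; both are chosen here only for brevity and are not taken from the cited tools. The function names (`composition`, `train_centroids`, `predict`) and the toy sequences and labels are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Step (i): map a sequence to a 20-dimensional amino acid composition vector."""
    counts = Counter(seq.upper())
    # Count only standard residues; fall back to 1 to avoid dividing by zero.
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def train_centroids(sequences, labels):
    """Step (ii): 'train' by averaging the feature vectors of each labelled class."""
    sums, sizes = {}, Counter()
    for seq, label in zip(sequences, labels):
        acc = sums.setdefault(label, [0.0] * len(AMINO_ACIDS))
        for i, value in enumerate(composition(seq)):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, seq):
    """Step (iii): assign the label of the closest class centroid."""
    vec = composition(seq)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Toy training set (made-up sequences and labels, for illustration only).
train_seqs = ["KKRKRKKARK", "RKKRRAKKKR", "LLVVIILLAV", "VVLLIIAALL"]
train_labels = ["basic", "basic", "hydrophobic", "hydrophobic"]

model = train_centroids(train_seqs, train_labels)
print(predict(model, "KRKKRRKKAK"))
print(predict(model, "LIVLLVIAVL"))
```

A nearest-centroid rule yields a simple linear boundary between two classes and makes no claim to optimal separation; in practice a classifier from an established library, with a richer feature set, would take its place.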

(Cao and Xiong, 2014) Protein sequence classification with improved extreme learning machine algorithms. Biomed Res Int.

(van den Berg et al., 2014) SPiCE: a web-based tool for sequence-based protein classification and exploration. BMC Bioinformatics.

