[…] he value was also scaled by a factor of 10 and subtracted from the TM helix SVM score to prevent TM helix prediction. Residues were therefore predicted to lie in one of five different topological regions: inside loop, outside loop, TM helix, re-entrant helix and signal peptide.

To evaluate performance, four metrics were used: first, correct location of the amino terminus; second, correct number of TM helices; third, correct number and location of TM helices (based on an overlap of at least five residues with the helix boundaries in our data set); and fourth, correct overall topology. For comparison, we also evaluated a number of other leading topology predictors. For this method and MEMSAT3, the appropriate cross-validated training data was used in assessing performance. Where equivalent data was unavailable for the other methods, performance is likely to be overestimated, since the test and training sets probably overlap significantly. We also assessed performance of the method against proteins containing signal peptides and re-entrant helices.

We also trained an additional SVM to discriminate between TM and globular proteins, to be used as a pre-filter prior to TM topology prediction. For SVM training, we used the data set of 131 TM proteins and 416 globular proteins from non-redundant PDB chains as used by MEMSAT3. To allow an accurate comparison with MEMSAT3, we used exactly the same test set, consisting of 184 TM proteins from the Möller data set and a separate set of 2269 non-redundant globular protein chains, giving a total of 2453 test cases. PSI-BLAST profiles were generated for all sequences and 10-fold cross validation was used to assess performance, again removing sequences from the training fold with greater than 25% sequence identity to any sequence in the test fold.

For whole genome analysis, ten genomes (nine eukaryotic and one prokaryotic) were downloaded from the Ensembl and NCBI websites.
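
As an illustration of the four evaluation metrics described earlier, the following is a minimal sketch and not the evaluation code used in this work. It assumes helices are given as inclusive (start, end) residue ranges, that predicted and reference helices are paired in sequence order, and that "correct overall topology" means a correct amino terminus location together with correctly located helices. The names `overlap` and `evaluate` are hypothetical.

```python
from typing import Dict, List, Tuple

Helix = Tuple[int, int]  # inclusive (start, end) residue positions (assumed convention)


def overlap(a: Helix, b: Helix) -> int:
    """Return the number of residues shared by two inclusive segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def evaluate(pred_nterm: str, pred_helices: List[Helix],
             ref_nterm: str, ref_helices: List[Helix],
             min_overlap: int = 5) -> Dict[str, bool]:
    """Score one prediction against the reference on the four metrics.

    Assumption: helices are paired in sequence order, and overall topology
    is correct when both the N-terminus and all helix locations are correct.
    """
    nterm_ok = pred_nterm == ref_nterm
    count_ok = len(pred_helices) == len(ref_helices)
    # Helix locations can only be correct if the helix counts agree.
    location_ok = count_ok and all(
        overlap(p, r) >= min_overlap
        for p, r in zip(sorted(pred_helices), sorted(ref_helices))
    )
    return {
        "n_terminus": nterm_ok,
        "helix_count": count_ok,
        "helix_location": location_ok,
        "topology": nterm_ok and location_ok,
    }


if __name__ == "__main__":
    # Hypothetical two-helix protein with its N-terminus on the inside.
    print(evaluate("in", [(10, 30), (45, 65)], "in", [(12, 32), (44, 64)]))
```

Per-protein results of this kind would then be averaged over the test set to give the fraction of proteins correct under each metric.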
Protein sequences were extracted and PSI-BLAST profiles were generated using the SWISS-PROT database. The TM/globular predictor was used to identify TM proteins, which were then subjected to full topology prediction.

MEMSAT-SVM is available as downloadable source code and as a web server from the URL below and is free for non-commercial use. All data sets are also available, and cross-validation SVM model files are available on request. The software has been tested on a Linux operating system. To compile and run it, the gcc compiler, a Perl interpreter, and the NCBI tools are required.

The authors declare that they have no competing interests.

Both authors have contributed equally to this work. Original source code was developed by DTJ. This was re-written and extended by TN. DTJ provided direction for computational aspects of the algorithm and biological/biophysical insight into aspects of membran […]

