Computational protocol: The Isoelectric Region of Proteins: A Systematic Analysis

Protocol publication

[…] Protein sequences were taken from the Lipase Engineering Database and the Medium-Chain Dehydrogenase/Reductase Engineering Database. Sequences with 100% sequence identity and fragments shorter than 160 amino acids were excluded, leaving 4652 sequences from the α/β hydrolase family and 2683 sequences from the dehydrogenase/reductase protein family.

A set of 5000 random sequences was generated using predefined frequencies for the titratable amino acids. The distribution of titratable amino acids was similar to the distribution found in the α/β hydrolase and dehydrogenase/reductase protein families. The random sequences were restricted to a size range of 250–450 amino acids, similar to the size distribution of the dehydrogenase/reductase protein family.

Protein charges were calculated using the “pICalculator” module of the BioPerl toolkit. Six titratable amino acids were included: aspartate (Asp), glutamate (Glu), histidine (His), tyrosine (Tyr), lysine (Lys), and arginine (Arg). Their pKa values were assigned according to the Emboss pKa set: 3.9, 4.1, 6.5, 10.1, 10.8, and 12.5, respectively. The N- and C-termini were assigned pKa values of 8.6 and 3.6, respectively.

Cysteine (Cys) was treated as a nontitratable residue because sequence-based methods cannot distinguish between free cysteines and cysteines that are part of disulfide bridges. For 112 α/β hydrolases with experimentally determined structures, at least 65% of all cysteines were found to be part of disulfide bridges (data not shown). The true fraction is presumably even higher because not all disulfide bridges are properly annotated in the structure entries. Previously, 91% of all cysteines were found to be part of disulfide bridges in over 50 analyzed proteins.

To validate the accuracy of predictions calculated with the Emboss pKa set, this set was compared with a more recent pKa set and with a structure-based method (PDB2PQR/PROPKA).
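As an illustration of the data-set preparation described above (removing identical sequences and short fragments, then drawing random sequences of 250–450 residues), a minimal Python sketch might look as follows. It is not the authors' pipeline. The frequency values in `HYPOTHETICAL_FREQS` are placeholders invented for this example, because the frequencies actually used are not given in this excerpt. Treating "100% sequence identity" as exact string equality and spreading the remaining probability uniformly over the nontitratable residues are further simplifying assumptions.

```python
import random

MIN_LENGTH = 160  # fragments shorter than this are excluded (value from the text)

# HYPOTHETICAL placeholder frequencies for the six titratable residues;
# the values actually used in the protocol are not stated in this excerpt.
HYPOTHETICAL_FREQS = {"D": 0.055, "E": 0.065, "H": 0.023,
                      "Y": 0.030, "K": 0.055, "R": 0.055}

# The 14 residues treated as nontitratable (Cys included, as in the text).
NONTITRATABLE = "ACFGILMNPQSTVW"


def filter_sequences(seqs):
    """Drop exact duplicates and fragments shorter than MIN_LENGTH.

    Assumption: identity is tested by exact string comparison.
    """
    seen = set()
    kept = []
    for seq in seqs:
        seq = seq.upper()
        if len(seq) < MIN_LENGTH or seq in seen:
            continue
        seen.add(seq)
        kept.append(seq)
    return kept


def random_sequence(rng, min_len=250, max_len=450):
    """Draw one random sequence with a length in [min_len, max_len]."""
    length = rng.randint(min_len, max_len)
    # Assumption: the remaining probability mass is shared uniformly
    # among the nontitratable residues.
    rest = (1.0 - sum(HYPOTHETICAL_FREQS.values())) / len(NONTITRATABLE)
    letters = list(HYPOTHETICAL_FREQS) + list(NONTITRATABLE)
    weights = list(HYPOTHETICAL_FREQS.values()) + [rest] * len(NONTITRATABLE)
    return "".join(rng.choices(letters, weights=weights, k=length))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the example is reproducible
    random_set = [random_sequence(rng) for _ in range(5000)]
    print(len(random_set), min(map(len, random_set)), max(map(len, random_set)))
```

Only the titratable residues influence the charge calculation, so the exact choice of nontitratable residues in the random set should not affect the resulting titration curves.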
Twenty-five proteins with resolved crystal structures were randomly chosen from the data set, and the amino acid sequences used for all calculations were extracted from the crystal structure files. For pH values between 1 and 14, the total charge of each protein was calculated as the sum of the partial charges of its titratable groups. For the two sequence-based methods, the predicted IER and pI values deviated by less than 0.3 and 0.4, respectively. The deviation between the Emboss pKa set and the structure-based approach using PDB2PQR/PROPKA was less than 0.6 for the IER and less than 0.8 for the pI. […]
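The sequence-based charge calculation can be sketched in Python as shown below. This is an illustrative sketch, not the BioPerl pICalculator implementation. It assumes that each partial charge follows the Henderson-Hasselbalch equation and uses the pKa values quoted in the text. The function names (`net_charge`, `charge_curve`, `isoelectric_point`), the 0.1 pH step, and the bisection search for the pI are choices made for this example. The IER is not defined in this excerpt, so the sketch stops at the titration curve and the pI.

```python
# pKa values quoted in the text (Emboss pKa set).
ACIDIC = {"D": 3.9, "E": 4.1, "Y": 10.1}  # neutral when protonated, -1 otherwise
BASIC = {"H": 6.5, "K": 10.8, "R": 12.5}  # +1 when protonated, neutral otherwise
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(seq, ph):
    """Sum of partial charges of all titratable groups at the given pH.

    Assumption: Henderson-Hasselbalch behaviour for every group;
    Cys is ignored, as in the text.
    """
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))    # N-terminus (basic)
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # C-terminus (acidic)
    for aa in seq.upper():
        if aa in BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - BASIC[aa]))
        elif aa in ACIDIC:
            charge -= 1.0 / (1.0 + 10 ** (ACIDIC[aa] - ph))
    return charge


def charge_curve(seq):
    """Net charge for pH 1 to 14 in steps of 0.1 (step size is an assumption)."""
    return [(1 + i / 10, net_charge(seq, 1 + i / 10)) for i in range(131)]


def isoelectric_point(seq, lo=1.0, hi=14.0, iterations=60):
    """Estimate the pI by bisection; the net charge decreases with pH."""
    if net_charge(seq, lo) <= 0 or net_charge(seq, hi) >= 0:
        raise ValueError("net charge does not change sign in the pH range")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    example = "MKDEHYRKDE"  # arbitrary toy sequence, not taken from the data set
    print(round(isoelectric_point(example), 2))
```

Swapping in a different pKa set only requires changing the two dictionaries and the terminal constants, which is one way a comparison between pKa sets could be set up.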

Pipeline specifications

Software tools: BioPerl, EMBOSS, PDB2PQR, PROPKA
Databases: LED (Lipase Engineering Database)
Application: Protein structure analysis