Computational protocol: Sequence Complexity of Amyloidogenic Regions in Intrinsically Disordered Human Proteins

Protocol publication

[…] DisProt database release 5.6 (http://www.disprot.org/) provides a set of proteins with different degrees of disorder. For each protein it gives the name, accession codes, aa sequence, location of the disordered region(s), and the methods used for structural (disorder) characterization. DisProt also lists the biological function(s) of each disordered region. The sequence of each protein was retrieved in FASTA format. Length, aa composition, residue characteristics such as the total number of positively and negatively charged residues, and the theoretical isoelectric point (pI) were computed using the ProtParam tool of the ExPASy proteomics server (http://us.expasy.org/tools/protparam.html). The total charge of the proteins was calculated with the ‘protein calculator’ server (http://www.scripps.edu/~cdputnam/protcalc.html). Additional disordered proteins were selected from the IDEAL data set, which contains experimentally verified IDPs. The structural disorder of the proteins ranged from 0 to 100%. Proteins with (−1)% disorder were excluded. Structural disorder was further calculated using the IUPred algorithm, which is available at http://iupred.enzim.hu. Percent disorder was estimated as the number of residues in the disordered regions predicted by IUPred, divided by the length of the protein sequence and multiplied by 100. [...] Protein sequences obtained from DisProt and IDEAL were used to calculate the content of both low-complexity regions (LCRs) and amyloidogenic regions (ARs). The LCR content of an individual protein was predicted by the SEG method as implemented in SMART (Simple Modular Architecture Research Tool), a web-based server available at http://www.bork.embl-heidelberg.de/Modules/sinput.shtml. Default SEG parameters were used to find the LCRs. The SEG method detects LCRs by measuring the information content present in the complexity state vector.
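The percent-disorder estimate described above can be sketched in a few lines. This is a minimal illustration, not the authors' script: the function name `percent_disorder`, the example scores, and the 0.5 cutoff are assumptions (0.5 is the cutoff commonly applied to IUPred per-residue scores, but the protocol does not state which one was used).

```python
def percent_disorder(scores, threshold=0.5):
    """Return the percentage of residues predicted to be disordered.

    scores    -- per-residue disorder scores, one float per residue
    threshold -- ASSUMPTION: residues scoring >= 0.5 count as disordered;
                 the protocol does not state the cutoff it used
    """
    if not scores:
        raise ValueError("scores must contain at least one residue")
    disordered = sum(1 for score in scores if score >= threshold)
    # Number of disordered residues / sequence length * 100
    return 100.0 * disordered / len(scores)


# Hypothetical scores for a 10-residue sequence (not real IUPred output).
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(percent_disorder(example_scores))  # 50.0
```

The function expects the per-residue scores to have been extracted from the predictor's output beforehand; parsing that output is left out because its exact layout is not described in the protocol.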
The content of low-complexity regions in a protein was calculated as the ratio of the total number of aa residues in all its LCRs to the protein sequence length. Amyloidogenic regions of the proteins were identified with the web-based computational tool Waltz (http://waltz.switchlab.org). The percentage of residues in ARs was measured as the ratio of the number of residues in all the ARs to the sequence length of the protein. [...] APSSP2 was used to predict the secondary structure of each protein from its aa sequence. The algorithm takes an amino acid sequence as query input and predicts the corresponding secondary structure with an associated confidence level. The percentages of residues that prefer α-helix, β-strand, and coil conformations were calculated as the ratio of the total number of residues in a particular conformation to the sequence length of the protein. Structural preferences of the residues in ARs and LCRs were obtained by selecting the respective sequence regions in the predicted structure of the protein. The percentage of AR/LCR residues with a preference for a particular conformation was measured against the total number of AR/LCR residues in the protein. […]
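The region-content and structural-preference ratios described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the 1-based inclusive `(start, end)` region format, and the one-letter `H`/`E`/`C` encoding of the predicted secondary structure are all hypothetical choices made for the example.

```python
def region_content(seq_length, regions):
    """Percentage of residues covered by regions (e.g. LCRs or ARs).

    regions -- ASSUMED format: list of 1-based inclusive (start, end) tuples.
    Overlapping regions are counted once (a design choice of this sketch).
    Divide the result by 100 to obtain the ratio form.
    """
    covered = set()
    for start, end in regions:
        if not (1 <= start <= end <= seq_length):
            raise ValueError(f"region ({start}, {end}) is outside the sequence")
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / seq_length


def ss_composition(ss_string, regions=None):
    """Percent helix (H), strand (E) and coil (C) residues.

    ss_string -- ASSUMED encoding: one character per residue, H, E or C.
    regions   -- if given, only residues inside these regions are counted,
                 and percentages are relative to the residues in the regions.
    """
    if regions is None:
        positions = range(1, len(ss_string) + 1)
    else:
        selected = set()
        for start, end in regions:
            selected.update(range(start, end + 1))
        positions = sorted(selected)
    states = [ss_string[pos - 1] for pos in positions]
    if not states:
        return {"H": 0.0, "E": 0.0, "C": 0.0}
    return {s: 100.0 * states.count(s) / len(states) for s in "HEC"}


# Hypothetical 12-residue prediction with one amyloidogenic region.
predicted = "CCHHHHEEEECC"
amyloid_regions = [(7, 10)]
print(region_content(len(predicted), amyloid_regions))
print(ss_composition(predicted))
print(ss_composition(predicted, amyloid_regions))
```

Calling `ss_composition` without `regions` gives the whole-protein percentages, while passing the AR or LCR intervals restricts both the counts and the denominator to residues inside those regions, mirroring the two ratios in the text.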

Pipeline specifications

Software tools: ProtParam, IUPred, Waltz, APSSP2
Databases: DisProt, SMART, ExPASy
Applications: Protein structure analysis, Protein physicochemical analysis
Organisms: Homo sapiens
Diseases: Crohn Disease