Computational protocol: Characterization of Disease-Associated Mutations in Human Transmembrane Proteins


Protocol publication

[…] Genetic variations for the 4998 human α-helical TMPs as well as for the 14586 non-TMPs were imported from the UniProt database. Unclassified mutations were excluded from the analysis. All topology and variation data for TMPs were converted to the standard BED format [] using the UniProt ID and the position of the variant, and these fields were inserted before the original annotation columns. Ambiguous variations associated with multiple diseases were removed. For multi-pass membrane proteins, we distinguished the terminals (the regions before the first and after the last TM segment) from the loop regions and added this information to the converted files (e.g., N-terminal, Loop, C-terminal). The overlaps between the variations and the different topology segments were determined with the intersectBed program (option -wo) from the BEDTools software package, version 2.17.0 []. A step by step description of these preanalytical steps can be found in the .

Using the original UniProt annotation of the variations within the different topological sites, the exact amino acid substitutions and their grouping by residue property (polar/non-polar/charged) were counted by a Perl script. We counted asparagine, glutamine, serine, threonine and tyrosine as polar residues; alanine, cysteine, glycine, isoleucine, leucine, methionine, phenylalanine, proline, tryptophan and valine as non-polar residues; and arginine, aspartic acid, glutamic acid, histidine and lysine as charged residues. To estimate the standard deviation of the distribution of the various substitution types, we applied a bootstrap method: 90% of the variations in the disease-associated and polymorphism groups were randomly selected ten times, and the mean and standard deviation were calculated from these ten samples. The significance of the observed amino acid substitution matrix for the different topological sites was also tested.
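The residue grouping and the resampling estimate described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' Perl script: the function names are hypothetical, variants are assumed to be given as (wild-type, mutant) one-letter codes, and the "90% chosen by chance" step is assumed to mean sampling without replacement. The BED conversion, terminal/loop labelling and intersectBed overlap are not reproduced here.

```python
import random
import statistics
from collections import Counter

# Residue classes exactly as listed in the protocol text (one-letter codes).
POLAR = set("NQSTY")
NONPOLAR = set("ACGILMFPWV")
CHARGED = set("RDEHK")


def residue_class(aa):
    """Return 'polar', 'non-polar' or 'charged' for a one-letter amino acid code."""
    if aa in POLAR:
        return "polar"
    if aa in NONPOLAR:
        return "non-polar"
    if aa in CHARGED:
        return "charged"
    raise ValueError(f"unclassified residue: {aa!r}")


def class_change(wt, mut):
    """Label a substitution by residue class, e.g. 'non-polar>charged'."""
    return f"{residue_class(wt)}>{residue_class(mut)}"


def bootstrap_fractions(variants, fraction=0.9, rounds=10, seed=0):
    """Randomly draw `fraction` of the variants (without replacement, an
    assumption) `rounds` times and return {change_type: (mean, sd)} of the
    relative frequency of each class-change type across the rounds."""
    rng = random.Random(seed)
    k = int(len(variants) * fraction)
    per_round = []
    for _ in range(rounds):
        sample = rng.sample(variants, k)
        counts = Counter(class_change(wt, mut) for wt, mut in sample)
        per_round.append({t: c / k for t, c in counts.items()})
    # A type missing from a round contributes a frequency of 0 for that round.
    types = sorted({t for r in per_round for t in r})
    return {
        t: (statistics.mean(r.get(t, 0.0) for r in per_round),
            statistics.stdev(r.get(t, 0.0) for r in per_round))
        for t in types
    }
```

For example, `bootstrap_fractions([("G", "R"), ("L", "P"), ("S", "L")] * 20)` returns a mean and standard deviation for each of the class-change types present. The same function can be applied to exact substitutions (e.g. `G>R`) by replacing `class_change` with a plain `f"{wt}>{mut}"` label. Drawing with replacement would give the classical bootstrap; the subsampling variant above follows the wording of the protocol.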
For each of the three topological sites (inside, membrane, outside), a random position was chosen for every observed mutation site from the amino acid sequences located within that region. The observed substitution rates were then used to construct a random substitution matrix for the amino acid changes. This procedure was repeated a hundred times, the mean and standard deviation were determined for all substitutions, and the significance of the observed values was assessed against them. The distance of the glycine-to-arginine variations from the center of the transmembrane region was computed by a Perl script. Enrichment analyses were performed with the WebGestalt web service [] using the default options and the hsapiens__entrezgene_protein-coding reference set. The EMBOSS software package, version 6.3.1, was used to manipulate the raw protein sequences and to obtain protein sequence information []. The Perl scripts can be downloaded from the following web page:

For the investigation of the distribution of mutations in the 3D structures of TMPs, the polymorphisms and disease-associated mutations were mapped onto the 3D structures of TMPs, oriented so that the membrane normal was parallel to the z-axis and the zero point was in the middle of the lipid bilayer. The information for the necessary rotation was taken from the PDBTM database []. The proteins were cut into 1 Å wide slices parallel to the membrane plane, and the numbers of polymorphisms, disease-associated mutations and all residues in each slice were summed over all TMPs having a homologous structure in the PDBTM database. The relative frequencies of mutations were calculated by dividing these sums by the total number of residues in each slice. […]
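The positional calculations in the paragraphs above can be sketched as follows. All function names are hypothetical. Assumptions: TM segments are given as inclusive 1-based sequence coordinates, each residue is represented by a single z-coordinate (for example its Cα atom) already rotated into the PDBTM membrane frame, and slices are aligned so that slice 0 covers 0 ≤ z < 1 Å. The protocol does not name its significance test; the z-score below is one common choice, shown only as an illustration.

```python
import math
import statistics
from collections import Counter


def distance_from_tm_center(position, tm_start, tm_end):
    """Distance (in residues) of a sequence position from the midpoint of a
    TM segment given by inclusive start/end coordinates."""
    return abs(position - (tm_start + tm_end) / 2)


def slice_index(z, width=1.0):
    """Index of the slice parallel to the membrane plane that contains z (Å)."""
    return math.floor(z / width)


def relative_frequencies(all_residue_z, mutated_z, width=1.0):
    """Per-slice fraction of residues carrying a mutation.

    all_residue_z: z-coordinates of all residues, pooled over all structures
                   (membrane normal along z, z = 0 in the bilayer middle).
    mutated_z:     z-coordinates of the mutated residues (run once for
                   polymorphisms and once for disease-associated mutations).
    Returns {slice_index: mutations_in_slice / residues_in_slice}.
    """
    totals = Counter(slice_index(z, width) for z in all_residue_z)
    hits = Counter(slice_index(z, width) for z in mutated_z)
    return {s: hits.get(s, 0) / n for s, n in sorted(totals.items())}


def z_score(observed, random_counts):
    """Standardized difference between an observed substitution count and the
    counts obtained from the randomized runs (an assumed test)."""
    mu = statistics.mean(random_counts)
    sd = statistics.stdev(random_counts)
    return (observed - mu) / sd
```

For instance, with residues at z = -1.5, -0.4, 0.2, 0.7 and 1.3 Å and mutations at 0.2 and 1.3 Å, `relative_frequencies` assigns a frequency of 0.5 to slice 0 and 1.0 to slice 1. Pooling the coordinates of all structures before counting matches the protocol's summing of counts per slice across proteins.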

Pipeline specifications

Software tools: BEDTools, WebGestalt, EMBOSS
Databases: PDBTM
Application: Membrane protein analysis
Organisms: Homo sapiens
Diseases: Mitochondrial Diseases
Chemicals: Glycine