Computational protocol: Predicting RNA-binding sites from the protein structure based on electrostatics, evolution and geometry

Protocol publication

[…] The percentage aa accessibility is defined as the percent ratio of the solvent-accessible surface area (SASA) of the side chain X in the protein to the SASA of X in the tripeptide –Gly–X–Gly–. As in previous studies, an aa with a relative SASA >5% is considered accessible for interacting with RNA, whereas one with a relative SASA ≤5% is deemed buried and inaccessible to an RNA molecule. The MOLMOL program was used to compute the relative SASA of each aa from the protein structure using a solvent probe radius of 1.4 Å.

[...] Each residue was assigned an ‘electrostatic rank’ (denoted as $\mathrm{Rank}_{\mathrm{ele}}^{i}$) based on whether it and its surrounding residues became electrostatically stabilized upon mutation to Asp−/Glu−. Thus, given the 3D structure of an l-residue RNA-binding protein, l mutant structures were generated by mutating each wild-type aa to Asp− or Glu−, depending on its size and shape: Ala, Asn, Asp, Cys, Gly, Ser, Thr and Val were mutated to Asp−, while all other residues were mutated to Glu−. The side-chain replacements were carried out using the SCWRL program, which identifies the most common side-chain χ1 and χ2 angles for the mutant Asp−/Glu− residue corresponding to the backbone ϕ and ψ angles of the wild-type residue at that position. Each mutant structure was then energy minimized with heavy constraints on all non-hydrogen atoms using the AMBER program to relieve bad contacts.

Having generated the l mutant structures, the gas-phase electrostatic energy of the wild-type or mutant protein in the folded state relative to that in an extended reference state was computed.
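The accessibility cutoff and the Asp−/Glu− substitution rule described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the SASA values are assumed to come from an external program (MOLMOL in the protocol) rather than being computed here.

```python
# Illustrative sketch only; names are hypothetical and not taken from the
# protocol's software. SASA values are assumed to be supplied by the caller.

# Residue types mutated to Asp-; every other residue type is mutated to Glu-.
ASP_TARGETS = {"ALA", "ASN", "ASP", "CYS", "GLY", "SER", "THR", "VAL"}

ACCESSIBILITY_CUTOFF = 5.0  # percent relative SASA, as stated in the text


def relative_sasa(sasa_in_protein: float, sasa_in_tripeptide: float) -> float:
    """Percent ratio of side-chain SASA in the protein to that in Gly-X-Gly."""
    if sasa_in_tripeptide <= 0:
        raise ValueError("reference SASA must be positive")
    return 100.0 * sasa_in_protein / sasa_in_tripeptide


def is_accessible(rel_sasa_percent: float) -> bool:
    """A residue counts as accessible only if its relative SASA exceeds 5%."""
    return rel_sasa_percent > ACCESSIBILITY_CUTOFF


def mutation_target(resname: str) -> str:
    """Return the three-letter code of the acidic residue used for the mutant."""
    return "ASP" if resname.upper() in ASP_TARGETS else "GLU"
```

For example, `mutation_target("LYS")` gives `"GLU"`, and a residue with a relative SASA of exactly 5% is classified as buried.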
In this extended reference state, the residues do not interact with one another, hence the electrostatic energy of the wild-type or mutant unfolded protein is simply the sum of the individual residue energies, and their difference is equal to the difference between the electrostatic energies of the native residue at position i ($E^{\mathrm{wt}}_{i}$) and the corresponding mutant Asp−/Glu− residue ($E^{\mathrm{mut}}_{i}$). Thus, the change in the gas-phase electrostatic energy upon mutating aa i to Asp−/Glu− is given by:

$$
\Delta\Delta E_{\mathrm{elec}}^{i} = \left(E^{\mathrm{mut},i}_{\mathrm{fold}} - E^{\mathrm{wt}}_{\mathrm{fold}}\right) - \left(E^{\mathrm{mut}}_{i} - E^{\mathrm{wt}}_{i}\right) \qquad (1)
$$

where $E^{\mathrm{wt}}_{\mathrm{fold}}$ and $E^{\mathrm{mut},i}_{\mathrm{fold}}$ are the electrostatic energies of the folded wild-type protein and of the folded mutant in which aa i has been replaced. A negative $\Delta\Delta E_{\mathrm{elec}}^{i}$ implies that aa i is electrostatically stabilized upon mutation to Asp−/Glu−. The gas-phase electrostatic energies were computed with the all-hydrogen-atom AMBER force field with ε = 1 using the AMBER program.

Knowing $\Delta\Delta E_{\mathrm{elec}}^{i}$, the average electrostatic energy change of aa i and its surroundings, $\langle\Delta\Delta E_{\mathrm{elec}}\rangle_{i}$, was computed from:

$$
\langle\Delta\Delta E_{\mathrm{elec}}\rangle_{i} = \frac{1}{N_i}\sum_{j=1}^{N_i} \Delta\Delta E_{\mathrm{elec}}^{j} \qquad (2)
$$

where the summation in Equation (2) is over the $N_i$ residues that comprise aa i and all residues j whose Cα atoms are within 10 Å of the Cα atom of aa i. The l $\langle\Delta\Delta E_{\mathrm{elec}}\rangle_{i}$ values were then ordered from the most negative to the least negative/most positive and used to rank the l residues from 1 to 10, such that residues with the top 10% most negative $\langle\Delta\Delta E_{\mathrm{elec}}\rangle_{i}$ values were ranked 1, residues with the next 10% most negative values were ranked 2, etc. (Supplementary Table S1).

[...] Each residue was also assigned a ‘conservation rank’ based on the relative conservation of the residue and its surrounding residues. For the residue at position i in a given RNA-binding protein, a conservation score, $C_i$, was computed by the ConSurf program version 3.0. The $C_i$ score reflects the evolutionary rate of the residue at position i in the phylogenetic tree generated on the basis of a protein's homologous sequences.
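For the electrostatic rank, the neighbourhood averaging over residues within 10 Å and the subsequent 1-to-10 ranking can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names are hypothetical, "within 10 Å" is assumed to include a distance of exactly 10 Å, and the way ties and residue counts that are not multiples of ten are split into bins is an assumption.

```python
import math


def neighbourhood_average(values, ca_coords, cutoff=10.0):
    """Average values[j] over residue i and every residue j whose C-alpha
    lies within `cutoff` angstroms of the C-alpha of residue i."""
    averages = []
    for ci in ca_coords:
        # Residue i itself is always included (its distance to itself is 0).
        members = [v for v, cj in zip(values, ca_coords)
                   if math.dist(ci, cj) <= cutoff]
        averages.append(sum(members) / len(members))
    return averages


def decile_ranks(scores):
    """Rank residues 1..10: the 10% most negative scores get rank 1,
    the next 10% get rank 2, and so on (binning rule is an assumption)."""
    n = len(scores)
    order = sorted(range(n), key=lambda k: scores[k])  # most negative first
    ranks = [0] * n
    for position, idx in enumerate(order):
        ranks[idx] = position * 10 // n + 1
    return ranks
```

Usage would be `decile_ranks(neighbourhood_average(ddE, ca_coords))`, where `ddE` is the hypothetical list of per-residue energy changes and `ca_coords` the matching list of Cα coordinates as `(x, y, z)` tuples.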
The $C_i$ score is an integer ranging from 1 to 9, with 1 indicating a rapidly evolving and thus variable residue at position i, and 9 a slowly evolving, conserved residue.

Knowing the $C_i$ values, the average conservation of aa i and its surroundings, $\langle C\rangle_{i}$, was computed from:

$$
\langle C\rangle_{i} = \frac{1}{N_i}\sum_{j=1}^{N_i} C_{j} \qquad (3)
$$

where the summation in Equation (3) is over the $N_i$ residues that comprise aa i and all residues j whose Cα atoms are within 10 Å of the Cα atom of aa i. Residues were then ranked from 1 to 10 such that residues with the top 10% largest $\langle C\rangle_{i}$ values were ranked 1, residues with the next 10% largest $\langle C\rangle_{i}$ values were ranked 2, etc. (Supplementary Table S1).

[...] Given the 3D protein structure, the 10 largest clefts (comprising cavities and grooves) were found using the SURFNET program. The SURFNET algorithm detects clefts, i.e. gap regions, by fitting spheres into the spaces between any two atoms. If any atom of a residue was assigned as a constituent of a cleft by the SURFNET program, then this residue was regarded as a component of that cleft. When atoms of a residue were assigned to two different clefts, the residue was assigned to the larger of the two clefts. Residues constituting a given cleft were removed if their overall Rank was >5. Clefts with 10 or more solvent-accessible residues were considered as RNA-binding site candidates. However, if fewer than three cleft candidates were found, the minimum number of surface residues in the cleft was reduced by one successively until three or more candidates were found. […]
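The cleft filtering and the successive relaxation of the size threshold can be sketched as below. This is an illustration under stated assumptions, not the published implementation: the function name and data layout are hypothetical, the clefts are assumed to arrive ordered from largest to smallest, the overall rank of each residue is taken as given (its definition falls in an elided part of the text), and stopping at a threshold of one residue when three candidates cannot be found is an assumption.

```python
def select_cleft_candidates(clefts, overall_rank, accessible,
                            min_residues=10, min_candidates=3):
    """Return (candidate clefts, final residue threshold).

    clefts       -- lists of residue IDs, ordered largest cleft first (assumed)
    overall_rank -- dict mapping residue ID to its overall rank (1..10)
    accessible   -- set of solvent-accessible residue IDs
    """
    seen = set()
    filtered = []
    for residues in clefts:
        kept = []
        for res in residues:
            if res in seen:
                continue  # already assigned to a larger cleft
            seen.add(res)
            if overall_rank[res] <= 5:  # drop residues with overall rank > 5
                kept.append(res)
        filtered.append(kept)

    threshold = min_residues
    while True:
        candidates = [c for c in filtered
                      if sum(res in accessible for res in c) >= threshold]
        # Stop once enough candidates exist, or the threshold cannot go lower
        # (the lower bound of 1 is an assumption, not stated in the text).
        if len(candidates) >= min_candidates or threshold <= 1:
            return candidates, threshold
        threshold -= 1
```

Returning the final threshold alongside the candidates makes it easy to see how far the 10-residue requirement had to be relaxed for a given protein.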

Pipeline specifications