Computational protocol: Consensus Bayesian assessment of protein molecular mass from solution X-ray scattering data

Protocol publication

[…] The Porod invariant, $Q_p$, is an important characteristic of the scattering intensity and is defined as

$$Q_p = \int_{0}^{\infty} s^2 I(s)\,\mathrm{d}s \tag{1}$$

where $I(s)$ is the intensity at the momentum transfer $s = 4\pi \sin(\theta)/\lambda$, with $\lambda$ being the wavelength of the X-rays and $2\theta$ the scattering angle. Assuming that the particle has a uniform scattering length density, its excluded volume, $V_p$, can be obtained via

$$V_p = \frac{2\pi^2 I(0)}{Q_p} \tag{2}$$

However, the contributions at both limits of the integral in eq. (1) have to be determined via extrapolation. At the lower limit, $I(0)$ is determined using the Guinier approximation. At the upper limit, the integral is evaluated up to $sR_g = 8$, with an additional extrapolation to infinity. Finally, the mass estimate is obtained by dividing the volume by 1.37. This method, which we call MMQp, is not the same as that implemented in DATPOROD of the ATSAS suite, which applies additional correction factors and yields otherwise worse MM estimates (see Supplementary Material). [...]

Bayesian inference is a way to infer the probabilities of potential values of an unknown quantity (hypotheses, H) by combining known pieces of information (evidence, E) through Bayes' theorem. Here, we consider the MM of the protein to be the unknown quantity, and the MM estimates of the four concentration-independent methods, MoW, Vc, Size&Shape and MMQp, to be the sources of information, or evidence. We infer the probabilities of potential values of the MM (…, H = 10 kDa, H = 11 kDa, …, H = 99 kDa, H = 100 kDa, …) using Bayes' theorem:

$$P(H = ?\,\mathrm{kDa} \mid E_{\mathrm{MMQp}}, E_{\mathrm{Vc}}, E_{\mathrm{MoW}}, E_{\mathrm{Size\&Shape}}) = \frac{P(E_{\mathrm{MMQp}} \mid H)\,P(E_{\mathrm{Vc}} \mid H)\,P(E_{\mathrm{MoW}} \mid H)\,P(E_{\mathrm{Size\&Shape}} \mid H)\,P(H)}{P(E)} \tag{6}$$

Therefore, the probability that the MM of the protein has a certain value (for instance H = 10 kDa), given the evidence of the estimates, is obtained by multiplying the respective probabilities of the observed evidence $E_{\mathrm{MMQp}}$, …, $E_{\mathrm{Size\&Shape}}$ given the hypothesis by a predefined starting probability for the hypothesis, the prior $P(H)$, and dividing by $P(E)$.
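As a numerical aside on the MMQp estimate defined by the Porod invariant and excluded volume above, the calculation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `porod_mass`, the constant-intensity treatment below the first data point, and the $K/s^4$ Porod tail used for the extrapolation to infinity are all assumptions, since the text does not specify how the extrapolations are carried out. The divisor 1.37 is taken from the text, and units are not checked here. The input curve is synthetic (a pure Guinier curve), not measured data.

```python
import numpy as np

def porod_mass(s, intensity, i0, divisor=1.37):
    """Hypothetical helper: Porod invariant -> excluded volume -> mass estimate.

    s, intensity : sampled scattering curve (truncated by the caller, e.g. at s*Rg = 8)
    i0           : forward scattering I(0), e.g. from a Guinier fit
    divisor      : volume-to-mass divisor quoted in the text
    """
    s = np.asarray(s, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    y = s**2 * intensity

    # Measured part of the integral: plain trapezoidal rule.
    q_measured = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(s))

    # ASSUMPTION: below the first point, I(s) is treated as constant at I(0),
    # so the integral of s^2 * I(0) from 0 to s[0] is I(0) * s[0]^3 / 3.
    q_low = i0 * s[0] ** 3 / 3.0

    # ASSUMPTION: beyond the last point, I(s) = K / s^4 (Porod law), so the
    # integral of K / s^2 from s_max to infinity is K / s_max = I_max * s_max^3.
    q_high = intensity[-1] * s[-1] ** 3

    q_p = q_low + q_measured + q_high      # Porod invariant
    v_p = 2.0 * np.pi**2 * i0 / q_p        # excluded (Porod) volume
    return q_p, v_p, v_p / divisor

# Synthetic example: Guinier-shaped curve with I(0) = 1 and Rg = 3, sampled up to s*Rg = 8.
rg = 3.0
s = np.linspace(0.005, 8.0 / rg, 2000)
intensity = np.exp(-(s * rg) ** 2 / 3.0)
q_p, v_p, mm = porod_mass(s, intensity, i0=1.0)
print(f"Qp = {q_p:.5f}, Vp = {v_p:.2f}, MM estimate = {mm:.2f}")
```

For real data the quality of the result depends mostly on the two extrapolations, so both assumed terms would need to be replaced by whatever scheme the analysis actually uses.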
Here we take all possible MMs to be equally likely, and therefore the prior $P(H)$ has a uniform distribution. The $P(E)$ in eq. (6) is a normalizing term, the sum of the evidence distributions. This leaves the probabilities of the observed evidence, $P(E \mid H)$, to be determined; in other words, each single MM estimate has to be converted into a probability distribution. This procedure is conceptualized in Fig. . Here, the actual MMs (as calculated by the program CRYSOL) that correspond to an estimate of $\mathrm{MM}_{\mathrm{MMQp}} = 50$ kDa are coloured in red. These values are subsequently binned (see below) to produce a probability distribution (inset Fig. ). This procedure is repeated for all methods, yielding a total of four so-called likelihood distributions: $P(E_{\mathrm{MMQp}} \mid H)$, $P(E_{\mathrm{Vc}} \mid H)$, $P(E_{\mathrm{MoW}} \mid H)$ and $P(E_{\mathrm{Size\&Shape}} \mid H)$, which are multiplied together.

Figure 2

Of note, the outlined calculations have to be repeated for all hypotheses H individually. A plot of $P(H \mid E)$ vs. H then yields the most likely MM, i.e. the Bayesian MM estimate, and a credible MM range, i.e. the interval in which the real value is most likely to be found (Fig. ). […]
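The procedure described above (bin the true MMs that correspond to each observed estimate, multiply the four resulting distributions, normalize, then read off the maximum and a credible range) can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: the reference table is randomly generated rather than computed with CRYSOL, and the helper names, the 2 kDa bin width, the ±5% selection window, the small floor on empty bins and the 90% credible level are all hypothetical choices made only for the example.

```python
import numpy as np

def binned_likelihood(true_mm, est_mm, observed, edges, window=0.05):
    """Histogram of true MMs whose estimate lies within +/- window (relative)
    of the observed estimate; used as the P(E|H) term, following the text."""
    near = np.abs(est_mm - observed) <= window * observed
    counts, _ = np.histogram(true_mm[near], bins=edges)
    counts = counts.astype(float) + 1e-9   # ASSUMPTION: floor so an empty bin in one method is not exactly zero
    return counts / counts.sum()

def posterior(likelihoods):
    """Product of the likelihoods with a uniform prior, normalized over all bins."""
    post = np.ones_like(likelihoods[0])    # uniform prior: a constant that cancels on normalization
    for lik in likelihoods:
        post = post * lik
    return post / post.sum()               # division by the normalizing term P(E)

def credible_interval(post, edges, level=0.9):
    """Outer edges of the highest-probability bins that together hold `level` of the mass."""
    order = np.argsort(post)[::-1]
    n_bins = np.searchsorted(np.cumsum(post[order]), level) + 1
    chosen = order[:n_bins]
    return edges[chosen.min()], edges[chosen.max() + 1]

# SYNTHETIC reference table: true MMs plus noisy per-method estimates (illustrative only).
rng = np.random.default_rng(0)
true_mm = rng.uniform(10.0, 200.0, 20000)
spread = {"MMQp": 0.08, "Vc": 0.10, "MoW": 0.12, "Size&Shape": 0.10}
reference = {m: true_mm * (1.0 + rng.normal(0.0, sd, true_mm.size)) for m, sd in spread.items()}

# Hypothetical observed estimates (kDa) for one protein.
observed = {"MMQp": 50.0, "Vc": 54.0, "MoW": 47.0, "Size&Shape": 52.0}

edges = np.arange(10.0, 202.0, 2.0)        # hypothesis bins: 10-12, 12-14, ..., 198-200 kDa
likelihoods = [binned_likelihood(true_mm, reference[m], observed[m], edges) for m in observed]
post = posterior(likelihoods)

centers = 0.5 * (edges[:-1] + edges[1:])
mm_map = centers[np.argmax(post)]          # most probable bin: the consensus MM estimate
lo, hi = credible_interval(post, edges)
print(f"Bayesian MM estimate: {mm_map:.0f} kDa, credible range: {lo:.0f}-{hi:.0f} kDa")
```

Because the prior is uniform, it drops out on normalization, so the shape of the posterior is determined entirely by the product of the four binned distributions.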

Pipeline specifications

Software tools: ATSAS, CRYSOL
Applications: Small-angle scattering, Protein structure analysis
Organisms: Dipturus trachyderma