Computational protocol: Particle alignment reliability in single particle electron cryomicroscopy: a general approach

Protocol publication

[…] The objective of this work is to provide a statistical analysis that, without using tilt pairs, gives objective information about the consistency between a reconstructed 3D map and the set of 2D projections (or 2D classes) used to reconstruct it. Our work is based on studying, for each experimental projection image, the weighted orientation distribution of its most similar map projections, according to a significant value. These map projections are obtained by projecting the volume onto a regular angular grid; in the following parts of this paper we refer to them as reference images. The similarity metric needed to quantify the likeness between the projection and reference images can be the well-known normalized correlation or any other similarity metric, such as IMED. In this work we have used a probabilistic approach to calculate the similarity, or weight, between the projection and reference images, but other methods, such as RELION, could also be used.

In order to analyze whether a projection image is consistent with a 3D density map, we can study the weighted orientation dispersion, or clustering tendency, of its most similar reference images, according to a significant value. As a first illustrative example, consider a case where, for each experimental projection image, the set of most similar references has a very clustered angular spread, with a weight (or similarity value) that is very high at the cluster center and decreases smoothly and quickly away from it. In this situation, the orientation determination of the projection images should be very reliable, and the final computed 3D map will likely be correct up to a certain resolution, which depends on the number and the angular distribution of the projection images that entered into its calculation.
At the other extreme, we may think of a situation in which, for each projection image, the weighted angular distribution of the set of most similar references (according to a certain significant value) is completely scattered. In that case we would be unable to assign a reliable orientation to each projection image, and the final 3D map could not possibly be correct.

In order to quantify these two critical scenarios, and the cases in between, we have used a weighted clustering tendency parameter inspired by the Hopkins statistic. The Hopkins statistic of a set S of M points is defined as

$$H = \frac{\sum_{i=1}^{M} \alpha_i}{\sum_{i=1}^{M} \alpha_i + \sum_{i=1}^{M} u_i} \qquad (1)$$

where $\alpha_i$ is the distance between point i and its closest point in the point set S, and $u_i$ is the distance between point i and its closest point in an equivalent uniform random distribution over the projection sphere composed of the same number of points as S. This statistic examines whether the objects in a dataset differ significantly from the assumption that they are uniformly distributed in a multidimensional space. Observe that H provides important clustering information: a value close to zero means a tightly clustered point set, while a value close to 0.5 corresponds to no clustering. In our case, we have to analyse the clustering tendency of points distributed over the unit sphere (the orientation distribution). In Fig. […] we show an example of a clustered orientation distribution over the unit sphere (a) and a non-clustered one (b).

In this work, we use the geodesic distance over the unit sphere as the metric to obtain the distance from one point i to its closest one, given as

$$\alpha_i = \arccos\left(\mathbf{r}_i \cdot \mathbf{r}_j\right) \qquad (2)$$

and, for an equivalent uniform randomly distributed point set,

$$u_i = \arccos\left(\mathbf{r}_i^{u} \cdot \mathbf{r}_j^{u}\right) \qquad (3)$$

where $\mathbf{r}_i$ and $\mathbf{r}_j$ are an arbitrary point and its closest one, both belonging to S, while $\mathbf{r}_i^{u}$ and $\mathbf{r}_j^{u}$ have the same meaning as $\mathbf{r}_i$ and $\mathbf{r}_j$ but belong to a uniform random distribution on the projection sphere. Note that these are unit vectors. Observe that in Eq. (2) and Eq. (3)
there is no information about the distribution of the weight or similarity values. As explained above, it is desirable that this distribution is also structured; in other words, we want the closest points to have high and similar similarity values. Therefore, we define weighted versions of Eq. (2) and Eq. (3) as […] with $w_i$ and $w_j$ being the similarity or weight values of the corresponding orientations i and j. Note that Eq. (5) uses the same similarity values as Eq. (4). Introducing weights in Expressions (4) and (5) is key to making our alignment evaluation approach robust against noise (please see […] for further information).

Finally, we can define a weighted clustering tendency parameter as

$$H^{w} = \frac{\sum_{i=1}^{M} \alpha_i^{w}}{\sum_{i=1}^{M} \alpha_i^{w} + \sum_{i=1}^{M} u_i^{w}} \qquad (6)$$

where $\alpha_i^{w}$ and $u_i^{w}$ denote the weighted distances of Expressions (4) and (5). From Eq. (6) we can obtain the weighted clustering tendency parameter for uniformly random distributed points as

$$H_u^{w} = \frac{\sum_{i=1}^{M} u_i'^{\,w}}{\sum_{i=1}^{M} u_i'^{\,w} + \sum_{i=1}^{M} u_i^{w}} \qquad (7)$$

where $u_i'^{\,w}$ comes from a realization, different from that of $u_i^{w}$, of the random process that scatters a set of projections on the unit sphere.

For each projection image k, using Eq. (6) and Eq. (7), we can estimate the empirical distributions of $H^{w}$ and $H_u^{w}$ with a Monte Carlo sampler (in our case, we sample 100 times). From these distributions, we determine the corresponding cumulative density functions as […] From Expressions (8) and (9), we define the inverse cumulative density functions, which for each percentile give the corresponding $H^{w}$ and $H_u^{w}$ values. With these inverse functions we obtain, for each projection image, the clustering ratio $P_k$ as […] Once $P_k$ has been computed for all the projection images using Eq. (10), we can define a map consistency or quality parameter Q as […] Using this volume consistency, or alignment precision, quality parameter Q, we establish that a volume is not reliable if $Q < Q_0$, taking into account the alignment procedure, the significant value and the angular sampling used, which are input parameters of the proposed approach.
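The unweighted part of this construction can be illustrated with a minimal sketch. It assumes the Hopkins-type form H = Σα / (Σα + Σu), which matches the behaviour stated in the text (close to 0 for a tight cluster, close to 0.5 for no clustering), and it takes both nearest-neighbour distances as geodesic distances on the unit sphere. The weights, the clustering ratio and Q are deliberately left out, because their exact expressions are not reproduced here. All function names and the synthetic point sets are hypothetical and only serve the illustration; the 100 Monte Carlo draws follow the number quoted in the text.

```python
import math
import random
import statistics


def random_unit_vector(rng):
    """One point drawn uniformly on the unit sphere (normalised Gaussian vector)."""
    x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)


def nearest_geodesic(points):
    """Geodesic distance (radians) from each unit vector to its closest other point."""
    distances = []
    for i, p in enumerate(points):
        # The closest point on the sphere is the one with the largest dot product.
        best = max(
            p[0] * q[0] + p[1] * q[1] + p[2] * q[2]
            for j, q in enumerate(points)
            if j != i
        )
        distances.append(math.acos(max(-1.0, min(1.0, best))))
    return distances


def hopkins_samples(points, n_draws=100, seed=0):
    """Monte Carlo samples of the assumed form H = sum(alpha) / (sum(alpha) + sum(u))."""
    rng = random.Random(seed)
    alpha_sum = sum(nearest_geodesic(points))
    samples = []
    for _ in range(n_draws):
        # Fresh uniform realization with the same number of points as the input set.
        uniform = [random_unit_vector(rng) for _ in points]
        u_sum = sum(nearest_geodesic(uniform))
        samples.append(alpha_sum / (alpha_sum + u_sum))
    return samples


def clustered_points(m, spread, seed=1):
    """Hypothetical test data: m points scattered tightly around the north pole."""
    rng = random.Random(seed)
    points = []
    for _ in range(m):
        x, y, z = rng.gauss(0, spread), rng.gauss(0, spread), 1.0 + rng.gauss(0, spread)
        norm = math.sqrt(x * x + y * y + z * z)
        points.append((x / norm, y / norm, z / norm))
    return points


if __name__ == "__main__":
    tight = clustered_points(60, spread=0.02)
    rng = random.Random(2)
    loose = [random_unit_vector(rng) for _ in range(60)]
    print("clustered, median H:", statistics.median(hopkins_samples(tight)))
    print("uniform,   median H:", statistics.median(hopkins_samples(loose)))
```

With these synthetic inputs the clustered set should give a median H far below the uniform one, which should sit near 0.5. Comparing the two sampled distributions, rather than two single values, is what the percentile-based clustering ratio in the text builds on.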
In our experiments, we chose $Q_0 = 0.75$ when using typical values of the significant value and the angular sampling of about 0.05 and 10 degrees, respectively. Note that the tilt-pairs validation approach establishes as its angular validity criterion that at least 60% of the particles must show a single cluster. This criterion implies a $Q_0$ threshold of 0.8 (please see […] for further details). However, in the tests performed in our work we found a large number of cases where the 3DEM map is correct, at least at medium resolution, with Q values close to 0.75. Therefore, we have decided to set the Q threshold value to $Q_0 = 0.75$.

Fig. […] presents a schematic flow diagram explaining how P and Q are calculated in practice. The inputs of the algorithm are the 3DEM map and the projection images used to reconstruct the map (or a smaller subset that samples the projection sphere uniformly). The projection images are first CTF-corrected by Wiener filtering, where the CTF is obtained using the Xmipp method. Second, the projection images are aligned using any method that assigns to each projection image the set of its most likely volume orientations (together with the respective weights or likelihoods), according to a significant value. After this, $H^{w}$ and $H_u^{w}$ are obtained for each projection image using Eq. (6) and Eq. (7), and then $P_k$ through Eq. (10). This process is repeated for all images and, finally, Q is calculated using Eq. (11).

The proposed map alignment precision evaluation should be performed with the same projection images used to reconstruct the map or, alternatively, with a smaller randomly selected subset that samples the projection sphere adequately. Note that if the volume is reconstructed from raw projection images but the alignment evaluation is performed with class averages, the resulting P and Q values will be overestimated.
This overestimation is not mainly due to the higher SNR of the class averages with respect to the raw projection images, but rather to the fact that class averages have significantly higher spatial coherence than raw projection images, as the 2D/3D alignment process is never perfect. […]

GroEL is considered to be a very difficult case for blind initial map determination algorithms, as the top and side views have similar size and it is difficult to decide automatically which is the side view and which is the top view. Indeed, these methods may get stuck in a local minimum, providing wrong low-resolution 3D maps. We used the GroEL dataset publicly available as the EMAN2 tutorial (http://blake.bcm.edu/emanwiki/Ws2011/Eman2), composed of 26 micrographs of size 4082 × 6278 pixels. The sampling rate was 2.10 Å/pixel and the microscope voltage 200 kV. From this dataset, we detected 4,123 particles of size 128 × 128 px using the methods presented in […], and 16 classes were determined using CL2D. After this processing, we used the RANSAC initial map determination approach, which provided us with ten different maps. From this initial map set we picked two: one that clearly was not a correct initial map for GroEL, and another that appeared to be correct (up to a certain spatial resolution). We then randomly selected a subset of 1000 projection images and ran the proposed evaluation approach using the two selected initial maps. As input parameters of the proposed method we used a significant value of 0.05 and an angular sampling of 5 degrees. In Fig. […], we show some of the 2D projection images that have been used (a) and the two maps obtained by RANSAC, the “correct” one (b) and the “incorrect” one (c).
From Fig. […], it is very clear that map (b) is a “correct” one, while (c) is not.

We have run the proposed evaluation approach with these 2D projection images and maps; in Fig. […] we show the $P_k$ obtained for the “correct” volume (solid black curve) and for the “incorrect” one (dashed gray curve). Additionally, we have obtained quality parameters Q of 0.70 and 0.82 for the “incorrect” and “correct” models, respectively. Observe that the “correct” volume has a Q value higher than our acceptance threshold, while the “incorrect” one has a lower value. […]
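The acceptance decision for the two GroEL maps can be written out as a short check. This is only a sketch of the thresholding step: the Q values and the threshold of 0.75 are the ones quoted in the text, while the function name `is_reliable` and the treatment of the boundary case (Q exactly equal to the threshold counts as reliable) are assumptions made for the example.

```python
Q0 = 0.75  # acceptance threshold quoted in the text


def is_reliable(q, q0=Q0):
    """Assumed decision rule: a map is flagged as not reliable when Q < Q0."""
    return q >= q0


# Q values reported in the text for the two RANSAC maps of the GroEL test.
reported_q = {"correct": 0.82, "incorrect": 0.70}

for label, q in reported_q.items():
    verdict = "accepted" if is_reliable(q) else "rejected"
    print(f"{label} map: Q = {q:.2f} -> {verdict}")
```

With the stricter threshold of 0.8 implied by the tilt-pairs criterion, the “correct” map (Q = 0.82) would still pass, so the outcome of this particular test does not depend on which of the two thresholds is used.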

Pipeline specifications

Software tools: RELION, Xmipp, EMAN
Application: cryo-EM