Computational protocol: Protruding knob-like proteins violate local symmetries in an icosahedral marine virus

Protocol publication

[…] Particles in various orientations were selected automatically using the swarm module in EMAN2, and false-positive particles were deleted manually. This produced an initial data set of 18,000 particles. The contrast transfer function parameters for each CCD image were determined manually using ctfit in EMAN1. An initial model was built from a small data set of 1,000 particles by assigning random orientations in multi-path simulated annealing. The particle orientations were then refined at an increasing resolution limit, starting at 50 Å and extending to 10 Å. Refinement was iterated until convergence to obtain the final map from ~12,000 particles. An FSC plot, referred to as FSCdata, was obtained between the two maps generated from randomly split even/odd data sets.
To validate the map resolution and assess any noise overfitting during refinement, the method of high-resolution (HR) noise substitution was used; the results are shown in the corresponding FSC figure. For this, a second stack was generated from the original experimental data set, in which data beyond 10 Å were replaced with noise by randomizing the phases. These HR noise-substituted data were then subjected to the same 3D reconstruction protocol as described above for the experimental data. An FSC plot, referred to as FSCnoise, was obtained between the two maps generated from the randomly split even/odd HR noise data sets. In the HR noise-substituted data, the FSC drops to zero past 10 Å, the point beyond which the data were substituted with noise, showing no significant noise overfitting (shaded blue area). An FSCtrue curve (black solid line) was plotted by calculating the relative error between FSCdata (pink dotted curve) and FSCnoise (blue dotted curve), as described previously. The true data with no overfitting are shaded pink in the same figure. The FSCtrue plot was used to estimate the resolution of the final map as 4.7 Å at FSC=0.143. 
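The FSCtrue calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes that "as described previously" refers to the commonly used noise-substitution correction FSCtrue = (FSCdata − FSCnoise) / (1 − FSCnoise), applied only beyond the substitution cutoff, and it reads the resolution off the curve by linear interpolation at the 0.143 threshold. The function names and the interpolation choice are hypothetical.

```python
def fsc_true(freqs, fsc_data, fsc_noise, cutoff_freq):
    """Correct FSCdata for overfitting using FSCnoise.

    freqs are spatial frequencies in 1/Angstrom. At or below cutoff_freq the
    data were not substituted, so FSCdata is returned unchanged. The formula
    is an assumption about the method the text cites.
    """
    corrected = []
    for s, ft, fn in zip(freqs, fsc_data, fsc_noise):
        if s <= cutoff_freq or fn >= 1.0:
            corrected.append(ft)
        else:
            corrected.append((ft - fn) / (1.0 - fn))
    return corrected


def resolution_at(freqs, fsc, threshold=0.143):
    """Return the resolution (Angstrom) where the curve first drops below threshold."""
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two bracketing shells.
            t = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            s = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return 1.0 / s
    return None  # the curve never crosses the threshold


if __name__ == "__main__":
    # Toy curves for illustration only; these are not values from the study.
    freqs = [0.05, 0.10, 0.20, 0.25]
    data = [1.0, 0.9, 0.3, 0.1]
    noise = [1.0, 0.9, 0.1, 0.0]
    curve = fsc_true(freqs, data, noise, cutoff_freq=0.10)
    print(curve, resolution_at(freqs, curve))
```

In practice the correction is often started a couple of Fourier shells past the cutoff rather than exactly at it; the sketch ignores that detail.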
We applied experimentally determined structure factors to the map for sharpening, limited to the reported resolution limit of 4.7 Å. [...] UCSF Chimera was used for map visualization, analysis and generation of the molecular graphics images. The segmentation of the densities corresponding to the major capsid protein and the outer capsid proteins was done using Chimera and Avizo. To generate an average of the six hexameric subunits in one asymmetric unit for model building purposes, their corresponding densities were aligned with the Foldhunter program, and an average was calculated with proc3d in EMAN1. A pairwise FSC was calculated between the seven computationally segmented subunits in an asymmetric unit of the icosahedral map, where no symmetry is applied, to measure the correlation among the gp39 subunits within one asymmetric unit. [...] Various bioinformatic tools were used to analyse the sequences of the gp39, gp55, gp57 and gp58 proteins. The PSIPRED and Jpred servers were used for multiple sequence alignment and secondary structure prediction, while physical and chemical parameters such as molecular weight, amino-acid composition, instability index and hydrophobicity were calculated using the ProtParam and PredictProtein servers.
The knob proteins gp55/gp58, being farthest from the centre and therefore subject to the highest alignment errors, are poorly resolved compared with the major capsid proteins; hence we did not build models for these proteins. Moreover, the capsid surface of Syn5 is thin and smoother, as seen in the corresponding figure, compared with other known phage structures such as ε15 and P22, leaving fewer features to align at the extreme radius of the capsid shell. However, we were able to localize major secondary structure elements (SSEs) using SSEHunter in the map densities of gp55/gp58. Our analysis also hints that the protruding density of gp58 found at the opening of the hexamer is composed of two polypeptide chains. [...] 
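As a rough illustration of the kind of sequence-derived parameters mentioned above, the following sketch computes amino-acid composition and a mean hydropathy (GRAVY) score using the Kyte-Doolittle scale. It is not the ProtParam or PredictProtein implementation, and the sequence in the example is a made-up placeholder rather than any of the gp39, gp55, gp57 or gp58 sequences.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the fractional amino-acid composition of a one-letter sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


def gravy(seq):
    """Return the grand average of hydropathy (mean Kyte-Doolittle value)."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQR"  # hypothetical placeholder, not a Syn5 protein
    print(composition(toy_sequence))
    print(gravy(toy_sequence))
```

The sketch assumes the sequence contains only the 20 standard residues; any other letter raises a `KeyError` in `gravy`.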
For model building, each of the seven individual gp39 subunits from one asymmetric unit was cropped out of the full map using UCSF Chimera. Individual gp39 subunits were aligned with Foldhunter and then averaged using proc3d, both of which are available in EMAN1. Using the initial averaged gp39 density as a template, a second round of segmentation, alignment and averaging resulted in a final average gp39 subunit.
SSE identification was then performed on the averaged gp39 subunit using SSEHunter in Gorgon. Five helices and two β-sheets were identified and corresponded to those found in the capsid proteins of other tailed dsDNA bacteriophages, such as gp5 in HK97. In addition, a density skeleton was computed that revealed the topological linkages between the observed SSEs. Jpred 3.0 was then used to predict the secondary structure from the sequence, also revealing five helices and several β-strands.
Using Gorgon, an initial topology for gp39 was constructed by establishing a sequence-to-structure correspondence between the predicted and observed SSEs, using the density skeleton as a constraint. From this topology, a Cα backbone model was then constructed using Gorgon’s semi-automated model building tools. Briefly, Cα backbone α-helices were first constructed in the density at the positions found by SSEHunter using the Helix editor function in the ‘semi-automatic atom placement’ utility in Gorgon. Loops between the α-helices were then built using the Atom editor and Position editor functions in the same utility, which allows the user to place individual Cα backbone atoms along the density skeleton at a given spacing (~3.8 Å for Cα–Cα distances). Model building proceeded until the entire sequence of gp39 was placed within the density. Manual refinement of atom positions was done interactively in Gorgon to remove any potential clashes and correct bad Cα–Cα distances. 
The final model was saved as a PDB file.
To validate the model, we then used our Pathwalking protocol to determine whether the solution found in Gorgon was unique. A small amount of noise (sigma=1) was added to the positions of the initial Cα model using a utility from EMAN2, and 100 potential model paths were computed with Pathwalker. The LKH TSP solver was used to calculate these paths. The results were examined and compared in UCSF Chimera. In each case, the pathwalking model resulted in a continuous chain trace through the density map without any visible density crossovers. Topologically, all the models appeared similar, with some differences occurring in the first ~25 amino acids. For the purposes of the remaining modelling, the first 25 amino acids were truncated from the model. Manual refinement of Cα positions was done interactively in Gorgon to correct bad Cα–Cα distances. In addition, COOT was used to remove clashes within and between subunits in the asymmetric unit. The final model was saved as a PDB file. […]
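Two of the numerical steps above are simple enough to sketch: checking consecutive Cα–Cα distances against the ~3.8 Å spacing, and generating 100 noise-perturbed copies of a Cα trace (sigma=1) as starting points. This is an illustrative sketch only. It does not reproduce the EMAN2, Gorgon or Pathwalker tools and does not perform the TSP path search; the function names, the 0.3 Å tolerance, the use of isotropic Gaussian noise and the assumption that sigma is in Å are all hypothetical.

```python
import math
import random


def bad_ca_distances(ca_coords, ideal=3.8, tol=0.3):
    """Flag consecutive C-alpha pairs whose spacing deviates from `ideal`.

    ca_coords is an ordered list of (x, y, z) tuples for one gap-free chain.
    Returns a list of (index_of_first_atom, distance) for flagged pairs.
    """
    flagged = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - ideal) > tol:
            flagged.append((i, d))
    return flagged


def perturb_models(ca_coords, sigma=1.0, n_models=100, seed=0):
    """Return n_models copies of the trace with Gaussian noise on each coordinate."""
    rng = random.Random(seed)  # seeded so the perturbations are reproducible
    models = []
    for _ in range(n_models):
        model = [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in ca_coords]
        models.append(model)
    return models


if __name__ == "__main__":
    # Toy trace for illustration; the last pair is deliberately too far apart.
    trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 5.0, 0.0)]
    print(bad_ca_distances(trace))
    print(len(perturb_models(trace)))
```

Each perturbed model would then serve as an independent starting point for path computation; agreement among the resulting traces is what supports the uniqueness of the topology.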

Pipeline specifications

Software tools: EMAN, UCSF Chimera
Application: cryo-EM
Diseases: Protein Deficiency, HIV Infections
Chemicals: Carbon