Computational protocol: Clustering procedures for the optimal selection of data sets from multiple crystals in macromolecular crystallography

Protocol publication

[…] BLEND can be executed in two modes: analysis and synthesis. Program execution is regulated by keywords included in an ASCII file. Input data are unmerged reflection files in MTZ or XDS format. The final output is a number of merged reflection files in MTZ format.

Initially, BLEND is run in analysis mode (option -a). The input is a list of data sets to be processed or the path to the directory containing the data. The program checks each data set in turn, making sure that it is properly formatted and includes integrated data taken in a continuous sweep. Data sets failing this check are discarded and an updated file list is stored as a new ASCII file. Next, it applies the unit-cell parameter filtering of §2.2 and performs the cluster analysis (§2.1). The output of the analysis mode is a cluster dendrogram and an associated ASCII file providing details of the clusters. Information obtained during execution in analysis mode is also dumped to a binary file, ready to be read by the program when running in synthesis mode. Users should both look at the dendrogram and read through its ASCII counterpart to decide which clusters of data sets are worth further investigation.

Once one or more clusters have been singled out, BLEND is run in synthesis mode (option -s). It will suffice to provide one or two numerical levels corresponding to cluster-height values (see §2.1 and Figure 1). The program will accordingly scale and merge together all data sets at each node included in the specific dendrogram region. Scaling and merging are carried out using the CCP4 programs POINTLESS and AIMLESS (Evans, 2006).

Execution in synthesis mode can be repeated with different numerical level values as many times as required. Specific keywords for either POINTLESS or AIMLESS can be added in BLEND keywords files so that execution in synthesis mode can be customized. BLEND can also be executed in combination mode (option -c) when the user wishes to combine data sets outside an existing cluster. This mode is particularly useful when reference data sets need to be included to stabilize the scaling of many incomplete ones, a practice that is quite often followed by protein crystallographers. Unmerged files produced by POINTLESS containing the most likely space group, or one selected by the user, are also stored, ready to be used for prospective and individual scaling jobs, separately from BLEND. BLEND requires only minimal and simple intervention from the user, but also permits both analysis and synthesis modes to be tailored to specific needs should the user desire. […]
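
The idea of selecting dendrogram nodes by one or two cluster-height levels can be illustrated with a small, self-contained Python sketch. It is not BLEND's implementation: the descriptors and linkage actually used are those of §2.1, which is not reproduced in this excerpt. Here, as assumptions made purely for illustration, each data set is described by three made-up unit-cell lengths, distances are Euclidean, clusters are joined by complete linkage, and a single level is read as "all nodes up to that height" while two levels are read as a height band. The names `build_dendrogram`, `select_nodes` and `cells` are hypothetical.

```python
from itertools import combinations
from math import dist


def build_dendrogram(points):
    """Toy complete-linkage clustering (an assumption, not BLEND's method).

    Returns the merge nodes in the order they are formed, each as
    (height, sorted list of data-set indices in the node).
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    nodes = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose farthest members are closest.
        for a, b in combinations(clusters, 2):
            d = max(dist(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        height, a, b = best
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        nodes.append((height, sorted(merged)))
    return nodes


def select_nodes(nodes, level, lower=0.0):
    """Return the members of every node whose height lies in [lower, level]."""
    return [members for height, members in nodes if lower <= height <= level]


# Hypothetical unit-cell lengths (a, b, c) for four data sets.
cells = [
    (78.0, 78.0, 37.0),
    (78.1, 78.1, 37.0),
    (79.0, 79.0, 37.5),
    (79.1, 79.0, 37.5),
]

if __name__ == "__main__":
    nodes = build_dendrogram(cells)
    for height, members in nodes:
        print(f"height {height:.3f}: data sets {members}")
    print("one level :", select_nodes(nodes, 0.5))
    print("two levels:", select_nodes(nodes, 2.0, lower=1.0))
```

With these toy values, a single low level picks out the two tight pairs of data sets, whereas a band of higher levels picks out only the node that joins all four. Each selected node corresponds to one group of data sets that would then be scaled and merged together.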

Pipeline specifications

Software tools: XDS, CCP4
Applications: Small-angle scattering, Protein structure analysis
Diseases: Elephantiasis