Computational protocol: Exploring the speed and performance of molecular replacement with AMPLE using QUARK ab initio protein models

Protocol publication

[…] For each sequence, QUARK (Xu & Zhang, 2012) produced 5000 individual structures. In the terminology of ab initio modelling, these low-resolution, rapidly obtained predictions are known as decoys. Briefly, the software first generated a set of structural fragments with lengths from one to 20 amino acids at each position of the query sequence. These fragments were used to assemble the ab initio models by replica-exchange Monte Carlo (REMC) simulations guided by a generic, atomic-level knowledge-based force field that takes into account various sequence-specific predicted structural features, including secondary-structure type, solvent accessibility and β-turn propensity. For each query, QUARK ran ten independent REMC simulations starting from different random numbers. In each run, 50 decoys were selected from each of the ten low-temperature trajectories, resulting in 5000 decoys in total. The decoys lacked the explicit side chains that full, more CPU-intensive modelling would add. PDB structures with a sequence identity of >30% to the target or detectable by PSI-BLAST (a criterion used by most of the ab initio folding benchmark tests; Zhang et al., 2003; Simons et al., 2001) were excluded from the fragment library.

Processing of decoys into search models used the AMPLE pipeline (Bibby et al., 2012). Briefly, decoys were clustered using SPICKER (Zhang & Skolnick, 2004) and centroid structures representing the three largest clusters were used to generate ensembles containing structural neighbours. Side chains were added to the ensembles using SCWRL (Canutescu et al., 2003; Krivov et al., 2009). The structural diversity within each ensemble predicts the deviation from the true structure (Qian et al., 2007; Bibby et al., 2012), and therefore the variance along the chain was quantified with THESEUS (Theobald & Wuttke, 2006) and used to derive up to 20 progressively more truncated versions of each ensemble.
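The variance-guided truncation described above can be pictured with a short sketch. It is not AMPLE's implementation: it assumes that per-residue variances (such as those reported by THESEUS) have already been parsed into a list, and that truncation levels are spaced at equal fractions of the chain. The function name `truncation_levels` and the equal-spacing rule are hypothetical choices made for illustration.

```python
def truncation_levels(variances, n_levels=20):
    """Return progressively smaller lists of residue indices to keep.

    variances: per-residue variance values (hypothetical input, for example
               parsed from a THESEUS run); index i corresponds to residue i.
    n_levels:  maximum number of truncated versions (20 in the protocol).
    """
    n = len(variances)
    # Residue indices ordered from least to most variable.
    order = sorted(range(n), key=lambda i: variances[i])
    levels = []
    seen = set()
    for k in range(n_levels):
        # Assumed spacing: keep an equally decreasing fraction of residues.
        keep = max(1, round(n * (n_levels - k) / n_levels))
        if keep in seen:
            continue  # short chains give fewer than n_levels distinct sets
        seen.add(keep)
        # Keep the `keep` least variable residues, in chain order.
        levels.append(sorted(order[:keep]))
    return levels


if __name__ == "__main__":
    toy_variances = [0.1, 5.0, 0.2, 3.0, 0.3]
    for residues in truncation_levels(toy_variances, n_levels=5):
        print(residues)
```

Each returned list names the residues retained at one truncation level; the most variable residues are dropped first, so later levels contain only the most conserved core of the ensemble.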
A sub-clustering step, collecting up to 30 near-centroid structures at 1, 2 and 3 Å r.m.s.d. thresholds, produced more structurally homogeneous ensembles from these results. After treatment of side chains in three different ways (all retained; only more easily predicted side chains retained and others trimmed to polyalanine; all trimmed to polyalanine) these subclusters became the set of search models. Hundreds of distinct search models may be produced for a single target. These are processed by MrBUMP (Keegan & Winn, 2008). In this work, only Phaser (McCoy et al., 2005, 2007; Storoni et al., 2004) was used for MR solution. Automated density modification and main-chain tracing with SHELXE (Sheldrick, 2008; Usón et al., 2007; Thorn & Sheldrick, 2013) were used to detect successful solutions, defined as those having a CC value of ≥25 and a mean traced chain-fragment length of ≥10. For Phaser, default parameters were used with the exception of the estimated r.m.s.d. error (see below). For SHELXE, the following options were used: 15 cycles of autotracing (-a15), searching for α-helices (-q), pruning for optimization of the CC for the input model (-o), the time factor for the helix and peptide search (-t3) and the ‘free-lunch’ option to add missing data up to 1.0 Å resolution if the data resolution was 1.8 Å or better (-e1.0). All other options were set to their default values. Mean phase error (MPE) values were calculated using CPHASEMATCH from the CCP4 suite (Winn et al., 2011). Here, focusing on overall success rates, AMPLE terminated after finding the first success.

For comparison with published data, QUARK-derived search models were run with Phaser 2.3.0 and SHELXE 2012 (Run 1). The estimated r.m.s.d. error of the search models was set to 0.1 Å, as previously (Bibby et al., 2012), or to 1.2 Å (Run 2 alone). QUARK-derived search models were also run with Phaser 2.5.4 and SHELXE 2013 (Run 3).
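The SHELXE settings and the success test stated above can be summarised in a few lines of Python. This is a sketch under stated assumptions rather than the code used by AMPLE or MrBUMP: the function names and the `(name, cc, mean_fragment_length)` result tuples are hypothetical, and only the option strings and the two thresholds are taken from the text.

```python
def shelxe_options(data_resolution):
    """Build the SHELXE option list described in the protocol.

    data_resolution: high-resolution limit of the data in Å.
    """
    # 15 autotracing cycles, helix search, pruning, time factor 3.
    opts = ["-a15", "-q", "-o", "-t3"]
    # Free-lunch extension to 1.0 Å only for data at 1.8 Å or better.
    if data_resolution <= 1.8:
        opts.append("-e1.0")
    return opts


def is_success(cc, mean_fragment_length):
    """Apply the success criterion: CC >= 25 and mean traced length >= 10."""
    return cc >= 25 and mean_fragment_length >= 10


def first_success(results):
    """Return the name of the first successful search model, or None.

    results: iterable of (name, cc, mean_fragment_length) tuples
             (a hypothetical layout chosen for this sketch).
    Stopping at the first hit mirrors the early termination in the text.
    """
    for name, cc, mean_len in results:
        if is_success(cc, mean_len):
            return name
    return None


if __name__ == "__main__":
    print(shelxe_options(1.5))
    print(shelxe_options(2.3))
    toy = [("model_a", 12.0, 6.5), ("model_b", 31.2, 14.0), ("model_c", 40.0, 20.0)]
    print(first_success(toy))
```

Because `first_success` returns on the first hit, later search models are never examined, which is the behaviour needed when only the overall success rate matters.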
Since MR is typically the slowest step in AMPLE, a requested time limit of 5 min for Phaser (KILL TIME 300 flag) was also tested (Run 4). In practice, since Phaser is only terminated at certain points in the code, this most commonly stops Phaser after 10–20 min. […]

Pipeline specifications

Software tools: BLASTP, SHELX
Application: Protein structure analysis