Computational protocol: Iterative refinement of structure-based sequence alignments by Seed Extension

Protocol publication

[…] We used the CDD (v.2.07) "root node set" introduced in our previous work [] as the reference sequence alignments with corresponding SCOP domains. We chose this dataset because it is manually curated and because it includes many sequences that are sufficiently dissimilar that structure is needed for their accurate alignment. The five 'outlier' superfamilies (cd00651, cd01345, cd02156, cd01284, and cd02688), for which the CDD alignments were judged questionable as reference alignments [], were excluded. The composition of the dataset is described in Table .

We included CE (Algorithm 1.0, Alignment calculator 1.02) [], DaliLite_2.4.1 [], LOCK2 [], FAST [], MATRAS (version 1.2) [], MATT [], SHEBA-4.0 [], TM-align [] and VAST (directly from Dr. Gibrat) []. We also included SSEARCH from the FASTA3 package for pure sequence alignment [] and SALIGN from Modeller (mod9v6) for profile-profile alignment []. The input multiple alignments for SALIGN were prepared from PSI-BLAST alignments (BLASTPGP [] in the blast-2.2.20 package), allowing up to 20 iterations with an e-value cutoff of 0.0005 against the nr database (as of 04/19/2009). Up to 1,000 sequences with the most significant e-values were retained in the multiple sequence alignment. The parameter settings for PSI-BLAST were as described in Marti-Renom et al. []. Otherwise, default values were used for all the programs.

To measure the execution times of the methods, including the RSE procedure, time-stamps were recorded before and after the system calls to the executables. For the CPU time per refinement cycle with CDD alignments, the elapsed time from the end of the initial structure superposition to the end of the refinement cycles, which did not include file I/O time, was divided by the number of refinement cycles. The CPU times for each alignment were averaged over three independent runs. […]
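The timing procedure described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `time_command` and `time_per_cycle` are hypothetical, the command being timed is a placeholder, and the sketch records elapsed wall-clock time around a system call, which is an assumption about how the time-stamps were taken.

```python
import subprocess
import sys
import time
from statistics import mean


def time_command(cmd, runs=3):
    """Mean elapsed seconds over `runs` independent executions of `cmd`.

    A time-stamp is taken immediately before and after each system call,
    so the result is wall-clock time (an assumption of this sketch).
    """
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        elapsed.append(time.perf_counter() - start)
    return mean(elapsed)


def time_per_cycle(refinement_seconds, n_cycles):
    """Elapsed refinement time divided by the number of refinement cycles."""
    if n_cycles <= 0:
        raise ValueError("n_cycles must be a positive integer")
    return refinement_seconds / n_cycles


if __name__ == "__main__":
    # Placeholder command: a no-op Python process stands in for an
    # alignment executable, which is not assumed to be installed.
    avg = time_command([sys.executable, "-c", "pass"], runs=3)
    print(f"mean elapsed time over 3 runs: {avg:.4f} s")
    # Hypothetical numbers: 6.0 s of refinement spread over 3 cycles.
    print(f"time per cycle: {time_per_cycle(6.0, 3):.2f} s")
```

Timing the per-cycle figure separately from the whole-program figure mirrors the text: the per-cycle value covers only the interval between the initial superposition and the end of refinement, so file I/O is left out of it.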

Pipeline specifications