Protocol publication

[…] ompensatory mutation. We used correlated mutation analysis (CMA) based on mutual information to detect co-evolved residue pairs that may be functionally or structurally important. A pipeline implementing a linear discriminant analysis that combined pHMMs of the extracted segments with the CMA correlation scores discriminated R-4-A, R-4-C, R-2-X and other types with high accuracy.

Amino acid sequences of plant type III PKSs with experimentally characterized reactions were retrieved from the GenBank database. Among the 111 sequences obtained, 82 representative PKS sequences were selected with the CD-HIT program by keeping the sequence identity between sequences of the same reaction type below 90%. They consisted of 13 R-4-A, 27 R-4-C, nine R-2-X and 33 other-type PKSs (four Rn-2-n/Rn-4-Cn, two R-4-C R-2-X bifunctional, three R-*-L, 12 S-*-*, six L-*-A and six Lh-4-L, where ‘*’ means any possible element). The reaction types were assigned based on the main or representative reactions. In this research, R-2d-X was included in R-2-X. The representative PKSs, their GenBank accessions and reaction types are listed in .

Multiple sequence alignment (MSA) of the protein sequences, for each reaction type separately or for all four together, was performed by MAFFT 7 with the L-INS-i option. A query sequence was aligned to an existing alignment by MAFFT with the --add option. A phylogenetic tree of representative plant type III PKSs was constructed by the maximum-likelihood method in FastTree, with highly gapped positions trimmed by trimAl and a bacterial type III PKS, Streptomyces griseus RppA, used as the outgroup.

The amino acid positions in Medicago sativa CHS2 (R-4-C) whose 3D structure differs from that of Pinus sylvestris STS (R-4-A) were retrieved as described in . These positions formed four discrete segments called Areas 1, 2, 3 and 4. The residue positions included in these Areas were obtained from the MSA.
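The step of pulling the Area positions out of the MSA and joining them can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy alignment and the residue ranges are all hypothetical, and it assumes each Area is given as an inclusive residue range on an ungapped reference sequence (such as CHS2) that is itself a row of the alignment.

```python
from typing import Dict, List, Tuple


def residue_to_column(aligned_ref: str) -> Dict[int, int]:
    """Map 1-based residue numbers of the ungapped reference to 0-based MSA columns."""
    mapping: Dict[int, int] = {}
    residue = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            residue += 1
            mapping[residue] = col
    return mapping


def extract_areas(
    msa: Dict[str, str], ref_id: str, areas: List[Tuple[int, int]]
) -> Dict[str, str]:
    """Concatenate, for every sequence, the MSA columns spanned by the given
    inclusive reference residue ranges (gap columns inside a range are kept)."""
    mapping = residue_to_column(msa[ref_id])
    cols: List[int] = []
    for start, end in areas:
        cols.extend(range(mapping[start], mapping[end] + 1))
    return {name: "".join(seq[c] for c in cols) for name, seq in msa.items()}


# Toy alignment and ranges, invented purely for illustration.
toy_msa = {
    "ref": "MK-TAYLG-P",
    "query": "MRQTA-LGAP",
}
toy_areas = [(2, 3), (6, 7)]  # two hypothetical Areas on the reference
segments = extract_areas(toy_msa, "ref", toy_areas)
```

Keeping the alignment columns that fall inside a range, even where the reference has a gap, preserves insertions in other sequences; whether the original pipeline did so is not stated in the text.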
pHMMs of combinations of these Areas were constructed by HMMER 3 after concatenating them. The reaction type of a sequence was predicted as that of the pHMM giving the highest score, and the accuracy of the predictions was tested by leave-one-out cross-validation (LOOCV) and repeated random sub-sampling validation (RRSV). In the RRSV, 50% of the dataset for one reaction type (fractions were rounded up) was randomly selected as the training set and the remaining sequences were used as the test set. This process was repeated 10 times for each dataset and the average accuracy was calculated.

To determine the feasibility of reaction type prediction using HMM scores, principal component analysis (PCA) on standardized HMM scores was performed using the pr […]
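The RRSV procedure and the highest-score rule described above can be sketched as below. This is an illustration under stated assumptions, not the published pipeline: `train_and_score` is a hypothetical stand-in for building one pHMM per reaction type with HMMER and scoring a sequence against each, every reaction type is split in every repeat, and the seed is arbitrary.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Scorer = Callable[[str], Dict[str, float]]


def split_half(items: Sequence[str], rng: random.Random) -> Tuple[List[str], List[str]]:
    """Randomly split items; the training part holds ceil(n / 2) of them."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = math.ceil(len(shuffled) / 2)
    return shuffled[:n_train], shuffled[n_train:]


def predict_type(scores: Dict[str, float]) -> str:
    """Return the reaction type whose model gave the highest score."""
    return max(scores, key=lambda rtype: scores[rtype])


def rrsv_accuracy(
    datasets: Dict[str, List[str]],
    train_and_score: Callable[[Dict[str, List[str]]], Scorer],
    repeats: int = 10,
    seed: int = 0,
) -> float:
    """Average accuracy over repeated random 50/50 splits of each reaction type."""
    rng = random.Random(seed)
    accuracies: List[float] = []
    for _ in range(repeats):
        train: Dict[str, List[str]] = {}
        test: List[Tuple[str, str]] = []
        for rtype, seqs in datasets.items():
            train_part, test_part = split_half(seqs, rng)
            train[rtype] = train_part
            test.extend((seq, rtype) for seq in test_part)
        scorer = train_and_score(train)  # hypothetical: one model per type
        correct = sum(predict_type(scorer(seq)) == rtype for seq, rtype in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


def toy_trainer(train: Dict[str, List[str]]) -> Scorer:
    """Toy scorer: a 'sequence' scores 1.0 for the type whose label it starts with."""
    return lambda seq: {rtype: float(seq.startswith(rtype)) for rtype in train}


toy_data = {"A": ["A1", "A2", "A3"], "C": ["C1", "C2", "C3", "C4"]}
toy_accuracy = rrsv_accuracy(toy_data, toy_trainer)
```

With the dataset sizes given in the text, rounding up puts 7 of the 13 R-4-A, 14 of the 27 R-4-C and 5 of the nine R-2-X sequences in the training set.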

Pipeline specifications

Software tools: CD-HIT, MAFFT, FastTree, trimAl, HMMER