Computational protocol: Systematic identification of stem-loop containing sequence families in bacterial genomes

Protocol publication

[…] In order to identify all family members of each cluster, a procedure was developed based on cycles of alignment by PCMA and genome searches with the HMMER package tools. First, SCRs of clusters regrouped by sequence (see Table ) were aligned by PCMA with option 'ave_grp_id' set to 50. The procedure can be summarized as follows:

1. The alignment is used to build an HMM with HMMBUILD and HMMCALIBRATE, using the default options.
2. The resulting HMM is used to search for new elements within the genome with HMMSEARCH. The E-value cut-off was set to 1E-10. Independent searches are run on each strand of the genomic sequence.
3. Identified sequences are extracted and aligned to their parental HMM by HMMALIGN. Pairs of overlapping sequences on opposite strands are avoided by discarding the one with the worse score and E-value.
4. The aligned sequences are extended by 10% of the length of the parental HMM. Only the extensions are aligned by PCMA.
5. The alignment of the extended sequences is then used to construct a new model, returning to step 1.

The loop ends when one of the following criteria is met:

• Fewer than 7 of the detected sequences cover the entire model.
• The new model is shorter than the previous one.
• The alignment does not extend the HMM any further (within a tolerance of 3 bp).
• The number of gaps in the alignment exceeds 30% of the aligned bases.
• The extreme value distribution, derived from the model calibration, is in the range Average_Score ± 3*Standard_Deviation, derived from HMMBUILD.

The HMM and the final alignment are used as the definition of the family. […]
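The bookkeeping around each cycle (step 3's strand-overlap filter, step 4's extension length, and the first four stopping criteria) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Hit` record, the function names, the rounding of the 10% extension, and the reading of "within a tolerance of 3 bp" as "growth of at most 3 bp" are all hypothetical. The extreme-value-distribution criterion is left out because the text does not specify it precisely enough to encode, and no HMMER or PCMA commands are invoked here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """Hypothetical record for one HMMSEARCH hit (field names are assumptions)."""
    start: int      # genomic start, with start <= end
    end: int        # genomic end (inclusive)
    strand: str     # '+' or '-'
    score: float
    evalue: float


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one genomic position."""
    return a.start <= b.end and b.start <= a.end


def discard_opposite_strand_overlaps(hits: List[Hit]) -> List[Hit]:
    """Step 3: of two overlapping hits on opposite strands, keep the better one.

    'Better' is assumed to mean higher score, then lower E-value.
    """
    ranked = sorted(hits, key=lambda h: (-h.score, h.evalue))
    kept: List[Hit] = []
    for hit in ranked:
        clash = any(k.strand != hit.strand and overlaps(k, hit) for k in kept)
        if not clash:
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


def extension_length(model_length: int, fraction: float = 0.10) -> int:
    """Step 4: extension size as a fraction of the parental HMM length.

    Rounding to the nearest integer (minimum 1) is an assumption.
    """
    return max(1, round(model_length * fraction))


def stop_reason(
    n_full_length_hits: int,
    prev_model_length: int,
    new_model_length: int,
    gap_count: int,
    aligned_bases: int,
    min_hits: int = 7,
    tolerance_bp: int = 3,
    max_gap_fraction: float = 0.30,
) -> Optional[str]:
    """Return the first stopping criterion that is met, or None to continue."""
    if n_full_length_hits < min_hits:
        return "too few full-length hits"
    if new_model_length < prev_model_length:
        return "model shrank"
    if new_model_length - prev_model_length <= tolerance_bp:
        return "no further extension"
    if gap_count > max_gap_fraction * aligned_bases:
        return "too many gaps"
    return None
```

In a driver loop, `stop_reason` would be called once per cycle after the new model is built, and the loop would return to step 1 only while it yields `None`.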

Pipeline specifications

Software tools: PCMA, HMMER
Application: Protein structure analysis
Diseases: Mitochondrial Diseases