Computational protocol: Identification of Preferred DNA-Binding Sites for the Thermus thermophilus Transcriptional Regulator SbtR by the Combinatorial Approach REPSA

Protocol publication

[…] Sequence data in FASTQ format were first processed using a script provided by Ying Xie (Computer Science, Kennesaw State University). This script (Sequence1.java) reads an accompanying file (Parameter.txt) to identify those sequences that have intact ST2R/ST2L flanks and a proper insert length of 24 bases. It then strips extraneous information (the ST2R/ST2L flanking sequences and the quality values), leaving the sequences in a format suitable for further analysis. Identical and unique sequences in the resulting file were identified using DuplicatesFinder v1.1 (http://proline.bic.nus.edu.sg/~asif/tools/DuplicateFinder.zip).

Bioinformatic analysis of REPSA sequences was performed using the MEME Suite of software (v4.10.2) via its website (http://meme-suite.org/). Multiple Em for Motif Elicitation (MEME) was used to generate sequence logos for the top three motifs identified in a population of 1000 sequences, the limit for this software. Default options were used, except that the search was restricted to palindromes where indicated. Top motifs were then submitted to Find Individual Motif Occurrences (FIMO) to identify their best matches within the T. thermophilus HB8 genome.

The top matched sequences identified by FIMO were then investigated in their genomic context. Sequences within ±300 bp of each FIMO match were analyzed online with both Softberry BPROM (http://www.softberry.com/) and University of Groningen Genome2D (http://genome2d.molgenrug.nl/) to identify potential promoter elements. Operons were identified using databases at the University of Georgia (DOOR2, http://csbl.bmb.uga.edu/DOOR/) and the Universidad Nacional Autónoma de México (ProOpDB, http://operons.ibt.unam.mx/OperonPredictor/).
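The read-filtering and duplicate-counting steps described at the start of this protocol can be illustrated with a minimal Python sketch. It is not the Sequence1.java script or DuplicatesFinder; it only mirrors the stated logic (intact flanks, a 24-base insert, then counting identical inserts). The flank sequences below are hypothetical placeholders, since the actual ST2L/ST2R sequences are not given here, and the function names are invented for illustration. Writing the most frequent inserts to FASTA is likewise an assumption about how a set of at most 1000 sequences might be prepared for MEME; the text does not say how that set was chosen.

```python
from collections import Counter

# Hypothetical placeholder flanks; substitute the real ST2L/ST2R sequences.
LEFT_FLANK = "CTAGGAATTC"
RIGHT_FLANK = "GGATCCTAGC"
INSERT_LENGTH = 24  # insert length stated in the protocol


def read_fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def extract_insert(read, left=LEFT_FLANK, right=RIGHT_FLANK, length=INSERT_LENGTH):
    """Return the insert if both flanks are intact and exactly `length`
    bases apart; otherwise return None."""
    start = read.find(left)
    if start == -1:
        return None
    insert_start = start + len(left)
    insert_end = insert_start + length
    # The right flank must begin immediately after a full-length insert.
    if read[insert_end:insert_end + len(right)] != right:
        return None
    return read[insert_start:insert_end]


def count_inserts(fastq_lines):
    """Count how often each valid insert occurs (identical vs. unique)."""
    counts = Counter()
    for read in read_fastq_sequences(fastq_lines):
        insert = extract_insert(read)
        if insert is not None:
            counts[insert] += 1
    return counts


def to_fasta(counts, limit=1000):
    """Format up to `limit` inserts, most frequent first, as FASTA text.
    Selecting by frequency is an assumption, not the published procedure."""
    records = []
    for rank, (seq, n) in enumerate(counts.most_common(limit), start=1):
        records.append(f">seq{rank}_count{n}\n{seq}")
    return "\n".join(records) + "\n"
```

With a FASTQ file open as `handle`, `to_fasta(count_inserts(handle))` would give text that could be pasted into the MEME web form. Inserts with a count of 1 correspond to the unique sequences, and higher counts to the identical ones.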
Information on identified or postulated protein functions for potential SbtR-regulated genes was obtained from both the KEGG Genome database (Thermus thermophilus HB8) and the UniProt Knowledgebase (Thermus thermophilus strain HB8 / ATCC 27634 / DSM 579). […]
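The genomic-context step, in which sequences within ±300 bp of each FIMO match are submitted to promoter-prediction tools, comes down to simple coordinate arithmetic. The sketch below is an illustration only. The function name is hypothetical, the coordinates are assumed to be 1-based and inclusive, and the window is clamped at the ends of the sequence rather than wrapped around a circular replicon, which is a simplification.

```python
def context_window(genome, start, stop, flank=300):
    """Return (window_start, window_stop, sequence) around a match.

    `start` and `stop` are assumed to be 1-based, inclusive coordinates of
    the match. The window extends `flank` bases on each side and is clamped
    to the sequence ends (no wrap-around for circular replicons).
    """
    if not (1 <= start <= stop <= len(genome)):
        raise ValueError("match coordinates fall outside the sequence")
    window_start = max(1, start - flank)
    window_stop = min(len(genome), stop + flank)
    # Convert the 1-based inclusive window to a Python slice.
    return window_start, window_stop, genome[window_start - 1:window_stop]
```

For a 24-bp match far from either end of the sequence, this returns a 624-bp window (24 + 2 × 300), which can then be pasted into a tool such as BPROM.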

Pipeline specifications

Software tools: MEME Suite, MEME, BPROM, Genome2D
Databases: DOOR, ProOpDB, UniProtKB, KEGG
Application: Genome annotation
Organisms: Thermus thermophilus, Thermus thermophilus HB8