Computational protocol: Visualizing bacterial tRNA identity determinants and antideterminants using function logos and inverse function logos

Protocol publication

[…] To visualize and detect putative tRNA identity determinants and antideterminants with function logos and inverse function logos, we analyzed a dataset of 655 nonredundant inferred and actual bacterial tDNAs, the Modified Sprinzl tRNA Database (MSDB), which also forms the basis of a recently introduced automated classifier of tRNA identity called TFAM (). TFAM relies on what we call ‘profile contrast models’ built from the MSDB sequences, each of which is annotated with good confidence as belonging to one of 21 functional classes. That is, in addition to the 20 canonical charging identities, the MSDB contains a separate model for initiator tRNAs. Thus, we have generalized the identity element concept slightly to include initiators as a separate functional class.
In constructing the MSDB, redundant sequences, whatever their possible origin, are removed and certain identity classifications are corrected as detailed in (). tRNAs in the MSDB come primarily from bacteria related to E. coli and Bacillus subtilis, i.e. γ-proteobacteria and low-GC gram-positives, although more distantly related bacteria are also represented. As also described there, the identity annotations for isoleucine versus methionine and for alanine versus valine are less reliable than those for the other identity classes.
Because tRNAs have such highly conserved secondary and tertiary structures, structurally analogous positions within each sequence may be assigned with high confidence, corresponding to a structural alignment that may be indexed in various ways. One such structurally based positional index is curated in the Sprinzl database (–) (), resulting in a widely used coordinate numbering system that can be applied to nearly any tRNA. The MSDB uses structural alignments of tRNAs generated automatically by COVEA () from a curated stochastic context-free grammar (SCFG) model of tRNAs ().
The MSDB alignment is provided in Supplementary Data.
We applied our new definitions of the function logo and inverse function logo to the MSDB. Denoting the possible identity classes by either the one-letter IUPAC amino acid codes (corresponding to their amino acid charging identities) or by the letter ‘X’ (indicating initiator tRNA identity), the class sizes in the MSDB are A = 52, C = 13, D = 21, E = 23, F = 22, G = 45, H = 15, I = 61, K = 22, L = 63, M = 14, N = 21, P = 28, Q = 19, R = 48, S = 53, T = 43, V = 26, W = 18, X = 28 and Y = 20.
As in TFAM, the tRNAs in this dataset were automatically aligned by primary and secondary structural features using COVEA () with the prokaryotic tRNA SCFG model called ‘TRNA2-prok.cm’ that comes with tRNAscan-SE (). The resulting tDNA alignment has 106 columns. After alignment, subalignments were partitioned off by tRNA functional class and converted into profile matrices. Gaps are treated as a fifth state, so each of these matrices is 5 × 106. These profile matrices are also provided in Supplementary Data.
In interpreting our results, we assigned to as many of the 106 columns as possible a number from the standardized tRNA positional numbering system (–). We followed () in annotating an anticodon stem of 5 bp instead of 6 in our figures. Positional numbers in the text refer to Sprinzl coordinates. […]
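
The partition-and-profile step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `build_profiles`, the `(class_label, aligned_sequence)` input format, the `ACGT-` state order and the use of raw counts rather than frequencies are all assumptions made for illustration. Only the class sizes, the 21 classes and the gap-as-fifth-state convention come from the text.

```python
from collections import defaultdict

# Class sizes as reported in the text (X = initiator tRNA identity).
CLASS_SIZES = {
    "A": 52, "C": 13, "D": 21, "E": 23, "F": 22, "G": 45, "H": 15,
    "I": 61, "K": 22, "L": 63, "M": 14, "N": 21, "P": 28, "Q": 19,
    "R": 48, "S": 53, "T": 43, "V": 26, "W": 18, "X": 28, "Y": 20,
}

# Assumed state order: four nucleotides plus the gap as a fifth state.
STATES = "ACGT-"


def build_profiles(records):
    """Partition aligned sequences by class and count states per column.

    records: iterable of (class_label, aligned_sequence) pairs (hypothetical
    input format). Returns {class_label: matrix}, where each matrix is a list
    of 5 rows (one per state in STATES) by alignment-length columns of counts.
    Raises ValueError on unequal lengths or on symbols outside STATES.
    """
    by_class = defaultdict(list)
    for label, seq in records:
        # Normalise RNA letters and '.' gaps to the assumed DNA alphabet.
        by_class[label].append(seq.upper().replace("U", "T").replace(".", "-"))

    lengths = {len(s) for seqs in by_class.values() for s in seqs}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    (length,) = lengths

    profiles = {}
    for label, seqs in by_class.items():
        matrix = [[0] * length for _ in STATES]
        for seq in seqs:
            for col, state in enumerate(seq):
                matrix[STATES.index(state)][col] += 1
        profiles[label] = matrix
    return profiles


# The reported class sizes account for all 655 sequences in 21 classes.
total_sequences = sum(CLASS_SIZES.values())
number_of_classes = len(CLASS_SIZES)

# Toy 5-column alignment (invented data, not taken from the MSDB).
toy_records = [("A", "GG-CT"), ("A", "GGACT"), ("X", "CG-CA")]
toy_profiles = build_profiles(toy_records)
```

Applied to the real 106-column MSDB alignment, the same function would return one 5 × 106 matrix per functional class. In the toy case, every column of a class matrix sums to the number of sequences in that class.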

Pipeline specifications

Software tools TFAM, tRNAscan-SE
Application Genome annotation