Computational protocol: Complete Genome and Transcriptomes of Streptococcus parasanguinis FW213: Phylogenic Relations and Potential Virulence Mechanisms

Protocol publication

[…] Genome sequencing was performed using the whole-genome shotgun strategy. Briefly, total cellular DNA was mechanically sheared and end-repaired with T4 DNA polymerase (NEB). Four libraries containing sheared DNA fragments of various lengths (1.5 to 2 kb, 2 to 3 kb, 4 to 5 kb, and 6 kb) were constructed in pUC18. The nucleotide (nt) sequences of the library inserts were determined using ET terminator chemistry on an ABI 3700 sequencer (Applied Biosystems) and a MegaBACE 1000 sequencer (Amersham Bioscience). Sequences were assembled and edited using PHRED, PHRAP and CONSED (http://www.phrap.org/phredphrapconsed.html). Gaps were closed by primer walking, long-distance PCR and optimized multiplex PCR. Reads in low-quality regions were resequenced to ensure accuracy. We acquired usable shotgun-sequencing traces with an average length of 529 bp, resulting in 8.84-fold sequence coverage. The complete genome sequence of S. parasanguinis FW213 has been deposited in the GenBank database under accession number CP003122.
Base numbering of the FW213 genome starts at the replication origin (oriC), which was identified by GC-skew analysis and the Ori-Finder software. ORFs were initially predicted with GLIMMER 2.0 at the default settings with a length cutoff of 90 nt. Predicted ORFs were validated by assigning translational start codons based on protein homology and ribosomal binding motifs. The deduced amino acid (aa) sequence of each ORF was then searched with BLASTP against the nonredundant database of GenBank, and the “true proteins” (80% overlap, E-value < 1e−10) were extracted. The remaining ORFs and intergenic sequences were searched with BLASTX against the nonredundant database, and “true ORFs” (same criteria as above) were identified. Problematic cases such as overlapping proteins were resolved according to the principles described previously.
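The GC-skew step used above to place oriC can be illustrated with a short sketch. This is not the authors' script or the Ori-Finder algorithm; it is a minimal example of the general technique, assuming a single circular chromosome given as a plain string, a fixed non-overlapping window, and the common convention that the cumulative (G − C)/(G + C) skew reaches its minimum near the replication origin. The function names, the window size and the toy sequence are all hypothetical.

```python
from itertools import accumulate


def cumulative_gc_skew(seq, window=1000):
    """Return (window start positions, cumulative GC skew per window).

    Skew per window is (G - C) / (G + C); windows without G or C score 0.
    """
    seq = seq.upper()
    starts, skews = [], []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
        starts.append(start)
    return starts, list(accumulate(skews))


def predict_origin(seq, window=1000):
    """Estimate the origin as the position where cumulative skew is lowest.

    The cumulative value for a window includes that window, so the minimum
    is reached at the END of the window; the modulo wraps a minimum at the
    last window back to position 0 of the circular sequence.
    """
    starts, cumulative = cumulative_gc_skew(seq, window)
    idx = min(range(len(cumulative)), key=cumulative.__getitem__)
    return min(starts[idx] + window, len(seq)) % len(seq)


# Toy sequence (made up): a C-rich half followed by a G-rich half,
# so the skew switches sign at position 5000.
toy = "ACCT" * 1250 + "AGGT" * 1250
print(predict_origin(toy, window=1000))  # 5000
```

Once a position like this is chosen, the sequence can be rotated so that numbering starts there, which matches the convention described for the FW213 genome. Real genomes give noisier curves, so the estimate is normally checked against other evidence such as the location of dnaA.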
The function of each protein was predicted by searching against the KEGG pathway database, the COG database and the InterPro protein family database. Transfer RNAs (tRNAs) were predicted with tRNAscan-SE, and ribosomal RNAs (rRNAs) were identified based on similarity to the corresponding genes of other streptococcal genomes. The final annotation was manually inspected, integrating the genome annotation with the transcriptomic results to further refine the predicted gene structures and annotation. […] Whole-genome sequence alignments of the streptococcal strains were constructed using the MUMmer package. Orthologs were identified with Inparanoid and MultiParanoid. ClustalX was used to align the concatenated sequences from all orthologs. The Artemis Comparison Tool (ACT) was used to view the overall comparison of the S. parasanguinis FW213 and ATCC15912 genomes. […]
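The concatenation step that precedes the ClustalX alignment can be sketched as follows. The protocol does not say how the concatenated sequences were built, so this is only one plausible reading: it assumes each ortholog group has already been reduced to one sequence per genome, keeps only groups present in every genome, and joins them in a fixed order. The function names, the third genome label and the toy sequences are hypothetical.

```python
def concatenate_orthologs(ortholog_groups, genomes):
    """Join one sequence per genome across all complete ortholog groups.

    ortholog_groups: list of dicts mapping genome name -> sequence.
    genomes: genome names that must all be present for a group to be used.
    Returns a dict mapping genome name -> concatenated sequence.
    """
    parts = {name: [] for name in genomes}
    for group in ortholog_groups:
        # Skip groups that lack a member in any genome, so every
        # concatenated sequence is built from the same set of genes.
        if not all(name in group for name in genomes):
            continue
        for name in genomes:
            parts[name].append(group[name])
    return {name: "".join(seqs) for name, seqs in parts.items()}


def to_fasta(sequences, width=60):
    """Format a name -> sequence dict as FASTA text for an aligner."""
    lines = []
    for name, seq in sequences.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy data (made up); the second group is incomplete and is skipped.
groups = [
    {"FW213": "MKTL", "ATCC15912": "MKSL", "other_strain": "MRTL"},
    {"FW213": "GAVV", "ATCC15912": "GAVI"},
    {"FW213": "PQRS", "ATCC15912": "PQRT", "other_strain": "PQKS"},
]
combined = concatenate_orthologs(groups, ["FW213", "ATCC15912", "other_strain"])
print(to_fasta(combined))
```

The resulting FASTA text could then be passed to a multiple-sequence aligner. An alternative design aligns each ortholog group separately and concatenates the alignments, which keeps gene boundaries in register; the source does not indicate which order was used.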

Pipeline specifications

Software tools: Phrap, Glimmer, BLASTP, BLASTX, tRNAscan-SE, MUMmer, Clustal W, ACT
Databases: InParanoid
Applications: Genome annotation, WGS analysis, Nucleotide sequence alignment
Organisms: Streptococcus parasanguinis
Diseases: Endocarditis