Computational protocol: Pangenome analysis of Bifidobacterium longum and site-directed mutagenesis through by-pass of restriction-modification systems

Protocol publication

[…] Putative open reading frames (ORFs) were predicted using Prodigal (http://prodigal.ornl.gov/), supported by BLASTX alignments. The Prodigal and BLASTX results were combined manually, and ORFs were preliminarily identified by BLASTP analysis against the non-redundant protein database provided by the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). Using the ORF-finding outputs and the associated BLASTP results, Artemis was employed for visualisation and manual editing in order to verify and, where necessary, redefine the start of every predicted coding region, or to remove or add coding regions. Protein functions were assigned to predicted coding regions manually. In addition, the individual members of the revised gene/protein data set were searched against the protein family (Pfam) and COG databases. Ribosomal RNA (rRNA) and transfer RNA (tRNA) genes were detected using RNAmmer (http://www.cbs.dtu.dk/services/RNAmmer/) and tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/), respectively. COG categories were assigned by BLASTP analysis against the COG database for the deduced proteins of all identified ORFs in the genomes of both B. longum strains sequenced as part of the current study and of all publicly available B. longum strains.

The genome sequences of both B. longum subsp. longum strains were searched for restriction-modification systems using the BLASTP alignment function of the REBASE database (http://rebase.neb.com/rebase/rebase.html), with an e-value cut-off of 0.00001 and requiring at least 30 % similarity across 80 % of the protein length.

[...] A phylogenetic supertree was computed from the alignment of a set of orthologous proteins defined by the pan-genome computation. Each protein family was aligned using CLUSTAL_W v1.83.
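The restriction-modification screen described above amounts to a threshold filter on BLASTP hits. The following is a minimal sketch of such a filter, not the authors' procedure. It assumes the hits are available as a tab-separated table with the columns `qseqid sseqid ppos length qlen evalue` (a layout chosen for illustration, not stated in the protocol), takes percent positives (`ppos`) as the similarity measure, and approximates coverage as alignment length divided by query length. The function name `filter_rm_hits` and the example rows are hypothetical.

```python
import csv
import io

# Thresholds quoted from the protocol text.
MAX_EVALUE = 1e-5
MIN_SIMILARITY = 30.0   # percent
MIN_COVERAGE = 0.80     # fraction of the protein length

# Assumed column layout of the tab-separated hit table (not from the protocol).
COLUMNS = ["qseqid", "sseqid", "ppos", "length", "qlen", "evalue"]


def filter_rm_hits(handle):
    """Return rows that pass the e-value, similarity and coverage cut-offs."""
    kept = []
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        similarity = float(row["ppos"])
        # Approximation: alignment length (gaps included) over query length.
        coverage = int(row["length"]) / int(row["qlen"])
        if (evalue <= MAX_EVALUE
                and similarity >= MIN_SIMILARITY
                and coverage >= MIN_COVERAGE):
            kept.append(row)
    return kept


# Illustrative, made-up hits; all identifiers are placeholders.
example = io.StringIO(
    "orf_0001\tenzyme_A\t45.0\t400\t420\t1e-50\n"   # passes all three cut-offs
    "orf_0002\tenzyme_B\t25.0\t300\t310\t1e-20\n"   # similarity too low
    "orf_0003\tenzyme_C\t60.0\t100\t500\t1e-30\n"   # coverage too low
    "orf_0004\tenzyme_D\t55.0\t290\t300\t0.01\n"    # e-value too high
)
passing = [row["qseqid"] for row in filter_rm_hits(example)]
print(passing)
```

Whether the 80 % coverage refers to the query or to the REBASE subject protein is not specified in the text; the sketch uses the query length and would need `slen` instead if the subject is meant.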
Phylogenetic trees were computed using the maximum-likelihood method in PhyML v3.0 and concatenated; the resulting consensus tree was computed with the majority-rule method of the Consense module from the PHYLIP package v3.69 (http://evolution.genetics.washington.edu/phylip.html). Whole-genome comparisons of the two newly sequenced B. longum strains were performed against B. longum subsp. longum NCC2705 (AE014295). Whole genomes were compared at the nucleotide level using MUMmer at default settings. […]
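The majority-rule step mentioned above keeps only those groupings that occur in more than half of the input trees. The sketch below illustrates that idea on toy data; it is not the Consense implementation, which reads Newick tree files. It assumes each tree has already been reduced to a set of rooted clades, each clade being a frozenset of taxon labels. The function name `majority_rule_clades` and the taxa A to D are placeholders.

```python
from collections import Counter


def majority_rule_clades(trees):
    """Return clades found in more than half of the input trees.

    Each tree is a set of clades and each clade a frozenset of taxon
    labels (a simplified stand-in for a parsed Newick tree). The result
    maps each retained clade to the fraction of trees that contain it.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    threshold = len(trees) / 2
    return {
        clade: n / len(trees)
        for clade, n in counts.items()
        if n > threshold
    }


# Three toy gene trees over four placeholder taxa.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]
consensus = majority_rule_clades(trees)
for clade, support in sorted(consensus.items(), key=lambda item: sorted(item[0])):
    print(sorted(clade), round(support, 2))
```

Because every retained clade occurs in more than half of the trees, any two retained clades share at least one tree and are therefore compatible, so together they define a single consensus topology.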

Pipeline specifications

Software tools: BLASTX, BLASTP, RNAmmer, tRNAscan-SE, Clustal W, PhyML, MUMmer
Databases: Pfam
Applications: Genome annotation, Phylogenetics, Nucleotide sequence alignment
Organisms: Bifidobacterium longum, Homo sapiens, Escherichia coli