Computational protocol: Comparative genomic analysis of novel Acinetobacter symbionts: A combined systems biology and genomics approach

Protocol publication

[…] The isolated Acinetobacter strains were grown at 26 °C on both MacConkey agar and blood agar plates until mid-log phase with shaking at 250 rpm. Whole genomic DNA was extracted with the Promega Wizard Genomic DNA Purification Kit (Promega, Madison, WI) according to the manufacturer’s instructions. DNA concentration was determined by PicoGreen assay. The DNA was used to construct TruSeq DNA libraries with the manufacturer’s default settings, and the libraries were sequenced on an Illumina HiSeq 2000 platform with 100-base paired-end reads. The paired-end FASTQ reads were assembled with the Velvet de novo assembler; coverage was typically 30x and the assembled genome size was approximately 3 Mb. Genome assemblies were checked for misassembled and low-coverage regions using the BWA and Tablet software packages. Quality-filtered contigs were further extended using the paired-end criterion. [...] Final assemblies were checked for completeness using 31 protein-encoding phylogenetic marker genes and 107 single-copy marker genes. Each genome contained all 31/31 and 107/107 genes, which suggests that the assemblies are complete. Open reading frames (ORFs) were called for each genome using FragGeneScan v1.16. Predicted ORFs were annotated with KAAS (KEGG Automatic Annotation Server), which assigns KEGG Orthology (KO) identifiers to the query ORF sequences using the GHOSTX algorithm against the KEGG GENES database. For automatic genome annotation, the Acinetobacter spp. SFA, SFB, SFC and HA genome assemblies were submitted to the Rapid Annotation using Subsystems Technology (RAST) server. The annotated genomes are accessible from the RAST server by logging in with the guest account, under the accession numbers (RAST-ID) 258824, 258827, 258830 and 262612 for SFA, SFB, SFC and HA, respectively. Assembled genomes were phylogenetically delineated using the two-way ANI script in the PYANI Master Pipeline, with the percentage identity algorithm at default parameters. 
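The sequencing figures quoted above (100-base paired-end reads, roughly 30x coverage, a genome of about 3 Mb) are related by simple arithmetic: coverage is the total number of sequenced bases divided by the genome size. The sketch below illustrates that relation only; the function names and the read-pair count are hypothetical and are not values reported in the protocol.

```python
import math


def estimate_coverage(read_pairs, read_length_bp, genome_size_bp):
    """Mean depth = total sequenced bases / genome size.

    Each read pair contributes two reads of read_length_bp bases.
    """
    total_bases = 2 * read_pairs * read_length_bp
    return total_bases / genome_size_bp


def read_pairs_needed(target_coverage, read_length_bp, genome_size_bp):
    """Smallest number of read pairs that reaches the target mean depth."""
    return math.ceil(target_coverage * genome_size_bp / (2 * read_length_bp))


if __name__ == "__main__":
    # Hypothetical input: 450,000 read pairs of 2 x 100 bases on a 3 Mb genome.
    print(estimate_coverage(450_000, 100, 3_000_000))
    print(read_pairs_needed(30, 100, 3_000_000))
```

This ignores reads lost to quality filtering and uneven coverage along the genome, so it should be read as an upper-bound estimate rather than the depth actually achieved after assembly.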
Reference genomes were taken from the list of all representative species of Acinetobacter maintained at the Broad Institute, as of 14/10/2015. [...] Modules of a large PPI network are defined as sets of statistically and functionally significant interacting genes. MCODE, a Cytoscape plug-in, identifies clusters, that is, highly interconnected regions in a network. We used the default settings of MCODE, which analyzes networks using Scoring [include loops, degree cutoff (2)] and Finding [node score cutoff (0.2), haircut, node density cutoff (0.1), K-core (2), Maximum Depth (100)] parameters that were optimized to produce the best results for the network. Potential clusters were identified by the search procedure and retained if they had a high significance score (>1) and a reasonable number of nodes and edges. The extracted clusters were ranked by a score based on density and size. Once the nodes in a cluster were identified, the complexity of the network could be reduced by replacing the individual nodes with a single parent node, which made it possible to focus on the interactions with the cluster. To understand the functional role of the proteins involved in the top three modules of each strain, we subjected the module proteins to GO annotation. Because the proteins in a module tend to share similar functions, we identified the over-represented Gene Ontology categories (Molecular Function, Biological Process, Cellular Component) for the modules in each strain’s network. The major categories were selected based on the percentage of nodes in each set and used to construct pie diagrams, which allowed better visualization of the functional categories. [...] In biological networks, these motifs are suggested to be recurring circuit elements that carry out key information processing tasks. To understand these complex networks, we sought to break them down into basic building blocks. 
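Two of the ideas in the MCODE description above, the K-core (2) setting and the ranking of clusters by density and size, can be illustrated with a small standalone sketch. This is not MCODE's implementation: it only prunes a graph to its k-core and scores a node set as density multiplied by node count, with density computed without self-loops as a simplifying assumption. The function names and the toy edge list are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def k_core(adj, k=2):
    """Repeatedly drop nodes with fewer than k neighbours."""
    core = {node: set(nbrs) for node, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for node in list(core):
            if len(core[node]) < k:
                # Detach the node from its neighbours, then remove it.
                for nbr in core[node]:
                    core[nbr].discard(node)
                del core[node]
                changed = True
    return core


def cluster_score(adj, nodes):
    """Density (edges present / edges possible) times number of nodes."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each internal edge is seen from both endpoints, hence the division by 2.
    n_edges = sum(len(adj[node] & nodes) for node in nodes) // 2
    density = 2 * n_edges / (n * (n - 1))
    return density * n


if __name__ == "__main__":
    # Toy network: a triangle (a, b, c) with a pendant node d.
    toy = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    core = k_core(build_graph(toy), k=2)
    print(sorted(core), cluster_score(core, core.keys()))
```

In the toy network the pendant node is removed by the 2-core step, leaving a fully connected triangle whose density is 1 and whose score therefore equals its size.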
A network motif was defined by the criterion that its number of occurrences must be at least five and must also be significantly higher than that observed in randomized networks. We applied FANMOD to the complete network to select network motifs. The significance test was carried out on 1000 randomized networks, and a pattern with P < 0.05 was considered statistically significant. Clusters were analyzed for three-node motifs using MCODE, from which we identified the motifs within highly clustered nodes. […]
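The motif criterion above (at least five occurrences, and P < 0.05 against 1000 randomized networks) can be sketched for one three-node pattern, the triangle, in an undirected network. This is a simplified stand-in and not FANMOD itself: randomization here uses degree-preserving edge swaps, and P is taken as the fraction of randomized networks with at least as many triangles as the original. Both choices, and all function names, are assumptions made for illustration.

```python
import random


def count_triangles(edges):
    """Count triangles in an undirected edge list (unique edges, no self-loops)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Every triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization: swap endpoints of random edge pairs."""
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


def motif_test(edges, n_random=1000, min_occurrences=5, alpha=0.05, seed=0):
    """Return (observed count, empirical P, passes both criteria)."""
    rng = random.Random(seed)
    # Normalize to unique undirected edges without self-loops.
    edges = sorted({tuple(sorted(e)) for e in edges if e[0] != e[1]})
    observed = count_triangles(edges)
    hits = 0
    for _ in range(n_random):
        randomized = rewire(edges, 10 * len(edges), rng)
        if count_triangles(randomized) >= observed:
            hits += 1
    p_value = hits / n_random
    return observed, p_value, observed >= min_occurrences and p_value < alpha
```

A call such as `motif_test(edge_list)` on a list of node-pair tuples returns the triangle count, the empirical P value, and whether both thresholds are met. Extending the sketch to other three-node patterns, or to directed networks, would require a separate counting function for each pattern.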

Pipeline specifications

Software tools: MCODE, FANMOD
Application: Protein interaction analysis
Organisms: Ovis aries
Diseases: Acinetobacter Infections