Computational protocol: A metagenomic study of diet-dependent interaction between gut microbiota and host in infants reveals differences in immune response

Protocol publication

[…] Shotgun 454 read data were preprocessed in three steps: (i) low-quality reads were removed if the mean Phred value of the read was <20 and/or if two or more consecutive nucleotides had a Phred value <20; (ii) reads were clustered using CD-HIT-454 [] at 100% identity to remove duplicates; (iii) the remaining shotgun sequences were analyzed using the MG-RASTv2 pipeline, and the phylogenetic distribution and metabolic functional composition of the samples were profiled []. Representative MG-RAST sample statistics are shown in Additional file .

Taxonomic classifications were assessed in two ways. First, identifiable 16S fragments in the shotgun sequences were aligned to the GreenGenes small subunit rRNA database []. Second, PhymmBL [] was used as an additional metagenomic phylogenetic classification tool. PhymmBL uses BLAST and interpolated Markov models to taxonomically classify DNA sequences, including reads as short as 100 bp.

In addition, to filter out possible human contamination, BLAST was used to compare all reads to the human genome (Genome Reference Consortium assembly, version 37, 2009 []). Any full-length reads that were 100% identical to the human reference genome were discarded. Reads that were 100% identical but whose length was under 80 amino acids, or that did not share a full-length alignment with the human reference sequence, were not discarded. Between 0 and 13,222 reads were discarded from each sample. The percentage of discarded reads did not exceed 4.3% (13,222 discarded from sample 6) and was lower than 0.4% in all other samples.

Microbiota functional characteristics in breast-fed (BF) and formula-fed (FF) infants were compared. Additional file provides a breakdown of the average number of reads across samples that were mapped to functional SEED categories. Because the SEED classification system is hierarchical, aggregating reads into coarser classifications provided for a more informed analysis.
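
The read-level filters described earlier in this section (the Phred quality criteria, duplicate removal, and the human-read screen) can be sketched as follows. This is a minimal illustration, not the authors' code: all function names are hypothetical, the duplicate step is simplified to exact-sequence matching rather than CD-HIT-454's clustering, and the human screen assumes that read length, alignment length, and percent identity have already been parsed from BLAST output. The length cutoff of 80 is taken from the text with its unit left as stated there.

```python
from typing import Iterable, List, Sequence

PHRED_CUTOFF = 20  # quality threshold stated in the protocol


def passes_quality(phred: Sequence[int], cutoff: int = PHRED_CUTOFF) -> bool:
    """Keep a read unless its mean Phred is below the cutoff or it
    contains two or more consecutive bases below the cutoff."""
    if not phred:
        return False
    if sum(phred) / len(phred) < cutoff:
        return False
    # Scan adjacent pairs for a run of low-quality bases.
    for left, right in zip(phred, phred[1:]):
        if left < cutoff and right < cutoff:
            return False
    return True


def drop_exact_duplicates(seqs: Iterable[str]) -> List[str]:
    """Simplified stand-in for clustering at 100% identity:
    keep the first occurrence of each exact sequence (assumption)."""
    seen = set()
    unique = []
    for seq in seqs:
        if seq not in seen:
            seen.add(seq)
            unique.append(seq)
    return unique


def is_human_contaminant(read_len: int, aln_len: int,
                         pct_identity: float, min_len: int = 80) -> bool:
    """Discard only reads that align over their full length at 100%
    identity and are not shorter than the stated length cutoff."""
    full_length = aln_len == read_len
    return pct_identity == 100.0 and full_length and read_len >= min_len
```

For example, `passes_quality([30, 10, 10, 30, 40])` rejects the read because of the two adjacent low-quality bases, even though its mean Phred value is above 20.
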
For SEED level 2 classifications, approximately 25% of the functional classifications exhibited an average number of 200 reads across samples. At SEED level 3, approximately 6% of the functional classifications had an average number of 200 reads. Subsequently, comparisons between functional categories were carried out subject to the following restrictions: (i) SEED level 1 functional categories were compared if all the FF or BF samples each had at least 200 reads in the category and these made up at least 2% of that sample's total reads; (ii) SEED level 2 functional categories were compared if all the FF or BF samples each had at least 100 reads and at least 0.5% of that sample's total reads; (iii) SEED level 3 functional categories were compared if all the FF or BF samples each had at least 50 reads and at least 0.1% of that sample's total reads. […]
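
The per-level restrictions above can be expressed as a small eligibility check. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes one reading of "all the FF or BF samples", namely that a category qualifies when every sample in at least one diet group meets both the read-count and the percentage threshold. If the intended meaning is that every sample in both groups must qualify, the `or` in `is_comparable` would become `and`.

```python
from typing import Iterable, Mapping

# (minimum reads, minimum fraction of the sample's total reads) per SEED level,
# taken from the thresholds stated in the protocol text.
THRESHOLDS = {1: (200, 0.02), 2: (100, 0.005), 3: (50, 0.001)}


def group_passes(cat_reads: Mapping[str, int], totals: Mapping[str, int],
                 samples: Iterable[str], level: int) -> bool:
    """True if every sample in the group meets both thresholds."""
    min_reads, min_frac = THRESHOLDS[level]
    samples = list(samples)
    if not samples:
        return False  # an empty group cannot qualify
    return all(
        cat_reads.get(s, 0) >= min_reads
        and cat_reads.get(s, 0) >= min_frac * totals[s]
        for s in samples
    )


def is_comparable(cat_reads: Mapping[str, int], totals: Mapping[str, int],
                  ff_samples: Iterable[str], bf_samples: Iterable[str],
                  level: int) -> bool:
    """Assumed reading: the category is compared if all FF samples
    or all BF samples pass the thresholds for this SEED level."""
    ff_samples, bf_samples = list(ff_samples), list(bf_samples)
    return (group_passes(cat_reads, totals, ff_samples, level)
            or group_passes(cat_reads, totals, bf_samples, level))
```

With hypothetical per-sample totals of 10,000 reads, a category with 250 and 150 reads in the two FF samples would fail the level 1 check (150 is below 200) but pass the level 2 check (both counts are at least 100 and at least 0.5% of the total).
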

Pipeline specifications

Software tools cd-hit-454, PhymmBL
Databases MG-RAST, Greengenes, GRC
Application Metagenomic sequencing analysis
Organisms Homo sapiens