Computational protocol: A curated compendium of monocyte transcriptome datasets of relevance to human monocyte immunobiology research

Protocol publication

[…] Potentially relevant datasets deposited in GEO were identified using an advanced query based on the Bioconductor package GEOmetadb and the SQLite database that captures detailed information on the GEO data structure. The search query was designed to retrieve entries whose title and description contained the word “Monocyte” or “Monocytes”, that were generated from human samples, and that used Illumina or Affymetrix commercial platforms. The query result is appended with rich metadata from GEOmetadb, which allows for manual filtering of the retrieved collection. The relevance of each entry returned by this query was assessed individually. This process involved reading through the descriptions and examining the list of available samples and their annotations. Sometimes it was also necessary to review the original published report, in which the design of the study and the generation of the dataset are described in more detail. The search query also returned a number of datasets that did not include profiles of monocytes but instead profiles of “monocyte-derived dendritic cells” or “monocyte-derived macrophages”. These were excluded during our manual screen, as were studies employing monocytic cell lines. Only studies including primary human monocyte profiles were retained. The datasets cover a broad range of studies investigating human monocyte immunobiology in the context of disease and through comparison with diverse cell populations and study types, as illustrated by a graphical representation of the relative occurrence of terms in the descriptions of the studies loaded into our tool. A wide range of cell types and diseases is represented. Ultimately, the collection comprised 93 curated datasets. 
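The GEOmetadb query described above can be approximated in plain SQL. The sketch below uses Python's built-in sqlite3 module against a tiny in-memory stand-in for the GEOmetadb SQLite file. The table and column names (gse, gpl, gse_gpl, title, summary, organism, manufacturer) are assumed to mirror GEOmetadb, the example rows are hypothetical, and matching the term in either the title or the summary field is an assumption about how the original query combined the two. The helper flag_for_review and its keyword list are likewise hypothetical: they only point a curator at likely exclusions, while the manual screen remains the deciding step.

```python
import sqlite3

# Tiny stand-in for the GEOmetadb SQLite file. Table/column names are assumed
# to mirror GEOmetadb (gse, gpl, gse_gpl); all rows below are hypothetical toy data.
SCHEMA = """
CREATE TABLE gse (gse TEXT PRIMARY KEY, title TEXT, summary TEXT);
CREATE TABLE gpl (gpl TEXT PRIMARY KEY, organism TEXT, manufacturer TEXT);
CREATE TABLE gse_gpl (gse TEXT, gpl TEXT);
"""

# '%monocyte%' also matches "monocytes"; SQLite LIKE is case-insensitive for ASCII.
QUERY = """
SELECT DISTINCT gse.gse, gse.title, gse.summary
FROM gse
JOIN gse_gpl ON gse_gpl.gse = gse.gse
JOIN gpl ON gpl.gpl = gse_gpl.gpl
WHERE (gse.title LIKE '%monocyte%' OR gse.summary LIKE '%monocyte%')
  AND gpl.organism = 'Homo sapiens'
  AND (gpl.manufacturer LIKE '%Illumina%' OR gpl.manufacturer LIKE '%Affymetrix%')
ORDER BY gse.gse
"""

# Hypothetical hints for likely exclusions (derived cell types, monocytic cell
# lines). They only flag entries; nothing is excluded automatically.
EXCLUSION_HINTS = ("monocyte-derived", "thp-1", "u937", "cell line")


def flag_for_review(title, summary):
    """Return the exclusion hints found in a series' title or summary."""
    text = f"{title} {summary}".lower()
    return [hint for hint in EXCLUSION_HINTS if hint in text]


def build_toy_db():
    """Build an in-memory database with a few hypothetical series and platforms."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO gse VALUES (?, ?, ?)", [
        ("GSE_TOY_1", "CD14+ monocytes in sepsis", "Primary human monocytes profiled ex vivo"),
        ("GSE_TOY_2", "Monocyte-derived macrophage polarization", "In vitro differentiation"),
        ("GSE_TOY_3", "Murine monocytes", "Bone marrow monocytes"),
        ("GSE_TOY_4", "T-cell activation", "Stimulated CD4+ T cells"),
    ])
    conn.executemany("INSERT INTO gpl VALUES (?, ?, ?)", [
        ("GPL_TOY_A", "Homo sapiens", "Affymetrix"),
        ("GPL_TOY_B", "Homo sapiens", "Illumina Inc."),
        ("GPL_TOY_C", "Mus musculus", "Affymetrix"),
    ])
    conn.executemany("INSERT INTO gse_gpl VALUES (?, ?)", [
        ("GSE_TOY_1", "GPL_TOY_A"), ("GSE_TOY_2", "GPL_TOY_B"),
        ("GSE_TOY_3", "GPL_TOY_C"), ("GSE_TOY_4", "GPL_TOY_A"),
    ])
    return conn


def screen(conn):
    """Return (series ID, review flags) for every series matched by QUERY."""
    return [(gse_id, flag_for_review(title, summary))
            for gse_id, title, summary in conn.execute(QUERY)]


if __name__ == "__main__":
    for gse_id, flags in screen(build_toy_db()):
        print(gse_id, flags or "no flags")
```

With the toy rows, the mouse series and the non-monocyte series are dropped by the query itself, while the “monocyte-derived” series is returned but flagged, mirroring why a manual screen was still needed.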
It includes datasets generated from studies profiling primary human CD14+ cells isolated from patients with autoimmune diseases (7), bacterial, viral and parasitic infections (7), cancer (4), cardiovascular diseases (4) and kidney diseases (4), as well as monocytes isolated from healthy subjects (58). The 58 datasets in which monocytes were isolated from healthy subjects were classified according to whether profiling was conducted ex vivo or following in vitro experiments. In total, 38 datasets were identified in which primary human CD14+ cells were stimulated or infected in vitro. Among the many noteworthy datasets, 8 investigate differences between monocyte subsets: classical (CD14++CD16-), intermediate (CD14+CD16+) and non-classical (CD14-CD16++) monocytes [GXB: GSE16836, GSE18565, GSE25913, GSE34515, GSE35457, GSE51997, GSE60601, GSE66936]. Another dataset, from Banchereau and colleagues, investigated the responses of monocytes and dendritic cells to 13 different vaccines in vitro [GXB: GSE44721]. The datasets that comprise our collection are listed in […] and can be browsed interactively in GXB. [...] Once a final selection was made, each dataset was downloaded from GEO in the SOFT file format and then uploaded to an instance of the Gene Expression Browser (GXB) hosted on the Amazon Web Services cloud. Available sample and study information was also uploaded. Samples were grouped to support interpretation of the study results, and rankings were computed for the different group comparisons (e.g. comparing monocytes isolated from cases vs. controls in studies where profiling was performed ex vivo, or stimulated cells vs. medium controls in in vitro experiments). [...] Quality control checks were performed by examining the expression profiles of relevant biological indicators. 
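The source does not state how GXB ranks genes for a given group comparison. As a hedged illustration only, the sketch below ranks genes by the absolute log2 fold change between the group means of two sample groups (for example stimulated vs. medium control). It assumes linear-scale expression values and a pseudocount of 1; if values are already log2-transformed, as is common for processed GEO data, the difference of group means would be used instead. The function names and the toy values are hypothetical.

```python
import math
from statistics import mean


def log2_fold_changes(expr, case_ids, control_ids, pseudocount=1.0):
    """Per-gene log2((mean(case) + pc) / (mean(control) + pc)).

    expr maps gene -> {sample_id: linear-scale expression value} (assumed scale).
    """
    changes = {}
    for gene, values in expr.items():
        case_mean = mean(values[s] for s in case_ids)
        control_mean = mean(values[s] for s in control_ids)
        changes[gene] = math.log2((case_mean + pseudocount) / (control_mean + pseudocount))
    return changes


def rank_genes(expr, case_ids, control_ids):
    """Sort genes by absolute log2 fold change, largest change first."""
    changes = log2_fold_changes(expr, case_ids, control_ids)
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy values for a stimulated vs. medium-control comparison.
    toy = {
        "IL1B": {"stim1": 1023, "stim2": 1023, "med1": 63, "med2": 63},
        "TNF": {"stim1": 255, "stim2": 255, "med1": 511, "med2": 511},
        "ACTB": {"stim1": 1000, "stim2": 1000, "med1": 1000, "med2": 1000},
    }
    for gene, lfc in rank_genes(toy, ["stim1", "stim2"], ["med1", "med2"]):
        print(f"{gene}\t{lfc:+.2f}")
```

On the toy data, IL1B ranks first (log2 fold change 4), followed by TNF (-1) and ACTB (0). Ranking on the absolute value keeps strongly down-regulated genes near the top alongside up-regulated ones.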
Known leukocyte markers were used, such as CD14, which is expressed by monocytes and macrophages, as well as markers that would indicate significant contamination of the sample by other leukocyte populations, such as CD3 (a T-cell marker), CD19 (a B-cell marker) and CD56 (an NK-cell marker). The expression of the CD14 marker across all studies can be checked using the cross-project functionality of GXB. We have systematically verified that expression of the genes encoding these surface markers was consistent with the grouping labels provided by depositors. In addition, expression of XIST, a transcript whose expression is sex-specific, was examined to determine its concordance with the demographic information provided with the GEO submission (XIST expression should be high in females and low in males). […]
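The marker and XIST checks described above can be expressed as simple per-sample rules. The sketch below assumes log2-scale expression values keyed by gene symbol, uses CD3E as a stand-in for CD3 and NCAM1 (the gene encoding CD56) for CD56, and sets all three thresholds to hypothetical defaults that would in practice need to be chosen per platform by inspecting the value distributions. The function qc_sample is illustrative and is not the authors' procedure.

```python
# Transcripts that would indicate contamination of a monocyte sample by other
# leukocytes (CD3E stands in for CD3; NCAM1 encodes CD56). Assumed gene choices.
CONTAMINATION_MARKERS = {"CD3E": "T cell", "CD19": "B cell", "NCAM1": "NK cell"}


def qc_sample(expr, reported_sex, cd14_min=8.0, contaminant_max=6.0, xist_cutoff=7.0):
    """Return a list of QC issues for one sample (an empty list means no issues).

    expr: {gene_symbol: log2 expression}; the thresholds are hypothetical defaults.
    reported_sex: "female", "male", or None when the submission omits it.
    """
    issues = []
    if expr["CD14"] < cd14_min:
        issues.append("low CD14")
    for gene, cell_type in CONTAMINATION_MARKERS.items():
        if expr[gene] > contaminant_max:
            issues.append(f"{gene} high (possible {cell_type} contamination)")
    # XIST is transcribed from the inactive X: high in females, low in males.
    inferred_sex = "female" if expr["XIST"] >= xist_cutoff else "male"
    if reported_sex is not None and inferred_sex != reported_sex:
        issues.append(f"XIST suggests {inferred_sex}, metadata says {reported_sex}")
    return issues


if __name__ == "__main__":
    # Hypothetical samples: one clean, one with a T-cell signal and a sex mismatch.
    samples = {
        "sample_A": ({"CD14": 12.0, "CD3E": 4.0, "CD19": 3.0, "NCAM1": 3.0, "XIST": 10.0}, "female"),
        "sample_B": ({"CD14": 11.0, "CD3E": 9.0, "CD19": 3.0, "NCAM1": 3.0, "XIST": 4.0}, "female"),
    }
    for name, (expr, sex) in samples.items():
        print(name, qc_sample(expr, sex) or "passes")
```

Returning a list of issues rather than a single pass/fail keeps contamination and metadata discrepancies distinguishable when a curator reviews flagged samples.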

Pipeline specifications

Software tools: GEOmetadb, GXB, AWS, GeneBrowser, GeneID
Applications: Miscellaneous, Transcription analysis
Organisms: Homo sapiens