Computational protocol: Matrix Metalloproteinase-10 Promotes Kras-Mediated Bronchio-Alveolar Stem Cell Expansion and Lung Cancer Formation

Protocol publication

[…] Three lung cancer gene expression datasets were analyzed to assess the relationships between Mmp10 levels, the cancer stem cell phenotype, and metastasis in human cancer. The first two datasets (GSE11969 and GSE13213) consist of gene expression measurements from NSCLC tumors. The third dataset (GSE10799) contains expression values from human lung adenocarcinoma samples that had metastasized to bone and from samples that had not.

All three microarray datasets were downloaded from GEO into the R statistical computing language using the GEOquery package of the Bioconductor software suite. The datasets were quantile normalized using the preprocessCore package. GSE11969 and GSE13213 were sorted according to their Mmp10 expression values. Lung tumor samples in GSE11969 were segregated into two sets: the first contained the 30 samples with the highest Mmp10 expression values, and the second the 30 samples with the lowest. GSE13213 was treated in the same manner, except that samples were separated into groups of 35 instead of 30. The group sizes were chosen to maximize the statistical significance of the difference in Mmp10 expression between the two groups, as determined by Welch's t-test.

Gene Set Enrichment Analysis (GSEA) was carried out on all of the lung cancer gene expression datasets described above. For each dataset, GSEA was performed using two collections of gene sets available as part of the Molecular Signatures Database (MSigDB) version 3.0. The first collection of gene sets was intended to measure each dataset's degree of enrichment for the cancer stem cell phenotype. This collection was selected by searching MSigDB for signatures that contained the terms “cancer” and “stem” within their descriptions.
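As an illustration of the keyword-based selection just described, the following Python sketch filters a table of gene set descriptions for entries that mention both terms. It is not the authors' procedure: the protocol does not say how the search was run, and the original analysis was done in R. The function name `select_gene_sets`, the input layout (a dictionary mapping gene set names to descriptions), and the case-insensitive substring matching are all assumptions made for the example.

```python
def select_gene_sets(descriptions, terms=("cancer", "stem")):
    """Return the names of gene sets whose description contains every term.

    `descriptions` maps a gene set name to its free-text description
    (a hypothetical layout, not an MSigDB file format).
    Matching is a case-insensitive substring test, so "stem" would also
    match words such as "system"; a stricter search would match whole words.
    """
    wanted = [term.lower() for term in terms]
    return sorted(
        name
        for name, text in descriptions.items()
        if all(term in text.lower() for term in wanted)
    )
```

Called on a dictionary of descriptions, the function returns an alphabetically sorted list of the matching gene set names, which could then be used to subset a larger gene set collection before running the enrichment analysis.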
The second collection of gene sets contained every signature listed in MSigDB and was intended to explore the relationships among the datasets in an untargeted fashion. In all GSEA runs, gene sets with nominal p-values below 0.05 and false discovery rates (FDRs) below 0.25 were considered significantly enriched in the tested dataset. […]
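The numerical steps surrounding the GSEA runs in this section (quantile normalization, selection of the high and low Mmp10 groups, Welch's t-test, and the final significance filter) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' R code: all function names, the input layouts, and the result field names `name`, `nom_p`, and `fdr` are hypothetical, and the tie handling in the normalization is simpler than what preprocessCore does. The GSEA run itself is not reproduced.

```python
from math import sqrt
from statistics import mean, variance


def quantile_normalize(columns):
    """Give every sample (column) the same empirical distribution.

    `columns` is a list of equal-length lists, one list per sample.
    Ties are broken by position (an assumption made for simplicity).
    """
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    rank_means = [mean(col[k] for col in sorted_cols) for k in range(n_genes)]
    normalized = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def split_extremes(expr_by_sample, n):
    """Return (high, low): the n sample IDs with the highest and lowest values."""
    ranked = sorted(expr_by_sample, key=expr_by_sample.get, reverse=True)
    if 2 * n > len(ranked):
        raise ValueError("groups would overlap")
    return ranked[:n], ranked[-n:]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def significant_sets(results, p_max=0.05, fdr_max=0.25):
    """Keep gene sets with nominal p < p_max and FDR < fdr_max."""
    return [r["name"] for r in results
            if r["nom_p"] < p_max and r["fdr"] < fdr_max]
```

In this sketch, `split_extremes` would be called with n = 30 or n = 35 on a mapping from sample ID to Mmp10 expression, and `welch_t` on the Mmp10 values of the two resulting groups. Converting the statistic to a p-value requires the t distribution, which the standard library does not provide; with SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test and returns the p-value directly.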

Pipeline specifications

Software tools: GEOquery, GSEA
Databases: MSigDB
Application: Transcription analysis
Organisms: Mus musculus, Homo sapiens
Diseases: Lung Neoplasms, Neoplasms
Chemicals: Urethane