Computational protocol: Molecular Networks of Human Muscle Adaptation to Exercise and Age

Protocol publication

[…] Low-level processing of all arrays was undertaken using Bioconductor in R. The affy package was used to carry out MAS5-based normalization and to generate present, marginal and absent (PA) calls. NUSE plots were generated and combined with PCA; an array was classed as an outlier only where both the NUSE plot and the PCA supported its exclusion (∼2% of arrays). For baseline correlation analysis, all samples that passed QC were utilized (N = 45). The same procedure was applied to the data set originating from the Trappe laboratory (GSE28422), and arrays that failed the QC process were removed, leaving n = 96 for analysis. Pre-exercise-training muscle biopsy samples from the HERITAGE family study (N = 50) were also analyzed to yield a second independent data set with a continuous span of ages (see below). The Trappe and HERITAGE data sets are therefore independent data sets that we utilized, where possible, to validate the pathway analysis of our study. Such confirmation benchmarks results using thousands of data points and is more desirable than targeted real-time qPCR confirmation (where the gene selection is biased and the sample size is inappropriate for drawing statistical conclusions). Annotation of all CEL files used ‘hgu133plus2cdf_2.9.1.tgz’, while annotation of probe-set lists was then updated using the Ingenuity Pathway Analysis (IPA) database, as of August 2012.

Our first objective was to identify the gene networks regulated in proportion to gains in upper leg muscle mass (hypertrophy), the same location as our biopsy sample. Such analysis relies on the established principle that adaptation responses (for the majority of phenotypes) to exercise training in outbred populations are highly variable, typically reflecting genetic, epigenetic and genomic variation. We utilized quantitative SAM (Significance Analysis of Microarrays) analysis to generate a list of genes that vary in a positive or negative manner with changes in DEXA-assessed upper leg lean mass.
This was applied to PA-filtered data, and the statistical parameter generated is a q-value (false discovery rate). This provided, for the first time, a candidate list of gene changes that may exert primary or secondary influence over muscle growth in humans. The gene list was then subjected to IPA-based pathway analysis; in particular, the Upstream Analysis tool in IPA was utilized. This analysis has similarities to the Molecular Connectivity Database, where pre-existing collections of RNA signatures are compared with our lean-mass-related gene list and significant overlaps are identified. An overlap P-value is generated based on the degree of overlap between the gene set within the IPA database (which reflects the RNA molecules changed in response to a ‘mediator’ such as a transcription factor or a drug) and our data set, adjusting for data set sizes using Fisher's exact test. We accepted a stringent threshold of P < 0.001 as being significant. A second parameter is the activation “z-score”, where the directional change in RNA is compared between the IPA mediator data set and our lean-mass-gain gene list. The z-score informs on whether the drug/protein mediator is likely to be ‘active’ or ‘inhibited’ during gains in lean mass. Thus, if we discovered that an antagonist is ‘inhibited’ in our analysis, this indicates that the drug target is activated. However, in the present study the data input refers to genes that positively or negatively correlate with lean mass gains; for example, if we find that a “Statin” signature is inhibited, it is interpreted that HMG-CoA reductase-regulated genes are negatively correlated with lean mass gain.

The two-step process presented above generates a focused gene list with high statistical rigor for true-positive associations.
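The overlap P-value described above can be illustrated with a short sketch. It is written in Python rather than taken from IPA, and it assumes a right-tailed Fisher's exact test computed from the hypergeometric distribution. The function name `overlap_p_value`, its parameters and all gene counts are hypothetical and chosen only for illustration; only the P < 0.001 threshold comes from the protocol.

```python
from math import comb


def overlap_p_value(universe_size, signature_size, list_size, overlap):
    """Right-tailed Fisher's exact P: chance of seeing >= `overlap` shared genes.

    universe_size  -- genes that could have been selected (assumed background)
    signature_size -- genes in the mediator signature
    list_size      -- genes in the lean-mass-related list
    overlap        -- genes present in both
    """
    max_overlap = min(signature_size, list_size)
    total = comb(universe_size, list_size)
    # Sum hypergeometric probabilities for every overlap at least as large
    # as the one observed.
    tail = sum(
        comb(signature_size, k) * comb(universe_size - signature_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only (not values from the study).
p = overlap_p_value(universe_size=20000, signature_size=300, list_size=500, overlap=25)
is_significant = p < 0.001  # significance threshold stated in the protocol
print(p, is_significant)
```

Because the test conditions on both list sizes, a large signature does not gain significance merely by being large, which is the sense in which the overlap is adjusted for data set sizes.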
This correlation-based analysis also utilizes the full range of physiological responses observed; however, it assumes that the expression of important genes relates in a linear manner to lean mass gain and thus cannot discover all relevant associations. We then contextualized the statistical findings both in terms of subject characteristics and through comparison of the response of these significant networks with independent gene-array data. At this stage we utilized descriptive statistics, plotting the significant network genes as simple expression values relative to the quartile distribution of lean-mass gains to allow for clear discussion of the results. As these plots are based on the z-scores and P-values described above, no further statistical analysis is presented.

Following the analysis addressing our primary objective, we carried out a classic differential expression analysis using SAM. Given that we have established that chronic differential expression patterns following exercise training depend on the presence of physiological adaptation, we removed six subjects who demonstrated no gain in lean mass. This yielded a list of differentially expressed genes that could then be compared with the resistance exercise training (RET) gene list generated from the Trappe laboratory data and with our published exercise studies. Secondary analysis, in which subject age or baseline lean mass was related to baseline gene expression, was carried out using quantitative SAM analysis as described above. This allowed us to present comparisons of the RET gene list with other modes of exercise (such as endurance exercise training), with disease and with age-related analyses. PCA was utilized to visualize the association between selected physiological and protein expression parameters and training-induced changes in muscle lean mass. PCA was implemented in R using the prcomp() command, which performs a singular value decomposition, and the selected principal components were plotted using the plot command in R.
Each variable was transformed relative to its median value within that data set so that all variables were within a consistent data range. In each case the majority (∼65%) of the total variance was captured by the first two principal components.

Finally, positional gene enrichment (PGE) analysis was used to identify whether the classification genes (or the classifier network genes) were significantly enriched within given chromosomal regions. A region is reported by this analysis when it satisfies the following rules: Rule 1: it contains at least two genes of interest; Rule 2: there is no smaller region containing the same genes of interest; Rule 3: there is no bigger region with more genes of interest and the same genes not of interest; Rule 4: there is no larger encompassing region with a higher percentage of genes of interest; Rule 5: there is no smaller encompassed region with a better P-value; Rule 6: it does not contain any region having fewer genes of interest than expected. The PGE approach exhaustively evaluates over-representation at all chromosomal resolution levels simultaneously. […]
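The PCA step described above (median transformation, principal components, and the share of variance captured by the first two components) can be sketched as follows. This is a pure-Python illustration, not the authors' R code: it assumes the median transformation means dividing each variable by its own median, and it swaps prcomp()'s singular value decomposition for power iteration with deflation on the covariance matrix, which yields the same leading components up to sign when the eigenvalues are distinct. All function names and the measurement values are hypothetical.

```python
import statistics


def median_scale(rows):
    """Divide every variable (column) by its own median.

    Assumed reading of the protocol's median transformation; medians must be non-zero.
    """
    medians = [statistics.median(col) for col in zip(*rows)]
    return [[value / med for value, med in zip(row, medians)] for row in rows]


def centred_covariance(rows):
    """Centre each column and return (centred rows, sample covariance matrix)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[value - mean for value, mean in zip(row, means)] for row in rows]
    p = len(means)
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    return centred, cov


def leading_eigenpair(matrix, iterations=1000):
    """Power iteration on a symmetric positive semi-definite matrix."""
    p = len(matrix)
    vec = [1.0 / (i + 1) for i in range(p)]
    norm = sum(x * x for x in vec) ** 0.5
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue for the unit vector found.
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(p) for j in range(p))
    return value, vec


def pca(rows, n_components=2):
    """Return (scores, components, fraction of total variance per component)."""
    centred, cov = centred_covariance(rows)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    work = [row[:] for row in cov]
    components, explained = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(work)
        components.append(vec)
        explained.append(value / total_variance)
        # Deflate: remove the component just found before searching for the next.
        for i in range(len(work)):
            for j in range(len(work)):
                work[i][j] -= value * vec[i] * vec[j]
    scores = [[sum(v * w for v, w in zip(row, comp)) for comp in components]
              for row in centred]
    return scores, components, explained


# Hypothetical per-subject values: lean mass gain, strength gain, protein marker.
measurements = [
    [0.2, 5.0, 1.1],
    [0.5, 9.0, 1.3],
    [0.9, 14.0, 1.2],
    [1.3, 18.0, 1.8],
    [1.8, 25.0, 1.7],
    [2.4, 30.0, 2.3],
]
scores, components, explained = pca(median_scale(measurements))
print("fraction of variance captured by PC1 and PC2:", explained)
```

Summing the two entries of `explained` gives the quantity the protocol reports as the variance captured by the first two principal components; the `scores` rows are the coordinates one would pass to a scatter plot.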

Pipeline specifications

Software tools: affy, IPA
Application: Gene expression microarray analysis
Organisms: Homo sapiens
Chemicals: Serotonin, Tretinoin, Sirolimus