
Protocol publication

[…] Principal Component Analysis (PCA) [] was employed to reduce the dimensionality of the dataset before classification methods were applied. Partial Least Squares Discriminant Analysis (PLS-DA) [] was used for classification and to obtain a first selection of the discriminating variables, with a binary-coded Y variable (−1 for control samples and +1 for pathological samples). This preliminary PLS-DA reduced the number of relevant variables to 1612. Only classification models with at most six principal components (PCs) were considered (Additional file ). Ranking-Principal Component Analysis (R-PCA) [] ranked the variables in order of decreasing discriminant ability. Linear Discriminant Analysis (LDA) [], a Bayesian classification method, classified the samples while taking the multivariate structure of the data into account; here, a Forward Selection procedure [] was applied to the principal components. The classification performance of the models was evaluated by the non-error rate (NER%), i.e. the overall percentage of correctly assigned samples.

Further data processing was performed in the R computing environment (http://www.r-project.org/) version 2.8.0 with BioConductor packages (http://www.bioconductor.org/). Data were first filtered by eliminating probes with poor-quality detection calls, as well as probes whose intensity was below log2(100) in all samples. Of the original 60 samples, one (a control sample) did not pass the microarray hybridization quality controls and was excluded from further analyses; the final dataset therefore consisted of 59 samples described by 15137 probes. Robust Multi-Array Average (RMA) normalization was applied to the microarray data, which were then imported into the MultiExperiment Viewer (MeV) software version 4.5.1 for Windows XP (http://www.tm4.org/mev.html).
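With the −1/+1 coding of Y, a PLS-DA model returns a continuous prediction for each sample, which must be turned into a class label before NER% can be computed. The Python sketch below shows one common way to do this. It assumes a sample is assigned to the pathological class when its predicted y is above 0; the source does not state the decision threshold, and the prediction values are hypothetical. NER% follows the definition given above (overall correct assignments); note that some chemometric software instead defines NER as the mean of the per-class correct rates.

```python
# Binary coding described in the protocol: -1 = control, +1 = pathological.
CONTROL, PATHOLOGICAL = -1, 1


def assign_class(predicted_y):
    """Map continuous PLS-DA predictions to -1/+1 labels.

    ASSUMPTION: threshold 0 (midpoint of the two codes); not stated in the source.
    """
    return [PATHOLOGICAL if y > 0 else CONTROL for y in predicted_y]


def non_error_rate(true_labels, predicted_labels):
    """NER%: overall percentage of samples assigned to their true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


if __name__ == "__main__":
    # Hypothetical predictions for five samples, not data from the study.
    y_true = [CONTROL, CONTROL, PATHOLOGICAL, PATHOLOGICAL, PATHOLOGICAL]
    y_hat = [-0.8, 0.1, 0.6, 0.9, -0.2]
    labels = assign_class(y_hat)
    print(labels)                           # [-1, 1, 1, 1, -1]
    print(non_error_rate(y_true, labels))   # 60.0
```

Here three of the five samples are assigned correctly, giving NER% = 60.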
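The intensity filter removes a probe only when it is below the cut-off in every sample, so a probe expressed in even one sample is retained. The sketch below illustrates that rule in Python. The data layout (a dictionary from probe ID to per-sample log2 intensities), the probe IDs, and the choice of `>=` at the boundary are assumptions made for illustration; the detection-call criterion is not modelled.

```python
import math

# log2(100) is about 6.64: the cut-off expressed on the log2 intensity scale.
LOG2_THRESHOLD = math.log2(100)


def filter_low_intensity(expr, threshold=LOG2_THRESHOLD):
    """Drop probes whose log2 intensity is below `threshold` in every sample.

    `expr` maps a probe ID to its list of log2 intensities (one per sample).
    A probe is kept as soon as at least one sample reaches the threshold.
    """
    return {
        probe: values
        for probe, values in expr.items()
        if any(v >= threshold for v in values)
    }


if __name__ == "__main__":
    # Hypothetical probe IDs and values, for illustration only.
    expr = {
        "probe_A": [5.0, 6.0, 7.0],   # above the cut-off in one sample: kept
        "probe_B": [4.0, 5.5, 6.5],   # below the cut-off everywhere: dropped
        "probe_C": [8.0, 9.0, 10.0],  # kept
    }
    print(sorted(filter_low_intensity(expr)))  # ['probe_A', 'probe_C']
```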
Statistical analysis was performed with PUMA [] and with the SAM (Significance Analysis of Microarrays) [] and Rank Product (RP) [] modules to detect significantly differentially expressed genes.

PUMA is a Bayesian method (available in R BioConductor) that incorporates probe-level measurement error into the estimates of the expression profiles []. These estimates were normalized by median global array scaling; the replicates of each condition were combined into a single expression value, which was associated with a probability of positive log ratio (PPLR) between conditions. To ease interpretation, PPLR was converted into a p-value-like form: 1 − PPLR was used for up-regulated genes and PPLR for down-regulated ones.

SAM was chosen because it allows false positive results to be controlled through the False Discovery Rate (FDR). This is particularly relevant for human samples because of the inherent genetic variation among individuals. Data were filtered so that only probe sets with a Present call and an intensity value >100 in at least half of the arrays of the smaller group were retained.

Functional analyses were performed using Gene Ontology (GO) annotations [], DAVID Bioinformatics Resources [] and Gene Set Enrichment Analysis (GSEA) [] as implemented at http://www.broadinstitute.org/gsea/, version 2.06. […]
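The PPLR conversion used with PUMA maps both strongly up-regulated genes (PPLR near 1) and strongly down-regulated genes (PPLR near 0) to small p-value-like scores. The Python sketch below illustrates this. It assumes a gene is treated as up-regulated when PPLR > 0.5 and as down-regulated otherwise; the source states only which formula applies to each direction. The result is still a posterior probability, not a frequentist p-value.

```python
def pplr_to_p_like(pplr):
    """Convert a PUMA PPLR into (direction, p-value-like score).

    ASSUMPTION: 'up' when PPLR > 0.5, otherwise 'down'.
    """
    if not 0.0 <= pplr <= 1.0:
        raise ValueError("PPLR must lie in [0, 1]")
    if pplr > 0.5:
        # Strong up-regulation: PPLR near 1, so 1 - PPLR is small.
        return "up", 1.0 - pplr
    # Strong down-regulation: PPLR near 0, so PPLR itself is small.
    return "down", pplr


if __name__ == "__main__":
    for value in (0.99, 0.50, 0.02):  # hypothetical PPLR values
        direction, score = pplr_to_p_like(value)
        print(value, direction, round(score, 4))
```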
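SAM scores each gene with a relative difference d: the difference between the group means divided by the pooled standard error s plus a small constant s0 (the "fudge factor") that keeps genes with very low variance from dominating. The Python sketch below computes d for one gene under that formulation. It is not the full SAM procedure: s0 is simply passed in (SAM estimates it from the data), and the permutation step that yields the FDR is omitted. The expression values in the example are hypothetical.

```python
import math
from statistics import mean


def sam_relative_difference(group1, group2, s0=0.0):
    """SAM relative difference d = (mean1 - mean2) / (s + s0) for one gene.

    s is the pooled standard error of the mean difference; s0 is supplied by
    the caller here (SAM itself chooses it from the data).
    """
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two replicates")
    m1, m2 = mean(group1), mean(group2)
    # Within-group sums of squares, pooled across the two groups.
    sum_sq = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    a = (1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2)
    s = math.sqrt(a * sum_sq)
    if s + s0 == 0:
        raise ValueError("zero variance and s0 = 0: d is undefined")
    return (m1 - m2) / (s + s0)


if __name__ == "__main__":
    # Hypothetical log2 expression values for one gene in two groups.
    patients, controls = [8.0, 9.0, 10.0], [5.0, 6.0, 7.0]
    print(round(sam_relative_difference(patients, controls), 3))
    print(round(sam_relative_difference(patients, controls, s0=0.5), 3))
```

A positive s0 always shrinks |d|, which is what damps the scores of low-variance genes.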
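The Present-call filter applied before SAM can be read in more than one way, so the Python sketch below fixes one interpretation and states it: a probe set is kept if the number of arrays where it has a Present call and intensity > 100 is at least half the size of the smaller group, counted over all arrays and rounded up for odd group sizes. The 'P'/'M'/'A' call strings follow the Affymetrix MAS5 detection-call convention; the linear intensity scale and the example values are assumptions.

```python
import math


def passes_present_filter(calls, intensities, smaller_group_size, min_intensity=100.0):
    """Return True if enough arrays have a Present call and intensity > min_intensity.

    ASSUMPTIONS: `calls` are 'P'/'M'/'A' strings, `intensities` are on the
    linear scale, arrays are counted across both groups, and 'half' of an
    odd group size is rounded up.
    """
    if len(calls) != len(intensities):
        raise ValueError("calls and intensities must have one entry per array")
    required = math.ceil(smaller_group_size / 2)
    n_passing = sum(
        1
        for call, value in zip(calls, intensities)
        if call == "P" and value > min_intensity
    )
    return n_passing >= required


if __name__ == "__main__":
    # Hypothetical probe set measured on six arrays; smaller group has 5 arrays.
    calls = ["P", "P", "A", "P", "M", "P"]
    values = [150, 220, 300, 90, 500, 130]
    print(passes_present_filter(calls, values, smaller_group_size=5))
```

In the example, three arrays satisfy both conditions, which meets the requirement of ceil(5 / 2) = 3.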
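The Rank Product method ranks genes by fold change within each replicate comparison and then takes, for each gene, the geometric mean of its ranks; a small rank product means the gene is consistently near the top. The Python sketch below computes rank products for up-regulation only. It is not the MeV RP module: ties are broken by input order rather than averaged, the significance assessment (done by permutation in the original method) is omitted, and the log ratios are hypothetical.

```python
import math


def ranks_descending(values):
    """Rank values so that the largest gets rank 1 (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_product(log_ratios_by_comparison):
    """Rank product for up-regulation: geometric mean of each gene's ranks.

    `log_ratios_by_comparison` holds k lists (one per replicate comparison),
    each giving one log fold change per gene, in the same gene order.
    """
    k = len(log_ratios_by_comparison)
    if k == 0:
        raise ValueError("at least one comparison is required")
    n_genes = len(log_ratios_by_comparison[0])
    if any(len(c) != n_genes for c in log_ratios_by_comparison):
        raise ValueError("every comparison must cover the same genes")
    rank_lists = [ranks_descending(c) for c in log_ratios_by_comparison]
    # Geometric mean computed in log space to avoid overflow with many genes.
    return [
        math.exp(sum(math.log(ranks[g]) for ranks in rank_lists) / k)
        for g in range(n_genes)
    ]


if __name__ == "__main__":
    # Hypothetical log ratios for three genes in two comparisons.
    comparisons = [[2.0, 0.5, -1.0], [1.5, 1.8, -0.3]]
    print([round(rp, 3) for rp in rank_product(comparisons)])
```

Down-regulated genes can be scored the same way by ranking in ascending order instead.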

Pipeline specifications

Software tools: TM4, SAM, DAVID, GSEA
Organisms: Homo sapiens
Diseases: Disease; Parkinson Disease; Wiskott-Aldrich Syndrome; Common Variable Immunodeficiency; Neurodegenerative Diseases; Heredodegenerative Disorders, Nervous System; Mitochondrial Diseases