Computational protocol: Using human brain activity to guide machine learning

Protocol publication

[…] All of the fMRI data were scaled to bring the value of each dimension within the range [0, 1] for RBF SVM training. For each voxel, we calculated the minimum and maximum response amplitude across all 1,260 original training samples. All voxels for the 646 images used in our experiments were then scaled using Equation 4, where x_ij is the j-th sample’s response amplitude for voxel i, x⃗_i is a 646-dimensional vector with the response amplitudes of all samples for voxel i, and x'_ij is the j-th sample’s rescaled amplitude for voxel i.

x'_ij = (x_ij − min(x⃗_i)) / (max(x⃗_i) − min(x⃗_i))    (Equation 4)

The main challenge of generating weights from brain activity (i.e., activity weights) lies in reducing high-dimensional, nonlinear data to a salient, lower-dimensional signal of “learnability”. The supervised machine learning formulation used in this work requires a single real-valued weight per training sample for a loss function (described below). Activity weights were computed by using a logistic transformation to calibrate the scores from SVMs with RBF kernels trained on brain activity.

For each object category and for all voxels from a given combination of ROIs, we made use of all the positive samples for that object category as well as all the samples that are negative for all object categories; together, these are the aforementioned 646 samples (i.e., the clear sample set). Activity weights were generated only for this subset of a partition’s training set, as opposed to all 1,386 stimuli. This constraint maximized the signal-to-noise ratio in the activity weights and improved the saliency of activity weights for a specific object category by weighting only clear positive and negative samples.

Activity weights for training were generated using a modification of the k-fold cross-validation technique.
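The per-voxel rescaling in Equation 4 can be sketched in a few lines of NumPy. The array shapes and random values below are placeholders standing in for the real fMRI response amplitudes, not the protocol's actual data:

```python
import numpy as np

# Placeholder data: rows = samples, columns = voxels (here only 5 voxels).
rng = np.random.default_rng(0)
train_responses = rng.normal(size=(1260, 5))   # stands in for the 1,260 training samples
clear_responses = train_responses[:646]        # stands in for the 646 clear samples

# Per-voxel minimum and maximum over the full training set
# (the min/max terms of Equation 4).
vmin = train_responses.min(axis=0)
vmax = train_responses.max(axis=0)

# Equation 4: x'_ij = (x_ij - min(x_i)) / (max(x_i) - min(x_i))
scaled = (clear_responses - vmin) / (vmax - vmin)
```

Because the clear samples are drawn from the same training set that defines the min/max, every rescaled amplitude lands in [0, 1].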
For a given training set that is a subset of the whole 1,386-stimulus set, the collection of voxel data for the training set’s images in the 646-stimulus clear sample set was randomly split into five folds. For each of these folds, we held out the current fold as test data and combined the other four folds as training data. With this newly formed training set, a grid search was performed to tune the soft-margin penalty parameter C and RBF parameter γ for an RBF SVM classifier using the LibSVM package. Finally, activity weights were generated by testing the classifier on the held-out fold to produce Platt probability scores of class inclusion. This process was repeated for all five folds to generate activity weights for the collection of stimuli in the training set that are part of the clear sample set.

[...] Each of the original 500 × 500 color images was downsampled to a 250 × 250 grayscale image, with pixel values in the interval [0, 1]. A layer of Gaussian noise with a mean of 0 and variance of 0.01 was added to each of these images. For each image, two feature descriptor types were independently generated. Histogram of Oriented Gradients (HOG) descriptors with a cell size of 32 were generated using the VLFeat library’s vl_hog function, which computes UoCTTI HOG features. Convolutional neural network (CNN) features were generated using the Caffe library’s BVLC Reference CaffeNet model, which is AlexNet trained on ILSVRC 2012, with minor differences from the version described by Krizhevsky et al. The 1,000-dimensional pre-softmax activations from CaffeNet were used as the CNN image features.

Four partitions of training and test data were created. In each partition, 80% of the data was randomly designated as training data and the remaining 20% was designated as test data. For each partition, experiments were conducted for the 127 ways (the 2^7 − 1 non-empty subsets) that the seven higher-level visual cortical regions (i.e., EBA, FFA, LO, OFA, PPA, RSC, and TOS) could be combined.
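The five-fold activity-weight procedure can be sketched with scikit-learn standing in for LibSVM: `SVC(probability=True)` fits a Platt-style sigmoid to the SVM decision values, giving the calibrated probability of class inclusion for each held-out fold. The fixed C and γ below stand in for the protocol's grid search, and all data are synthetic placeholders:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

# Synthetic stand-in: 100 "clear" samples with 20 voxel features and a
# binary label for one object category.
rng = np.random.default_rng(1)
X = rng.random((100, 20))
y = (X[:, 0] > 0.5).astype(int)

# One activity weight per clear sample, filled fold by fold.
activity_weights = np.zeros(len(X))
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # probability=True enables Platt-scaled probability estimates;
    # fixed C and gamma replace the protocol's grid search for brevity.
    clf = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
    clf.fit(X[train_idx], y[train_idx])
    # Platt probability of class inclusion for the held-out fold.
    activity_weights[test_idx] = clf.predict_proba(X[test_idx])[:, 1]
```

Each sample's weight is produced by a classifier that never saw that sample during training, mirroring the held-out-fold design above.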
In each experiment, for a given combination of higher-level visual cortical regions and for a given object category, two training steps were followed:

1. Activity weights were generated for a sampling of training stimuli, ones that are part of the 646-stimulus clear sample set, using an RBF-kernel SVM classifier trained on the training voxel data for that combination, following the fMRI activity weight calculation procedure described above.

2. Five balanced classification problems were created from the given partition’s training data. For each balanced classification problem and each set of image descriptors (HOG and CNN features), two SVM classifiers were trained and tested: one that uses a standard hinge loss (HL) function and another that uses an activity weighted loss (AWL) function. Both classifiers used an RBF kernel.

The hinge loss function is solved via Sequential Minimal Optimization. It is not necessary to assign an activity weight c_x ∈ C derived from fMRI data to every training sample; c_x can be 0 to preserve the output of the original hinge loss function. In our experiments, c_x ∈ [0, 1], where c_x corresponds to the probability that x is in the object category in question; this results in penalizing more aggressively the misclassification of strong positive samples. The LibSVM package was used to train and test SVM classifiers using a hinge loss function. To train classifiers using an activity weighted loss function, we modified publicly available code for an alternative additive loss formulation.

For each object category, combination of higher visual cortical regions, and set of image descriptors, we created five balanced classification problems. For each problem, we created a balanced training set with an equal number of positive and negative examples.
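One simple way to approximate the effect of the activity weighted loss, without the protocol's modified additive loss code, is LibSVM-style per-sample weighting: scikit-learn's `sample_weight` scales each sample's misclassification penalty, so a sample with activity weight c_x ∈ [0, 1] is penalized by (1 + c_x)·C while unweighted samples (c_x = 0) keep the standard hinge penalty. This is an illustrative approximation, not the authors' exact AWL formulation; all data below are synthetic:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in: 200 samples with 10 image-descriptor features.
rng = np.random.default_rng(2)
X = rng.random((200, 10))
y = (X.sum(axis=1) > 5).astype(int)

# Activity weights c_x: 0 by default (preserves the plain hinge loss);
# here the first 80 samples stand in for those with fMRI-derived weights.
c_x = np.zeros(len(X))
c_x[:80] = rng.random(80)

# Per-sample weights scale the penalty: weight 1 + c_x means strongly
# weighted samples are more expensive to misclassify.
clf = SVC(kernel="rbf", C=1.0, gamma=1.0 / X.shape[1])
clf.fit(X, y, sample_weight=1.0 + c_x)
```

The multiplicative weighting here differs from the additive formulation the protocol modifies, but captures the same intent: misclassifying strong positive samples costs more.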
For all object categories, because there were more negative than positive samples, all positive samples were used in each balanced problem, and the same number of negative samples was randomly selected for each balanced problem. The balanced problems only balanced the training data; each balanced problem used the same test set: the partition’s held-out test set.

For both loss functions, binary SVM classifiers with RBF kernels were trained without any parameter tuning, using parameters C = 1 and γ = 1/(number of features). The activity weighted loss function incorporates the calibrated probability scores from the first-stage voxel classifiers as activity weights. We assigned these activity weights to the training samples that are members of the 646-stimulus clear sample set. For samples without fMRI-derived activity weights, an activity weight of 0.0 was used. Finally, classifiers were tested on the partition’s test set. In experiments using CNN features, RBF-kernel SVM classifiers converged during training, even though the feature vectors were high-dimensional. […]
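The construction of the five balanced training problems can be sketched as follows: every positive sample is kept, and an equal-sized set of negatives is drawn at random for each problem (the label counts here are placeholders):

```python
import numpy as np

# Placeholder labels: 40 positives, 160 negatives for one object category.
rng = np.random.default_rng(3)
labels = np.array([1] * 40 + [0] * 160)
pos_idx = np.flatnonzero(labels == 1)
neg_idx = np.flatnonzero(labels == 0)

# Five balanced problems: all positives + an equal number of random negatives.
balanced_problems = []
for _ in range(5):
    sampled_neg = rng.choice(neg_idx, size=len(pos_idx), replace=False)
    balanced_problems.append(np.concatenate([pos_idx, sampled_neg]))
```

Only the training indices vary across problems; as described above, every problem is evaluated against the partition's single held-out test set.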

Pipeline specifications

Software tools LIBSVM, Caffe
Applications Miscellaneous, Functional magnetic resonance imaging
Organisms Homo sapiens