Similar protocols

Protocol publication

[…] We envision a ‘publication’ with four supplementary files: 1) the data file, 2) the workflow file, 3) the execution environment specification, and 4) the results. The task the author would like to enable for an interested reader is to take the first three specifications, run them easily, and confirm (or deny) that the results of an independent re-execution are similar to those published.

For the purpose of this report, we wanted an easy-to-execute query run on completely open, publicly available data. We also wanted a relatively simple workflow that could be run in a standard computational environment and operate on a tractable number of subjects. We selected a workflow and sample size such that the overall processing could be accomplished in a few hours. The complete workflow and results can be found in the GitHub repository (doi: 10.5281/zenodo.800758).

The data

The dataset for this exercise was created by a query as an unregistered guest user of the NITRC Image Repository (NITRC-IR; RRID:SCR_004162). We queried the NITRC-IR search page (http://www.nitrc.org/ir/app/template/Index.vm; 1-Jan-2017) on the ‘MR’ tab with the following specification: age, 10–15 years old; Field Strength, 3. This query returned 24 subjects, with their subject identifier, age, handedness, gender, acquisition site, and field strength. We then selected the ‘mprage_anonymized’ scan type and ‘NIfTI’ file format in order to access the URLs (uniform resource locators) for the T1-weighted structural image data of these 24 subjects. The subjects had the following characteristics: age = 13.5 ± 1.4 years; 16 males, 8 females; 8 right-handed, 1 left-handed, and 15 unknown. All of these datasets were from the 1000 Functional Connectomes project, and included 9 subjects from the Ann Arbor sub-cohort and 15 from the New York sub-cohort. We captured this data in tabular form. Following the recommendations of the Joint Declaration of Data Citation Principles, we used the Image Attribution Framework to create a unique identifier for this data collection (image collection doi: 10.18116/C6C592). Data collection identifiers make it possible to track and attribute future reuse of the dataset, and to maintain the credit and attribution connection to the constituent images of the collection, which may, in general, come from heterogeneous sources. Representative images from this collection are shown in the accompanying figure.

The workflow

For this example, we use a simple workflow designed to generate subcortical structural volumes. We used the following tools from the FMRIB Software Library, version 5.0.9 (FSL; RRID:SCR_002823): conformation of the data to FSL standard space (fslreorient2std), brain extraction (BET), tissue classification (FAST), and subcortical segmentation (FIRST). This workflow is represented in Nipype (RRID:SCR_002502) to facilitate workflow execution and provenance tracking; the workflow is available in the GitHub repository. The workflow also includes an initial step that pulls the contents of the data table from a Google Docs spreadsheet (https://docs.google.com/spreadsheets/d/11an55u9t2TAf0EV2pHN0vOd8Ww2Gie-tHp9xGULh_dA/edit?usp=sharing) to copy the specific data files to the system, and a step that extracts the volumes (in terms of number of voxels and absolute volume) of the resultant structures. The code for these additional steps is included in the GitHub repository as well.

In this workflow, the following regions are assessed: brain and background (as determined from the masks generated by BET, the brain extraction tool); gray matter, white matter, and CSF (from the output of FAST); and left and right accumbens, amygdala, caudate, hippocampus, pallidum, putamen, and thalamus-proper (from the output of FIRST). See the accompanying figure for the workflow diagram.
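In outline, the core pipeline might be expressed in Nipype roughly as follows. This is a minimal sketch under default interface options, not the repository's actual script; node names and the input file name are illustrative.

```python
# Minimal Nipype sketch of the four FSL steps; not the repository's
# actual script. Node names and the input file name are illustrative.
from nipype.pipeline.engine import Workflow, Node
from nipype.interfaces import fsl

reorient = Node(fsl.Reorient2Std(), name='reorient')  # conform to FSL standard space
bet = Node(fsl.BET(mask=True), name='bet')            # brain extraction
fast = Node(fsl.FAST(), name='fast')                  # tissue classification
first = Node(fsl.FIRST(), name='first')               # subcortical segmentation

wf = Workflow(name='simple_workflow', base_dir='work')
wf.connect([
    (reorient, bet, [('out_file', 'in_file')]),
    (bet, fast, [('out_file', 'in_files')]),
    (reorient, first, [('out_file', 'in_file')]),
])

reorient.inputs.in_file = 'sub-01_T1w.nii.gz'  # one subject's T1-weighted image
wf.run()
```

Because Nipype records the inputs, outputs, and runtime of every node, independent re-executions of the same graph can be compared step by step.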
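The data-fetching step could, for instance, read the spreadsheet's CSV export and download each image. This is a hypothetical sketch: the CSV-export URL form is a documented Google Sheets feature, but the column names are assumptions, not necessarily those used by the repository's script.

```python
# Hypothetical sketch of the data-fetching step; column names are assumed.
import os
import urllib.request

import pandas as pd

SHEET_ID = '11an55u9t2TAf0EV2pHN0vOd8Ww2Gie-tHp9xGULh_dA'
csv_url = 'https://docs.google.com/spreadsheets/d/%s/export?format=csv' % SHEET_ID

table = pd.read_csv(csv_url)       # one row per subject
os.makedirs('data', exist_ok=True)
for _, row in table.iterrows():
    # Assumed columns: a subject identifier and the NITRC-IR download URL.
    dest = os.path.join('data', '%s_T1w.nii.gz' % row['subject_id'])
    if not os.path.exists(dest):   # skip files already copied to the system
        urllib.request.urlretrieve(row['nifti_url'], dest)
```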
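The volume-extraction step amounts to counting voxels per label and scaling by the voxel size. A minimal sketch using nibabel (assumed available; the segmentation file name follows FIRST's output naming convention but is illustrative here):

```python
# Sketch of volume extraction: voxel counts and absolute volumes per label.
import numpy as np
import nibabel as nib

img = nib.load('sub-01_all_fast_firstseg.nii.gz')       # illustrative FIRST output
labels = np.rint(np.asarray(img.dataobj)).astype(int)   # integer label image
voxel_volume = float(np.prod(img.header.get_zooms()[:3]))  # mm^3 per voxel

for label in np.unique(labels):
    if label == 0:  # skip background
        continue
    n_voxels = int((labels == label).sum())
    print(label, n_voxels, n_voxels * voxel_volume)
```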
The execution environment

In order to utilize a computational environment that is, in principle, accessible to other users in a configuration identical to the one used to carry out this analysis, we created a Docker (https://www.docker.com/) container to encapsulate the specific computational environment and analysis pipeline components: https://hub.docker.com/r/repronim/simple_workflow/tags/. A Docker container permits efficient environment and software delivery for easy deployment on the most common operating systems (Linux, Windows, Mac). The build instructions for the Docker container are provided in the GitHub repository and use Debian 8.7 as the base operating system.

Setting up the software environment on a different machine

In addition to using the Docker container, one can re-execute the workflow on a machine or cluster different from the one used originally. General instructions for setting up the needed software environment on GNU/Linux and MacOS systems are provided in the README.md file in the GitHub repository. We assume FSL is installed and accessible on the command line (FSL can be found at https://fsl.fmrib.ox.ac.uk). In order to establish a precise overall environment, we use Conda (https://conda.io/), a cross-platform package manager that handles user installations of many packages into a controlled environment. Unlike many operating system package managers (e.g., yum, apt), Conda does not require root privileges, which allows individuals to replicate isolated virtual environments easily without help from a system administrator. Conda uses standard PATH variables to isolate the environments. Coupled with Anaconda Cloud and conda-forge, Conda is capable of installing versioned dependencies of Python and other packages. In this way, a Python 2.7.12 environment can be set up and the Nipype workflow re-executed with a few shell commands, as noted in the README.md.

One can also use the NITRC Computational Environment (NITRC-CE; RRID:SCR_002171). The NITRC-CE is built upon NeuroDebian (RRID:SCR_004401), and comes with FSL (version 5.0.9-3~nd14.04+1) pre-installed on an Ubuntu 14.04 operating system. We ran the computational environment on the Amazon Web Services (AWS) Elastic Compute Cloud (EC2). With EC2, the user can select the properties of their virtual machine (number of cores, memory, etc.) in order to scale the power of the system to their specific needs. For this paper, we used the NITRC-CE v0.42, with the following specific identifier (AMI ID): ami-ce11f2ae. […]
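To make the re-execution check envisioned above concrete: if the published and re-executed volume tables share the same tabular layout, confirming (or denying) their similarity reduces to a tolerance comparison. A minimal sketch, with file names, layout, and the 1% tolerance as illustrative assumptions rather than the repository's actual comparison script:

```python
# Hypothetical comparison of published vs. re-executed volumes.
import pandas as pd

published = pd.read_csv('expected_output/volumes.csv', index_col=0)
rerun = pd.read_csv('output/volumes.csv', index_col=0)

# Largest relative difference across all structures and subjects
# (assumes nonzero published volumes).
rel_diff = ((rerun - published).abs() / published.abs()).max().max()
print('maximum relative difference: %.4f' % rel_diff)
print('MATCH' if rel_diff < 0.01 else 'MISMATCH')
```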

Pipeline specifications