Flywheel Lab

One of the key initiatives of the Brain Science Center was to implement Flywheel, a proprietary neuroimaging informatics platform that originated at Stanford University. Flywheel is a web-based service that stores raw and derived imaging data in encrypted cloud-based storage (automatically moving data between lower-cost archival tiers and more accessible live tiers), and organizes data by research group, project, subject, imaging session, and acquisition. Flywheel provides web-based tools, command-line tools, and a Python API for searching imaging data, organizing subsets of data into collections, and downloading and uploading data. Crucially, Flywheel provides a pipeline mechanism, called “Gears”, that allows common image processing streams, as well as custom pipelines, to be executed from the web-based interface using cloud-based computing resources. Several Flywheel gears are already available (e.g., fMRIPrep, HCP), and additional customized gears are under development. Data provenance tracking is built into Flywheel, so for each derived image stored in the system, one can reconstruct and reproduce the complete sequence of commands that was used to generate it. This capability greatly enhances experimental rigor in neuroimaging research.

The rollout of Flywheel to the Penn neuroimaging community has been rocky. While there are enthusiastic early adopters, we recognize that many labs have concerns regarding cost, implementation, and data anonymization. These concerns were crystallized for many when we reached out for budget numbers to associate with Flywheel activity.

 


 

Geoffrey Aguirre

November 19, 2019

SUMMARY:

*All data currently on Flywheel, and all data that are reaped to Flywheel, have been and will be stripped of all “direct” personal identifying information.

*You can interact with Flywheel for each of your projects at one of three levels:

-None: If you provide no information for the Flywheel reaper at the scanner, your data will nonetheless be backed up and stored for three months. You can also actively opt out of this automatic backup. Cost: $0.

-Storage and Utility: Automatically store, label, and organize your MRI data in a cloud-based, backed-up system with web and SDK interfaces. Unlimited “utility” gears for file conversion and quality-assurance. Cost: $35 / Terabyte / month.

-Pipelines: Make use of analysis pipelines (fmriprep, XCP, HCP) for stored data. Cost: A one-time, $25 charge for each new session of reaped MRI data, and CPU charges dependent upon gear run-time (estimates: fmriprep $8, xcp $2, hcp-struct/func/ica $10 total; typical charges below).

 

BENEFITS:

*Data are automatically pulled from the scanner and securely stored, preventing data loss in the event of a scanner crash, hard-drive failure, or human error.

*MRI data in Flywheel are automatically de-identified, securely stored with redundant backup, tracked for access and provenance, and easily accessed through the web interface, command-line tools, or an SDK implemented in Python or MATLAB.

*At $35/TB/month, the cost of data storage in Flywheel is lower than that of other systems. (A 1-hour, 3T multiband fMRI session, including raw data and all associated analysis files, is approximately 50 GB.)

*The web-based interface allows you to visually inspect your data, as well as the output of automatic, free, quality-assurance tools.

*Stored data can be analyzed using a growing set of continuously updated, version tracked, open-source pre-processing tools. The total (optional) cost of processing a session of fMRI data is on the order of $35-50. This price will drop as we leverage the collective CPU purchasing power of the Penn Neuroimaging community.

*At any time you can download an organized repository of all of your data from Flywheel with the push of a button, including all file derivatives and analyses. Because all gears on Flywheel are implemented as open-source Docker containers, you can transfer the entire operation out of Flywheel and onto a computing platform of your choice.

*These components substantially address the “rigor and reproducibility” requirement of NIH grants.

  • Data that are reaped from our scanners pass through a de-identification “profile”. We have worked with users who are collecting data from clinical populations to ensure that all personal identifying information is stripped.
  • Facial appearance from structural scans and the date & time of the session are regarded as “indirect” identifiers and do not need to be removed. If you wish, you can apply the (free) “deface” utility gear to your data within Flywheel.
  • The University of Pennsylvania has a Business Associate Agreement with Flywheel, meaning that storing data on Flywheel is functionally equivalent to storing de-identified data in your own lab on a password protected server.
  • No IRB modification is needed to use Flywheel.

REAPING AND UNKNOWN DATA:

*Automatic data reaping is taking place on SC3T and SC7T, and will start from HUP6 on December 1, 2019.

*Laminated instruction cards are present at every scanner with reaping instructions. A PDF copy of this instruction card is included with this email.

*At the time of data collection, enter the “reaper string” into the Study Comments field to direct the data to your Flywheel project.

*If you wish to prevent data from your session from going to Flywheel, follow the “opt-out” instructions on the card. Note that there will be a charge ($100) to have an MRI tech produce a DVD that contains your data.

*If you are conducting a tech-dev study, follow the “tech-dev” instructions on the card. Tech-dev data will be deleted from Flywheel after 3 months.

*If you enter an incorrect reaper string (or no string), your data may be sent to the “unknown” bin. Contact Gaylord Holder to transfer these data to a project. Unknown data will be deleted after 3 months.

PIPELINES:

*For each MRI session that is reaped to Flywheel and is subsequently processed using the analysis pipeline gears there is a one-time $25 charge. This charge is not applied to Penn archival data that are uploaded to Flywheel using the command line utility.

*There is a charge associated with the CPU cycles that are used by the analysis gears to process data. Typical costs are:

-fmriprep: $8.00

-xcpengine: $1.75

-qsiprep: $6.50

-hcp-struct: $2.75

-hcp-func: $1.25 (cost per acquisition)

-hcp-icafix: $6.00 (cost to run five acquisitions together)

*CPU time is no more than $0.07 / hour / CPU (by default, Flywheel gears run on a virtual machine with two CPUs). By leveraging the total purchasing power of all users, the NNC can provide users with lower prices for this processing, and we expect this price to soon drop substantially (25%) with educational discounts, bulk purchasing, and the use of off-peak time on the CUBIC cluster.
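The prices listed above can be combined into a rough per-session estimate. The following is a minimal sketch in Python, not an official billing calculator. It assumes that 1 TB = 1000 GB, that each listed gear is run once at its "typical" cost, and that a session occupies about 50 GB as quoted in the benefits list. The function name `session_cost` and its parameters are hypothetical.

```python
# Rough per-session cost estimate using the prices quoted in this document.
# Assumptions (not stated by the source): 1 TB = 1000 GB, each gear runs once
# at its "typical" cost, and storage is billed on the full session size.

STORAGE_USD_PER_TB_MONTH = 35.00
PIPELINE_FEE_USD = 25.00  # one-time fee per reaped session that uses pipeline gears

TYPICAL_GEAR_COST_USD = {
    "fmriprep": 8.00,
    "xcpengine": 1.75,
    "qsiprep": 6.50,
    "hcp-struct": 2.75,
    "hcp-func": 1.25,    # per acquisition
    "hcp-icafix": 6.00,  # five acquisitions run together
}


def session_cost(gears, size_gb=50.0, months_stored=1, reaped=True):
    """Return a cost breakdown (in USD) for one MRI session.

    gears         -- names of pipeline gears to run (keys of TYPICAL_GEAR_COST_USD)
    size_gb       -- raw data plus derived files, in GB
    months_stored -- number of months the session is kept in Flywheel
    reaped        -- False for archival data uploaded by command line (fee waived)
    """
    storage = size_gb / 1000.0 * STORAGE_USD_PER_TB_MONTH * months_stored
    fee = PIPELINE_FEE_USD if (reaped and gears) else 0.0
    cpu = sum(TYPICAL_GEAR_COST_USD[g] for g in gears)
    return {
        "storage": round(storage, 2),
        "pipeline_fee": fee,
        "cpu": round(cpu, 2),
        "total": round(storage + fee + cpu, 2),
    }


if __name__ == "__main__":
    print(session_cost(["fmriprep", "xcpengine"], months_stored=1))
```

Under these assumptions, one reaped session processed with fmriprep and xcpengine and stored for one month comes to about $36.50, most of which is the one-time pipeline fee. Storage then adds about $1.75 for each further month.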

 

BILLING:

*Each project on Flywheel must be associated with a Fund Number or PO; the cost of a project may be split across multiple Fund Numbers or POs if necessary.

*Charges will be assessed monthly.

*Please work with Margaret Ryan to provide billing information.

*For future grant applications, if you wish to make use of the Flywheel analysis pipelines, we recommend that you add $50-100 to the budgeted price of each scanning session to account for data storage, the one-time $25 pipeline fee, and the cost of CPU cycles.

*We are using the Google Cloud Platform (GCP) as the computational “engine” for executing analysis gears. GCP offers credits of $5K to $25K for investigators, post-docs, and PhD graduate students. Details to follow on how you can apply for and use these credits.

PEOPLE:

UPenn people who can provide information about MRI and Flywheel:

*Gaylord Holder can assist with Flywheel account and project creation, uploading of archival data to Flywheel, and moving reaped data to the correct project. HolderG@pennmedicine.onmicrosoft.com

*Margaret Ryan oversees the association of Fund Numbers with particular projects, and can assist with billing issues. ryanm@pennmedicine.upenn.edu

*Ozzy Taskin knows how to create Flywheel analysis gears from MATLAB and Python code. taskin@pennmedicine.upenn.edu

*Karthik Prabhakaran can advise on the design of MRI scanning protocols that lend themselves to BIDS formatting. karthikp@pennmedicine.upenn.edu

*William Tackett is helping with the creation of fw-heudiconv heuristics related to the standardized MRI scanning protocols, and is familiar with the Flywheel Python SDK. Tackett@pennmedicine.upenn.edu

*Matthew Cieslak, Tinashe Tapera, and Azeez Adebimpe are members of the Satterthwaite Lab and BBL, and have been active in the development of Flywheel tools to place data in BIDS format (fw-heudiconv) and pipeline gears for structural and functional MRI (xcp, qsiprep). Cieslak@pennmedicine.upenn.edu, Tinashe.Tapera@Pennmedicine.upenn.edu, Azeez.Adebimpe@pennmedicine.upenn.edu

OTHER QUESTIONS AND ISSUES:

 

“Flywheel can do data pre-processing, but what then? I want to run a statistical analysis in SPM or FSL.”

  • While the pre-processing stage of data analysis is common to many projects, most experiments implement particular, idiosyncratic processing past this point. A typical workflow is to pre-process MRI data in Flywheel, and then download the analysis products to a local machine for statistical analysis. This download can be done via the web interface, or scripted using the command-line tools or SDK. If you like, you can upload the products of this local analysis back to Flywheel for storage and documentation. Additionally, there are gears to perform statistical analysis on fMRI data within Flywheel (e.g., forwardModel, which implements non-linear, parameterized models).

 

“I used the CfN cluster to process data. I hear that’s going away. Where can I download my data from Flywheel and process it for this last stage?”

  • Most analyses of this kind can be run on a laptop or desktop computer in a reasonable amount of time. If you need help configuring a machine to make use of SPM or FSL, reach out for advice on the UPenn MRI Slack channel.
  • If you need more computing power, you can make use of one of many virtual machine services for data processing. The experience is just like logging into the CfN cluster, with the caveat that your analysis script would also first download your data or analysis products from Flywheel to the virtual machine.
  • There are services that provide virtual machines that are pre-configured with software, such as MATLAB (with a functioning license), e.g. https://console.cloud.google.com/marketplace/details/techila-public/techila?pli=1
  • Gaylord Holder can assist you in creating a Google Cloud Platform account and configuring a virtual machine instance with the software analysis tools your lab uses.
  • If you need even more computing power, the Penn CUBIC server provides highly cost-competitive CPU time. https://www.med.upenn.edu/cbica/cubic.html

 

“I have code that I use to process my MRI data. Can I turn that into a gear and run it on Flywheel?”

  • Yes! There is local expertise at UPenn on gear creation, and resources at Flywheel to assist with this. Once packaged as a gear, your analysis tool will be self-documenting and version controlled, will execute using inexpensive CPU cycles, and will be shareable (if you wish) with other investigators at Penn and across the neuroimaging community.

 

“I have a pile of old MRI data. What am I supposed to do with that?”

  • If you like, you can upload archival Penn DICOM data to Flywheel and re-process those data within Flywheel. There is a charge only for data storage and CPU usage, and the one-time $25 fee per session is waived. Contact Gaylord Holder for assistance.
  • If you have existing analyses created with tools outside of Flywheel, you can upload these materials as well for storage, although they will not have the version and provenance tracking that is present for built-in Flywheel tools.
  • The NNC is working on a storage solution for existing CfN cluster users that transfers existing data to a cloud-based storage system (not Flywheel). Further information on that effort to follow from John Detre.

 

“What happens if Flywheel jacks up their prices, or turns evil, or if I run out of money for a project and can’t afford to store those data anymore?”

  • You can always download all of your data and analyses in an organized format with a single button press. These data can continue to be processed with all the same open-source pipeline gears outside of Flywheel.

RESOURCES:

*The Flywheel.io website has articles, knowledge base materials, and documentation: https://docs.flywheel.io/hc/en-us

*Details on the Flywheel SDK: https://flywheel-io.github.io/core/

*For support related to Flywheel, you can email support@flywheel.io

*You can also contact one of the local Penn people listed in the PEOPLE section.

*The Penn BBL has created Flywheel tools for applying BIDS organization to data, and a set of BIDS-compatible tools for processing MRI data: https://github.com/PennBBL

*The UPenn MRI Slack has several channels devoted to Flywheel: https://upennmri.slack.com

*In January of 2020 we will resume monthly Flywheel tech meetings to share updates and ideas related to MRI data analysis. Planned topics include:

-Implementing analyses across subjects

-Creating a gear out of MATLAB code

The Brain Imaging Data Structure (BIDS; https://bids.neuroimaging.io) is a way of organizing imaging and behavioral data into standard file formats and folder structures. BIDS facilitates workflow consistency within labs, collaboration, and public data sharing. Additionally, there is a growing library of state-of-the-art processing pipelines that operate on BIDS datasets.

 

Placing your data in BIDS format is greatly simplified if you follow standard conventions for naming the acquisitions of your scan protocol.

 

The SC3T scanner now has a template protocol (USER\Research\Flywheel\Generic) that automatically organizes acquired data into shareable, version-controlled BIDS datasets. Additional templates are under development that implement popular protocols such as the HCP Lifespan and ABCD sequences.

 

If you are starting a new neuroimaging study, please contact Karthik Prabhakaran (karthikp@pennmedicine.upenn.edu) for assistance with setting up protocols that are compatible with BIDS and the “ReproIn” heuristic.

 

EXAMPLE:

The example below is intended for a study that collects a T1 anatomical scan, a T2 anatomical scan, a single run of the ‘no1back’ functional sequence, a single run of the ‘resting’ functional sequence, a diffusion sequence, and a field map. These sequences can be copied from prior or other studies that already follow the naming convention and otherwise have the desired settings.

 

Template Protocol:

anat-scout

anat-T1w_acq-MPRAGE

anat-T2w_acq-SPACE

func_task-no1back_run-01

func_task-resting_acq-mb6_run-01

dwi_acq-mb2p2dir64

fmap_acq-2.5mm

<seqtype[-label]>[_ses-<SESID>][_task-<TASKID>][_acq-<ACQLABEL>][_run-<RUNID>][_dir-<DIR>][<more BIDS>][__<custom>]
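The naming pattern above can be checked programmatically before a protocol is saved at the scanner. The following is a minimal sketch in Python. It assumes a simplified reading of the pattern in which every bracketed entity is an underscore-separated `key-value` pair and a double underscore introduces the custom suffix. The regular expression and the `parse_sequence_name` function are hypothetical illustrations, not the parser used by ReproIn or heudiconv.

```python
import re

# Simplified reading of the sequence-naming pattern:
#   <seqtype[-label]> then zero or more _key-value entities, then optional __custom
SEQ_NAME = re.compile(
    r"^(?P<seqtype>[A-Za-z0-9]+)(?:-(?P<label>[A-Za-z0-9]+))?"
    r"(?P<entities>(?:_[A-Za-z0-9]+-[^_]+)*)"
    r"(?:__(?P<custom>.+))?$"
)


def parse_sequence_name(name):
    """Split a sequence name into seqtype, label, key-value entities, and custom suffix.

    Raises ValueError if the name does not fit the simplified pattern.
    """
    match = SEQ_NAME.match(name)
    if match is None:
        raise ValueError(f"name does not follow the naming pattern: {name!r}")
    entities = {}
    for part in match.group("entities").split("_"):
        if part:  # skip the empty string before the first underscore
            key, value = part.split("-", 1)
            entities[key] = value
    return {
        "seqtype": match.group("seqtype"),
        "label": match.group("label"),
        "entities": entities,
        "custom": match.group("custom"),
    }


if __name__ == "__main__":
    for name in ["func_task-no1back_run-01",
                 "func_task-resting_acq-mb6_run-01",
                 "dwi_acq-mb2p2dir64",
                 "fmap_acq-2.5mm"]:
        print(name, "->", parse_sequence_name(name))
```

A check like this only confirms that a name has the expected shape. It does not verify that the sequence type or the entity keys are ones that BIDS or ReproIn recognize.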

 

For more information on ReproIn see:

 

https://www.frontiersin.org/articles/10.3389/fninf.2019.00001/full