Virtualize your computing with IaaS
Since the advent of high-throughput sequencing, the volume and complexity of biological data have exploded. Although computing hardware has advanced in parallel, researchers rarely have access to the computational power and storage capacity needed to analyze their data (Langmead and Nellore, 2018).
Infrastructure as a service to the rescue
To address this, infrastructure as a service (IaaS) platforms have been created. They let users run large-scale computing workloads in the cloud (that is, on virtual machines hosted on dedicated infrastructure) and store up to exabytes of data.
According to the National Institute of Standards and Technology (Mell and Grance, 2011), IaaS can be described as follows:
The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).
Examples of services provided by IaaS and cloud computing:
- Website hosting
- Virtual computing hardware through cloud hosting
- Storage, backup and recovery
- Web applications
- Virtual desktops
- Application hosting
Some popular IaaS providers
- Amazon Web Services: AWS is the leading cloud computing service and offers a wide range of cloud solutions, such as computing, storage, networking, databases, analytics, application services, deployment, management, mobile, developer tools, and tools for the Internet of Things. AWS offers “Genomics in the Cloud”, a service for storing and analyzing biological data.
- Joyent: Joyent has developed Triton, a cloud built around three main services: Triton Compute (for virtual machines, containers and bare metal), Triton Object Storage (supports exabyte-scale workloads) and Triton Converged Analytics (serverless computing and big data).
- Google Compute Engine: Compute Engine is an infrastructure as a service that lets you run your large-scale computing workloads on Linux virtual machines hosted on Google’s infrastructure.
IaaS in genomics
The Sequence Read Archive (hosted by the NCBI) now holds about 14 petabases (Figure 1). These sequences are archived so that they can be analyzed and re-analyzed, for example in meta-analyses or in the evaluation of clinical predictors. The computational steps required to extract information from these data (read alignment, de novo assembly, variant calling and quantification) demand ever more computing power, which the average scientist does not have access to.
Cloud computing and IaaS offer several advantages for genomics. First, the cloud is globally accessible, so all collaborators on a project can share their data and run the same analysis with the same tools at the same site.
Cloud computing also fosters reproducibility by enabling investigators to publish data sets to the cloud. Large collections of archived data can then be re-analyzed with much less time and effort.
Cloud computing and IaaS make the “big data” challenge easier for biologists to tackle, and their use will very likely increase in the coming years. In the meantime, researchers will have to familiarize themselves with these concepts and learn how to make the most of them.
Peter Mell and Timothy Grance (2011). The NIST Definition of Cloud Computing (Technical report). National Institute of Standards and Technology: U.S. Department of Commerce.
Ben Langmead and Abhinav Nellore (2018). Cloud computing for genomic data analysis and collaboration. Nature Reviews Genetics.