Getting Started and What to Know About Costs


Paying for your research

 

Creating an All of Us Researcher Workbench account is free. Additionally, the All of Us Research Program provides $300 in initial credits to each registered researcher, which can be applied toward cloud computational costs. Once these initial credits are exhausted or expire, a Google Cloud Platform (GCP) billing account must be set up to proceed with analyses on the Workbench.

Computational costs break down as follows:

Workspace Data + (Compute Applications and Customizations * Time) + (Storage * Time)
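
The formula above can be sketched as a small function. This is only an illustration: the function name, the parameter names, and every dollar amount below are hypothetical placeholders, not Researcher Workbench rates, and the sketch assumes that compute and storage are both billed at a flat hourly rate.

```python
def estimate_workspace_cost(data_cost, compute_rate_per_hour, compute_hours,
                            storage_rate_per_hour, storage_hours):
    """Workspace Data + (Compute * Time) + (Storage * Time), all in USD."""
    compute_cost = compute_rate_per_hour * compute_hours
    storage_cost = storage_rate_per_hour * storage_hours
    return data_cost + compute_cost + storage_cost


# Hypothetical inputs, chosen only to show how the three terms add up.
total = estimate_workspace_cost(
    data_cost=5.00,              # assumed one-time data extraction cost
    compute_rate_per_hour=0.22,  # assumed application rate
    compute_hours=10,
    storage_rate_per_hour=0.01,  # assumed storage rate
    storage_hours=100,
)
print(f"Estimated total: ${total:.2f}")
```

Because compute and storage are each multiplied by time, they keep growing for as long as an application runs or files stay stored, while the data term depends on how much data you extract.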

WHAT ARE YOU PAYING FOR?
DATA

Data are stored in Google BigQuery and extracted for analysis based on the criteria used to define your cohorts and datasets.

The larger the data and the more of it you extract, the more your costs will increase. You can save on costs by being intentional about selecting the data you plan to extract. 

Note: Minute-level Fitbit and genomic data are exceptionally large.

APPLICATIONS

Applications are loaded into virtual machines supported by Google Compute Engine (GCE). Applications incur per-hour costs both while running and while paused.

For customizable applications, costs will vary based on the settings you select for things like CPUs, RAM, etc.

STORAGE

Data and project files can be stored in the workspace storage bucket (recommended) or on persistent disks. Costs vary by the size and amount of data stored. Please note that GCP has enabled a "soft-delete" feature that is on by default for Google Cloud Storage (GCS), i.e., the workspace bucket. To learn more about the pricing impact, please see the Google documentation.

WORKSPACE

All costs are incurred at the workspace level and billed through the workspace creator's account.

Costs include ALL computational spend associated with a workspace including spend incurred by a collaborator with owner or writer access.

Analysis costs by tools and features

You can choose which application to use for data analysis. Computational resource costs vary by application. In the Researcher Workbench, each application can be customized in the 'App' tab of the workspace. Apps offer a variety of virtual machines and computational profiles (RAM, CPU, GPUs) that vary in cost based on resource use. A cost calculator is available in the Researcher Workbench for each environment type. Below is an example of the standard Jupyter Lab app within the Workbench and its potential associated cost.

APPLICATIONS
APP FEATURES: JUPYTER LAB
Default virtual machine setting: n2-standard-4 (4 CPU, 2 core, 16 GB memory)
Cost when running: $0.22 per hour
Cost when paused: < $0.01 per hour

* All costs are based on Google Cloud Platform (GCP) pricing and resources. Google Cloud cost estimates are computed internally, directly from the pricing SKUs listed by the Cloud Billing API. This means that calculated costs reflect the on-demand pricing of cloud resources and do not factor in billing account-specific discounts, including committed use discounts and negotiated pricing contracts. Learn more about cost in the Researcher Workbench here
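
As a rough illustration of how the running and paused rates in the table above combine, the sketch below estimates one day of Jupyter Lab compute cost and how many running hours the $300 initial credits might cover. The usage pattern (8 hours running, 16 hours paused) and the function names are assumptions made for this example. The paused rate is listed only as "< $0.01 per hour", so $0.01 is used as an upper bound. Data and storage costs are not included.

```python
RUNNING_RATE = 0.22      # USD per hour while running (from the table above)
PAUSED_RATE_MAX = 0.01   # USD per hour; listed as "< $0.01", so an upper bound
INITIAL_CREDITS = 300.00 # USD of initial credits per registered researcher


def daily_compute_cost(running_hours, paused_hours):
    """Upper-bound compute cost for one day at the default VM setting."""
    return running_hours * RUNNING_RATE + paused_hours * PAUSED_RATE_MAX


def running_hours_covered(credits, rate=RUNNING_RATE):
    """Hours of running time the credits cover if spent on compute only."""
    return credits / rate


# Hypothetical usage pattern: 8 hours running, 16 hours paused.
day = daily_compute_cost(running_hours=8, paused_hours=16)
print(f"Upper-bound compute cost per day: ${day:.2f}")
print(f"Running hours covered by credits: {running_hours_covered(INITIAL_CREDITS):.0f}")
```

Under these assumptions, a day costs at most about $1.92, and the credits would cover on the order of 1,360 running hours if nothing else were charged to them. Actual figures depend on the environment settings you choose and on your data and storage use.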

Additionally, using workflow tools such as Cromwell, Nextflow, or dsub incurs costs on top of the cost of the analysis environment. To learn more, see the 'How to use workflow tools' section of the Featured Workspace "All of Us Tutorial Workspace: Getting Started with Controlled Tier Data (v8)."

Read more about supported analysis applications and data storage options in the Researcher Workbench. 

To estimate the total cost of your project, consider the volume and size of the data, the application and virtual machine settings, the number and size of files you plan to store, and the total amount of time you expect to spend running analyses and storing data.

Estimated costs for featured workspaces

In the next table, we provide some example costs for a selection of workspaces stored in the Researcher Workbench featured workspace collection.

FEATURED WORKSPACE COST BREAKDOWNS

WORKSPACE TITLE: All of Us Tutorial Workspace - How to get started with Hail and VDS; Controlled Tier genomic data (CDRv8)

DATA SIZE AND VOLUME: Filter the whole VDS to 445 variants and 414,830 samples and write to various file formats (VCF, BGEN files, PLINK BED, Hail MatrixTable)

VIRTUAL MACHINE SETTINGS: Jupyter Dataproc cluster
Environment setting 1 - Main node: n2-standard-16, Workers (2/100): n2-standard-8
Environment setting 2 - Main node: n2-standard-8, Workers (2/20): n2-standard-8

TOTAL COST AND TIME:
Environment 1: $25.07/h, 18 mins of analysis
Environment 2: $6.21/h, ~30 min
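
To compare the two environments, it can help to turn the hourly rate and run time into an approximate per-run compute cost. The sketch below does that arithmetic with the figures listed above. The function name is hypothetical, the 30-minute figure is approximate in the source, and the result covers compute time only, not data or storage.

```python
def session_cost(rate_per_hour, minutes):
    """Approximate compute cost in USD for one run of the given length."""
    return rate_per_hour * minutes / 60


# (hourly rate in USD, run time in minutes) taken from the breakdown above.
environments = {
    "Environment 1": (25.07, 18),
    "Environment 2": (6.21, 30),  # listed as "~30 min", so approximate
}

for name, (rate, minutes) in environments.items():
    print(f"{name}: about ${session_cost(rate, minutes):.2f}")
```

With these inputs, Environment 1 comes to roughly $7.50 per run and Environment 2 to roughly $3.10. In this example the smaller cluster takes longer but costs less overall.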

Next articles

Using All of Us Initial Credits

Understand how to use your $300 initial credits provided by the All of Us Research Program.

Paying for Researcher Workbench Costs

Explore options available for paying Researcher Workbench costs.
