Getting Started and What to Know About Costs


Paying for your research

 

Creating an All of Us Researcher Workbench account is free. Additionally, the All of Us Research Program provides each registered researcher with $300 in initial credits that can be applied toward cloud computational costs. Once these initial credits have been exhausted, a Google Cloud Platform (GCP) billing account must be set up to proceed with analyses on the Workbench.

Computational costs break down as follows:

Workspace Data + (Compute Applications and Customizations * Time) + (Storage * Time)
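
As a rough illustration, the formula above can be written as a small Python function. This is only a sketch: the function name, parameter names, and the example inputs are hypothetical and are not part of the Researcher Workbench. It assumes compute is billed per hour and storage per month.

```python
def estimate_total_cost(data_cost, compute_rate_per_hour, compute_hours,
                        storage_rate_per_month, storage_months):
    """Return data + (compute * time) + (storage * time), in US dollars."""
    compute_cost = compute_rate_per_hour * compute_hours
    storage_cost = storage_rate_per_month * storage_months
    return data_cost + compute_cost + storage_cost


# Hypothetical inputs: $5 of data extraction, 40 hours of compute at
# $0.20 per hour, and 3 months of storage at $4.80 per month.
total = estimate_total_cost(5.00, 0.20, 40, 4.80, 3)
print(f"Estimated total: ${total:.2f}")
```

Each term grows independently, so shortening compute time or deleting unused storage reduces only its own part of the total.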

WHAT ARE YOU PAYING FOR?
DATA

Data are stored in Google BigQuery and extracted for analysis based on the criteria used to define your cohorts and datasets.

The larger the data and the more of it you extract, the more your costs will increase. You can save on costs by being intentional about selecting the data you plan to extract. 

Note: Minute-level Fitbit and genomic data are exceptionally large.

APPLICATIONS

Applications are loaded into virtual machines supported by Google Compute Engine (GCE). Applications incur per-hour costs when running or paused.

For customizable applications, costs will vary based on the settings you select for things like CPUs, RAM, etc.

STORAGE

Data and project files can be stored to the workspace storage bucket (recommended) or persistent disks. Costs vary by the size and amount of data stored.

WORKSPACE

All costs are incurred at the workspace level and billed through the workspace creator's account.

Costs include ALL computational spend associated with a workspace including spend incurred by a collaborator with owner or writer access.

Analysis costs by tools and features

You have a choice in which application to use for data analysis. Computational resource costs vary by application.

APPLICATIONS
APP FEATURES | JUPYTER NOTEBOOK | RSTUDIO | SAS STUDIO
Default virtual machine setting | 4 CPUs, 15GB RAM* | 4 CPUs, 15GB RAM** | 4 CPUs, 15GB RAM**
Cost when running | $0.20 per hour | $0.40 per hour | $0.40 per hour
Cost when paused | < $0.01 per hour | $0.21 per hour |
Persistent disk cost*** | $4.80 per month | $4.00 per month | $10.00 per month

EXAMPLES OF OTHER COSTS

DATA EXTRACTION

$0.02 per sample when using the genomic extraction tool to extract variant data.

Researchers who plan to use the genomics CRAM files (e.g., raw data) should note that these files are exceptionally large and are accompanied by egress charges incurred through bucket extraction.

Data otherwise extracted from the CDR are retrieved through SQL queries that are billed by the number of bytes read. The more data you extract, the more the query will cost.

WORKSPACES

$0.20 per workspace per month

DATA STORAGE

Saving to the workspace bucket is the most cost-efficient saving method, billed at $0.026 per GB per month.

See "Applications" above for standard persistent disk costs. Costs can increase if the persistent disk (PD) is upgraded from standard to solid-state drive or if its storage size (GB) increases.

CROMWELL

$270 per month for one application when running.

Each additional application will cost $158 per month when running.

When running, a Cromwell instance is $0.375 per hour for the first user and $0.22 per hour for the second user.

Note: Cromwell environments will auto-delete after being idle for 7 days. Read about Cromwell costs.

* Jupyter Notebook virtual machine settings can be customized. Customizing default settings will impact costs. 
** RStudio and SAS virtual machine settings cannot be customized.
*** Persistent disks incur a monthly charge until deleted. You can check for and delete unnecessary persistent disks by visiting your "Cloud Environments" page.
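
The hourly and monthly Cromwell figures listed above appear to describe the same rates. The short Python check below shows how they relate, assuming a 30-day month of continuous running. That assumption, and the constant and function names, are ours and are not stated in this article.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30 days of continuous running

CROMWELL_FIRST_PER_HOUR = 0.375   # hourly rate quoted in this article
CROMWELL_SECOND_PER_HOUR = 0.22   # hourly rate quoted in this article


def monthly_cost(rate_per_hour, hours=HOURS_PER_MONTH):
    """Convert an hourly rate to a cost for the given number of hours."""
    return rate_per_hour * hours


first = monthly_cost(CROMWELL_FIRST_PER_HOUR)
second = monthly_cost(CROMWELL_SECOND_PER_HOUR)
print(f"First:  ${first:.2f} per month")   # compare with the quoted $270
print(f"Second: ${second:.2f} per month")  # compare with the quoted $158
```

The same function can estimate a shorter session by passing a smaller `hours` value, for example `monthly_cost(0.375, hours=10)`.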

Read more about supported analysis applications and data storage options in the Researcher Workbench. For information about using Cromwell, read our support article. For information about using the genomics data extraction tool, read our support article.

To help assess total estimated costs for your project, you’ll want to consider the volume and size of the data, the application and virtual machine settings, the number and size of files you plan to store, and the total amount of time you anticipate needing to run analyses and store data.
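
One way to combine these factors is a small estimator built from the default rates quoted in this article. The Python sketch below is illustrative only: the dictionary layout, function name, and example usage figures are hypothetical, data extraction costs are left out, and actual charges depend on your settings. Jupyter's paused rate is listed as "< $0.01 per hour", so 0.01 is used as an upper bound. No paused rate is listed for SAS Studio, so the sketch refuses to estimate paused time for it.

```python
# Default rates quoted in this article, in US dollars.
APP_RATES = {
    "jupyter": {"running": 0.20, "paused": 0.01, "disk_per_month": 4.80},
    "rstudio": {"running": 0.40, "paused": 0.21, "disk_per_month": 4.00},
    "sas": {"running": 0.40, "paused": None, "disk_per_month": 10.00},
}
BUCKET_PER_GB_MONTH = 0.026  # workspace bucket storage
WORKSPACE_PER_MONTH = 0.20   # per-workspace charge


def estimate_monthly_cost(app, running_hours, paused_hours, bucket_gb):
    """Estimate one month of cost for one application in one workspace."""
    rates = APP_RATES[app]
    if paused_hours and rates["paused"] is None:
        raise ValueError(f"no paused rate is listed for {app}")
    paused_cost = (rates["paused"] or 0.0) * paused_hours
    return (
        WORKSPACE_PER_MONTH
        + rates["running"] * running_hours
        + paused_cost
        + rates["disk_per_month"]
        + bucket_gb * BUCKET_PER_GB_MONTH
    )


# Hypothetical month: RStudio running 20 hours, paused 100 hours,
# with 50 GB saved to the workspace bucket.
estimate = estimate_monthly_cost("rstudio", 20, 100, 50)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

In this hypothetical month the paused hours cost more than the running hours, which suggests deleting an RStudio environment that will sit idle rather than leaving it paused.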

Estimated costs for featured workspaces

In the next table, we provide some example costs for a selection of demonstration workspaces stored in the Researcher Workbench featured workspace collection.

FEATURED WORKSPACE COST BREAKDOWNS

LDL Cholesterol GWAS

DATA SIZE AND VOLUME: Controlled Tier genomic data; 39,924 participants; 3.4M variants
VIRTUAL MACHINE SETTINGS: Jupyter Notebook; Main node: 16 CPUs, 104GB RAM, 100GB Disk
TOTAL COST AND TIME: $10 for running analysis

PheWAS x GWAS

DATA SIZE AND VOLUME: Controlled Tier genomic data; ~70k participants; ~5k variants
VIRTUAL MACHINE SETTINGS: Jupyter Notebook; Main node: 96 CPUs, 360GB RAM, 120GB Disk
TOTAL COST AND TIME: $35 for running analysis for 7 hours, not including variant filtering

O3_Manipulate Hail VariantDataset (VDS) Tutorial Notebook

DATA SIZE AND VOLUME: Controlled Tier genomic data; filter the whole VDS to 124 variants and 15,375 samples and write to various file formats (VCF, BGEN files, PLINK BED, Hail MatrixTable)
VIRTUAL MACHINE SETTINGS: Jupyter Notebook; Main node: 4 CPUs, 15GB RAM, 100GB Disk; Workers (50/50): 4 CPUs, 15GB Disk
TOTAL COST AND TIME: $25 for 1.5 hours of analysis

Wearables & The Human Phenome

DATA SIZE AND VOLUME: Registered Tier Fitbit and EHR data; 214,206 participants with EHR data; ~50.6 billion steps data from Fitbit
VIRTUAL MACHINE SETTINGS: Jupyter Notebook; Main node: 4 CPUs, 15GB RAM, 120GB Disk
TOTAL COST AND TIME: $35 for running analysis
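
To put the featured workspace costs in context, the Python sketch below compares them with the $300 in initial credits and derives an implied hourly rate where a run time is given. The function names are hypothetical. The comparison counts only the listed analysis cost and ignores storage, paused time, and any other charges.

```python
INITIAL_CREDITS = 300.00  # initial credits described in this article

# Analysis costs listed for the featured workspaces, in US dollars.
FEATURED_COSTS = {
    "LDL Cholesterol GWAS": 10.00,
    "PheWAS x GWAS": 35.00,
    "Manipulate Hail VDS tutorial": 25.00,
    "Wearables & The Human Phenome": 35.00,
}


def runs_within_budget(cost_per_run, budget=INITIAL_CREDITS):
    """Return how many complete runs of an analysis fit within the budget."""
    return int(budget // cost_per_run)


def implied_hourly_rate(cost, hours):
    """Return the average cost per hour for a run of known length."""
    return cost / hours


for title, cost in FEATURED_COSTS.items():
    print(f"{title}: {runs_within_budget(cost)} runs within initial credits")

# Run times are listed for only two of the featured workspaces.
print(f"PheWAS x GWAS: ${implied_hourly_rate(35.00, 7):.2f} per hour")
print(f"Hail VDS tutorial: ${implied_hourly_rate(25.00, 1.5):.2f} per hour")
```

The implied hourly rates differ considerably between the two workspaces, which reflects how much the virtual machine settings drive cost.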

Next articles

Using All of Us Initial Credits

Understand how to use your $300 initial credits provided by the All of Us Research Program.

Paying for Researcher Workbench Costs

Explore options available for paying Researcher Workbench costs.
