Paying for your research
Creating an All of Us Researcher Workbench account is free. In addition, the All of Us Research Program provides each registered researcher with $300 in initial credits that can be applied toward cloud computational costs. Once these initial credits have been exhausted, a Google Cloud Platform (GCP) billing account must be set up to proceed with analyses on the Workbench.
Computational costs break down as follows:
Workspace Data + (Compute Applications and Customizations × Time) + (Storage × Time)
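As a rough illustration of how these terms combine, using the default Jupyter Notebook rates listed in the tables below: running the application for 25 hours at $0.20 per hour contributes $5.00, and keeping its standard persistent disk for one month contributes $4.80, for roughly $9.80 of compute and storage before any data extraction or workspace bucket charges.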
WHAT ARE YOU PAYING FOR?
DATA | Data are stored in Google BigQuery and extracted for analysis based on the criteria used to define your cohorts and datasets. The larger the dataset and the more of it you extract, the higher your costs. You can save on costs by being intentional about selecting the data you plan to extract. Note: Minute-level Fitbit and genomic data are exceptionally large.
APPLICATIONS | Applications are loaded into virtual machines supported by Google Compute Engine (GCE). Applications incur per-hour costs when running or paused. For customizable applications, costs will vary based on the settings you select for things like CPUs and RAM.
STORAGE | Data and project files can be stored in the workspace storage bucket (recommended) or on persistent disks. Costs vary by the size and amount of data stored.
WORKSPACE | All costs are incurred at the workspace level and billed through the workspace creator's account. Costs include ALL computational spend associated with a workspace, including spend incurred by a collaborator with owner or writer access.
Analysis costs by tools and features
You have a choice of which application to use for data analysis. Computational resource costs vary by application.
APPLICATIONS
APP FEATURES | JUPYTER NOTEBOOK | RSTUDIO | SAS STUDIO
Default virtual machine setting | 4 CPUs, 15GB RAM* | 4 CPUs, 15GB RAM** | 4 CPUs, 15GB RAM**
Cost when running | $0.20 per hour | $0.40 per hour | $0.40 per hour
Cost when paused | < $0.01 per hour | $0.21 per hour |
Persistent disk cost*** | $4.80 per month | $4.00 per month | $10.00 per month
EXAMPLES OF OTHER COSTS
DATA EXTRACTION | $0.02 per sample when using the genomic extraction tool to extract variant data. Researchers who plan to use the genomic CRAM files (e.g., raw data) should note that these files are exceptionally large and are accompanied by egress charges incurred through bucket extraction. Other data extractions from the CDR are completed through SQL queries that are billed by the number of bytes read. The more data you extract, the more the query will cost.
WORKSPACES | $0.20 per workspace per month
DATA STORAGE | Saving to the workspace bucket is the most cost-efficient storage method, billed at $0.026 per GB per month. See "Applications" above for standard persistent disk costs. Costs can increase if a persistent disk is upgraded from standard to solid-state drive or if its GB storage increases.
CROMWELL | $270 per month for one application when running. Each additional application costs $158 per month when running. When running, a Cromwell instance is $0.375 per hour for the first user and $0.22 per hour for the second user. Note: Cromwell environments will auto-delete after being idle for 7 days. Read about Cromwell costs.
* Jupyter Notebook virtual machine settings can be customized. Customizing default settings will impact costs.
** RStudio and SAS virtual machine settings cannot be customized.
*** Persistent disks incur a monthly charge until deleted. You can check for and delete unnecessary persistent disks by visiting your "Cloud Environments" page.
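As an illustration of how paused environments add up, using the rates above: an RStudio environment left paused for a 30-day month accrues roughly 720 hours × $0.21 ≈ $151, plus $4.00 for its persistent disk, while a paused Jupyter Notebook environment accrues under $7.20 over the same period. Deleting environments, and any persistent disks you no longer need, avoids these charges.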
Read more about supported analysis applications and data storage options in the Researcher Workbench. For information about using Cromwell, read our support article. For information about using the genomics data extraction tool, read our support article.
To help estimate total costs for your project, consider the volume and size of the data you will extract, the application and virtual machine settings you will use, the amount and size of files you plan to store, and the total amount of time you anticipate running analyses and storing data.
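If it helps to organize these factors, the sketch below is a minimal, illustrative Python estimator for a single workspace. The rates are examples drawn from the tables above and may not reflect current program pricing, and the function and variable names are our own for illustration, not part of the Workbench.

# Back-of-the-envelope estimate of monthly Researcher Workbench costs.
# Rates are illustrative values taken from the tables in this article;
# check the Workbench for current pricing before budgeting.
JUPYTER_RUNNING_PER_HOUR = 0.20    # default Jupyter Notebook VM, running
JUPYTER_PAUSED_PER_HOUR = 0.01     # "< $0.01 per hour", rounded up
PERSISTENT_DISK_PER_MONTH = 4.80   # default Jupyter persistent disk
BUCKET_PER_GB_PER_MONTH = 0.026    # workspace bucket storage
WORKSPACE_PER_MONTH = 0.20         # flat per-workspace charge

def estimate_monthly_cost(running_hours, paused_hours, bucket_gb, extraction_cost=0.0):
    """Rough monthly cost for one workspace using default Jupyter settings.

    extraction_cost covers one-time extraction charges, e.g. $0.02 per
    sample for genomic variant extraction or BigQuery bytes-read charges.
    """
    compute = (running_hours * JUPYTER_RUNNING_PER_HOUR
               + paused_hours * JUPYTER_PAUSED_PER_HOUR)
    storage = PERSISTENT_DISK_PER_MONTH + bucket_gb * BUCKET_PER_GB_PER_MONTH
    return compute + storage + WORKSPACE_PER_MONTH + extraction_cost

# Example: 40 hours of active analysis in a 30-day month (paused the rest),
# 100 GB in the workspace bucket, and a $20 extraction query.
print(round(estimate_monthly_cost(running_hours=40, paused_hours=720 - 40,
                                  bucket_gb=100, extraction_cost=20.0), 2))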
Estimated costs for featured workspaces
In the next table, we provide some example costs for a selection of demonstration workspaces stored in the Researcher Workbench featured workspace collection.
FEATURED WORKSPACE COST BREAKDOWNS
WORKSPACE TITLE | DATA SIZE AND VOLUME | VIRTUAL MACHINE SETTINGS | TOTAL COST AND TIME
LDL Cholesterol GWAS | Controlled Tier genomic data; 39,924 participants; 3.4M variants | Jupyter Notebook; Main node: 16 CPUs, 104GB RAM, 100GB Disk | $10 for running analysis
PheWAS x GWAS | Controlled Tier genomic data; ~70k participants; ~5k variants | Jupyter Notebook; Main node: 96 CPUs, 360GB RAM, 120GB Disk | $35 for running analysis for 7 hours, not including variant filtering
O3_Manipulate Hail VariantDataset (VDS) Tutorial Notebook | Controlled Tier genomic data; filter the whole VDS to 124 variants and 15,375 samples and write to various file formats (VCF, BGEN, PLINK BED, Hail MatrixTable) | Jupyter Notebook; Main node: 4 CPUs, 15GB RAM, 100GB Disk; Workers (50/50): 4 CPUs, 15GB Disk | $25 for 1.5 hours of analysis
Wearables & The Human Phenome | Registered Tier Fitbit and EHR data; 214,206 participants with EHR data; ~50.6 billion steps data from Fitbit | Jupyter Notebook; Main node: 4 CPUs, 15GB RAM, 120GB Disk | $35 for running analysis
Next articles
Using All of Us Initial Credits
Understand how to use your $300 initial credits provided by the All of Us Research Program.
Paying for Researcher Workbench Costs
Explore options available for paying Researcher Workbench costs.