Overview of applications in the Researcher Workbench

Recognizing that researchers want to analyze data in different ways depending on their specific needs, the Researcher Workbench supports a variety of cloud-based applications for analyzing All of Us participant data. Unlike the Cohort and Dataset Builders, applications incur costs because they run in cloud environments. A cloud environment is a virtualized infrastructure that allows users to access and manage computing resources over the internet. These applications can be used to analyze participant data that's generated manually or with the Cohort and Dataset Builders.

Below is an up-to-date list of the different analysis tools along with information about associated cloud environments, basic demonstration videos, support materials, and other relevant information. This article will be updated as other applications are made available in the Workbench.

Jupyter Notebooks


Jupyter Notebooks are a popular tool used by data scientists, researchers, and developers for interactive computing and data analysis. They provide an environment where users can write and execute code, visualize data, and document their work in a single document. Both R and Python notebooks are supported in the Researcher Workbench, allowing users to choose the language that best suits their needs. Each notebook consists of a series of cells, each of which can contain code, text, equations, or visualizations. Users can run individual cells or the entire notebook, making it easy to experiment with code and see the results in real time.



Jupyter Notebooks are powered by Jupyter environments, which can be enabled by clicking the Jupyter icon in the right panel of a workspace.



A variety of customization options are available, depending on your computing needs. To learn more about customizing and deleting your Jupyter environment, including how to use Dataproc clusters for Hail analyses, please see this article that details different aspects of cloud environments and how to optimize them.

Environments can be created by clicking the create environment button in the bottom right corner.


In addition to powering Jupyter Notebooks, Jupyter environments can be used to analyze data from the terminal.




NOTE: Jupyter environments are auto-deleted every 1 to 2 weeks, so please make sure important files are transferred to the workspace bucket unless you're using a persistent disk.
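
As a minimal sketch of the transfer described in the note above, the Python snippet below builds a gsutil cp command that copies a local file into the workspace bucket. It assumes the bucket path is exposed in an environment variable named WORKSPACE_BUCKET and that gsutil is installed in the Jupyter environment. The file name, the example bucket name, and the backups prefix are hypothetical placeholders, not values from this article.

```python
import os
import shlex


def build_backup_command(local_path, bucket=None, prefix="backups"):
    """Return a gsutil command (as a list) that copies local_path to the bucket.

    Assumption: when no bucket is passed, the workspace bucket path is read
    from the WORKSPACE_BUCKET environment variable.
    """
    bucket = bucket or os.environ.get("WORKSPACE_BUCKET")
    if not bucket:
        raise RuntimeError("No bucket given and WORKSPACE_BUCKET is not set")
    # Keep only the file name so the copy lands directly under the prefix.
    destination = f"{bucket.rstrip('/')}/{prefix}/{os.path.basename(local_path)}"
    return ["gsutil", "cp", local_path, destination]


# Hypothetical file and bucket names, used only for illustration.
cmd = build_backup_command(
    "results/summary.csv", bucket="gs://example-workspace-bucket"
)
print(shlex.join(cmd))
```

The sketch only prints the command. To perform the copy from a notebook or the terminal, the same list could be passed to subprocess.run(cmd, check=True).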


Support resources

Because Jupyter Notebooks have been supported since launch, most of our support materials are designed around their use. This includes our featured workspaces (requires Researcher Workbench login), nearly all of which are designed to use Jupyter Notebooks. Here are a few resources that are specifically related to getting started in Jupyter Notebooks.



RStudio

RStudio is an integrated development environment (IDE) for R, a programming language used for statistical computing and graphics. It provides a user-friendly interface that makes it easier for users to write, debug, and execute R code. RStudio offers a wide range of features and tools, including a code editor with syntax highlighting and auto-completion, a console for executing R commands, a workspace for managing objects and datasets, and a plotting window for visualizing data. It also supports the creation of interactive documents and reports through its integration with R Markdown.



RStudio is used in conjunction with RStudio environments, which can be created by clicking the R icon on the right-hand side of a workspace. Environments can't be customized and cost a fixed $0.40 when running.




Unlike Jupyter environments, RStudio does not have a pause or auto-pause function available, so it’s important to delete the application when you are finished in order to minimize costs. When you create the application, you can select the auto-delete option, which deletes your application after a certain period of idle time. You can save your work on your persistent disk or workspace bucket and recreate the app when you are ready to use RStudio again.
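
To illustrate why deleting the application matters, here is a rough cost sketch in Python. The article states only that an RStudio environment costs a fixed $0.40 when running. The snippet assumes that figure is an hourly rate, which is an assumption and not something confirmed here, and the usage patterns are hypothetical.

```python
# Assumption: the fixed $0.40 is charged per running hour (USD).
RSTUDIO_RATE_PER_HOUR = 0.40


def estimated_cost(hours_running, rate_per_hour=RSTUDIO_RATE_PER_HOUR):
    """Estimate the cost of an environment that runs for hours_running hours."""
    if hours_running < 0:
        raise ValueError("hours_running must be non-negative")
    return round(hours_running * rate_per_hour, 2)


# Hypothetical scenarios: left running all week vs. deleted after each workday.
left_running = estimated_cost(24 * 7)   # 168 hours
deleted_daily = estimated_cost(8 * 5)   # 40 hours
print(f"Left running for a week: ${left_running:.2f}")
print(f"Deleted after each 8-hour day: ${deleted_daily:.2f}")
```

Under that assumed rate, the difference between the two scenarios comes entirely from idle hours, which is what the auto-delete option is meant to limit.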




Support resources

SAS Studio


SAS is a statistical software tool used for data analysis and statistical modeling. The SAS Studio application in the Researcher Workbench connects to a SAS server in order to process SAS commands. The SAS server is hosted in a cloud environment. After code is processed by the SAS server, the results are returned to the SAS app in your workspace. SAS handles large datasets efficiently and offers an extensive set of statistical techniques and a user-friendly interface.


SAS Studio on the Researcher Workbench uses a cloud environment, so the experience may differ slightly from using a SAS Studio application on your local computer. SAS Studio runs on a virtual machine (VM) or clusters of machines in your workspace cloud analysis environment. The SAS cloud environment on the Researcher Workbench is a VM with 4 CPUs, 15 GB RAM, and 250 GB of disk space, and it is not customizable.

To start SAS Studio, follow the instructions outlined in this support article. 

Unlike Jupyter environments, SAS Studio does not have a pause or auto-pause function available, so it’s important to delete the application when you are finished in order to minimize costs. When you create the application, you can select the auto-delete option, which deletes your application after a certain period of idle time. You can save your work on your persistent disk or workspace bucket and recreate the app when you are ready to use SAS Studio again.


Support resources

We offer several support resources to help you get started using SAS in the Researcher Workbench such as: 

Cromwell for workflows


Cromwell is a workflow management system that is designed to help scientists and researchers organize and execute complex computational workflows. It provides a platform for defining, running, and monitoring workflows, making it easier to manage and automate scientific analyses. With Cromwell, users can define their workflows using a simple and intuitive syntax, allowing them to specify the steps and dependencies of their analysis. This makes it easier to break down complex tasks into smaller, more manageable units, improving reproducibility and scalability. Cromwell also provides a range of features to enhance workflow execution. It supports a variety of execution backends, allowing users to run their workflows on different computing infrastructures, such as local machines, clusters, or cloud platforms. It provides detailed logs and metrics, allowing users to troubleshoot issues and optimize their workflows for better performance. 


Cromwell environments

Workflows can be submitted in the Researcher Workbench through Cromshell, a command-line tool for interacting with Cromwell. Cromwell environments can be created through the flying pink pig Cromwell icon on the right side of the workspace panel, as described in this support article. You can then start an environment by clicking the Start button.




Cromwell environments aren't customizable on the Workbench, though you can customize how a workflow is run using Workflow Description Language (WDL). Cromshell must be run from a Jupyter Notebook or the terminal, both of which require a Jupyter environment.
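
The sketch below shows, in Python, one way a small WDL workflow might be prepared for submission from a Jupyter Notebook. It writes a minimal WDL file and an empty inputs file, then builds a cromshell submit command without running it. The workflow, the file names, and the Docker image tag are hypothetical placeholders, and the snippet assumes Cromshell is installed and a Cromwell environment is already running.

```python
import json
import tempfile
from pathlib import Path

# A minimal, hypothetical WDL 1.0 workflow with a single task.
HELLO_WDL = """\
version 1.0

workflow hello {
  call say_hello
}

task say_hello {
  command {
    echo "Hello from Cromwell"
  }
  output {
    String message = read_string(stdout())
  }
  runtime {
    # Placeholder image; use one that your environment can pull.
    docker: "ubuntu:20.04"
  }
}
"""


def prepare_submission(workdir):
    """Write the WDL and inputs files, then return the submit command as a list."""
    workdir = Path(workdir)
    wdl_path = workdir / "hello.wdl"
    inputs_path = workdir / "hello.inputs.json"
    wdl_path.write_text(HELLO_WDL)
    inputs_path.write_text(json.dumps({}))  # this workflow takes no inputs
    return ["cromshell", "submit", str(wdl_path), str(inputs_path)]


submit_cmd = prepare_submission(tempfile.mkdtemp())
print(" ".join(submit_cmd))
```

Running the printed command in the terminal, or passing the list to subprocess.run, would submit the workflow. The runtime block in the WDL is where per-task settings such as the container image are customized.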


Unlike Jupyter environments, Cromwell environments can't be paused or auto-paused. They need to be deleted from the apps menu, which we recommend doing when you are finished to save on costs. Select the cloud icon to see a summary of your active applications. From there, you can delete the Cromwell environment. You also have the option to auto-delete the environment after a certain number of days, which is described more here.



Support resources

  • Below is a video that walks through working with Cromwell in the Researcher Workbench.

