Overview of applications in the Researcher Workbench

Recognizing that researchers want to analyze data in different ways depending on their specific needs, the Researcher Workbench supports a variety of cloud-based applications for analyzing All of Us participant data. Unlike the Cohort and Dataset Builders, applications incur costs because they run in cloud environments. A cloud environment is a virtualized infrastructure that allows users to access and manage computing resources over the internet. These applications can be used to analyze participant data that's generated manually or with the Cohort and Dataset Builders.

Below is an up-to-date list of the different analysis tools along with information about associated cloud environments, basic demonstration videos, support materials, and other relevant information. This article will be updated as other applications are made available in the Workbench.

Jupyter Notebook

Overview

Jupyter Notebook is a popular tool used by data scientists, researchers, and developers for interactive computing and data analysis. It provides an environment where users can write and execute code, visualize data, and document their work in a single document. Both R and Python notebooks are supported in the Researcher Workbench, allowing users to choose the language that best suits their needs. A notebook consists of a series of cells, each of which can contain code, text, equations, or visualizations. Users can run individual cells or the entire notebook, making it easy to experiment with code and see the results in real time.

 

Environments

Jupyter Notebooks are powered by Jupyter environments, which can be enabled by clicking the Jupyter icon in the right panel of a workspace as shown below.

 

jupyter.png

A variety of customization options are available, depending on your computing needs. To learn more about customizing and deleting your Jupyter environment, including how to use Dataproc clusters for Hail analyses, please see this article, which details different aspects of cloud environments and how to optimize them.

Environments can be created by clicking the create environment button in the bottom right corner.

create env.png

 

In addition to powering Jupyter Notebooks, Jupyter environments can be used to analyze data from the terminal.

 

terminal.png

 

NOTE: Jupyter Notebook environments are auto-deleted every 1 to 2 weeks, so please make sure important files are transferred to the workspace bucket unless you're using a persistent disk.
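
One way to act on the note above from a Python notebook cell is to copy files to the workspace bucket with gsutil. The sketch below is illustrative only: it assumes the bucket path is exposed through a WORKSPACE_BUCKET environment variable and that gsutil is available in the environment. The notebooks_backup folder and the example file name are hypothetical.

```python
import os
import subprocess


def build_copy_command(local_path, bucket, folder="notebooks_backup"):
    """Return the gsutil command that copies local_path into a bucket folder."""
    # The folder name is a hypothetical choice; use whatever layout suits your workspace.
    destination = f"{bucket.rstrip('/')}/{folder}/{os.path.basename(local_path)}"
    return ["gsutil", "cp", local_path, destination]


def copy_to_bucket(local_path, bucket=None):
    """Copy one file to the workspace bucket, failing loudly if the bucket is unknown."""
    # Assumption: the environment exposes the bucket path as WORKSPACE_BUCKET.
    bucket = bucket or os.getenv("WORKSPACE_BUCKET")
    if not bucket:
        raise RuntimeError("WORKSPACE_BUCKET is not set; pass the bucket path explicitly")
    subprocess.run(build_copy_command(local_path, bucket), check=True)
```

For example, copy_to_bucket("results/summary.csv") would run gsutil cp against the workspace bucket. Building the command in a separate function makes it easy to inspect the destination path before anything is copied.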

 

Support resources

Because Jupyter Notebooks have been supported since launch, most of our support materials are designed around their use. This includes our featured workspaces (requires Researcher Workbench login), which are nearly all designed to use Jupyter Notebooks. Here are a few resources that are specifically related to getting started with Jupyter Notebooks.

RStudio

Overview

RStudio is an integrated development environment (IDE) for R, a programming language used for statistical computing and graphics. It provides a user-friendly interface that makes it easier for users to write, debug, and execute R code. RStudio offers a wide range of features and tools, including a code editor with syntax highlighting and auto-completion, a console for executing R commands, a workspace for managing objects and datasets, and a plotting window for visualizing data. It also supports the creation of interactive documents and reports through its integration with R Markdown. 

 

Environments

RStudio is used in conjunction with RStudio environments, which can be created by clicking the R icon on the right-hand side of a workspace as shown below. RStudio environments can't be customized and cost a fixed $0.40 when running.

 

RStudio_2marked.png

 

NOTE: RStudio environments auto-delete by default if left idle for 1 day. Your persistent disk will not be deleted when your application is auto-deleted. Persistent disks and workspace storage incur Google Cloud Platform (GCP) costs; read the What exactly am I paying for? article for more information on costs.

 

RStudio11.png

 

Support resources

SAS Studio

Overview

SAS is a statistical software tool used for data analysis and statistical modeling. The SAS Studio application in the Researcher Workbench connects to a SAS server in order to process SAS commands. The SAS server is hosted in a cloud environment. After code is processed by the SAS server, the results are returned to the SAS app in your workspace. SAS handles large datasets efficiently and offers an extensive set of statistical techniques and a user-friendly interface.

Environments

SAS Studio on the Researcher Workbench uses a cloud environment, so it may behave slightly differently from a SAS Studio application on your local computer. SAS Studio runs on a virtual machine (VM) or clusters of machines in your workspace cloud analysis environment. The SAS cloud environment on the Researcher Workbench is a VM with 4 CPUs, 15 GB RAM, and 250 GB of disk space, and it is not customizable.

To start SAS Studio, follow the instructions outlined in this support article. 

SAS Steps.png

NOTE: SAS Studio environments auto-delete by default if left idle for 1 day. Your persistent disk will not be deleted when your application is auto-deleted. Persistent disks and workspace storage incur Google Cloud Platform (GCP) costs; read the What exactly am I paying for? article for more information on costs.

image22.png

Support resources

We offer several support resources to help you get started using SAS in the Researcher Workbench such as: 

Cromwell for workflows

Overview

Cromwell is a workflow management system that is designed to help scientists and researchers organize and execute complex computational workflows. It provides a platform for defining, running, and monitoring workflows, making it easier to manage and automate scientific analyses. With Cromwell, users can define their workflows using a simple and intuitive syntax, allowing them to specify the steps and dependencies of their analysis. This makes it easier to break down complex tasks into smaller, more manageable units, improving reproducibility and scalability. Cromwell also provides a range of features to enhance workflow execution. It supports a variety of execution backends, allowing users to run their workflows on different computing infrastructures, such as local machines, clusters, or cloud platforms. It provides detailed logs and metrics, allowing users to troubleshoot issues and optimize their workflows for better performance. 

 

Cromwell environments

Workflows can be submitted in the Researcher Workbench through Cromshell, which is a command line tool for interacting with Cromwell. Cromwell environments can be created through the flying pink pig Cromwell icon on the right side of the workspace panel as described in this support article. You can then start an environment by clicking the Start button.

 

Picture7.png

 

Cromwell environments aren't customizable on the Workbench, though you can use Workflow Description Language (WDL) to customize the way a workflow is run. Cromshell must be used from a Jupyter Notebook or from the terminal, both of which require a Jupyter environment.
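
To make the WDL and Cromshell relationship more concrete, here is a minimal sketch that could be run from a Python notebook. It writes a small WDL file and its inputs file, then builds a cromshell submit command. The hello workflow, the file names, and the Docker image are hypothetical examples rather than Workbench defaults, and the sketch only builds the command instead of running it.

```python
import json
from pathlib import Path

# A minimal, hypothetical WDL 1.0 workflow used only for illustration.
HELLO_WDL = """\
version 1.0

workflow hello {
  input {
    String name
  }
  call say_hello { input: name = name }
  output {
    String greeting = say_hello.out
  }
}

task say_hello {
  input {
    String name
  }
  command <<<
    echo "Hello, ~{name}"
  >>>
  output {
    String out = read_string(stdout())
  }
  runtime {
    docker: "ubuntu:20.04"
  }
}
"""


def write_workflow(directory, name="researcher"):
    """Write the WDL file and its inputs JSON, returning both paths."""
    directory = Path(directory)
    wdl_path = directory / "hello.wdl"
    inputs_path = directory / "hello.inputs.json"
    wdl_path.write_text(HELLO_WDL)
    # WDL inputs are keyed as <workflow name>.<input name>.
    inputs_path.write_text(json.dumps({"hello.name": name}, indent=2))
    return wdl_path, inputs_path


def build_submit_command(wdl_path, inputs_path):
    """Return the cromshell command that would submit the workflow."""
    return ["cromshell", "submit", str(wdl_path), str(inputs_path)]
```

With a Cromwell environment running, the returned command could be passed to subprocess.run from a notebook or typed into the terminal. The runtime block in the WDL is where per-task settings such as the container image are customized.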

 

NOTE: Cromwell environments auto-delete by default if left idle for 7 days. Your persistent disk will not be deleted when your application is auto-deleted. Persistent disks and workspace storage incur Google Cloud Platform (GCP) costs; read the What exactly am I paying for? article for more information on costs.
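
The default idle limits quoted in the notes in this article (1 day for RStudio and SAS Studio, 7 days for Cromwell) can be turned into a quick date calculation. The helper below is a hypothetical convenience and not part of the Workbench; it simply adds the quoted limit to the time an environment was last active.

```python
from datetime import datetime, timedelta

# Default idle limits as stated in this article's NOTE sections.
IDLE_LIMITS = {
    "RStudio": timedelta(days=1),
    "SAS Studio": timedelta(days=1),
    "Cromwell": timedelta(days=7),
}


def earliest_auto_delete(app, last_active):
    """Return the earliest time an idle environment could be auto-deleted."""
    return last_active + IDLE_LIMITS[app]


# Example: a Cromwell environment last used on the morning of 1 March 2024.
print(earliest_auto_delete("Cromwell", datetime(2024, 3, 1, 9, 0)))
```

Jupyter environments are left out because the article describes them as deleted on a 1 to 2 week schedule rather than after an idle period.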

 

delete cromwell.png

 

Support resources

  • Below is a video that walks through working with Cromwell in the Researcher Workbench.

 
