Overview of applications in the Researcher Workbench

Because researchers analyze data in different ways depending on their needs, the Researcher Workbench supports a variety of cloud-based applications for analyzing All of Us participant data. Unlike the Cohort and Dataset Builders, these applications incur costs because they run in cloud environments. A cloud environment is virtualized infrastructure that lets users access and manage computing resources over the internet. These applications can be used to analyze participant data that's generated manually or with the Cohort and Dataset Builders.

Below is an up-to-date list of the different analysis tools along with information about associated cloud environments, basic demonstration videos, support materials, and other relevant information. This article will be updated as other applications are made available in the Workbench.

Jupyter Notebooks

Overview

Jupyter Notebooks are a popular tool used by data scientists, researchers, and developers for interactive computing and data analysis. They provide an environment where users can write and execute code, visualize data, and document their work in a single document. The Researcher Workbench supports both R and Python notebooks, so users can choose the language that best suits their needs. A notebook consists of a series of cells, each of which can contain code, text, equations, or visualizations. Users can run individual cells or the entire notebook, making it easy to experiment with code and see the results in real time.

 

Environments

Jupyter Notebooks are powered by Jupyter environments, which can be enabled from the Jupyter icon in the right panel of a workspace as shown below.

 

jupyter.png

A variety of customization options are available, depending on your computing needs. To learn more about customizing and deleting your Jupyter environment, including how to use Dataproc clusters for Hail analyses, please see this article that details different aspects of cloud environments and how to optimize them.

 

Environments can be created by clicking the create environment button in the bottom right corner.

 

create env.png

 

In addition to powering Jupyter Notebooks, Jupyter environments can be used to analyze data from the terminal.

 

terminal.png

 

NOTE: Jupyter environments are auto-deleted every 1-2 weeks, so unless you're using a persistent disk, please make sure important files are transferred to the workspace bucket.
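
As a rough sketch of the transfer mentioned in the note above, the Python snippet below builds a `gsutil cp` command that copies a local file into the workspace bucket. It assumes that the bucket path is exposed through a `WORKSPACE_BUCKET` environment variable and that `gsutil` is available in the Jupyter environment. The helper names and the `notebook_backups` folder are hypothetical and only serve as illustration.

```python
import os
import subprocess


def build_backup_command(local_path, bucket, folder="notebook_backups"):
    """Return the gsutil command that copies local_path into the bucket."""
    # The destination keeps the original file name under a hypothetical folder.
    destination = f"{bucket.rstrip('/')}/{folder}/{os.path.basename(local_path)}"
    return ["gsutil", "cp", local_path, destination]


def backup_file(local_path, bucket=None):
    """Copy one file to the workspace bucket (requires gsutil to be installed)."""
    # Assumption: the workspace bucket is published as WORKSPACE_BUCKET.
    bucket = bucket or os.environ.get("WORKSPACE_BUCKET")
    if not bucket:
        raise RuntimeError("No bucket given and WORKSPACE_BUCKET is not set")
    command = build_backup_command(local_path, bucket)
    # check=True raises CalledProcessError if the copy fails.
    subprocess.run(command, check=True)
    return command[-1]
```

Calling `backup_file("results.csv")` from a notebook cell would then attempt the copy and return the destination path. Building the command in a separate function makes it easy to inspect the destination before anything is transferred.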

 

Support resources

Because Jupyter Notebooks have been supported since launch, most of our support materials are designed around their use. This includes our featured workspaces (requires Researcher Workbench login), nearly all of which are designed to use Jupyter Notebooks. Here are a few resources that are specifically related to getting started in Jupyter Notebooks.

Cromwell for workflows

Overview

Cromwell is a workflow management system that is designed to help scientists and researchers organize and execute complex computational workflows. It provides a platform for defining, running, and monitoring workflows, making it easier to manage and automate scientific analyses. With Cromwell, users can define their workflows using a simple and intuitive syntax, allowing them to specify the steps and dependencies of their analysis. This makes it easier to break down complex tasks into smaller, more manageable units, improving reproducibility and scalability. Cromwell also provides a range of features to enhance workflow execution. It supports a variety of execution backends, allowing users to run their workflows on different computing infrastructures, such as local machines, clusters, or cloud platforms. It provides detailed logs and metrics, allowing users to troubleshoot issues and optimize their workflows for better performance. 

 

Cromwell environments

Workflows can be submitted in the Researcher Workbench through Cromshell, which is a command line tool for interacting with Cromwell. Cromwell environments can be created through the Cromwell icon (a flying pink pig) on the right side of the workspace panel, as described in this support article. You can then start an environment by clicking the Start button.

 

Picture7.png

 

Cromwell environments aren't customizable on the Workbench, though you can customize the way a workflow is run using Workflow Description Language (WDL). To use Cromshell, you'll need to work through a Jupyter Notebook or the terminal, both of which require a Jupyter environment.
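
To make the WDL and Cromshell steps above more concrete, here is a minimal sketch in Python, as it might be run from a notebook cell. It writes a small WDL 1.0 workflow and its inputs file to disk, then builds the `cromshell submit` command for them. The workflow, the task, the file names, and the helper functions are all hypothetical, and the exact Cromshell invocation may differ depending on the version installed in your environment.

```python
import json
from pathlib import Path

# A minimal, illustrative WDL 1.0 workflow with a single task.
HELLO_WDL = """\
version 1.0

workflow hello {
  input {
    String name
  }
  call say_hello { input: name = name }
  output {
    String greeting = say_hello.greeting
  }
}

task say_hello {
  input {
    String name
  }
  command <<<
    echo "Hello, ~{name}"
  >>>
  output {
    String greeting = read_string(stdout())
  }
  runtime {
    docker: "ubuntu:20.04"
  }
}
"""


def write_workflow(directory, name):
    """Write the WDL file and its JSON inputs, returning both paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    wdl_path = directory / "hello.wdl"
    inputs_path = directory / "hello.inputs.json"
    wdl_path.write_text(HELLO_WDL)
    # WDL inputs are keyed as "<workflow name>.<input name>".
    inputs_path.write_text(json.dumps({"hello.name": name}, indent=2))
    return wdl_path, inputs_path


def submit_command(wdl_path, inputs_path):
    """Build (but do not run) the Cromshell submission command."""
    return ["cromshell", "submit", str(wdl_path), str(inputs_path)]
```

The command returned by `submit_command` could be passed to `subprocess.run` or typed into the terminal once a Cromwell environment is running. The `runtime` block in the task is where settings such as the container image are customized per workflow.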

Unlike Jupyter environments, Cromwell environments can't be paused or auto-paused. To save on costs, we recommend deleting them from the apps menu when you're finished. Select the cloud icon to see a summary of your active applications. From there, you can delete the Cromwell environment.

 

2.png

 

Support resources

  • Below is a video that walks through working with Cromwell in the Researcher Workbench.

 
