How to use Cromwell in the All of Us Researcher Workbench


The All of Us Researcher Workbench (RW) added Cromwell for workflow management. Using a workflow manager can be helpful when running complex workflows; the workflow manager will submit and manage jobs for you, coordinating the interdependencies between tasks. Cromwell runs workflows written in the Workflow Description Language (WDL) and was specifically designed for scientific workflows, though it can run any type of workflow.

In the RW, you can submit workflows to Cromwell through Cromshell, a command line tool for interacting with Cromwell. Cromshell can be run in a Jupyter environment, either through a Jupyter notebook or a Jupyter terminal.

Using Cromwell makes it easier to manage complex workflows in the RW, especially when dealing with large-scale data such as the genomics data available through the All of Us Research Program. See this video to learn more about Cromwell in the Researcher Workbench.

Note: we suggest using the us-central1 region when launching Google Life Sciences API batch jobs because our CDR bucket and your buckets live in us-central1. If you launch API jobs in other regions, you will incur network egress charges.
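One way to keep tasks in us-central1 is a Cromwell workflow options file that sets default zones. The sketch below is a minimal example, not a Workbench-specific recipe: it assumes the Cromwell backend honors the standard `default_runtime_attributes` workflow option and the `zones` runtime attribute, and the zone list is an illustrative choice. The Workbench may already configure the region for you, so treat this as optional.

```python
import json

# Assumption: these are the zones you want to allow; all belong to us-central1.
US_CENTRAL1_ZONES = ["us-central1-a", "us-central1-b", "us-central1-c", "us-central1-f"]


def build_workflow_options(zones):
    """Return a Cromwell workflow-options dict that sets default zones for tasks."""
    outside = [zone for zone in zones if not zone.startswith("us-central1-")]
    if outside:
        raise ValueError(f"Zones outside us-central1 may incur egress charges: {outside}")
    # Cromwell's `zones` runtime attribute is a single space-separated string.
    return {"default_runtime_attributes": {"zones": " ".join(zones)}}


options = build_workflow_options(US_CENTRAL1_ZONES)
print(json.dumps(options, indent=2))
```

How the options file is passed at submission time depends on the Cromshell version; check the Cromshell documentation before relying on it.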


 

 

Starting Cromwell

Create the Cromwell and Jupyter Environments

Cromwell runs as an application within a workspace. To get started, open an existing workspace or create a new one. Within the workspace, select the Cromwell icon (a pink pig) in the right-hand menu. The Cromwell Cloud Environment information screen opens; select ‘Start’ at the bottom right of the screen.

You access Cromshell through a Jupyter terminal or notebook, so you also need a Jupyter environment in the same workspace. While your Cromwell Cloud Environment is being created, select the Jupyter icon in the same right-hand menu. The default Jupyter configuration is fine for this step; select ‘Start’.

You can check the status of the Cromwell and Jupyter environments by selecting the cloud icon (a cloud and lightning bolt) in the same right-hand menu. This brings up a summary of your active applications.

 


 

Auto-delete

When you start the Cromwell app, you can opt in to the auto-delete function, which automatically deletes your Cromwell application if it is left idle for a set number of days. You opt in by checking the box within the panel. You can only enable auto-delete before starting your application.

 


Several idle periods are available before your application is deleted: 1, 3, 7, 8, 15, or 30 days. The timer counts down whenever you do not have an active Cromwell app open. We recommend that you use auto-delete if you are concerned about incurring costs by leaving the app running in the background.

 

Run snippets to link the Cromwell and Jupyter environments

Once you have created the Cromwell environment and Jupyter environment, you can create a new Jupyter notebook where you can submit your workflow with Cromwell. In the top menu, select ‘Snippets’ and then select the ‘All of Us Cromwell Setup Python snippets’. This snippet will set up the network connection between the Jupyter and Cromwell environments. In addition, it will run a status check to verify that Cromwell and Jupyter are set up correctly.

 


 

Run this snippet in the Jupyter notebook. Once it has run, you are ready to run workflows managed by Cromwell. You can submit workflows and interact with Cromwell through the command line tool Cromshell in a Jupyter terminal or notebook. To use Cromshell in a terminal, launch a new terminal from the right-hand menu.

If you see any problems with your submissions, you should try re-running the ‘All of Us Cromwell Setup Python snippets’. If the snippet fails, you can try fixing the problem by restarting or recreating (delete and create) Cromwell.

 

Cromshell Workflow Operations

You can submit a workflow to Cromwell using the following Cromshell command: cromshell-alpha submit workflow.wdl parameters.json. Cromshell submits the workflow written in WDL to Cromwell along with the workflow inputs defined in the JSON file.

 

After you submit a workflow, you can find the submission ID at the bottom of the output:

{
    "id": "61605c66-0f9a-48e7-8b88-e83eaf62debc",
    "status": "Submitted"
}
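If you submit from a notebook, you may want to capture the submission ID programmatically. The following is a minimal sketch that assumes Cromshell prints the JSON object shown above somewhere in its output; `extract_submission` and `submit_workflow` are hypothetical helper names, not part of Cromshell.

```python
import json
import re
import subprocess


def extract_submission(output):
    """Return the last JSON object in the output that contains an "id" key."""
    for candidate in reversed(re.findall(r"\{[^{}]*\}", output)):
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if "id" in parsed:
            return parsed
    raise ValueError("No submission ID found in Cromshell output")


def submit_workflow(wdl_path, inputs_path):
    """Run `cromshell-alpha submit` and return the parsed submission record."""
    result = subprocess.run(
        ["cromshell-alpha", "submit", wdl_path, inputs_path],
        capture_output=True, text=True, check=True,
    )
    # Assumption: the record may appear on stdout or stderr, so search both.
    return extract_submission(result.stdout + result.stderr)


# Parsing demonstrated on the sample output from this article.
sample_output = """
{
    "id": "61605c66-0f9a-48e7-8b88-e83eaf62debc",
    "status": "Submitted"
}
"""
submission = extract_submission(sample_output)
print(submission["id"], submission["status"])
```

In a Workbench notebook, `submit_workflow("workflow.wdl", "parameters.json")` would run the actual submission.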

You can check the status of the workflow with the following Cromshell command: cromshell-alpha status <submissionID>.
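For long-running workflows, you can poll the status instead of checking by hand. This sketch assumes the status command prints a JSON-like `"status": "<value>"` field and that Succeeded, Failed, and Aborted are the terminal Cromwell states; the helper names and the polling interval are illustrative choices.

```python
import re
import subprocess
import time

# Cromwell workflow states after which the status no longer changes.
TERMINAL_STATES = {"Succeeded", "Failed", "Aborted"}


def parse_status(output):
    """Extract the value of the "status" field from Cromshell's output."""
    match = re.search(r'"status"\s*:\s*"([^"]+)"', output)
    if match is None:
        raise ValueError("No status field found in Cromshell output")
    return match.group(1)


def fetch_status(submission_id):
    """Run `cromshell-alpha status <submissionID>` and return the status string."""
    result = subprocess.run(
        ["cromshell-alpha", "status", submission_id],
        capture_output=True, text=True,
    )
    return parse_status(result.stdout + result.stderr)


def wait_for_workflow(submission_id, fetch=fetch_status, interval_seconds=60, max_checks=120):
    """Poll until the workflow reaches a terminal state or max_checks is exhausted."""
    for _ in range(max_checks):
        status = fetch(submission_id)
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval_seconds)
    raise TimeoutError(f"Workflow {submission_id} did not finish after {max_checks} checks")


# Demonstration with a stubbed status source, so it runs without Cromwell.
fake_statuses = iter(["Submitted", "Running", "Succeeded"])
final = wait_for_workflow("demo-id", fetch=lambda _id: next(fake_statuses), interval_seconds=0)
print(final)
```

Passing `fetch` as a parameter keeps the polling loop testable without a running Cromwell environment.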

 

You can abort a workflow with the following command: cromshell-alpha abort <submissionID>

Refer to https://github.com/broadinstitute/cromshell for additional commands. Please note that you need to use cromshell-alpha as the command to call Cromshell in a command line on the RW.

 

WDL File Configuration

When configuring your WDL file, you can use the following Docker image:

  • "us.gcr.io/broad-gatk/gatk:4.2.6.1"
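To show where the image goes, here is a minimal sketch that writes a small WDL 1.0 file whose task runs in that image. The workflow name `hello_gatk`, the task, and the file name are hypothetical placeholders for illustration only; the task just echoes a message rather than running GATK.

```python
from pathlib import Path

GATK_IMAGE = "us.gcr.io/broad-gatk/gatk:4.2.6.1"

# A placeholder token is used instead of an f-string because WDL is full of braces.
WDL_TEMPLATE = """version 1.0

workflow hello_gatk {
  input {
    String message
  }
  call echo_message { input: message = message }
  output {
    String echoed = echo_message.out
  }
}

task echo_message {
  input {
    String message
  }
  command <<<
    echo "~{message}"
  >>>
  output {
    String out = read_string(stdout())
  }
  runtime {
    docker: "DOCKER_IMAGE_PLACEHOLDER"
  }
}
"""


def write_wdl(path, image=GATK_IMAGE):
    """Write the example WDL to `path` with the given Docker image and return the path."""
    wdl_path = Path(path)
    wdl_path.write_text(WDL_TEMPLATE.replace("DOCKER_IMAGE_PLACEHOLDER", image))
    return wdl_path


wdl_file = write_wdl("hello_gatk.wdl")
print(wdl_file.read_text())
```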

 

Saving a WDL or JSON file

There are multiple options for saving WDL and JSON files in your Jupyter environment:

1. In a Jupyter notebook, put the %%writefile <filename> cell magic on the first line of a cell, followed by the contents of the WDL or JSON file

2. From a Jupyter notebook, select the Jupyter icon in the upper left corner to access the file browser. You can use the Jupyter file browser to upload existing JSON or WDL files.  The root folder here is available at /home/jupyter.

 


 

3. In a Jupyter terminal, use vim <filename> to create (and edit) a file
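As a further option, you can generate the JSON file from Python, which avoids hand-editing syntax errors. This is a minimal sketch: the input key `hello_gatk.message` is a hypothetical example that follows the usual `<workflow name>.<input name>` convention for WDL inputs, and must be replaced with the inputs your own workflow declares.

```python
import json
from pathlib import Path


def save_inputs(inputs, path="parameters.json"):
    """Write workflow inputs to a JSON file and return the file path."""
    inputs_path = Path(path)
    inputs_path.write_text(json.dumps(inputs, indent=2) + "\n")
    return inputs_path


# Hypothetical inputs for a workflow named `hello_gatk` with one String input.
inputs = {"hello_gatk.message": "Hello from the Researcher Workbench"}
inputs_file = save_inputs(inputs)
print(inputs_file.read_text())
```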

 

We also recommend saving your WDL and JSON files to your workspace bucket. This allows you to easily access the files to run your workflow and to use the files again even after you delete your cloud environment. You can save files to your bucket with the following command:

 

gsutil cp <filename> $WORKSPACE_BUCKET 

 

The workspace bucket is attached to your workspace. You can share the workspace bucket with your colleagues by sharing the workspace. See this article on workspace buckets for more details about workspace storage.
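The same copy can be scripted from a notebook. The sketch below assumes the WORKSPACE_BUCKET environment variable is set in your Jupyter environment, as the command above implies; the `workflows` subfolder and the helper names are illustrative assumptions.

```python
import os
import subprocess


def bucket_copy_command(filename, bucket=None, subdir="workflows"):
    """Build the gsutil command that copies `filename` into the workspace bucket."""
    bucket = bucket or os.environ.get("WORKSPACE_BUCKET")
    if not bucket:
        raise RuntimeError("WORKSPACE_BUCKET is not set; pass bucket= explicitly")
    destination = f"{bucket.rstrip('/')}/{subdir}/{os.path.basename(filename)}"
    return ["gsutil", "cp", filename, destination]


def copy_to_bucket(filename, **kwargs):
    """Run the copy; raises CalledProcessError if gsutil reports a failure."""
    subprocess.run(bucket_copy_command(filename, **kwargs), check=True)


# Show the command that would run, using a made-up bucket name.
command = bucket_copy_command("workflow.wdl", bucket="gs://example-bucket")
print(" ".join(command))
```

In the Workbench, `copy_to_bucket("workflow.wdl")` would perform the upload using the real bucket.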

 

Terminating Cromwell

We recommend deleting your Cromwell environment when you are not actively running workflows in order to reduce cost. Auto-pause is not currently supported for Cromwell environments, and auto-delete only applies if you enabled it before starting the app, so you must otherwise actively manage the status of your Cromwell environments.

After completing your analysis, you can delete your Cromwell environment from the right hand menu. Select the cloud icon to see a summary of your active applications. From there, you can delete the Cromwell environment.

 


 

When you delete a Cromwell environment, you lose the workflow metadata from that environment. Results and data saved in your workspace bucket are not deleted when you delete the Cromwell environment. You can also enable auto-delete when starting an environment.

 

Billing

Cromwell incurs a per-workspace cost of $0.20/hour whether it is running or paused. Each Cromwell instance also incurs a per-app cost of $0.20/hour while it is running.

  • When running: $0.20/hour + ($0.20/hour x number of Cromwell apps running).
  • When paused: $0.20/hour

 

If you have one Cromwell application in your workspace, Cromwell costs $296/month when running and $148/month when paused (these figures correspond to about 740 hours per month). Based on the rates above, each additional Cromwell application adds $148/month while it is running.
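The arithmetic behind these figures can be sketched as follows. The 740 hours per month is an assumption back-calculated from the $148 and $296 figures above, and persistent disk costs are not included.

```python
WORKSPACE_RATE = 0.20   # $/hour per workspace, charged when running or paused
APP_RATE = 0.20         # $/hour per running Cromwell app
HOURS_PER_MONTH = 740   # assumption: inferred from the monthly figures in this article


def hourly_cost(running_apps):
    """Hourly cost for a workspace with the given number of running Cromwell apps."""
    if running_apps < 0:
        raise ValueError("running_apps must be zero or greater")
    return WORKSPACE_RATE + APP_RATE * running_apps


def monthly_cost(running_apps, hours=HOURS_PER_MONTH):
    """Monthly cost in dollars, rounded to cents; 0 running apps means paused."""
    return round(hourly_cost(running_apps) * hours, 2)


print("paused:", monthly_cost(0))
print("one app running:", monthly_cost(1))
print("two apps running:", monthly_cost(2))
```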

 

Note: these costs do not include your persistent disk.

 

We recommend pausing your Cromwell environment whenever you are not running workflows in order to avoid the per-app running cost. As a reminder, auto-pause is not supported, and auto-delete applies only if you enabled it before starting the app.

 

Known issues and limitations for users

Two users cannot start applications at the same time in a workspace

In workspaces with more than one active user, two users cannot start an application (Cromwell, Jupyter, etc.) within a few minutes of each other. If this happens, the application will not start for one of the users.

  • Mitigation: The solution is to wait a few minutes and try again.
  • Remediation: This will be addressed with a clearer error message before shipping to prod.

 

Changing the combination of Jupyter and Cromwell environments in a workspace

The ‘All of Us Cromwell Setup Python snippets’ must be run any time the combination of Jupyter and Cromwell environments changes, so that the environments are linked correctly. We recommend re-running the snippets whenever there is an unexpected error.

 

Only Google Container Registry is supported for docker images

Because Workbench batch VMs have restricted Internet access, standard Docker repositories such as Docker Hub are not accessible to WDLs. Instead, configure all tasks in your WDLs to run public Docker images from Google Container Registry (GCR). Typically, GCR Docker URLs start with us.gcr.io/. As an example, the GATK 4.2.6.1 Docker image in GCR is us.gcr.io/broad-gatk/gatk:4.2.6.1, as opposed to broadinstitute/gatk:4.2.6.1 on Docker Hub. You can learn more about this limitation in the Overview of Batch Processing on the All of Us User Support Hub.
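Before submitting, you can scan a WDL for images that are not hosted on GCR. This is a rough sketch, not a WDL parser: it only catches `docker:` runtime entries written as literal quoted strings, and it assumes any registry host equal to gcr.io or ending in .gcr.io counts as GCR. The sample WDL fragment and task names are made up for the demonstration.

```python
import re

# Matches runtime lines such as:  docker: "us.gcr.io/broad-gatk/gatk:4.2.6.1"
DOCKER_LINE = re.compile(r'^\s*docker\s*:\s*"([^"]+)"', re.MULTILINE)


def is_gcr_image(image):
    """True if the image's registry host is gcr.io or a regional *.gcr.io host."""
    registry = image.split("/", 1)[0]
    return registry == "gcr.io" or registry.endswith(".gcr.io")


def find_non_gcr_images(wdl_text):
    """Return the literal docker images in the WDL that are not hosted on GCR."""
    return [image for image in DOCKER_LINE.findall(wdl_text) if not is_gcr_image(image)]


sample_wdl = '''
task call_variants {
  runtime {
    docker: "us.gcr.io/broad-gatk/gatk:4.2.6.1"
  }
}
task legacy_step {
  runtime {
    docker: "broadinstitute/gatk:4.2.6.1"
  }
}
'''
print(find_non_gcr_images(sample_wdl))
```

Images supplied through WDL variables or expressions are not detected, so review those tasks manually.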
