The All of Us Researcher Workbench (RW) added Cromwell for workflow management. Using a workflow manager can be helpful when running complex workflows; the workflow manager will submit and manage jobs for you, coordinating the interdependencies between tasks. Cromwell runs workflows written in the Workflow Description Language (WDL) and was specifically designed for scientific workflows, though it can run any type of workflow.
In the Researcher Workbench, you can submit workflows to Cromwell through Cromshell, a command line tool for interacting with Cromwell. Cromshell can be run in a Jupyter environment, either through a Jupyter notebook or a Jupyter terminal.
Using Cromwell makes it easier to manage complex workflows in the Researcher Workbench, especially when working with large-scale data such as the genomics data available through the All of Us Research Program. See this video to learn more about Cromwell in the Researcher Workbench, as well as this Featured Workspace, How to run WDLs using Cromwell in the Researcher Workbench (v7).
Note: we suggest using the us-central1 region when launching Google Cloud Life Sciences API batch jobs, because our CDR bucket and your buckets live in us-central1. If you launch API jobs in other regions, you will incur network egress charges.
To learn more about using Cromwell in the Researcher Workbench, see this tutorial video:
Starting Cromwell
Create the Cromwell and Jupyter Environments
Cromwell runs as an application within a workspace. To get started, select the Cromwell icon (a pink pig) in the right-hand menu of your workspace. The Cromwell Cloud Environment information screen will open; select ‘Start’ at the bottom right of the screen.
While your Cromwell Cloud Environment is being created, create a Jupyter environment in the same workspace. In the same right-hand menu, select the Jupyter icon. The default Jupyter configuration is fine for this step, so you can select ‘Start’. Once both environments are ready, you will access Cromshell through a Jupyter terminal or notebook.
You can check the status of the Cromwell and Jupyter environments by selecting the cloud icon (a cloud and lightning bolt) in the same right-hand menu. This brings up a summary of your active applications.
Auto-delete
When you start the app, you are opted in to the auto-delete function, which automatically deletes your Cromwell application if it is left idle for 7 days or for another time period that you specify. Your persistent disk will not be deleted when your Cromwell application is auto-deleted. You can change your idle period or opt out of auto-delete in the Cromwell Cloud Environment panel.
We recommend that you use auto-delete if you are concerned about incurring costs by leaving the app running in the background.
Run snippets to link the Cromwell and Jupyter environments
Once you have created the Cromwell environment and Jupyter environment, you can create a new Jupyter notebook where you can submit your workflow with Cromwell. In the top menu, select ‘Snippets’ and then select the ‘All of Us Cromwell Setup Python snippets’. This snippet will set up the network connection between the Jupyter and Cromwell environments. In addition, it will run a status check to verify that Cromwell and Jupyter are set up correctly.
Run this snippet in the Jupyter notebook. Once it has run, you are ready to run workflows managed by Cromwell. You can submit workflows and interact with Cromwell through the command line tool Cromshell in a Jupyter terminal or notebook. To open a Jupyter terminal, launch a new terminal from the right-hand menu.
If you see any problems with your submissions, you should try re-running the ‘All of Us Cromwell Setup Python snippets’. If the snippet fails, you can try fixing the problem by restarting or recreating (delete and create) Cromwell.
Cromshell Workflow Operations
You can submit a workflow to Cromwell using the following Cromshell command: cromshell-alpha submit workflow.wdl parameters.json. Cromshell submits the workflow written in WDL to Cromwell along with the configuration options in the JSON file.
After you submit a workflow, you can find the submission ID at the bottom of the output:
{
"id": "61605c66-0f9a-48e7-8b88-e83eaf62debc",
"status": "Submitted"
}
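If you capture the output of the submit command in a notebook, the submission ID can be pulled out programmatically. The following is a minimal sketch, assuming the output ends with a flat JSON record like the one shown above; the helper name parse_submission and the "Submitting workflow..." line in the sample text are hypothetical and only serve the illustration.

```python
import json
import re


def parse_submission(output: str) -> dict:
    """Return the last flat JSON object in `output` that has an "id" key."""
    # Candidate blocks are brace-delimited spans without nested braces,
    # which matches the shape of the record shown in this article.
    candidates = re.findall(r"\{[^{}]*\}", output)
    for block in reversed(candidates):
        try:
            parsed = json.loads(block)
        except json.JSONDecodeError:
            continue  # not valid JSON, try the previous candidate
        if isinstance(parsed, dict) and "id" in parsed:
            return parsed
    raise ValueError("no submission record found in output")


# Sample text: the first line is made up, the JSON record is from this article.
example_output = """Submitting workflow...
{
"id": "61605c66-0f9a-48e7-8b88-e83eaf62debc",
"status": "Submitted"
}
"""

record = parse_submission(example_output)
print(record["id"], record["status"])
```

The returned ID is the value you pass as <submissionID> to later Cromshell commands.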
You can check the status of the workflow with the following Cromshell command: cromshell-alpha status <submissionID>.
You can abort a workflow with the following command: cromshell-alpha abort <submissionID>.
Refer to https://github.com/broadinstitute/cromshell for additional commands. Please note that you need to use cromshell-alpha as the command to call Cromshell in a command line on the RW.
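From a notebook, the same commands can be wrapped in a small Python helper. This is a sketch only: cromshell_args and run_cromshell are hypothetical names, and the sketch uses just the submit, status, and abort actions described in this article. It assumes cromshell-alpha is on the PATH of your Jupyter environment and that the setup snippet has already been run.

```python
import subprocess

# Command name used to call Cromshell on the Researcher Workbench.
CROMSHELL = "cromshell-alpha"


def cromshell_args(action: str, *args: str) -> list:
    """Build the argument list for a Cromshell call without running it."""
    return [CROMSHELL, action, *args]


def run_cromshell(action: str, *args: str) -> str:
    """Run a Cromshell command and return its standard output as text.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        cromshell_args(action, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Building the commands does not contact Cromwell; it only shows their shape.
print(cromshell_args("submit", "workflow.wdl", "parameters.json"))
print(cromshell_args("status", "<submissionID>"))
print(cromshell_args("abort", "<submissionID>"))
```

Calling run_cromshell("status", submission_id) in a notebook cell would then return whatever Cromshell prints for that submission.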
WDL File Configuration
When configuring your WDL file, you can use the following Docker image:
- "us.gcr.io/broad-gatk/gatk:4.2.6.1"
Saving a WDL or JSON file
There are multiple options to save a WDL and a JSON to your notebook:
1. In a Jupyter notebook, use the %%writefile <filename> command as the first line of a cell, followed by the contents of the WDL or JSON file
2. From a Jupyter notebook, select the Jupyter icon in the upper left corner to access the file browser. You can use the Jupyter file browser to upload existing JSON or WDL files. The root folder here is available at /home/jupyter.
3. In a Jupyter terminal, use vim <filename> to create (and edit) a file
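As a further option, the files can be written from an ordinary Python cell. The sketch below is illustrative, not an official template: the workflow name hello, the task say_hello, the input name, and the helper write_workflow_files are all made up for this example. Only the Docker image and the file names workflow.wdl and parameters.json come from this article.

```python
import json
import tempfile
from pathlib import Path

# Minimal WDL 1.0 workflow (hypothetical) that echoes a greeting.
WDL_TEXT = '''version 1.0

workflow hello {
  input {
    String name
  }
  call say_hello { input: name = name }
  output {
    String greeting = say_hello.greeting
  }
}

task say_hello {
  input {
    String name
  }
  command <<<
    echo "Hello, ~{name}"
  >>>
  output {
    String greeting = read_string(stdout())
  }
  runtime {
    docker: "us.gcr.io/broad-gatk/gatk:4.2.6.1"
  }
}
'''

# Workflow inputs are keyed as <workflow name>.<input name>.
INPUTS = {"hello.name": "All of Us"}


def write_workflow_files(directory):
    """Write workflow.wdl and parameters.json into `directory`."""
    directory = Path(directory)
    wdl_path = directory / "workflow.wdl"
    json_path = directory / "parameters.json"
    wdl_path.write_text(WDL_TEXT)
    json_path.write_text(json.dumps(INPUTS, indent=2))
    return wdl_path, json_path


# Demonstration in a scratch directory; in a notebook you might pass Path.cwd().
with tempfile.TemporaryDirectory() as scratch:
    wdl_path, json_path = write_workflow_files(scratch)
    print(wdl_path.name, json_path.name)
```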
We also recommend saving your WDL and JSON files to your workspace bucket. This allows you to easily access the files to run your workflow and to use the files again even after you delete your cloud environment. You can save files to your bucket with the following command:
gsutil cp <filename> $WORKSPACE_BUCKET
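The same copy can be issued from a Python cell. This is a minimal sketch, assuming the WORKSPACE_BUCKET environment variable is set in your Jupyter environment as in the command above; the helper name bucket_copy_command and the workflows/ folder are hypothetical choices for keeping workflow files together.

```python
import os
import subprocess


def bucket_copy_command(filename, bucket=None, prefix="workflows"):
    """Build a `gsutil cp` command that copies `filename` into the bucket.

    `bucket` defaults to the WORKSPACE_BUCKET environment variable;
    `prefix` is a hypothetical folder name inside the bucket.
    """
    bucket = bucket or os.environ["WORKSPACE_BUCKET"]
    destination = f"{bucket.rstrip('/')}/{prefix}/"
    return ["gsutil", "cp", filename, destination]


def copy_to_bucket(filename, bucket=None, prefix="workflows"):
    """Run the copy; raises CalledProcessError if gsutil reports a failure."""
    subprocess.run(bucket_copy_command(filename, bucket, prefix), check=True)


# Building the command does not touch the bucket; gs://example-bucket is a placeholder.
print(bucket_copy_command("workflow.wdl", bucket="gs://example-bucket"))
```

In a notebook, copy_to_bucket("workflow.wdl") followed by copy_to_bucket("parameters.json") would save both files under the same folder.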
The workspace bucket is attached to your workspace. You can share the workspace bucket with your colleagues by sharing the workspace. See this article on workspace buckets for more details about workspace storage.
Terminating Cromwell
We recommend deleting your Cromwell environment when you are not actively running workflows in order to reduce cost. Auto-pause is currently not supported for Cromwell environments so you must actively control the status of your Cromwell environments. If the Cromwell app remains idle for 7 days, the application will be auto-deleted.
After completing your analysis, you can delete your Cromwell environment from the right hand menu. Select the cloud icon to see a summary of your active applications. From there, you can delete the Cromwell environment.
When you delete a Cromwell environment, you lose workflow metadata from that environment. Any of your results or data can be saved in your workspace bucket and will not get deleted when you delete the Cromwell environment.
We recommend always pausing your Cromwell environment when you are not running workflows in order to avoid the cost to run Cromwell. As a reminder, auto-pause is not supported.
Billing
Cromwell incurs a per-workspace cost. If you have one Cromwell application in your workspace, Cromwell costs ~$270/month when running. Each additional Cromwell application will cost ~$158/month when running.
- First application: $270/month ($0.375 x 24 hr x 30 days)
  - First user creates Cromwell app.
  - Management fee ($0.10/hr) + default nodepool ($0.055/hr) + individual user nodepool ($0.22/hr with default settings in the Researcher Workbench)
  - First user creates second Cromwell app within same workspace: no additional fee.
- Second application: $158/month ($0.22 x 24 hr x 30 days)
  - Different user creates a second Cromwell app in the same workspace.
  - $0.22/hr for the individual user nodepool.
- Costs are shared among all users of a given workspace. The user nodepool is user-specific and depends on the compute profile.
- If you create a new workspace and start a Cromwell app in that workspace, you'll be charged as a new instance.
Note: these costs do not include your persistent disk cost. A standard 50 GB disk costs ~2.00 hr, and the cost increases with disk size. You can see the disk cost within the Cromwell cloud analysis environment panel.
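The arithmetic behind these estimates can be reproduced in a few lines. The sketch below uses only the hourly rates quoted above with default settings and a 30-day month; the function name monthly_cost is hypothetical, and persistent disk cost is not included.

```python
HOURS_PER_MONTH = 24 * 30  # 30-day month, as in the estimates above

MANAGEMENT_FEE = 0.10     # $/hr, charged once per workspace
DEFAULT_NODEPOOL = 0.055  # $/hr, charged once per workspace
USER_NODEPOOL = 0.22      # $/hr per user with a running app (default settings)


def monthly_cost(users_with_apps: int) -> float:
    """Estimated monthly Cromwell cost for one workspace, excluding disks."""
    if users_with_apps <= 0:
        return 0.0
    hourly = MANAGEMENT_FEE + DEFAULT_NODEPOOL + USER_NODEPOOL * users_with_apps
    return round(hourly * HOURS_PER_MONTH, 2)


print(monthly_cost(1))  # one user: 0.375 $/hr over 720 hours
print(monthly_cost(2))  # a second user adds 0.22 $/hr
```

With one user this gives $270.00 per month, and each additional user adds $158.40, matching the rounded figures above.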
Known issues and limitations for users
Two users cannot start applications at the same time in a workspace
In workspaces with more than one active user, two users cannot start an application (Cromwell, Jupyter, etc.) at the same time, that is, within a few minutes of each other. If this happens, the application will not start for one of the users.
- Mitigation: The solution is to wait a few minutes and try again.
- Remediation: This will be addressed with a clearer error message before shipping to prod.
Changing the combination of Jupyter and Cromwell environments in a workspace
The ‘All of Us Cromwell Setup Python snippets’ must be run anytime there is a change in the combination of Jupyter and Cromwell environments in order to correctly link the environments. We recommend re-running the snippets whenever there is an unexpected error.