In this article, we explain how to use publicly available Docker images from Docker Hub on the Researcher Workbench. The Google Artifact Registry remote repository feature lets you pull images from Docker Hub, so you can quickly customize the software and tools you use for your analyses.
Introduction
Docker images are packages of customized software, libraries, and environment variables that you can load and run on your machine to set up your compute environment and dependencies. Using Docker images can save time and effort by eliminating the need to recreate common software environments from scratch.
One common use case for Docker images is to set up software and a compute environment that is necessary for your analysis but that is not available in the base environment on the Workbench.
There are multiple repositories where you can find Docker images. In this article, we are focusing on publicly available images on Docker Hub. Docker Hub is a commonly used repository for Docker images, where you can access or store images. To learn more about creating your own Docker image, see the Docker documentation.
Another common use case for Docker images on the Researcher Workbench is batch processing with tools such as Cromwell and Nextflow. You can learn more about how to use Docker images in your batch processing analyses in these resources:
- WDL resources on the Terra Support site
- WDL resources on the Dockstore site
- dsub resources on the User Support Hub
Using the All of Us artifact registry repository to pull Docker images
| To keep tools updated with the latest improvements from GCP, we migrated our workflow tools from Google Life Sciences (GLS) to Google Batch. With Google Batch, you can reference Docker Hub images directly. If you want to use GAR to save images, please review all requirements for this process in the Workbench. |
To use a Docker image from Docker Hub, you can use Google Batch or the Google Artifact Registry (GAR) remote repository feature to pull the image into the Researcher Workbench.
To use GAR, you will use the environment variable ARTIFACT_REGISTRY_DOCKER_REPO. This variable corresponds to us-central1-docker.pkg.dev/all-of-us-rw-prod/aou-rw-gar-remote-repo-docker-prod. When you set up the Docker image, you refer to its location by appending the Docker Hub image location to the value of this variable.
For example, if you want to use the latest ubuntu image from Docker Hub, the base location is ubuntu:latest, and you append it to the value of the artifact registry variable. How you join the two depends on the analysis tool you are using. Generally, you append the Docker Hub image location after a forward slash. In Python, that is os.environ["ARTIFACT_REGISTRY_DOCKER_REPO"] + "/ubuntu:latest".
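The joining rule described above can be sketched in Python as follows. This is only an illustration: the helper name gar_image is hypothetical, and the fallback value is the documented repository path, included as an assumption so the sketch also runs outside the Workbench, where the environment variable may not be set.

```python
import os

# Documented value of ARTIFACT_REGISTRY_DOCKER_REPO. Used here only as a
# fallback (an assumption) so the sketch also runs outside the Workbench.
DEFAULT_REPO = (
    "us-central1-docker.pkg.dev/all-of-us-rw-prod/"
    "aou-rw-gar-remote-repo-docker-prod"
)


def gar_image(docker_hub_image, repo=None):
    """Return the GAR remote repository path for a Docker Hub image.

    Hypothetical helper, not part of any Workbench library.
    """
    if repo is None:
        repo = os.environ.get("ARTIFACT_REGISTRY_DOCKER_REPO", DEFAULT_REPO)
    # Join the two parts with exactly one forward slash.
    return repo.rstrip("/") + "/" + docker_hub_image.lstrip("/")


print(gar_image("ubuntu:latest"))
print(gar_image("zlskidmore/hla-la:latest"))
```

Stripping stray slashes before joining avoids a doubled or missing separator if either part is typed by hand.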
If you are interested in using a private Docker Hub image in the Researcher Workbench, please contact support@researchallofus.org. Private Docker image support is very limited in the Researcher Workbench and requires specific access to the custom Docker repository.
The following examples demonstrate how you could use GAR in different analysis tools. For examples of using Google Batch, please see the Cromwell, Nextflow, and dsub Featured Workspaces.
Setting up a Docker image within a WDL
This example sets up a Docker image within a Workflow Description Language (WDL) file. We build the value of the docker runtime attribute from the ARTIFACT_REGISTRY_DOCKER_REPO environment variable and append /ubuntu:latest.
This WDL script can be used in batch analyses using tools like Cromwell.
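import os

wdl_filename = "hello.wdl"

# The docker runtime attribute is built from the environment variable,
# followed by a forward slash and the Docker Hub image location.
WDL_content = """
task hello {
  String addressee
  command {
    echo "Hello ${addressee}!"
  }
  output {
    String salutation = read_string(stdout())
  }
  runtime {
    docker: '""" + os.environ["ARTIFACT_REGISTRY_DOCKER_REPO"] + """/ubuntu:latest'
  }
}

workflow wf_hello {
  call hello
  output {
    hello.salutation
  }
}
"""

# Write the WDL to disk; the file is closed automatically.
with open(wdl_filename, 'w') as fp:
    fp.write(WDL_content)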
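The hello task above declares an input, addressee, that the workflow does not set, so it has to be supplied when the workflow runs. A minimal sketch of writing a matching inputs file is shown below. It assumes Cromwell's convention of naming such inputs as workflow.task.input. The file name hello.inputs.json and the value "World" are hypothetical.

```python
import json

# Cromwell addresses a task input as <workflow>.<task>.<input>.
# The value "World" is only an illustration.
inputs = {"wf_hello.hello.addressee": "World"}

inputs_filename = "hello.inputs.json"  # hypothetical file name
with open(inputs_filename, "w") as fp:
    json.dump(inputs, fp, indent=2)

# Read the file back to confirm what was written.
with open(inputs_filename) as fp:
    loaded = json.load(fp)

print(loaded)
```

You would then pass this file to your Cromwell run together with hello.wdl, in whatever way your batch setup expects inputs.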
Setting up a Docker image using Nextflow
When running a Nextflow batch analysis, you can set up the Docker image within the Nextflow run command.
Here is an example of the command in a Python Jupyter notebook. We are using the hla-la:latest Docker image posted by zlskidmore.
!nextflow run test.nf -c ~/.nextflow/config -profile gcb -process.container="${ARTIFACT_REGISTRY_DOCKER_REPO}/zlskidmore/hla-la:latest"
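If you prefer to assemble the same command in Python rather than with the notebook shell escape, one possible sketch is shown below. The helper nextflow_command is hypothetical, and the script name, config path, and profile simply mirror the command above. The fallback repository value is the documented path, included as an assumption so the sketch runs where the environment variable is not set. The command is only printed here, not executed.

```python
import os
import shlex

# Documented value of ARTIFACT_REGISTRY_DOCKER_REPO, used only as a fallback.
DEFAULT_REPO = (
    "us-central1-docker.pkg.dev/all-of-us-rw-prod/"
    "aou-rw-gar-remote-repo-docker-prod"
)


def nextflow_command(repo, image, script="test.nf", profile="gcb"):
    """Build the nextflow argument list (hypothetical helper)."""
    # "~" is not expanded without a shell, so expand it explicitly.
    config = os.path.expanduser("~/.nextflow/config")
    return [
        "nextflow", "run", script,
        "-c", config,
        "-profile", profile,
        f"-process.container={repo}/{image}",
    ]


repo = os.environ.get("ARTIFACT_REGISTRY_DOCKER_REPO", DEFAULT_REPO)
cmd = nextflow_command(repo, "zlskidmore/hla-la:latest")

# Print the command for inspection. To run it, you could pass the list
# to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Building the command as a list keeps the image path as a single argument, so no shell quoting is needed.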