This article explains how to push an existing Docker image to the Google Container Registry (GCR) so it can be used on the Researcher Workbench. Docker images are necessary components of many analysis methodologies, including Cromwell WDLs and dsub jobs. They let you customize the software and compute environment so that tasks or analyses run as efficiently as possible with all the software dependencies they need. This article focuses solely on pushing images from Docker Hub to GCR so they can be used on the Workbench; it does not cover building images or using them on the Workbench.
For more information on building custom Docker images, consult the Docker docs.
For more information on using Docker images on the Workbench, consult the documentation for WDLs and dsub, respectively:
- WDL Documentation:
- Dsub Documentation:
An example use-case
Suppose you are a registered researcher who wants to use the Workbench for an analysis. After reading the support documentation and cross-checking the documentation of the software packages your analysis requires, you realize that one required package is not compatible with the Workbench.
It could be that the required package can only be installed with ‘sudo’ commands. Because sudo commands require root access, which Workbench users do not have, the package cannot be installed. While the All of Us Program is working to provide support for sudo commands in the future, limitations like this remain in effect today for security reasons.
Luckily, you are not deterred by this incompatibility! You know that somewhere on the internet—whether on FireCloud, Docker Hub, or the Google Container Registry (GCR)—there is a docker image that has a usable version of the software package required for your analysis. Is it possible to use that image and thus the package on the Workbench? Yes! (With some caveats, of course…)
Because the All of Us Researcher Workbench is built on Google Cloud Platform (GCP) architecture, the only images that can be used on the Workbench are those hosted on GCR. If the Docker image you need is hosted on Docker Hub or FireCloud, it must be pushed to GCR before you can use it on the Workbench. Additionally, the project or bucket that hosts the GCR image must be public; only public GCR images can be used on the Workbench. Finally, you cannot use the GCP project or bucket associated with a workspace to host a Docker image, because workspace buckets are restricted for security reasons and cannot host images. In practice, this means you cannot use your @researchallofus.org account for this process; you will need a personal or institutional GCP account that can create public projects to host GCR images.
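As a quick illustration of the hosting rule above, the following is a minimal sketch of a check that an image reference points at a GCR host. The function name is_gcr_image is hypothetical and not part of any Workbench or Google tool, and the list of hostnames (gcr.io plus its regional variants) is an assumption about which registries count as GCR. The check only looks at the path; it cannot tell you whether the image is public.

```shell
# Hypothetical helper (not a Workbench or gcloud command): succeeds when an
# image reference starts with a GCR hostname. It inspects the text only and
# does not contact the registry or check that the image is public.
is_gcr_image() {
  case "$1" in
    gcr.io/*|us.gcr.io/*|eu.gcr.io/*|asia.gcr.io/*) return 0 ;;
    *) return 1 ;;
  esac
}

for image in "zlskidmore/hla-la" "gcr.io/YOUR_PROJECT_ID/hla-la:latest"; do
  if is_gcr_image "$image"; then
    echo "$image: hosted on GCR"
  else
    echo "$image: not on GCR; push it there first"
  fi
done
```

An image that fails this check, such as the Docker Hub reference zlskidmore/hla-la, is a candidate for the push procedure described in this article.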
To summarize the caveats, this process has several requirements:
1. You will need to create a Google Cloud account that is separate from your @researchallofus.org account.
2. Using that Google Cloud account, you will need to create a new, public project to which you can push the image. If you only have a private project, you will need to create a new project that can be made public so that images pushed there can be accessed by environments on the Workbench. Consult the GCP documentation about this process or talk to your project’s administrators to ensure you have the correct permissions.
3. You will need to install the Google Cloud SDK to use the Google Cloud Command Line Interface (CLI) to run commands in a terminal session of your local machine.
4. You will need to install docker on your local machine.
5. You will need a Docker image on Docker Hub that you want to pull and then push to GCR. For this article, we will use the zlskidmore/hla-la image as an example.
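Before moving on, it can help to confirm that the two command-line tools from requirements 3 and 4 are actually available in your terminal. The snippet below is a minimal sketch, assuming both tools are expected on your PATH under the names gcloud and docker; the check_tool function is a hypothetical helper written for this example.

```shell
# Hypothetical helper: report whether a command-line tool is on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check the two tools this article relies on.
missing=0
for tool in gcloud docker; do
  check_tool "$tool" || missing=1
done

if [ "$missing" -eq 0 ]; then
  echo "Both tools are installed."
else
  echo "Install the missing tools before continuing."
fi
```

This only confirms that the tools are installed. It does not verify that you are authenticated or that your project is public.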
Once the above requirements are satisfied, here is how to transfer that Docker image from Docker Hub to GCR so you can access it in the Workbench:
1. Open a command line or terminal session on your local machine.
2. Authenticate the Google Cloud SDK: With the SDK installed on your local machine, authenticate your terminal session:
gcloud auth login
3. Configure Docker with Google Cloud SDK: To configure Docker to use gcloud as a credential helper, run:
gcloud auth configure-docker
4. Pull the Docker Image: With Docker Desktop fully installed and open on your local machine, pull the Docker image from Docker Hub:
docker pull zlskidmore/hla-la
5. Tag the Image for GCR: You’ll need to tag the Docker image with a registry name that includes your GCR path. The registry name is the Project ID of a public Google Cloud Project that you created on GCP, and the Project ID is accessible on the GCP ‘console’ page for your project. Click ‘console’ in the main GCP menu to reach this page, as seen in the screenshot here:
Remember: you cannot use the Project ID associated with any Workspace Google Projects on the All of Us Researcher Workbench for this operation; the Project must be created by you using a Google Cloud account that is distinct from your @researchallofus.org account. The path will typically look like gcr.io/[YOUR_PROJECT_ID]/[IMAGE_NAME]:[TAG]. Here is an example tagging command. Replace YOUR_PROJECT_ID with your actual GCP project ID and latest with the tag of the image you want to use:
docker tag zlskidmore/hla-la gcr.io/YOUR_PROJECT_ID/hla-la:latest
6. Push the Docker Image to GCR: Push the Docker image to Google Container Registry:
docker push gcr.io/YOUR_PROJECT_ID/hla-la:latest
7. Verify the Image in GCR: Visit the GCP Console's Container Registry section to verify that your image was successfully pushed. Remember to replace YOUR_PROJECT_ID with your actual GCP project ID in the commands above.
8. Once verified, you can use the image in operations on the Workbench by referencing the ‘gcr.io/YOUR_PROJECT_ID/hla-la:latest’ path.
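The steps above can also be collected into a single script. The following is a minimal sketch rather than an official tool. The variable names, the gcr_path and run helpers, and the dry-run switch are all assumptions made for this example. It also assumes SOURCE_IMAGE is given without a tag and that the same tag is used on Docker Hub and GCR. By default the script only prints the commands; set DRY_RUN=0 to execute them. The final command uses gcloud container images list-tags as a command-line alternative to checking the GCP Console in step 7.

```shell
# Assumed placeholder values; replace them with your own before a real run.
PROJECT_ID="${PROJECT_ID:-YOUR_PROJECT_ID}"
SOURCE_IMAGE="${SOURCE_IMAGE:-zlskidmore/hla-la}"   # Docker Hub image, no tag
TAG="${TAG:-latest}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Build the GCR path gcr.io/[YOUR_PROJECT_ID]/[IMAGE_NAME]:[TAG].
# $1 = project ID, $2 = source image (e.g. user/name), $3 = tag
gcr_path() {
  # ${2##*/} drops everything up to the last slash, leaving the image name.
  echo "gcr.io/$1/${2##*/}:$3"
}

# Print a command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

TARGET="$(gcr_path "$PROJECT_ID" "$SOURCE_IMAGE" "$TAG")"

run gcloud auth login                                  # step 2
run gcloud auth configure-docker                       # step 3
run docker pull "$SOURCE_IMAGE:$TAG"                   # step 4
run docker tag "$SOURCE_IMAGE:$TAG" "$TARGET"          # step 5
run docker push "$TARGET"                              # step 6
run gcloud container images list-tags "${TARGET%:*}"   # step 7, from the CLI
```

Running the script in its default dry-run mode is a safe way to review the exact commands and the resulting GCR path before anything is pulled or pushed.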