All of Us workspaces have two dedicated storage locations: the Workspace Bucket and your storage disk. Your storage disk is either a Persistent Disk (for standard environments) or a Standard Disk (for Dataproc environments). A Standard Disk is created and deleted with your cloud environment, while a Persistent Disk can be kept even after your compute environment is deleted.
A Persistent Disk (PD) is reliable, high-performance block storage for virtual machine instances. Like a USB drive, a persistent disk can be detached from a virtual machine when that machine is deleted and re-attached to a new one, so your files are kept from one environment to the next. The PD lets you keep the packages your notebook code depends on, the input files your analysis needs, and the outputs you have generated, without having to move anything to the workspace bucket for permanent storage.
You can learn more about persistent disks here: https://cloud.google.com/compute/docs/disks#pdspecs
How to edit the Persistent Disk on the Cloud Environment
When starting a standard environment (standard VM), you are required to attach a PD. This will be either a new disk or the one previously used in the workspace, if it was not deleted. To customize your PD, navigate to a workspace and click the Jupyter icon in the right navigation bar. Scroll to the “Storage Disk options” section to configure your reattachable persistent disk.
You can select either a standard persistent disk or a solid-state drive (SSD) persistent disk. You can learn more about the disk types here: https://cloud.google.com/compute/docs/disks#disk-types
Managing your Persistent Disk
Your persistent disk is mounted at the Jupyter server's $HOME directory (/home/jupyter). Persistent disks, like your cloud analysis environment, are specific to each user. Data stored on your persistent disk cannot be shared, unlike the workspace bucket, which is shared among workspace users.
Note: if you install Python and R packages, they will live under `$HOME/packages/` (/home/jupyter/packages).
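To see how much of the disk mounted at /home/jupyter is in use, a short check from a notebook cell can help. The following is a minimal sketch using only the Python standard library. The function name `disk_usage_gib` is hypothetical, and the fallback to the current working directory is an assumption added so the snippet also runs outside the Workbench.

```python
import shutil
from pathlib import Path


def disk_usage_gib(path: str = "/home/jupyter") -> dict:
    """Report total, used, and free space (in GiB) for the filesystem holding `path`."""
    target = Path(path)
    if not target.exists():
        # Assumption: outside the Workbench /home/jupyter may not exist,
        # so fall back to the current working directory.
        target = Path.cwd()
    usage = shutil.disk_usage(target)
    gib = 1024 ** 3
    return {
        "path": str(target),
        "total_gib": round(usage.total / gib, 2),
        "used_gib": round(usage.used / gib, 2),
        "free_gib": round(usage.free / gib, 2),
    }


print(disk_usage_gib())
```

If free space is running low, copying large outputs to the workspace bucket and removing them from the disk is one way to make room.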
Once you have a persistent disk attached to an active environment, you can change the persistent disk type. However, some changes may require deleting and re-creating your persistent disk and cloud environment to take effect. Deleting the disk removes all files on it. If you want to save some files permanently, such as input data, analysis outputs, or installed packages, copy them to the workspace bucket first. Note: Jupyter notebooks are autosaved to the workspace bucket, and deleting your disk will not delete your notebooks.
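Copying files from the disk to the workspace bucket before a deletion could look something like the sketch below. It assumes that the bucket path (including the `gs://` prefix) is available in an environment variable named `WORKSPACE_BUCKET` and that the `gsutil` command-line tool is installed in the environment. The function names and the `pd_backup` folder are hypothetical and used only for illustration.

```python
import os
import subprocess


def build_backup_command(source: str, bucket: str, prefix: str = "pd_backup") -> list:
    """Build a gsutil command that recursively copies `source` into the bucket."""
    destination = f"{bucket.rstrip('/')}/{prefix}/"
    # -m runs the copy in parallel; cp -r copies directories recursively.
    return ["gsutil", "-m", "cp", "-r", source, destination]


def backup_to_bucket(source: str, prefix: str = "pd_backup", dry_run: bool = True) -> list:
    """Copy `source` to the workspace bucket, or only print the command when dry_run is True."""
    # Assumption: the bucket path is exposed as the WORKSPACE_BUCKET variable.
    bucket = os.environ.get("WORKSPACE_BUCKET")
    if not bucket:
        raise RuntimeError("WORKSPACE_BUCKET is not set in this environment")
    command = build_backup_command(source, bucket, prefix)
    if dry_run:
        print(" ".join(command))
    else:
        subprocess.run(command, check=True)
    return command
```

Calling `backup_to_bucket("/home/jupyter/results")` prints the command without running it. Passing `dry_run=False` performs the copy. The `results` directory here is only an example path.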
Delete your environment with a Persistent Disk
If you have attached a persistent disk and want to delete your environment, you will have two options: keep the persistent disk for later, or delete it along with the environment.
If you keep your persistent disk for later and do not have an active environment, you can still delete the disk by navigating to the cloud analysis environment panel. At the bottom, you will see an option to ‘Delete Persistent Disk’.
If you decide to delete your persistent disk, all files on the disk will be deleted. If you want to permanently save some files from the disk before deleting it, you will first need to create a new cloud environment so that you can access the disk.
Check to see your active Persistent Disk(s)
Users can have one PD per workspace. To verify whether you have an active PD in a particular workspace, use either of these two methods:
- In the desired workspace, click on the “Jupyter Icon”. If you see the option to ‘DELETE PERSISTENT DISK’ at the bottom in blue, then you DO have an active PD being used. If you do not see this option, or it is grayed out, then you DO NOT have an active PD being used.
- In the top left corner of the workbench, click the down arrow next to your name and open the tab called ‘Cloud Environments’. This screen shows any active environments as well as any active PDs. Although you cannot tell which workspace each PD is associated with, you can see how many you have. Feel free to submit a Help Desk ticket with the details of the particular PD from that screen, and we can look up which workspace it is associated with.
Cost of a reattachable Persistent Disk
A cost per hour is associated with maintaining the disk, even when the cloud compute is paused or deleted. A standard persistent disk costs $0.04/GB/month. For current rates, see "Disks and images pricing" in the Compute Engine documentation.
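As a rough worked example of that rate, the monthly cost of a standard persistent disk is its size in GB multiplied by $0.04. The sketch below only illustrates the arithmetic. The function name is hypothetical, and actual charges may differ (for example, for SSD disks, which this article does not price).

```python
# Rate quoted above for a standard persistent disk, in USD per GB per month.
STANDARD_PD_USD_PER_GB_MONTH = 0.04


def monthly_disk_cost(size_gb: float, rate: float = STANDARD_PD_USD_PER_GB_MONTH) -> float:
    """Estimate the monthly cost in USD of keeping a disk of `size_gb` GB."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return round(size_gb * rate, 2)


print(monthly_disk_cost(100))  # 4.0
```

By this estimate, a 100 GB standard disk costs about $4 per month for as long as it exists, whether or not an environment is running.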
Reattachable Persistent Disk Limitations
If you want to use a Dataproc cluster instead of a Standard VM, you can only use a Standard Disk as your storage option. Reattachable persistent disks are not supported with Dataproc.