Overview
Within the All of Us Researcher Workbench, each workspace has a storage option called the “workspace bucket.” The workspace buckets available in the All of Us Researcher Workbench are part of Google Cloud Storage (GCS). Workspace buckets allow you to save files needed for analysis. The workspace bucket is automatically attached to your workspace, and shared with any collaborators in the workspace. To learn more about interacting with the workspace bucket, see Accessing Files in the Workspace Bucket or Persistent Disk.
RStudio and SAS Studio have added support for Cloud Storage FUSE (gcsfuse) in the Researcher Workbench. Gcsfuse simplifies file storage, improves collaboration, and helps you save on persistent disk (PD) costs. In your RStudio & SAS applications, there is a ‘shared’ folder that mounts a Google Cloud Storage (GCS) bucket on your RStudio or SAS virtual machine (VM).
gcsfuse in the Researcher Workbench
Gcsfuse is a tool that lets you mount GCS buckets as file systems on your virtual machine. You can then work with the contents of a bucket the same way you work with regular folders on your computer, which makes files easier to manage and share.
Within the Researcher Workbench, gcsfuse will allow you to:
- Simplify file storage: Access and manage your files in GCS buckets directly from your RStudio & SAS applications.
- Enhance collaboration: Easily share files with colleagues using a dedicated 'shared' folder.
- Reduce PD costs: Store non-analysis files in the 'shared' folder (which resides in GCS) to reduce PD storage and save costs.
How to access the 'shared' folder
Gcsfuse is enabled by default for RStudio and SAS in the Researcher Workbench. For both applications, a new ‘shared’ folder is available to save files via gcsfuse. The ‘shared’ folder is a subfolder in the existing workspace bucket. If using an environment variable, it can be noted as $WORKSPACE_BUCKET/shared.
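As a minimal sketch of how the bucket location of the 'shared' folder relates to the environment variable, the snippet below builds the path from $WORKSPACE_BUCKET. The function name shared_url and the fallback value gs://example-bucket are hypothetical and used only for illustration.

```shell
# Build the bucket URL of the 'shared' folder from the workspace bucket variable.
# gs://example-bucket is a hypothetical placeholder used when the variable is unset.
shared_url() {
  bucket="${WORKSPACE_BUCKET:-gs://example-bucket}"
  # Strip one trailing slash, if present, before appending /shared.
  printf '%s/shared\n' "${bucket%/}"
}

# Print the resulting location.
shared_url
```

If gsutil is available in your terminal, the printed location could be passed to a command such as gsutil ls to list the folder from the bucket side.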
For SAS and RStudio, all analysis files are automatically synced to the workspace bucket (.SAS, .R, .RMD) and populated under the ‘Analysis’ tab of the workspace. Any other files generated are saved to the persistent disk. Using the ‘shared’ folder, you can easily share files between applications & with collaborators via the workspace bucket.
File Storage Recommendations
We recommend storing the following files to the ‘shared’ folder:
- Any files you would like to easily open in another application (RStudio, SAS).
- Any files you would like to share with another collaborator (that are not synced to a persistent disk).
- Any files you would like to retain for long term storage in the workspace bucket instead of the persistent disk.
Move non-analysis files to the 'shared' folder: This includes datasets, reports, images, and any other files you want to persist or share.
Keep analysis files (.SAS, .R, .RMD) in their default locations: Moving these files to the 'shared' folder will remove them from the Analysis tab.
Note: if you move any files listed under the ‘Analysis’ tab of the workspace (.SAS, .R, .Rmd) to the ‘shared’ folder, they will no longer appear in the ‘Analysis’ tab user interface (UI). Only files in the ‘$WORKSPACE_BUCKET/notebooks/’ directory show up in the Analysis tab.
Note: files saved under the ‘shared’ folder will not copy to a duplicated workspace.
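The recommendation to keep analysis files in place and move everything else can be sketched as a simple extension check. This is an illustration only: the function name is_analysis_file and the sample file names are hypothetical, and the extension list is taken from the note above.

```shell
# is_analysis_file NAME
# Succeeds if NAME ends in .sas, .r, or .rmd (any letter case), the extensions
# this article lists as analysis files.
is_analysis_file() {
  lower_name="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$lower_name" in
    *.sas|*.r|*.rmd) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical file names, classified one by one.
for f in analysis.Rmd model.R results.csv plot.png; do
  if is_analysis_file "$f"; then
    echo "$f: keep in default location"
  else
    echo "$f: candidate for the shared folder"
  fi
done
```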
Using shared folder in RStudio
The 'shared' folder is available in the file explorer within RStudio. You can create, move, and manage files directly through the UI or using the terminal.
- PD location: /home/rstudio/
- gcsfuse folder: /home/rstudio/shared
View & move files in ‘shared’ folder
To view and move the contents of the ‘shared’ folder in RStudio, use the file browser or move files from the RStudio terminal.
From the UI, select a file, click ‘More’ and then ‘Move’, then select the ‘shared’ folder.
To move files using RStudio terminal:
- View ‘shared’ folder from terminal: ls /home/rstudio/shared
- Move file from PD to ‘shared’ folder: mv /home/rstudio/<filename> /home/rstudio/shared
- Move file from ‘shared’ folder to PD: mv /home/rstudio/shared/<filename> /home/rstudio
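The terminal steps above can be wrapped in a small helper that checks both the file and the mounted folder before moving anything. This is a sketch, not part of the Workbench: the function name move_to_shared is hypothetical, and the default path /home/rstudio/shared is assumed from the commands in this article.

```shell
# move_to_shared FILE [SHARED_DIR]
# Moves FILE into SHARED_DIR after checking that both exist.
# SHARED_DIR defaults to /home/rstudio/shared (assumed from this article).
move_to_shared() {
  mts_file="$1"
  mts_dir="${2:-/home/rstudio/shared}"
  if [ ! -d "$mts_dir" ]; then
    echo "shared folder not found: $mts_dir" >&2
    return 1
  fi
  if [ ! -f "$mts_file" ]; then
    echo "no such file: $mts_file" >&2
    return 1
  fi
  mv -- "$mts_file" "$mts_dir/"
}

# Example call (commented out so that nothing moves until you choose a file):
# move_to_shared /home/rstudio/<filename>
```

Checking for the folder first avoids renaming a file to "shared" by accident if the mount is missing.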
Using shared folder in SAS Studio
The 'shared' folder is accessible in the file explorer within SAS Studio. You can create, move, and manage files directly through the UI or using the terminal.
- PD location: /data/
- gcsfuse folder: /data/shared
View & move files in ‘shared’ folder
To view and move the contents of the ‘shared’ folder in SAS Studio, you can use the file browser in the Explorer tab or run commands from within a SAS program.
From the Explorer tab, right-click the file you want to move, select ‘move to’, and choose the ‘shared’ folder as the destination.
From within a SAS program, you can view, move and copy files from the shared folder.
View files in ‘shared’ folder:
/* Run ls through a pipe and print each line of its output to the SAS log. */
filename foo pipe "ls /data/shared";
data _null_;
infile foo;
input;
put _infile_;
run;
Move files from PD to ‘shared’ folder:
/* Run mv through a pipe; any output from the command is printed to the SAS log. */
filename foo pipe "mv /data/<filename> /data/shared";
data _null_;
infile foo;
input;
put _infile_;
run;
Move files from ‘shared’ folder to PD:
/* Run mv through a pipe; any output from the command is printed to the SAS log. */
filename foo pipe "mv /data/shared/<filename> /data";
data _null_;
infile foo;
input;
put _infile_;
run;
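The SAS examples above print whatever the shell command writes. A command such as mv writes nothing when it succeeds and reports failures on standard error, so a failed move may show no message if only standard output is read. The shell sketch below illustrates the difference. It assumes the pipe reads standard output only, which you should confirm in your environment, and the missing file path is a hypothetical placeholder.

```shell
# A path that should not exist (hypothetical placeholder built from the shell's PID).
missing="/tmp/no-such-file-$$"

# Capture standard output only: the error message from mv is discarded here.
stdout_only="$(mv "$missing" /tmp/ 2>/dev/null || true)"

# Merge standard error into standard output with 2>&1: the message is captured.
with_stderr="$(mv "$missing" /tmp/ 2>&1 || true)"

echo "stdout only: [$stdout_only]"
echo "with stderr: [$with_stderr]"
```

Under that assumption, appending 2>&1 to the command inside the quoted pipe string would surface error messages in the SAS log.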