In June 2022, the All of Us Research Program released read alignment data in CRAM format for 98,590 whole genome sequencing (WGS) samples. The Integrative Genomics Viewer (IGV) is a high-performance, easy-to-use, interactive tool for the visual exploration of genomic data. It allows researchers to visualize all the common types of genomic data and metadata [1]. This article shows how to use IGV to browse CRAM files in the All of Us Researcher Workbench.
Using IGV in the All of Us Researcher Workbench takes two steps:
- Copy a CRAM locally
- Render IGV browser and load a track
Copy the CRAM locally
The CRAMs and their corresponding index files are stored in a Google Cloud Storage bucket at the path gs://fc-aou-datasets-controlled/pooled/wgs/cram/v6_base/*.cram. While IGV supports direct access to Google Cloud Storage, this feature cannot currently be used in the Workbench due to data exfiltration controls. The recommended approach is to first download the CRAM file(s) to your analysis VM. There is one CRAM file for each WGS sample, and both the CRAM file and its index file carry the research ID in the file name. See the CRAM files section in the support article “How the All of Us Genomic data are organized” for more information.
We will use the gsutil command to copy the CRAM file (.cram) and its index (.cram.crai) for the first sample into the current working directory for downstream analysis.
!gsutil -u $GOOGLE_PROJECT cp gs://fc-aou-datasets-controlled/pooled/wgs/cram/v6_base/wgs_1000004* .
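The same copy can be scripted from Python, which is convenient when several samples are needed. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the files are named wgs_<research ID>.cram and wgs_<research ID>.cram.crai, as the wildcard in the command above suggests.

```python
import os
import subprocess

# Bucket prefix taken from the path given in this article.
CRAM_PREFIX = "gs://fc-aou-datasets-controlled/pooled/wgs/cram/v6_base"


def build_cram_copy_command(research_id, project, dest="."):
    """Return the gsutil argument list that copies one CRAM and its index.

    Assumption: files are named wgs_<research_id>.cram(.crai).
    """
    base = f"{CRAM_PREFIX}/wgs_{research_id}"
    # Name both objects explicitly instead of relying on a shell wildcard.
    return ["gsutil", "-u", project, "cp", f"{base}.cram", f"{base}.cram.crai", dest]


def copy_cram(research_id, dest="."):
    """Run the copy; raises CalledProcessError if gsutil reports a failure."""
    project = os.environ["GOOGLE_PROJECT"]  # same variable as in the command above
    subprocess.run(build_cram_copy_command(research_id, project, dest), check=True)
```

Calling copy_cram("1000004") would then be equivalent in intent to the notebook command above, under the naming assumption stated.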
Note that CRAMs are typically 15-20 GB each, so a single file may take around 9 minutes to copy to the virtual machine (VM) and requires sufficient local disk space. Please make sure the VM disk is large enough to hold the CRAMs.
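A quick way to check the available disk space before copying is sketched below. The function names are hypothetical, and the 20 GiB default is simply the upper end of the typical size range mentioned above, not a guaranteed maximum.

```python
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def free_gib(path="."):
    """Free space, in GiB, on the disk that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room_for_crams(n_crams, path=".", gib_per_cram=20.0):
    """True if the disk can hold `n_crams` files of `gib_per_cram` GiB each.

    Assumption: 20 GiB per CRAM is a conservative estimate.
    """
    return free_gib(path) >= n_crams * gib_per_cram
```

For example, has_room_for_crams(3) reports whether roughly 60 GiB is free in the current working directory.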
Render IGV browser and load a track
After the required tools are installed and the needed files are copied, we can use the following code to import igv and load a track.
import igv
This will display the Viewer. You can select a chromosome, change region, zoom in and zoom out using your mouse in the Viewer to see features.
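As a minimal sketch of creating the Viewer and loading the copied CRAM as a track, the code below assumes that the igv package exposes an igv-jupyter style API (igv.Browser, show, load_track) and that the track is described with igv.js-style keys. The helper names, the hg38 genome setting, and the wgs_<research ID> file naming are assumptions for illustration; check them against the package version installed in your environment.

```python
import os


def cram_track_config(research_id, directory="."):
    """Build an igv.js-style track description for a locally copied CRAM.

    Assumption: files are named wgs_<research_id>.cram(.crai).
    """
    cram = os.path.join(directory, f"wgs_{research_id}.cram")
    return {
        "name": f"wgs_{research_id}",
        "url": cram,
        "indexURL": cram + ".crai",
        "format": "cram",
        "type": "alignment",
    }


def show_cram(research_id, directory=".", genome="hg38"):
    """Display a browser and load the CRAM track (assumed igv-jupyter style API)."""
    import igv  # imported here so the config helper works without igv installed

    b = igv.Browser({"genome": genome})  # assumption: reads are aligned to hg38
    b.show()
    b.load_track(cram_track_config(research_id, directory))
    return b
```

In a notebook it may be safer to call show() in one cell and load_track() in a later cell, so that the Viewer has finished rendering before the track is added.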
There are also functions to control the Viewer from code [2]. For example, to search by region:
b.search('chr1:30000-40000')
and to search by gene name:
b.search('myc')
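Region strings passed to b.search follow the form chromosome:start-end. When regions come from variables rather than typed text, a small helper can build and sanity-check the string first. The helper below is hypothetical and not part of IGV.

```python
def format_locus(chrom, start, end):
    """Return a 'chrom:start-end' string suitable for a region search."""
    if start < 1 or end < start:
        raise ValueError(f"invalid region: {chrom}:{start}-{end}")
    return f"{chrom}:{start}-{end}"
```

With a browser object b already created, b.search(format_locus('chr1', 30000, 40000)) would request the same region as the first search example.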
Other useful functions include b.zoom_in() and b.zoom_out(). For a more detailed example of how to use IGV in the All of Us Researcher Workbench to browse CRAM files, please see the tutorial workspace “CRAM_Processing CT”.
[1] Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013 Mar;14(2):178-92. doi: 10.1093/bib/bbs017. Epub 2012 Apr 19. PMID: 22517427; PMCID: PMC3603213.
[2] https://github.com/igvteam/igv-notebook