The All of Us genomic and multi-omic data include short-read whole genome sequencing (srWGS) data, long-read whole genome sequencing (lrWGS) data, microarray genotyping (“array”) data, RNA sequencing (“RNA-seq”) data, and proteomics data. Researchers access these data through the Controlled Tier dataset on the Researcher Workbench (RW); genomic data are not available in the Registered Tier. The bucket locations used to access these data from analysis notebooks are listed in the Data Dictionary.
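Assuming the Data Dictionary gives bucket locations as `gs://bucket/path` URIs, a notebook usually needs to split such a path into a bucket name and an object prefix before listing or reading files. The Python sketch below is a minimal illustration, not an official All of Us utility. The function names `split_gcs_uri` and `list_objects` are hypothetical. Listing also assumes that the `google-cloud-storage` client library is installed and authenticated, as it would be in a cloud notebook. Passing a billing project through `user_project` is only needed if a bucket is requester-pays, which is likewise an assumption to verify against the Data Dictionary.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlparse


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split a gs:// URI into (bucket_name, object_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"expected a gs://bucket/path URI, got {uri!r}")
    # urlparse keeps the leading "/" on the path; GCS object names do not start with one.
    return parsed.netloc, parsed.path.lstrip("/")


def list_objects(uri: str, user_project: Optional[str] = None,
                 max_results: int = 20) -> List[str]:
    """List up to max_results object names under a gs:// prefix.

    Hypothetical helper: requires google-cloud-storage and valid credentials.
    user_project is only needed if the bucket is requester-pays (an assumption).
    """
    # Imported here so split_gcs_uri works even where the library is not installed.
    from google.cloud import storage

    bucket_name, prefix = split_gcs_uri(uri)
    client = storage.Client()
    bucket = client.bucket(bucket_name, user_project=user_project)
    blobs = bucket.list_blobs(prefix=prefix, max_results=max_results)
    return [blob.name for blob in blobs]
```

In a notebook you might call `list_objects("gs://<bucket>/<path>/", user_project=os.environ.get("GOOGLE_PROJECT"))`, substituting a real path copied from the Data Dictionary. The `GOOGLE_PROJECT` environment variable is an assumption about the notebook environment. Keeping the URI parsing separate from the network call makes the parsing easy to check without cloud access.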
A summary of the file formats for each data type can be viewed in the list below.
In this article, we summarize the genomic data formats and the information available for each data type. In some cases, we refer to other documentation that describes a format we deliver. This article assumes general knowledge of genomics and bioinformatics. To get started with genomic data on the Researcher Workbench, see the “How to Work with All of Us Genomic Data” notebook in the All of Us Controlled Tier Featured Workspace. With each release, we also publish a detailed report on the quality of the genomic data, the All of Us Genomic Data Quality Report, available on the User Support Hub.