The attached document details the quality control (QC) steps that the All of Us Genome Centers (GC) and Data and Research Center (DRC) apply to the genomic data in the research pipeline. This pipeline removes or flags samples and variants that fail quality thresholds, and we apply it before releasing the genomic data for research use. We, the DRC, describe only the QC processes that are performed analytically (i.e., after the sample has been genotyped and sequenced). All descriptions and results are limited to the Beta Release (“Beta”), made available in the Researcher Workbench on March 15, 2022, which contains 165,208 array samples and 98,622 whole genome sequencing (WGS) samples. The samples in the genomic data correspond to the All of Us Curated Data Repository (CDR) release C2021Q3R6. These pipelines are automated unless otherwise noted. This document covers all genomic data types available to researchers at this time, including small variants (SNPs and indels) for arrays and short-read WGS.
All of Us Beta Release Genomic Quality Report (ARCHIVED C2021Q3R6 CDR CT Dataset v5)