All of Us Genomic Quality Report (Archived C2024Q3R9 - v8)

On February 3, 2025, the All of Us Research Program released genomic data for 447,278 array samples, 414,830 short-read whole-genome sequencing (srWGS) samples with SNP and indel calls, 97,061 srWGS samples with structural variant (SV) calls, and 2,800 long-read WGS (lrWGS) samples in the Researcher Workbench (RW), for use by researchers registered for Controlled Tier access. As described previously, these high-quality genetic data, together with comprehensive health data, will enable health research and help catalog the genetic variation that contributes to human health and disease. For a snapshot of the data, see Table 1.

Table 1. Snapshot of the genomic data

Dataset | Number of Participants | Number of Variants | Highlights
--- | --- | --- | ---
Array | 447,278 | More than 1.8 million | • We now have array data from participants who self-identify as American Indian or Alaska Native
Short-read WGS SNP and Indel | 414,830 | More than 1.2 billion | • We added ~150,000 new participants in CDRv8, which resulted in over 200 million new variants • We now have srWGS data from participants who self-identify as American Indian or Alaska Native
Short-read WGS structural variants (SVs) | 97,061 | Nearly 1.5 million |
Long-read WGS | 2,800 | 11 cohorts with SNPs, Indels, and structural variants | • Variants are called to both the grch38_noalt and T2Tv2.0 references

In addition to variant calls, raw data (IDAT files for array data, CRAM files for srWGS data, and BAM files for lrWGS data) and auxiliary files (variant annotations, pharmacogenomics, genetic ancestry categories, genetic ancestry admixture estimates, and relatedness/kinship scores) are available in the RW through Controlled Tier access. Quality control processes, performed both independently and across samples, indicate that these data are ready for general analysis. We suggest that researchers, at a minimum, read the 'Known Issues' and 'FAQ' sections of the All of Us Genomics QC report before using the data.
