What reference are the variants called against for the genomic data?

The array and short read whole genome sequencing (srWGS) variants are called against the hg38/GRCh38 reference. Below are locations of the public reference and auxiliary files:


The long read whole genome sequencing (lrWGS) variants are called against two references: grch38_noalt, which is the GRCh38 reference with no alternate sequences, and T2Tv2.0, which is the T2T-CHM13v2.0 reference with the EBV contig added from the grch38_noalt reference.

Reference version by data type:

  • Array: hg38/GRCh38 reference. Note: array variants are originally called against the hg19 reference and are lifted over to hg38/GRCh38 before release on the Researcher Workbench.
  • srWGS SNP & Indel: hg38/GRCh38 reference
  • srWGS SVs: hg38/GRCh38 reference
  • lrWGS: grch38_noalt and T2Tv2.0 references
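One informal way to check which reference a given VCF was called against is to look at the chr1 length declared in its header, since that length differs between assemblies. The following is a minimal sketch, not an official All of Us tool: the function name `infer_reference` and the example header are hypothetical, and it assumes the file carries standard `##contig` header lines with a `length` field.

```python
import re

# Published chr1 lengths (in base pairs) for each assembly.
CHR1_LENGTHS = {
    248956422: "hg38/GRCh38",
    249250621: "hg19/GRCh37",
    248387328: "T2T-CHM13v2.0",
}

# Matches VCF header lines such as ##contig=<ID=chr1,length=248956422>
CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def infer_reference(header_lines):
    """Return the assembly implied by the chr1 contig length, or None."""
    for line in header_lines:
        match = CONTIG_RE.match(line)
        if not match:
            continue
        # Split the key=value pairs inside the angle brackets.
        fields = dict(
            item.split("=", 1)
            for item in match.group(1).split(",")
            if "=" in item
        )
        if fields.get("ID") in ("chr1", "1") and "length" in fields:
            return CHR1_LENGTHS.get(int(fields["length"]))
    return None


# Hypothetical header lines for illustration only.
example_header = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=248956422>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
print(infer_reference(example_header))
```

A result of None means the header has no chr1 contig line or its length matches none of the three assemblies listed, in which case the reference should be confirmed from the file's documentation instead.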




For more detailed information about the genomic data, we recommend the following articles:

  1. How the All of Us Genomic data are organized
  2. All of Us Genomic Quality Report


