CDRv8 Data Characterization Report

Download the CDRv8 Data Characterization Report for the full report.

The Curated Data Repository (CDR) version 8 Data Characterization Report provides a high-level summary and details on how to contextualize, characterize, and appropriately leverage CDRv8.

Data from 633,547 participants are now available in CDRv8, increasing the number of participants with data available by 53% compared to CDRv7.

Compared to CDRv7, more participants now have data available across existing data types, including:

  • ~752% increase in the number of participants with structural variant data
  • ~278% increase in the number of participants with Fitbit data
  • ~170% increase in the number of participants with long-read whole genome sequences
  • ~69% increase in the number of participants with short-read whole genome sequences
  • ~53% increase in the number of participants with survey data
  • ~50% increase in the number of participants with physical measurements data
  • ~42% increase in the number of participants with genotyping array data
  • ~37% increase in the number of participants with electronic health record (EHR) data
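The percentages above are increases relative to CDRv7, so a value above 100% means the count more than doubled. As a rough illustration of how to read them, the sketch below converts a percent increase into a multiplicative factor and back-calculates an approximate CDRv7 count. The function names are hypothetical, and because the published percentages are rounded, any back-calculated count is only an estimate, not a figure from the report.

```python
def growth_factor(percent_increase: float) -> float:
    """Convert a percent increase into a multiplicative factor.

    A 100% increase doubles the count, so the factor is 1 + p/100.
    """
    return 1 + percent_increase / 100


def implied_previous_count(current_count: int, percent_increase: float) -> float:
    """Estimate the earlier count from the current count and a percent increase.

    Assumes the percentage is exact; the published values are rounded,
    so treat the result as approximate.
    """
    return current_count / growth_factor(percent_increase)


# A ~752% increase means roughly 8.5 times as many participants, not 7.5 times.
structural_variant_factor = growth_factor(752)

# 633,547 participants after a ~53% increase implies roughly 414,000 in CDRv7.
estimated_cdrv7_total = implied_previous_count(633_547, 53)

print(f"Structural variant factor: {structural_variant_factor:.2f}x")
print(f"Estimated CDRv7 participants: {estimated_cdrv7_total:,.0f}")
```

For exact participant counts per data type, use the full CDRv8 Data Characterization Report rather than values derived from these rounded percentages.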

New data are available in CDRv8, including:

  • >63,000 Life Functioning survey responses
  • >40,000 Fitbit records from the WEAR Study
  • Data from participants who self-identify as American Indian and/or Alaska Native

When using the All of Us dataset, researchers should consider data completeness, data sources, and generalizability. For more information, refer to the current Data Dictionary, the full CDRv8 Data Characterization Report, and the data curation process.

This report was written by Hiral Master, Aymone Kouame, Hunter Hollis, Kayla Marginean, and Justin Cook on behalf of the All of Us Data and Research Center and the National Institutes of Health.
