Download the CDRv8 Data Characterization Report for the full report.
The Curated Data Repository (CDR) version 8 Data Characterization Report gives a high-level summary of CDRv8 and explains how to contextualize, characterize, and appropriately use the data.
Data from 633,547 participants are now available in CDRv8, increasing the number of participants with data available by 53% compared to CDRv7.
Additional data are now available, including:
- ~752% increase in the number of participants with structural variant data
- ~278% increase in the number of participants with Fitbit data
- ~170% increase in the number of participants with long-read whole genome sequences
- ~69% increase in the number of participants with short-read whole genome sequences
- ~53% increase in the number of participants with survey data
- ~50% increase in the number of participants with physical measurements data
- ~42% increase in the number of participants with genotyping array data
- ~37% increase in the number of participants with electronic health record (EHR) data
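When reading the percentages above, note that an increase of p% means the new count is (1 + p/100) times the earlier count, so a ~752% increase is roughly an 8.5-fold change, not a 7.5-fold one. The sketch below illustrates that arithmetic; the function names are hypothetical, and because the published percentages are rounded, any baseline backed out this way is only an approximation, not an official CDRv7 count.

```python
def growth_factor(percent_increase: float) -> float:
    """Convert a percent increase into a multiplier on the earlier count."""
    return 1 + percent_increase / 100


def implied_baseline(current_count: int, percent_increase: float) -> float:
    """Estimate the earlier count from the current count and a (rounded) percent increase."""
    return current_count / growth_factor(percent_increase)


# Percent increases quoted in the list above (approximate values from the report summary).
for label, pct in [("structural variant", 752), ("Fitbit", 278), ("EHR", 37)]:
    print(f"{label}: ~{growth_factor(pct):.2f}x the CDRv7 participant count")

# Rough CDRv7 total implied by 633,547 participants and a ~53% increase.
print(f"Implied CDRv7 participants: about {implied_baseline(633_547, 53):,.0f}")
```

For exact counts, rely on the full CDRv8 Data Characterization Report rather than on values derived from rounded percentages.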
New data are available in CDRv8, including:
- >63,000 Life Functioning survey responses
- >40,000 Fitbit records from the WEAR Study
- Data from participants who self-identify as American Indian and/or Alaska Native
When using the All of Us dataset, researchers should consider data completeness, sources, and generalizations. For more information, refer to the current Data Dictionary, the full CDRv8 Data Characterization Report, and the data curation process.
This report was written by Hiral Master, Aymone Kouame, Hunter Hollis, Kayla Marginean, and Justin Cook on behalf of the All of Us Data and Research Center and the National Institutes of Health.