The Curated Data Repository (CDR) version 9 Data Characterization Report provides a high-level summary and details on how to contextualize, characterize, and appropriately leverage CDRv9.
Data from 747,028 participants, including 31,798 self-identified American Indian/Alaska Native (AI/AN) participants, are now available in the CDRv9 Controlled Tier. This is a 17.91% increase compared to the CDRv8 Controlled Tier release in February 2025 (N = 633,547).
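As a quick check on the growth figure, the percentage can be recomputed from the two participant counts quoted above. This is a minimal sketch: the function and variable names are illustrative assumptions and are not part of any All of Us tooling.

```python
def percent_increase(previous: int, current: int) -> float:
    """Return the relative growth from `previous` to `current`, in percent."""
    if previous <= 0:
        raise ValueError("previous must be a positive count")
    return (current - previous) / previous * 100


# Participant counts quoted in the text above (Controlled Tier).
cdr_v8_participants = 633_547  # CDRv8, February 2025 release
cdr_v9_participants = 747_028  # CDRv9

growth = percent_increase(cdr_v8_participants, cdr_v9_participants)
print(f"{growth:.2f}%")  # prints 17.91%
```

The same helper can be applied to any pair of per-data-type counts from the two releases to reproduce the approximate increases listed below, provided the underlying counts are taken from the full report.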
Additional data are now available, including:
- ~16% increase in the number of participants with Fitbit data
- ~17% increase in the number of participants with survey data
- ~18% increase in the number of participants with physical measurements data
- ~22% increase in the number of participants with electronic health record (EHR) data
- ~23% increase in the number of participants with genomics data available
- >97,700 participants with data from any of the four Exploring the Minds (EtM) tasks
New data types are available in CDRv9, including:
- >9,900 participants with proteomics data
- >8,900 participants with RNA-Seq data
- >99,000 participants with clinical notes data
When using the All of Us dataset, researchers should consider data completeness, data sources, and generalizability. For more information, refer to the current Data Dictionary, the full CDRv9 Data Characterization Report, and the data curation process.
This report was written by the All of Us Data and Research Center and the National Institutes of Health.