Data Types and Organization


The All of Us Research Program collects different types of data from participants across the United States and its territories, including data collected by the program (e.g., demographics, surveys, and physical measurements), data from electronic health records (EHRs), data from wearable devices, and data from biospecimens (e.g., genomics). Read the All of Us Research Program Protocol for additional information.

All the data collected by the All of Us Research Program undergo the data curation and transformation process defined by the program.

Data Organization

All of Us data are organized into tables according to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) version 5.3 when possible, and into custom formats otherwise. Read more about the OMOP CDM relational database.

Participant-provided information from surveys, physical measurements, and electronic health record (EHR) data are arranged into tables according to the OMOP CDM convention.

Self-reported demographic data from The Basics survey populate the person table. These same demographic data, as well as other data obtained from surveys, are found in the observation table. See the survey codebooks for additional information about All of Us surveys.

Program physical measurements, as well as EHR measurements, populate the measurement table. EHR data related to visits, procedures, drugs, and conditions are arranged into their respective tables. All tables relate to the person table, and the tables containing procedure, drug, condition, and measurement data also relate to the visit table.
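The table relationships described above can be illustrated with a minimal sketch. It assumes the standard OMOP CDM v5.3 table and column names (person, visit_occurrence, measurement). The rows and concept IDs are invented toy values, not All of Us data, and an in-memory SQLite database stands in for the actual Researcher Workbench environment.

```python
import sqlite3

# In-memory database standing in for the curated data repository (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
CREATE TABLE visit_occurrence (
    visit_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    visit_start_date TEXT
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    visit_occurrence_id INTEGER REFERENCES visit_occurrence(visit_occurrence_id),
    measurement_concept_id INTEGER,
    value_as_number REAL
);
-- Toy rows; all IDs and values are invented for illustration.
INSERT INTO person VALUES (1, 1980), (2, 1975);
INSERT INTO visit_occurrence VALUES (10, 1, '2021-03-01'), (11, 2, '2021-04-15');
INSERT INTO measurement VALUES
    (100, 1, 10, 9001, 172.0),
    (101, 2, 11, 9001, 165.5),
    (102, 2, NULL, 9001, 166.0);
""")

# Every measurement row joins to person. The visit link can be empty
# (measurement 102 above), so a LEFT JOIN keeps rows without a visit.
rows = conn.execute("""
    SELECT m.measurement_id, p.year_of_birth, v.visit_start_date, m.value_as_number
    FROM measurement AS m
    JOIN person AS p ON p.person_id = m.person_id
    LEFT JOIN visit_occurrence AS v
           ON v.visit_occurrence_id = m.visit_occurrence_id
    ORDER BY m.measurement_id
""").fetchall()

for row in rows:
    print(row)
```

Using a LEFT JOIN for the visit table is a design choice in this sketch: an inner join would silently drop any measurement that has no associated visit.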

Demographics

Demographic data include age, gender, race, ethnicity, self-reported population, and deceased status. Demographic data are self-reported by participants via surveys (available in the Registered Tier and Controlled Tier) and are also drawn from electronic health records (available in the Controlled Tier).

Surveys

Survey data include responses from surveys completed by All of Us Research Program participants. When joining the program, participants complete three core surveys (The Basics, Lifestyle, and Overall Health). Additional surveys are available following the completion of the core surveys, including Personal and Family Health History, Health Care Access and Utilization, and Social Determinants of Health.

All of the survey questions and potential answers are added to the All of Us-specific “PPI” vocabulary and assigned a concept_id (source concept ID). When possible, the PPI concepts are then mapped to standard vocabularies such as Logical Observation Identifiers Names and Codes (LOINC), International Classification of Diseases (ICD), or Systematized Nomenclature of Medicine (SNOMED), and assigned the associated “standard” concept_id. When mapping is not possible, the PPI concept_id serves as both the source and standard concept_id.
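The source-to-standard fallback described above can be sketched as a simple lookup. This is an illustration only: the concept IDs and the dictionary are hypothetical, and real mappings are held in the OMOP vocabulary tables rather than in code.

```python
# Hypothetical PPI source concept IDs mapped to hypothetical standard
# concept IDs. These numbers are invented for illustration.
PPI_TO_STANDARD = {
    1001: 5001,  # e.g., a survey question with a LOINC equivalent
    1002: 5002,  # e.g., a survey answer with a SNOMED equivalent
}


def resolve_standard_concept_id(source_concept_id: int) -> int:
    """Return the mapped standard concept_id, or the PPI concept_id itself
    when no mapping exists (the PPI concept then serves as both)."""
    return PPI_TO_STANDARD.get(source_concept_id, source_concept_id)


mapped = resolve_standard_concept_id(1001)    # has a standard mapping
unmapped = resolve_standard_concept_id(1003)  # no mapping, falls back to source
print(mapped, unmapped)
```

In practice this means a query that filters only on standard concept IDs still returns unmapped survey items, because their standard and source IDs are identical.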

For more information on how to use survey data in your research, see our Introduction to Survey Collection and Data Transformation Methods article and the How to Work with All of Us Survey Data notebooks in the Researcher Workbench tutorial workspaces.

Physical measurements

Physical measurement data include blood pressure, heart rate, height, weight, body mass index (BMI), waist and hip circumference, pregnancy status, and wheelchair use. Physical measurements are available from program enrollment visits, the EHR, and self-reported surveys.

Physical measurement concepts are stored in the OMOP measurement table. Program-collected measurement concepts are assigned an All of Us-specific PPI vocabulary concept_id (source concept ID) and then mapped to the standard LOINC vocabulary and the associated “standard” concept_id. We recommend using the program-collected measurement data when possible. For more information on how to use physical measurement data in your research, see our Introduction to All of Us Physical Measurement Data Collection and Transformation Methods article and the How to Work with All of Us Physical Measurement Data Featured Workspace.

Electronic Health Records (EHRs)

Electronic health record (EHR) data include information about visits, procedures, drugs, and conditions from a participant’s health care provider. EHR data are collected when a participant agrees to share their EHR data with the All of Us Research Program. EHR data are transformed into standard vocabularies across 14+ structured tables. To learn more about EHR data, see our Introduction to All of Us Electronic Health Record (EHR) Collection and Data Transformation Methods article, and the Data Dictionary for Curated Data Repository (CDR) for a detailed description of all metadata and tables.

Wearables

Wearables data are currently available for Fitbit devices and include the type of Fitbit device, heart rate by zone summary, daily activity summary, daily sleep summary, and more. Wearables data are collected when a participant agrees to share their Fitbit data or participates in the WEAR Study. Wearable device data are not structured according to the OMOP CDM. Instead, they are stored in 7 All of Us-specific tables organized by domain and by minute-level or summary-level data: steps_intraday, heart_rate_summary, activity_summary, heart_rate_minute_level, sleep_level, sleep_daily_summary, and device. To learn more about Fitbit data, see Overview and resources to aid using Fitbit data available on All of Us Researcher Workbench.
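To illustrate the difference between minute-level and summary-level tables, the sketch below rolls toy minute-level step counts up to daily totals. The row layout (person_id, timestamp, steps) is an assumption made for this example and may not match the actual steps_intraday schema; consult the data dictionary for the real column names.

```python
from collections import defaultdict
from datetime import datetime

# Toy minute-level rows shaped like (person_id, timestamp, steps).
# The layout and values are invented for illustration.
steps_intraday = [
    (1, "2022-01-01 08:00:00", 40),
    (1, "2022-01-01 08:01:00", 55),
    (1, "2022-01-02 09:30:00", 20),
    (2, "2022-01-01 10:15:00", 70),
]


def daily_steps(rows):
    """Sum minute-level step counts per (person_id, calendar date)."""
    totals = defaultdict(int)
    for person_id, timestamp, steps in rows:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
        totals[(person_id, day.isoformat())] += steps
    return dict(totals)


totals = daily_steps(steps_intraday)
for key in sorted(totals):
    print(key, totals[key])
```

A roll-up like this is only a consistency check on minute-level data; it is not a description of how the program derives its own summary tables.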

Genomics

Genomic data are collected when participants agree to contribute biosamples to be used for a variety of bioassays. DNA extracted from samples is sent to the Genome Center for analysis, including whole genome sequencing and genome-wide genotyping. Genomic data are only available in the Controlled Tier. A summary of the genomic data formats and what information is available in each data type can be found in the support article How the All of Us Genomic data are organized.

Additional data types

Socioeconomic status

A selection of externally sourced socioeconomic status summary statistics, which are based on 3-digit zip codes from the U.S. Census American Community Survey, are available in the Controlled Tier.

Summary statistics include:

  • Proportion of population receiving assisted income benefits within the past 12 months
  • Proportion of population aged 25 years or older with educational attainment of at least high school or GED equivalent
  • Median household income in the past 12 months
  • Proportion of the population with no health insurance coverage
  • Proportion of population with income below the federal poverty level within the past 12 months
  • Proportion of houses that are vacant
  • Nationwide Community Deprivation Index

Note: Participant-level data for educational attainment, household income, and health insurance coverage are also available via The Basics survey.

COVID-19 research studies

In response to the coronavirus disease 2019 (COVID-19) pandemic, the All of Us Research Program launched several research initiatives including the All of Us SARS-CoV-2 Antibody Study.

The antibody study data are no longer available for new research and are only accessible in the All of Us Researcher Workbench via a past curated data repository (CDR) for study replication. For information about replicating the COVID-19 antibody study, see the “How to Reproduce the All of Us SARS-CoV-2 Antibody Study” featured workspace (@researchallofus.org login required).

Note: If you intend to replicate the study, keep in mind that (1) not all positive controls used in the study analyses were included in the datasets, due to data use restrictions set by the sample provider, and (2) race and ethnicity are combined categories within the paper.

Next articles

Data Curation Process

Understand how the data are collected and curated for the All of Us Researcher Workbench

Participant Privacy Protections

Explore how the All of Us Research Program protects participant privacy within the All of Us Researcher Workbench
