Race and Ethnicity Data Collection and Transformation


Updated to reflect changes introduced with the release of CDR version 8.

All of Us collects self-reported race and ethnicity data from participants via the Basics survey, which is administered at enrollment. For more information about the Basics survey design, see here. The survey itself is available for viewing here.

Race and ethnicity data are captured via a single multi-select item, “Which categories describe you? Select all that apply. Note, you may select more than one group.” Response options include White; Black, African American or African; Asian; American Indian or Alaska Native;* Middle Eastern or North African; Native Hawaiian or other Pacific Islander; Hispanic, Latino, or Spanish; None of these fully describe me; and Prefer not to answer. A follow-up item is asked for each response option selected; those follow-up data are available for research only in the Controlled Tier.

*Participant records for individuals who selected this response option are available only from CDR version 8 onward and only in the Controlled Tier dataset.

Data Collection and Transformation Methods

As is the case with all survey items, source codes are deposited as concepts in the OMOP Common Data Model’s All of Us-specific “PPI” vocabulary and stored in the observation table. Note that, alongside these survey concepts, demographic concepts sourced from the EHR are also stored in the observation table. The corresponding EHR data are available only in the Controlled Tier.

While these survey concepts are stored in the observation table, they are also used to populate the OMOP Person table, which contains a unique record for each participant whose data is available in the CDR. The records are used to centralize the storage of demographic data about the participants. For information about the individual fields stored in the Person table, see the CDR data dictionaries. The Person table data are used to populate the demographic selection criteria available via the Researcher Workbench Cohort Builder tool but can also be directly extracted from the CDR itself.

Considerations for Data Organization within the OMOP Person Table

By default, the OMOP Person Table stores race and ethnicity data in separate columns. Beginning with CDR version 8, All of Us added a custom column to the table, labeled “Self-Reported Category,” which is meant to surface the data in a way that more closely aligns with how they are collected. To learn more, please see the CDR version 8 release notes. It is also important to note that, while the Basics survey item is multi-select (i.e., participants may select more than one response option), the Person Table stores only one row per participant. The data have therefore been generalized to fit the Person Table structure. Additionally, select response options are generalized in the Registered Tier to protect participant privacy. The table below highlights these considerations in more detail. Note: While CDR version 8 retains the OMOP race and ethnicity columns, future CDRs will deprecate these columns. You should use the “Self-Reported Category” column when possible.

All of Us Utilization of the OMOP Person Table for Organizing Race and Ethnicity Data

| PARTICIPANT SELECTION(S) | Self-Reported Category column | RACE column | ETHNICITY column |
|---|---|---|---|
| Only White selected | White | White | Non-Hispanic |
| Only Black, African American, or African selected | Black | Black | Non-Hispanic |
| Only Asian selected | Asian | Asian | Non-Hispanic |
| One but NOT multiple of the options selected: Middle Eastern or North African; Native Hawaiian or other Pacific Islander; American Indian or Alaska Native | Registered Tier (RT): Another single population; Controlled Tier (CT): As collected | RT: Another single population; CT: As collected | Non-Hispanic |
| Only Hispanic, Latino, or Spanish selected | Hispanic, Latino, or Spanish | No matching concept | Hispanic, Latino, or Spanish |
| Two or more of any of the following selected: White; Black, African American or African; Asian; Middle Eastern or North African; Native Hawaiian or other Pacific Islander; American Indian or Alaska Native | More than one population | More than one population | Non-Hispanic |
| White and Hispanic, Latino, or Spanish selected | More than one population | White | Hispanic, Latino, or Spanish |
| Black and Hispanic, Latino, or Spanish selected | More than one population | Black | Hispanic, Latino, or Spanish |
| Asian and Hispanic, Latino, or Spanish selected | More than one population | Asian | Hispanic, Latino, or Spanish |
| One but NOT multiple of the options: Middle Eastern or North African; Native Hawaiian or other Pacific Islander; American Indian or Alaska Native; AND Hispanic, Latino, or Spanish selected | Registered Tier (RT): More than one population; Controlled Tier (CT): As collected | RT: Another single population; CT: As collected | Hispanic, Latino, or Spanish |
| Two or more of any of the following selected: White; Black, African American, or African; Asian; Middle Eastern or North African; Native Hawaiian or other Pacific Islander; AND Hispanic, Latino, or Spanish selected | More than one population | More than one population | Hispanic, Latino, or Spanish |
| None of these fully describe me | None of these fully describe me | None of these fully describe me | None of these fully describe me |
| Prefer not to answer | PreferNotToAnswer | PreferNotToAnswer | PreferNotToAnswer |
| Skip | Skip | Skip | Skip |
| [Survey not completed by participant] | No matching concept | No matching concept | No matching concept |
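
For readers who find rules easier to follow as code, the Registered Tier rows of the table above can be sketched as a small Python function. This is an illustration only, not the pipeline All of Us uses. The function name, the constants, and the choice to represent "survey not completed" as an empty selection are all assumptions made for this example. Controlled Tier rows marked "As collected" are left out, combinations the table does not list raise an error, and all six race options are treated alike in the multi-selection rows.

```python
# Illustrative sketch of the Registered Tier (RT) rules in the table above.
# All names here are hypothetical; this is not the actual CDR pipeline.
HISPANIC = "Hispanic, Latino, or Spanish"
MULTI = "More than one population"
OTHER_SINGLE = "Another single population"
NO_MATCH = "No matching concept"

# Race options reported under their own (shortened) label in the RT.
NAMED = {
    "White": "White",
    "Black, African American, or African": "Black",
    "Asian": "Asian",
}
# Race options generalized to "Another single population" in the RT.
GENERALIZED = {
    "Middle Eastern or North African",
    "Native Hawaiian or other Pacific Islander",
    "American Indian or Alaska Native",
}
# Responses that fill all three columns with the same value.
SPECIAL = {
    "None of these fully describe me": "None of these fully describe me",
    "Prefer not to answer": "PreferNotToAnswer",
    "Skip": "Skip",
}


def registered_tier_columns(selections):
    """Return (self_reported_category, race, ethnicity) for one participant."""
    selected = set(selections)
    if not selected:
        # Assumption: an empty selection stands for "survey not completed".
        return (NO_MATCH, NO_MATCH, NO_MATCH)
    if SPECIAL.keys() & selected:
        if len(selected) > 1:
            raise ValueError("combination not covered by the table")
        label = SPECIAL[next(iter(selected))]
        return (label, label, label)
    if selected - NAMED.keys() - GENERALIZED - {HISPANIC}:
        raise ValueError("unknown response option")

    is_hispanic = HISPANIC in selected
    races = selected - {HISPANIC}
    ethnicity = HISPANIC if is_hispanic else "Non-Hispanic"

    # Race column: ignores the Hispanic selection entirely.
    if not races:
        race = NO_MATCH
    elif len(races) == 1:
        race = NAMED.get(next(iter(races)), OTHER_SINGLE)
    else:
        race = MULTI

    # Self-Reported Category column: Hispanic counts as one more selection.
    if is_hispanic and races:
        category = MULTI
    elif is_hispanic:
        category = HISPANIC
    else:
        category = race
    return (category, race, ethnicity)


print(registered_tier_columns({"White", HISPANIC}))
# ('More than one population', 'White', 'Hispanic, Latino, or Spanish')
```

The sketch makes one design point of the table explicit: the Race column never looks at the Hispanic, Latino, or Spanish selection, while the Self-Reported Category column treats it as one more selected population.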

Please note that participant counts will differ when compared across these columns because of how each column handles the selection of multiple response options. For example, if a participant selected both “White” and “Hispanic” for the Basics survey item, there would be a count of 1 for White in the Race column and a count of 1 for Hispanic in the Ethnicity column. In the Self-Reported Category column, this participant’s response is generalized to “More than one population” rather than being counted as either White or Hispanic separately.

The table below provides a visual example of how the above Person Table column rules are applied to individual participant records.

Application of All of Us OMOP Person Table Utilization for Individual Participant Records

| | Person_id (PID) | Self-Reported Population (PPI) | Race | Ethnicity |
|---|---|---|---|---|
| Example | 12345647 | PPI Population w/Ethnicity Information | Answer codes from 1586140 | Accepted ethnicity codes |
| Test Participant 1 | 39743209 | White | White | Non-Hispanic |
| Test Participant 2 | 87942415 | More than one population | More than one population | Hispanic |
| Test Participant 3 | 71451748 | Native Hawaiian or other Pacific Islander | Native Hawaiian or other Pacific Islander | Non-Hispanic |
| Test Participant 4 | 5461687 | Hispanic, Latino, or Spanish | No matching concept | Hispanic |
| Test Participant 5 | 8948463 | More than one population | White | Hispanic |
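
To make the count discrepancy described earlier concrete, the short Python sketch below tallies the five test participant records from the table above, column by column. The dictionary keys are illustrative labels chosen for this example and are not the actual CDR column names.

```python
from collections import Counter

# The five test participant records from the table above.
# Keys are illustrative labels, not actual CDR column names.
example_records = [
    {"category": "White", "race": "White", "ethnicity": "Non-Hispanic"},
    {"category": "More than one population",
     "race": "More than one population", "ethnicity": "Hispanic"},
    {"category": "Native Hawaiian or other Pacific Islander",
     "race": "Native Hawaiian or other Pacific Islander",
     "ethnicity": "Non-Hispanic"},
    {"category": "Hispanic, Latino, or Spanish",
     "race": "No matching concept", "ethnicity": "Hispanic"},
    {"category": "More than one population",
     "race": "White", "ethnicity": "Hispanic"},
]


def tally(records, column):
    """Count how many participants fall under each value of one column."""
    return Counter(record[column] for record in records)


category_counts = tally(example_records, "category")
race_counts = tally(example_records, "race")
ethnicity_counts = tally(example_records, "ethnicity")

# White is counted once by Self-Reported Category but twice by Race,
# because Test Participant 5 selected both White and Hispanic.
print(category_counts["White"], race_counts["White"])  # prints: 1 2
```

Each column still totals five participants. Only the distribution across values changes, which is why per-category counts should be compared within a single column rather than across columns.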

 

 

 
