Updated to reflect changes introduced with the release of CDR version 8.
All of Us collects self-reported race and ethnicity data from participants via the Basics survey, which is administered at enrollment. For more information about the Basics survey design, see here. The survey itself is available for viewing here.
Race and ethnicity data are captured via a single multi-select item: “Which categories describe you? Select all that apply. Note, you may select more than one group.” Response options include White; Black, African American or African; Asian; Native American or Alaska Native;* Middle Eastern or North African; Native Hawaiian or other Pacific Islander; Hispanic, Latino, or Spanish; None of these fully describe me; and Prefer not to answer. A follow-up item is asked for each response option selected; those follow-up data are available for research only in the Controlled Tier.
*Participant records for individuals who selected this response option are available only from CDR version 8 onward and only in the Controlled Tier dataset.
Data Collection and Transformation Methods
As is the case with all survey items, source codes are deposited as concepts in the OMOP Common Data Model’s All of Us specific “PPI” vocabulary and stored in the observation table. Note that alongside these survey concepts, demographic concepts sourced from the EHR are also stored in the observation table. Those corresponding EHR data are only available in the Controlled Tier.
While these survey concepts are stored in the observation table, they are also used to populate the OMOP Person table, which contains a unique record for each participant whose data is available in the CDR. The records are used to centralize the storage of demographic data about the participants. For information about the individual fields stored in the Person table, see the CDR data dictionaries. The Person table data are used to populate the demographic selection criteria available via the Researcher Workbench Cohort Builder tool but can also be directly extracted from the CDR itself.
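As a rough illustration of extracting these fields directly, the sketch below builds a SQL string that joins the Person table to the OMOP concept table to get readable race and ethnicity names. The columns `person_id`, `race_concept_id`, `ethnicity_concept_id`, `concept_id`, and `concept_name` are standard OMOP fields. The function name, the `dataset` argument, the backtick-quoted table references (BigQuery style), and the `extra_columns` parameter are assumptions made for illustration; check the CDR data dictionaries for the exact table and column names before using anything like this.

```python
def person_demographics_query(dataset: str, extra_columns: tuple = ()) -> str:
    """Build a SQL string selecting race and ethnicity names per participant.

    `dataset` is a placeholder for wherever the CDR tables live.
    `extra_columns` lets the caller add other Person table columns by name;
    no such names are assumed here.
    """
    extra = "".join(f",\n  p.{col}" for col in extra_columns)
    return (
        "SELECT\n"
        "  p.person_id,\n"
        "  race.concept_name AS race,\n"
        "  eth.concept_name AS ethnicity"
        f"{extra}\n"
        f"FROM `{dataset}.person` AS p\n"
        f"LEFT JOIN `{dataset}.concept` AS race\n"
        "  ON race.concept_id = p.race_concept_id\n"
        f"LEFT JOIN `{dataset}.concept` AS eth\n"
        "  ON eth.concept_id = p.ethnicity_concept_id"
    )


# "my_project.my_cdr" is a made-up dataset name.
query = person_demographics_query("my_project.my_cdr")
print(query)
```

The left joins keep participants whose race or ethnicity concept has no match in the concept table, so every Person record still appears in the result.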
Considerations for Data Organization within the OMOP Person Table
By default, the OMOP Person Table stores race and ethnicity data in separate columns. Beginning with CDR version 8, All of Us added a custom column to the table, labeled “Self-Reported Category,” that is meant to surface the data in a way that is more closely aligned with how it is collected. To learn more, please see the CDR version 8 release notes. It is also important to note that, while the Basics survey item is a multi-select (i.e., participants may select more than one response option), the Person Table stores only one row per participant. The data are therefore generalized to fit the Person Table structure. Additionally, select response options are generalized in the Registered Tier to protect participant privacy. The table below highlights these considerations in more detail.
Note: While CDR version 8 retains the OMOP race and ethnicity columns, future CDRs will deprecate them. You should use the “Self-Reported Category” column when possible.
All of Us Utilization of the OMOP Person Table for Organizing Race and Ethnicity Data
PARTICIPANT SELECTION(S) | Self-Reported Category column | RACE column | ETHNICITY column |
Only White selected | White | White | Non-Hispanic |
Only Black, African American, or African selected | Black | Black | Non-Hispanic |
Only Asian selected | Asian | Asian | Non-Hispanic |
One but NOT multiple of the options selected: | Registered Tier (RT): Another single population | RT: Another single population | Non-Hispanic |
Only Hispanic, Latino, or Spanish selected | Hispanic, Latino, or Spanish | No matching concept | Hispanic, Latino, or Spanish |
Two or more of any of the following selected: | More than one population | More than one population | Non-Hispanic |
White and Hispanic, Latino, or Spanish selected | More than one population | White | Hispanic, Latino, or Spanish |
Black and Hispanic, Latino, or Spanish selected | More than one population | Black | Hispanic, Latino, or Spanish |
Asian and Hispanic, Latino, or Spanish selected | More than one population | Asian | Hispanic, Latino, or Spanish |
One but NOT multiple of the options: AND | Registered Tier (RT): More than one population | RT: Another single population | Hispanic, Latino, or Spanish |
Two or more of any of the following selected: AND | More than one population | More than one population | Hispanic, Latino, or Spanish |
None of these fully describe me | None of these fully describe me | None of these fully describe me | None of these fully describe me |
Prefer not to answer | PreferNotToAnswer | PreferNotToAnswer | PreferNotToAnswer |
Skip | Skip | Skip | Skip |
[Survey not completed by participant] | No matching concept | No matching concept | No matching concept |
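The rows of the table above that are fully spelled out can be expressed as a small lookup, which may help when checking expectations against extracted data. This is a minimal sketch, not the program's actual transformation code: the function name `person_columns` and its data structures are hypothetical, the option strings use the table's wording, and combinations the sketch does not cover (the tier-specific rows and the "two or more" rows) return `None` rather than guessing.

```python
from typing import Optional

HISPANIC = "Hispanic, Latino, or Spanish"
MULTI = "More than one population"
NO_MATCH = "No matching concept"

# Survey option -> label shown in the Person Table columns (from the table).
RACE_LABELS = {
    "White": "White",
    "Black, African American, or African": "Black",
    "Asian": "Asian",
}
# Responses that the table repeats unchanged in all three columns.
PASS_THROUGH = {
    "None of these fully describe me": "None of these fully describe me",
    "Prefer not to answer": "PreferNotToAnswer",
    "Skip": "Skip",
}


def person_columns(selections: Optional[frozenset]) -> Optional[tuple]:
    """Return (self_reported_category, race, ethnicity).

    Pass None when the survey was not completed. Returns None for any
    combination this sketch does not cover.
    """
    if selections is None:
        return (NO_MATCH, NO_MATCH, NO_MATCH)
    if len(selections) == 1:
        (only,) = selections
        if only in PASS_THROUGH:
            value = PASS_THROUGH[only]
            return (value, value, value)
        if only == HISPANIC:
            return (HISPANIC, NO_MATCH, HISPANIC)
        if only in RACE_LABELS:
            label = RACE_LABELS[only]
            return (label, label, "Non-Hispanic")
        return None
    if len(selections) == 2 and HISPANIC in selections:
        (other,) = selections - {HISPANIC}
        if other in RACE_LABELS:
            return (MULTI, RACE_LABELS[other], HISPANIC)
    return None


print(person_columns(frozenset({"White", HISPANIC})))
# ('More than one population', 'White', 'Hispanic, Latino, or Spanish')
```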
Please note that participant counts will differ when compared across these columns because of how selections of multiple response options are organized. For example, if a participant selected both “White” and “Hispanic” for the Basics survey item, that participant contributes a count of 1 to White in the Race column and a count of 1 to Hispanic in the Ethnicity column. In the Self-Reported Category column, the same participant’s response is generalized to “More than one population” rather than counted as either White or Hispanic separately.
The table below provides a visual example of how the above Person Table column rules are applied to individual participant records.
Application of All of Us OMOP Person Table Utilization for Individual Participant Records
Participant | Person_id (PID) | Self-Reported Population (PPI) | Race | Ethnicity |
Example | 12345647 | PPI Population w/Ethnicity Information | Answer codes from 1586140 | Accepted ethnicity codes |
Test Participant 1 | 39743209 | White | White | Non-Hispanic |
Test Participant 2 | 87942415 | More than one population | More than one population | Hispanic |
Test Participant 3 | 71451748 | Native Hawaiian or other Pacific Islander | Native Hawaiian or other Pacific Islander | Non-Hispanic |
Test Participant 4 | 5461687 | Hispanic, Latino, or Spanish | No matching concept | Hispanic |
Test Participant 5 | 8948463 | More than one population | White | Hispanic |
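To see why counts differ across columns, the five test participants above can be tallied column by column. This is only an illustrative sketch using the example rows from the table; the variable names are made up here, and real analyses would run against the Person table itself.

```python
from collections import Counter

# (self-reported category, race, ethnicity) for Test Participants 1-5.
records = [
    ("White", "White", "Non-Hispanic"),
    ("More than one population", "More than one population", "Hispanic"),
    ("Native Hawaiian or other Pacific Islander",
     "Native Hawaiian or other Pacific Islander", "Non-Hispanic"),
    ("Hispanic, Latino, or Spanish", "No matching concept", "Hispanic"),
    ("More than one population", "White", "Hispanic"),
]

# Tally each column independently, as a per-column summary would.
self_reported_counts = Counter(r[0] for r in records)
race_counts = Counter(r[1] for r in records)
ethnicity_counts = Counter(r[2] for r in records)

print(self_reported_counts["White"])                         # 1
print(race_counts["White"])                                  # 2
print(self_reported_counts["Hispanic, Latino, or Spanish"])  # 1
print(ethnicity_counts["Hispanic"])                          # 3
```

Test Participant 5 is counted as White in the Race column and Hispanic in the Ethnicity column, but as “More than one population” in the self-reported column, which is why the White and Hispanic totals do not match across columns.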