Participant privacy is of utmost importance to the All of Us Research Program. Our privacy experts have created curation methods to remove all personally identifying information (PII) from participant records. Beyond the removal of PII, some data are subject to suppression (withholding or removing selected information) or generalization, depending on re-identification risk.
Researchers may access the All of Us data through three separate tiers:
- Public Tier data include only anonymized, aggregated data and are accessible to the public through the All of Us Research Hub's Data Browser.
- Registered Tier data include participant-level data with added transformations to protect participant privacy and are only available to registered researchers.
- Controlled Tier data include participant-level data with fewer transformations than those in the Registered Tier.
Please note: Some differences in participant counts between tiers (Public, Registered, and Controlled) are expected given the differences in privacy methodologies. For instance, data for participants aged 89 years or older are suppressed in the Registered Tier but are available as collected in the Controlled Tier. To learn more, please see the Data Dictionaries for the Curated Data Repositories (CDRs).
Summary of privacy rules for the Registered Tier and Controlled Tier
| Data Element | Registered Tier | Controlled Tier |
| --- | --- | --- |
| Explicit identifiers | Suppressed | Suppressed |
| Free text fields in surveys and unstructured clinical documents | Suppressed | Suppressed |
| Dates of events | Shifted backward by a random number of days between 1 and 365. The shift is constant for each participant, so the relative timing of events is preserved. | Included |
| Date of birth | Shifted backward by a random number of days between 1 and 365. The shift is constant for each participant, so the relative timing of events is preserved. | Generalized to year of birth |
| Date of death | Shifted backward by a random number of days between 1 and 365. The shift is constant for each participant, so the relative timing of events is preserved. | Included |
| Data for participants age >89 years old | Suppressed | Included |
| Geolocation | Generalized to US state | Generalized to first 3 digits of zip code |
| Marital status | Included | Included |
| Living situation from survey question “Where are you currently living?” | Suppressed | Included |
| Own or rent | Included | Included |
| Higher-level race and ethnicity (e.g., Asian, White, Black, MENA) | Generalized | Included |
| Subcategory race and ethnicity (e.g., Hmong, Filipino, Caribbean) | Suppressed | Included |
| Sex at birth from Participant Provided Information (PPI) | Generalized | Included*; includes all branching logic questions |
| Gender identity from PPI | Generalized | Included*; includes all branching logic questions |
| Sexual orientation from PPI | Generalized | Included*; includes all branching logic questions |
| Race and ethnicity from EHR | Suppressed; the EHR value is suppressed to harmonize with PPI data | Included |
| Sex and gender from EHR | Suppressed; the EHR value is suppressed to harmonize with PPI data | Included |
| ICD codes indicative of suppressed sex and gender | Suppressed | Included |
| Education | Generalized | Included |
| Employment status | Generalized | Included |
| Annual household income | Included | Included |
| Death cause (i.e., death cause noted in the EHR, including relevant diagnosis codes) | Suppressed | Included |
| Diagnosis codes subject to public knowledge | Suppressed | Included |
| ICD code (ICD9 E80*-E84*, ICD10 V*) indicative of motor vehicle accidents | Suppressed | Suppressed |
| Active duty military status | Suppressed | Included |
| Born in US or not | Included | Included* |
| Genomic data (e.g., program-generated Whole Genome Sequencing and Array data) | Suppressed | Included |
*Free text responses are suppressed.
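
The Registered Tier date transformation in the table can be illustrated with a minimal sketch. This is not the program's actual curation code: the function names, the participant IDs, the sample dates, and the choice to treat both 1 and 365 as possible offsets are all assumptions made for illustration. It shows only the general idea that each participant gets one random backward offset, reused for every one of that participant's dates, so the intervals between a participant's events do not change.

```python
import random
from datetime import date, timedelta


def assign_offsets(participant_ids, rng):
    """Draw one backward offset (in days) per participant.

    Assumption: offsets are whole days in the inclusive range 1..365.
    """
    return {pid: rng.randint(1, 365) for pid in participant_ids}


def shift_back(original, offset_days):
    """Move a date backward by the participant's fixed offset."""
    return original - timedelta(days=offset_days)


# Hypothetical participants and event dates, for illustration only.
events = {
    "P1": [date(2020, 3, 1), date(2020, 3, 15)],
    "P2": [date(2019, 7, 4), date(2021, 1, 20)],
}

rng = random.Random(0)  # seeded only so the sketch is repeatable
offsets = assign_offsets(events.keys(), rng)

# Apply the same offset to every date belonging to a participant.
shifted = {
    pid: [shift_back(d, offsets[pid]) for d in dates]
    for pid, dates in events.items()
}

# The gap between two events for one participant is unchanged by the shift.
gap_before = events["P1"][1] - events["P1"][0]
gap_after = shifted["P1"][1] - shifted["P1"][0]
print(gap_before == gap_after)
```

Because the offset differs between participants, shifted dates should not be compared across participants as if they were calendar-aligned; only within-participant intervals are preserved under this scheme.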