Selecting the right data access tier and CDR for a workspace
Security and participant privacy are of the highest importance to the All of Us Research Program, which is why the All of Us Data and Research Center uses a tiered system for accessing participant data.
The Public Tier dataset contains aggregate data. These data are available to everyone through Data Snapshots and the Data Browser, an interactive tool on the Research Hub.
The Research Hub is also where researchers can gain access to more granular and row-level participant data for research purposes by applying for access to the Researcher Workbench. Once researchers finish the application process and create an account, they may access the Researcher Workbench, where the Registered Tier and Controlled Tier datasets are housed.
All registered researchers have access to the Registered Tier dataset within the Researcher Workbench. The Registered Tier curated dataset contains individual-level data and currently includes data from electronic health records (EHRs), wearables, surveys, and physical measurements taken at the time of participant enrollment. These data have been altered to protect participant privacy (see How All of Us protects participant privacy for additional information).
The Controlled Tier requires a registered researcher to complete additional steps for access, and requires their institution to sign an amended Data Use and Registration Agreement (if it has not already done so). These additional steps are required because the Controlled Tier dataset includes genomic data, additional demographic information from EHRs, and survey answers that are suppressed or generalized in the Registered Tier, and therefore needs additional security to protect participant privacy. See this article to learn how to access the Controlled Tier data.
The table below highlights the differences between the Registered and Controlled Tiers in the participant privacy methods applied to data during curation, and can help you decide which tier is most appropriate for the research you want to conduct. Data elements marked “As Collected” are not altered during curation. See How All of Us protects participant privacy and the Participant Privacy Protection articles for additional information.
| Data element | Registered Tier | Controlled Tier |
| --- | --- | --- |
| Explicit identifiers | Suppress | Suppress |
| Free text fields in surveys and unstructured clinical documents | Suppress | Suppress |
| Dates (of events) | Random shift backward by a random number of days between 1 and 365 | As Collected (unshifted) |
| Date of birth | Random shift backward by a random number of days between 1 and 365 | Generalize to year of birth |
| Date of death | Random shift backward by a random number of days between 1 and 365 | As Collected (unshifted) |
| Data of participants aged over 89 | Suppress | As Collected |
| Geolocation | Generalize to US state | Generalize to first 3 digits of ZIP code |
| Marital status | As Collected | As Collected |
| Living situation PPI (survey): Where are you currently living? | Suppress | As Collected |
| Own or rent | As Collected | As Collected |
| Higher-level race/ethnicity (e.g., Asian, White, Black, MENA) | Generalize | As Collected |
| Race/ethnicity subcategory (e.g., Hmong, Filipino, Caribbean) | Suppress | Suppress |
| Sex at birth (PPI)* | Generalize | As Collected (includes all branching logic questions) |
| Gender identity (PPI) | Generalize | As Collected (includes all branching logic questions) |
| Sexual orientation (PPI) | Generalize | As Collected (includes all branching logic questions) |
| Race and ethnicity (EHR) | Suppress (value from EHR is suppressed to harmonize with PPI data) | As Collected |
| Sex/gender (EHR) | Suppress (value from EHR is suppressed to harmonize with PPI data) | As Collected |
| ICD codes indicative of suppressed sex/gender (see list of codes) | Suppress | As Collected |
| Education | Generalize | As Collected |
| Employment status | Generalize | As Collected |
| Annual household income | As Collected | As Collected |
| Cause of death (cause of death noted in the EHR, including relevant diagnosis codes) | Suppress | As Collected |
| Diagnosis codes subject to public knowledge (see list of codes) | Suppress | As Collected |
| ICD codes indicative of motor vehicle accidents (ICD-9 E80* to E84*, ICD-10 V*) | Suppress | Suppress |
| Active duty military status | Suppress | As Collected |
| Born in US or not | As Collected | As Collected* |
| Genomic data (includes program-generated whole genome sequencing and array data) | Suppress | As Collected |
Note: ‘As Collected’ indicates that there will be no change to the data for the purpose of privacy protection
*Free text responses will be suppressed.
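To make a few of the table's date and geolocation rules concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the program's curation code: the helper names are hypothetical, and it assumes one backward offset is drawn per participant and reused for all of that participant's dates (this article does not say how offsets are assigned).

```python
import random
from datetime import date, timedelta

# Illustrative sketch only: hypothetical helpers mirroring a few rows of the
# privacy table. This is NOT the All of Us curation pipeline.


def draw_registered_offset(rng: random.Random) -> timedelta:
    """Draw a backward shift of 1 to 365 days (randint is inclusive)."""
    return timedelta(days=rng.randint(1, 365))


def shift_backward(d: date, offset: timedelta) -> date:
    """Registered Tier rule for dates: move the date earlier by the offset."""
    return d - offset


def year_of_birth(dob: date) -> int:
    """Controlled Tier rule for date of birth: keep only the year."""
    return dob.year


def zip3(zip_code: str) -> str:
    """Controlled Tier rule for geolocation: keep the first 3 ZIP digits."""
    return zip_code[:3]


if __name__ == "__main__":
    rng = random.Random(42)
    # Assumption: one offset per participant, reused for all of their dates.
    offset = draw_registered_offset(rng)
    admit, discharge = date(2021, 6, 1), date(2021, 6, 15)
    shifted_admit = shift_backward(admit, offset)
    shifted_discharge = shift_backward(discharge, offset)
    print(1 <= (admit - shifted_admit).days <= 365)  # True
    print((shifted_discharge - shifted_admit).days)  # 14
    print(year_of_birth(date(1980, 3, 2)))  # 1980
    print(zip3("02139"))  # 021
```

Under the shared-offset assumption, calendar dates move but the interval between a participant's events (14 days here) is unchanged. Whether this holds in the actual Registered Tier data is worth confirming in the Participant Privacy Protection articles before relying on it.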
When you create a new workspace, you will be asked to select the data access tier for that workspace (i.e., Controlled Tier or Registered Tier).
Note: Workspaces created using Controlled Tier data can only be shared with other users who have access to the Controlled Tier dataset.
After you select the tier, you will select a dataset version (i.e., CDR version). Per the Data User Code of Conduct, new workspaces should use the most current dataset available for that access tier, unless you are attempting to replicate a previous study.
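The two workspace choices (tier, then CDR version) can be summarized as a small decision aid. The Python sketch below is hypothetical and is not a Researcher Workbench feature. `NEEDS`, `minimum_tier`, and `pick_cdr_version` are illustrative names, the version labels `v6`/`v7` are placeholders, and `NEEDS` transcribes only a handful of rows from the privacy table. It picks the least restrictive tier that covers every data need, then the newest version unless a specific earlier one is being replicated.

```python
from enum import Enum
from typing import Optional, Sequence


class Tier(Enum):
    REGISTERED = "Registered Tier"
    CONTROLLED = "Controlled Tier"


# (usable in Registered Tier, usable in Controlled Tier) for a subset of rows
# in the privacy table. Registered Tier dates are shifted and ZIP codes are
# generalized to state, so those needs count as unavailable there.
NEEDS = {
    "marital status": (True, True),
    "annual household income": (True, True),
    "unshifted event dates": (False, True),
    "participants aged over 89": (False, True),
    "3-digit zip code": (False, True),
    "active duty military status": (False, True),
    "genomic data": (False, True),
    "race/ethnicity subcategory": (False, False),
    "free text fields": (False, False),
}


def minimum_tier(needs: Sequence[str]) -> Tier:
    """Return the least restrictive tier that provides every listed need."""
    tier = Tier.REGISTERED
    for need in needs:
        in_registered, in_controlled = NEEDS[need]  # KeyError if not listed
        if in_registered:
            continue
        if not in_controlled:
            raise ValueError(f"{need!r} is suppressed in both tiers")
        tier = Tier.CONTROLLED
    return tier


def pick_cdr_version(versions: Sequence[str], replicate: Optional[str] = None) -> str:
    """Pick the newest version (versions ordered oldest to newest) unless replicating."""
    if replicate is not None:
        if replicate not in versions:
            raise ValueError(f"{replicate!r} is not available for this tier")
        return replicate
    return versions[-1]


if __name__ == "__main__":
    print(minimum_tier(["marital status"]).value)  # Registered Tier
    print(minimum_tier(["marital status", "genomic data"]).value)  # Controlled Tier
    versions = ["v6", "v7"]  # hypothetical version labels
    print(pick_cdr_version(versions))  # v7
    print(pick_cdr_version(versions, replicate="v6"))  # v6
```

Raising an error for needs that are suppressed in both tiers (for example, free text fields) makes it explicit that no tier choice will provide those data.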