How to select the right data access tier and curated data repository (CDR) for a workspace

Selecting the right data access tier and CDR for a workspace

Security and participant privacy are of the highest importance to the All of Us Research Program, which is why the All of Us Data and Research Center uses a tiered system for accessing participant data.

The Public Tier dataset contains aggregate data. These data are available to everyone through Data Snapshots and the Data Browser, an interactive tool on the Research Hub.

The Research Hub is also where researchers can gain access to more granular and row-level participant data for research purposes by applying for access to the Researcher Workbench. Once researchers finish the application process and create an account, they may access the Researcher Workbench, where the Registered Tier and Controlled Tier datasets are housed. 

All registered researchers will have access to the Registered Tier dataset within the Researcher Workbench. The Registered Tier curated dataset contains individual-level data and currently includes data from electronic health records (EHRs) and wearables, surveys, and physical measurements taken at the time of participant enrollment. These data have been altered to protect participant privacy (see How All of Us protects participant privacy for additional information). 

The Controlled Tier requires a registered researcher to complete additional steps for access and requires their institution to sign an amended Data Use and Registration Agreement (if it has not already done so). These additional steps are required because the Controlled Tier dataset includes genomic data, additional demographic information from EHRs, and survey answers that are suppressed or generalized in the Registered Tier, and these data therefore need additional security in place to protect participant privacy. See this article to learn how to access the Controlled Tier data.

The table below highlights the differences between the Registered and Controlled Tiers in the participant privacy methodology applied to data during the curation process, and can help you decide which tier is most appropriate for the research you want to conduct. Data elements marked “As Collected” are not altered during curation. See the How All of Us protects participant privacy and Participant Privacy Protection articles for additional information.

| Data Element | Registered Tier | Controlled Tier |
|---|---|---|
| Explicit identifiers | Suppress | Suppress |
| Free text fields in surveys and unstructured clinical documents | Suppress | Suppress |
| Dates (of events) | Random shift backward by a random number between 1 and 365 days | As Collected (unshifted) |
| Date of Birth | Random shift backward by a random number between 1 and 365 days | Generalize to year of birth |
| Date of Death | Random shift backward by a random number between 1 and 365 days | As Collected (unshifted) |
| Data of participants age >89 | Suppress | As Collected |
| Geolocation | Generalize to US state | Generalize to first 3 digits of zip code |
| Marital status | As Collected | As Collected |
| Living situation (PPI survey: Where are you currently living?) | Suppress | As Collected |
| Own or rent | As Collected | As Collected |
| Higher level Race/Ethnicity (e.g., Asian, White, Black, MENA) | Generalize | As Collected |
| Race/Ethnicity subcategory (e.g., Hmong, Filipino, Caribbean) | Suppress | Suppress |
| Sex at birth (PPI)* | Generalize | As Collected*; includes all branching logic questions |
| Gender identity (PPI) | Generalize | As Collected*; includes all branching logic questions |
| Sexual orientation (PPI) | Generalize | As Collected*; includes all branching logic questions |
| Race and Ethnicity (EHR) | Suppress; value from EHR is suppressed to harmonize with PPI data | As Collected |
| Sex/Gender (EHR) | Suppress; value from EHR is suppressed to harmonize with PPI data | As Collected |
| ICD codes indicative of suppressed sex/gender (List of codes here) | Suppress | As Collected |
| Education | Generalize | As Collected |
| Employment status | Generalize | As Collected |
| Annual household income | As Collected | As Collected |
| Death cause (i.e., death cause noted in the EHR, including relevant diagnosis codes) | Suppress | As Collected |
| Diagnosis codes subject to public knowledge (List of codes here) | Suppress | As Collected |
| ICD codes indicative of motor vehicle accidents (ICD9 E80*-E84*, ICD10 V*) | Suppress | Suppress |
| Active duty military status | Suppress | As Collected |
| Born in US or not | As Collected | As Collected* |
| Genomic data (includes program-generated whole genome sequencing and array data) | Suppress | As Collected |

Note: ‘As Collected’ indicates that there will be no change to the data for the purpose of privacy protection 

*Free text responses will be suppressed. 
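To make the "Random shift" and "Generalize" treatments in the table more concrete, here is a minimal Python sketch. It is an illustration only and not the program's actual curation code: the function names are hypothetical, and the choice to reuse one offset for all of a participant's dates (so that intervals between events are preserved) is an assumption that this article does not state.

```python
import random
from datetime import date, timedelta


def shift_date_backward(original: date, shift_days: int) -> date:
    """Move a date backward by shift_days, which must be between 1 and 365."""
    if not 1 <= shift_days <= 365:
        raise ValueError("shift_days must be between 1 and 365")
    return original - timedelta(days=shift_days)


def generalize_zip(zip_code: str) -> str:
    """Keep only the first 3 digits of a zip code."""
    return zip_code[:3]


def generalize_birth_date(birth_date: date) -> int:
    """Reduce a date of birth to the year of birth."""
    return birth_date.year


# Hypothetical example values; no real participant data is used here.
# Assumption: one random offset is drawn and applied to every date.
shift_days = random.randint(1, 365)
visit = date(2020, 3, 15)
followup = date(2020, 4, 14)

shifted_visit = shift_date_backward(visit, shift_days)
shifted_followup = shift_date_backward(followup, shift_days)

# With a shared offset, the gap between the two events is unchanged.
print(shifted_followup - shifted_visit == followup - visit)
print(generalize_zip("02139"))
print(generalize_birth_date(date(1980, 6, 1)))
```

Under the shared-offset assumption, analyses that depend on the time between events would behave the same on shifted dates, while analyses tied to calendar dates (for example, seasonality) would likely need the unshifted dates in the Controlled Tier.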

When you create a new workspace, you will be asked to select which Data access tier you would like to access for that workspace (i.e., Controlled Tier or Registered Tier). 

Note: Workspaces created using Controlled Tier data can only be shared with other users who have access to the Controlled Tier dataset. 
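As a rough aid for this choice, the sketch below encodes a few rows of the table above and suggests the lowest tier in which none of the data elements you need is suppressed. The dictionary keys and the `suggest_tier` function are hypothetical names used for illustration. The check only looks at suppression, so a generalized or date-shifted value may still be too coarse for your study even when the Registered Tier is suggested.

```python
# Treatment of selected data elements in each tier, copied from the table above.
TREATMENT = {
    "event dates": {"Registered": "Random shift", "Controlled": "As Collected"},
    "geolocation": {
        "Registered": "Generalize to US state",
        "Controlled": "Generalize to first 3 digits of zip code",
    },
    "annual household income": {"Registered": "As Collected", "Controlled": "As Collected"},
    "death cause": {"Registered": "Suppress", "Controlled": "As Collected"},
    "genomic data": {"Registered": "Suppress", "Controlled": "As Collected"},
    "race/ethnicity subcategory": {"Registered": "Suppress", "Controlled": "Suppress"},
}


def suggest_tier(needed_elements):
    """Return the lowest tier where no needed element is suppressed.

    Returns None when at least one needed element is suppressed in both tiers.
    """
    for tier in ("Registered", "Controlled"):
        if all(TREATMENT[element][tier] != "Suppress" for element in needed_elements):
            return tier
    return None


print(suggest_tier(["annual household income", "event dates"]))  # Registered
print(suggest_tier(["genomic data", "event dates"]))             # Controlled
print(suggest_tier(["race/ethnicity subcategory"]))              # None
```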

After you select the tier you would like to access, you will then select a Dataset version (i.e., CDR version). Per the Data User Code of Conduct, you should select the most current dataset version available for that access tier when creating a new workspace, unless you are attempting to replicate a previous study.
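The version rule can be summarized in a few lines of Python. This is a sketch under stated assumptions: the function name, the version labels, and the idea that the available versions are listed from oldest to newest are all hypothetical, and the actual selection is made in the workspace creation form rather than in code.

```python
from typing import Optional, Sequence


def choose_cdr_version(
    available: Sequence[str], replicate_version: Optional[str] = None
) -> str:
    """Pick a dataset version for a new workspace.

    `available` is assumed to be ordered from oldest to newest.
    """
    if not available:
        raise ValueError("no dataset versions available for this tier")
    if replicate_version is not None:
        # Replicating a previous study: reuse the version that study used.
        if replicate_version not in available:
            raise ValueError(f"version {replicate_version!r} is not available")
        return replicate_version
    # Default: the most current version for the selected tier.
    return available[-1]


# Hypothetical version labels, for illustration only.
versions = ["version A", "version B", "version C"]
print(choose_cdr_version(versions))               # version C
print(choose_cdr_version(versions, "version A"))  # version A
```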
