The Researcher Workbench allows you to import your own data or codebase into your individual workspace for analysis. However, you must take certain precautions before importing the data to ensure appropriate use and to protect data privacy.
First, make sure that you have the appropriate clearance or access to use the data and, if applicable, to share it with collaborators who have access to your workspace, as outlined in the Data User Code of Conduct (DUCC). The DUCC also requires you to remove any personally identifiable information (PII), protected health information (PHI), or identifiable private information (IPI) from your data BEFORE importing any files into your workspace.
Personally Identifiable Information (PII) refers to information that can be used to distinguish or trace the identity of an individual (e.g., name, Social Security number, biometric records), either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual. Protected Health Information (PHI) refers to individually identifiable health information that is transmitted by electronic media, maintained in electronic media, or transmitted or maintained in any other form or medium. Identifiable Private Information (IPI) refers to private information where the identity of an individual is or may readily be ascertained by the investigator or associated with the information. PII generally includes PHI and IPI.
Removal of PII from Data imported into your Workspace
PII broadly includes any information that can be used to trace the identity of an individual. Whether a data element counts as PII can depend on context: a field that looks harmless on its own may become identifying when combined with other fields or with information that is publicly known about individuals in the dataset.
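One quick way to see how combined fields can identify someone is to count how many records share each combination of potential quasi-identifiers, such as a three-digit ZIP prefix, birth year, and sex. Any combination held by exactly one record singles that person out. The Python sketch below is illustrative only: it assumes records are plain dictionaries, and the function name `unique_combinations` and the field names are hypothetical, not part of the Researcher Workbench or the All of Us methodology.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Sequence, Tuple


def unique_combinations(
    records: Iterable[Mapping[str, object]],
    quasi_identifiers: Sequence[str],
) -> List[Tuple[object, ...]]:
    """Return quasi-identifier value combinations that occur in exactly one record."""
    # Count how many records share each combination of quasi-identifier values.
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    # A combination seen only once singles out one person in the dataset.
    return [combo for combo, n in counts.items() if n == 1]


# Hypothetical example records; field names are illustrative only.
records = [
    {"zip3": "021", "birth_year": 1980, "sex": "F"},
    {"zip3": "021", "birth_year": 1980, "sex": "F"},
    {"zip3": "945", "birth_year": 1932, "sex": "M"},
]
print(unique_combinations(records, ["zip3", "birth_year", "sex"]))
# [('945', 1932, 'M')]
```

Records flagged this way are candidates for generalizing (for example, grouping birth years into ranges) or suppressing before import.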
The Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule provides guidance for “de-identifying” datasets for dissemination. Under its Safe Harbor method, the Privacy Rule specifies 18 data elements that must be removed because they could be used to identify an individual or their relatives. These include, among others: names; all elements of dates (except year) directly related to an individual; street addresses and other geographic subdivisions smaller than a state, although the first three digits of a ZIP code may be retained in some cases; unique identifying numbers or codes such as Social Security numbers and medical record numbers; telephone and fax numbers; biometric identifiers; and full-face photographs and comparable images. Datasets with these data elements removed are considered ‘de-identified’ under HIPAA, provided there is no actual knowledge that the remaining information could be used to identify individuals in the dataset.
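A few of these HIPAA Privacy Rule (Safe Harbor) transformations can be sketched in Python as follows. This is a minimal example under stated assumptions, not a complete de-identification: each record is assumed to be a dictionary with hypothetical field names (`name`, `ssn`, `mrn`, `zip`, `birth_date`, and so on), and the helper `scrub_record` is hypothetical. It drops direct identifiers, keeps only the year of date fields, and truncates ZIP codes to three digits. Under Safe Harbor, three-digit prefixes covering 20,000 or fewer people must be replaced with `000`; the sketch expects you to supply that list of prefixes from current Census data rather than computing it. ZIP codes should be stored as strings so leading zeros are preserved.

```python
from datetime import date
from typing import Any, Dict, FrozenSet

# Hypothetical field names; map these to the columns in your own dataset.
DIRECT_IDENTIFIERS = {"name", "ssn", "mrn", "phone", "fax", "email"}
DATE_FIELDS = {"birth_date", "visit_date"}


def scrub_record(
    record: Dict[str, Any],
    restricted_zip3: FrozenSet[str] = frozenset(),
) -> Dict[str, Any]:
    """Apply a few HIPAA Safe Harbor-style transformations to one record.

    restricted_zip3: three-digit ZIP prefixes whose combined area has 20,000 or
    fewer people; these must be looked up from Census data and are not computed here.
    """
    out: Dict[str, Any] = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop explicit identifiers entirely
        elif key == "zip":
            if value:  # ZIPs should be strings so leading zeros survive
                zip3 = str(value)[:3]
                out["zip3"] = "000" if zip3 in restricted_zip3 else zip3
        elif key in DATE_FIELDS and isinstance(value, date):
            out[key] = value.year  # keep only the year
        else:
            out[key] = value
    return out


example = {"name": "A. Example", "zip": "02139", "birth_date": date(1980, 5, 17), "dx": "E11.9"}
print(scrub_record(example))
# {'zip3': '021', 'birth_date': 1980, 'dx': 'E11.9'}
```

Other Safe Harbor elements (for example, ages over 89, photographs, and device or account identifiers) would need their own handling.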
To maximize protection of participant privacy, the All of Us Research Program has incorporated its own privacy methodology into its data curation processes. In the Registered Tier data, we remove all explicit identifiers and apply additional measures, such as suppressing or generalizing other variables considered quasi-identifiers based on their re-identification risk. The privacy methodology applied to All of Us Registered Tier data is summarized below:
- All explicit identifiers that could be used to identify individuals within the dataset or their relatives are removed. These include:
  - Names
  - All unique IDs used for any purpose outside of the Researcher Workbench (e.g., participant ID, Social Security number, medical record number, phone and fax numbers)
  - IP addresses and URLs that could be linked to individuals
- All dates are shifted back by a random number of days between 1 and 365
- All free-text fields in surveys and full-text clinical notes are removed
- All geolocation data more granular than US state is removed, except EHR site
- Demographic details such as race subcategories, gender identity, and sexual orientation are generalized
- Survey questions removed:
  - An individual’s living situation
  - Active duty military status (PPI)
- Diagnosis codes specifying cause of death and other conditions that may be subject to public knowledge are removed
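The date-shifting step can be illustrated with a short Python sketch for your own data. It assumes, as a design choice rather than a documented All of Us detail, that each participant receives one random offset of 1 to 365 days that is reused for all of their dates, so the intervals between a participant's events are preserved. `make_date_shifter` is a hypothetical helper, not a Researcher Workbench API.

```python
import random
from datetime import date, timedelta
from typing import Callable, Dict, Optional


def make_date_shifter(seed: Optional[int] = None) -> Callable[[str, date], date]:
    """Return a function that shifts dates back by a per-participant random offset.

    Assumption: each participant gets one offset drawn uniformly from 1..365 days
    and reused for all of their dates, so intervals between events are preserved.
    """
    rng = random.Random(seed)
    offsets: Dict[str, int] = {}  # participant_id -> days; keep this mapping private

    def shift(participant_id: str, d: date) -> date:
        # Draw an offset the first time a participant is seen, then reuse it.
        if participant_id not in offsets:
            offsets[participant_id] = rng.randint(1, 365)  # inclusive bounds
        return d - timedelta(days=offsets[participant_id])

    return shift


shift = make_date_shifter(seed=42)
first = shift("p1", date(2021, 3, 1))
second = shift("p1", date(2021, 3, 15))
print((second - first).days)  # 14: the interval between the two events is unchanged
```

The offset table effectively links shifted dates back to real ones, so it should not be saved or imported alongside the shifted data. The `seed` argument is only for reproducible testing; a non-seeded source such as `random.SystemRandom` may be preferable in practice.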
For additional information on the All of Us Research Program’s privacy methodology or to apply similar privacy protection principles to your data, see the resources listed below.
- How All of Us protects participant privacy
- Accessing geolocation data
- Sex, gender, and sexual orientation generalizations
- Education and employment generalizations
- Race and ethnicity generalizations
- State and site generalizations
Please review our Data User Code of Conduct for further information.