Policy Questions

How do I cite the Researcher Workbench in my grants or publications?

We ask that all researchers using the Researcher Workbench honor the contribution that All of Us participants make to their research projects.

This applies to all oral and written presentations, disclosures, and publications resulting from any analyses of the data. The following are examples of acknowledgement and data availability statements.

Example acknowledgement statement
“We gratefully acknowledge All of Us participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health’s All of Us Research Program for making available the participant data [and/or samples and/or cohort] examined in this study.”

Example data availability/data access statement
“This study used data from the All of Us Research Program’s [Registered/Controlled] Tier Dataset [version number], available to authorized users on the Researcher Workbench.”

Please review our All of Us Research Program Data and Statistics Dissemination Policy and All of Us Research Program Publication and Presentation Policy for further information.

How do I notify All of Us about my upcoming publication?

Researchers are required to notify the program of any publication or presentation using All of Us Research Program data at least 2 weeks before the date of publication or conference presentation. View the checklist about the reporting process.

You can notify All of Us about your upcoming publication via the "Contact Us" feature under the hamburger menu (the icon with three blue lines) in the upper-left corner of the Researcher Workbench homepage. You can also notify us directly through the Publication and Presentation Reporting Form.

The information you provide will be used by the All of Us Research Program for notification and communications planning purposes, without requirement for program review or approval. This information or any manuscript you submit will not be shared or disseminated outside the program until after it is published. You should submit an electronic version of the final, peer-reviewed manuscript to PubMed Central immediately upon acceptance for publication. To see detailed instructions on how to submit a manuscript without an embargo period, please visit the NIHMS Tutorials page, and click on Deposit Files, which contains an in-depth presentation with full screenshots of the process.

Researchers using the Researcher Workbench must also remember to honor the contribution that All of Us participants make to their research projects. This applies to all oral and written presentations, disclosures, and publications resulting from any analyses of the data.

Please review our All of Us Research Program Data and Statistics Dissemination Policy, All of Us Research Program Publication and Presentation Policy, and Data User Code of Conduct for further information.

Does All of Us have intellectual property rights over products developed from data in the Researcher Workbench?

All of Us claims no intellectual property rights on products developed from research using All of Us data. All of Us supports and recommends that research products and services emerging from secondary research using All of Us data be accessible broadly and equitably.

What data or figures can I download in compliance with the Data User Code of Conduct?

As outlined in the Data User Code of Conduct, you cannot make copies of or download any participant-level data from the All of Us Researcher Workbench. Aggregate statistics that are more granular than buckets of 20 individuals may not be distributed or published without approval from the All of Us Research Program.

We strongly recommend that any downloaded data table, chart, or figure have summary counts of at least 20 so that you do not later violate our Data and Statistics Dissemination Policy. For example, a count of 5 or 9 should be rounded up to 20; however, a count of 35 can stay as 35. This helps us protect participants from the risk of re-identification.
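The rounding rule above can be checked before you download a table. The following is a minimal Python sketch, not an official tool: the function names and group labels are hypothetical, and it treats every count below 20 (including zero) the same way, which is an assumption you should confirm against the Data and Statistics Dissemination Policy.

```python
from typing import Dict, List

MIN_CELL_COUNT = 20  # threshold taken from the guidance above


def floor_small_counts(counts: Dict[str, int], minimum: int = MIN_CELL_COUNT) -> Dict[str, int]:
    """Return a copy of `counts` with every value below `minimum` raised to `minimum`."""
    return {group: max(n, minimum) for group, n in counts.items()}


def groups_below_minimum(counts: Dict[str, int], minimum: int = MIN_CELL_COUNT) -> List[str]:
    """List the groups whose counts are still below `minimum`."""
    return [group for group, n in counts.items() if n < minimum]


# Hypothetical aggregate counts, mirroring the example in the text.
raw = {"group_a": 5, "group_b": 9, "group_c": 35}
safe = floor_small_counts(raw)
print(safe)                        # {'group_a': 20, 'group_b': 20, 'group_c': 35}
print(groups_below_minimum(safe))  # []
```

Running a check like `groups_below_minimum` on the final table, rather than on intermediate results, helps catch small cells introduced by later filtering or stratification.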

Please review our All of Us Research Program Data and Statistics Dissemination Policy, All of Us Research Program Publication and Presentation Policy, Data User Code of Conduct, and Egress Alert Policy for further information.

What do I need to do before importing external data into the Researcher Workbench to make sure I don’t violate any of the All of Us policies?

You can upload or import external data, code, or files into your workspace, but you are responsible for ensuring that you have the appropriate rights to anything you upload and that you have removed all personally identifiable information (PII) from any data or files before you upload them. PII includes, but is not limited to: names; dates; addresses or geographic information smaller than the first three digits of the zip code; unique ID numbers or codes such as Social Security Numbers and Medical Record Numbers; phone and fax numbers; biometric identifiers; and photographs or comparable images. When you upload external data, code, or files into your workspace, they will be available to you and other researchers collaborating on your workspace, but not generally available to other All of Us researchers.

Please note: By agreeing to the Data User Code of Conduct, you take full responsibility for any external data, files, or software that you import into the All of Us Researcher Workbench. It is your responsibility to upload only data you are authorized to use, in accordance with any data use restrictions in place, and to ensure that the collaborators on your workspace also follow these restrictions. You may import data into the Researcher Workbench as long as they comply with All of Us policies.

For additional guidance protecting participant privacy and on complying with the All of Us policies, read the “How do I comply with All of Us policies when importing data into the Researcher Workbench?” FAQ.

Please review our Data User Code of Conduct and Egress Alert Policy for further information.

How do I comply with All of Us policies when importing data into the Researcher Workbench?

The Researcher Workbench allows you to import your own data or codebase into your individual workspace for analysis; however, you will need to take certain precautions before importing the data to ensure appropriate use and to protect data privacy.

First, make sure that you have the appropriate clearance/access to use the data and/or share it with your collaborators who have access to your workspace, as outlined in the Data User Code of Conduct (DUCC). The DUCC also states that you will need to remove any personally identifiable information (PII), protected health information (PHI), or identifiable private information (IPI) from your data BEFORE importing any files into your workspace.

Personally Identifiable Information (PII) refers to information that can be used to distinguish or trace the identity of an individual (e.g., name, social security number, biometric records, etc.), either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual. Protected Health Information (PHI) refers to individually identifiable health information that is transmitted by electronic media, maintained in electronic media, or transmitted or maintained in any other form or medium. Identifiable Private Information (IPI) refers to private information where the identity of an individual is or may readily be ascertained by the investigator or associated with the information. PII generally includes PHI and IPI.

Removal of PII from data imported into your workspace
PII broadly includes any information that can be used to trace the identity of an individual. Data elements may be considered PII due to various factors, such as information that is publicly known about individuals in the database.

The Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule provides broader guidance for “de-identifying” datasets for dissemination. The Privacy Rule recommends removing 18 specific data elements that could be used to identify an individual or their relatives within the dataset. These data elements include, but are not limited to: names; dates; addresses or geographic information smaller than the first three digits of the zip code; unique ID numbers or codes such as social security numbers and medical record numbers; phone and fax numbers; biometric identifiers; and photographs or comparable images. Datasets with these data elements removed are considered ‘de-identified’ by HIPAA, provided the dataset is not known to have any additional information that could identify individuals within the dataset.
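As an illustration of preparing a file before import, the sketch below drops a set of identifier columns and truncates zip codes to their first three digits. It is a minimal example under stated assumptions, not a complete de-identification procedure: the column names are hypothetical, the identifier list covers only some of the elements named above, and zip codes are assumed to be stored as strings with any leading zeros intact.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical column names; replace them with the columns in your own file.
IDENTIFIER_COLUMNS: Set[str] = {"name", "ssn", "mrn", "phone", "fax", "visit_date"}


def strip_identifiers(
    rows: Iterable[Dict[str, Any]],
    drop: Set[str] = IDENTIFIER_COLUMNS,
    zip_column: str = "zip",
) -> List[Dict[str, Any]]:
    """Drop identifier columns and keep only the first three digits of the zip code."""
    cleaned = []
    for row in rows:
        # Keep every column whose (lower-cased) name is not on the drop list.
        kept = {key: value for key, value in row.items() if key.lower() not in drop}
        if kept.get(zip_column) is not None:
            # Assumes zip codes are strings such as "02139", not integers.
            kept[zip_column] = str(kept[zip_column]).strip()[:3]
        cleaned.append(kept)
    return cleaned


records = [{"name": "Example Person", "mrn": "12345", "zip": "02139", "age_group": "40-49"}]
print(strip_identifiers(records))  # [{'zip': '021', 'age_group': '40-49'}]
```

A column-name filter like this cannot detect identifiers hidden in free-text fields or in unexpectedly named columns, so a manual review of the file is still needed before upload.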

To maximize protection of participant privacy, the All of Us Research Program has incorporated our own privacy methodology into our data curation processes. In the Registered Tier data, we remove all explicit identifiers and apply additional measures, such as suppressing or generalizing additional variables considered quasi-identifiers based on re-identification risk. The privacy methodology applied for All of Us Registered Tier data is summarized below:

  • All explicit identifiers that could be used to identify individuals within the dataset or their relatives are removed. These include:
    • Names
    • All unique IDs used for any purpose outside of the Researcher Workbench (e.g., participant ID, social security number, medical record number, phone and fax numbers)
    • IP addresses and URLs that could be linked to individuals
  • All dates are shifted back by a random number of days between 1 and 365
  • All free-text fields in surveys and full-text clinical notes are removed
  • All geo-location data smaller than US state, except EHR site, are removed
  • Demographic details such as race subcategories, gender identity, and sexuality are modified
  • Survey questions on an individual’s living situation and active duty military status (PPI) are removed
  • Diagnosis codes specifying cause of death and other conditions that may be subject to public knowledge are removed
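If you want to apply a similar date-shifting step to your own data before import, the idea can be sketched in Python as follows. This is an illustration only and not the program's actual implementation: the function names are hypothetical, and using one fixed offset per person (so that intervals between that person's events are preserved) is an assumption of this sketch.

```python
import random
from datetime import date, timedelta
from typing import Callable, Dict, Optional


def make_date_shifter(rng: Optional[random.Random] = None) -> Callable[[str, date], date]:
    """Return a function that shifts dates back by a per-person offset of 1 to 365 days."""
    if rng is None:
        rng = random.Random()
    offsets: Dict[str, int] = {}  # person_id -> number of days to shift back

    def shift(person_id: str, original: date) -> date:
        # Draw the offset once per person so that intervals between events are kept.
        if person_id not in offsets:
            offsets[person_id] = rng.randint(1, 365)
        return original - timedelta(days=offsets[person_id])

    return shift


shift = make_date_shifter()
first = shift("person_1", date(2020, 1, 10))
second = shift("person_1", date(2020, 1, 20))
print((second - first).days)  # 10: the gap between the two events is unchanged
```

The offsets themselves must never be imported alongside the shifted data, since they would allow the original dates to be recovered.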

For additional information on the All of Us Research Program’s privacy methodology or to apply similar privacy protection principles to your data, see the resources listed below.

Please review our Data User Code of Conduct for further information.

What is the Resource Access Board (RAB)?

The All of Us Resource Access Board (RAB) is the board charged with protecting the data that participants share.

The RAB has two roles: reviewing research projects to ensure compliance with the Data User Code of Conduct (DUCC) and helping researchers with questions about program policies. The RAB is composed of members with diverse expertise in clinical research, bioethics, community-engaged research, and data privacy, as well as Participant Ambassadors. The RAB also draws on outside experts when needed.

How the RAB reviews workspaces
When researchers begin a project in the Researcher Workbench, they must create a workspace description, which is publicly available in the Research Project Directory. Each workspace description contains a field where anyone may request a review of a project through the directory. The RAB is responsible for reviewing these workspaces, either upon request or as part of a routine workspace audit.

After a review is initiated, the RAB will examine the workspace to determine whether there are any violations of the DUCC. This includes careful consideration of whether projects may be discriminatory or stigmatizing to any individuals, groups, or communities.

If there are no violations, the research may continue. If the RAB finds a violation or has concerns about a potential future violation, then they can take a number of actions, including requesting changes to the research. For serious violations, the RAB may also recommend that the program sanction the researcher, end the project, have the researcher’s account disabled, or take other measures as needed.

How the RAB provides guidance
In addition to conducting project reviews to ensure that researchers are complying with All of Us policies, the RAB is always available to assist researchers with compliance.

Researchers may contact the RAB directly at AOUResourceAccess@od.nih.gov with questions about complying with the Data User Code of Conduct and accompanying policies. This may include questions about crafting a meaningful workspace description, preventing stigmatizing research, complying with the Data and Statistics Dissemination (DSD) Policy, or other topics. The RAB also reviews requests for exceptions from the DSD Policy, which researchers can submit through the Data and Statistics Dissemination Policy Exception Request Form.

For more information on the RAB, please see this article in Research Roundup.

To confirm that your research products are compliant with relevant program policies, please review the All of Us Publication, Presentation, and Poster checklist.

What happens if I ask the Resource Access Board to review my research purpose?

You will still be able to create a workspace and begin your research. The Resource Access Board (RAB) will review your research and contact you if they have clarifying questions or guidance on how to alter your research purpose so that it does not stigmatize a particular population.

Please review our All of Us Research Program Stigmatizing Research Policy, All of Us Research Program Ethical Conduct of Research Policy, and All of Us Research Program User Appeals Policy for further information.

What happens if someone requests review of my research purpose on the All of Us Research Hub?

If someone requests a review of your research purpose, the request will be routed to the program’s Resource Access Board (RAB). The RAB may contact you for clarification or to ask you to adjust your research purpose. If the RAB has serious concerns about your research, it may ask you to pause your work while it adjudicates the concern.

Please review our All of Us Research Program Stigmatizing Research Policy, All of Us Research Program Ethical Conduct of Research Policy, and All of Us Research Program User Appeals Policy for further information.

How does the use of Artificial Intelligence (AI) or Machine Learning (ML) tools fit within the context of All of Us Data Use Policies and platform use?

Artificial Intelligence (AI) or Machine Learning (ML) tools for use in data analysis are permissible when working with All of Us Research Program data as long as the tools comply with and the users adhere to existing policies, including the Data User Code of Conduct (DUCC) and the Data and Statistics Dissemination Policy.

Because program policy prohibits the download and/or removal of participant-level data from the Researcher Workbench, AI and ML tools can be used outside of the Researcher Workbench only on downloaded summary statistics resulting from initial analyses performed on the Researcher Workbench. Many AI and ML tools have corresponding R packages, which are available to researchers within the Researcher Workbench.

Can I run Artificial Intelligence (AI) or Machine Learning (ML) tools on All of Us participant data?

Yes, but you must do this within the All of Us Researcher Workbench environment if your tools access or process individual-level data. All usage is subject to the Data User Code of Conduct (DUCC), which prohibits the download or copying of data outside of the Researcher Workbench environment except for certain aggregated data.

Training a model on individual-level data is generally acceptable as long as the model training tools run only within the Researcher Workbench environment (e.g., as code or software that is downloaded and run on a virtual machine and that does not interact with an outgoing API). The DUCC does not constrain which analytical methods or tools are allowed, as long as your use of those tools complies with DUCC requirements.

Can I connect to external Artificial Intelligence (AI) or Machine Learning (ML) services from the Researcher Workbench?

Probably not, although certain limited cases are permitted.

Any data transfer out of the Researcher Workbench environment, whether to a user's device or to a third-party service such as an Artificial Intelligence (AI) or Machine Learning (ML) API, must comply with the Data User Code of Conduct (DUCC).

So for example, you can't send individual-level data on a number of individuals to the ChatGPT API and ask it to summarize their medical histories. However, if you have a collection of aggregated data that would be permissible to export and download, those summary statistics can be sent to the ChatGPT API for analysis as long as they comply with the Data and Statistics Dissemination Policy.

Note: ChatGPT is used here as an example; the DUCC treats all external destinations identically, and any transit of individual data to an external system is considered a DUCC violation by the Researcher Workbench user who caused that transfer to happen, even if the data is not stored or even used.

If I train a model in the Researcher Workbench, can I download and/or export that model for use elsewhere?

Only if the data stored within, implied by, or discoverable via that model meets the Data User Code of Conduct (DUCC), the Data and Statistics Dissemination Policy, and other rules.

If the model contains individual-level or participant-level data, it cannot be downloaded or exported, as this would be a violation of the DUCC. Because it is usually difficult to know what types of answers an Artificial Intelligence (AI) or Machine Learning (ML) model can possibly obtain, we recommend that researchers assume the AI or ML model contains all the data it was trained with.

If export and/or download of those data are permitted according to the DUCC, then the model itself would also be permissible for export and/or download. However, if the model is trained on non-exportable data, such as individual-level data, you must assume the model may contain and reveal some of this data.

Unless you can prove that the model itself cannot possibly be used to reconstruct non-exportable data, you cannot export the model. Download and/or export of a model in cases like this is prohibited under the DUCC and would require an exemption from the All of Us Resource Access Board.

I’m still unsure whether the Artificial Intelligence (AI) or Machine Learning (ML) tool I plan to use is compliant with All of Us Data User Code of Conduct (DUCC) policies.

We recommend using a preinstalled tool on the Researcher Workbench when possible. When installing a different compatible Artificial Intelligence (AI) or Machine Learning (ML) tool of your choice, you are responsible for ensuring your use of the tool does not violate policy.

We suggest reviewing the “Read Me” file and other available details of how the tool works to ensure it does not involve an external API or otherwise export participant-level data out of the Researcher Workbench.
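For tools written in Python, one rough aid to this review is to list the network-related modules that a source file imports. The sketch below is a heuristic under stated assumptions, not a compliance check: the watch list of module names is illustrative and incomplete, the function name is hypothetical, and an empty result does not show that a tool makes no external calls.

```python
import ast
from typing import Set

# Illustrative, non-exhaustive watch list of modules commonly used for network access.
NETWORK_MODULES: Set[str] = {"requests", "urllib", "http", "socket", "httpx", "aiohttp", "openai"}


def network_imports(source: str) -> Set[str]:
    """Return the watch-listed top-level modules imported by the given Python source."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]  # "urllib.request" -> "urllib"
            if top_level in NETWORK_MODULES:
                found.add(top_level)
    return found


example_source = "import requests\nfrom urllib import request\nimport math\n"
print(sorted(network_imports(example_source)))  # ['requests', 'urllib']
```

Any module this flags is a prompt to read the surrounding code and documentation more closely. Tools can also reach external services through compiled extensions or subprocesses, which this scan will not see.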

When exporting data outside of the Researcher Workbench, remember that uploading data to an external AI or ML tool constitutes public dissemination, so ensure that what you upload does not include participant-level data of any kind.

If you have questions, you may reach out to the Researcher Workbench for question prompts to help guide your review. However, you are ultimately responsible for the tools you import or use externally and for ensuring that the data you export for sharing complies with the DUCC policies.
