Policy Questions


How do I cite the Researcher Workbench in my grants or publications?

We ask that all researchers using the Researcher Workbench acknowledge the contributions of All of Us participants to their research.

This acknowledgement belongs in all oral and written presentations, disclosures, and publications resulting from any analyses of the data. The following are examples of acknowledgement and data availability statements.

Example acknowledgement statement
“We gratefully acknowledge All of Us participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health’s All of Us Research Program for making available the participant data [and/or samples and/or cohort] examined in this study.”

Example data availability/data access statement
“This study used data from the All of Us Research Program’s [Registered/Controlled] Tier Dataset [version number], available to authorized users on the Researcher Workbench.”

Please review our All of Us Research Program Data and Statistics Dissemination Policy and All of Us Research Program Publication and Presentation Policy for further information.

How do I notify All of Us about my upcoming publication?

Researchers are required to notify the program of any publication or presentation using All of Us Research Program data at least 2 weeks before the date of publication or conference presentation. View the checklist about the reporting process.
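As a small worked example of the two-week rule, the sketch below computes the latest date by which to notify the program for a given publication or presentation date. The helper name `latest_notification_date` is hypothetical (it is not a program tool), and it assumes "2 weeks" means 14 calendar days.

```python
from datetime import date, timedelta

# Minimum advance notice stated in this FAQ, assumed to be 14 calendar days.
NOTICE_PERIOD = timedelta(weeks=2)


def latest_notification_date(event_date: date) -> date:
    """Return the last day on which notifying All of Us still gives 2 weeks' notice."""
    return event_date - NOTICE_PERIOD
```

For example, for a paper published on 15 June 2025, `latest_notification_date(date(2025, 6, 15))` returns `date(2025, 6, 1)`. Notifying earlier than this date is always fine.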

You can notify All of Us about your upcoming publication via the "Contact Us" feature under the hamburger menu (the three blue lines icon) in the upper left-hand corner of the Researcher Workbench homepage. You can also notify us directly through the Publication and Presentation Reporting Form.

The information you provide will be used by the All of Us Research Program for notification and communications planning purposes and does not require program review or approval. Neither this information nor any manuscript you submit will be shared or disseminated outside the program until after it is published. You should submit an electronic version of the final, peer-reviewed manuscript to PubMed Central immediately upon acceptance for publication. For detailed instructions on submitting a manuscript without an embargo period, visit the NIHMS Tutorials page and click Deposit Files, which contains an in-depth presentation with full screenshots of the process.

Researchers using the Researcher Workbench must also remember to acknowledge the contributions of All of Us participants to their research. This acknowledgement belongs in all oral and written presentations, disclosures, and publications resulting from any analyses of the data.

Please review our All of Us Research Program Data and Statistics Dissemination Policy, All of Us Research Program Publication and Presentation Policy, and Data User Code of Conduct for further information.

Does All of Us have intellectual property rights over products developed from data in the Researcher Workbench?

All of Us claims no intellectual property rights on products developed from research using All of Us data. All of Us supports and recommends that research products and services emerging from secondary research using All of Us data be accessible broadly.

What data or figures can I download in compliance with the Data User Code of Conduct?

As outlined in the Data User Code of Conduct, you cannot make copies of or download any participant-level data from the All of Us Researcher Workbench. Aggregate statistics that are more granular than buckets of 20 individuals may not be distributed or published without approval from the All of Us Research Program.

We strongly recommend that any downloaded data table, chart, or figure show only summary counts of at least 20 so that you do not later violate our Data and Statistics Dissemination Policy. For example, a count of 5 or 9 should be rounded up to 20; however, a count of 35 can stay as 35. This helps us protect participants from the risk of re-identification.
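The rounding rule above can be applied mechanically before a table leaves the Workbench. The sketch below is a hypothetical helper, not a program tool: it rounds any nonzero count below 20 up to 20 and leaves larger counts alone, matching the 5, 9, and 35 examples. How zero counts should be treated is an assumption here (they are left unchanged), so check the Data and Statistics Dissemination Policy for your case.

```python
from typing import Dict, Mapping

THRESHOLD = 20  # minimum reportable count described in this FAQ


def mask_small_counts(counts: Mapping[str, int], threshold: int = THRESHOLD) -> Dict[str, int]:
    """Round every nonzero count below `threshold` up to `threshold`.

    Zero counts are left unchanged (an assumption; confirm against the DSD Policy).
    """
    masked: Dict[str, int] = {}
    for label, n in counts.items():
        if n < 0:
            raise ValueError(f"negative count for {label!r}: {n}")
        masked[label] = threshold if 0 < n < threshold else n
    return masked
```

For example, `mask_small_counts({"group_a": 5, "group_b": 9, "group_c": 35})` returns `{"group_a": 20, "group_b": 20, "group_c": 35}`. Masking cells one at a time does not by itself stop a small count from being inferred, for example by subtracting published cells from a published total, so review the table as a whole as well.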

Please review our All of Us Research Program Data and Statistics Dissemination Policy, All of Us Research Program Publication and Presentation Policy, Data User Code of Conduct, and Egress Alert Policy for further information.

What do I need to do before importing external data into the Researcher Workbench to make sure I don’t violate any of the All of Us policies?

You can upload or import external data, code, or files into your workspace, but remember that you are responsible for ensuring that you have the appropriate rights to anything you upload and that you have removed all personally identifiable information (PII) from any data or files before you upload them into your workspace. PII includes, but is not limited to: names; dates; addresses or geographic information smaller than the first three digits of the ZIP code; unique ID numbers or codes such as Social Security numbers and medical record numbers; phone and fax numbers; biometric identifiers; and photographs or comparable images. When you upload external data, code, or files into your workspace, they will be available to you and to other researchers collaborating on your workspace, but not generally available to other All of Us researchers.

Please note: By agreeing to the Data User Code of Conduct, you take full responsibility for any external data, files, or software that you import into the All of Us Researcher Workbench. It is your responsibility to upload only data you are authorized to use, in accordance with any data use restrictions in place, and to ensure that the collaborators on your workspace also follow these restrictions. You may import data into the Researcher Workbench as long as they comply with All of Us policies.

For additional guidance protecting participant privacy and on complying with the All of Us policies, read the “How do I comply with All of Us policies when importing data into the Researcher Workbench?” FAQ.

Please review our Data User Code of Conduct and Egress Alert Policy for further information.

How do I comply with All of Us policies when importing data into the Researcher Workbench?

The Researcher Workbench allows you to import your own data or codebase into your individual workspace for analysis; however, you will need to take certain precautions before importing the data to ensure appropriate use and to protect data privacy.

First, make sure that you have the appropriate clearance or access to use the data and to share it with the collaborators who have access to your workspace, as outlined in the Data User Code of Conduct (DUCC). The DUCC also states that you must remove any personally identifiable information (PII), protected health information (PHI), or identifiable private information (IPI) from your data BEFORE importing any files into your workspace.

Personally Identifiable Information (PII) refers to information that can be used to distinguish or trace the identity of an individual (e.g., name, Social Security number, biometric records), either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual. Protected Health Information (PHI) refers to individually identifiable health information that is transmitted by electronic media, maintained in electronic media, or transmitted or maintained in any other form or medium. Identifiable Private Information (IPI) refers to private information where the identity of an individual is or may readily be ascertained by the investigator or associated with the information. PII generally includes PHI and IPI.

Removal of PII from data imported into your workspace
PII broadly includes any information that can be used to trace the identity of an individual. Data elements may be considered PII due to various factors, such as information that is publicly known about individuals in the database.

The Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule provides broader guidance for “de-identifying” datasets for dissemination. The Privacy Rule recommends removing 18 specific data elements that could be used to identify an individual or their relatives within the dataset. These data elements include, but are not limited to: names; dates; addresses or geographic information smaller than the first three digits of the ZIP code; unique ID numbers or codes such as Social Security numbers and medical record numbers; phone and fax numbers; biometric identifiers; and photographs or comparable images. Datasets with these data elements removed are considered ‘de-identified’ under HIPAA, provided the dataset is not known to contain any additional information that could identify individuals within the dataset.
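As an illustration of the kind of step this involves, the sketch below drops a set of direct-identifier fields from each record and reduces a ZIP code to its first three digits before upload. The field names in `DIRECT_IDENTIFIERS` and the `zip`/`zip3` keys are hypothetical placeholders for your own column names, and the sketch covers only some of the 18 HIPAA elements, so it is not a substitute for a full de-identification review. HIPAA's Safe Harbor method also requires three-digit ZIP areas with 20,000 or fewer people to be replaced with 000; this sketch does not do that.

```python
from typing import Any, Dict

# Hypothetical field names; map these to the identifier columns in your own data.
DIRECT_IDENTIFIERS = {
    "name", "ssn", "mrn", "phone", "fax", "email",
    "street_address", "birth_date", "photo_path",
}


def strip_identifiers(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` without direct identifiers and with ZIP cut to 3 digits."""
    # Build a new dict so the caller's original record is not modified.
    cleaned = {k: v for k, v in record.items() if k.lower() not in DIRECT_IDENTIFIERS}
    zip_code = cleaned.pop("zip", None)
    if zip_code is not None:
        # zfill restores a leading zero lost when a ZIP code was stored as an integer.
        cleaned["zip3"] = str(zip_code).zfill(5)[:3]
    return cleaned
```

For example, `strip_identifiers({"name": "Jane Doe", "ssn": "123-45-6789", "zip": "02139", "age_group": "40-49"})` returns `{"age_group": "40-49", "zip3": "021"}`.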

To maximize protection of participant privacy, the All of Us Research Program has incorporated our own privacy methodology into our data curation processes. In the Registered Tier data, we remove all explicit identifiers and apply additional measures, such as suppressing or generalizing additional variables considered quasi-identifiers based on re-identification risk. The privacy methodology applied for All of Us Registered Tier data is summarized below:

  • All explicit identifiers that could be used to identify individuals within the dataset or their relatives are removed. These include:
    • Names
    • All unique IDs used for any purpose outside of the Researcher Workbench (e.g., participant ID, Social Security number, medical record number, phone and fax numbers)
    • IP addresses and URLs that could be linked to individuals
  • All dates are shifted back by a random number of days between 1 and 365
  • All free-text fields in surveys and full-text clinical notes are removed
  • All geolocation data smaller than the US state level is removed, except EHR site
  • Demographic details, including:
    • The survey question on an individual’s living situation, which is removed
    • Active duty military status, which is removed
  • Diagnosis codes specifying cause of death and other conditions that may be subject to public knowledge are removed
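If you want to apply a similar date-shifting step to your own data, the sketch below shows one way to shift dates back by a random number of days between 1 and 365. It assumes a single offset per participant, so that intervals between one participant's own events are preserved; that design choice, and the `DateShifter` class itself, are illustrative assumptions and not a description of the program's actual curation pipeline.

```python
import random
from datetime import date, timedelta
from typing import Dict, Optional


class DateShifter:
    """Shift every date for a participant back by the same random number of days."""

    def __init__(self, rng: Optional[random.Random] = None, max_days: int = 365) -> None:
        # SystemRandom draws from the OS entropy source; pass a seeded Random only for testing.
        self._rng = rng if rng is not None else random.SystemRandom()
        self._max_days = max_days
        self._offsets: Dict[str, int] = {}  # participant ID -> offset in days

    def shift(self, participant_id: str, original: date) -> date:
        if participant_id not in self._offsets:
            # Draw once per participant so intervals between their events are preserved.
            self._offsets[participant_id] = self._rng.randint(1, self._max_days)
        return original - timedelta(days=self._offsets[participant_id])
```

Because every date for a participant moves by the same amount, a 10-day gap between two visits is still 10 days after shifting. The offset table links shifted dates back to real ones, so it should be kept secret or discarded once shifting is complete.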

For additional information on the All of Us Research Program’s privacy methodology or to apply similar privacy protection principles to your data, see the resources listed below.

Please review our Data User Code of Conduct for further information.

What is the Resource Access Board (RAB)?

The All of Us Resource Access Board (RAB) is the board charged with protecting the data that participants share.

The RAB has two roles: reviewing research projects to ensure compliance with the Data User Code of Conduct (DUCC) and helping researchers with questions about program policies. The RAB is composed of members with rich expertise in clinical research, bioethics, community-engaged research, and data privacy, as well as Participant Ambassadors. The RAB also draws on outside experts when needed.

How the RAB reviews workspaces
When researchers begin a project in the Researcher Workbench, they must create a workspace description, which is publicly available in the Research Project Directory. Each workspace description contains a field where anyone may request a review of a project through the directory. The RAB is responsible for reviewing these workspaces, either upon request or as part of a routine workspace audit.

After a review is initiated, the RAB will examine the workspace to determine whether there are any violations of the DUCC. This includes careful consideration of whether projects may be discriminatory or stigmatizing to any individuals, groups, or communities.

If there are no violations, the research may continue. If the RAB finds a violation or has concerns about a potential future violation, then they can take a number of actions, including requesting changes to the research. For serious violations, the RAB may also recommend that the program sanction the researcher, end the project, have the researcher’s account disabled, or take other measures as needed.

How the RAB provides guidance
In addition to conducting project reviews to ensure that researchers are complying with All of Us policies, the RAB is always available to assist researchers with compliance.

Researchers may contact the RAB directly at AOUResourceAccess@od.nih.gov with questions about complying with the Data User Code of Conduct and accompanying policies. This may include questions about crafting a meaningful workspace description, preventing stigmatizing research, complying with the Data and Statistics Dissemination (DSD) Policy, or other topics. The RAB also reviews requests for exceptions to the DSD Policy, which researchers can submit through the Data and Statistics Dissemination Policy Exception Request Form.

For more information on the RAB, please see this article in Research Roundup.

To confirm that your research products are compliant with relevant program policies, please review the All of Us Publication, Presentation, and Poster checklist.

What happens if I ask the Resource Access Board to review my research purpose?

You will still be able to create a workspace and begin your research. The Resource Access Board (RAB) will review your research and contact you if they have clarifying questions or guidance on how to alter your research purpose so that it does not stigmatize a particular population.

Please review our All of Us Research Program Stigmatizing Research Policy, All of Us Research Program Ethical Conduct of Research Policy, and All of Us Research Program User Appeals Policy for further information.

What happens if someone requests review of my research purpose on the All of Us Research Hub?

If someone requests a review of your research purpose, the request will be routed to the program’s Resource Access Board (RAB). The RAB may contact you for clarification or to request adjustments to your research purpose. If the RAB has serious concerns about your research, it may ask you to pause your work while it adjudicates the concern.

Please review our All of Us Research Program Stigmatizing Research Policy, All of Us Research Program Ethical Conduct of Research Policy, and All of Us Research Program User Appeals Policy for further information.

Can I run Artificial Intelligence (AI) or Machine Learning (ML) tools on All of Us participant data?

Yes, you may use Artificial Intelligence (AI) or Machine Learning (ML) tools when working with All of Us Research Program data, as long as both the tools and the way you use them comply with All of Us policies, including the Data User Code of Conduct (DUCC) and the Data and Statistics Dissemination (DSD) Policy. In general, All of Us does not constrain which analytical methods or tools are allowed, as long as use of those tools complies with all policy requirements.

Under the terms of the DUCC, authorized users are prohibited from downloading and/or removing participant-level data from the Researcher Workbench (RW); therefore, any use of AI and ML tools on participant-level data must take place within the All of Us environment. Authorized users may download summary statistics resulting from their analyses for use with AI and ML tools. Any upload to public tools, including Large Language Models (LLMs), constitutes dissemination and must comply with the rules outlined in the DSD Policy. 

Many AI and ML tools have corresponding R packages that are available to researchers within the RW. When relying on these or other tools, authorized users must ensure that they run only within the RW environment (e.g., as code or software downloaded and run on a virtual machine, with no interaction with an outgoing API via the installed tool). 

Can I connect to external Artificial Intelligence (AI) or Machine Learning (ML) services from the Researcher Workbench?

Only under limited circumstances.

Any data transfer out of the Researcher Workbench environment, whether to a user's device or to a third-party service such as an Artificial Intelligence (AI) or Machine Learning (ML) API, must comply with the Data User Code of Conduct (DUCC) and all other All of Us policies.

For example, you may not send individual-level data on a number of participants to the ChatGPT API and ask it to summarize their medical histories. However, if you have a collection of aggregated data that would be permissible to export and download, you can send those summary statistics to the ChatGPT API for analysis as long as they comply with the Data and Statistics Dissemination Policy. That is, any upload to a public instance of ChatGPT must not reveal participant counts of less than 20.
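One way to put this distinction into practice is a local pre-flight check that runs on an aggregated table before any of it is sent to an external service. The sketch below makes no network calls; it only raises an error if a row contains a participant-level field or reveals a nonzero count below 20. The field names `count`, `person_id`, and `participant_id` are hypothetical placeholders for your own schema, and passing this check does not by itself make an upload compliant, since small counts can still be inferred from combinations of cells.

```python
from typing import Iterable, Mapping

MIN_COUNT = 20  # counts below this must not be revealed, per this FAQ
# Hypothetical column names that would indicate participant-level rows.
PARTICIPANT_LEVEL_KEYS = {"person_id", "participant_id"}


def check_safe_to_send(rows: Iterable[Mapping[str, object]], count_key: str = "count") -> None:
    """Raise ValueError if any row looks participant-level or reveals a count below MIN_COUNT."""
    for i, row in enumerate(rows):
        leaked = PARTICIPANT_LEVEL_KEYS & set(row)
        if leaked:
            raise ValueError(f"row {i} has participant-level field(s): {sorted(leaked)}")
        n = row.get(count_key)
        if not isinstance(n, int):
            raise ValueError(f"row {i} has no integer {count_key!r} field")
        if 0 < n < MIN_COUNT:
            raise ValueError(f"row {i} reveals a count of {n} (< {MIN_COUNT})")
```

A call such as `check_safe_to_send([{"age_group": "40-49", "count": 35}])` returns without error, while a row with `"count": 5` or a `"person_id"` field raises `ValueError` before anything leaves the Workbench.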

Note: ChatGPT is used here as an example; the DUCC treats all external destinations identically, and transit of individual-level data to any external system is considered a DUCC violation, even if the data is not stored or used.

If I train a model in the Researcher Workbench, can I download and/or export that model for use elsewhere?

At this time, you cannot export models trained on participant-level data. NIH prohibits the download and dissemination of generative AI models trained on genomic data and its derivatives (see NOT-OD-25-81). The program is still in the process of working with agency and departmental leadership to determine appropriate boundaries around the download and dissemination of AI/ML models trained on other types of sensitive participant data. For now, we encourage you to explore using the Community Workspaces option within the Researcher Workbench to share models trained on participant-level data.

If you train your model on summary data that complies with the Data and Statistics Dissemination Policy (i.e., data that does not reveal participant counts of less than 20), you may export your model.

Please email support@researchallofus.org if you have additional questions about what is allowed.

I’m still unsure whether my planned use of Artificial Intelligence (AI) or Machine Learning (ML) tools is compliant with the All of Us Data User Code of Conduct (DUCC) and other policies. How should I proceed?

We recommend using a preinstalled tool on the Researcher Workbench when possible. When installing a different compatible Artificial Intelligence (AI) or Machine Learning (ML) tool, you are responsible for ensuring your use of the tool does not violate the Data User Code of Conduct or other All of Us policies.

We suggest reviewing the “Read Me” file and other available details of how the tool works to ensure it does not involve an external API or otherwise export participant-level data out of the Researcher Workbench. Remember that uploading data to a public AI or ML tool constitutes public dissemination, and any upload of summary statistics must comply with the Data and Statistics Dissemination Policy.

If you train a model on All of Us participant-level data, you may not download or disseminate it at this time.

If you have questions, you may reach out to the Researcher Workbench support team for help, but you are ultimately responsible for the tools you import and any data you download from the RW.

What are data collection policies shown on Researcher Workbench 2.0, and how are these different from All of Us data access and use policies?

Data collections are curated datasets published in Verily Pre, the platform that powers Researcher Workbench 2.0. There is currently one All of Us data collection available in this new Workbench: “All of Us Registered Tier.” The “All of Us Controlled Tier” will be provided in a subsequent release. These two data collections are equivalent to the curated datasets available in the legacy Researcher Workbench. When you log in to the Researcher Workbench using your @researchallofus.org username, you will automatically be provided access to any All of Us data collections for which you have completed the associated access requirements. The same data access requirements that you are familiar with from the legacy Researcher Workbench (e.g., ID verification, Responsible Conduct of Research Training, Data User Code of Conduct attestation) are in place for gaining access to these data collections.

All data collections available through Verily Pre, including the All of Us Researcher Workbench 2.0 data collections, come with data collection policies that explicitly delineate built-in technical parameters to enforce data access and use restrictions. The same parameters were in place for All of Us data on the legacy Researcher Workbench, but they were not named or presented in the same way. These technical parameters, for which Verily Pre broadly uses the term ‘policy,’ are distinct from the All of Us data access and use policies that you are already familiar with, which outline the program’s rules for access to and use of All of Us data on the Researcher Workbench. These All of Us policies have not changed, and researchers are still responsible for reviewing and complying with them independently of the data collection policies. The full list of data use and researcher policies can be viewed on this page.

How do I comply with All of Us policies when using GitRepo on the Researcher Workbench 2.0?

GitRepo (Git repository) is intended for tracking and managing changes to code, scripts, and documentation, enabling version control. GitRepo is hosted outside of the Researcher Workbench and could be accessible outside of the Researcher Workbench by anyone with permissions to that repository. Please note that sharing any participant-level data or direct participant counts of fewer than 20 outside of the Researcher Workbench would violate the All of Us Data User Code of Conduct and Data and Statistics Dissemination Policy. Therefore, to remain in compliance with All of Us policies, researchers using GitRepo in their workspace should ensure that no participant-level data or counts of <20 (direct or inferable) are included in the code, scripts, or documentation they add to GitRepo. This includes code outputs and any counts embedded in the code as comments or filters. Please review your code and clear all outputs before committing it to GitRepo.
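For Jupyter notebooks, clearing outputs before a commit can be scripted. The sketch below uses only the Python standard library and assumes the standard nbformat 4 JSON layout (a top-level "cells" list whose code cells carry "outputs" and "execution_count"); the function name `clear_notebook_outputs` is hypothetical. It does not inspect comments or hard-coded filters in the source cells, which still need a manual review for small counts.

```python
import json
from pathlib import Path
from typing import Union


def clear_notebook_outputs(path: Union[str, Path]) -> int:
    """Remove outputs and execution counts from every code cell; return how many cells changed."""
    nb_path = Path(path)
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    changed = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no outputs
        if cell.get("outputs") or cell.get("execution_count") is not None:
            changed += 1
        cell["outputs"] = []
        cell["execution_count"] = None
    # Jupyter itself writes notebooks as JSON with an indent of 1.
    nb_path.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
    return changed
```

Running `clear_notebook_outputs("analysis.ipynb")` rewrites the file in place, so it is best run on the copy you are about to commit.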
