Building a Dataset with the Dataset Builder

The Dataset Builder is a point-and-click tool in the All of Us Researcher Workbench for selecting the data you want to analyze and assembling it into a dataset.

There are three steps to building a dataset: selecting your cohort, selecting your concept set(s), and selecting your values. Before you select your cohort or your values, create the concept set(s) you want to use.

A concept set is a group of concepts from a single data domain that you want to examine in your cohort. Read “Exploring Concepts with OMOP and SQL” for information about concepts and concept relationships.
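To make the single-domain rule concrete, here is a minimal Python sketch of a concept set as a data structure. The `ConceptSet` class, its fields, and the concept IDs used below are hypothetical illustrations, not the Researcher Workbench's internal model or real OMOP concept IDs.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ConceptSet:
    """Hypothetical model of a concept set: named concepts from ONE domain."""
    name: str
    description: str
    domain: str  # e.g. "Condition"; every concept must belong to this domain
    concept_ids: Set[int] = field(default_factory=set)

    def add(self, concept_id: int, concept_domain: str) -> None:
        """Add a concept, enforcing the single-domain rule."""
        if concept_domain != self.domain:
            raise ValueError(
                f"concept {concept_id} belongs to {concept_domain!r}, "
                f"but this set holds only {self.domain!r} concepts"
            )
        self.concept_ids.add(concept_id)


# Usage with made-up concept IDs (not real OMOP concepts):
conditions = ConceptSet("My conditions", "Illustrative condition concepts", "Condition")
conditions.add(1001, "Condition")
try:
    conditions.add(2002, "Drug")  # a drug concept does not belong in this set
except ValueError as err:
    print(err)
```

Rejecting cross-domain concepts up front mirrors the definition above: concepts from another domain (for example, a drug alongside conditions) would go into a separate concept set.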

To create your concept set(s)

  1. Click the plus icon (a white plus sign on a blue circle) to the right of “Datasets.”

    The Dataset Builder page opens. It has four sections: Select Cohorts (Participants) on the left, Select Concept Sets (Rows) in the middle, Select Values (Columns) on the right, and Preview Dataset at the bottom.

  2. Click the plus icon (a white plus sign on a blue circle) to the right of “Select Concept Sets (Rows).”
    Note: Some prepackaged concept sets are available to you under the Select Concept Sets section.

    The Search Concepts page opens. It lists the domains (conditions, drug exposures, labs and measurements, etc.) and the data available in the Researcher Workbench.

  3. Type in the search bar or explore the domains, survey questions, and more to find your concept(s) of interest.

    After you select a domain or search for a concept, a list of concepts matching your criteria appears.

  4. Click on the concept(s) of interest to add them to your concept set.

    Each concept has a green plus icon to its left; click it to add that concept. When you have added all your concepts, use the blue Finish & Review button in the bottom right.

  5. Click “Finish & Review.”

    A panel titled “Selected Concepts” appears on the right, listing all the concepts you selected. To remove a concept, click the red x icon to its left. The blue Save Concept Set button is in the bottom right of the panel.

  6. Click “Save Concept Set.”

    A pop-up appears with radio buttons asking whether to add the concept(s) to an existing concept set or create a new concept set.

  7. Select whether you want to add the concept(s) to an existing concept set or create a new concept set.

    Below the radio buttons are text fields for your concept set’s name and description.

  8. Name your concept set and add a description for your concept set.

    Once you have entered a name and a description, the grayed-out Save button in the bottom right turns blue.

  9. Click “Save.”

    After you click Save, the Concept Set Saved Successfully screen appears.

  10. Now, you can choose to create another concept set or create a dataset.

    The Concept Set Saved Successfully screen offers two options. To create another concept set, click the Create Another Concept Set button at the bottom of the left-hand card. To create a dataset, click the Create a Dataset button at the bottom of the center card.
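The save dialog in steps 6 through 9 boils down to a small decision: merge the selected concepts into an existing set, or create a new set once both a name and a description are filled in. The Python sketch below models that flow with a plain dictionary. `save_concept_set`, `save_enabled`, and the dictionary layout are hypothetical names for illustration only, not a Workbench API, and the sketch assumes the name and description requirement applies only when creating a new set.

```python
from typing import Dict, Optional, Set

# Hypothetical stand-in for a workspace's saved concept sets:
# set name -> {"description": str, "concepts": set of concept IDs}
Workspace = Dict[str, dict]


def save_enabled(name: str, description: str) -> bool:
    """The Save button turns blue only once name and description are filled in."""
    return bool(name.strip()) and bool(description.strip())


def save_concept_set(workspace: Workspace, selected: Set[int], *,
                     existing: Optional[str] = None,
                     name: str = "", description: str = "") -> str:
    """Save `selected` concepts and return the name of the set that was saved."""
    if existing is not None:
        # "Existing set" option: merge the new concepts into that set.
        # (Assumption: no new name or description is needed in this case.)
        workspace[existing]["concepts"] |= selected
        return existing
    # "New set" option: both fields are required before saving.
    if not save_enabled(name, description):
        raise ValueError("a new concept set needs a name and a description")
    workspace[name] = {"description": description, "concepts": set(selected)}
    return name


# Usage with made-up concept IDs:
workspace: Workspace = {}
save_concept_set(workspace, {1001, 1002},
                 name="My conditions", description="Illustrative condition concepts")
save_concept_set(workspace, {1003}, existing="My conditions")
```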

To build your dataset

  1. Select your cohort under the “Select Cohorts (Participants)” column on the left by clicking the checkbox.
    Note: Under “Select Cohorts (Participants),” you can see a prepackaged cohort of “all participants.” This will pull ALL the participant data in the Researcher Workbench. We do not recommend pulling the entire database into your workspace.

  2. Select your concept set(s) under “Workspace Concept Sets” in the “Select Concept Sets (Rows)” column by clicking the checkbox.

  3. Confirm your values under the “Select Values (Columns)” column on the right by clicking the checkbox.
    • All values are selected by default. You can deselect any values that you do not want to bring into your dataset.

  4. Click “View Preview Table” to display a preview of the resulting table based on your Dataset Builder selections.

    The View Preview Table link is in the top right of the Preview Dataset card at the bottom of the Dataset Builder screen.

    A preview table of the dataset you are building appears. Scroll through it to confirm that all the fields you want in your dataset are present. Two buttons appear in the bottom right of the screen: Create Dataset (blue) and Analyze (grayed out).

  5. Click “Create Dataset.”

    A pop-up appears with two text fields for your dataset’s name and description.

  6. Name your dataset and add a description for your dataset.

    The Save button in the bottom right of the pop-up stays grayed out until you enter both a name and a description; then it turns blue.

  7. Click “Save.”

    The Dataset Builder reappears. The two buttons in the bottom right now read Save Dataset and Analyze instead of Create Dataset and Analyze. Save Dataset stays grayed out unless you change your dataset, and Analyze is now blue.

  8. Click “Analyze.”

    A pop-up titled Export Dataset appears. Here you choose your programming language (R, Python, or SAS) with radio buttons and preview the generated code.

  9. Select your programming language (R, Python, or SAS) and complete the fields required by your analysis tool (Jupyter Notebook, RStudio, or SAS Studio).

    If you select R or Python, you can export the code to a new or existing Jupyter Notebook. To use RStudio or SAS Studio, copy the code from the Export Dataset screen, launch RStudio or SAS Studio, paste in the code, and run your analysis.

    For detailed steps on how to export and analyze your dataset by analysis tool, read “Exporting and analyzing your dataset.”

    Note: If you are familiar with SQL, you can skip the Cohort Builder and the Dataset Builder and query the All of Us data directly in a Jupyter Notebook. We do not recommend this for researchers with limited SQL knowledge.
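The three column headings in the Dataset Builder describe how your selections shape the result: the cohort picks participants, the concept sets pick rows, and the values pick columns. The Python sketch below models that on a few in-memory rows; it is not the code the Export Dataset pop-up generates. The column names follow the OMOP CDM `condition_occurrence` table, while the `build_dataset` function and the participant and concept IDs are hypothetical. In SQL terms, the cohort and concept-set filters roughly correspond to a `WHERE` clause and the value selection to the `SELECT` list.

```python
from typing import Dict, Iterable, List, Set

Row = Dict[str, object]


def build_dataset(source_rows: Iterable[Row], cohort: Set[int],
                  concept_ids: Set[int], concept_column: str,
                  values: List[str]) -> List[Row]:
    """Hypothetical sketch: filter by cohort and concept set, keep chosen columns.

    cohort      -> which participants (person_id) are included
    concept_ids -> which rows are included
    values      -> which columns each row keeps
    """
    dataset = []
    for row in source_rows:
        if row["person_id"] not in cohort:
            continue  # participant is not in the selected cohort
        if row[concept_column] not in concept_ids:
            continue  # row's concept is not in the selected concept set(s)
        dataset.append({col: row[col] for col in values})
    return dataset


# Made-up rows shaped like OMOP condition_occurrence records:
condition_occurrence = [
    {"person_id": 1, "condition_concept_id": 1001, "condition_start_date": "2020-01-05"},
    {"person_id": 1, "condition_concept_id": 9999, "condition_start_date": "2020-02-01"},
    {"person_id": 2, "condition_concept_id": 1001, "condition_start_date": "2021-03-10"},
    {"person_id": 3, "condition_concept_id": 1002, "condition_start_date": "2019-07-22"},
]
preview = build_dataset(condition_occurrence, cohort={1, 3},
                        concept_ids={1001, 1002},
                        concept_column="condition_concept_id",
                        values=["person_id", "condition_start_date"])
```

Deselecting a value in the Select Values (Columns) section corresponds to leaving that name out of `values` here: the column disappears from every row, while the cohort and concept-set filters still decide which rows remain.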

Looking for more information on the Dataset Builder? Check out our YouTube Channel for a video about using the Dataset Builder.

After building your dataset, you can begin your analysis in the Researcher Workbench.
