Creating a dataset and selecting types of data to analyze: using the Dataset Builder tool and Concept Set Selector tool
Now that you have a cohort, you can build your dataset using the three-step Dataset Builder tool. A dataset takes the list of participants that fall within your cohort criteria and allows you to look at specific aspects of their health or survey data. As discussed previously, All of Us data are structured in tables. Each table consists of rows and columns. Table rows represent participants and columns provide information about the participants. Below is an example of what this looks like for the PERSON table (not real participant data).
Other tables contain many more columns. For example, the DRUG table has 27 columns as it needs to provide information on the drug and a comprehensive account of drug usage for each participant, including but not limited to the following: drug name, route of administration, dosage, days’ supply, number of refills, start date, and end date. When building a dataset, you can choose only the columns you want to see.
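As a rough illustration of keeping only the columns you want, the sketch below builds a tiny made-up table in pandas and selects a subset of its columns. The column names and every row are invented placeholders chosen for this example; they are not the real DRUG table schema and not participant data.

```python
import pandas as pd

# Invented placeholder rows and simplified, hypothetical column names.
drug = pd.DataFrame(
    {
        "person_id": [1, 1, 2],
        "drug_name": ["drug A", "drug B", "drug A"],
        "route": ["oral", "oral", "topical"],
        "days_supply": [30, 90, 14],
        "refills": [2, 0, 1],
        "start_date": pd.to_datetime(["2020-01-05", "2020-03-01", "2021-07-15"]),
        "end_date": pd.to_datetime(["2020-02-04", "2020-05-30", "2021-07-29"]),
    }
)

# Keep only the columns of interest; every row (drug record) is retained.
columns_of_interest = ["person_id", "drug_name", "start_date", "end_date"]
drug_subset = drug[columns_of_interest]
print(drug_subset)
```

Selecting fewer columns does not change which records are returned; it only narrows what you see for each record.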
To build a dataset, you will select your cohort(s), concept set(s), and values (columns) of interest. For example, continuing with the depression cohort used in the article “Selecting participants: using the Cohort Builder tool,” we can look at specific measurements derived from a complete blood count, demographics, and how members of the cohort answered a specific survey question. To do this, we’ll need to select our depression cohort under (1) Select Cohorts and then build concept sets.
Concept Set Selector
A concept set is a group of concepts from a single data domain that you want to examine in your cohort. Let’s say that you want to know more about the results of tests typically performed by a physician. You also want to know if the participant is a self-reported smoker and find out if they are taking any specific medications. Below is an example of three concept sets from three different domains you could build.
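One way to picture the "single data domain" rule is as a small data structure that refuses concepts from another domain. The sketch below is only a mental model, not how the Workbench is implemented: the class names, concept IDs, and concept names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    concept_id: int  # invented ID, for illustration only
    name: str
    domain: str


@dataclass
class ConceptSet:
    name: str
    domain: str
    concepts: list = field(default_factory=list)

    def add(self, concept: Concept) -> None:
        # A concept set groups concepts from one domain only.
        if concept.domain != self.domain:
            raise ValueError(
                f"{concept.name!r} is in domain {concept.domain!r}, "
                f"but this set only accepts {self.domain!r}"
            )
        self.concepts.append(concept)


# Three concept sets from three different domains, mirroring the example above.
labs = ConceptSet("Routine lab tests", domain="Measurements")
smoking = ConceptSet("Self-reported smoking", domain="Surveys")
meds = ConceptSet("Medications of interest", domain="Drugs")

labs.add(Concept(1, "Hemoglobin", "Measurements"))
smoking.add(Concept(2, "Smoking status question", "Surveys"))
meds.add(Concept(3, "Example medication", "Drugs"))

try:
    labs.add(Concept(3, "Example medication", "Drugs"))
except ValueError as err:
    print(err)
```

Because each set is tied to one domain, examining measurements, survey answers, and medications together takes three separate concept sets.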
To create a concept set, click on the plus sign next to Select Concept Sets. Here, you can search the different domains to add concepts of interest. Selecting concepts will add them to your Selected Concepts.
When you have added all of the concepts of interest, click Finish & Review. After reviewing your concept set, select Save Concept Set. You will then be prompted either to add the concepts you just selected to an existing concept set or to create a new concept set.
If you choose to add to an existing set, the dropdown menu will list only existing concept sets in the same domain.
To add concept sets to your dataset, select the cohort(s) you would like to use when building your dataset, and then under (2) Select Concept Sets select the concept set you would like to add.
You can preview the resulting table, save it, and export it to a notebook so you can begin your analysis. If you have questions about the types of data available or how they are organized within the dataset, see our articles “How the All of Us data are organized” and “Observational Medical Outcomes Partnership (OMOP) common data model (CDM).”
After building your dataset, you can export it to a notebook so that you can start working with and analyzing the data. The Dataset Builder generates a query of the All of Us data tables based on the cohort criteria you set, and the concept sets and columns you selected determine what the resulting pandas dataframe contains. For an in-depth tutorial on using Jupyter Notebooks, see this article.
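To make the query-to-dataframe idea concrete, here is a minimal, self-contained sketch. It is not the code the Dataset Builder exports: it uses an in-memory SQLite table as a stand-in for an All of Us data table, and the cohort IDs, concept IDs, and rows are invented placeholders. The column names are assumed to resemble those of an OMOP measurement table.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for a data table; all rows are invented placeholders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement ("
    "person_id INTEGER, measurement_concept_id INTEGER, "
    "value_as_number REAL, measurement_date TEXT)"
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        (1, 101, 13.5, "2020-01-05"),
        (2, 101, 12.1, "2020-02-11"),
        (2, 202, 250.0, "2020-02-11"),
        (3, 101, 14.8, "2020-03-20"),
    ],
)

cohort_person_ids = [1, 2]  # hypothetical cohort members
concept_set_ids = [101]     # hypothetical concept IDs from one concept set
selected_columns = ["person_id", "value_as_number", "measurement_date"]


def placeholders(values):
    """Return one '?' SQL parameter marker per value."""
    return ", ".join("?" for _ in values)


# Cohort limits the participants, concept set limits the records,
# selected columns limit what is returned for each record.
sql = (
    f"SELECT {', '.join(selected_columns)} FROM measurement "
    f"WHERE person_id IN ({placeholders(cohort_person_ids)}) "
    f"AND measurement_concept_id IN ({placeholders(concept_set_ids)}) "
    "ORDER BY person_id"
)

dataset_df = pd.read_sql_query(sql, conn, params=cohort_person_ids + concept_set_ids)
conn.close()
print(dataset_df)
```

The three inputs play distinct roles: the cohort decides whose records are included, the concept set decides which kinds of records are included, and the selected columns decide which fields appear in the dataframe.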
You can also learn more about the Concept Set Selector and Dataset Builder using the following video: