Selecting participants: using the Cohort Builder tool


Now that you have created a workspace and understand more about the All of Us data, one of the first things you may want to do is build a cohort. The Cohort Builder tool provides a point-and-click interface that allows you to include and/or exclude the types of participants you want to study. In Cohort Review, you can also review and annotate a subset of your cohort to validate your inclusion/exclusion criteria.

Here we will cover the basics of building a cohort.


Creating a New Cohort

  • Within a workspace, select the "DATA" tab (you will likely start here by default).
  • Click on the icon in the “Cohorts” box to create a new cohort.
  • Select "ADD CRITERIA" to start.


Add Criteria

  • When you select "ADD CRITERIA," a drop-down menu appears.
  • You can search for a keyword across all domains, or select an option from either "Program Data" or "Domains."
    • Additional options under "Demographics" include: "Current Age/Deceased," "Ethnicity," "Gender Identity," "Race," and "Sex Assigned at Birth."



  • The criteria you select for your cohort are added to your shopping cart.


  • To delete criteria, click on the snowman icon to the left of the criteria selected, then click "Delete criteria."




And/Or Criteria

  • For criteria within a group, use the OR operator: participants must meet at least one of the criteria in that group, but not necessarily all of them.
  • For criteria in separate groups, use the AND operator: a participant must meet the criteria of every group to be added to the cohort.
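
The AND/OR behavior described above can be pictured with sets. The following is only a minimal sketch, not how the Cohort Builder is implemented; the participant IDs and criterion names are made up for illustration.

```python
# Hypothetical participant IDs matching each criterion (illustrative only).
has_depression = {1, 2, 3, 4}
has_anxiety = {3, 4, 5}
age_18_to_40 = {2, 3, 5, 6}

# Group 1: criteria within one group are combined with OR (set union).
group_1 = has_depression | has_anxiety

# Group 2: a separate group that contains a single criterion.
group_2 = age_18_to_40

# Separate groups are combined with AND (set intersection).
cohort = group_1 & group_2

print(sorted(cohort))  # [2, 3, 5]
```

Participant 6 meets the age criterion but neither condition in Group 1, so that participant is not added to the cohort.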


Include/Exclude Participants

  • You have the option to include and exclude participants who meet your specified criteria.
  • Use the left panel labeled "Include Participants" to include participants. 
  • Use the middle panel labeled “And Exclude Participants” to exclude participants.
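
Conceptually, the two panels behave like a set difference: participants who meet the exclusion criteria are removed from those who meet the inclusion criteria. A small sketch with made-up participant IDs (an illustration of the idea, not the tool's internals):

```python
# Hypothetical participant IDs (illustrative only).
included = {101, 102, 103, 104, 105}  # meet the "Include Participants" criteria
excluded = {103, 105, 106}            # meet the "And Exclude Participants" criteria

# Anyone who matches an exclusion criterion is dropped from the cohort.
cohort = included - excluded

print(sorted(cohort))  # [101, 102, 104]
```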


Apply Optional Modifiers

  • Once a criterion is added to your shopping cart, you have the option to “Apply optional Modifiers” to the selected cohort criteria.


"Results by"

  • The right panel keeps a running total ("Total Count") of participants that will be included in your cohort.
  • Here, you can see a breakdown of your cohort by gender identity or sex at birth, and by current age, age at consent, or age at CDR. Click "REFRESH" once your selection is made to update the displays below.
  • Click on the three uneven bars to the right of “Results by Gender Identity, Current Age, and Race” to see results by percentage rather than count.
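
The count and percentage views show the same breakdown in two forms. As a rough sketch, assuming a short made-up list of category labels (not real All of Us data), the conversion looks like this:

```python
from collections import Counter

# Hypothetical gender identity values for eight participants.
gender_identity = [
    "Female", "Female", "Male", "Female",
    "Male", "Non-binary", "Female", "Male",
]

# Count view: number of participants per category.
counts = Counter(gender_identity)
total_count = sum(counts.values())

# Percentage view: each category's share of the total count.
percentages = {
    category: round(100 * n / total_count, 1)
    for category, n in counts.items()
}

print(dict(counts))   # {'Female': 4, 'Male': 3, 'Non-binary': 1}
print(percentages)    # {'Female': 50.0, 'Male': 37.5, 'Non-binary': 12.5}
```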


We've covered the basics of adding and excluding criteria. Now let's explore how to find the criteria you are looking for.


Demographics


  • The "Age" category will default to "Current Age"; however, you can select "Age at Consent" or "Age at CDR" as well. You can also move the slider bars to change the age range of interest.
  • By selecting "Deceased," participants who are coded as deceased within the dataset will be added to your cohort if they fit your cohort criteria.
  • You can add other demographics such as "Ethnicity," "Gender Identity," "Race," and/or "Sex Assigned at Birth" to your criteria.
  • If you do not find a demographic category that you were expecting to see, please visit the All of Us participant privacy article for more details.
  • Click "Finish & Review" to save your selections.
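
The three age options differ only in the reference date used to compute age. The sketch below assumes that "Age at Consent" is measured at the consent date and "Age at CDR" at the data cutoff date of the dataset; all dates are invented for illustration, and the function name `age_on` is hypothetical.

```python
from datetime import date


def age_on(birth: date, reference: date) -> int:
    """Return the age in whole years on the reference date."""
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


# Hypothetical dates for one participant (illustrative only).
birth = date(1980, 6, 15)
consent_date = date(2018, 3, 1)   # assumed reference for "Age at Consent"
cdr_cutoff = date(2022, 7, 1)     # assumed reference for "Age at CDR"
current_date = date(2024, 6, 14)  # assumed reference for "Current Age"

age_at_consent = age_on(birth, consent_date)  # 37
age_at_cdr = age_on(birth, cdr_cutoff)        # 42
current_age = age_on(birth, current_date)     # 43

# The slider bars then act as a range filter on the chosen age.
low, high = 40, 45
print(low <= age_at_consent <= high)  # False
print(low <= age_at_cdr <= high)      # True
```

The same participant can fall inside or outside an age range depending on which age option is selected.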


Survey Data

  • All surveys with available data will be listed.
  • Select an entire survey by clicking the plus sign next to the survey of interest to add all of the participants who took the survey to your cohort.
  • Select the > to see all questions and answers within a survey.
    • Select a question by clicking the plus sign next to the question of interest to add all of the participants who answered that question to your cohort.
    • Select an answer by clicking the plus sign next to the answer of interest to add all of the participants who selected that particular answer to your cohort.
    • The COVID-19 Participant Experience (COPE) Survey is a versioned survey, so you can select which version(s) you want to include in your dataset for this survey. To include all of the questions in all of the survey versions, click the plus sign next to the survey name. To select specific questions or questions in only specific versions, click on the > next to the survey name, then click the icon next to the question(s) of interest. You can then choose a specific version to include data from, or all survey versions ("Any version").


Physical Measurements Data

Physical measurements are collected at the time of participant enrollment. These do not come from the participant’s EHR. When you see the symbol next to a measurement, you can select a specific value or range of values for that measurement.

  • When you select "Blood Pressure," you can choose from a pre-selected range of blood pressures or enter your own range by clicking on the symbol.

  • Values for "Heart Rate," "Height," "Weight," "BMI," "Waist Circumference," and "Hip Circumference" can all be specified by clicking on the symbol.
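
Entering your own range amounts to a simple filter on the measured values. The following sketch uses invented readings and cut-offs; the ranges are hypothetical and are not the tool's pre-selected blood pressure categories.

```python
from dataclasses import dataclass


@dataclass
class BloodPressure:
    person_id: int
    systolic: int   # mmHg
    diastolic: int  # mmHg


# Hypothetical enrollment measurements (illustrative only).
readings = [
    BloodPressure(1, 118, 76),
    BloodPressure(2, 135, 88),
    BloodPressure(3, 150, 95),
    BloodPressure(4, 128, 79),
]


def in_range(value: int, low: int, high: int) -> bool:
    """Inclusive range check, like a 'between' value filter."""
    return low <= value <= high


# Custom range: systolic 120-139 and diastolic 70-89 (made-up cut-offs).
selected = [
    r.person_id
    for r in readings
    if in_range(r.systolic, 120, 139) and in_range(r.diastolic, 70, 89)
]

print(selected)  # [2, 4]
```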


Conditions Domain

The "Conditions" domain is an example of when you need to think about source versus standard codes. In the conditions search bar, you can search by source code (e.g., ICD9), standard code (e.g., SNOMED), or by entering the name of a condition. Recall that when you add criteria by a standard code, it will encompass the source codes that were mapped to it according to the Observational Medical Outcomes Partnership (OMOP) common data model (CDM).

Let’s explore this in more detail by using the same example found in the article “Exploring concepts and concept relationships within Observational Medical Outcomes Partnership (OMOP).”

Start by searching for the ICD10CM concept code F33. Under the search bar, you’ll see a note that displays the number of participants with this source code and prompts you to browse the Standard Vocabulary for more results.




If you click on Standard Vocabulary, you’ll see the SNOMED code and a larger number of participants with this condition found in their EHR.



This is because the OMOP CDM maps codes from multiple source vocabularies to standard vocabularies, so a standard code captures participants recorded under any of the source codes mapped to it. Ultimately, it is up to you, the researcher, whether you prefer to use the source or standard vocabulary to build your cohort.
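
To see why the standard code returns more participants, consider a toy mapping. F33 and 66344007 come from the example above; the ICD9CM entry, its mapping, and the EHR rows are assumptions added purely for illustration.

```python
# Illustrative source-to-standard mapping keyed by (vocabulary, code).
# The ICD9CM entry is an assumed second source code for the same concept.
source_to_standard = {
    ("ICD10CM", "F33"): 66344007,
    ("ICD9CM", "296.3"): 66344007,
}

# Hypothetical EHR rows: (person_id, vocabulary, source_code).
records = [
    (1, "ICD10CM", "F33"),
    (2, "ICD9CM", "296.3"),
    (3, "ICD10CM", "F33"),
    (4, "ICD9CM", "296.3"),
]

# Searching by the source code only finds rows recorded with that exact code.
by_source = {
    pid for pid, vocab, code in records
    if (vocab, code) == ("ICD10CM", "F33")
}

# Searching by the standard code finds every row whose source code maps to it.
by_standard = {
    pid for pid, vocab, code in records
    if source_to_standard.get((vocab, code)) == 66344007
}

print(len(by_source), len(by_standard))  # 2 4
```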



Another thing to keep in mind is the relationship between concepts. When you select and add SNOMED 66344007 to the criteria, you are also adding any concepts it subsumes. Click on the “View Hierarchy” symbol to see all the subsumed concepts. You can choose to include all the subsumed concepts by adding the 66344007 "Recurrent major depression" condition, or you can get more granular and include one or more of the subsumed concepts instead.
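
Subsumption can be thought of as collecting every descendant of a concept in the hierarchy. In the sketch below, only 66344007 comes from the example above; the child concept IDs and the `children` table are hypothetical placeholders, not real SNOMED concepts.

```python
# Hypothetical parent -> children table (child IDs are placeholders).
children = {
    66344007: [9000001, 9000002],
    9000001: [9000003],
}


def descendants(concept_id: int, children: dict) -> set:
    """Collect every concept below concept_id in the hierarchy."""
    found = set()
    stack = [concept_id]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


# Adding the parent concept also brings in everything it subsumes.
broad_selection = {66344007} | descendants(66344007, children)

# A more granular choice: one subsumed concept and its own descendants.
narrow_selection = {9000001} | descendants(9000001, children)

print(sorted(broad_selection))   # [9000001, 9000002, 9000003, 66344007]
print(sorted(narrow_selection))  # [9000001, 9000003]
```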




Procedures Domain

Procedures can also be searched by source or standard code. As in the Conditions domain, procedures are mapped from source to standard vocabulary and exist within a hierarchy that can be viewed in the Cohort Builder tool.


Drug Domain

Drug data from the Drug domain and EHRs are mapped from source to standard codes and are organized according to the Anatomical Therapeutic Chemical (ATC) classification scheme. For example, search for a drug by its brand name, such as Zoloft. You will see a note that "Only ingredients may be added to your search criteria." The Cohort Builder tool allows you to search by brand name, see the ingredient (in this case, Sertraline), and add it. As with conditions and procedures, you can also view where the ingredient falls within the ATC hierarchy.
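
As a rough illustration of the brand-to-ingredient lookup and the ATC hierarchy, the sketch below relies on the fact that ATC codes are nested by prefix (levels of 1, 3, 4, 5, and 7 characters). The lookup tables are tiny hand-written stand-ins, and the ATC code shown for sertraline is included as an illustrative assumption rather than taken from the Cohort Builder.

```python
# Hand-written stand-ins for the brand and ingredient lookups (illustrative).
brand_to_ingredient = {"zoloft": "sertraline"}
ingredient_to_atc = {"sertraline": "N06AB06"}  # assumed ATC code

# ATC codes nest by prefix: the five levels use 1, 3, 4, 5, and 7 characters.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_path(code: str) -> list:
    """Return the chain of ATC levels from the top group down to the code."""
    return [code[:n] for n in ATC_LEVEL_LENGTHS if n <= len(code)]


# Searching by brand name resolves to the ingredient, which is what gets added.
ingredient = brand_to_ingredient["Zoloft".lower()]
path = atc_path(ingredient_to_atc[ingredient])

print(ingredient)  # sertraline
print(path)        # ['N', 'N06', 'N06A', 'N06AB', 'N06AB06']
```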




Labs and Measurements Domain

The standard vocabulary for measurements is Logical Observation Identifiers Names and Codes (LOINC). The Cohort Builder tool allows you to define a value or range of values for a measurement. For example, if you search for Vitamin B12, the matching measurements are returned. By clicking on the symbol next to a measurement, you can define a value or range of values for it. As with the other EHR domains, you can also view the measurement within the LOINC hierarchy.




Genomics Data

If you have access to the Controlled Tier dataset, you can use the Cohort Builder to include participants in your cohort with genomic data of interest. Using the "variant search" feature, you can select all participants with structural variant data or select participants with specific SNP/Indel Variants. To learn more about the variant search tool, please see this support article: Variant Search within the Cohort Builder



You can also learn how to use the Cohort Builder tool using the following video:


