Exploring Concepts with OMOP and SQL

  • Updated

Note: We recommend reading “Understanding OMOP Basics” before reading the following article as a basic understanding of the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is helpful to understanding concepts and concept relationships.

For a hands-on workshop about OMOP and electronic health record (EHR) data, watch the “Intro to EHR Data Workshop.”

All data collected from participants and health care organizations via surveys, physical measurements, and electronic health records (EHRs) are expressed as “concepts” in OMOP.

Understanding how concepts are defined, organized, and the potential complex relationships that can exist between them in the All of Us dataset will help you decide which concepts to select when you’re building your research cohort.

A benefit of a common data model like OMOP is that the source vocabulary is maintained in the database. When a concept is mapped in OMOP, the source vocabulary is coded as a concept_code.

The concept_code represents the identifier of the concept in the source vocabulary, such as SNOMED-CT, ICD-9, RxNorm, etc. Each concept is also assigned a concept_id which is a unique string of numbers given to a concept.

Because source vocabulary is maintained, you can query data with the source vocabulary or the standard vocabulary. OMOP uses standard vocabulary to map concepts to one another. For a list of the standard vocabulary used by OMOP and an overview of the mapping process, read “Understanding OMOP Basics.”

There are two ways to see if a concept_id is standard or not.

  1. Using Athena, a publicly available website maintained by the OMOP community, to browse and explore all the concepts in OMOP.

    To identify if a concept is standard or not, you will search by typing into the search bar or by exploring topics by OMOP domain. For example, let’s search for “Type 2 diabetes.”

    Screenshot of the search results for Type 2 diabetes with Athena. Detailed description of the search result elements are included in the article.

    The search results display in a list format with multiple columns. The first and second columns are the concept_id and the concept_code followed by the concept_name. The table also includes if the concept is standard or not, which OMOP domain the concept belongs to, and what the standard vocabulary is for the concept.

    Using our example, you can see in the Type 2 diabetes search results that the first listing is marked as the standard concept while the second listing is marked as non-standard.

  2. Querying the concept table and concept relationship table with SQL in the All of Us Researcher Workbench. If the column standard_concept_code is “S” then that concept is standard. If the column is “None,” then that concept is not standard.

Exploring concepts with SQL

Let’s work through an example of pulling data for specific concepts using the programming language SQL.

For this example, we are going to use SQL to build and perform our queries of the OMOP tables. If you do not have SQL experience, you can build your queries using the point-and-click tools, Cohort Builder and Dataset Builder.

The standard form of a SQL query includes

SELECT column_name_1, column_name_2, … (You can use * to select all columns from a table)
  FROM table_name
  WHERE condition_to_be_met

Using the standard form of a SQL query, let’s build a query using the example depression with the concept_name, standard concept_id, standard concept_code, or any concept_code.

  1. With a concept_name

    Note: Concept_name is not a unique identifier. For example, several concepts are available within the All of Us dataset that describe “depression.” For this example, we will focus on a specific type of depression: major depressive disorder.

    SQL Query:
    SELECT *
      FROM concept
       WHERE concept_name = "Major depressive disorder"
    Results:
    concept_id

    concept_

    name

    domain_

    id

    vocabulary_

    id

    concept_

    class_id

    standard_

    concept*

    concept_

    code

    valid_

    start_

    date

    valid_

    end_

    date

    invalid_

    reason

    1568218 Major depressive disorder, recurrent Condition ICD10CM 3-char nonbill code None F33 2012-01-01 2099-12-31 None
    4152280 Major depressive disorder Condition SNOMED Clinical Finding S 370143000 1970-01-01 2099-12-31 None
    44825293 Major depressive disorder, recurrent episode Condition ICD9CM 4-dig nonbill code None 296.3 1970-01-01 2099-12-31 None
    *You can see in this example that we have standard and non-standard concepts from our query in the standard_concept column.
  2. With the standard concept_id

    Note: When working with All of Us data, it’s important to work with the standard concept. You can locate the standard concept_id using Athena.

    SQL Query:
    SELECT *
      FROM concept
       WHERE concept_id = 4152280
    Results:
    concept_id

    concept_

    name

    domain_

    id

    vocabulary_

    id

    concept_

    class_id

    standard_

    concept*

    concept_

    code

    valid_

    start_

    date

    valid_

    end_

    date

    invalid_

    reason

    4152280

    Major depressive disorder

    Condition

    SNOMED

    Clinical Finding

    S

    370143000

    1970-01-01

    2099-12-31

    None

    *You can see in this example that we used the standard concept_id as denoted by the S in the standard_concept column.
  3. With the standard concept_code

    Note: When working with All of Us data, it’s important to work with the standard concept. You can locate the standard concept_code using Athena.

    SQL Query:
    SELECT *
      FROM concept
       WHERE concept_code = ‘37014300’
    Results:
    concept_id

    concept_

    name

    domain_

    id

    vocabulary_

    id

    concept_

    class_id

    standard_

    concept*

    concept_

    code

    valid_

    start_

    date

    valid_

    end_

    date

    invalid_

    reason

    4152280

    Major depressive disorder

    Condition

    SNOMED

    Clinical Finding

    S

    370143000

    1970-01-01

    2099-12-31

    None

    *You can see in this example that we used the standard concept_code as denoted by the S in the standard_concept column.
  4. With the a concept_code

    What if you don’t know the concept_name, standard concept_id, or standard concept_code, but you do know the code in another vocabulary (i.e., ICD-9, ICD-10, CPT, etc.)?

    Because OMOP retains the source vocabulary, you can search with the code you know. Let’s say you know that major depressive disorder is code 296.30 in ICD-9.

    SQL Query:
    SELECT *
      FROM concept
       WHERE concept_code = ‘296.3’
    Results:

    concept_id

    concept_

    name

    domain_

    id

    vocabulary_

    id

    concept_

    class_id

    standard_

    concept*

    concept_

    code

    valid_

    start_

    date

    valid_

    end_

    date

    invalid_

    reason

    44825293

    Major depressive disorder, recurrent episode

    Condition

    ICD9CM

    4-dig nonbill code

    None

    296.30

    1970-01-01

    2099-12-31

    None

    *You can see in this example that we used the non-standard concept_code as denoted by the None in the standard_concept column.

    Let’s say you want to find the standard concept for the non-standard concept. Using the concept_id from the query above, you can find the equivalent standard concept_id with another OMOP table: concept_relationship.

    SQL Query:
    SELECT *
      FROM concept_relationship
       WHERE concept_id_1 = 44825293
    Results:

    concept_id_1*

    concept_id_2**

    relationship_id

    valid_start_date

    valid_end_date

    invalid_reason

    44825293

    44829923

    Subsumes

    2014-10-01

    2099-12-31

    None

    44825293

    44826498

    Is a

    2014-10-01

    2099-12-31

    None

    44825293

    44831089

    Subsumes

    2014-10-01

    2099-12-31

    None

    44825293

    44831090

    Subsumes

    2014-10-01

    2099-12-31

    None

    44825293

    44834589

    Subsumes

    2014-10-01

    2099-12-31

    None

    44825293

    44827655

    Subsumes

    2014-10-01

    2099-12-31

    None

    44825293

    4282316

    Maps to

    2018-02-07

    2099-12-31

    None

    44825293

    44835785

    Subsumes

    2014-10-01

    2099-12-31

    None

    44825293

    44827656

    Subsumes

    2014-10-01

    2099-12-31

    None

    *concept_id_1 is the concept’s source code.
    **concept_id_2 is the concept’s standard code.

    The standard concept_id is the string of numbers in the concept_id_2 column of the concept with “Maps to” in the relationship_id column. In the example, 4282316 is the standard concept_id for major depressive disorder.

Exploring concept relationships with SQL

Concepts stored in a common data model like OMOP can have complex relationships. Using the same example as above related to major depressive disorder, we know there are several different types of recurrent major depression, including mild recurrent major depression.

You can explore concepts and their relationships to one another using the concept_relationship table, which we briefly touched on earlier.

Let’s query the concept_relationship table to find concepts that relate to major recurrent depression using the standard concept_id (4282316).

SQL Query:

SELECT *
  FROM concept_relationship
   WHERE concept_id_1 = 4282316

Results:

concept_id_1

concept_id_2

relationship_id

4282316

4098302

Is a

4282316

433991

Subsumes

4282316

44825293

Mapped from

4282316

4228802

Subsumes

4282316

4094358

Subsumes

4282316

43531624

Subsumes

4282316

44831089

Mapped from

4282316

35207161

Mapped from

4282316

4282316

Mapped from

4282316

4324959

Subsumes

4282316

1568218

Mapped from

In the table, you can see all the concepts (concept_id_2) that were mapped to the standard concept for major recurrent depression (concept_id_1 = “4282316”) in the All of Us dataset.

To see additional information about the concepts (concept_id_2), you can join the concept table with the concept_relationship table.

SQL Query:

SELECT cr.relationship_id, c.*
   FROM concept_relationship cr
   JOIN concept c
   ON cr.concept_id_2 = c.concept_id
   WHERE cr.concept_id_1 = 4282316

Results:

relationship_

id

concept_

id

concept_

name

domain_

id

vocabulary_

id

concept_

class_id

standard_

concept

concept_

code

valid_

start_

date

valid_

end_

date

invalid_

reason

Mapped from

44831089

Major depressive affective disorder, recurrent…

Condition

ICD9CM

5-dig billing code

None

296.30

1970-01-01

2099-12-31

None

Mapped from

1568218

Major depressive disorder, recurrent

Condition

ICD10CM

3-dig nonbill code

None

F33

2012-01-01

2099-12-31

None

Is a

4152280

Major depressive disorder

Condition

SNOMED

Clinical Finding

S

370143000

1970-01-01

2099-12-31

None

Is a

4098302

Recurrent depression

Condition

SNOMED

Clinical Finding

S

191616006

1970-01-01

2099-12-31

None

Subsumes

36714998

Moderately severe recurrent major depression

Condition

SNOMED

Clinical Finding

S

720452006

2017-01-31

2099-12-31

None

Subsumes

4094358

Chronic recurrent major depressive disorder

Condition

SNOMED

Clinical Finding

S

2618002

1970-01-01

2099-12-31

None

Note: This table is only a portion of the resultant table from the SQL query above.

You can see the relationship each concept has in the relationship_id column.

relationsip_id Nature of Relationship
Mapped from Source vocabulary
Is a Ancestor concept
Subsumes Descendant concept

Using the major recurrent depression example, you can see how all the concepts related to major recurrent depression connect together.

Infographic displaying the concepts and concept relationships for the concept  recurrent major depression, including how the concept recurrent major depression can have multiple descendant concepts which are listed as “subsumes” in the relationship_id field of the concept_relationship table and how the concept recurrent major depression can have multiple ancestor concepts which are listed as “is a” in the relationship_id field of the concept_relationship table.

Next articles

Data Dictionaries

Explore all the metadata and data tables used to populate the datasets in the Researcher Workbench

Using the All of Us Researcher Workbench

Learn about the tools and support resources available in the Researcher Workbench.

Was this article helpful?

19 out of 20 found this helpful

Have more questions? Submit a request

Comments

0 comments

Article is closed for comments.