Within the All of Us Research Program, participants have an opportunity to share their electronic health records (EHR) for research. The participant’s affiliated health care organization may send the data, or participants may have the program collect it directly through the All of Us Participant Portal using API applications, following their agreement to share this information.
Inclusion Criteria
Participants are eligible to share their EHRs following consent to join All of Us. Participants who agree to share sign a separate EHR consent form. See information about the All of Us consent process for more background. EHR data are collected only for participants who sign the EHR consent.
Data Collection and Transformation
EHR data types available in the curated data repository (CDR) include demographics, visits, diagnoses, procedures, medications, laboratory tests, and vital signs. The EHR data collected by the program are submitted by healthcare provider organizations or shared directly by participants through their participant portals. The data are transformed into the OMOP common data model structure. Recorded data elements are coded as source concepts (e.g., ICD-9, ICD-10) and mapped to standard concepts (e.g., SNOMED) based on international health standards. OMOP categorizes clinical data concepts according to several major health domains.
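As a minimal sketch of how a source concept relates to a standard concept, the example below builds a tiny in-memory stand-in for the OMOP `concept` and `concept_relationship` vocabulary tables and follows the "Maps to" relationship. The concept IDs and the two rows are placeholders chosen for illustration, and the helper name `map_to_standard` is hypothetical; the vocabulary tables in the CDR are far larger.

```python
import sqlite3

# Tiny in-memory stand-in for two OMOP vocabulary tables.
# The concept IDs (1001, 2001) are hypothetical placeholders, not real OMOP IDs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_name TEXT,
    vocabulary_id TEXT,
    concept_code TEXT,
    standard_concept TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER,
    concept_id_2 INTEGER,
    relationship_id TEXT
);
INSERT INTO concept VALUES
    (1001, 'Type 2 diabetes mellitus without complications', 'ICD10CM', 'E11.9', NULL),
    (2001, 'Type 2 diabetes mellitus', 'SNOMED', '44054006', 'S');
INSERT INTO concept_relationship VALUES (1001, 2001, 'Maps to');
""")


def map_to_standard(source_concept_id):
    """Return (concept_id, name, vocabulary) for each standard concept
    that the given source concept maps to."""
    return conn.execute(
        """
        SELECT std.concept_id, std.concept_name, std.vocabulary_id
        FROM concept_relationship AS cr
        JOIN concept AS std ON std.concept_id = cr.concept_id_2
        WHERE cr.concept_id_1 = ?
          AND cr.relationship_id = 'Maps to'
          AND std.standard_concept = 'S'
        """,
        (source_concept_id,),
    ).fetchall()


print(map_to_standard(1001))
```

Keeping both the source and the standard concept lets an analysis group records by the standard vocabulary while still tracing each record back to the code that was originally submitted.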
During the data transformation process, a series of steps are completed to ensure data conforms to both OMOP and All of Us standards. Details can be referenced in the CDR data dictionary release notes. Steps include:
- Validation checks that occur before and after data are transferred to the All of Us Data and Research Center (DRC) for CDR inclusion. The goal is to ensure that data conforms to OMOP naming conventions as well as data type and column structures.
- Characterization checks that occur after data are transferred to the DRC and are completed before and after CDR inclusion. The goal is to use OMOP-supported tools like the Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (Achilles) to assess person-level data characterization and density.
- Select tables, fields, and concepts are suppressed (i.e., removed). The goal is to ensure that data conforms to the All of Us privacy protection principles and rules.
- Select fields within the Measurement table undergo a small amount of cleaning. The goal is to help support data plausibility and utility for select data of high priority and research interest.
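To make the idea of a validation check concrete, here is a small sketch that compares rows against an expected set of column names and data types. It is not the DRC's validation code: the abbreviated specification, the function name `validate_rows`, and the sample rows are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical, abbreviated specification: column name -> expected Python type.
# The real OMOP specification has more columns and its own type system.
CONDITION_OCCURRENCE_SPEC = {
    "condition_occurrence_id": int,
    "person_id": int,
    "condition_concept_id": int,
    "condition_start_date": date,
}


def validate_rows(rows, spec):
    """Return a list of human-readable problems found in `rows`."""
    problems = []
    for i, row in enumerate(rows):
        missing = sorted(set(spec) - set(row))
        unexpected = sorted(set(row) - set(spec))
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
        if unexpected:
            problems.append(f"row {i}: unexpected columns {unexpected}")
        for column, expected_type in spec.items():
            if column in row and not isinstance(row[column], expected_type):
                problems.append(
                    f"row {i}: {column} should be {expected_type.__name__}, "
                    f"got {type(row[column]).__name__}"
                )
    return problems


rows = [
    {"condition_occurrence_id": 1, "person_id": 10,
     "condition_concept_id": 2001, "condition_start_date": date(2020, 1, 5)},
    # Misspelled column name and a date stored as text.
    {"condition_occurence_id": 2, "person_id": 11,
     "condition_concept_id": 2001, "condition_start_date": "2020-02-01"},
]
for problem in validate_rows(rows, CONDITION_OCCURRENCE_SPEC):
    print(problem)
```

The first row passes; the second is reported for a missing column, an unexpected column, and a wrongly typed date.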
Below is summary information about EHR data storage and structure to help with getting started. Researchers who have not previously worked with EHR data structured according to OMOP should learn about the model by browsing the OMOP website. The CDR data dictionaries should also be referenced for more details about CDR tables, fields, and privacy applications. Table and field availability as well as privacy protections are subject to change between CDR releases. Researchers should refer to the data dictionary materials relevant to the CDR version associated with their workspace project.
OMOP Domain Tables for Categorizing EHR Data | ||
Domain | Type of Information | Integration with other EHR Tables |
Condition Occurrence | Acute or chronic conditions that were diagnosed during interactions with healthcare providers. Allows temporal representation of conditions (e.g., start and end dates). Data are often mapped to source vocabularies such as ICD-9 and ICD-10 and standard vocabularies such as SNOMED. | Project design may require linking data from this table to other tables such as the Visit Occurrence table (to associate conditions with specific healthcare visits). |
Observation | Range of observed findings related to health care examinations, assessments, or procedures. Includes any clinical observation or finding that cannot be represented in other domain tables (e.g., symptoms, social, lifestyle, medical history observations). Data are often mapped to source vocabularies such as ICD-9 and ICD-10 and standard vocabularies such as SNOMED and LOINC. | Project design may require linking data from this table to other tables such as the Visit Occurrence table (to associate observations with specific care visits) and the Measurement table (for measurement values). |
Drug Exposure | Medications and other drug exposures received during health care encounters. Prescription medications, over-the-counter drugs, vaccines, and other therapeutic substances are included. Data are often mapped to source vocabularies such as the FDA’s National Drug Code (NDC) and mapped to standard vocabularies such as RxNorm. | Project design may require linking data from this table to other tables such as the Visit Occurrence table (to associate drug exposures with specific healthcare visits) and the Condition table (to link drug exposures with specific medical conditions). |
Procedure | Medical procedures and interventions including surgical procedures, diagnostic tests, therapeutic interventions, etc. Data are often mapped to source vocabularies such as ICD-9 and standard vocabularies such as CPT. | Project design may require linking data from this table to other tables such as the Visit Occurrence table (to associate procedures with specific healthcare visits) and the Condition table (to link procedures with specific medical conditions). |
Measurement | Structured measure or value of a lab test or assessment. Includes physical measurements, blood assay measurements, vital signs, etc. and values are structured either numerically or categorically depending on the type. The Measurement and Observation tables sometimes contain categorical data that overlap. In those cases, the data differ in that the Measurement table stores quantitative or qualitative results from standardized tests and the Observation table stores an observed factual determination rather than a result. | Project design may require linking data from this table to other tables such as the Observation table to associate lab values with other observed findings. |
In addition to the above domain tables, OMOP stores clinical data and accompanying metadata throughout other tables. Some tables are suppressed in the Registered and/or Controlled Tier CDRs to protect privacy. Similarly, some concepts within the available tables are suppressed. Other available tables include the Person, Condition Occurrence, Visit Occurrence, Visit Detail, Procedure Occurrence, Observation Period, and Death tables.
The condition, observation, procedure, drug exposure, measurement, and visit occurrence tables are commonly used. When creating projects in the Researcher Workbench, researchers can use the Cohort and Dataset Builder tools to quickly select project inclusion and exclusion criteria and data concepts of interest from the Condition, Observation, Procedure, Drug Exposure, and Measurement tables. Given the size and breadth of the Visit Occurrence table, researchers who want to accompany those clinical data with additional visit details will need to extract them from the CDR directly once the dataset has been imported into an analysis application.
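The sketch below illustrates the kind of join involved when visit details are attached to condition records. It uses an in-memory SQLite database with made-up rows rather than the CDR, and it assumes that the Condition Occurrence table carries a `visit_occurrence_id` field, which is part of the OMOP specification but is not listed in this article. The concept IDs are placeholders, and the SQL dialect used against the CDR may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit_occurrence (
    visit_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    visit_concept_id INTEGER,
    visit_start_date TEXT,
    visit_end_date TEXT
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT,
    visit_occurrence_id INTEGER
);
-- Made-up rows; all concept IDs are placeholders.
INSERT INTO visit_occurrence VALUES
    (500, 10, 9001, '2020-01-05', '2020-01-05'),
    (501, 10, 9002, '2021-03-10', '2021-03-14');
INSERT INTO condition_occurrence VALUES
    (1, 10, 2001, '2020-01-05', 500),
    (2, 10, 2002, '2021-03-11', 501),
    (3, 10, 2003, '2022-07-01', NULL);
""")

# LEFT JOIN keeps conditions that have no linked visit (visit columns are NULL).
query = """
SELECT co.condition_occurrence_id,
       co.condition_concept_id,
       vo.visit_concept_id,
       vo.visit_start_date,
       vo.visit_end_date
FROM condition_occurrence AS co
LEFT JOIN visit_occurrence AS vo
       ON vo.visit_occurrence_id = co.visit_occurrence_id
      AND vo.person_id = co.person_id
ORDER BY co.condition_occurrence_id
"""
linked = conn.execute(query).fetchall()
for row in linked:
    print(row)
```

A left join is used so that conditions recorded without a linked visit are kept rather than silently dropped; an inner join would be the choice if only visit-linked conditions are of interest.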
The table below includes a subset of commonly used and frequently asked about fields that are indexed within each table. Again, for more extensive details about OMOP tables and extensions, fields, and privacy application, refer to the CDR data dictionaries. Note: A selection of extension tables has been introduced across different CDR version releases. For example, the OMOP survey conduct table was first populated and introduced with CDR version 7, and the Observation Period, Drug Era, and Condition Era tables were first populated and introduced with CDR version 8.
Examples of OMOP Table Fields for Categorizing EHR Data | |
Condition Occurrence Table | |
Field Name | Description |
condition_occurrence_id | A unique identifier for each condition occurrence event. |
person_id | An identifier linking the condition occurrence event to the person. |
condition_concept_id | A standardized concept identifier representing the clinical condition in a standardized vocabulary (e.g., SNOMED-CT, ICD-10). |
condition_start_date | The date when the instance of the condition was first recorded. |
condition_end_date | The date when the condition is considered to have resolved. |
condition_type_concept_id | Indicates the type of occurrence (e.g., primary diagnosis, secondary diagnosis, etc) and reflects the source data from which the occurrence was recorded. |
Observation Table | |
Field Name | Description |
observation_id | A unique identifier for each observation. |
person_id | An identifier linking the observation to the person. |
observation_concept_id | A standardized concept identifier representing the clinical observation in a standardized vocabulary (e.g., LOINC for laboratory results, SNOMED-CT for clinical findings). |
observation_date | The date when the observation was recorded. |
observation_type_concept_id | Indicates the type of observation (e.g., laboratory test result, clinical finding, vital sign). |
value_as_number, value_as_string, value_as_concept_id | Depending on the nature of the observation, one of these fields will be populated. Data will represent a numeric result (value_as_number), text (value_as_string), or coded value (value_as_concept_id). |
observation_source_value | The original source value as recorded in the source system (e.g., specific laboratory test code). |
Drug Exposure Table | |
Field Name | Description |
drug_exposure_id | A unique identifier for each drug exposure event. |
person_id | An identifier linking the drug exposure to the person. |
drug_concept_id | A standardized concept identifier representing the specific drug or substance in a standardized vocabulary (e.g., RxNorm, SNOMED-CT). |
drug_exposure_start_date | The start date of the drug exposure period such as the start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded. |
drug_exposure_end_date | The end date of the drug exposure period. This field may not be populated, as the data are not available from all sources. |
drug_type_concept_id | Indicates the type of drug exposure (e.g., prescription dispensed, prescription written, over-the-counter purchase, vaccine administered). |
quantity | The quantity of the prescribed drug administered or dispensed (e.g., number of pills, dosage strength) according to the original prescription or dispensing record. |
days_supply | The number of days the prescribed drug supply is intended to last (applicable for prescriptions) according to the original prescription or dispensing record. |
Procedure Occurrence Table | |
Field Name | Description |
procedure_occurrence_id | A unique identifier for each procedure occurrence record. |
person_id | An identifier linking the procedure occurrence to the person. |
procedure_concept_id | A standardized concept identifier representing the specific procedure or intervention in a standardized vocabulary (e.g., SNOMED-CT, CPT-4). |
procedure_date | The date when the procedure was performed. |
procedure_type_concept_id | Indicates the type of procedure (e.g., surgical procedure, diagnostic procedure, therapeutic procedure). |
modifier_concept_id | Additional information about the procedure, such as technique (e.g., bilateral). |
Visit Occurrence Table | |
Field Name | Description |
visit_occurrence_id | A unique identifier for each visit occurrence encounter. |
person_id | An identifier linking the visit occurrence to the person. |
visit_concept_id | A standardized concept identifier representing the type of visit (e.g., outpatient visit, inpatient admission, emergency department visit) in a standardized vocabulary (e.g., SNOMED-CT, CPT-4). |
visit_start_date | The start date for the visit. |
visit_end_date | The end date for the visit. The end date will match the start date if it was a one day visit. |
visit_type_concept_id | Indicates the type of visit (e.g., ambulatory visit, inpatient visit, emergency visit). |
Measurement Table | |
Field Name | Description |
measurement_id | A unique identifier for each measurement. |
person_id | An identifier linking the measurement to the person. |
measurement_concept_id | A standardized concept identifier representing the type of measurement in a standardized vocabulary (e.g., LOINC). |
measurement_date | The date of the measurement. |
measurement_type_concept_id | Additional information about the measurement type. |
value_as_number, value_as_concept_id | Depending on the nature of the measurement, either of these fields may be populated or omitted. Data will represent a numeric result (value_as_number) or coded value (value_as_concept_id). |
unit_concept_id, range_low, range_high | If the measurement is for a lab test, these fields will be populated. The fields indicate the standard unit type in a standardized vocabulary (e.g., UCUM) as well as the lower limit (range_low) and higher limit (range_high) of the normal range of the Measurement result. The lower and higher limits are assumed to be in the same unit of measure as the Measurement value. |
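As an illustration of how the Measurement value and range fields fit together, the following sketch summarizes a single row. The function name `describe_measurement`, the dictionary representation of a row, and the sample values are assumptions for this example; real rows come from the CDR and may leave any of these fields empty.

```python
def describe_measurement(row):
    """Summarize one Measurement row given as a dict of field name -> value."""
    value = row.get("value_as_number")
    if value is None:
        # No numeric result: fall back to the coded value, if there is one.
        concept = row.get("value_as_concept_id")
        if concept is None:
            return "no value recorded"
        return f"coded value (concept {concept})"
    low, high = row.get("range_low"), row.get("range_high")
    if low is None or high is None:
        return f"{value} (no normal range provided)"
    if value < low:
        return f"{value} (below normal range {low}-{high})"
    if value > high:
        return f"{value} (above normal range {low}-{high})"
    return f"{value} (within normal range {low}-{high})"


# Made-up rows; 9191 is a placeholder concept ID.
examples = [
    {"value_as_number": 5.4, "range_low": 4.0, "range_high": 10.0},
    {"value_as_number": 12.5, "range_low": 4.0, "range_high": 10.0},
    {"value_as_number": 7.0},
    {"value_as_concept_id": 9191},
    {},
]
for row in examples:
    print(describe_measurement(row))
```

The comparison against range_low and range_high relies on the point made in the table: the limits are assumed to share the unit of the measurement value, so unit_concept_id should be checked before pooling values across rows.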
In addition to privacy protection applications, a small amount of programmatic cleaning is applied to select fields within the Measurement table.
Cleaning Rules Applied to Table Fields for EHR Data | ||
Application | Affected Field | Resulting Output |
Standardize select, high priority lab and vital units and values | unit_concept_id; value_as_number | For cortisol mass by volume measurements (measurement_concept_id = 3009905) submitted in thousands per cubic millimeter (unit_concept_id = 8961), the unit_concept_id for all measurements with this unit is set to thousands per microliter (unit_concept_id = 8848) and the value is multiplied by 1. |
Normalize units for height, then apply a cleaning algorithm to ensure rows seem reasonable based on history | unit_concept_id; value_as_number | Height units are normalized and checked for plausibility given history. If height is: - between 0.9 and 2.3, it is assumed to be entered in meters and is converted to cm; - between 3 and 7.5, it is assumed to be entered in feet and is converted to cm; - between 36 and 89.9, it is assumed to be entered in inches and is converted to cm. For example, if height is recorded as 80 cm at one time and 120 cm at another for the same person, the two measures disagree, with a standard deviation of 28 (which is > 10), so both of these records are dropped. |
Remove values that are either implausible or have no utility | value_as_number and all other applicable fields | The value_as_number field is nulled for "9999999" values. Rows with measurement_concept_id = 0 are dropped. |
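The cleaning rules in the table can be sketched in a few lines. This is an illustration under stated assumptions, not the program's cleaning code: the function names are hypothetical, the conversion factors are the ordinary ones (100 cm per meter, 30.48 cm per foot, 2.54 cm per inch), values outside the three listed ranges are assumed to already be in cm, and the disagreement check uses the sample standard deviation because that reproduces the value of about 28 quoted for 80 cm and 120 cm.

```python
from statistics import stdev


def height_to_cm(value):
    """Convert a height to cm using the value ranges listed in the table.

    Values outside the three ranges are returned unchanged, on the
    assumption that they are already in cm.
    """
    if 0.9 <= value <= 2.3:    # assumed to be meters
        return value * 100
    if 3 <= value <= 7.5:      # assumed to be feet
        return value * 30.48
    if 36 <= value <= 89.9:    # assumed to be inches
        return value * 2.54
    return value


def consistent_heights(heights_cm, max_sd=10):
    """Return the heights if they agree with each other, else an empty list."""
    if len(heights_cm) > 1 and stdev(heights_cm) > max_sd:
        return []
    return list(heights_cm)


def clean_rows(rows):
    """Drop rows with measurement_concept_id = 0 and null the 9999999 value."""
    cleaned = []
    for row in rows:
        if row["measurement_concept_id"] == 0:
            continue
        row = dict(row)  # copy so the caller's data is not modified
        if row.get("value_as_number") == 9999999:
            row["value_as_number"] = None
        cleaned.append(row)
    return cleaned


print(height_to_cm(1.75))              # a value in the meters range
print(stdev([80, 120]))                # about 28.3, which exceeds 10
print(consistent_heights([80, 120]))   # both records dropped
print(clean_rows([
    {"measurement_concept_id": 0, "value_as_number": 5},
    {"measurement_concept_id": 3009905, "value_as_number": 9999999},
]))
```

Note that a height already recorded in cm, such as 80, also falls inside the inches range, so a real implementation would need to take the submitted unit into account before converting; the sketch leaves that decision to the caller.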
Considerations for Working with EHR Data
EHRs contain data that span the participant’s entire course of care within the affiliated health care organization system. This means that the volume and breadth of available data will vary by participant and recorded event. In some cases, data may be recorded as far back as early childhood and, in others, may span only a few years or less. In addition, although these data are made available for research use, they were originally recorded for clinical care. A common data model helps to standardize the data elements, but researchers should expect individual records to differ in completeness and correctness. There may also be instances where self-reported participant responses from surveys are prioritized for mapping in the OMOP Person table over EHR, such as this example. For answers to frequently asked questions about All of Us EHR data, see the FAQ page here.