Participant death is recorded in Electronic Health Records (EHRs) and from HealthPro reporting. This information is now stored in the aou_death table. Both the Registered and Controlled Tiers contain death data, however the Registered Tier has a date shift and the cause is suppressed. See the table below for a summary and review the How All of Us protects participant privacy and Data Dictionary support articles for more detail.
All death records are now provided in aou_death table. Deceased status information is now available from 2 sources: EHR (src_id = EHR sites) and HealthPro, a program portal for collecting participant data by program staff (src_id = Staff Portal : HealthPro). Deceased status information has historically been sourced from EHR (if available). In September 2020, All of Us Research Program launched deceased status reporting in HealthPro. Starting in the CDRv8, this additional source of participant deceased status is made available to researchers. Currently, program reported cause of death from HealthPro is NOT provided in the CDR as it is collected as free text.
Death | Registered Tier | Controlled Tier |
Date | Random Shift - backwards between 1 & 365 | As Collected |
Cause | Suppressed | As Collected |
To extract death data you have two options, outlined below:
1. Utilize the Dataset Builder Tool:
Note: The Cohort Builder and Dataset Builder Tools utilize the death table, so the information is limited to 1 record per participant.
This method is recommended for those unfamiliar with SQL and are using the Controlled Tier dataset. Within the tool you can search the keyword 'death' and find all related EHR concepts, including but not limited to, 'Death of unknown cause' 'Death in hospital' etc. The benefit of this method is it allows you to be specific in the types of death. However, this method is not an efficient way to collect death in the Registered Tier because of field suppressions. See this article for details on how to use the Dataset Builder Tool.
2. Query the Death table using SQL:
This is the recommended method and will extract all death data most efficiently. Within a desired notebook, you will write a SQL query extracting the Death table. Within the SQL you will choose which fields you would like to include. See the lists below for available fields based off access tier:
Registered Tier - aou_death_id, person_id, death_date, death_datetime, death_type_concept_id, cause_concept_id, cause_source_value, src_id, primary_death_record.
Controlled Tier - aou_death_id, person_id, death_date, death_datetime, death_type_concept_id, cause_concept_id, cause_source_value, src_id, primary_death_record.
Example of using Python to query the aou_death table:
## Python
import pandas as pd
dataset = %env WORKSPACE_CDR
death_data = pd.read_gbq(f'''SELECT distinct person_id, death_date, death_type_concept_id, cause_concept_id
, primary_death_record, src_id
FROM `{dataset}.aou_death` ''')
death_data.head()
Example of using R to query the aou_death table:
## R
library(bigrquery)
library(tidyverse)
download_data <- function(query){
tb <- bq_project_query(Sys.getenv('GOOGLE_PROJECT'), query = str_glue(query))
bq_table_download(tb,bigint = "integer64")
}
dataset <- Sys.getenv('WORKSPACE_CDR')
death_data = download_data("SELECT distinct person_id, death_date
, death_type_concept_id, cause_concept_id, primary_death_record, src_id
FROM `{dataset}.aou_death`")
head(death_data)
Example of using Python to combine the aou_death and tables:
## Python
import pandas as pd
dataset = %env WORKSPACE_CDR
all_death_data= pd.read_gbq(f'''
WITH combined_death_data as
(SELECT distinct person_id, death_date, death_type_concept_id, cause_concept_id, cast(primary_death_record as string) as primary_death_record
FROM `{dataset}.aou_death`
UNION DISTINCT
SELECT distinct person_id, death_date, death_type_concept_id, cause_concept_id, 'n/a' as primary_death_record
FROM `{dataset}.death`
)
SELECT person_id, death_date, t.concept_name as death_type, c.concept_name as death_cause, primary_death_record
from combined_death_data
LEFT JOIN `{dataset}.concept` t on t.concept_id = death_type_concept_id
LEFT JOIN `{dataset}.concept` c on c.concept_id = cause_concept_id ''')
all_death_data.head()
Example of using R to combine the aou_death and tables:
## R
library(bigrquery)
library(tidyverse)
download_data <- function(query){
tb <- bq_project_query(Sys.getenv('GOOGLE_PROJECT'), query = str_glue(query))
bq_table_download(tb,bigint = "integer64")
}
dataset <- Sys.getenv('WORKSPACE_CDR')
all_death_data = download_data("
WITH combined_death_data as
(SELECT distinct person_id, death_date, death_type_concept_id, cause_concept_id, cast(primary_death_record as string) as primary_death_record
FROM `{dataset}.aou_death`
UNION DISTINCT
SELECT distinct person_id, death_date, death_type_concept_id, cause_concept_id, 'n/a' as primary_death_record
FROM `{dataset}.death`
)
SELECT person_id, death_date, t.concept_name as death_type, c.concept_name as death_cause, primary_death_record
FROM combined_death_data
LEFT JOIN `{dataset}.concept` t on t.concept_id = death_type_concept_id
LEFT JOIN `{dataset}.concept` c on c.concept_id = cause_concept_id
")
head(all_death_data)
Comments
0 comments
Article is closed for comments.