How to find participant enrollment data


As a researcher, you may want to know the date on which someone enrolled in the All of Us Research Program as a data-contributing participant. The best available data point is the observation_date on which they granted primary consent to the program. NOTE: Primary consent is different from Electronic Health Record (EHR) consent, which can be granted at a later date. The SQL below takes each participant's earliest primary consent observation_date, returns it as primary_consent_date, and uses it as a proxy for enrollment date. You can then merge the resulting dataframe with others using person_id as the shared variable.

Using R

library(bigrquery)
library(tidyverse)

# Run a query against the workspace CDR and download the result.
download_data <- function(query) {
  tb <- bq_project_query(
    Sys.getenv('GOOGLE_PROJECT'),
    query = query,
    default_dataset = Sys.getenv('WORKSPACE_CDR')
  )
  bq_table_download(tb)
}

# Earliest observation under the 'Consent PII' module, one row per participant.
# GROUP BY already returns one row per person_id, so DISTINCT is not needed.
df <- download_data(str_glue("
  SELECT
    person_id,
    MIN(observation_date) AS primary_consent_date
  FROM `concept`
  JOIN `concept_ancestor` ON concept_id = ancestor_concept_id
  JOIN `observation` ON descendant_concept_id = observation_source_concept_id
  WHERE concept_name = 'Consent PII' AND concept_class_id = 'Module'
  GROUP BY person_id
"))
head(df)

 

Using Python

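import os
import pandas as pd

# Workspace CDR dataset, read from the environment (also works outside a notebook).
DATASET = os.environ['WORKSPACE_CDR']

# Earliest observation under the 'Consent PII' module, one row per participant.
# GROUP BY already returns one row per person_id, so DISTINCT is not needed.
df = pd.read_gbq(f'''
    SELECT person_id, MIN(observation_date) AS primary_consent_date
    FROM `{DATASET}.concept`
    JOIN `{DATASET}.concept_ancestor` ON concept_id = ancestor_concept_id
    JOIN `{DATASET}.observation` ON descendant_concept_id = observation_source_concept_id
    WHERE concept_name = 'Consent PII' AND concept_class_id = 'Module'
    GROUP BY person_id''', dialect='standard')
df.head()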
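
Merging on person_id

The merge on person_id described at the top of this article might look something like the sketch below. It is only an illustration: the two small dataframes are made-up stand-ins (the names enrollment, surveys, and survey_date are hypothetical, not All of Us tables or columns), and it assumes the enrollment dataframe has one row per participant, as the query above should return.

```python
import pandas as pd

# Hypothetical stand-in for the query result above (one row per participant).
enrollment = pd.DataFrame({
    "person_id": [101, 102, 103],
    "primary_consent_date": pd.to_datetime(
        ["2018-06-01", "2019-02-15", "2020-11-30"]
    ),
})

# Hypothetical second dataframe with possibly many rows per participant.
surveys = pd.DataFrame({
    "person_id": [101, 101, 103, 104],
    "survey_date": pd.to_datetime(
        ["2018-06-03", "2019-01-10", "2020-12-05", "2021-03-01"]
    ),
})

# A left merge keeps every row of `surveys`. validate="many_to_one" raises
# an error if person_id is duplicated on the enrollment side.
merged = surveys.merge(
    enrollment, on="person_id", how="left", validate="many_to_one"
)

# Days between the enrollment proxy and each survey date.
# Rows with no matching consent date (person 104 here) come out as NaN.
merged["days_since_enrollment"] = (
    merged["survey_date"] - merged["primary_consent_date"]
).dt.days

print(merged)
```

A left merge is used here so that rows without a matching consent date stay visible as missing values instead of being dropped silently. Use how="inner" instead if you only want participants present in both dataframes.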

 

 

 
