SAS Analytics Guide - How to perform logistic regression

Data version: Registered Tier All of Us Curated Data Repository (CDR) v7 (R2022Q4R9)

Analysis tool: SAS Studio

Featured Workspace: SAS Demonstration: Diabetes mellitus medication prescription patterns (v7) (Researcher Workbench login required)


Bassent Abdelbary, Ph.D., MPH
University of Texas Rio Grande Valley College of Health Professions


Please note: This guide demonstrates the use of PROC LOGISTIC in SAS Studio. It is not comprehensive and does not cover every part of the scientific process that researchers are required to undertake. Specifically, it does not cover data cleaning and verification, assumption validation, model diagnostics, potential follow-up analyses, or other possible approaches to performing a logistic regression with forward selection.

This analytics guide is available as a downloadable PDF.


This guide includes examples of statistical analysis processes which were used as part of an exploratory study designed to understand differences in prescriptions of newer generation diabetes medications such as glucagon-like peptide 1 (GLP-1) agonists and sodium-glucose transport protein 2 (SGLT2) inhibitors by looking at patterns within the electronic health records (EHR) and survey data from the All of Us Research Program.

For this project, a cohort from the All of Us dataset was selected using all participants aged 18 to 75 years old with documented cases of type 2 diabetes in their EHR. Demographic data were combined with prescription data from EHR and select survey questions regarding health insurance type and trends in prescription refills.

  • Outcome variable: Prescription of newer generation diabetes medications such as GLP-1 agonists and SGLT-2 inhibitors (binary variable Yes/No).
  • Independent variable: Several independent variables were evaluated as a part of a stepwise method. For this demonstration guide, we will use health insurance (Yes/No) as our independent variable.
  • Confounding variables: For this guide, we will use race and gender as the confounding variables.

In this analysis, PROC LOGISTIC models the probability of no initial GLP-1/SGLT-2 prescription, where the initial prescription is determined from the chronology of each participant's prescription history. PROC LOGISTIC reports Wald chi-square statistics and p-values for the model parameters. Parameter estimates and odds ratios can also be displayed with the additional options listed below.

  1. The CLASS statement names the classification variables to be used as explanatory variables in the analysis.
  2. The EXPB option displays the Exp(Est) column containing the exponentiated parameter estimates.
  3. The SELECTION=FORWARD option is specified to carry out the forward selection.
  4. For the odds ratio graphics, the ODDSRATIO statements compute the odds ratios for the covariates, and the NOOR option suppresses the default odds ratio table. CONTRAST statements provide another method of producing the odds ratios.
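
As a reminder of what these options report, the model fitted in this guide can be written roughly as follows. This is a sketch in generic notation; the symbols are introduced here for illustration and do not appear in the original guide.

```latex
\[
\log\frac{P(\text{GLPST}=\text{No})}{1-P(\text{GLPST}=\text{No})}
  = \beta_0
  + \beta_{\text{ins}}\,x_{\text{ins}}
  + \boldsymbol{\beta}_{\text{race}}^{\top}\mathbf{x}_{\text{race}}
  + \boldsymbol{\beta}_{\text{gender}}^{\top}\mathbf{x}_{\text{gender}}
\]
\[
\text{Wald } \chi^2 = \left(\frac{\hat{\beta}}{\operatorname{SE}(\hat{\beta})}\right)^{2},
\qquad
\text{Exp(Est)} = e^{\hat{\beta}}
\]
```

Here the x terms are the design columns that the CLASS statement generates for each classification variable. The Wald chi-square for a single parameter has one degree of freedom, and EXPB simply exponentiates each estimate.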

Description of the dataset

For this example, we will use the following criteria to define the cohort for this analysis.

  1. Created cohort as defined by the following measures.
    • Inclusion criteria:

      • Completed The Basics survey including questions about demographics
      • Adults (Participants aged 18-75 years old)
      • Had the following conditions and lab measurements:
        • Type 2 diabetes mellitus (diabetes mellitus without complications or type 2 diabetes mellitus without complications)
        • Lab measurements of an A1C >8
    • Exclusion criteria:

      • Had the following conditions:
        • Type 1 diabetes mellitus
        • Type 1 diabetes mellitus without complications
  2. Created a concept set by medication name to include all diabetic medications using medication concept ID. These medications were reconciled and combined into a new variable defining medication class based on mechanism of action/pharmaceutical class classification. Finally, a binary variable was created (Yes/No) based on the medication classes to reflect having an initial prescription of newer medications to be used as our outcome variable.
  3. Three tables were loaded using the cohort analyzer: demographics (386,404 observations), medications (854,047 observations, with repeated entries over time), and select survey questions. Data cleaning and re-coding were done on the individual tables, and then the three tables were merged using the person_id variable (the unique participant identifier).
  4. The final dataset had 386,404 participants, of whom 386,317 had a completed health insurance status (only 176,906 had a health utilization questionnaire on file) and 44,960 had EHR prescription data on file.
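
The merge described in item 3 could look something like the sketch below. The dataset names demographics, medications_first, and survey are placeholders chosen for illustration, not the names used in the original project, and the sketch assumes the medications table has already been reduced to one row per participant.

```sas
/* Sketch only: demographics, medications_first, and survey are
   placeholder dataset names, not those of the original project. */
proc sort data=demographics;      by person_id; run;
proc sort data=medications_first; by person_id; run;
proc sort data=survey;            by person_id; run;

data merged;
    merge demographics (in=in_demo)   /* in_demo flags rows found in demographics */
          medications_first
          survey;
    by person_id;
    if in_demo;                       /* keep every participant in the demographics table */
run;
```

Participants without medication or survey records are kept with missing values for those variables, which is consistent with the counts reported in item 4.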

Confounding variables used in this guide were re-coded to the following:

  • Race (White: 1, Black or African American: 2, Asian: 3, Other: 4, More than one race: 5)
  • Ethnicity (Hispanic or Latino: 1, non-Hispanic or Latino: 2, Other: 3)
  • Gender (Female: 1, Male: 2, Not man only, not woman only, prefer not to answer, or skipped: 3)
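
To make output tables easier to read, the numeric codes above can be given labels with PROC FORMAT. The following is a minimal sketch: the format names racefmt, ethfmt, and genderfmt are made up for this example, and the variable names race and gender are taken from the code later in this guide.

```sas
/* Sketch only: the format names are hypothetical. */
proc format;
    value racefmt   1='White'
                    2='Black or African American'
                    3='Asian'
                    4='Other'
                    5='More than one race';
    value ethfmt    1='Hispanic or Latino'
                    2='Non-Hispanic or Latino'
                    3='Other';
    value genderfmt 1='Female'
                    2='Male'
                    3='Not man only, not woman only, prefer not to answer, or skipped';
run;

/* Apply the labels for display only; the stored values stay numeric. */
proc freq data=demosurvey;
    tables race gender;
    format race racefmt. gender genderfmt.;
run;
```

Keep in mind that PROC LOGISTIC orders CLASS levels by their formatted values by default, so attaching formats to the modeling dataset can change which level is treated as first or last, and REF= must then be given as the formatted label.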

Statistical analysis procedures

Step 1: Describe the variables to be evaluated for further analysis.

Example code:

proc freq data=demosurvey; /* one-way frequency tables */
table insurance race;
run;

Example results:

step 1.png

step 1-2.png

Step 2: Use cross-tabulation and Pearson’s chi-square test to examine differences in distribution or associations with our confounders.

Example code:

proc freq data=demosurvey; /* cross-tabulation with chi-square tests */
table insurance*gender / chisq;
run;

Example results:

step 2.png

step 2-2.png
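
For reference, the Pearson chi-square statistic requested by the CHISQ option compares the observed and expected cell counts in the insurance-by-gender table. In generic notation (the symbols are introduced here and are not from the original guide):

```latex
\[
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{n_{i\cdot}\,n_{\cdot j}}{N},
\qquad
\text{df} = (r-1)(c-1)
\]
```

Here O is the observed count in row i and column j, the n terms are the row and column totals, N is the table total, and r and c are the numbers of rows and columns.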

Step 3: Recoding the outcome variable.

Example code:

data meds6; /* medsclass codes: 5=SGLT2, 6=GLP-1, 3=DPP-4, 7=combined new formula */
set medications; /* placeholder: the input medications table is not named in the original code */
if medsclass in (3, 5, 6, 7) then GLPST=1; /* newer-generation medication */
else GLPST=2; /* all other medication classes */
run;
proc freq data=meds6;
table GLPST;
run;

This recoding was done in the original medications table, which contains repeated observations. The recoded variable was later used together with person_id to drop duplicates, keeping the first entry for each participant in order to identify the type of their initial (first) prescription.


Example results:

step 3.png

After dropping duplicate entries by person_id and GLPST, 9,469 participants had an initial prescription of a newer medication (21% of the total participants with EHR information on file).
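
One possible way to keep only the first prescription record per participant is sketched below. This is not necessarily the method used in the original project: rx_date is a hypothetical prescription-date variable, and meds_sorted and meds_first are placeholder dataset names.

```sas
/* Sketch only: rx_date, meds_sorted, and meds_first are hypothetical names. */
proc sort data=meds6 out=meds_sorted;
    by person_id rx_date;        /* earliest prescription first within participant */
run;

data meds_first;
    set meds_sorted;
    by person_id;
    if first.person_id;          /* keep only the earliest record per participant */
run;

proc freq data=meds_first;
    tables GLPST;                /* distribution of initial prescription type */
run;
```

Sorting by date before taking the first record matters here, because "first" otherwise depends on the arbitrary row order of the medications table.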

Step 4: Use the LOGISTIC procedure to fit a two-way logit model for the effect of having insurance with race and gender as covariates.

Example code:

proc logistic data=demomedsurvey; /* demonstration logistic regression */
class insurance (ref="2") race (ref="1") gender;
model GLPST = insurance race gender / expb;
run;

Example results:

In this analysis, PROC LOGISTIC models the probability of no newer generation prescription medication (GLPST=No).

step 4.png

step 4-2.png

step 4-3.png

step 4-4.png

step 4-5.png

step 4-6.png
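
The model in Step 4 describes the probability of GLPST=No. If the probability of receiving a newer-generation medication is of more interest, the modeled response level can be chosen explicitly with the EVENT= response option. The sketch below assumes GLPST is stored with the unformatted codes from Step 3 (1 = newer medication, 2 = other); if a format is attached to GLPST, the formatted label would be needed in place of "1".

```sas
/* Sketch only: assumes GLPST holds the unformatted codes 1 and 2. */
proc logistic data=demomedsurvey;
    class insurance (ref="2") race (ref="1") gender;
    model GLPST(event="1") = insurance race gender / expb;  /* model P(GLPST=1) */
run;
```

Switching the modeled level reverses the sign of every parameter estimate, so each odds ratio becomes the reciprocal of its value in the original model.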

Step 5: The following PROC LOGISTIC statements illustrate the use of forward selection to identify the effects that differentiate the two GLPST responses. The option SELECTION=FORWARD is specified to carry out the forward selection. Note that no reference level is specified here, so the CLASS variables use the default effect coding.

Example code:

proc logistic data=demomedsurvey; /* same model with forward selection */
class insurance race gender;
model GLPST = insurance race gender / selection=forward expb;
run;

Example results:

step 5.png

step 5-2.png

step 5-3.png
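
The effect coding used in Step 5 changes how the Exp(Est) column should be read. For a two-level CLASS variable such as insurance, effect coding sets the design value to +1 for the non-reference level and to -1 for the reference level, so the log odds of the two levels differ by twice the estimate. A short derivation in generic notation (the symbols are introduced here for illustration):

```latex
\[
\underbrace{(\beta_0 + \beta\cdot(+1))}_{\text{non-reference level}}
- \underbrace{(\beta_0 + \beta\cdot(-1))}_{\text{reference level}}
= 2\beta
\quad\Longrightarrow\quad
\text{OR} = e^{2\beta},
\qquad\text{whereas}\qquad
\text{Exp(Est)} = e^{\beta}
\]
```

Under reference coding the design values are 1 and 0, the difference is simply the estimate, and the exponentiated estimate equals the odds ratio against the reference level.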

Step 6: Finally, the following statements refit the previously selected model, except that reference coding is used for the CLASS variables instead of effect coding.

Example code:

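ods graphics on;
proc logistic data=demomedsurvey plots(only maxpoints=none)=(oddsratio(range=clip));
     class insurance race gender / param=ref; /* reference coding; by default the last level of each variable is the reference */
     model GLPST = insurance race gender Age / noor; /* NOOR suppresses the default odds ratio table */
     oddsratio insurance;
     oddsratio race;
     /* contrast coefficients: one per non-reference level, in level order (assumes levels sort as 1, 2, ...) */
     contrast 'insurance 1 vs 2' insurance 1 / estimate=exp;
     contrast 'race 1 vs 2' race 1 -1 0 0 / estimate=exp;
     contrast 'race 1 vs 3' race 1 0 -1 0 / estimate=exp;
     contrast 'race 1 vs 4' race 1 0 0 -1 / estimate=exp;
     effectplot / at(gender=all) noobs;
     effectplot slicefit(sliceby=gender plotby=insurance) / noobs;
run;
ods graphics off;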

Example results:

step 6.png

step 6-2.png

step 6-3.png

step 6-4.png

step 6-5.png

step 6-6.png

step 6-7.png

step 6-8.png
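
The relationship between the reference-coded parameters and the odds ratios produced by the CONTRAST statements can be written out as follows. This sketch assumes race has five levels with the last level (5) as the reference, which is the SAS default under PARAM=REF; the symbols are introduced here for illustration.

```latex
\[
\text{OR}_{k\text{ vs }5} = e^{\beta_k}, \quad k = 1,\dots,4,
\qquad
\text{OR}_{1\text{ vs }2}
  = \frac{e^{\beta_1}}{e^{\beta_2}}
  = e^{\beta_1-\beta_2}
  = \exp\!\big(\mathbf{L}\boldsymbol{\beta}_{\text{race}}\big),
\quad
\mathbf{L} = (1,\,-1,\,0,\,0)
\]
```

The ESTIMATE=EXP option reports the exponentiated contrast, so a contrast row of (1, -1, 0, 0) on race yields the odds ratio for level 1 versus level 2.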

Additional resources

For additional information about using SAS Studio in the Researcher Workbench, explore the following articles: Exploring All of Us data using SAS Studio and How to run SAS in the Researcher Workbench.
