Brenden Smith - Introducing the CDCPLACES Package

This is part of the CDCPLACES blog series. To view the other posts in this series click here.

This post was updated on March 19, 2024 to reflect updates introduced in CDCPLACES 1.1.5.

Introduction

To begin, we can install the package from CRAN or from GitHub, then load our packages.

Code

# Install from CRAN
# install.packages("CDCPLACES")

# Install from GitHub
# devtools::install_github("brendensm/CDCPLACES")

library(CDCPLACES)
library(dplyr)
library(ggplot2)

Function: `get_dictionary`

Our first function lets us view the measures we can query, identified by ‘measureid’, along with a brief definition of each measure. Running get_dictionary() returns a data frame. In RStudio, we can open it in the viewer with View(), which is the preferred way to explore the available measures.

For our example here, I will print the names of the variables in this data frame.

Code

# To open a viewer
# get_dictionary() %>% View()

get_dictionary() %>% names()

 [1] "measureid"                "measure_full_name"       
 [3] "measure_short_name"       "categoryid"              
 [5] "category_name"            "places_release_2024"     
 [7] "measurename16_23"         "places_release_2023"     
 [9] "places_release_2022"      "places_release_2021"     
[11] "places_release_2020"      "_500_cities_release_2019"
[13] "_500_cities_release_2018" "_500_cities_release_2017"
[15] "_500_cities_release_2016" "frequency_brfss_year"    
[17] "shortname16_23"

This data frame is useful for several reasons. It lists the available measures for each release of the CDC PLACES data, along with the year each measure was collected, all in a single place. Remember to use the measureid when querying your data.
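As a small illustration of exploring the dictionary without the viewer, here is a minimal sketch that lists the measures in one category. The helper name measures_in_category is hypothetical (it is not part of CDCPLACES), and it assumes the dictionary's categoryid column uses the same codes that get_places returns, such as "PREVENT".

```r
library(dplyr)

# Hypothetical helper (not part of CDCPLACES): keep the measures that belong
# to one category and return only their IDs and short names.
measures_in_category <- function(dict, category_id) {
  dict %>%
    filter(categoryid == category_id) %>%
    select(measureid, measure_short_name)
}

# Usage, assuming CDCPLACES is loaded and "PREVENT" is a valid categoryid:
# measures_in_category(get_dictionary(), "PREVENT")
```

The measureid values returned this way are what we would pass to the measure argument when querying.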

Function: `get_places`

This function allows us to easily query the data we specify. In the example below, I will get the measure ACCESS2 (the current lack of health insurance among adults aged 18-64 years) for the state of Arizona. The function accepts multiple values for these arguments; a later example in this post queries several states at once.

Code

az_access <- get_places(state = "AZ", 
                        measure = "ACCESS2") 
head(az_access)

  year stateabbr statedesc locationname datasource   category
1 2022        AZ   Arizona         Yuma      BRFSS Prevention
2 2022        AZ   Arizona       La Paz      BRFSS Prevention
3 2022        AZ   Arizona      Yavapai      BRFSS Prevention
4 2022        AZ   Arizona      Yavapai      BRFSS Prevention
5 2022        AZ   Arizona       Graham      BRFSS Prevention
6 2022        AZ   Arizona       La Paz      BRFSS Prevention
                                                         measure
1 Current lack of health insurance among adults aged 18-64 years
2 Current lack of health insurance among adults aged 18-64 years
3 Current lack of health insurance among adults aged 18-64 years
4 Current lack of health insurance among adults aged 18-64 years
5 Current lack of health insurance among adults aged 18-64 years
6 Current lack of health insurance among adults aged 18-64 years
  data_value_unit         data_value_type data_value low_confidence_limit
1               %        Crude prevalence       15.6                 14.2
2               %        Crude prevalence        9.2                  8.3
3               % Age-adjusted prevalence       10.9                  9.8
4               %        Crude prevalence        6.5                  5.7
5               % Age-adjusted prevalence       16.2                 14.4
6               % Age-adjusted prevalence       20.4                 18.5
  high_confidence_limit totalpopulation totalpop18plus locationid categoryid
1                  17.1          207842         155973      04027    PREVENT
2                  10.2           16506          13903      04012    PREVENT
3                  12.2          246191         208516      04025    PREVENT
4                   7.2          246191         208516      04025    PREVENT
5                  18.0           38779          28375      04009    PREVENT
6                  22.3           16506          13903      04012    PREVENT
  measureid datavaluetypeid short_question_text
1   ACCESS2          CrdPrv    Health Insurance
2   ACCESS2          CrdPrv    Health Insurance
3   ACCESS2       AgeAdjPrv    Health Insurance
4   ACCESS2          CrdPrv    Health Insurance
5   ACCESS2       AgeAdjPrv    Health Insurance
6   ACCESS2       AgeAdjPrv    Health Insurance
                                 geolocation
1 Point, -113.905997385394, 32.7694427722845
2  Point, -113.98153747073, 33.7292571142703
3 Point, -112.553640853115, 34.5996977529402
4 Point, -112.553640853115, 34.5996977529402
5 Point, -109.887402691851, 32.9327553402921
6  Point, -113.98153747073, 33.7292571142703
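Notice in the output above that a county can appear more than once: Yavapai and La Paz each have one row for crude prevalence and one for age-adjusted prevalence. The sketch below re-types the six printed rows by hand (only three columns, nothing new is queried) to show how to count rows per value type and keep just one type so that each county appears once.

```r
library(dplyr)

# The six rows printed above, copied by hand with only the columns needed here
az_head <- data.frame(
  locationname    = c("Yuma", "La Paz", "Yavapai", "Yavapai", "Graham", "La Paz"),
  datavaluetypeid = c("CrdPrv", "CrdPrv", "AgeAdjPrv", "CrdPrv", "AgeAdjPrv", "AgeAdjPrv"),
  data_value      = c(15.6, 9.2, 10.9, 6.5, 16.2, 20.4)
)

# How many rows are there of each value type?
type_counts <- az_head %>%
  count(datavaluetypeid)

# Keep a single value type so that each county appears at most once
az_crude <- az_head %>%
  filter(datavaluetypeid == "CrdPrv")
```

Filtering on datavaluetypeid before plotting or summarising avoids mixing the two kinds of estimate.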

It is also worth noting that the geography argument is set to “county” by default. If we instead want to examine census tracts, we can set geography to “census”. Likewise, release is set to “2023” by default.

The county argument can be used to filter results to specific counties. This is extremely useful for examining census tract-level data for specific areas of a state. Additionally, setting geometry to TRUE includes a shapefile in the query result. For further examples of plotting with shapefiles, see this dedicated blog post.

Code

cap_counties <- get_places(geography = "census",
                           state = "MI",
                           measure = "ACCESS2",
                           county = c("Ingham", "Eaton", "Clinton"),
                           geometry = TRUE)

Use Case

From here, we can start to have fun. It is fairly straightforward to begin exploring the data. Here I will first filter the data so that I can plot the age-adjusted rates of lack of health insurance in Arizona.

Notice that the data provide you with confidence limits, so I have chosen to plot them here with error bars.

Code

az_access %>%
  filter(datavaluetypeid == "AgeAdjPrv") %>%
  ggplot(aes(data_value, reorder(locationname, data_value))) +
  geom_point(size = 2) +
  geom_errorbar(aes(xmin = low_confidence_limit, xmax = high_confidence_limit)) +
  labs(title = "Lack of health insurance among adults aged 18-64 years In Arizona Counties",
       y = "", x = "Percent") +
  theme_minimal() +
  theme(plot.title.position = "plot")

You can also extend this to compare multiple states by passing two (or more) state abbreviations to state and plotting the result. Arizona seems to have a couple of counties with a much higher rate than the others.

Code

# multi state comparison
two <- get_places(state = c("AZ", "NV"), 
                  measure = "ACCESS2")

two %>%
  filter(datavaluetypeid == "AgeAdjPrv") %>%
  ggplot(aes(data_value, reorder(locationname, data_value), color = stateabbr)) +
  geom_point(size = 2) +
  geom_errorbar(aes(xmin = low_confidence_limit, xmax = high_confidence_limit)) +
  labs(title = 
         "Lack of health insurance among adults aged 18-64 years In Arizona and Nevada",
       y = "Counties", x = "Percent") +
  theme_minimal() +
  theme(plot.title.position = "plot")
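To put names on the counties that stand out in a plot like this, a small table can help. The following is a minimal sketch, not part of CDCPLACES: top_counties is a hypothetical helper that assumes its input has the columns shown in the get_places output earlier (stateabbr, locationname, datavaluetypeid, data_value, and the two confidence limits).

```r
library(dplyr)

# Hypothetical helper (not part of CDCPLACES): the counties with the highest
# age-adjusted values within each state, sorted from highest to lowest.
top_counties <- function(places_df, how_many = 3) {
  places_df %>%
    filter(datavaluetypeid == "AgeAdjPrv") %>%
    group_by(stateabbr) %>%
    slice_max(data_value, n = how_many, with_ties = FALSE) %>%
    ungroup() %>%
    select(stateabbr, locationname, data_value,
           low_confidence_limit, high_confidence_limit)
}

# Usage with the two-state query from this section:
# top_counties(two, how_many = 3)
```

Keeping the confidence limits in the result makes it easy to check whether a high value is also an imprecise one.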

We can go even further by comparing more states in the region. Here I have taken the mean of the county-level rates, and of their confidence limits, within each state to make the comparison easier. Texas appears to be far above the average.

Code

multi <- get_places(state = c("AZ", "NV", "NM", "TX", "CA"),
                    measure = "ACCESS2") %>%
  filter(datavaluetypeid == "AgeAdjPrv") %>%
  summarise(.by = "stateabbr",
            mean_val = mean(data_value),
            mean_low = mean(low_confidence_limit),
            mean_high = mean(high_confidence_limit))

multi %>%
  ggplot(aes(mean_val, reorder(stateabbr, mean_val), color = stateabbr)) +
  geom_point(size = 2) +
  geom_errorbar(aes(xmin = mean_low, xmax = mean_high)) +
  labs(title = "Mean lack of health insurance among adults aged 18-64 years In Southwest States",
       y = "", x = "Percent") +
  theme_minimal() +
  theme(plot.title.position = "plot")
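One thing to keep in mind is that a simple mean treats every county equally, whatever its size. As a rough sketch of an alternative (this is not the method used in the plot), the counties could be weighted by population. The example below uses the three age-adjusted rows from the head(az_access) output earlier in this post, copied by hand, and assumes totalpop18plus is a reasonable weight even though the measure covers adults aged 18-64.

```r
# Age-adjusted rows copied by hand from the head(az_access) output shown earlier
az_adj <- data.frame(
  locationname   = c("Yavapai", "Graham", "La Paz"),
  data_value     = c(10.9, 16.2, 20.4),
  totalpop18plus = c(208516, 28375, 13903)
)

# Simple mean: every county counts equally
simple_mean <- mean(az_adj$data_value)

# Population-weighted mean: larger counties count more
# (assumption: totalpop18plus is an acceptable weight for this measure)
weighted_mean <- weighted.mean(az_adj$data_value, w = az_adj$totalpop18plus)

# With a full query, the same idea fits inside summarise(), for example:
# summarise(.by = "stateabbr",
#           w_mean = weighted.mean(data_value, totalpop18plus))
```

In these three rows the weighted mean is lower than the simple mean because the largest county, Yavapai, has the lowest rate.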
