Introducing the CDCPLACES Package

A brief vignette demonstrating the use of the package CDCPLACES.
R
packages
API
vignette
Author

Brenden Smith

Published

January 10, 2024

This post was updated on March 19, 2024 to reflect updates introduced in CDCPLACES 1.1.5.

Introduction

To begin, we can install the package from CRAN or from GitHub, then load the packages we need.

Code
# Install from CRAN
# install.packages("CDCPLACES")

# Install from GitHub
# devtools::install_github("brendensm/CDCPLACES")

library(CDCPLACES)
library(dplyr)
library(ggplot2)

Function: get_dictionary

Our first function lets us view the measures we can query, identified by ‘measureid’, along with a brief definition of each measure. Running get_dictionary returns a data frame, which we can open in the RStudio viewer with View(). This is the preferred method for exploring the available measures.

For our example here, I will print the names of the variables in this data frame.

Code
# To open a viewer
# get_dictionary() %>% View()

get_dictionary() %>% names()
 [1] "measureid"                "measure_full_name"       
 [3] "measure_short_name"       "categoryid"              
 [5] "category_name"            "places_release_2023"     
 [7] "places_release_2022"      "places_release_2021"     
 [9] "places_release_2020"      "_500_cities_release_2019"
[11] "_500_cities_release_2018" "_500_cities_release_2017"
[13] "_500_cities_release_2016" "frequency_brfss_year"    

This data frame is useful for several reasons. It lists the available measures for each year of the CDC PLACES data, along with the date each variable was collected, all in a single place. Remember to use the measureid when querying your data.
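
To find a measureid without scrolling through the viewer, the dictionary can be searched by keyword. The following is a minimal sketch: find_measures is a hypothetical helper (not part of CDCPLACES), and toy_dictionary is a made-up stand-in that only borrows the column names shown above.

```r
library(dplyr)

# Hypothetical helper (not part of CDCPLACES): return the measures whose
# full name mentions a keyword, keeping only the identifying columns.
find_measures <- function(dictionary, keyword) {
  dictionary %>%
    filter(grepl(keyword, measure_full_name, ignore.case = TRUE)) %>%
    select(measureid, measure_short_name, category_name)
}

# Toy stand-in for get_dictionary(); the rows are illustrative only.
toy_dictionary <- data.frame(
  measureid = c("ACCESS2", "EXAMPLE1"),
  measure_full_name = c(
    "Current lack of health insurance among adults aged 18-64 years",
    "A made-up measure used only for this example"
  ),
  measure_short_name = c("Health Insurance", "Example"),
  category_name = c("Prevention", "Example")
)

find_measures(toy_dictionary, "insurance")

# With the package loaded, the same search would be:
# find_measures(get_dictionary(), "insurance")
```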

Function: get_places

This function queries the data we specify. In the example below, I will get the measure ACCESS2 (the current lack of health insurance among adults aged 18-64 years) for the state of Arizona. Each of these arguments also accepts multiple values.

Code
az_access <- get_places(state = "AZ", 
                        measure = "ACCESS2") 
head(az_access)
# A tibble: 6 × 21
  year  stateabbr statedesc locationname datasource category   measure          
  <chr> <chr>     <chr>     <chr>        <chr>      <chr>      <chr>            
1 2021  AZ        Arizona   Yuma         BRFSS      Prevention Current lack of …
2 2021  AZ        Arizona   Graham       BRFSS      Prevention Current lack of …
3 2021  AZ        Arizona   Apache       BRFSS      Prevention Current lack of …
4 2021  AZ        Arizona   La Paz       BRFSS      Prevention Current lack of …
5 2021  AZ        Arizona   Coconino     BRFSS      Prevention Current lack of …
6 2021  AZ        Arizona   Cochise      BRFSS      Prevention Current lack of …
# ℹ 14 more variables: data_value_unit <chr>, data_value_type <chr>,
#   data_value <dbl>, low_confidence_limit <dbl>, high_confidence_limit <dbl>,
#   totalpopulation <chr>, locationid <chr>, categoryid <chr>, measureid <chr>,
#   datavaluetypeid <chr>, short_question_text <chr>, type <chr>, lon <dbl>,
#   lat <dbl>

It is also worth noting that geography is set to “county” by default. If we want to examine census tracts instead, we can set geography to “census”. Likewise, release is set to “2023” by default.

The argument county can be used to filter results to specific counties. This is extremely useful for examining census tract-level data for specific areas of a state. Additionally, setting geometry to TRUE includes a shapefile in the query. For further examples of plotting with shapefiles, see this dedicated blog post.

Code
cap_counties <- get_places(geography = "census",
                           state = "MI",
                           measure = "ACCESS2",
                           county = c("Ingham", "Eaton", "Clinton"),
                           geometry = TRUE)
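
The release argument can be set in the same way. The sketch below assumes that ACCESS2 is available in the “2022” release (the dictionary lists availability by release, so it is worth checking there first). Because the query depends on the package being installed and on a network connection, the call is guarded so that az_2022 is simply NULL if either is missing.

```r
# Sketch only: query an earlier release of the same measure.
# Assumption: ACCESS2 exists in release "2022"; check get_dictionary().
az_2022 <- NULL

if (requireNamespace("CDCPLACES", quietly = TRUE)) {
  az_2022 <- tryCatch(
    CDCPLACES::get_places(state = "AZ",
                          measure = "ACCESS2",
                          release = "2022"),
    # A failed request (for example, no network) leaves az_2022 as NULL
    error = function(e) NULL
  )
}
```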

Use Case

From here, we can start to have fun. It is fairly straightforward to begin exploring the data. Here I will first filter the data so that I can plot the age-adjusted rates of lack of health insurance in Arizona.

Notice that the data provide you with confidence limits, so I have chosen to plot them here with error bars.

Code
az_access %>%
  filter(datavaluetypeid == "AgeAdjPrv") %>%
  ggplot(aes(data_value, reorder(locationname, data_value))) +
  geom_point(size = 2) +
  geom_errorbar(aes(xmin = low_confidence_limit, xmax = high_confidence_limit)) +
  labs(title = "Lack of health insurance among adults aged 18-64 years In Arizona Counties",
       y = "", x = "Percent") +
  theme_minimal() +
  theme(plot.title.position = "plot")
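
The filter on datavaluetypeid matters because the county data can hold more than one row per county, one for each value type. The toy example below illustrates the idea with made-up numbers; the value type “CrdPrv” (crude prevalence) is an assumption here, so it is worth checking the distinct values in your own query.

```r
library(dplyr)

# Toy rows shaped like the get_places() output; the numbers are made up.
toy_access <- data.frame(
  locationname    = c("County A", "County A", "County B", "County B"),
  datavaluetypeid = c("CrdPrv", "AgeAdjPrv", "CrdPrv", "AgeAdjPrv"),
  data_value      = c(12.1, 12.8, 15.4, 16.0)
)

# Each county appears once per value type
count(toy_access, locationname)

# Keeping a single value type leaves one row per county, ready to plot
age_adjusted <- filter(toy_access, datavaluetypeid == "AgeAdjPrv")
age_adjusted
```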

You can also extend this to compare multiple states. Simply pass two (or more) state abbreviations to the query and plot the results. Arizona seems to have a couple of counties with much higher rates than the others.

Code
# multi state comparison
two <- get_places(state = c("AZ", "NV"), 
                  measure = "ACCESS2")

two %>%
  filter(datavaluetypeid == "AgeAdjPrv") %>%
  ggplot(aes(data_value, reorder(locationname, data_value), color = stateabbr)) +
  geom_point(size = 2) +
  geom_errorbar(aes(xmin = low_confidence_limit, xmax = high_confidence_limit)) +
  labs(title = 
         "Lack of health insurance among adults aged 18-64 years In Arizona and Nevada",
       y = "Counties", x = "Percent") +
  theme_minimal() +
  theme(plot.title.position = "plot")

We can go even further by comparing more states in the region. Here I have taken the average rate by state to easily compare. Texas appears to be far above the average.

Code
multi <- get_places(state = c("AZ", "NV", "NM", "TX", "CA"),
                    measure = "ACCESS2") %>%
  filter(datavaluetypeid == "AgeAdjPrv") %>%
  summarise(.by = "stateabbr",
            mean_val = mean(data_value),
            mean_low = mean(low_confidence_limit),
            mean_high = mean(high_confidence_limit))

multi %>%
  ggplot(aes(mean_val, reorder(stateabbr, mean_val), color = stateabbr)) +
  geom_point(size = 2) +
  geom_errorbar(aes(xmin = mean_low, xmax = mean_high)) +
  labs(title = "Mean lack of health insurance among adults aged 18-64 years In Southwest States",
       y = "", x = "Percent") +
  theme_minimal() +
  theme(plot.title.position = "plot")
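
The state averages above treat every county equally, regardless of size. One possible alternative, sketched here with made-up numbers, is to weight each county by its population. The sketch assumes totalpopulation is stored as character, as in the query output shown earlier, and converts it before weighting.

```r
library(dplyr)

# Toy county rows; state codes and values are invented for illustration.
toy_counties <- data.frame(
  stateabbr       = c("AA", "AA", "BB", "BB"),
  data_value      = c(10, 20, 30, 50),
  totalpopulation = c("1000", "3000", "500", "500")
)

weighted <- toy_counties %>%
  mutate(totalpopulation = as.numeric(totalpopulation)) %>%
  summarise(
    .by = stateabbr,
    mean_val = mean(data_value),                               # unweighted
    weighted_val = weighted.mean(data_value, totalpopulation)  # by population
  )

weighted
```

In the toy data, the larger county pulls the weighted value for "AA" toward its own rate, while "BB" is unchanged because its two counties are the same size.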