HM878: Helper Functions

A walkthrough of the helper functions in the package hm878
Author: Brenden Smith

Published: October 12, 2023

Introduction

This vignette demonstrates how to use the functions currently included in the package. If you have not yet installed it, run devtools::install_github("brendensm/hm878"). If you do not have the devtools package, install it first with install.packages("devtools").

To start, we load the package:

library(hm878)

Let’s assume we are running a binomial logistic regression using the data from mtcars, a built-in data set included with R. We will use vs (engine type as V-shaped or straight) as the dependent variable, and cyl (number of cylinders) as the independent variable. We will store our models for block 1 and block 2.

mb1 <- glm(vs ~ 1, data = mtcars, family = "binomial")
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")

Testing Goodness of Fit with chi_log

To assess the fit of our models, we may want to use the function chi_log. Pass your model as the first argument, followed by the data set used to fit it. Optionally, you can label the model with a string as the third argument. Here I will label each block.

chi_log(mb1, mtcars, "Block 1")
Pearson Goodness of Fit Test
 Null Hypothesis: The model fits.
 Alternative Hypothesis: The model does not fit.

 Pearson chi-squared for Block 1:  32 
 Degrees of freedom for Block 1:  31 
 P-value for Block 1:  0.416744 

 ---
 Failed to reject Null Hypothesis. The model fits.
 ---
chi_log(mb2, mtcars, "Block 2")
Pearson Goodness of Fit Test
 Null Hypothesis: The model fits.
 Alternative Hypothesis: The model does not fit.

 Pearson chi-squared for Block 2:  27.42 
 Degrees of freedom for Block 2:  30 
 P-value for Block 2:  0.6013516 

 ---
 Failed to reject Null Hypothesis. The model fits.
 ---

The function reports the chi-squared statistic, degrees of freedom, and p-value, and reminds us of the null and alternative hypotheses. Both models appear to fit well.
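A Pearson goodness-of-fit statistic like this can be reproduced in base R by summing the squared Pearson residuals, with the model's residual degrees of freedom. The sketch below shows the standard computation for mb1; chi_log's internals may differ.

```r
# Reproduce the Pearson goodness-of-fit test for mb1 with base R.
# This is a sketch of the standard computation, not necessarily
# how chi_log is implemented.
mb1 <- glm(vs ~ 1, data = mtcars, family = "binomial")

x2 <- sum(residuals(mb1, type = "pearson")^2)  # Pearson chi-squared
df <- mb1$df.residual                          # residual degrees of freedom
p  <- pchisq(x2, df, lower.tail = FALSE)       # upper-tail p-value

round(c(chi_sq = x2, df = df, p_value = p), 4)
```

These values match the Block 1 output above: a statistic of 32 on 31 degrees of freedom.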

Accuracy Percentage with predict_percent

We may also want to check the accuracy of our models. To do this, we can use predict_percent. Enter the model as the first argument, followed by the dependent variable used in the model, in data$variable format. In the example below, we use the variable vs from the data set mtcars. Once again, an optional third argument labels the output with a string.

predict_percent(mb1, mtcars$vs, "Block 1")

Accuracy for Block 1: 56.25%
predict_percent(mb2, mtcars$vs, "Block 2")

Accuracy for Block 2: 84.38%
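This accuracy is the share of observations whose predicted class matches the observed outcome. Assuming the usual 0.5 classification cutoff (an assumption on my part; predict_percent's internals may differ), the figure for mb2 can be reproduced with base R:

```r
# Reproduce the accuracy for mb2 with base R, assuming a 0.5
# classification cutoff (predict_percent may be implemented differently).
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")

pred <- as.numeric(fitted(mb2) > 0.5)  # predicted class for each car
acc  <- mean(pred == mtcars$vs) * 100  # percent correctly classified
round(acc, 2)
```

This reproduces the 84.38% shown for Block 2.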

Calculating Odds Ratios with or

To calculate odds ratios for a model, simply pass it to the function or.

or(mb1)
            Odds_Ratio  CI_Lower CI_Upper  p_values
(Intercept)  0.7777778 0.3801366 1.558936 0.4806496
or(mb2)
              Odds_Ratio    CI_Lower     CI_Upper    p_values
(Intercept) 10873.447296 95.35600799 6.507716e+07 0.002692584
cyl             0.204474  0.04827075 4.455527e-01 0.001917098

The function returns a data frame containing the odds ratios, confidence intervals, and p-values.
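For a logistic model, odds ratios are the exponentiated coefficients, and their confidence limits are the exponentiated limits on the log-odds scale. A base-R sketch of the same quantities (the asymmetry of the intervals above suggests profile-likelihood limits, but that is my reading of the output, not a statement about or's implementation):

```r
# Odds ratios with confidence intervals and p-values from base R.
# confint() on a glm gives profile-likelihood intervals; on R < 4.4
# this method is provided by the MASS package.
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")

odds <- exp(coef(mb2))                       # exponentiated coefficients
ci   <- exp(suppressMessages(confint(mb2)))  # profile-likelihood 95% CIs
pval <- summary(mb2)$coefficients[, 4]       # Wald p-values

cbind(Odds_Ratio = odds, ci, p_values = pval)
```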

Upper and Lower Fences with fences

If you want to revise and adjust your model, it can be helpful to limit outliers. To find upper and lower fences quickly, use the function fences: pass it the continuous variable you are interested in examining, again in data$variable format. The Upper and Lower elements of the result can also be accessed individually.

fences(iris$Sepal.Length)
fences(mtcars$cyl)$Upper
fences(mtcars$cyl)$Lower
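Fences are conventionally defined from the interquartile range: lower fence = Q1 - 1.5 * IQR, upper fence = Q3 + 1.5 * IQR. Assuming fences follows this Tukey convention (the 1.5 multiplier is my assumption), a base-R sketch:

```r
# Tukey-style fences for a continuous variable, assuming the usual
# 1.5 * IQR rule -- fences() itself may use a different multiplier.
iqr_fences <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  list(Lower = unname(q[1] - 1.5 * iqr),
       Upper = unname(q[2] + 1.5 * iqr))
}

iqr_fences(mtcars$cyl)  # Lower: -2, Upper: 14
```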

Comparing Model Results with compare_models

Lastly, when you are putting together multiple models, it can be helpful to view them side by side. This is particularly useful if you are comparing more than two models. Pass in as many models as you want to compare and, optionally, label each one with a vector of strings. To demonstrate, I will add a third model, mb3, which includes an additional continuous independent variable.

mb3 <- glm(vs ~ cyl + wt, data = mtcars, family = "binomial")

compare_models(mb1, mb2, mb3, labels = c("Model 1 Block 1", "Model 1 Block 2", "Model Block 3"))
$`Model 1 Block 1`

Call:  glm(formula = vs ~ 1, family = "binomial", data = mtcars)

Coefficients:
(Intercept)  
    -0.2513  

Degrees of Freedom: 31 Total (i.e. Null);  31 Residual
Null Deviance:      43.86 
Residual Deviance: 43.86    AIC: 45.86

$`Model 1 Block 2`

Call:  glm(formula = vs ~ cyl, family = "binomial", data = mtcars)

Coefficients:
(Intercept)          cyl  
      9.294       -1.587  

Degrees of Freedom: 31 Total (i.e. Null);  30 Residual
Null Deviance:      43.86 
Residual Deviance: 17.96    AIC: 21.96

$`Model Block 3`

Call:  glm(formula = vs ~ cyl + wt, family = "binomial", data = mtcars)

Coefficients:
(Intercept)          cyl           wt  
     10.619       -2.931        2.100  

Degrees of Freedom: 31 Total (i.e. Null);  29 Residual
Null Deviance:      43.86 
Residual Deviance: 15.55    AIC: 21.55
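Judging by the printed output, compare_models returns a named list of the fitted model objects. A base-R approximation of the same side-by-side view (the list structure and names here are my inference from the output, not compare_models' actual implementation):

```r
# Approximate compare_models with base R: collect the fits into a
# named list, which prints each model's call, coefficients, deviances,
# and AIC in turn.
mb1 <- glm(vs ~ 1, data = mtcars, family = "binomial")
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")
mb3 <- glm(vs ~ cyl + wt, data = mtcars, family = "binomial")

models <- setNames(list(mb1, mb2, mb3),
                   c("Block 1", "Block 2", "Block 3"))
models
```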

Pulling Deviances and AICs with deviance_aic

To quickly pull the residual deviance, null deviance, and AIC from each model summary, pass the models to deviance_aic.

deviance_aic(mb1, mb2, mb3)
mb1 
Residual Deviance: 43.86 
Null Deviance: 43.86 
AIC: 45.86 

mb2 
Residual Deviance: 17.96 
Null Deviance: 43.86 
AIC: 21.96 

mb3 
Residual Deviance: 15.55 
Null Deviance: 43.86 
AIC: 21.55
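The same three quantities are available for any glm fit through base-R accessors: deviance(), the null.deviance component, and AIC(). A quick check against the mb2 figures above:

```r
# Pull residual deviance, null deviance, and AIC for a single model
# with base-R accessors (the same numbers deviance_aic reports).
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")

round(c(residual_deviance = deviance(mb2),
        null_deviance     = mb2$null.deviance,
        aic               = AIC(mb2)), 2)
```

This matches the mb2 block of deviance_aic's output: 17.96, 43.86, and 21.96.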