Introduction
This vignette demonstrates how to use the functions included in this package so far. If you have not yet, install the package with the following code: devtools::install_github("brendensm/hm878"). If you do not have the devtools package, be sure to install it first with install.packages("devtools").
To start, we load the package:
library(hm878)
Let’s assume we are running a binomial logistic regression using the data from mtcars, a built-in data set included with R. We will use vs (engine type as V-shaped or straight) as the dependent variable, and cyl (number of cylinders) as the independent variable. We will store our models for block 1 and block 2.
mb1 <- glm(vs ~ 1, data = mtcars, family = "binomial")
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")
Testing Goodness of Fit with chi_log
To assess the fit of our models, we may want to use the function chi_log. To use it, simply type in the name of your model as the first argument, followed by the data set that the model uses. Optionally, you can provide labels for each model using the third argument. Here I will label each block.
chi_log(mb1, mtcars, "Block 1")
Pearson Goodness of Fit Test
Null Hypothesis: The model fits.
Alternative Hypothesis: The model does not fit.
Pearson chi-squared for Block 1: 32
Degrees of freedom for Block 1: 31
P-value for Block 1: 0.416744
---
Failed to reject Null Hypothesis. The model fits.
---
chi_log(mb2, mtcars, "Block 2")
Pearson Goodness of Fit Test
Null Hypothesis: The model fits.
Alternative Hypothesis: The model does not fit.
Pearson chi-squared for Block 2: 27.42
Degrees of freedom for Block 2: 30
P-value for Block 2: 0.6013516
---
Failed to reject Null Hypothesis. The model fits.
---
The function gives us the chi-squared statistic, degrees of freedom, and a p-value. It also reminds us of the null and alternative hypotheses. Both models appear to be a good fit.
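For reference, the reported values can be reproduced with base R. This is a minimal sketch assuming chi_log sums the squared Pearson residuals (the standard Pearson goodness-of-fit statistic) and uses the residual degrees of freedom; the actual internals may differ.

```r
# Assumed internals of chi_log: Pearson chi-squared is the sum of squared
# Pearson residuals, df is the residual degrees of freedom, and the p-value
# is the upper tail of the chi-squared distribution.
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")
x2 <- sum(residuals(mb2, type = "pearson")^2)
df <- df.residual(mb2)
p  <- pchisq(x2, df, lower.tail = FALSE)
round(c(chisq = x2, df = df, p.value = p), 4)
```

If the assumption holds, these values should line up with the Block 2 output above.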
Accuracy Percentage with predict_percent
We may also want to check the accuracy of our models. To do this, we can use predict_percent. To use this function, enter the name of the model as the first argument, followed by the dependent variable used in the model. For this, we must use the data$variable format. In the example below, we use the variable vs from the data set mtcars. Once again, we can label the output with a string as the optional third argument.
predict_percent(mb1, mtcars$vs, "Block 1")
Accuracy for Block 1: 56.25%
predict_percent(mb2, mtcars$vs, "Block 2")
Accuracy for Block 2: 84.38%
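As a rough base R equivalent (an assumption about predict_percent's internals, including the 0.5 classification cutoff), the accuracy can be computed by thresholding the fitted probabilities and comparing against the observed outcome:

```r
# Classify fitted probabilities at an assumed 0.5 cutoff, then compute the
# percentage of observations the model predicts correctly.
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")
pred <- as.numeric(predict(mb2, type = "response") > 0.5)
mean(pred == mtcars$vs) * 100
```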
Calculating Odds Ratios with or
To calculate odds ratios for the models, simply pass the model to the function or.
or(mb1)
Odds_Ratio CI_Lower CI_Upper p_values
(Intercept) 0.7777778 0.3801366 1.558936 0.4806496
or(mb2)
Odds_Ratio CI_Lower CI_Upper p_values
(Intercept) 10873.447296 95.35600799 6.507716e+07 0.002692584
cyl 0.204474 0.04827075 4.455527e-01 0.001917098
The output results in a data frame with the odds ratios, confidence intervals, and p-values.
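If you want to verify these numbers, roughly the same quantities are available from base R. Note that confint on a glm computes profile-likelihood intervals, which may differ slightly from whatever interval method or uses internally (an assumption on my part):

```r
# Odds ratios are the exponentiated coefficients; confidence intervals come
# from exponentiating the coefficient intervals; p-values are the Wald
# p-values from the model summary.
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")
exp(coef(mb2))                  # odds ratios
exp(confint(mb2))               # 95% CIs (profile likelihood)
summary(mb2)$coefficients[, 4]  # Wald p-values
```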
Upper and Lower Fences with fences
If you want to revise and adjust your model, it can be helpful to limit outliers. To find upper and lower fences quickly, use the function fences. To do this, pass the continuous variable you are interested in examining to the function. Once again, use the data$variable format.
fences(iris$Sepal.Length)
fences(mtcars$cyl)$Upper
fences(mtcars$cyl)$Lower
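Assuming fences follows the common Tukey rule (1.5 times the interquartile range beyond the quartiles; this is an assumption about its internals), the same bounds can be computed directly:

```r
# Tukey fences: lower = Q1 - 1.5 * IQR, upper = Q3 + 1.5 * IQR.
q   <- quantile(mtcars$cyl, c(0.25, 0.75))
iqr <- unname(q[2] - q[1])
c(Lower = unname(q[1]) - 1.5 * iqr, Upper = unname(q[2]) + 1.5 * iqr)
```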
Comparing Model Results with compare_models
Lastly, when you are putting together multiple models, it can be helpful to view them all at the same time, next to one another. This is particularly helpful if you have more than two models to compare. For this function, pass in as many models as you want to compare, and optionally label each one using a vector of strings, one per model. To demonstrate, I will add another model, mb3, with an additional continuous independent variable.
mb3 <- glm(vs ~ cyl + wt, data = mtcars, family = "binomial")
compare_models(mb1, mb2, mb3, labels = c("Model 1 Block 1", "Model 1 Block 2", "Model 1 Block 3"))
$`Model 1 Block 1`
Call: glm(formula = vs ~ 1, family = "binomial", data = mtcars)
Coefficients:
(Intercept)
-0.2513
Degrees of Freedom: 31 Total (i.e. Null); 31 Residual
Null Deviance: 43.86
Residual Deviance: 43.86 AIC: 45.86
$`Model 1 Block 2`
Call: glm(formula = vs ~ cyl, family = "binomial", data = mtcars)
Coefficients:
(Intercept) cyl
9.294 -1.587
Degrees of Freedom: 31 Total (i.e. Null); 30 Residual
Null Deviance: 43.86
Residual Deviance: 17.96 AIC: 21.96
$`Model 1 Block 3`
Call: glm(formula = vs ~ cyl + wt, family = "binomial", data = mtcars)
Coefficients:
(Intercept) cyl wt
10.619 -2.931 2.100
Degrees of Freedom: 31 Total (i.e. Null); 29 Residual
Null Deviance: 43.86
Residual Deviance: 15.55 AIC: 21.55
Pulling Deviances and AICs with deviance_aic
To pull the deviances and AICs from the model summaries, simply pass your models to deviance_aic.
deviance_aic(mb1, mb2, mb3)
mb1
Residual Deviance: 43.86
Null Deviance: 43.86
AIC: 45.86
mb2
Residual Deviance: 17.96
Null Deviance: 43.86
AIC: 21.96
mb3
Residual Deviance: 15.55
Null Deviance: 43.86
AIC: 21.55
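The same numbers are available from base R accessors, which is roughly what deviance_aic appears to collect for each model (a sketch, not the package's actual implementation):

```r
# Pull residual deviance, null deviance, and AIC from each fitted model.
models <- list(mb1 = glm(vs ~ 1, data = mtcars, family = "binomial"),
               mb2 = glm(vs ~ cyl, data = mtcars, family = "binomial"),
               mb3 = glm(vs ~ cyl + wt, data = mtcars, family = "binomial"))
for (nm in names(models)) {
  m <- models[[nm]]
  cat(nm, "- Residual Deviance:", round(deviance(m), 2),
      "Null Deviance:", round(m$null.deviance, 2),
      "AIC:", round(AIC(m), 2), "\n")
}
```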