A widespread measure of multicollinearity is the VIF (short for variance inflation factor). Multicollinearity describes the situation when the predictor variables in a multiple regression model are highly correlated, which is usually not desirable (assuming you haven’t gone Bayesian yet).

In R, the VIF can easily be calculated with a function in library *car*. It’s actually not difficult to do it by hand — which incidentally helps understand what we measure with the VIF, or why there is no different VIF for logistic regression models, or why the VIF is better than looking at bivariate correlations between predictors.

We start with some random data to run the multiple regression model. Here we create one outcome (y) and three predictor variables (x, z, a), full of random numbers. That’ll do for a demonstration.

x = runif(50)

y = runif(50)

z = runif(50)

a = runif(50)

Here’s a simple OLS model:

m = lm(y ~ x + z + a)

If you have library car installed, you can easily calculate the VIF:

library(car) vif(m)

To do it by hand, though, we run a linear regression model (OLS) for each of the predictors. Here’s the code for predictor x. One of the predictors becomes the outcome variable (here x), and the other predictors remain predictors. The variable used as the outcome previously (y) does not appear here.

mx = lm(x ~ z + a)

The VIF is simply: 1/(1-R²) of this model. In R, we can run the following:

1/(1-summary(mx)$r.squared)