Calculating VIF by hand

A widespread measure of multicollinearity is the VIF (short for variance inflation factor). Multicollinearity describes the situation when the predictor variables in a multiple regression model are highly correlated, which is usually not desirable (assuming you haven’t gone Bayesian yet).

In R, the VIF can easily be calculated with a function in library car. It’s actually not difficult to do it by hand — which incidentally helps understand what we measure with the VIF, or why there is no different VIF for logistic regression models, or why the VIF is better than looking at bivariate correlations between predictors.

We start with some random data to run the multiple regression model. Here we create one outcome (y) and three predictor variables (x, z, a), full of random numbers. That’ll do for a demonstration.

x = runif(50) 
y = runif(50)
z = runif(50)
a = runif(50)

Here’s a simple OLS model:


m = lm(y ~ x + z + a)

If you have library car installed, you can easily calculate the VIF:

library(car)
vif(m)

To do it by hand, though, we run a linear regression model (OLS) for each of the predictors. Here’s the code for predictor x. One of the predictors becomes the outcome variable (here x), and the other predictors remain predictors. The variable used as the outcome previously (y) does not appear here.

mx = lm(x ~ z + a)

The VIF is simply: 1/(1-R²) of this model. In R, we can run the following:

1/(1-summary(mx)$r.squared)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.