Why is there only a single coefficient for a categorical variable?

In this video, we solve the mystery of why there is sometimes only one coefficient for a categorical variable in the results of a multiple regression model. When things work as expected, a variable with 4 categories gets 3 coefficients (3 lines in the results), because one category is kept as the reference or base level. If the computer treats a categorical variable as if it were continuous, which typically happens when the categories are coded as numbers, it fits a single slope and we get only one coefficient. The solution is to tell the computer that the variable is categorical; in R we could use as.factor(), or we could store the categories as strings rather than numbers.
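To make the difference concrete, here is a minimal sketch in R. The data are simulated, and the names `dat`, `group_num`, `group_fac` and `group_chr` are made up for illustration; they are not taken from the video. The same 4-category variable is fitted three ways: coded as numbers, converted with as.factor(), and stored as strings.

```r
# Simulated data (hypothetical): an outcome y and a 4-category group coded 1 to 4.
set.seed(1)
group_num <- rep(1:4, each = 25)
y <- c(10, 14, 11, 18)[group_num] + rnorm(100)
dat <- data.frame(y = y, group_num = group_num)

# Numeric coding: lm() treats group_num as continuous and fits one slope.
fit_numeric <- lm(y ~ group_num, data = dat)
coef(fit_numeric)  # intercept plus a single coefficient

# Factor coding: lm() builds a dummy variable for each non-reference category.
dat$group_fac <- as.factor(dat$group_num)
fit_factor <- lm(y ~ group_fac, data = dat)
coef(fit_factor)   # intercept plus 3 coefficients; category 1 is the reference

# String coding: lm() converts character variables to factors automatically.
dat$group_chr <- c("a", "b", "c", "d")[dat$group_num]
fit_string <- lm(y ~ group_chr, data = dat)
coef(fit_string)   # intercept plus 3 coefficients; category "a" is the reference
```

A quick check before fitting is `str(dat)` or `class(dat$group_num)`: if the variable shows up as numeric or integer, the model will give it one coefficient.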
