Sometimes we need to run a regression analysis on a subset or sub-sample of the data. That’s quite simple to do in R. All we need is the subset() function. Let’s start with a linear regression on the full data set:
lm(y ~ x + z, data=myData)
Rather than run the regression on all of the data, let’s run it only for women, or only for people over 30:
lm(y ~ x + z, data=subset(myData, sex=="female"))
lm(y ~ x + z, data=subset(myData, age > 30))
The subset() function takes the data set as its first argument and a logical condition as its second; only the rows for which the condition is true are kept and passed on to lm().
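To try this without your own data, here is a minimal sketch that uses R’s built-in mtcars data set in place of myData. The variables mpg, wt, hp, am and cyl belong to mtcars and are only stand-ins for y, x, z and the grouping variables. The sketch also shows lm()’s own subset argument, which should give the same fit as wrapping the data in subset().

```r
# mtcars ships with R; it stands in for myData here (an assumption for illustration).
# Full-sample regression of fuel economy on weight and horsepower.
fit_all <- lm(mpg ~ wt + hp, data = mtcars)

# Same model, restricted to cars with a manual transmission (am == 1).
fit_manual <- lm(mpg ~ wt + hp, data = subset(mtcars, am == 1))

# lm() has its own subset argument; the condition is evaluated inside the data.
fit_manual2 <- lm(mpg ~ wt + hp, data = mtcars, subset = am == 1)

# Conditions can be combined with & (and) or | (or).
fit_combo <- lm(mpg ~ wt + hp, data = subset(mtcars, am == 1 & cyl == 4))

# Check how many rows each model actually used.
nobs(fit_all)
nobs(fit_manual)

# Compare the coefficients from the two ways of subsetting.
all.equal(coef(fit_manual), coef(fit_manual2))
```

Checking nobs() after fitting is a cheap way to confirm that the condition selected the rows you expected. Note that subset() drops rows where the condition evaluates to NA.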
Published 3 December 2016