How to run a regression on a subset in R

Sometimes we need to run a regression analysis on a subset or sub-sample of the data. That’s quite simple to do in R: all we need is the subset() function. Let’s start with a linear regression on the full data set:

lm(y ~ x + z, data=myData)

Rather than run the regression on all of the data, let’s do it for only women, or only people with a certain characteristic:

lm(y ~ x + z, data=subset(myData, sex=="female"))

lm(y ~ x + z, data=subset(myData, age > 30))

The subset() function takes the data set as its first argument and a logical condition as its second; only the rows for which the condition is TRUE are kept.
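To try this out end to end, here is a minimal sketch on simulated data. The data frame below is made up purely for illustration (the column names simply mirror the ones used above). It also shows an alternative that should give the same result: lm() has its own subset argument, so the filtering can be done inside the call.

```r
# Hypothetical data, simulated only for illustration
set.seed(1)
n <- 200
myData <- data.frame(
  x   = rnorm(n),
  z   = rnorm(n),
  sex = sample(c("female", "male"), n, replace = TRUE),
  age = sample(18:70, n, replace = TRUE)
)
myData$y <- 1 + 2 * myData$x - 0.5 * myData$z + rnorm(n)

# Approach from the text: filter the data frame first, then fit
fit_a <- lm(y ~ x + z, data = subset(myData, sex == "female"))

# Alternative: pass the condition to lm()'s own subset argument
fit_b <- lm(y ~ x + z, data = myData, subset = sex == "female")

# Both fits use the same rows, so the coefficients should agree
all.equal(coef(fit_a), coef(fit_b))

# Number of observations actually used in the fit
nobs(fit_a)
```

Checking nobs() after fitting is a quick way to confirm that the condition selected the rows you expected.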

5 thoughts on "How to run a regression on a subset in R"

  1. Hi,

    Is it possible to specify both sex=="female" and age > 30 at the same time? Is there a limit to how many conditions you can add?

    For example, say I wanted to include data from only one year (from the year column), only females (from the gender column), and only weight > 50 (from the weight column), and then regress weight for females born in the specified year.

    If this is possible, how would one write it?

