How to run a regression on a subset in R

Sometimes we need to run a regression on a subset (sub-sample) of the data. That’s quite simple to do in R: all we need is the subset() function. Let’s start with a linear regression on the full data set:

lm(y ~ x + z, data=myData)

Rather than run the regression on all of the data, let’s do it for only women, or only people with a certain characteristic:

lm(y ~ x + z, data=subset(myData, sex=="female"))

lm(y ~ x + z, data=subset(myData, age > 30))

The subset() function takes the data set as its first argument and a logical condition as its second. Only the rows for which the condition is TRUE are kept and passed on to lm().
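To try this without your own data, here is a minimal sketch that uses R's built-in mtcars data set in place of myData; the choice of variables (mpg, wt, hp, am, cyl) is only an illustration and not part of the example above. It also shows that several conditions can be combined with & inside a single subset() call.

```r
# mtcars ships with R and stands in for myData (an assumption for illustration).

# One condition: only cars with a manual transmission (am == 1).
fit_manual <- lm(mpg ~ wt + hp, data = subset(mtcars, am == 1))

# Two conditions joined with &: manual transmission AND more than 4 cylinders.
fit_both <- lm(mpg ~ wt + hp, data = subset(mtcars, am == 1 & cyl > 4))

# nobs() reports how many rows each model actually used.
nobs(fit_manual)
nobs(fit_both)
```

Checking nobs() after fitting is a quick way to confirm that the condition selected the rows you intended.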

11 thoughts on “How to run a regression on a subset in R”

  1. Hi,

    Is it possible to specify both “sex== female” and “age>30” at the same time? Is there a limit to how many specifications you can add?

    For example, suppose I wanted to include data from only one year (from the year column), only females (from the gender column), and only weight > 50 (from the weight column), and then run a regression on weight for females born in the specified year.

    If this is possible, how would one write it?

  2. Thanks for your article.
    I have a large data file with 67 countries. How can I run multiple regressions on different countries? Or how can I run multiple regressions on different periods? There are some long ways to get the results I am looking for in [R], but I would like to learn the shortcut. Thanks

  3. Thanks, Didier! That was really helpful. However, I am still struggling to combine the “subset” argument with the “weights” argument (the variable for weights is not the one I’m using to subset). When I try to do that, I receive an error message telling me vectors have different lengths. I’d appreciate it if you could help me with that.

    1. Thanks for checking in. Without further details, I cannot replicate your issue. I have just double checked:

      m1 = lm(y ~ x, data=d, weights=weight)
      m2 = lm(y ~ x, data=subset(d, z==1), weights=weight)

      and both regression models work as expected.
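A self-contained version of this check might look like the sketch below. The data frame d is simulated here, so its columns (x, y, z, weight) are assumptions made up for illustration. The sketch also fits the same model with lm()'s own subset argument, which should give the same coefficients.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-in for the data frame d (hypothetical columns).
d <- data.frame(
  x = rnorm(100),
  z = rep(c(0, 1), each = 50),
  weight = runif(100, min = 0.5, max = 2)
)
d$y <- 1 + 2 * d$x + rnorm(100)

# weights is looked up inside the already subsetted data frame,
# so it has the same length as y and x.
m2 <- lm(y ~ x, data = subset(d, z == 1), weights = weight)

# Alternative: let lm() do the subsetting; the weights are subset with the data.
m3 <- lm(y ~ x, data = d, subset = z == 1, weights = weight)

# Compare the two sets of coefficients.
all.equal(coef(m2), coef(m3))
```

One possible cause of a length mismatch is passing the weights as a full-length vector such as d$weight while the data has already been subsetted; naming the column and letting lm() find it in data avoids that.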

      1. It worked now. I was probably insisting on some small, embarrassing mistake that was preventing R from running it correctly. Thank you once again.
