Sometimes we need to run a regression analysis on a subset or sub-sample. That’s quite simple to do in R. All we need is the `subset` command. Let’s look at a linear regression:

`lm(y ~ x + z, data=myData)`

Rather than run the regression on all of the data, let’s do it for only women, or only people with a certain characteristic:

`lm(y ~ x + z, data=subset(myData, sex=="female"))`

`lm(y ~ x + z, data=subset(myData, age > 30))`

The `subset()` command takes the data set as its first argument and, as its second, a logical condition that identifies which rows to keep.
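To try this without your own data, here is a minimal sketch using the `mtcars` data set that ships with R. The variables `mpg`, `wt`, `hp` and `am` are columns of `mtcars` and simply stand in for `y`, `x`, `z` and the grouping variable above; the object names `fit_all`, `fit_manual` and `fit_manual2` are made up for illustration. It also shows that `lm()` has its own `subset` argument, which should give the same fit.

```r
# mtcars is a built-in data set; am == 1 marks cars with manual transmission.
fit_all    <- lm(mpg ~ wt + hp, data = mtcars)
fit_manual <- lm(mpg ~ wt + hp, data = subset(mtcars, am == 1))

# Alternative: let lm() do the subsetting through its subset argument.
fit_manual2 <- lm(mpg ~ wt + hp, data = mtcars, subset = am == 1)

# Compare how many rows each model used.
nobs(fit_all)
nobs(fit_manual)

# The two ways of subsetting should give the same coefficients.
all.equal(coef(fit_manual), coef(fit_manual2))
```

Checking `nobs()` after fitting is a quick way to confirm that the condition selected the rows you expected.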

thanks, that helped

Thanks for checking in. I’m glad you found this useful!

Hi,

Is it possible to specify both “sex== female” and “age>30” at the same time? Is there a limit to how many specifications you can add?

For example, say I wanted to include only data from one year (from the year column), only females (from the gender column), and only weight > 50 (from the weight column), and then run the regression on the females from the specified year.

If this is possible, how would one write it?

Yes, you can combine as many criteria as you want. What you need is the & operator for AND, and perhaps the | operator for OR. See https://www.statmethods.net/management/operators.html for a quick overview. So you could run:

`lm(y ~ x + z, data=subset(myData, sex=="female" & age>30))`
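Here is a small runnable sketch of combining conditions. The data are simulated purely so the code runs: the column names mirror the ones used in the post, but the values and the object names `fit_and` and `fit_or` are made up for illustration.

```r
set.seed(1)  # simulated data, only so the example is reproducible
n <- 200
myData <- data.frame(
  x   = rnorm(n),
  z   = rnorm(n),
  sex = sample(c("female", "male"), n, replace = TRUE),
  age = sample(18:80, n, replace = TRUE)
)
myData$y <- 1 + 2 * myData$x - myData$z + rnorm(n)

# AND: keep rows where both conditions hold.
fit_and <- lm(y ~ x + z, data = subset(myData, sex == "female" & age > 30))

# OR: keep rows where at least one condition holds.
fit_or <- lm(y ~ x + z, data = subset(myData, sex == "female" | age > 30))

# The OR subset can never be smaller than the AND subset.
nobs(fit_and)
nobs(fit_or)
```

Further conditions, such as a specific year, can be chained on with additional `&` terms in the same way.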

Thank you, that’s very helpful!

Thanks for your article.

I have a large data file with 67 countries. How can I run multiple regressions on different countries? Or how can I run multiple regressions on different periods? There are some long ways to do this in R and get the results I am looking for, but I would like to learn the shortcut. Thanks

Thanks for checking in. If I understand you right, you want to run (e.g.) 67 regression models from your dataset? You’ll need a loop and the assign function as described here: https://druedin.com/2015/11/28/same-explanatory-variables-multiple-dependent-variables-in-r/
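As a rough sketch of the idea, the following fits the same model once per country. Note that it stores the models in a list via `split()` and `lapply()` rather than using a loop with `assign()` as in the linked post, so treat it as one possible alternative. The data are simulated, and the `country` column and the object name `models` are assumptions made for illustration.

```r
set.seed(42)  # simulated data with three hypothetical countries
myData <- data.frame(
  country = rep(c("A", "B", "C"), each = 50),
  x = rnorm(150),
  z = rnorm(150)
)
myData$y <- 0.5 * myData$x + myData$z + rnorm(150)

# split() returns one data frame per country;
# lapply() fits the same regression to each of them.
models <- lapply(split(myData, myData$country),
                 function(d) lm(y ~ x + z, data = d))

# Look at a single model, or collect all coefficients (one row per country).
summary(models[["A"]])
t(sapply(models, coef))
```

The same pattern should work for periods: split on the period variable instead of `country`.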