Sometimes we need to run a regression analysis on a subset or sub-sample of the data. That’s quite simple to do in R. All we need is the `subset()` function. Let’s look at a linear regression:

`lm(y ~ x + z, data=myData)`

Rather than run the regression on all of the data, let’s run it only for women, or only for people with a certain characteristic, such as being older than 30:

`lm(y ~ x + z, data=subset(myData, sex=="female"))`

`lm(y ~ x + z, data=subset(myData, age > 30))`
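The post doesn’t include `myData`, so here is a self-contained sketch of the two subset regressions above that runs on simulated data. The column names match the examples, but the values are made up for illustration. It also shows that `lm()` has its own `subset` argument, which should produce the same coefficients as passing a pre-subsetted data frame.

```r
# Simulated stand-in for myData (made-up values, for illustration only)
set.seed(42)
n <- 200
myData <- data.frame(
  x   = rnorm(n),
  z   = rnorm(n),
  sex = sample(c("female", "male"), n, replace = TRUE),
  age = sample(18:70, n, replace = TRUE)
)
myData$y <- 1 + 2 * myData$x - 0.5 * myData$z + rnorm(n)

# Full-sample regression
fit_all <- lm(y ~ x + z, data = myData)

# Same model, estimated only on the women in the data
fit_female <- lm(y ~ x + z, data = subset(myData, sex == "female"))

# Same model, estimated only on people older than 30
fit_over30 <- lm(y ~ x + z, data = subset(myData, age > 30))

# nobs() reports how many rows each model actually used
c(all = nobs(fit_all), female = nobs(fit_female), over30 = nobs(fit_over30))

# Alternative: lm()'s own 'subset' argument, giving the same fit as fit_female
fit_female2 <- lm(y ~ x + z, data = myData, subset = sex == "female")
all.equal(coef(fit_female), coef(fit_female2))
```

Comparing `nobs()` across the models is a quick way to check that the subset contains the rows you expected.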

The `subset()` function takes the data set as its first argument and, as its second, a logical condition that identifies which rows belong to the subset.
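One detail worth knowing: when the condition evaluates to `NA` (for example, because a value is missing), `subset()` drops that row, whereas plain bracket indexing returns a row of `NA`s. The small data frame below is hypothetical and exists only to illustrate the difference.

```r
# Hypothetical data frame with one missing age
df <- data.frame(
  id  = 1:4,
  age = c(25, 40, NA, 35)
)

# subset() keeps rows where the condition is TRUE and drops rows where it is NA
kept_subset <- subset(df, age > 30)
kept_subset$id    # 2 4

# Plain logical indexing turns the NA condition into an all-NA row
kept_index <- df[df$age > 30, ]
nrow(kept_index)  # 3 (the middle row is all NA)

# which() treats NA as FALSE, matching subset()'s behaviour
kept_which <- df[which(df$age > 30), ]
kept_which$id     # 2 4
```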


thanks, that helped

Thanks for checking in. I’m glad you found this useful!

Hi,

Is it possible to specify both “sex== female” and “age>30” at the same time? Is there a limit to how many specifications you can add?

For example, if I wanted to include data from only one year (from the year column) and only females (from the gender column) and weight >50 (from the weight column) and regress weight on females born in the specified year.

If this is possible, how would one write it?

Yes, you can combine as many criteria as you want. What you need is the & operator for AND, and perhaps the | operator for OR. See https://www.statmethods.net/management/operators.html for a quick overview. So you could run:

`lm(y ~ x + z, data=subset(myData, sex=="female" & age>30))`
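A sketch of the commenter’s three-condition case, using a hypothetical data frame `people` with `year`, `gender`, `weight` and (an assumed extra column) `height`, all simulated. It combines conditions with `&`, shows `|` for OR, and uses `%in%` as a shorter way to write several `|` comparisons on the same column.

```r
# Hypothetical data with the columns mentioned in the question (simulated values)
set.seed(1)
n <- 500
people <- data.frame(
  year   = sample(2008:2012, n, replace = TRUE),
  gender = sample(c("female", "male"), n, replace = TRUE),
  height = rnorm(n, mean = 170, sd = 10)
)
people$weight <- -60 + 0.75 * people$height + rnorm(n, sd = 8)

# AND (&): women, from 2010, weighing more than 50
sub_and <- subset(people, year == 2010 & gender == "female" & weight > 50)
fit_and <- lm(weight ~ height, data = sub_and)

# OR (|): anyone from 2009 or 2011
sub_or <- subset(people, year == 2009 | year == 2011)

# %in% selects the same rows as the chain of | comparisons above
sub_in <- subset(people, year %in% c(2009, 2011))
```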

Thank you, that’s very helpful!