Should We Use Stop Words?

When using automatic content analysis methods like Wordscores or Wordfish, we can use stop words, that is, remove a list of supposedly uninformative words before the texts are scored. This is a contentious issue, with recommendations ranging from "definitely use stop words" to "stop words are a bad thing". What to do?

To me this sounded more like an empirical question than something beliefs could settle. Using professionally translated texts (i.e. party manifestos available in two languages), I examined how stop words affect predicted scores (i.e. party positions) (Ruedin 2013). Lowe & Benoit (2013) highlight that words considered a priori uninformative can nonetheless help predict party positions. This can be used as an argument against using stop words. In my analysis, I applied just a few stop words, consisting almost entirely of grammatical terms like articles and conjunctions (function words). It turns out that removing these words almost entirely removes the impact of language on predicted scores. Put differently, removing words that really carry no meaning can improve the predictions.
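
To make the preprocessing step concrete, here is a minimal Python sketch of what "using stop words" means before texts are passed to a scaling method. It is not the code used in the analysis: the stop list `FUNCTION_WORDS`, the helper `word_counts`, and the example sentence are all made up for illustration, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter

# Hypothetical short stop list: English function words only
# (articles and conjunctions), not the list used in the analysis.
FUNCTION_WORDS = {"a", "an", "the", "and", "or", "but"}

def word_counts(text, stop_words=frozenset()):
    """Lower-case the text, split it into word tokens, and count
    every token that is not in stop_words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

# Invented example sentence standing in for a manifesto.
manifesto = ("The party supports lower taxes and a smaller state, "
             "but the party opposes cuts.")

with_stop = word_counts(manifesto)                     # all words kept
without_stop = word_counts(manifesto, FUNCTION_WORDS)  # function words removed
```

The resulting word frequencies are what methods like Wordscores or Wordfish take as input. Keeping the stop list this short leaves every substantive word in the counts, which is the point of the argument above.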

So should we use stop words? Yes, but we don’t need many of them: a short list restricted to words that clearly carry no substantive information seems to be a good idea.

Lowe, Will, and Kenneth Benoit. 2013. “Validating estimates of latent traits from textual data using human judgment as a benchmark.” Political Analysis 21 (3): 298–313. doi:10.1093/pan/mpt002.

Ruedin, D. 2013. “The Role of Language in the Automatic Coding of Political Texts.” Swiss Political Science Review 19 (4): 539–45. doi:10.1111/spsr.12050.
