An Ode to Low R²

It’s the time of the year when many of us do our share of grading. In my case, it’s quantitative projects, and every time I’m impressed by how much the students learn. One thing that sometimes annoys me is how many of them (MA students) insist on interpreting R² in absolute terms (rather than using it to compare similar models, for instance). That’s something they seem to learn in their BA course:

[in this simple model with three predictor variables], we only explain 3% of the variance; it’s a ‘bad’ model.

I paraphrased, of course. But I have come to like low R² values: they are a testament to the complexity of humans and their social world. They are a testament to the fact that we are not machines, and that we live in a world where quantitative analysis is about tendencies. Just imagine a world in which, knowing your age and gender, I could perfectly predict your political preferences… So there you have it: low R² values are great!
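To make the point concrete, here is a minimal simulation sketch in Python. All numbers in it (the slope of 0.5, the noise level, the sample size) are made up for illustration; the point is only that a real tendency can be estimated quite well while R² stays around 3%.

```python
import random


def simple_ols(x, y):
    """Return (slope, intercept, r_squared) for a one-predictor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With a single predictor, R² is the squared correlation between x and y.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data-generating process: a genuine tendency (slope 0.5)
# buried in a lot of individual variation (noise SD 3).
rng = random.Random(42)
TRUE_SLOPE = 0.5
NOISE_SD = 3.0
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [TRUE_SLOPE * xi + rng.gauss(0, NOISE_SD) for xi in x]

slope, intercept, r_squared = simple_ols(x, y)
print(f"estimated slope: {slope:.2f}, R²: {r_squared:.3f}")
```

The estimated slope should land close to the true 0.5, even though the model “explains” only a few percent of the variance. The tendency is there; people just vary a lot around it.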

Easternization and Westernization as Simulacra

In sociology, the term simulacrum is used to refer to copies without an original, crudely put. Here I want to suggest that Westernization and Easternization probably fit this definition, too. Westernization is not simply a turn to a more Western lifestyle, but a trend towards (or aspiration to) the image of what the West is perceived to be. This image could be described as a geographical imagination: the perception is selective in which aspects of Western culture and lifestyle are picked up, and in the meaning given to these aspects. The same is true of trends of Easternization, like open-plan living, Japanese-style beds, or relaxation techniques. By extension, we should not be afraid of Americanization, the allegation that places become more American: it’s some aspects of what we (stereotypically) associate with being American that spread, and quite different processes may be at play as to why they become more commonplace. Good luck to anyone trying to model this!

Galtung’s ISD System

A while ago, I introduced Galtung’s (1967) AJUS system to classify distributions according to shape. The AJUS system can be used to reduce complexity.

Galtung also introduced the ISD system to describe changes over time in a similar manner. It can be applied to any situation where we have three points in time to characterize two periods during which changes may have happened. As with the AJUS system, the intuition is to ignore small and unimportant differences to focus on the bigger changes. While Galtung developed the ISD system for eye-balling, it remains relevant in the age of computers; the system just becomes more systematic.

The ISD system is available in my R package agrmt. Taking a vector representing the values at the three points in time, the isd() function gives the type and a description of the type as follows:

    Type 1: increase in both periods
    Type 2: increase in first period, flat in second period
    Type 3: increase in first period, decrease in second period
    Type 4: flat in first period, increase in second period
    Type 5: flat in both periods
    Type 6: flat in first period, decrease in second period
    Type 7: decrease in first period, increase in second period
    Type 8: decrease in first period, flat in second period
    Type 9: decrease in both periods
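
The nine types above follow from classifying each of the two periods as increase, flat, or decrease. Here is a minimal sketch of that logic in Python. It is not the code of the agrmt package: the function name isd_type, the tolerance default, and the rule that an absolute difference at or below the tolerance counts as flat are assumptions made for illustration.

```python
def isd_type(values, tolerance=0.1):
    """Classify three time points into ISD types 1 to 9 (illustrative sketch).

    Assumption: a change counts as 'flat' when its absolute size is at or
    below `tolerance`; the agrmt package may define this differently.
    """
    if len(values) != 3:
        raise ValueError("ISD needs exactly three points in time")

    def direction(before, after):
        diff = after - before
        if diff > tolerance:
            return 0  # increase
        if diff < -tolerance:
            return 2  # decrease
        return 1      # flat

    first = direction(values[0], values[1])
    second = direction(values[1], values[2])
    # Types are numbered row by row: first period picks the block of three,
    # second period picks the position within the block.
    return first * 3 + second + 1


print(isd_type([1, 2, 3]))     # increase in both periods
print(isd_type([1, 2, 2.05]))  # increase, then a change small enough to ignore
print(isd_type([2, 1, 2]))     # decrease, then increase
```

The second call shows the intuition behind the system: the change from 2 to 2.05 is below the tolerance, so the second period is treated as flat.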

Reference: Galtung, J. 1969. Theory and Methods of Social Research. Oslo: Universitetsforlaget.

Galtung’s AJUS System

Galtung (1967) introduced the AJUS system as a way to classify distributions according to shape. This is a means to reduce complexity. The underlying idea is to classify distributions by ignoring small differences that are not important. The system was originally developed for eye-balling, but having it done by a computer makes the classification more systematic.

All distributions are classified as being one of AJUS, and I have added a new type “F” to complement the ones identified by Galtung.

  • A: unimodal distribution, peak in the middle
  • J: unimodal, peak at either end
  • U: bimodal, peak at both ends
  • S: bimodal or multi-modal, multiple peaks
  • F: flat, no peak; this type is new

The skew is given as -1 for a negative skew, 0 for absence of skew, or +1 for a positive skew. The skew is important for J-type distributions: it distinguishes monotonic increase from monotonic decrease.

I have implemented the AJUS system in my R package agrmt. By setting the tolerance, we can determine what size of differences we consider small enough to be ignored. The default tolerance is 0.1, equivalent to 10% if the values are on a scale from 0 to 1. The R implementation applies this threshold systematically, something we do not do when eye-balling differences.

The tolerance parameter is not a trivial choice, but a test is included in the R package to directly test sensitivity to the tolerance parameter (ajusCheck).
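
To illustrate how the tolerance drives the classification, here is a simplified sketch in Python. It is not the agrmt implementation: the function name ajus_type and the rule of comparing adjacent values are assumptions for illustration, and the skew is left out entirely.

```python
def ajus_type(values, tolerance=0.1):
    """Classify a frequency vector as A, J, U, S, or F (simplified sketch).

    Assumption: only adjacent values are compared, and differences whose
    absolute size is at or below `tolerance` are ignored.
    """
    signs = []  # sequence of ups (+1) and downs (-1), without repeats
    for prev, curr in zip(values, values[1:]):
        diff = curr - prev
        if abs(diff) <= tolerance:
            continue  # too small to matter
        sign = 1 if diff > 0 else -1
        if not signs or signs[-1] != sign:
            signs.append(sign)

    if not signs:
        return "F"  # flat, no peak
    if len(signs) == 1:
        return "J"  # only up or only down: peak at one end
    if signs == [1, -1]:
        return "A"  # up then down: peak in the middle
    if signs == [-1, 1]:
        return "U"  # down then up: peaks at both ends
    return "S"      # more changes of direction: multiple peaks


vector = [10, 15, 10, 15, 10]
print(ajus_type(vector, tolerance=1))   # differences of 5 count
print(ajus_type(vector, tolerance=10))  # differences of 5 are ignored
```

The same vector comes out as multi-peaked with a tolerance of 1 and as flat with a tolerance of 10, which is why it is worth checking how sensitive a classification is to this choice.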

Here are some examples (using the experimental ajusPlot function and tolerance = 10):
[Figure: example distributions classified with ajusPlot]
Differences smaller than the tolerance set (10) are ignored.

Reference: Galtung, J. 1969. Theory and Methods of Social Research. Oslo: Universitetsforlaget.