How to add text labels to a scatter plot in R?

Adding text labels to a scatter plot in R is easy. The basic function is text(), and here’s a reproducible example of how to use it to create these plots:

[Figure: Adding text to a scatter plot in R]

For the example, I’m creating random data. Since the data are random, your plots will look different. In this fictitious example, I look at the relationship between a policy indicator and performance. It is conventional to put the outcome variable on the Y axis and the predictor on the X axis, but in this example there’s no relationship to reality anyway. I chose these min and max values for the random variables because I jotted down this code as an explanation for a replication. We have 25 observations, for 25 units I call “cantons”. The third line below creates a character vector with the letters “A” to “Y”; these are the labels.

policy = runif(25, min=0.4, max=0.7)
perfor = runif(25, min=500, max=570)
canton = sapply(65:89, function(x) rawToChar(as.raw(x)))

For the scatter plot on the left, we use plot(). Then we add the trend line with abline() and lm(). To add the labels, we use text(): the first argument gives the X value of each point, the second the Y value (so R knows where to place the text), and the third the corresponding label. The argument pos=1 tells R to draw the label underneath the point; pos=2, 3 and 4 place it to the left, above, and to the right of the point.

plot(policy ~ perfor, bty="n", ylab="Policy Indicator", xlab="Performance", main="Policy and Performance")
abline(lm(policy ~ perfor), col="red")
text(perfor, policy, canton, pos=1)

The scatter plot on the right is similar, but here we plot the labels instead of the dots. There are two differences in the code. First, we add type="n" to create the scatter plot without drawing any circles (an empty plot, if you will). Second, when we add the text in the third line of the code, we leave out pos=1, because we want to place the labels exactly where the points are.

plot(policy ~ perfor, bty="n", type="n", ylab="Policy Indicator", xlab="Performance", main="Policy and Performance")
abline(lm(policy ~ perfor), col="red")
text(perfor, policy, canton)

Mac *.txt.rtfd to *.txt

In a recent project, an assistant used TextEdit to supposedly save documents as pure (UTF-8) text files. We managed to fix the workflow, but I was left with a bunch of ZIP files full of *.rtfd bundles from TextEdit. On a Windows or GNU/Linux machine, these bundles show up as what they are: folders that contain a rich text document (and potentially other stuff). I needed text documents.

After a bit of searching and tweaking, I got the following shell script to convert all the rich text documents in these folders/containers into text documents:

find . -name '*.rtf' -exec unoconv -f txt {} \;

There was a problem, though. The names of the original files contained important metadata, but that name now sat on the folder: inside each folder, the converted file was simply called TXT.txt (converted from TXT.rtf). I’m sure there’s a quick way to fix this in a shell script (if you know one, please share it in the comments), but I got stuck with the shell.

Enter LiveCode. Here’s a script that does just that. I guess I could have called the above shell script, but I already had this.

on mouseup
   -- INPUT: select the folder that holds the *.txt.rtfd folders
   answer folder "Input: Choose folder:"
   put it into infoldername
   set the defaultFolder to infoldername
   put the folders into listoffolders
   -- drop "." and "..", which can cause problems
   filter listoffolders without "."
   filter listoffolders without ".."
   -- OUTPUT: select a destination folder
   answer folder "Output: Choose folder:"
   put it into outfoldername
   repeat with i = 1 to the number of lines of listoffolders
      put line i of listoffolders into currentfolder
      -- name the copy after its folder: "name.txt.rtfd" becomes "name.txt"
      -- (assumption: every folder name ends in ".rtfd")
      put currentfolder into textname
      replace ".rtfd" with empty in textname
      -- a trailing backslash continues the statement on the next line
      revCopyFile infoldername & slash & currentfolder & \
            slash & "TXT.txt", outfoldername & slash & textname
   end repeat
end mouseup
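
For readers who would rather stay in the shell, the same copy-and-rename step can be sketched as below. This is a minimal sketch, not the script used in the project. The function name collect_txt and its two arguments are hypothetical. It assumes that each bundle is named like name.txt.rtfd and that unoconv has already written TXT.txt inside it.

```shell
# Sketch: copy <indir>/<name>.rtfd/TXT.txt to <outdir>/<name>, keeping the
# metadata-bearing folder name as the new file name.
# collect_txt is a hypothetical helper, not part of unoconv or TextEdit.
collect_txt() {
    indir=$1
    outdir=$2
    mkdir -p "$outdir"
    for bundle in "$indir"/*.rtfd; do
        # Skip bundles without a converted file (this also covers the case
        # where the glob matched nothing and stayed a literal pattern).
        [ -f "$bundle/TXT.txt" ] || continue
        # Strip the directory part and the .rtfd suffix:
        # "in/report.txt.rtfd" becomes "report.txt".
        name=$(basename "$bundle" .rtfd)
        # Make sure the result ends in .txt even if the bundle was
        # named "report.rtfd" rather than "report.txt.rtfd".
        case $name in
            *.txt) ;;
            *) name=$name.txt ;;
        esac
        cp "$bundle/TXT.txt" "$outdir/$name"
    done
}
```

Called as collect_txt ./unzipped ./plain (both paths are placeholders), it leaves the bundles untouched and only writes copies into the output folder.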

Full LiveCode stack here on OSF (it’s nothing more than a button and a text field with a basic log).

Wordscores and JFreq – an update

An old post of mine on using JFreq and Wordscores in R still gets frequent hits. For some documents, the current version of JFreq doesn’t work as well as the old one (which you can find here [I’m just hosting this, all credit to Will Lowe]). For even longer documents, we have a Python script by Thiago Marzagão archived here (I have never tried this). And then there is quanteda, the new R package that also does Wordscores.

Having said this, a recent working paper by Bastiaan Bruinsma and Kostas Gemenis heavily criticizes Wordscores. While their work does not discredit Wordscores as such (merely the quick and easy approach Wordscores advertises, which, depending on your view, is the essence of Wordscores), I prefer to read it as a call to validate Wordscores before applying it. After all, in some situations it seems to ‘work’ pretty well, as Laura Morales and I show in our recent paper in Party Politics.