This is another simple thing I keep looking up: how to set the encoding when importing CSV data in R. I need this when my data file is in UTF-8 (pretty standard these days) but I’m using R under Windows, or when I have a Windows-encoded file and I’m using R elsewhere. The default encoding in Windows is not UTF-8, and R uses the default encoding, well, by default. Typically this is not an issue unless my data file contains accented characters in strings, which come out garbled when the wrong encoding is assumed.
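The solution is quite simple: tell the read.csv() command which encoding the file uses. For a Windows-encoded file, add fileEncoding="", which converts the text while it is being read:
x <- read.csv("datafile.csv", fileEncoding="Windows-1252")
For a UTF-8 file, fileEncoding="UTF-8" works the same way. Alternatively, add encoding="", which does not convert anything and only marks the strings as Latin-1 or UTF-8, like this:
x <- read.csv("datafile.csv", encoding="UTF-8")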
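To check that a setting really decodes a file the way I expect, a small round trip helps. This is only a minimal sketch: the sample names and the temporary files are made up for illustration, and it assumes R is running in a UTF-8 locale. It uses fileEncoding for the Windows-1252 file, because that argument re-encodes the input, whereas encoding only marks strings as Latin-1 or UTF-8.

```r
# Sample data with accented characters, written as \u escapes so the
# script itself stays plain ASCII (made-up values for illustration).
utf8_lines <- c("name,city", "Zo\u00eb,Gen\u00e8ve")

# Write one copy of the file as Windows-1252 bytes ...
win_path <- tempfile(fileext = ".csv")
win_lines <- iconv(utf8_lines, from = "UTF-8", to = "Windows-1252")
writeLines(win_lines, win_path, useBytes = TRUE)

# ... and one copy as UTF-8 bytes.
utf8_path <- tempfile(fileext = ".csv")
writeLines(utf8_lines, utf8_path, useBytes = TRUE)

# fileEncoding converts the Windows-1252 input while reading.
x_win <- read.csv(win_path, fileEncoding = "Windows-1252",
                  stringsAsFactors = FALSE)

# encoding = "UTF-8" leaves the bytes alone and marks the strings as UTF-8.
x_utf8 <- read.csv(utf8_path, encoding = "UTF-8",
                   stringsAsFactors = FALSE)

# Both data frames should now show the same accented text.
print(x_win)
print(x_utf8)
```

If either printout shows question marks or odd symbol pairs instead of the accented letters, the encoding given does not match the file.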
Working across operating systems (Windows, Mac, GNU/Linux), I found it good practice to always specify encodings.