This is something that nags me from time to time, particularly because it’s a documented feature I tend to forget. When using the split.screen()
function of R to combine plots (rather than par(mfrow)
) in Sweave or Knitr, the screen numbers are output in the main text, alongside the figure. Usually we want the figure only. All we need is adding results='hide'
. I find this slightly counter-intuitive, because I do want the result (in form of a figure). That said, I don’t have a suggestion what alternative would feel more intuitive. The same issue also happens when using library(rgdal)
for a local shapefile, and other similar situations.
A Modified Shapefile for Plotting Swiss Cantons
The GADM database of Global Administrative Areas is the place to go for shapefiles of administrative boundaries. The data are freely available for academic use and other non-commercial use. Unfortunately, the boundaries of the Swiss cantons are a bit buggy. Version 2 has fixed the presence of a 27th canton — part of Lake Geneva, but it includes the entire lake of any lake shared with a neighbouring country as Swiss territory.
The licence of the GADM files is never made explicit, and while it says “Please contribute in whatever way you can by sending us a message to point out errors, or even better, to send an improved file for a country of your interest,” there is no way to send improved files on their website. I never got replies to my messages, so consider this post as my sending an improved file.
The files are on Github. I have modified cantonal boundaries so that they match the national boundaries: only the Swiss part of lakes is included as Swiss territory. I’m not sure the files other than the .shp, .shx, .dbf, .prj are actually needed.
Here’s how to draw a map in R. First we load two libraries:
library(sp)
library(rgdal)
Then we import the shapefile:
x <- readOGR(dsn = ".", layer = "ch-cantons")
The dsn
argument takes the location of the shapefile (dot for current directory); the layer
argument takes the name of the shapefile without extensions.
And now we draw the map:
data <- 1:26 # values to plot
x$data <- data # add to SpatialPolygonsDataFrame
col <- rev(heat.colors(26)) # define colours
spplot(x, "data", col.regions=col, main="", colorkey=TRUE)
The cantons are as follows: c("AG","AR", "AI","BL", "BS", "BE", "FR", "GE", "GL", "GR", "JU", "LU", "NE", "NW", "OW", "SG", "SH", "SZ", "SO", "TG", "TI", "UR", "VS", "VD", "ZH", "ZG")
.
Maps in R
It isn’t difficult to draw maps in R, and in fact there are several options available. Here’s how to draw maps using the package sp.
I use a map of Swiss cantons to lament on the shapefile available from fantastic GADM. Let’s have a look at the population for each canton first, however, available here. I copied the table into a spreadsheet to select just the cantons and replace the canton names with their abbreviations. I often find this kind of data handling easier outside of R.
p <- read.table(file="clipboard")
rownames(p) <- p$V1
p$V1 <- NULL
p <- data.frame(t(p))
Here I import the data via the clipboard; use the names of the cantons as row names, and remove the first column that held the names. I also transpose the data and convert it to a data frame. Usually this kind of preparation isn’t necessary, as I have the data already in R (from whatever analysis I’m doing).
library(sp)
con <- url("http://biogeo.ucdavis.edu/data/gadm2/R/CHE_adm1.RData")
print(load(con))
close(con)
Here I load the package, and get the shapefile from GADM. You can find the URL on the GADM website. This code is essential.
target <- c("AG","AR", "AI","BL", "BS", "BE", "FR", "GE", "GL", "GR", "JU", "LU", "NE", "NW", "OW", "SG", "SH", "SZ", "SO", "TG", "TI", "UR", "VS", "VD", "ZG", "ZH")
This is the order of the cantons in the shapefile. It seems to be in alphabetical order when the names of the cantons are spelled out. This is different from the idiosyncratic way the data are reported on the website used. With such a small number of units (26 cantons), I could enter the data manually, but that’s prone to errors.
data <- NA # create target
data <- sapply(target, function(x) {data[1] <- as.numeric(p[x])})
Here I use the fantastic sapply
function to allocate the values in p to their right place.
gadm$data <- data
col <- rev(heat.colors(26))
spplot(gadm, "data", col.regions=col, main="Population (in thousands)")
This is the final code, and we’re back to essential code in the final line.
If you want to follow this example, but struggle to get the data in, try this instead:
p <- structure(list(ZH = 1425.5, BE = 1001.3, LU = 390.3, UR = 35.9,
SZ = 151.4, OW = 36.5, NW = 41.9, GL = 39.6, ZG = 118.1,
FR = 297.6, SO = 261.4, BS = 189.3, BL = 278.7, SH = 78.8,
AR = 53.7, AI = 15.8, SG = 491.7, GR = 195, AG = 636.4, TG = 260.3,
TI = 346.5, VD = 749.4, VS = 327, NE = 176.4, GE = 469.4,
JU = 71.7), .Names = c("ZH", "BE", "LU", "UR", "SZ", "OW",
"NW", "GL", "ZG", "FR", "SO", "BS", "BL", "SH", "AR", "AI", "SG",
"GR", "AG", "TG", "TI", "VD", "VS", "NE", "GE", "JU"), row.names = "V2", class = "data.frame")
And I nearly forgot the lamenting: while version 1 of the GADM shapefile included 27 areas, one of which was the Swiss part of Lake Geneva, version 2 of the GADM shapefile includes the entire Lake Constance as part of the canton of Thurgau — including the parts that are German. Let’s hope their quicker to fix this time around (I reported it).