This week I had a data set with Swiss postcodes (Postleitzahl; PLZ), and wanted to use this information to create a variable for the respondent’s canton. It turned out to be slightly less trivial than I thought.
Finding a list of postcodes with the canton indicated wasn’t difficult. I quickly realized, though, that the relationship between Swiss postcodes and cantons isn’t straightforward: 1000:1197 are in the canton of Vaud, 1200:1258 in the canton of Geneva, then back to Vaud, etc. Once we get to 1290, it becomes obvious that it’s even more complicated than that: the same postcode is used for municipalities in two different cantons.
Since the relationship between postcode and canton is quite messy, I decided to simply use a lookup table; I was looking for a quick solution, not necessarily an elegant one.
It turns out there are just 12 postcodes that cannot be clearly assigned to a single canton. I looked more closely, and in most cases the situation on the ground is a town or village in one canton, with a hamlet across the cantonal border. I assigned these postcodes to the canton of the larger settlement.
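To make the lookup idea concrete, here is a minimal sketch in R. It is not the actual table or code: the three rows are a toy excerpt, the names `plz_table`, `respondents` and `ambiguous` are made up for illustration, and assigning 1290 to Geneva is my assumption of what "the larger settlement" means for that postcode. It also uses the vectorised `match()` rather than a loop.

```r
# Toy excerpt of a postcode-to-canton lookup table (illustrative only;
# a real table would list every Swiss postcode).
plz_table <- data.frame(
  plz       = c(1000L, 1200L, 1290L),
  canton    = c("VD", "GE", "GE"),       # 1290 -> "GE" is an assumption
  ambiguous = c(FALSE, FALSE, TRUE),     # flags postcodes spanning two cantons
  stringsAsFactors = FALSE
)

# Hypothetical survey data with one postcode per respondent.
respondents <- data.frame(id = 1:4, plz = c(1200L, 1290L, 1000L, 1200L))

# match() returns, for each respondent, the row of plz_table with that postcode.
idx <- match(respondents$plz, plz_table$plz)
respondents$canton    <- plz_table$canton[idx]
respondents$ambiguous <- plz_table$ambiguous[idx]

print(respondents)
```

Keeping the flag as a separate column means the assignment to the larger settlement and the information that the case was ambiguous both survive in the data.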
The R code is available on GitHub as a simple function that can be loaded using source(), but essentially it’s a large table and a loop with a single line to match postcode and canton. All the ambiguous cases are clearly identified, making it easy to filter them out (e.g. by recoding them to an “NA” category).
I have updated the code on Gist to include postcodes not previously assigned, and to return “NA” when the input is not an existing postcode (i.e. not one in the database included).
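The behaviour described here can be sketched roughly as follows. This is again not the code from the Gist: `plz_to_canton_sketch()`, `toy_table` and the `drop_ambiguous` argument are hypothetical names, and the table is the same kind of toy excerpt rather than the full database.

```r
# Toy lookup table (illustrative only; not the full database).
toy_table <- data.frame(
  plz       = c(1000L, 1200L, 1290L),
  canton    = c("VD", "GE", "GE"),       # 1290 -> "GE" is an assumption
  ambiguous = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the canton for each postcode, NA for any
# input that is not in the table, and optionally NA for ambiguous postcodes.
plz_to_canton_sketch <- function(plz, table = toy_table, drop_ambiguous = FALSE) {
  # Compare as character so numeric and character input behave the same.
  idx <- match(as.character(plz), as.character(table$plz))
  canton <- table$canton[idx]            # unmatched postcodes become NA
  if (drop_ambiguous) {
    canton[which(table$ambiguous[idx])] <- NA_character_
  }
  canton
}

plz_to_canton_sketch(c(1200, 9999, 1290))
plz_to_canton_sketch(c(1200, 9999, 1290), drop_ambiguous = TRUE)
```

In this sketch, a postcode missing from the table and an ambiguous postcode that was filtered out both end up as NA, so they are excluded by the usual `is.na()` checks.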