R can import SPSS files quite easily with the read.spss command from the foreign package. It usually works out of the box, so well that I usually choose the SPSS file when downloading secondary data (hint: set the argument use.value.labels depending on whether you want variables with value labels converted to factors).
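As a minimal sketch of that hint, the helper below reads the same file twice, once with and once without use.value.labels. The name read_spss_both is made up for illustration, and electric.sav is the example file bundled with foreign, standing in here for your own download.

```r
library(foreign)

# Hypothetical helper (not part of foreign): import one SPSS file both ways
# so the effect of use.value.labels can be compared side by side.
read_spss_both <- function(path) {
  list(
    # Variables with value labels are converted to R factors.
    labelled = read.spss(path, use.value.labels = TRUE, to.data.frame = TRUE),
    # Variables keep their underlying codes instead of becoming factors.
    coded = read.spss(path, use.value.labels = FALSE, to.data.frame = TRUE)
  )
}

# Example file shipped with the foreign package; replace with your own path.
sav <- system.file("files", "electric.sav", package = "foreign")
both <- read_spss_both(sav)
str(both$labelled)
str(both$coded)
```

Which version you want depends on the analysis: factors are convenient for tables and plots, while the raw codes are easier to recode.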
Sometimes R isn’t so happy, throwing warnings like “Unrecognized record type 7, subtype 18 encountered in system file”. Generally warnings in R are there for a reason. Usually the unrecognized records seem to be variable and data attributes stored by SPSS, but to be sure, simply convert the SPSS file into SPSS Portable (*.por rather than *.sav). Don’t have SPSS? Enter PSPP, a free (open source) program that can help you out! (For Windows, check directly on this site.)
PSPP can open SPSS files faster than SPSS, and under
File > Save as...
there’s the option to save as a Portable file (rather than the default System File) at the bottom left of the dialog. If you import this (portable) SPSS file to R, there should be no errors or warnings.
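To check rather than assume that an import is clean, one option is to collect the warnings while reading. This is only a sketch: read_spss_with_warnings is a made-up helper name, and mydata.sav and mydata.por in the usage lines are hypothetical file names.

```r
library(foreign)

# Hypothetical helper: read an SPSS file (.sav or .por) and return the data
# together with the text of every warning raised during the import.
read_spss_with_warnings <- function(path, ...) {
  msgs <- character(0)
  data <- withCallingHandlers(
    read.spss(path, to.data.frame = TRUE, ...),
    warning = function(w) {
      # Store the message, then stop the warning from being printed.
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(data = data, warnings = msgs)
}

# Usage (hypothetical files, so left commented out):
# sav <- read_spss_with_warnings("mydata.sav")
# por <- read_spss_with_warnings("mydata.por")
# sav$warnings                       # e.g. the "Unrecognized record type" message
# por$warnings                       # ideally character(0)
# all.equal(sav$data, por$data)      # reports any differences between the two imports
```

Comparing the two imports with all.equal also shows whether the conversion itself changed anything.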
Hello, your proposed method avoids the annoying warnings, but it can cause a serious loss of information in the data file.
The portable file (*.por) is an old format used for compatibility between different versions of SPSS and different operating systems. Its use is no longer recommended because of several limitations: it shortens variable names, changes the file encoding (no Unicode support), recodes variable information, and so on.
Also, if you use SPSS to change the file format, the data loss may be irreversible (depending on the version of SPSS), because SPSS overwrites the information in the data file. PSPP does a better job of keeping the information, but because it follows the “format specifications” it also ends up losing some.
The recommendation is to use the *.sav file format and ignore the warnings, as most of them are safe.
If the variable information (labels and other metadata) is not important, it is better to convert the *.sav data file to a *.csv data file.
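A minimal sketch of that conversion done in R itself, assuming the foreign package is used for the import. The function name sav_to_csv and the file names in the usage line are hypothetical.

```r
library(foreign)

# Hypothetical helper: read a .sav file and write its data to a .csv file.
# Variable labels and other SPSS metadata are not carried over to the CSV.
sav_to_csv <- function(sav_path, csv_path) {
  dat <- read.spss(sav_path, to.data.frame = TRUE)
  write.csv(dat, csv_path, row.names = FALSE)
  invisible(csv_path)
}

# Usage (hypothetical file names, so left commented out):
# sav_to_csv("mydata.sav", "mydata.csv")
```

Since the CSV keeps only the data values, it is worth keeping the original *.sav file alongside it.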
Notes:
1. Specifications for SPSS data files are not available; all the work is done through reverse engineering. The warnings are due to this, since each version of SPSS brings changes, combined with the different file encodings (if the file can still be read, the warning is not fatal).
2. If the data file can be opened in PSPP, it can very likely be opened in R, because R uses the same libraries for reading the data.
3. The best option for downloading PSPP for Windows is here: http://sourceforge.net/projects/pspp4windows/
4. Another way to read SPSS files is the “pspp file conversion service” (requires an internet connection): http://pspp.benpfaff.org/
Cheers
Thanks for these detailed comments. I’m aware of the limitations of the *.por format, and that R uses PSPP code to open SPSS files (this is properly credited). The point, however, is that to an inexperienced end user the warning message in R is not that helpful. Would it be safe to ignore? (In my experience it almost always is.) When converting in PSPP (or SPSS, but funnily enough SPSS sometimes fails to convert where PSPP succeeds), the changes seem more visible, at least if we’re looking at a small dataset.
He’s right, but it all depends on the version of PSPP/SPSS you use.
I think an inexperienced end user will be alarmed by the warnings, but it is better to tell them that these warnings are safe (unless the file cannot be opened). I think the same inexperienced end user does not know the *.por file format, much less its limitations.
An inexperienced R end user will always resort to a GUI, so importing external files will be very transparent. Whoever uses the command line will, I think, have more knowledge or more tools for identifying a problem.
Usually the warnings look like “Unrecognized record type subtype encountered in system file”.
They flag differences in how the structure of the file is recognized (because there is no official specification for reading SPSS files), but the data is always preserved. See: https://www.gnu.org/software/pspp/pspp-dev/html_node/System-File-Format.html#System-File-Format
Cheers
As I keep getting hits for this post: library(haven) gives me better results for SPSS files (it is the default method for importing SPSS files in RStudio).
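A minimal sketch of the haven route, assuming the haven package is installed. The wrapper name read_sav_labelled is made up for illustration, and iris.sav is the example file bundled with haven, standing in for your own file.

```r
library(haven)

# Hypothetical wrapper around haven::read_sav(): optionally convert
# labelled variables to factors, similar in spirit to use.value.labels.
read_sav_labelled <- function(path, as_factors = TRUE) {
  dat <- read_sav(path)
  if (as_factors) {
    # as_factor() on a data frame converts its labelled columns to factors.
    dat <- as_factor(dat)
  }
  dat
}

# Example file shipped with haven; replace with your own path.
path <- system.file("examples", "iris.sav", package = "haven")
dat <- read_sav_labelled(path)
str(dat)
```

With as_factors = FALSE the labelled columns are returned as haven imports them, which keeps both the codes and the labels.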