This is something that nags me from time to time, particularly because it’s a documented feature I tend to forget. When using the split.screen() function of R to combine plots (rather than par(mfrow)) in Sweave or Knitr, the screen numbers are output in the main text, alongside the figure. Usually we want the figure only. All we need to do is add results='hide' to the chunk options. I find this slightly counter-intuitive, because I do want the result (in the form of a figure). That said, I don’t have a suggestion for an alternative that would feel more intuitive. The same issue also happens when using library(rgdal) for a local shapefile, and in other similar situations.
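A minimal sketch of the situation, assuming a knitr chunk opened with {r, results='hide'} (in Sweave the option is written results=hide, without quotes). The data are made up, and the object name screens is hypothetical. Assigning the return value, or wrapping a call in invisible(), is an alternative I'd consider when the text output of other lines in the same chunk should stay visible:

```r
# Chunk header assumed: {r, results='hide'} in knitr, <<fig=TRUE, results=hide>>= in Sweave.
# split.screen() returns the numbers of the new screens; at top level that
# return value is printed, which is what ends up in the document text.
screens <- split.screen(c(1, 2))   # assigning the value suppresses the printing

screen(1)
plot(1:10, main = "Left panel")

screen(2)
hist(c(1, 2, 2, 3, 3, 3), main = "Right panel")

# invisible() discards whatever the call returns, so nothing is printed here.
invisible(close.screen(all.screens = TRUE))
```

With results='hide' set, the assignment and invisible() are not needed; they only matter when the chunk option is left at its default.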
Why Knitr Beats Sweave
No doubt Sweave is one of the pieces that make R great. Sweave combines the benefits of R with those of LaTeX to enable reproducible research. Knitr is a more recent contribution by Yihui Xie, packing in the goodness of Sweave alongside cacheSweave, pgfSweave, RweaveHTML, HighlightWeaveLatex etc. It requires separate installation, interestingly also when using RStudio.
As much as I like Sweave, I argue that often knitr is the better choice, despite there being no equivalent to R CMD Sweave --pdf. First of all, knitr uses Rmarkdown, a set of intuitive, human-readable markup for the formatting. While LaTeX is by no means as complicated as its reputation seems to suggest, Rmarkdown is actually easy. By human-readable I mean that even someone who has never heard of Rmarkdown can understand what is happening to some extent.
Sweave is great for producing PDFs, but that’s one of the biggest drawbacks of LaTeX in the social sciences: while PDFs may look good, they are not the format we need when collaborating with Word-only colleagues or, with rare exceptions, when submitting a manuscript to journals. Knitr works very well with Pandoc, so creating a Word document or an ODF is just as easy as creating a PDF. The other day I had to submit a supplementary file as a *.doc file, even though it’ll end up as a PDF on Dataverse or so. With knitr this didn’t take long.
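As a rough illustration of the Word route, here is a sketch that assumes the rmarkdown package (which wraps knitr and Pandoc) and a working Pandoc installation; this is not necessarily the route I took at the time. The two-line source document is made up, and the result is a .docx file rather than a .doc file:

```r
# Hypothetical, minimal R Markdown source written to a temporary file.
rmd <- tempfile(fileext = ".Rmd")
writeLines(c("# Supplement", "", "The mean is `r mean(1:10)`."), rmd)

# Assumption: rmarkdown and Pandoc are installed; the step is skipped otherwise.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # render() knits the R code and lets Pandoc convert the result to Word.
  out <- rmarkdown::render(rmd, output_format = "word_document", quiet = TRUE)
  print(basename(out))
}
```

Swapping "word_document" for "odt_document" or "pdf_document" changes the target format; the PDF route additionally needs a LaTeX installation.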
What’s the catch then? Rmarkdown comes with a restricted set of commands, and there is no way to create custom commands. This isn’t a problem, though. For instance, if you create a PDF with knitr, you can include standard LaTeX code, like \newpage. More importantly, with a restricted set of commands, I find myself tinkering much less than I do in LaTeX. In other words, with Rmarkdown and knitr, I do more of that purported benefit of LaTeX, namely concentrating on the contents rather than worrying about how it’ll end up looking. A more radical step would probably be writing in plain text and then finishing it off in Word (or LibreOffice), because we seem to end up there anyway, at least at the submission stage.
There are two aspects where the restrictions of Rmarkdown are noticeable: references (roughly on par with Endnote, not with BibTeX), and complex tables. When it comes to complex tables, we should probably be thinking about graphs anyway. In this context, however, being human-readable highlights another advantage of knitr: if the document fails to compile, it’s much easier to debug (and here Sweave beats odfWeave by miles).
What neither approach resolves, however, is collaborating with the Word-only crowd who need the “track changes” feature.
Still Looking for the Perfect Workflow
I have written about Sweave and odfWeave, but the quest for the perfect workflow continues. For various reasons I prefer to do my statistical analyses in R, and manage my references in Zotero. In my area of research, the most common file format for papers is without doubt Microsoft Word; it’s the principal format for exchanging editable documents, and it’s the principal format most journals seem to want (actually insist on). Moreover, I seek a workflow that keeps my research reproducible.
Here are a few options I regularly use: (1) To me Sweave is an elegant and fast solution. It gives full control over the necessary parts of the output. The downsides are the lack of direct integration with Zotero, the difficulty of collaborating with colleagues who do not use LaTeX, and the need to convert papers for journals, at least once they are through peer review. Yes, I could use LyX to work with Zotero more easily than by exporting the required references into BibTeX (at least Zotero allows dragging references into the .bib file), but this would end the elegance of working in the text file.
(2) In principle odfWeave fits the requirements rather well. It works very well with Zotero, and conversion to Word is usually quite simple (or to PDF). On the downside, odfWeave is really quite slow, and often chokes when (invisible) formatting gets in the way. I find that in-text code (\Sexpr{}) is particularly prone to breaking. Frequent compiling would be a workaround to catch problems early, but unfortunately the lack of speed makes this more challenging than it first seems. Moreover, I have had documents compile and then choke the next time I open them, despite not having touched the code in the document. Usually removing the formatting of the section in question helps, but there can be tough cases with spaces that aren’t quite what they seem to be, for example. I find hunting for reasons why a document does not compile really quite frustrating; at least Sweave is able to give more precise indications where the problem lies, if there is one.
(3) Something I do frequently when working with others is using R scripts and Word/LibreOffice side by side. While this disconnects the code and the output to some degree, keeping all the code in a single file maintains some degree of reproducibility. Comments in the code are important: why did I do this that way? Often I use a Sweave document to keep all the analysis in one place. A benefit over just commented code is that the compiled PDF has all figures and output, making it easier to pinpoint the code of interest.
(4) Another possibility I sometimes use, usually in combination with R scripts, is using comments in Word to attach the underlying code to the figures and numbers in the document. This works relatively well as long as I am diligent in keeping the code up-to-date. A final version of the document can be created simply by choosing “remove all comments”. This works fine when working with others, but combined with many other comments it can lead to crowded documents when track changes is used.
One downside of using Word or LibreOffice is that it’s just too easy to quickly tweak the column width of a table, or manually add an extra reference line to a graph (etc.). This becomes a problem when tables and graphs are updated, and all these small steps have to be done once again. The fact that the link to the code is indirect also means that it is generally difficult to be certain all the numbers have been updated, for example if a miscoded case is fixed in the data.
So for me, at the moment, there is no single workflow, but different ones, depending on whom I am working with and what the output is expected to be.
p<0.05 in Sweave
Here’s a very simple way to include p-levels in Sweave. Let’s assume you want to mention a correlation coefficient in your text; \Sexpr{} will do just that.
\Sexpr{round(cor.test(x, y)$estimate,2)}
You can easily include the p-level, too.
\Sexpr{round(cor.test(x, y)$p.value,2)}
Except that’s not how it’s usually done. Normally we report whether the p-value is smaller than a certain threshold, and by convention only a few of them are considered.
Enter a very simple function (I’d include this in my first Sweave block where I load the data):
plevel <- function (x, strict=FALSE) {
  # levels of p-values, for Sweave
  # strict cuts at 0.05, otherwise cuts at 0.1
  # checks run from the largest threshold to the smallest, so the last match wins
  if (x>0.1 & strict==FALSE) p <- "p>0.1" # not significant
  if (x>0.1 & strict==TRUE) p <- "p>0.05" # not significant
  if (x<=0.1 & strict==FALSE) p <- "p<0.1" # significant
  if (x<=0.1 & strict==TRUE) p <- "p>0.05" # not significant
  if (x<=0.05) p <- "p<0.05" # significant
  if (x<=0.01) p <- "p<0.01" # significant
  if (x<=0.001) p <- "p<0.001" # significant
  return(p)
}
This automates the procedure, and the cited thresholds will always be correct. I could make this function simpler by leaving out the strict argument, or obviously adjust the thresholds.
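The simpler variant mentioned above might look roughly like this. It is only a sketch: the name plevel_simple and the cuts argument are made up for illustration, and the thresholds are passed in as a vector so they can be adjusted without touching the function body.

```r
# Hypothetical simplified variant: no strict argument, adjustable thresholds.
plevel_simple <- function(p, cuts = c(0.001, 0.01, 0.05, 0.1)) {
  cuts <- sort(cuts)
  hit <- cuts[p <= cuts]            # all thresholds that p does not exceed
  if (length(hit) == 0) {
    # p lies above the largest threshold: report it as not significant
    return(paste0("p>", max(cuts)))
  }
  paste0("p<", min(hit))            # report the smallest threshold reached
}

plevel_simple(0.03)                                 # "p<0.05"
plevel_simple(0.2)                                  # "p>0.1"
plevel_simple(0.07, cuts = c(0.001, 0.01, 0.05))    # "p>0.05"
```

Dropping 0.1 from cuts gives the behaviour of the strict setting, at the price of having to spell out the thresholds in each call.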
So, here’s how I use this in Sweave: some text ($r=\Sexpr{round(cor.test(x, y)$estimate,2)}$, $\Sexpr{plevel(cor.test(x, y)$p.value)}$) some more text.
The dollar signs (math mode) mean that I get nice typography for the numbers and operators.
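The in-text expressions above call cor.test() once for each number. A possible refinement, sketched here with simulated data and hypothetical object names (ct, r_txt, p_val), is to run the test once in a code chunk and refer to the stored values in the text:

```r
# Simulated data, for illustration only.
set.seed(42)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

ct <- cor.test(x, y)                    # run the test once, keep the result

# sprintf() keeps trailing zeros (0.50), whereas round() would give 0.5.
r_txt <- sprintf("%.2f", ct$estimate)
p_val <- ct$p.value                     # pass this on to plevel() in the text

# In the Sweave text: $r=\Sexpr{r_txt}$, $\Sexpr{plevel(p_val)}$
```

This keeps the text shorter and guarantees that the coefficient and the p-level come from the same test.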
Why I Ditched Sweave for odfWeave
Well, I didn’t really – not completely. Sweave is an incredible tool for research. It facilitates replicable research to the benefit of researchers, too. Here is the main reason that cut it for me: getting back to a paper after a few weeks or months (think working on multiple papers, think hearing back from reviewers), and not remembering all the details. For example, a figure or number looks odd, and by using Sweave I can immediately see where it comes from. Another reason is that I want to avoid the following situation from the other end: I recently asked an author about a detail in a recently published paper. The response was “I can’t remember, I did the analysis some 2 years ago.” Using a single Sweave file, I also avoid confusion on the level of filenames (compare here or here).
So what is the problem with Sweave? There are two. First, on some papers I collaborate with others who don’t use R and LaTeX. Second, most journals in political science and sociology don’t accept LaTeX files. Enter odfWeave, which does almost everything Sweave does, using LibreOffice rather than LaTeX. Creating Word documents for commenting and submission is easy. It also plays nicely with Zotero, which I find a bit easier to work in than BibTeX. (One annoyance: odfWeave hates relative paths.)
I said that I did not really ditch Sweave. For first drafts, I still like the accessibility, non-distraction, and compatibility of a plain text file. Usually I use heavily commented R-code, but Sweave is never far, especially as I can keep all analysis and plots in a single file.