LaTeX to Word

Ever needed to convert a LaTeX document to Word, for example to submit it to a social sciences journal that insists on MS Word format? There are several options out there, including using Adobe Reader to save the PDF as a Word document. In my experience, the best results come from opening the PDF document in MS Word itself (yes, MS Word can open PDF documents!). Obviously you’ll have to check everything carefully, but recent versions of Word even seem to handle most equations correctly.

Alternatively, write in Pandoc Markdown to start with (or Sciflow if you need online collaborative writing), and you can create a beautiful PDF as well as a Word document, whichever you need.
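To give an idea of what the Markdown route looks like in practice, here is a minimal sketch in Python that builds the pandoc calls for a Word and a PDF version of the same source. The file names (paper.md, references.bib) and the helper functions are made up for illustration. I am assuming a pandoc version recent enough to have the --citeproc option, and PDF output additionally needs a LaTeX engine installed.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source, target, bibliography=None):
    """Build the pandoc command line that converts `source` into `target`."""
    # pandoc infers the output format from the extension of the target file
    cmd = ["pandoc", str(source), "-o", str(target)]
    if bibliography is not None:
        # --citeproc resolves [@key] citations against the bibliography file
        cmd += ["--citeproc", f"--bibliography={bibliography}"]
    return cmd


def build_all(source, formats=("docx", "pdf"), bibliography=None):
    """Return one command per output format, named after the source file."""
    source = Path(source)
    return [
        pandoc_command(source, source.with_suffix("." + fmt), bibliography)
        for fmt in formats
    ]


def run(commands):
    """Run the commands if pandoc is installed; otherwise only print them."""
    have_pandoc = shutil.which("pandoc") is not None
    for cmd in commands:
        print(" ".join(cmd))
        if have_pandoc:
            subprocess.run(cmd, check=True)
```

Calling run(build_all("paper.md", bibliography="references.bib")) would then produce paper.docx and paper.pdf from the same text file, so the Word version for the journal and the PDF for everyone else never drift apart.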

Use MS Word to Convert PDF Files

Recently we had to convert a PDF file to MS Word so that we could benefit from the Track Changes feature in MS Word. The proofreader did not want to use the commenting tools in Adobe Acrobat because he found them inefficient for proposing changes to the text. (Yes, he could make direct changes, but that takes much more time.) We had a LaTeX source file and faced the common challenge of turning it into a Word file. I remembered that Adobe Acrobat can export PDF to Word files, but as I have experienced many times before, the output did not satisfy me at all. I also tried pandoc, but it turned out that we had used bits of LaTeX that pandoc cannot (yet) handle. When checking the output, I discovered that Word can open PDF files. We quite liked the result and had to tidy up only a few bits and pieces to get an acceptable Word file.

We could have avoided this challenge by using Markdown and pandoc to start with… my usual approach these days.

Still Looking for the Perfect Workflow

I have written about Sweave and odfWeave, but the quest for the perfect workflow continues. For various reasons I prefer to do my statistical analyses in R, and manage my references in Zotero. In my area of research, the most common file format for papers is without doubt Microsoft Word; it’s the principal format for exchanging editable documents, and it’s the principal format most journals seem to want (actually insist on). Moreover, I seek a workflow that keeps my research reproducible.

Here are a few options I regularly use: (1) To me Sweave is an elegant and fast solution. It gives full control over the necessary parts of the output. The downsides are the lack of direct integration with Zotero, the difficulty of collaborating with colleagues who do not use LaTeX, and the need to convert papers for journals, at least once they are through peer review. Yes, I could use LyX to work with Zotero more easily than by exporting the required references into BibTeX (at least Zotero allows dragging references into the .bib file), but this would end the elegance of working in a plain text file.

(2) In principle odfWeave fits the requirements rather well. It works very well with Zotero, and conversion to Word (or to PDF) is usually quite simple. On the downside, odfWeave is really quite slow, and often chokes when (invisible) formatting gets in the way. I find that in-text code (\Sexpr{}) is particularly prone to breaking. Frequent compiling would be a workaround to catch problems early, but unfortunately the lack of speed makes this more challenging than it first seems. Moreover, I have had documents compile and then choke the next time I opened them, despite not having touched the code in the document. Usually removing the formatting of the section in question helps, but there can be tough cases, for example with spaces that aren’t quite what they seem to be. I find hunting for the reasons why a document does not compile really quite frustrating; at least Sweave is able to give more precise indications of where the problem lies, if there is one.

(3) Something I do frequently when working with others is using R scripts and Word/LibreOffice side by side. While this disconnects the code and the output to some degree, keeping all the code in a single file maintains some degree of reproducibility. Comments in the code are important: why did I do this that way? Often I use a Sweave document to keep all the analysis in one place. A benefit over just commented code is that the compiled PDF has all figures and output, making it easier to pinpoint the code of interest.

(4) Another possibility I sometimes use, usually in combination with R scripts, is using comments in Word to attach the underlying code to the figures and numbers in the document. This works relatively well as long as I am diligent in keeping the code up to date. A final version of the document can be created simply by choosing “remove all comments”. This works fine when working with others, but together with many other comments it can lead to crowded documents when Track Changes is used.

One downside of using Word or LibreOffice is that it’s just too easy to quickly tweak the column width of a table, or manually add an extra reference line to a graph (etc.). This becomes a problem when tables and graphs are updated, and all these small steps have to be done once again. The fact that the link to the code is indirect also means that it is generally difficult to be certain all the numbers have been updated, for example if a miscoded case is fixed in the data.

So for me, at the moment, there is no single workflow, but different ones, depending on whom I am working with and what the output is expected to be.

Creating Sparklines in MS Word

A few years back I prepared a step-by-step guide to creating sparklines in SPSS and MS Word. I came up with the idea while writing a physics lab report in high school, long before learning from Edward Tufte about what is today widely known as sparklines. After a recent comment by a colleague (“you can do that in Word?”), I’ve decided to put this tutorial online.

The general principle applies to any software package to create the original graphs, including MS Excel.