Mac *.txt.rtfd to *.txt

In a recent project, an assistant used TextEdit, supposedly to save documents as plain (UTF-8) text files. We managed to fix the workflow, but I was left with a bunch of Zip files full of *.rtfd files from TextEdit. On a Windows or GNU/Linux machine, these files show up as what they are: folders that contain a rich text document (and potentially other stuff). I needed text documents.

After a bit of searching and tweaking, I got the following shell script to convert all the rich text documents in these folders/containers into text documents:

find . -name '*.rtf' -exec unoconv -f txt {} \;

There was a problem, though. The file names all contained important metadata, but that name was now on the folder: inside each folder, the converted file was simply called TXT.txt (converted from TXT.rtf). I’m sure there’s a quick way to fix this in a shell script (if you know one, please share it in the comments), but I got stuck with the shell.

Enter LiveCode. Here’s a script that copies each TXT.txt out of its folder and names it after that folder. I guess I could have called the above shell script from it, but I already had the converted files.

on mouseUp
   -- INPUT: select the folder that holds the *.txt.rtfd folders
   answer folder "Input: Choose folder:"
   put it into infoldername
   set the defaultFolder to infoldername
   put the folders into listoffolders
   -- filter out . and .., which can cause problems
   filter listoffolders without "."
   filter listoffolders without ".."
   -- OUTPUT: select a destination folder
   answer folder "Output: Choose folder:"
   put it into outfoldername
   repeat with i = 1 to the number of lines of listoffolders
      put line i of listoffolders into currentfolder
      -- assumption: the copy is named after its folder, minus the .rtfd
      -- extension (textname was never set in the script as posted)
      put currentfolder into textname
      replace ".rtfd" with empty in textname
      -- copy the converted file out of its container under the new name
      revCopyFile infoldername & slash & currentfolder & slash & "TXT.txt", outfoldername & slash & textname
   end repeat
end mouseUp

Full LiveCode stack here on OSF (it’s nothing more than a button and a text field with a basic log).
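
For those who would rather stay in the shell, the renaming step might look something like the sketch below. It assumes that the *.rtfd folders sit directly inside one input folder and that each already contains a TXT.txt from the unoconv step. The function name rtfd_to_txt and its two folder arguments are made up for illustration; this is not what I actually ran.

```shell
#!/bin/sh
# Sketch: copy each <name>.txt.rtfd/TXT.txt to <outdir>/<name>.txt.
# Assumption: rtfd_to_txt is a hypothetical helper, not part of any tool.
rtfd_to_txt() {
    indir=$1
    outdir=$2
    mkdir -p "$outdir"
    for container in "$indir"/*.rtfd; do
        # Skip containers that have no converted TXT.txt inside
        # (this also covers the case where the glob matches nothing).
        [ -f "$container/TXT.txt" ] || continue
        # Strip the .rtfd suffix, so "foo.txt.rtfd" becomes "foo.txt".
        name=$(basename "$container" .rtfd)
        cp "$container/TXT.txt" "$outdir/$name"
    done
}
```

Called as rtfd_to_txt ./unzipped ./texts (both folder names are placeholders), this should leave one text file per container in ./texts, each named after its folder. The variables are quoted throughout, so folder names with spaces should not trip it up.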

Wordscores and JFreq – an update

An old post of mine on using JFreq and Wordscores in R still gets frequent hits. For some documents, the current version of JFreq doesn’t work as well as the old one (which you can find here [I’m just hosting this, all credit to Will Lowe]). For even longer documents, we have a Python script by Thiago Marzagão archived here (I have never tried this). And then there is quanteda, the new R package that also does Wordscores.

Having said this, a recent working paper by Bastiaan Bruinsma and Kostas Gemenis heavily criticizes Wordscores. While their work does not discredit Wordscores as such (merely the quick and easy approach Wordscores advertises, which depending on your view is the essence of Wordscores), I prefer to read it as a call to validate Wordscores before they are applied. After all, in some situations they seem to ‘work’ pretty well, as Laura Morales and I show in our recent paper in Party Politics.