Dear journals…

…can we please universally start accepting tables and figures as part of the manuscript during review (i.e., not at the end)? It’s a pain to either scroll up and down, or open a second instance of the PDF just so that I can actually understand what I’m reading. Yes, I understand that there are historical reasons for this, and that it facilitates production, but at the time of writing and reviewing, we have different concerns (plus: production gets paid, I don’t). Journals have managed to move from printed copies to digital copies of the manuscript, so there is no reason we cannot take the next step…

R: empty cells in weighted cross-tabs across multiple variables

I’m not even sure how to succinctly describe the problem, but here’s what worked for me. I have two sets of variables and want to run a cross-tabulation. I also want to weight the frequencies and then sum them, and there are some empty (blank) cells to add to the mix. Three small problems in one; R to the rescue.

The two series of variables are as follows. First, there are six attributes (Q5_1, Q5_2, … Q5_6), each with 6 categories to indicate frequencies, alas grouped. So for attribute 1, variable Q5_1 indicates 0 occurrences, 1 to 5 occurrences, 6 to 10 etc. Second, there are ten sectors to identify subgroups, using a series of dummy variables (Q6_1, Q6_2, … Q6_10). So basically I want to run table(Q5_{1:6}, Q6_{1:10}), turning the categorical variables into approximate frequency counts.

First, I attach() my data; get(paste(...)) looks variables up by name, and that is easier by a mile when the columns are on the search path.

Second, I create an empty matrix that I will subsequently fill with the (approximated) frequencies: tbl <- matrix(data=NA, nrow=6,ncol=10).

Third, I cycle through each pair of variables: 1:6 attributes (attrvar), and 1:10 sectors (sectvar).
for(attrvar in 1:6) {
for(sectvar in 1:10) {

Here I create a simple cross-tabulation for the current pair of variables. get(paste(...)) does all the work.
raw <- table(get(paste("Q5_",attrvar,sep="")), get(paste("Q6_",sectvar,sep="")))

Since I want to weight the counts so as to approximate the actual frequencies from the categorical counts, I run into problems if there are empty cells in the previous step. table() simply leaves them out of the result. That’s usually fine, but problematic here because the weights are matched by position. So I have to add the zeros back in. Here’s one way to do this: create a vector with as many zeros as the attribute variable has categories: 6. (The package agrmt has a helper function for similar cases.)
raw2 <- c(0,0,0,0,0,0)

Next I replace the zeros with the actual values from the table raw where they exist. Where they do not exist, we keep the zero.
for(i in 1:6){raw2[as.numeric(dimnames(raw)[[1]])[i]] <- raw[i]}

Now we have a complete frequency vector and I can apply my weights.
wei <- raw2 * c(0, 2.5, 5.5, 10.5, 15.5, 0)

and then sum up to approximate the actual count:
tbl[attrvar, sectvar] <- sum(wei)
}
}

print(tbl) now gives me the cross-tabulation with approximate counts.
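
For what it's worth, the same idea can be sketched without attach() and get(). This is only a sketch under stated assumptions, not the code I ran: the data frame dat is simulated, the categories are assumed to be coded 1 to 6, and I assume that only respondents with the sector dummy equal to 1 should be counted. Declaring the six categories as factor levels makes table() keep the empty cells, so there is no need to add the zeros back by hand.

```r
# Simulated stand-in data (hypothetical; the real survey data are not shown).
set.seed(1)
n <- 200
attrs <- matrix(sample(1:6, n * 6, replace = TRUE), nrow = n,
                dimnames = list(NULL, paste0("Q5_", 1:6)))
sects <- matrix(rbinom(n * 10, 1, 0.3), nrow = n,
                dimnames = list(NULL, paste0("Q6_", 1:10)))
dat <- data.frame(attrs, sects)

# Force an empty category so the problem actually shows up.
dat$Q5_1[dat$Q5_1 == 6] <- 5

# table() drops categories that never occur ...
plain <- table(dat$Q5_1)
# ... unless they are declared as factor levels.
full <- table(factor(dat$Q5_1, levels = 1:6))

# Weights per category, as used in the post.
w <- c(0, 2.5, 5.5, 10.5, 15.5, 0)

approx_counts <- function(d, w) {
  tbl <- matrix(NA_real_, nrow = 6, ncol = 10,
                dimnames = list(paste0("Q5_", 1:6), paste0("Q6_", 1:10)))
  for (attrvar in 1:6) {
    for (sectvar in 1:10) {
      # Assumption: dummy == 1 marks membership in the sector.
      in_sector <- d[[paste0("Q6_", sectvar)]] == 1
      cats <- factor(d[[paste0("Q5_", attrvar)]][in_sector], levels = 1:6)
      freq <- as.vector(table(cats))  # always length 6, zeros included
      tbl[attrvar, sectvar] <- sum(freq * w)
    }
  }
  tbl
}

tbl <- approx_counts(dat, w)
print(tbl)
```

Indexing the data frame with dat[[paste0(...)]] does the same job as get(paste(...)), but it does not depend on what happens to be on the search path.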

Tables to Figures, well…

There are people more eloquent out there trying to convince researchers to use figures rather than tables in scientific publications. The only (real) reservation I could find so far is that figures alone may make meta-analyses difficult. Turns out there is one more…


I have recently received the following comment on a submitted paper:

“the graphical representation of the analysis does not offer enough (statistical) insights such as to evaluate the quality of the analysis done, nor to assess the validity of the conclusions drawn from it.”

To be fair to the reviewer, the other feedback I got was very constructive. I just wanted to use the opportunity to highlight that there is much more to do in terms of spreading the word about coefficient plots (above/to the right is the kind of figure I used in the paper). The odd thing is that I even included tables in the appendix; in this day of online supplementary material there is no reason not to. Unfortunately, it seems that the reviewer overlooked them…