R: empty cells in weighted cross-tabs across multiple variables

I’m not even sure how to succinctly describe the problem, but here’s what worked for me. Well, I have two sets of variables and want to run a cross-tabulation. I also want to weigh the frequencies and then calculate the sum of them, and there are some empty (blank) cells to add to the mix. Three small problems in one; R to the rescue.

The two series of variables are as follows: attributes, each with 6 categories to indicate frequencies, alas grouped. So for attribute 1, variable Q5_1 indicates 0 occurrences, 1 to 5 occurrences, 6 to 10 etc. There are also sectors to identify subgroups, using a series of dummy variables to identify the sector (Q6_1, Q6_2, … Q6_10). So basically I want to run table(Q5_{1:6}, Q6_{1:10}), turning the categorical variables into approximative frequency counts.

First, I attach() my data; the get(paste(...)) code seems to like this by a mile.

Second, I create an empty matrix that I will subsequently fill with the (approximated) frequencies: tbl <- matrix(data=NA, nrow=6,ncol=10).

Third, I cycle through each pair of variables: 1:6 sectors (sectvar), and 1:10 attributes (attrvar).
for(attrvar in 1:6) {
for(sectvar in 1:13) {

Here I create a simple cross-tabulation for the current pair of variables. get(paste(...)) does all the work.
raw <- table(get(paste("Q5_",sectvar,sep="")), get(paste("Q6_",attrvar,sep="")))

Since I want to weigh the counts so as to approximate the actual frequencies from the categorical counts, I run into problems if there are empty cells in the previous step. table() simply leaves them out in the result. That’s usually fine, but problematic because of the weights. So I have to add the zeros back in. Here’s one way to do this: create an empty vector with as many zeros as the variable (attrvar) has: 6. (The package agrmt has a helper function for similar cases.)
raw2 <- c(0,0,0,0,0,0)

Next I replace all the zeros with the actual values from the variables raw if they exist. If they do not exist, we keep the zero.
for(i in 1:6){raw2[as.numeric(dimnames(raw)[[1]])[i]] <- raw[i]}

Now we have a complete frequency vector and I can apply my weights.
wei <- raw2 * c(0, 2.5, 5.5, 10.5, 15.5, 0)

and then sum up to approximate the actual count:
tbl[sectvar, attrvar] <- sum(wei)

print(tbl) now gives me the cross-tabulation with approximate counts.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s