I’m not even sure how to succinctly describe the problem, but here’s what worked for me. Well, I have two sets of variables and want to run a cross-tabulation. I also want to weigh the frequencies and then calculate the sum of them, and there are some empty (blank) cells to add to the mix. Three small problems in one; R to the rescue.
The two series of variables are as follows: attributes, each with 6 categories to indicate frequencies, alas grouped. So for attribute 1, variable Q5_1 indicates 0 occurrences, 1 to 5 occurrences, 6 to 10 etc. There are also sectors to identify subgroups, using a series of dummy variables to identify the sector (Q6_1, Q6_2, … Q6_10). So basically I want to run table(Q5_{1:6}, Q6_{1:10}), turning the categorical variables into approximative frequency counts.
First, I attach()
my data; the get(paste(...))
code seems to like this by a mile.
Second, I create an empty matrix that I will subsequently fill with the (approximated) frequencies: tbl <- matrix(data=NA, nrow=6,ncol=10)
.
Third, I cycle through each pair of variables: 1:6 sectors (sectvar
), and 1:10 attributes (attrvar
).
for(attrvar in 1:6) {
for(sectvar in 1:13) {
Here I create a simple cross-tabulation for the current pair of variables. get(paste(...))
does all the work.
raw <- table(get(paste("Q5_",sectvar,sep="")), get(paste("Q6_",attrvar,sep="")))
Since I want to weigh the counts so as to approximate the actual frequencies from the categorical counts, I run into problems if there are empty cells in the previous step. table()
simply leaves them out in the result. That’s usually fine, but problematic because of the weights. So I have to add the zeros back in. Here’s one way to do this: create an empty vector with as many zeros as the variable (attrvar) has: 6. (The package agrmt has a helper function for similar cases.)
raw2 <- c(0,0,0,0,0,0)
Next I replace all the zeros with the actual values from the variables raw
if they exist. If they do not exist, we keep the zero.
for(i in 1:6){raw2[as.numeric(dimnames(raw)[[1]])[i]] <- raw[i]}
Now we have a complete frequency vector and I can apply my weights.
wei <- raw2 * c(0, 2.5, 5.5, 10.5, 15.5, 0)
and then sum up to approximate the actual count:
tbl[sectvar, attrvar] <- sum(wei)
}
}
print(tbl)
now gives me the cross-tabulation with approximate counts.