When we plot data with missing values, R does not draw the line through them: the line is broken wherever a value is NA. This is probably the correct behaviour, but what if we really want to gloss over the missing data points?
plot(variable.name[country=="UK"], type="b") gives me a plot of isolated points with no lines joining them. I used type="b" (points and lines) because type="l" would give an empty plot here, which is generally not very useful.
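To see why type="l" can come out empty, here is a minimal sketch on made-up data (the vector y below is an assumption, not the author's data set). R draws a line segment only between two consecutive non-missing values, so counting such pairs tells us how many segments a line plot can show at all.

```r
# Made-up series (assumption): every other value is missing.
y <- c(2.1, NA, 2.5, NA, 2.4, NA, 3.0, NA)

# A segment joins y[i] and y[i + 1] only if both are non-missing.
has_data <- !is.na(y)
n_segments <- sum(has_data[-1] & has_data[-length(y)])
print(n_segments)  # 0: type = "l" draws nothing, type = "b" shows only points

# Draw to a temporary PNG file so the sketch also runs without a screen device.
png(tempfile(fileext = ".png"))
plot(y, type = "b")
dev.off()
```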
What if we simply leave out the missing values?
plot(na.omit(variable.name[country=="UK"]), type="b") kind of works, but we lose the correct spacing on the x-axis: the remaining values are plotted at positions 1, 2, 3, … as if there were no gaps.
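The spacing is lost because plot() called with a single vector uses the positions 1, 2, 3, … as x-coordinates. A short check on the same kind of made-up vector (an assumption for illustration) makes this visible:

```r
# Made-up series (assumption): every other value is missing.
y <- c(2.1, NA, 2.5, NA, 2.4, NA, 3.0, NA)

y_complete <- na.omit(y)
print(seq_along(y_complete))  # 1 2 3 4: the x positions plot(y_complete) uses
print(which(!is.na(y)))       # 1 3 5 7: the positions the values belong to
```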
Instead, we can do the following. First, we identify the points for which we have data. Then we plot only these points, using their original positions as the x-coordinates. Unlike the method above, this keeps the spacing on the x-axis intact.
uk <- variable.name[country=="UK"]  # the UK series, including missing values
miss <- !is.na(uk)  # TRUE where we have data
plot(which(miss), uk[miss], col="red", type="b", lwd=2)
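A self-contained version of this approach, as a sketch on invented data: the vectors years, country and variable.name below are assumptions standing in for the author's data. One variation on the post: instead of the positions returned by which(), it keeps the actual year of each observation as the x-coordinate, which preserves the spacing in the same way.

```r
# Invented data (assumption): two countries, eight years each.
years <- 2001:2008
country <- rep(c("UK", "FR"), each = length(years))
variable.name <- c(2.1, NA, 2.5, NA, 2.4, NA, 3.0, 3.2,
                   1.8, 1.9, NA, 2.2, NA, NA, 2.6, 2.7)

uk <- variable.name[country == "UK"]  # the UK series, gaps included
miss <- !is.na(uk)                    # TRUE where the UK has data

x_uk <- years[miss]  # the real year of each observed value
y_uk <- uk[miss]     # the observed values themselves

png(tempfile(fileext = ".png"))  # temporary file instead of the screen
plot(x_uk, y_uk, col = "red", type = "b", lwd = 2,
     xlab = "Year", ylab = "variable.name")
dev.off()
```

Because the x-values still run 2001, 2003, 2005, 2007, 2008, the gaps show up as longer segments while the line itself stays unbroken.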
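It is important to include an xlim argument if we add multiple lines to the same plot, because the axis range is fixed by the first plot() call and later lines() calls do not extend it. Typically I draw the axes separately, as this gives me more control over them, especially over the labels on the x-axis.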
uk <- variable.name[country=="UK"]  # the UK series, including missing values
miss <- !is.na(uk)  # TRUE where we have data
plot(which(miss), uk[miss], col="red", type="b", lwd=2, axes=FALSE, xlim=c(1,16))
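A sketch of the multi-line case, again on invented data (the vectors below and the second country "FR" are assumptions). Both series share one xlim and ylim, the axes are suppressed in plot() and drawn afterwards with axis() so that the x-axis can be labelled with years, and the second series is added with lines().

```r
# Invented data (assumption): two countries, eight years each, stored in
# parallel vectors like country and variable.name in the post.
years <- 2001:2008
country <- rep(c("UK", "FR"), each = length(years))
variable.name <- c(2.1, NA, 2.5, NA, 2.4, NA, 3.0, 3.2,
                   1.8, 1.9, NA, 2.2, NA, NA, 2.6, 2.7)

uk <- variable.name[country == "UK"]
fr <- variable.name[country == "FR"]
miss_uk <- !is.na(uk)  # TRUE where the UK has data
miss_fr <- !is.na(fr)  # TRUE where FR has data

# Shared ranges: plot() fixes the axes, and lines() does not extend them.
x_range <- c(1, length(years))
y_range <- range(variable.name, na.rm = TRUE)

png(tempfile(fileext = ".png"))  # temporary file instead of the screen
plot(which(miss_uk), uk[miss_uk], col = "red", type = "b", lwd = 2,
     axes = FALSE, xlim = x_range, ylim = y_range,
     xlab = "Year", ylab = "variable.name")
lines(which(miss_fr), fr[miss_fr], col = "blue", type = "b", lwd = 2)
axis(1, at = seq_along(years), labels = years)  # label positions with years
axis(2)
box()
dev.off()
```

Setting ylim from the range of all values matters for the same reason as xlim: the FR series goes lower than the UK series, so a y-range taken from the UK points alone would cut part of it off.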
4 Replies to “Plotting Connected Lines with Missing Values”
I was getting some errors stating that the ‘x’ and ‘y’ lengths differ. Depending on the person's data, this may help.
NAs tell R to break the lines, so remove the NAs.
miss <- !is.na(x) & !is.na(y)
Thanks for your feedback. Unfortunately you're not giving enough information. If I understand correctly, you have missing data on both X and Y. The above code plots Y against years, hence there is no missing data in X. Again, if I understand correctly, your code fragment already handles this: because it combines !is.na(x) and !is.na(y) with a logical AND, ‘miss’ is TRUE only where both X and Y are present. Cases where just X or just Y is missing are therefore skipped as well, which is what you want (otherwise your line is not fully specified).
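A quick way to check which cases such a mask keeps is to try it on small vectors with gaps in X only, in Y only, and in both (the vectors are made up for illustration). complete.cases() from base R performs the same test, and indexing both vectors with the same mask keeps their lengths equal, which avoids the ‘x’ and ‘y’ lengths differ error.

```r
# Made-up vectors (assumption): gaps in x only, y only, and both.
x <- c(1, NA, 3, 4, NA)
y <- c(10, 20, NA, 40, NA)

keep_and <- !is.na(x) & !is.na(y)   # TRUE only where both are present
keep_or  <- !(is.na(x) | is.na(y))  # same test written with OR (De Morgan)
keep_cc  <- complete.cases(x, y)    # base R helper for the same test

print(keep_and)  # TRUE FALSE FALSE TRUE FALSE

# Subsetting both vectors with the same mask keeps their lengths equal.
png(tempfile(fileext = ".png"))
plot(x[keep_and], y[keep_and], type = "b")
dev.off()
```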
This really helped me, I think, although I'm still in doubt. I'm working with several individuals who are tested for antibodies over time (positive = 1, negative = 0). However, over certain periods I have no samples. When plotting these values, I could only connect the 1s and the 0s, but not the 1s to the 0s. The code with “miss” solved this problem. Now I want to fit a linear model to calculate the intercept and slope, but I get zero for both values. What can I do to solve this, given that I've been working with the “miss” function?
(For example, lm(J1790&J1790&J1810&J1840~Age), where J1790 etc. is an individual and Age is actually time in days.) I'm a bit confused, so any help would be appreciated.
Thanks for checking in. I'm not sure whether I understand your question, or the code snippet you included. You don't need the code above (for the graphic) to run a linear regression model. However, because you combined the individuals with ampersands (“&”), which is a logical AND rather than a way of listing several variables, your outcome variable is probably simply TRUE or FALSE. Perhaps you need to set up the data properly first, for example using c() or cbind().
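A sketch of what setting up the data could look like, on made-up antibody data: the data frame d, the long data frame, and the names ids, status, fits and pooled are all hypothetical. The individuals are stacked into one column with c(), after which lm() is fitted per individual (or once, pooled). With the default na.action, lm() simply drops the days without a sample, so the “miss” step from the post is not needed here.

```r
# Made-up data (assumption): antibody status (1 = positive, 0 = negative)
# for three individuals on five sampling days; NA = no sample that day.
d <- data.frame(
  Age   = c(0, 30, 60, 90, 120),
  J1790 = c(0, 0, NA, 1, 1),
  J1810 = c(0, NA, 1, 1, NA),
  J1840 = c(0, 0, 0, NA, 1)
)

# Stack the individuals into one "long" data frame with c() and rep().
ids <- c("J1790", "J1810", "J1840")
long <- data.frame(
  id     = rep(ids, each = nrow(d)),
  Age    = rep(d$Age, times = length(ids)),
  status = c(d$J1790, d$J1810, d$J1840)
)

# One straight line per individual; rows with NA status are dropped.
fits <- lapply(split(long, long$id), function(s) lm(status ~ Age, data = s))
print(sapply(fits, coef))  # row 1: intercepts, row 2: slopes

# Or a single line pooled over all individuals.
pooled <- lm(status ~ Age, data = long)
print(coef(pooled))
```

The cbind() route, lm(cbind(J1790, J1810, J1840) ~ Age, data = d), also fits one regression per individual in a single call, but it drops every day on which any individual lacks a sample; with gappy data like the example above, that leaves a single row.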