Plotting Connected Lines with Missing Values

When we plot a series that contains missing values (NA), R does not draw a line through them: the line is broken at every NA. This is arguably the correct behaviour, but what if we really want to connect the observed points across the gaps?

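plot(variable.name[country=="UK"], type="b") gives me something like the following. I used type="b" (points and lines) because type="l" only draws segments between consecutive non-missing values. With many missing values, type="l" can leave the plot empty, which is not very useful.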

[Figure: the UK series plotted with type="b"; the line breaks at each missing value]
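A minimal sketch of why type="l" can come out empty, using a made-up series (the vector `y` is hypothetical, not the UK data from the post): when no two observed values are adjacent, there is no segment for type="l" to draw.

```r
# Made-up example series: 8 "years", every other value missing.
y <- c(3, NA, 5, NA, 4, NA, 6, NA)

# type = "l" only draws segments between *consecutive* non-missing
# values; here no two observed values are adjacent, so no line appears.
plot(y, type = "l")

# type = "b" also draws the points, so at least the observations show up.
plot(y, type = "b")

# Count adjacent pairs where both values are observed:
# these are the only segments that type = "l" can draw.
n_segments <- sum(!is.na(head(y, -1)) & !is.na(tail(y, -1)))
n_segments
```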

What if we simply leave out the missing values? plot(na.omit(variable.name[country=="UK"]), type="b") kind of works, but because the remaining values are plotted at positions 1, 2, 3 and so on, we lose the correct spacing on the x-axis:

[Figure: the same series after na.omit(); the points are evenly spaced, so the gaps between years disappear]
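A short sketch of what na.omit() does to the x positions, again with a hypothetical vector `y`: the observed values are packed together and re-indexed, while their original positions can still be recovered with which(!is.na(y)).

```r
y <- c(3, NA, 5, NA, NA, 4, 6, NA)   # made-up series, 8 "years"

kept <- na.omit(y)
# plot() puts the remaining values at x = 1, 2, ..., length(kept),
# so the 6th observation (value 4) is drawn at x = 3.
plot(kept, type = "b")

# The original positions are still available from is.na():
which(!is.na(y))
```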

So here is what we can do instead. First, we identify the points for which we have data. Then we plot only those points, at their original positions. Unlike the na.omit() approach, this keeps the spacing on the x-axis intact.

uk <- variable.name[country=="UK"]   # the UK series, one value per year
miss <- !is.na(uk)                    # TRUE where we *do* have data
plot(which(miss), uk[miss], col="red", type="b", lwd=2)

[Figure: only the observed points, at their original positions, connected by a red line]
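Since `variable.name` and `country` are not included in the post, here is a self-contained sketch of the same idea on a small hypothetical data frame `d` (the column names mirror the vectors above; the values are invented). Indexing the UK subset with `miss`, rather than combining `country == "UK" & miss`, keeps the two logical vectors the same length.

```r
# Hypothetical long-format data: one row per country and year,
# mirroring the `variable.name` / `country` vectors used above.
d <- data.frame(
  country       = rep(c("UK", "FR"), each = 6),
  year          = rep(2000:2005, times = 2),
  variable.name = c(1.2, NA, 1.5, NA, NA, 2.0,
                    0.8, 0.9, NA, 1.1, 1.3, NA)
)

uk   <- d$variable.name[d$country == "UK"]   # UK series, one value per year
miss <- !is.na(uk)                           # TRUE where we *have* data

# Plot only the observed values, but at their original positions,
# so type = "b" connects across the gaps and the x spacing is kept.
plot(which(miss), uk[miss], col = "red", type = "b", lwd = 2)
```

If the data carry a year column, plotting against d$year[d$country == "UK"][miss] instead of which(miss) puts the actual years on the x-axis directly.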

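It is important to include an xlim argument if we add multiple lines to the same plot: plot() sets the axis range from the first series only, so any points of later series (added with lines()) that fall outside that range are cut off. The same applies to ylim. Typically I draw the axes separately, as this gives me more control over them, especially over the labels on the x-axis.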

uk <- variable.name[country=="UK"]   # 16 yearly values, 1995 to 2010
miss <- !is.na(uk)                    # TRUE where we *do* have data
plot(which(miss), uk[miss], col="red", type="b", lwd=2, axes=FALSE, xlim=c(1,16))
axis(2)                               # default y-axis
axis(1, at=c(1,6,11,16), labels=c("1995","2000","2005","2010"))   # label every 5th year

[Figure: the same plot with a custom x-axis labelled 1995, 2000, 2005 and 2010]
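To illustrate why xlim matters once a second line is added, here is a sketch with two hypothetical series, `uk` and `fr`, over 16 years (the values are invented). The second series is added with lines(); without xlim, the x-axis would be sized from the UK points alone, and the first FR points would be clipped.

```r
# Hypothetical data: two countries over 16 years (1995 to 2010),
# with different years missing in each series.
years <- 1995:2010
uk <- c(NA, NA, 1.4, 1.6, 1.5, 1.8, NA, 2.0, 2.1, 1.9, 2.2, NA, 2.4, 2.3, 2.5, 2.6)
fr <- c(1.0, 1.1, 1.3, 1.2, NA, NA, 1.5, 1.6, 1.4, 1.7, 1.8, 1.9, 2.0, 1.9, 2.1, NA)

uk_ok <- !is.na(uk)
fr_ok <- !is.na(fr)

# Without xlim, plot() would size the x-axis from the UK points only
# (positions 3 to 16), and the FR points at positions 1 and 2 would be clipped.
# ylim is set from both series for the same reason.
plot(which(uk_ok), uk[uk_ok], col = "red", type = "b", lwd = 2,
     axes = FALSE, xlim = c(1, 16), ylim = range(uk, fr, na.rm = TRUE),
     xlab = "Year", ylab = "variable.name")
lines(which(fr_ok), fr[fr_ok], col = "blue", type = "b", lwd = 2)

axis(2)
axis(1, at = c(1, 6, 11, 16), labels = years[c(1, 6, 11, 16)])
box()
legend("topleft", legend = c("UK", "FR"), col = c("red", "blue"), lwd = 2)
```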

4 Replies to “Plotting Connected Lines with Missing Values”

  1. I was getting an error stating that the ‘x’ and ‘y’ lengths differ. Depending on the person’s data, this may help.

    NAs break the lines, so remove the NAs first:

    miss <- !is.na(x) & !is.na(y)
    plot(x[miss], y[miss])

    1. Thanks for your feedback. Unfortunately you’re not giving enough information. If I understand correctly, you have missing data on both X and Y. The above code plots Y against years, hence there is no missing data in X. Your code fragment handles the two-variable case: because each is.na() is negated, the logical AND keeps only the cases where both X and Y are observed, so any case where X or Y (or both) is missing is skipped. That is what you want, since otherwise your line is not fully specified.

  2. This really helped me, I think, but I’m still in doubt. I’m working with several individuals who are tested for antibodies over time (positive = 1, negative = 0). However, for certain periods I have no samples. When plotting these values, I could only connect the 1s and the 0s, but not the 1s to the 0s. The code with “miss” solved this problem. Now I want to fit a linear model to calculate the intercept and slope, but I get zero for both values. What can I do to solve this, given that I’ve been working with “miss”?
    (e.g. lm(J1790&J1790&J1810&J1840~Age), where J1790 etc. is an individual and Age is actually time in days. I’m a bit confused, so any help would be appreciated.)

    1. Thanks for checking in. I’m not sure I understand your question, or the code snippet you included. You don’t need the code above (for the graphic) to run a linear regression model; by default, lm() drops cases with missing values. However, the way you specified the model (with the ampersands “&”), your outcome variable is probably just TRUE or FALSE. Perhaps you need to set up the data properly, for example using c() to stack the individuals into one outcome, or cbind(J1790, J1810, J1840) ~ Age to fit a separate regression for each individual.
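For the case raised in the first comment, where both X and Y can be missing, a minimal sketch with hypothetical vectors `x` and `y`: base R’s complete.cases() gives the same mask as !is.na(x) & !is.na(y), keeping a case only when both values are observed.

```r
# Hypothetical x and y, each with its own missing values.
x <- c(1, 2, NA, 4, 5, 6)
y <- c(2.1, NA, 3.0, 4.2, NA, 6.1)

# Keep a case only if *both* x and y are observed.
# complete.cases(x, y) is equivalent to !is.na(x) & !is.na(y).
ok <- complete.cases(x, y)
plot(x[ok], y[ok], type = "b")
```

The advantage of complete.cases() is that it scales to any number of vectors or to a whole data frame without writing out one is.na() per column.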
