Following a comment I recently received on using moving averages, I wanted to share more widely how we can use splines to fill holes in (time series) data. Unlike the moving-average approach I posted earlier, splines let us keep the observed values and interpolate only where values are missing. What is more, splines are often the more appropriate choice, and they are surprisingly easy to use.
First we need some data with missing values (shamelessly nicked from Kirk, although modified to protect the original should this be necessary):
z <- c(-1.1484, -1.3842, -1.5985, -1.0626, -1.3413, -1.2341, -1.1269, NA, -0.7411, -0.7840, -0.6125, -0.8912, NA, NA, -1.1912, -1.7271, -1.0841, -0.9555, -0.9555, -0.6554, -0.4196, -0.5268, -0.3767, -0.2695, 0.2019, NA, NA, NA, 1.1880, 0.9736, 0.7807, 0.5878, 1.3594, 0.6306, 1.3809, NA, NA, NA, NA, NA, 1.5738, 1.6595, 1.1665, 0.9950, 1.7238, 1.1022, 1.1451, NA, 0.7807, NA, NA, NA, NA, NA, NA, NA, 0.8450, 0.5449, 0.2662, 0.8021, 0.4806, 0.1376, 0.5449, 0.2019, 0.4592)
Let's start by looking at the data.
plot(z, type="l", bty="n")
To identify the location of the missing values, we can use the following code (it also keeps things slightly simpler below):
miss <- which(is.na(z))
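As a small illustration of what this gives us, here is a sketch on a made-up toy vector (y, miss_toy and runs are names chosen for this example, not part of the data above). It also uses rle() to show how long each run of missing values is, which can be handy to know before interpolating across a gap.

```r
# Toy vector (hypothetical), with one single gap and one gap of length two
y <- c(3, NA, 5, NA, NA, 8)

# Positions of the missing values
miss_toy <- which(is.na(y))
print(miss_toy)

# Run-length encoding of the NA indicator: the runs where values is TRUE are the gaps
runs <- rle(is.na(y))
gap_lengths <- runs$lengths[runs$values]
print(gap_lengths)
```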
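The core of the approach presented here is the splinefun() function, which is part of the stats package that ships with R. It returns a function that performs the interpolation; here I call it a(). Missing values in the input are dropped before the spline is fitted.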
a <- splinefun(z)
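To make it clearer what splinefun() hands back, here is a minimal sketch on a made-up toy vector (y, obs, f_fmm and f_nat are names chosen for this example). It passes the positions of the observed values explicitly and builds two interpolating functions with different method settings. Both run through the observed points, and they may differ only in between them.

```r
# Toy data (hypothetical, not the series from this post)
y <- c(2.0, 2.5, NA, 3.1, 2.8, NA, NA, 1.9, 2.2)
obs <- which(!is.na(y))   # positions with observed values

# splinefun() returns a function; "fmm" is the default method
f_fmm <- splinefun(obs, y[obs], method = "fmm")
# Same data, but fitted as a natural spline
f_nat <- splinefun(obs, y[obs], method = "natural")

# Both functions reproduce the observed values (up to floating point)
print(all.equal(f_fmm(obs), y[obs]))
print(all.equal(f_nat(obs), y[obs]))

# Interpolated estimates at the missing positions 3, 6 and 7
print(f_fmm(c(3, 6, 7)))
print(f_nat(c(3, 6, 7)))
```

Which method suits a given series is a judgement call; the default was used for the plots in this post.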
We can now call this function to get interpolated values at any position along the curve. For instance, here I plot the value at each integer position. Note that these points sit on top of the observed values, and also fill the holes. 65 is the length of the data vector.
points(1:65, a(1:65), col=2)
We can highlight the missing values (now interpolated) by plotting them in a different colour:
points(miss, a(miss), col=5)
And here’s how to identify a single point: the value for y at x=14.
points(14, a(14), col=3)
This figure has little value beyond demonstrating how splines can be used. The code here could quite easily be changed to replace missing values with interpolated ones.
To finish off, here's a demonstration that the function returned by splinefun() is quite capable of dealing with fine-grained interpolation:
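One way to do that replacement is sketched below on a made-up toy vector (y, obs, miss_toy, f and y_filled are names chosen for this example). The observed values are left untouched and only the missing positions are overwritten with the interpolated values.

```r
# Toy data (hypothetical, not the series from this post)
y <- c(2.0, 2.5, NA, 3.1, 2.8, NA, NA, 1.9, 2.2)
obs <- which(!is.na(y))        # positions with observed values
miss_toy <- which(is.na(y))    # positions to fill

# Fit the interpolating function on the observed points only
f <- splinefun(obs, y[obs])

# Copy the series and overwrite only the missing positions
y_filled <- y
y_filled[miss_toy] <- f(miss_toy)

print(y_filled)
```

With the data from this post, the same idea would amount to copying z and assigning a(miss) to the positions in miss.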
points(seq(1,65,.5), a(seq(1,65,.5)), col=3)