One use of moving averages is to fill holes in time series. Consider the standard situation at the top of the illustration: the moving average simply averages within a given window. The value for the red dot is replaced by the average of all values in the red box; the value for the green dot is replaced by the average of all values in the green box; and so on.
With holes in the data (the situation at the bottom), we can use whatever data are available in a given window. This means that sometimes we average many numbers, and sometimes just a few. In this example, the average calculated for the red and the green position is the same, because both windows contain the same available values; the green line marks the position where the green dot is missing.
For a numerical example, consider the following time series, which has plenty of holes:
data <- c(NA,.223,NA,NA,.359,NA,NA,.302,NA,NA,NA,NA,.260,NA,.391)
Using the following function, I simply average over the holes, using as much data as is available in a given time window.
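namav <- function(x, k = 3) {
  x <- c(rep(NA, k), x, rep(NA, k))  # pad with k NAs on both sides
  n <- length(x)
  # for each original position, average the non-missing values in the window
  sapply((k + 1):(n - k), function(i) {
    w <- x[(i - k):(i + k)]
    sum(w, na.rm = TRUE) / sum(!is.na(w))  # NaN if the window holds no data
  })
}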
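As a quick check of what the function returns, here is a minimal usage sketch. It repeats the data and the function from above so that it runs on its own; the choice of k=2 and the rounding to four decimals are just for illustration.

```r
data <- c(NA, .223, NA, NA, .359, NA, NA, .302, NA, NA, NA, NA, .260, NA, .391)

# Same logic as the function in the post, repeated so the snippet is self-contained
namav <- function(x, k = 3) {
  x <- c(rep(NA, k), x, rep(NA, k))  # pad with k NAs on both sides
  n <- length(x)
  sapply((k + 1):(n - k), function(i) {
    w <- x[(i - k):(i + k)]
    sum(w, na.rm = TRUE) / sum(!is.na(w))  # NaN if the window holds no data
  })
}

# Window of two values before, the point itself, and two values after
filled <- namav(data, k = 2)
print(round(filled, 4))

# Example: position 3 has no value of its own; its window (positions 1 to 5)
# contains .223 and .359, so it is filled with their average
print(filled[3])
```

Note that a window without any data yields NaN (0 divided by 0) rather than NA; R treats both as missing when plotting.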
The value of k determines the width of the time window: each window covers k values before and k values after a point, plus the point itself, so it spans 2k+1 positions. The following graph illustrates how different values for k play out in these data. With k=0, only the actual data are shown. With k=1, each observed value is also carried over to the point before and the point after it, but some holes remain. With k=2 we get a contiguous time series: each point is the average of all available data points within a (moving) window that includes two values before, two values after, and the point itself. The higher the value of k, the closer we get to the mean across all time points (i.e. a flat line).
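The effect of k can also be checked numerically. The sketch below is not the function from the post but a hypothetical variant, here called namav_mean, that clamps the window to the ends of the series instead of padding it and lets mean() skip the missing values; for these data it should give the same result. It counts how many positions are still empty for k=0, 1 and 2, and confirms that a window spanning the whole series returns the overall mean everywhere.

```r
data <- c(NA, .223, NA, NA, .359, NA, NA, .302, NA, NA, NA, NA, .260, NA, .391)

# Hypothetical variant (name and approach are assumptions, not from the post):
# clamp the window to the series and average the non-missing values with mean()
namav_mean <- function(x, k = 3) {
  n <- length(x)
  sapply(seq_len(n), function(i) {
    mean(x[max(1, i - k):min(n, i + k)], na.rm = TRUE)  # NaN for an empty window
  })
}

# Number of positions that are still empty for each k
gaps <- sapply(c(0, 1, 2), function(k) sum(is.nan(namav_mean(data, k))))
print(gaps)

# A window that covers the entire series gives the overall mean at every point
flat <- namav_mean(data, k = length(data) - 1)
print(unique(round(flat, 3)))
print(round(mean(data, na.rm = TRUE), 3))
```

With k=1 the stretch of four consecutive missing values is too long to be bridged completely, which is why k=2 is the smallest window that fills every hole in this series.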