To the best of my knowledge, R does not have a built-in function to calculate moving averages. Using the filter function, however, we can write a short function for moving averages:
mav <- function(x,n=5){stats::filter(x,rep(1/n,n), sides=2)}
We can then use the function on any data: mav(data), or mav(data,11) if we want to specify a different number of data points than the default 5; plotting works as expected: plot(mav(data)).
In addition to the number of data points over which to average, we can also change the sides argument of the filter function in the definition above: sides=2 centres the window on each data point, while sides=1 uses the current and past values only.
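To make the difference between the two settings concrete, here is a minimal sketch. The name mav2 and the series x are made up for illustration; the only change from mav is that sides is passed through as an argument instead of being fixed at 2.

```r
# Hypothetical variant of mav() that exposes the sides argument of stats::filter
mav2 <- function(x, n = 5, sides = 2) {
  stats::filter(x, rep(1 / n, n), sides = sides)
}

x <- c(1, 2, 3, 4, 5, 6, 7)  # made-up example series

# Centred window: one value on each side, so NA at both ends
as.numeric(mav2(x, 3, sides = 2))  # NA 2 3 4 5 6 NA

# Trailing window: current value plus the two before it, so NAs only at the start
as.numeric(mav2(x, 3, sides = 1))  # NA NA 2 3 4 5 6
```

With sides=1 the smoothed series lags behind the data, which is usually what you want when the average at a given time must not depend on future values.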
If you are interested in moving averages in R, check out the functions in the zoo package: rollmean, rollmedian, rollmax.
Thanks for sharing, just what I needed. I was sure it couldn’t be so hard, but didn’t quite figure it out…
I don’t get it. Why does the line become steeper the higher the number I input? Typing just mav(a) gives a downward-sloping line, where a is the name of my data.
Hen Kka, without telling us more about your data (“a”) and how you plot the values, it is difficult to say anything.
The data’s coefficient is negative (downward sloping).
7.2000, 7.8500,7.2500,7.450,7.520,7.1700… in total 31 figures.
The slope is steeper when I write plot(mav(a,30)) than when I write plot(mav(a,10)). Why?
Hen Kka, I’m afraid I still don’t get it; your example isn’t clear enough. Perhaps if you shared all 31 data points, I could understand. The larger the number you put in, the more data points each average is taken over.
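One possible explanation, offered only as a guess since the full data were never shared: with 31 values, a window of 30 leaves very few points where the average is defined at all. The sketch below uses a made-up series a as a stand-in for the real data and simply counts the non-NA results.

```r
mav <- function(x, n = 5) stats::filter(x, rep(1 / n, n), sides = 2)

# Hypothetical stand-in for the 31 values, which were not shared in full
a <- seq(7.8, 7.2, length.out = 31)

sum(!is.na(mav(a, 10)))  # 22 averages can be computed
sum(!is.na(mav(a, 30)))  # only 2 averages can be computed
```

Because plot() scales the y-axis to the range of the values it is given, a line through just two points tends to run from one corner of the plot to the other, which may look steeper even though the underlying trend has not changed.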
I am very new to R and have a dataset of 74 data points. I am still too stupid to understand the line you gave. What is x, and what are the two n’s inside rep? Furthermore, I don’t understand sides: sides of what? Basically I don’t understand anything, but I want my data to be less noisy and thus want to apply a moving average. Sorry to bother you!
Dear AK,
You’re not telling us what format your data are in, but the function works with most formats. The way I understand your question suggests that you were able to run the function on your data, but struggle with the NA values at both ends.
Let’s take an example:
data <- c(1.3, 1.5, 1.9, 1.6, 1.7, 0.9, 1.8, 1.9, 2.4, 2.3, 1.8)
So the default is n=5 data points considered, and it looks at both sides (forward/backward; left/right, whatever you want to call it). The moving average just calculates the mean (=average) of the 5 values centred on each data point: the point itself and two neighbours on either side.
For the first data point (1.3), the moving average is not defined. This is why you get an NA. It is not defined because the window needs two values to the left of 1.3 and there are none, so we cannot say what the average is. The same happens with the second data point, which has only one value to its left.
For the third data point (1.9), we can calculate the average. We simply take the first to fifth data point (1.3, 1.5, 1.9, 1.6, 1.7) and calculate the mean (=1.6). For the fourth data point, we take (1.5, 1.9, 1.6, 1.7, 0.9), and calculate the mean (=1.52). And so on, until we reach the end of the data where there are two NA, since there are no values to the right of the data.
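The arithmetic above can be checked directly in R. This is just a quick sketch that repeats the definition of mav and the example data so it runs on its own:

```r
mav <- function(x, n = 5) stats::filter(x, rep(1 / n, n), sides = 2)
data <- c(1.3, 1.5, 1.9, 1.6, 1.7, 0.9, 1.8, 1.9, 2.4, 2.3, 1.8)

mean(data[1:5])  # 1.6, the window around the third data point
mean(data[2:6])  # 1.52, the window around the fourth data point

# The full moving average, rounded for readability:
# two NA, then 1.60 1.52 1.58 1.58 1.74 1.86 2.04, then two NA
round(as.numeric(mav(data)), 2)
```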
You can look at this graphically, too:
plot(data, type="l", ylim=c(0,3))
These are my data. type="l" draws a line, and ylim sets the y-axis to run from 0 to 3.
lines(mav(data), col="red")
Here we add the moving average mav(data). The lines command adds a line on top of the existing plot; col="red" draws it in red.
Hello, I have a moving average question that has been troubling me: I need to calculate an 11-year inclusive moving average for some climate data. A given result should be the average of the previous 5 data points, the present data point, and the next 5 data points.
With your function, would I want n to equal 11 or 5? (I’m pretty sure I want to leave sides=2.)
Additionally, I’m familiar with the other moving average functions available with other packages, but I haven’t been able to find any clarification on whether the interval length reflects the total length or just one “side” of the window. Any thoughts?
Thanks,
Jim Reilly
Jim Reilly, your comment highlights that the post here isn’t clear enough. You could find the answer to your question by ploughing through the help for the filter command: “If ‘sides = 1’ the filter coefficients are for past values only; if ‘sides = 2’ they are centred around lag 0. In this case the length of the filter should be odd, but if it is even, more of the filter is forward in time than backward.”
So, it is centred, and n is the total length of the window. In your case, you want a span of 11 years (the previous 5, the present one, and the next 5), so you need n=11: mav(data,11).
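For the 11-point case specifically, a small sketch along these lines shows where the first defined value falls and which data points go into it. The series x is made up for illustration and is not climate data.

```r
mav <- function(x, n = 5) stats::filter(x, rep(1 / n, n), sides = 2)

# Hypothetical series of 21 yearly values
x <- (1:21)^2

m <- as.numeric(mav(x, 11))

# Positions where the average is defined: 5 values are lost at each end
which(!is.na(m))  # 6 7 8 ... 16

# The first defined value (position 6) is the mean of positions 1 to 11,
# i.e. 5 points before, the point itself, and 5 points after
m[6]           # 46
mean(x[1:11])  # 46
```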
You can also verify this easily using the following code:
data <- c(0,1,2,3,99)
plot(data, type="l", col="blue")
text(1:5, data, data, pos=2, col="blue")
lines(mav(data,3), type="b", col="red")
text(1:5, mav(data,3), round(mav(data,3),2), pos=3, col="red")
So we have a simple (time) series with values 0, 1, 2, 3, 99. I first plot the data (as a blue line), and label the values (to the left of each point). Next I plot the moving average with 3 as the argument (as a red line with circles to highlight the data points). Again I label the values (at the top of each point).
On the left, the moving average at position 1 is not defined, since there are not three data points in the span defined. The moving average at position 2 is defined: it is 1, namely (0+1+2)/3. If the function calculated the moving average using 3 points on either side, there wouldn’t be enough data points in the span at position 2 either. The moving average at position 3 is (1+2+3)/3 = 2; at position 4 it is (2+3+99)/3 = 34.67.
Thanks for the response and explanation. Much appreciated.