Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values. While regression analysis is often employed in such a way as to test theories that the current values of one or more independent time series affect the current value of another time series, this type of analysis of time series is not called “time series analysis”, which focuses on comparing values of a single time series or multiple dependent time series at different points in time.
R has extensive facilities for analyzing time series data: creation of a time series, seasonal decomposition, modeling with exponential smoothing and ARIMA models, and forecasting with the forecast package.
To analyse time series data, we first read it into R and then plot the time series. We can read data into R using the scan() function, which assumes that the data for successive time points are in a simple text file with one column.
For example, the file http://robjhyndman.com/tsdldata/misc/kings.dat contains data on the age of death of successive kings of England, starting with William the Conqueror (original source: Hipel and McLeod, 1994).
The first three lines contain some comment on the data, and we want to ignore this when we read the data into R. We can do this by using the “skip” parameter of the scan() function, which specifies how many lines at the top of the file to ignore. To read the file into R, ignoring the first three lines, we type:
kings <- scan("http://robjhyndman.com/tsdldata/misc/kings.dat", skip=3)
The age of death of 42 successive kings of England has been read into the variable ‘kings’.
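The effect of the “skip” parameter can be tried without a network connection. The following is a minimal sketch that assumes a small temporary file with three comment lines; the file contents and the variable names example_file and ages are invented for illustration and are not part of the kings data set.

```r
# Write a small example file: three comment lines followed by one value per line.
# The contents are made up for this sketch.
example_file <- tempfile(fileext = ".dat")
writeLines(c("Example data set",
             "Source: invented for this sketch",
             "One value per line",
             "60", "43", "67", "50"),
           example_file)

# skip = 3 tells scan() to ignore the first three lines of the file.
ages <- scan(example_file, skip = 3)

print(ages)
length(ages)
```

Without skip = 3, scan() would try to read the comment lines as numbers and stop with an error.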
The next step is to store the data in a time series object in R, so that you can use R’s many functions for analysing time series data. To store the data in a time series object, we use the ts() function in R. For example, to store the data in the variable ‘kings’ as a time series object in R, we type:
kingstimeseries <- ts(kings)
Typing kingstimeseries at the prompt then prints the time series:
Start = 1
End = 42
Frequency = 1
 60 43 67 50 56 42 50 65 68 43 65 34 47 34 49 41 13 35 53 56 16 43 69 59 48 59 86 55 68 51
 33 49 67 77 81 67 71 81 68 70 77 56
We can plot the time series for the age of death of the 42 kings by calling plot.ts(kingstimeseries).
We can see from the time plot that this time series could probably be described using an additive model, since the random fluctuations in the data are roughly constant in size over time.
If the time series data set has been collected at regular intervals that are shorter than one year, for example monthly or quarterly, you can specify the number of times that data was collected per year by using the ‘frequency’ parameter in the ts() function. For monthly time series data, set frequency=12; for quarterly time series data, set frequency=4.
You can also specify the first year that the data was collected, and the first interval in that year by using the ‘start’ parameter in the ts() function. For example, if the first data point corresponds to the second quarter of 1986, we would set start=c(1986,2).
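As a quick check of how ‘frequency’ and ‘start’ interact, here is a small sketch with ten made-up quarterly values whose first observation falls in the second quarter of 1986. The numbers and the variable names values and quarterly are assumptions for illustration only.

```r
# Ten invented quarterly observations; the first one is for Q2 of 1986.
values <- c(12, 15, 11, 9, 13, 16, 12, 10, 14, 17)
quarterly <- ts(values, frequency = 4, start = c(1986, 2))

start(quarterly)      # year and quarter of the first observation
end(quarterly)        # year and quarter of the last observation
frequency(quarterly)  # number of observations per year
cycle(quarterly)      # quarter index (1 to 4) of each observation
print(quarterly)      # printed as a year-by-quarter table
```

Printing the object lays the values out by year and quarter, which is an easy way to confirm that the series is aligned as intended.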
As an example, consider a data set of the number of births per month in New York City, from January 1946 to December 1959. This data is available in the file http://robjhyndman.com/tsdldata/data/nybirths.dat. We can read the data into R, and store it as a time series object, by typing:
births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
birthstimeseries <- ts(births, frequency=12, start=c(1946,1))
To plot the data, we call plot.ts(birthstimeseries).
We can see from this time series that there seems to be seasonal variation in the number of births per month: there is a peak every summer, and a trough every winter. Again, it seems that this time series could probably be described using an additive model, as the seasonal fluctuations are roughly constant in size over time and do not seem to depend on the level of the time series, and the random fluctuations also seem to be roughly constant in size over time.
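A seasonal pattern like this can also be checked numerically by averaging the observations for each calendar month. The sketch below uses a synthetic monthly series, not the births data; the seasonal pattern, the noise level, and the variable names are all assumptions chosen so that the peak falls in July and the trough in January.

```r
set.seed(1)

# Invented seasonal pattern: highest in month 7 (July), lowest in month 1 (January).
month_index <- 1:12
seasonal_pattern <- 2 * sin(2 * pi * (month_index - 4) / 12)

# Five years of synthetic monthly data: constant level + seasonal pattern + small noise.
monthly <- ts(25 + rep(seasonal_pattern, times = 5) + rnorm(60, sd = 0.1),
              frequency = 12, start = c(1946, 1))

# cycle() gives the month (1 to 12) of each observation, so tapply() averages by month.
monthly_means <- tapply(as.numeric(monthly), as.integer(cycle(monthly)), mean)

round(monthly_means, 2)
names(which.max(monthly_means))  # month with the highest average
names(which.min(monthly_means))  # month with the lowest average
```

The same two lines with cycle() and tapply() could be applied to a real monthly series such as birthstimeseries.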
The file http://robjhyndman.com/tsdldata/data/fancy.dat contains monthly sales for a souvenir shop at a beach resort town in Queensland, Australia, for January 1987-December 1993 (original data from Wheelwright and Hyndman, 1998). We can read the data into R by typing:
souvenir <- scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
Read 84 items
souvenirtimeseries <- ts(souvenir, frequency=12, start=c(1987,1))
To plot the data, we call plot.ts(souvenirtimeseries).
It appears that an additive model is not appropriate for describing this time series, since the sizes of the seasonal fluctuations and random fluctuations seem to increase with the level of the time series. We may therefore need to transform the time series so that the transformed series can be described using an additive model. For example, we can transform the time series by calculating the natural log of the original data:
logsouvenirtimeseries <- log(souvenirtimeseries)
Plotting the log-transformed series with plot.ts(logsouvenirtimeseries) shows that the sizes of the seasonal fluctuations and random fluctuations seem to be roughly constant over time, and do not depend on the level of the time series. Thus, the log-transformed time series can probably be described using an additive model.
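To see why taking logs helps, it may be useful to look at a series that is multiplicative by construction. The sketch below builds a synthetic monthly series as trend times seasonal factor (all numbers are invented, and yearly_range is a hypothetical helper defined here) and compares the within-year spread in the first and last years, before and after the log transform.

```r
# Synthetic monthly series, January 1987 to December 1993, built as trend * seasonal factor.
months <- 1:84
trend <- 1000 * exp(0.03 * months)
seasonal_factor <- rep(c(0.8, 0.9, 1.0, 1.0, 0.9, 0.9, 1.0, 1.1, 1.1, 1.2, 1.5, 2.5),
                       times = 7)
sales <- ts(trend * seasonal_factor, frequency = 12, start = c(1987, 1))
log_sales <- log(sales)

# Hypothetical helper: difference between the largest and smallest value in one year.
yearly_range <- function(x, year) {
  diff(range(window(x, start = c(year, 1), end = c(year, 12))))
}

# On the original scale the spread grows with the level of the series.
c(first = yearly_range(sales, 1987), last = yearly_range(sales, 1993))

# On the log scale the spread is the same in both years.
c(first = yearly_range(log_sales, 1987), last = yearly_range(log_sales, 1993))
```

Because log(trend * seasonal factor) equals log(trend) + log(seasonal factor), the multiplicative structure becomes additive after the transform.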
Forecasts using Exponential Smoothing
Exponential smoothing can be used to make short-term forecasts for time series data.
The simple exponential smoothing method provides a way of estimating the level at the current time point. Smoothing is controlled by the parameter alpha, which governs the estimate of the level at the current time point. The value of alpha lies between 0 and 1. Values of alpha that are close to 0 mean that little weight is placed on the most recent observations when making forecasts of future values.
For example, the file http://robjhyndman.com/tsdldata/hurst/precip1.dat contains total annual rainfall in inches for London, from 1813 to 1912 (original data from Hipel and McLeod, 1994). We can read the data into R by typing the commands below, and then plot it with plot.ts(rainseries):
rain <- scan("http://robjhyndman.com/tsdldata/hurst/precip1.dat", skip=1)
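The role of alpha is easier to see when the smoothing recursion is written out by hand: the new level is alpha times the latest observation plus (1 - alpha) times the previous level. The function below is a minimal sketch of that recursion, not the code used by any R package; the name simple_exp_smooth, the choice to start the level at the first observation, and the example data are all assumptions.

```r
# Simple exponential smoothing written out by hand.
# Assumption: the level is initialised with the first observation.
simple_exp_smooth <- function(x, alpha) {
  stopifnot(alpha >= 0, alpha <= 1, length(x) >= 1)
  level <- numeric(length(x))
  level[1] <- x[1]
  for (t in seq_along(x)[-1]) {
    # New level = weighted average of the latest observation and the previous level.
    level[t] <- alpha * x[t] + (1 - alpha) * level[t - 1]
  }
  level
}

x <- c(10, 12, 9, 14, 11)  # invented observations
simple_exp_smooth(x, alpha = 0.1)  # level changes slowly
simple_exp_smooth(x, alpha = 0.9)  # level follows the latest observations closely
```

With alpha = 0 the level never moves from its starting value, and with alpha = 1 it simply repeats the data, which are the two extremes of the range described above.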
Read 100 items
rainseries <- ts(rain,start=c(1813))
To make forecasts using simple exponential smoothing in R, we can fit a simple exponential smoothing predictive model using the “HoltWinters()” function in R. To use HoltWinters() for simple exponential smoothing, we need to set the parameters beta=FALSE and gamma=FALSE in the HoltWinters() function (the beta and gamma parameters are used for Holt’s exponential smoothing, or Holt-Winters exponential smoothing, as described below).
The HoltWinters() function returns a list variable that contains several named elements.
For example, to use simple exponential smoothing to make forecasts for the time series of annual rainfall in London, we type:
rainseriesforecasts <- HoltWinters(rainseries, beta=FALSE, gamma=FALSE)
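Typing rainseriesforecasts prints the fitted model, including the smoothing parameters; because beta and gamma were switched off, they are reported as FALSE.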
The output of HoltWinters() tells us that the estimated value of the alpha parameter is about 0.024. This is very close to zero, telling us that the forecasts are based on both recent and less recent observations (although somewhat more weight is placed on recent observations).
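The estimated parameters can also be read directly from the object that HoltWinters() returns. The sketch below fits the model to a synthetic annual series (a constant level plus random noise, invented for illustration), so the numbers it produces will not match the rainfall results; the variable names y and fit are assumptions.

```r
set.seed(42)

# Synthetic annual series: constant level of 25 plus noise (made up for this sketch).
y <- ts(25 + rnorm(100, sd = 4), start = 1813)

# Simple exponential smoothing: switch off the trend (beta) and seasonal (gamma) parts.
fit <- HoltWinters(y, beta = FALSE, gamma = FALSE)

fit$alpha         # estimated smoothing parameter, between 0 and 1
fit$coefficients  # final estimate of the level, named "a"
fit$SSE           # sum of squared one-step-ahead forecast errors
```

For the rainfall example, the same elements would be read from rainseriesforecasts.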
By default, HoltWinters() just makes forecasts for the same time period covered by our original time series. In this case, our original time series included rainfall for London from 1813-1912, so the forecasts are also for 1813-1912.
In the example above, we have stored the output of the HoltWinters() function in the list variable “rainseriesforecasts”. The forecasts made by HoltWinters() are stored in a named element of this list variable called “fitted”, so we can get their values by typing rainseriesforecasts$fitted.
We can plot the original time series against the forecasts by typing plot(rainseriesforecasts).
The plot shows the original time series in black, and the forecasts as a red line. The time series of forecasts is much smoother than the time series of the original data here.
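It can help to look at the structure of the “fitted” element before plotting it. The sketch below again uses a synthetic series rather than the rainfall data (the values and the names y and fit are assumptions) and inspects the columns and time span of the in-sample forecasts.

```r
set.seed(42)

# Synthetic annual series starting in 1813 (invented for this sketch).
y <- ts(25 + rnorm(100, sd = 4), start = 1813)
fit <- HoltWinters(y, beta = FALSE, gamma = FALSE)

colnames(fit$fitted)  # "xhat" holds the forecasts, "level" the estimated level
head(fit$fitted)      # first few in-sample forecasts
start(fit$fitted)     # the forecasts begin one time step after the data
nrow(fit$fitted)      # one fewer forecast than there are observations
```

The forecasts start one step after the data because the first observation is used to initialise the level, so no forecast is made for it.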