Time series analysis (ARIMA) on transportation data using Python

  • Writer: Karthik Jamalpur
  • Nov 28, 2021
  • 4 min read

Updated: Dec 5, 2021




Abstract:

Traffic congestion has spiked in the last decade because of increasing private car ownership, and traffic speed drops as congestion grows around the world. Predicting traffic speed can be helpful in this scenario. This project leverages time series modelling to study factors that significantly affect traffic congestion. Floating car data collected from the Rome ring road is divided into 5-minute intervals, and an ARIMA model helps decision-makers better manage traffic congestion by capturing and predicting abnormal states. First, the characteristics and structure of the dataset that negatively impact the performance of the time series analysis are highlighted. Python was used to pre-process the data and prepare it for the modelling phase. An ARIMA (Auto-Regressive Integrated Moving Average) model was then used to analyze and predict the traffic speed observations, measured on a designated road section in the Lazio region of Italy. Using these observations, the predicted speed is updated for better guidance and routing.



Let's break down the project structure:

  • Collecting data (FCD)

  • Understanding the data

  • Data pre-processing

  • Data modeling

  • Data validation

  • Data visualization

Traffic speed prediction

  • Traffic speed prediction is the task of forecasting real-time traffic information based on frequent and thorough floating car data, such as average traffic speed and traffic counts.

  • Short-term traffic prediction: predicts the state of traffic 10 or 15 minutes ahead.

  • Long-term traffic prediction: predicts the state of traffic the next day, the next week, and beyond.

  • Together, these two types can be used to forecast travel times, detect traffic congestion, find optimal routes, etc.

Floating car data

  • Vehicles enabled with GPS devices collect information about their location, their route, and travel speed throughout the road network [3]. This method of data collection is referred to as floating car data (FCD) and can be applied to derive historical speed data or even for real-time applications.

  • Floating car data based on GPS trajectories opens up many possibilities in traffic modeling and analysis and provides valuable information to traffic planners and decision-makers.

Time series

  • A time series is a series of data points listed in order of time. Most commonly, a time series is taken at successive equally spaced points in time.

  • It can be useful to see how the transportation variables in a given dataset change over time. Examples of time series are traffic speed prediction (how mean speed, counts, and standard deviation change over time), stock prices, and business sales.

Stationarity

Stationarity means that the statistical properties of a time series do not depend on the time at which the series is observed.







The floating car data (FCD) is plotted below; fig. 1 shows its behavior. The FCD represents the observed values on a road section: the number of counts (vehicles), average speed, standard deviation, and link number. The dataset is divided into 5-minute intervals, and some values are missing because no vehicles were counted in those specific intervals. Missing values were handled with Python on different platforms (PyCharm, Google Colab, Jupyter Notebook). The pre-processed data is plotted in fig. 2, where the X-axis and Y-axis show the time interval and speed, respectively.
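A minimal sketch of this missing-value step, assuming the data sit in a pandas series indexed by 5-minute timestamps (the timestamps and speeds here are made up, not the actual Rome ring-road values):

```python
import numpy as np
import pandas as pd

# Hypothetical FCD sample: 5-minute intervals where some intervals
# have no vehicle counts and therefore no average speed (NaN).
idx = pd.date_range("2021-11-01 08:00", periods=12, freq="5min")
speed = pd.Series([62.0, 58.5, np.nan, 60.2, np.nan, np.nan,
                   55.0, 57.3, 59.1, np.nan, 61.4, 60.0], index=idx)

# Time-based linear interpolation fills the gaps between observations
filled = speed.interpolate(method="time")
print(filled.isna().sum())  # 0 missing values remain
```

Other choices (forward-fill, or filling with the interval's historical mean) are equally plausible here; interpolation is just one reasonable option for short gaps.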



After that, the hourly mean speed and daily mean speed were calculated and plotted along with the actual pre-processed data.
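With a datetime-indexed series, the hourly and daily means fall out of a pandas resample; this sketch again uses synthetic speeds in place of the real FCD set:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# One week of synthetic 5-minute speed observations
idx = pd.date_range("2021-11-01", periods=7 * 24 * 12, freq="5min")
speed = pd.Series(60 + rng.normal(0, 5, len(idx)), index=idx)

hourly_mean = speed.resample("h").mean()  # hourly mean speed
daily_mean = speed.resample("D").mean()   # daily mean speed
print(len(hourly_mean), len(daily_mean))  # 168 7
```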



Time series methodologies

  • Autoregressive model: It specifies that an output variable depends linearly on its own past values.

  • Moving average model: It specifies that the output variable depends linearly on the current and various past values of a stochastic (error) term.

  • Auto regressive moving average model: An autoregressive moving average (ARMA) model is normally used to describe a weakly stationary random time series in terms of two polynomials: the first for the autoregression, the second for the moving average.

  • Auto regressive integrated moving average model: Autoregressive integrated moving average models are a typical class of models for predicting a time series that can be made stationary by differencing, possibly in conjunction with nonlinear transformations such as logging or deflating.


ARIMA Model:

ARIMA assumes that the past values of a time series are enough to predict its future values. Combining the autoregressive and moving average models with integration (differencing) between them gives ARIMA, which takes three parameters p, d, and q, representing the autoregressive order, the degree of integration, and the moving average order, respectively. The Augmented Dickey-Fuller test was applied to verify stationarity, and from this analysis ARIMA(2, 1, 0) was selected as the best-fit model and used to predict the data. Normalization and standardization techniques were applied to the pre-processed FCD dataset to reduce skewness and produce better predicted outputs, and then the errors were calculated.

Normalization: N = |(Xi - Xmin) / (Xmax - Xmin)|

Where, N = Normalized data

Xi = Individual actual pre-processed data (for i in FCD data set)

Xmin = minimum value in data set = 0 km/h

Xmax = maximum value in data set = 184 km/h
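The normalization formula above maps every speed into [0, 1]; with numpy it is a one-liner (the sample values are illustrative, using the stated Xmin = 0 km/h and Xmax = 184 km/h):

```python
import numpy as np

# Illustrative speeds (km/h) spanning the stated range 0..184
speed = np.array([0.0, 45.5, 92.0, 130.4, 184.0])

# N = |(Xi - Xmin) / (Xmax - Xmin)|
n = np.abs((speed - speed.min()) / (speed.max() - speed.min()))
print(n.min(), n.max())  # 0.0 1.0
```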

Standardization: Z = |(Xi - μ) / σ|

Where, Z = Standardized data

Xi = Individual actual pre-processed data (for i in FCD data set)

μ = mean of the data set

σ = standard deviation of the data set
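The standardization formula translates directly as well; note the absolute value in the post's formula, which folds below-mean and above-mean speeds onto the same positive scale (sample values again illustrative):

```python
import numpy as np

speed = np.array([55.0, 60.0, 62.5, 58.0, 64.5, 59.0])

# Z = |(Xi - mu) / sigma|
mu = speed.mean()
sigma = speed.std()
z = np.abs((speed - mu) / sigma)
print(z.round(2))
```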

Calculating the errors:


From the above error calculations, the pre-processed data shows a broadly acceptable level of error (a MAPE below 10% is usually taken to indicate good output, and likewise for the MAE). Normalization, however, increased the skewness instead of decreasing it, producing higher error values, so it would not be a good idea to rely on the predicted results from that method. Standardization is effective, with its three error metrics between 10 and 17; although its MAPE exceeds 10%, the nearly linear distribution of its absolute errors makes standardization the optimal choice for this scenario.
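The three error metrics compared above (MAE, RMSE, MAPE) can be computed as follows; the actual and predicted values here are invented for illustration, not the project's results:

```python
import numpy as np

actual = np.array([60.0, 58.0, 62.0, 59.5, 61.0])     # observed speeds (km/h)
predicted = np.array([59.0, 58.5, 61.0, 60.0, 60.0])  # model forecasts

err = actual - predicted
mae = np.mean(np.abs(err))                        # mean absolute error
rmse = np.sqrt(np.mean(err ** 2))                 # root mean squared error
mape = np.mean(np.abs(err / actual)) * 100        # mean abs. percentage error

print(f"MAE={mae:.2f} RMSE={rmse:.2f} MAPE={mape:.2f}%")
```

Note that MAPE is undefined when an actual value is zero, which matters for FCD intervals with no traffic; such intervals would need to be excluded or another metric used.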




