Speed prediction in large and dynamic traffic sensor networks

https://doi.org/10.1016/j.is.2019.101444

Highlights

  • Dynamic traffic sensor networks bring challenges in the context of urban mobility

  • We evaluate three approaches for speed prediction over large/dynamic sensor networks

  • The global and cluster-based approaches provide accurate and robust prediction models

  • The global approach solves the cold start problem

  • We provide a large dataset and assess the effectiveness of the three approaches

Abstract

Smart cities are nowadays equipped with pervasive networks of sensors that monitor traffic in real-time and record huge volumes of traffic data. These datasets constitute a rich source of information that can be used to extract knowledge useful for municipalities and citizens. In this paper we are interested in exploiting such data to estimate future speed in traffic sensor networks, as accurate predictions have the potential to enhance decision making capabilities of traffic management systems. Building effective speed prediction models in large cities poses important challenges that stem from the complexity of traffic patterns, the number of traffic sensors typically deployed, and the evolving nature of sensor networks. Indeed, sensors are frequently added to monitor new road segments or replaced/removed due to different reasons (e.g., maintenance). Exploiting a large number of sensors for effective speed prediction thus requires smart solutions to collect vast volumes of data and train effective prediction models. Furthermore, the dynamic nature of real-world sensor networks calls for solutions that are resilient not only to changes in traffic behavior, but also to changes in the network structure, where the cold start problem represents an important challenge. We study three different approaches in the context of large and dynamic sensor networks: local, global, and cluster-based. The local approach builds a specific prediction model for each sensor of the network. Conversely, the global approach builds a single prediction model for the whole sensor network. Finally, the cluster-based approach groups sensors into homogeneous clusters and generates a model for each cluster. We provide a large dataset, generated from 1.3 billion records collected by up to 272 sensors deployed in Fortaleza, Brazil, and use it to experimentally assess the effectiveness and resilience of prediction models built according to the three aforementioned approaches. 
The results show that the global and cluster-based approaches provide very accurate prediction models that prove to be robust to changes in traffic behavior and in the structure of sensor networks.

Introduction

Highly populated cities increasingly face mobility challenges caused by transport and traffic. The huge volume of data collected by real-time traffic monitoring sensors provides new opportunities to develop models and algorithms that move transportation services towards intelligent transportation systems, in particular those dealing with traffic predictions. Vehicle speeds on road networks are determined by complex traffic processes governed by stochastic and non-linear interactions between individual drivers [1]; hence, predicting the speed of vehicles is as complex as predicting the underlying traffic processes. Short-term traffic prediction techniques have been investigated and exploited for some time [2]. However, the emergence of smart cities, where urban areas are covered by massive numbers of sensors, combined with the development of transportation technologies, requires traffic prediction techniques that are fast, scalable, and suitable for complex and heterogeneous sensor networks like those deployed in smart cities. Many different traffic sensor technologies are currently used to monitor road networks, such as those based on inductive-loop detectors, magnetometers, video image processors, microwave radar sensors, laser radar sensors, passive infrared sensors, ultrasonic sensors, passive acoustic sensors, and devices exploiting combinations of the aforementioned technologies [3].

In this work we focus on sensors capable of capturing the speed of vehicles traveling over large and dynamic road networks, where sensors can be added to or removed from the network for various reasons. We address the problem of training accurate prediction models that maintain their accuracy over time (we call this the model aging problem) and cope with structural changes affecting sensor networks (we call this the network dynamicity problem). In this context we assume that sensors collect their observations in the form of textual data and periodically send this information to a centralized entity. We also assume that some centralized entity is in charge of training prediction models according to the available sensor observations.

We address these challenges by proposing and analyzing three different approaches that can be used to train machine-learned prediction functions: local, global, and cluster-based. The local approach is the solution commonly used in the literature, where each sensor is considered separately from the others to train a specific predictive function. This approach suffers from the cold start problem, since a newly added sensor has no history from which to train its model, and therefore hardly applies to dynamic sensor networks, where sensors may be added and removed on a daily basis. Moreover, in large and dynamic sensor networks the local approach requires training and maintaining a large number of different prediction models. To overcome these issues we propose the global and cluster-based approaches, where models are trained on data coming from all the sensors in the network (or groups of similar sensors, in the cluster-based case) to build resilient predictive functions. The global approach provides substantial benefits in terms of reduced complexity and costs. Furthermore, by relying on a single prediction function that is independent of specific sensors, the global approach naturally solves the cold start problem. Finally, the global approach is expected to be robust with respect to structural changes occurring in sensor networks, thus also addressing the dynamicity problem.
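The difference between the local and global approaches, and the way the cold start problem arises, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy speed histories, the sensor identifiers, and the use of a one-variable least-squares line that predicts the next average speed from the current one. It is not the learning algorithm evaluated in the paper.

```python
def fit_line(pairs):
    """Ordinary least squares for y = a * x + b over (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    if var_x == 0:
        # Degenerate case: all inputs equal, fall back to the mean target.
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / var_x
    return a, mean_y - a * mean_x


def lagged_pairs(series):
    """Turn a speed series into (current speed, next speed) training pairs."""
    return list(zip(series[:-1], series[1:]))


def predict(model, current_speed):
    a, b = model
    return a * current_speed + b


# Hypothetical average speeds (km/h) per time bucket for two known sensors.
history = {
    "s1": [40.0, 42.0, 41.0, 45.0, 44.0],
    "s2": [60.0, 58.0, 61.0, 59.0, 62.0],
}

# Local approach: one model per sensor, trained only on that sensor's data.
local_models = {s: fit_line(lagged_pairs(v)) for s, v in history.items()}

# Global approach: a single model trained on the pooled data of all sensors.
pooled = [pair for v in history.values() for pair in lagged_pairs(v)]
global_model = fit_line(pooled)

# A sensor added today has no history, hence no local model (cold start),
# but the global model can still produce a prediction for it.
new_sensor = "s3"
has_local_model = new_sensor in local_models
global_prediction = predict(global_model, 50.0)
```

In this sketch the local approach keeps one model per sensor and has nothing to offer for `s3`, whereas the global approach keeps a single model that applies to any sensor, including one that was never seen during training.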

We also tested a cluster-based approach to assess its potential as a viable compromise between the local and global approaches. Specifically, the cluster-based approach trains distinct predictive functions for groups of similar sensors, where sensors are clustered according to some similarity metric; depending on the number of clusters, the behavior of this approach resembles that of the local approach (when a high number of clusters is used) or that of the global one (when few clusters are used). From the experimental evaluation we cannot yet conclude that this approach indeed represents a good compromise, since the results are mixed and further work is needed. The contributions of this paper can be summarized as follows:

  • We propose the global and cluster-based approaches for learning vehicle speed prediction functions in large and dynamic sensor networks.

  • Driven by three experimental questions, we provide a comprehensive evaluation to assess the effectiveness of the predictive models trained according to the three approaches. The training is conducted using different state-of-the-art machine learning algorithms on a large, real-world sensor dataset. The dataset covers a time span of 12 months, during which 130 sensors were added to the network and 145 were removed from it. The evaluation shows that the models created using the global approach represent good solutions when dealing with dynamic sensor networks, as they prove to be accurate and resilient both to model aging and to structural changes in the sensor infrastructure (which, in turn, include the cold start problem).

  • We release to the scientific community the real-world dataset used to assess our proposals. The dataset originates from 1.3 billion records collected throughout 2014 by 272 different road traffic sensors deployed in the city of Fortaleza, Brazil. Due to privacy concerns we do not release the original raw data, but a dataset obtained after an aggregation and cleaning process. To the best of our knowledge, this is the largest and richest dataset made publicly available for research on speed prediction in dynamic sensor networks.
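The cluster-based approach described above can be made concrete with a small Python sketch. The text only says that sensors are grouped "according to some similarity metric", so the choices here are assumptions made for illustration: sensors are compared by their mean speed, grouped with a tiny deterministic one-dimensional k-means, and each cluster's "model" is simply the mean of its pooled speeds, standing in for a real learned predictor.

```python
def kmeans_1d(values, k, iterations=20):
    """Tiny 1-D k-means with deterministic initialisation: the starting
    centroids are taken at evenly spaced positions of the sorted values."""
    ordered = sorted(values)
    step = max(k - 1, 1)
    centroids = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [
            sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)
        ]
    return centroids


def nearest_cluster(value, centroids):
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))


# Hypothetical average speeds (km/h) per time bucket for four sensors.
history = {
    "s1": [40.0, 42.0, 41.0],
    "s2": [43.0, 44.0, 42.0],
    "s3": [80.0, 82.0, 81.0],
    "s4": [79.0, 83.0, 84.0],
}

# Assumed similarity feature: the mean speed observed by each sensor.
mean_speed = {s: sum(v) / len(v) for s, v in history.items()}
centroids = kmeans_1d(list(mean_speed.values()), k=2)
assignment = {s: nearest_cluster(m, centroids) for s, m in mean_speed.items()}

# One placeholder "model" per cluster: the mean of the pooled speeds of
# the cluster's sensors. A real system would train a predictor here.
pooled = {}
for sensor, cluster in assignment.items():
    pooled.setdefault(cluster, []).extend(history[sensor])
cluster_model = {c: sum(v) / len(v) for c, v in pooled.items()}
```

With `k` equal to the number of sensors this degenerates into the local approach, and with `k=1` into the global one, which mirrors the trade-off discussed above.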

The paper is structured as follows: Section 2 gives an overview of related work on the traffic prediction problem. Section 3 defines our prediction problem and discusses three approaches to solving it. Section 4 presents the dataset used in our experiments, as well as the pre-processing steps used to transform the data into a format suitable for speed prediction. Section 5 details the experimental evaluation and discusses the results. Finally, Section 6 draws the conclusions and sketches potential lines of future research.

Section snippets

Related work

Short-term traffic prediction aims at estimating traffic conditions from a few seconds to a few hours in the future, based on current and past traffic information. The field has an extensive and longstanding research history that originates in the 1980s in the context of intelligent transportation systems. A comprehensive and recent survey [2] observes how this research area moved from a classical statistical perspective (e.g., ARIMA) to data-driven modeling techniques based on machine learning and

Problem definition

Let $S = \{s_1, \ldots, s_n\}$ be a network of $n$ sensors overseeing the traffic conditions of a specific geographical area. Within a given time interval $T$, sensors in $S$ produce a collection of observations, where each observation is a triple $(t_j, s_j, x_{speed})$ recording the time $t_j \in T$ of the event of a vehicle passing by some sensor $s_j \in S$ with a speed $x_{speed}$.

Let us then denote by O the set of average speed observations that are produced as follows: the whole time interval T is split into time-buckets of fixed length
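The bucketing step can be sketched in Python as follows. The snippet above is cut off before it states the bucket length or how observations are averaged, so this is only a plausible reading under stated assumptions: timestamps are seconds from the start of T, the bucket length is a hypothetical 300 seconds, and each (sensor, bucket) pair gets the arithmetic mean of the speeds recorded in it.

```python
from collections import defaultdict


def average_speed_per_bucket(observations, bucket_seconds):
    """Group (timestamp, sensor_id, speed) triples into fixed-length time
    buckets and return {(sensor_id, bucket_index): mean speed}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, sensor, speed in observations:
        key = (sensor, int(t // bucket_seconds))
        sums[key] += speed
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy observations: (seconds since the start of T, sensor id, speed in km/h).
observations = [
    (10, "s1", 40.0),
    (200, "s1", 50.0),
    (310, "s1", 30.0),
    (20, "s2", 60.0),
]

# bucket_seconds=300 is an illustrative value, not taken from the paper.
averages = average_speed_per_bucket(observations, bucket_seconds=300)
```

Here the first two readings of `s1` fall into bucket 0 and are averaged together, while the third falls into bucket 1; buckets in which a sensor records nothing simply have no entry.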

Dataset preparation

We evaluate the local, global, and cluster-based approaches introduced in Section 3 by means of a real-world dataset containing data from traffic sensors deployed in the city of Fortaleza (Brazil). The dataset is provided by Autarquia Municipal de Trânsito e Cidadania (AMC), the authority supervising Fortaleza’s road network. The raw dataset consists of about 1.3 billion records, collected by a network of 302 sensors during the whole year of 2014, for a total of 60 GB of data. Each record is

Experimental evaluation

In this section we discuss the experiments conducted to generate different prediction models and assess their performance. More specifically, Section 5.1 introduces the experimental setting used for the evaluation, while Section 5.2 introduces the experimental questions and discusses the results.

Conclusion and future work

Traffic forecasts should be accurate and robust to changes in traffic monitoring networks. When such changes may occur, traffic management systems should optimize management and advisory strategies to enhance decision-making capabilities and maintain an appropriate level of service. In this context we consider the problem of predicting the speed of vehicles by analyzing data collected from a large and dynamic network of sensors, where sensing devices are continuously added and removed to the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is partially supported by FUNCAP SPU 8789771/2017, UFC-FASTEF 31/2019, BIGDATAGRAPES (EU H2020 RIA, grant agreement N780751), MASTER (H2020, MSCA grant agreement 777695) and the OK-INSAID (MIUR-PON 2018, grant agreement NARS01_00917) projects. F. Lettich’s work has been supported by a University of Alberta’s Faculty of Science Research Grant.

References (35)

  • Huang W. et al. Deep architecture for traffic flow prediction: deep belief networks with multitask learning. IEEE Trans. Intell. Transp. Syst. (2014)

  • Lippi M. et al. Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning. IEEE Trans. Intell. Transp. Syst. (2013)

  • Wang D. et al. Traffic flow forecast with urban transport network

  • Lv Y. et al. Traffic flow prediction with big data: a deep learning approach. IEEE Trans. Intell. Transp. Syst. (2015)

  • Van Der Voort M. et al. Combining Kohonen maps with ARIMA time series models to forecast traffic flow. Transp. Res. C (1996)

  • Sun S. et al. A Bayesian network approach to traffic flow forecasting. IEEE Trans. Intell. Transp. Syst. (2006)

  • Shuai M. et al. An online approach based on locally weighted learning for short-term traffic flow prediction
