TRCLC 17-3

Integrating Crowdsourced Data with Traditionally Collected Data to Enhance Estimation of Bicycle Exposure Measure

Authors: Valerian Kwigizile, Jun-Seok Oh, and Keneth Kwayu

Summary: This study explored the potential of incorporating crowdsourced data (specifically Strava Metro data) into estimation methods to improve the spatial and temporal estimation of bicycle exposure. Several probabilistic and machine learning models were tested, including the Negative Binomial (NB) model, Random Forest (RF), Support Vector Machine (SVM), Artificial Neural Network (ANN), and K-Nearest Neighbors (KNN).

Problem: Measuring bicycle exposure is essential for planning bicycle systems and for ensuring their safety. Traditional methods for measuring bicycle volume have proven challenging and costly. Crowdsourced data on cycling activity can be a good source of bicycle exposure measures: data collected through fitness apps can supplement data collected through traditional methods and provide the spatial detail needed to estimate bicycle exposure. However, comprehensive research on how to integrate crowdsourced data with traditional data is lacking. Understanding the opportunities and limitations of crowdsourced data is necessary to guide that integration.

Research Results: In terms of prediction, the Random Forest model was found to have better predictive capability than the other models tested. Adding Strava counts, which had an average observed penetration rate of 7 percent, improved the RF model significantly, raising the share of variation in hourly bicycle volume that it explains from 65 percent (R-squared = 0.65) to 71 percent (R-squared = 0.71). The study also ran a simulation to assess how model performance changes under different simulated Strava penetration rates and found that a one-percentage-point change in the simulated penetration rate has a very significant influence on the model's prediction performance. The research team created an online tool that uses the RF model to predict hourly bicycle volume.
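
The two quantities reported above, penetration rate and R-squared, can be illustrated with a minimal Python sketch. It is not the study's code. The counts are made-up numbers chosen so that the aggregate penetration rate comes out at 7 percent. The function names are hypothetical, the aggregate-ratio definition of penetration rate is only one possible definition, and the binomial-thinning approach in simulate_strava_counts is an assumption about how a simulated penetration rate could be generated, since the brief does not describe the study's simulation procedure.

```python
import random


def penetration_rate(strava_counts, observed_counts):
    """Aggregate share of observed bicycle trips captured by the crowdsourced counts."""
    total_observed = sum(observed_counts)
    if total_observed == 0:
        raise ValueError("observed counts sum to zero")
    return sum(strava_counts) / total_observed


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def simulate_strava_counts(observed_counts, rate, seed=0):
    """Assumed simulation: each observed trip is recorded with probability `rate`."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(n) if rng.random() < rate) for n in observed_counts]


# Hypothetical hourly counts at one site (not data from the study).
observed = [12, 30, 45, 80, 33]
strava = [1, 2, 3, 6, 2]

rate = penetration_rate(strava, observed)               # 14 / 200
simulated = simulate_strava_counts(observed, rate=0.10)  # one simulated scenario
```

In a workflow like the one described, simulated counts at each candidate penetration rate would be fed to the model as a predictor, and r_squared would be computed on held-out hourly volumes to compare scenarios.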