Polina is a speaker for ODSC East 2020 this April 13-17 in Boston. Be sure to check out her talk, “Utilizing Machine Learning to Predict Public Transportation Arrival Times,” there!

For METRO, the public transit authority in Houston, TX, accurately predicting bus arrival times is critical to service quality and customer satisfaction. To improve its service and adopt new technologies, METRO partnered with EastBanc Technologies, a custom software development firm. During my ODSC East 2020 session, I'll walk you through a real-life data science project: getting to know the data, framing the problem as a machine learning task, applying model optimization techniques, and finally, integrating the best model into a production solution.

First, we'll start with an overview of the structured and unstructured data that was provided to our team. We'll discuss the incoming data format, how the data is corrected, and how it is assigned to a bus route. After covering these data specifics, we'll dig into possible ways to frame our prediction problem as a machine learning task. We chose a long short-term memory (LSTM) recurrent neural network architecture because of its proven track record of accurate performance on sequential data.

When predicting public transport arrival times from sequential observations, it is common to enrich the training data with external information such as surrounding traffic, maps, and city structure. However, our team had only limited access to structured data of this kind. We therefore turned to alternative sources: data that seems less directly related but is easily accessible (e.g., weather information, social activity data, and city events). Incorporating external datasets into a neural network in a way that maximizes their predictive power is always something of an art. Join my session to learn how our team assessed the predictive power of each dataset and how we incorporated them into the LSTM model to improve its predictions.

Teaser: How much impact will external data, such as weather and city events, have? Which source will help the most? Join us to find out!


Polina Reshetova, PhD and Senior Data Scientist, EastBanc Technologies

Polina Reshetova, a Senior Data Scientist at EastBanc Technologies, earned her PhD in complex systems data analysis. For the past five years, she has been developing machine learning algorithms and predictive analytics techniques.