Name: best-trip-recommender
Owner: Laboratório Analytics
Description: Bus Trip Recommendation API implemented in R, using bus location, ticketing and schedule data.
Created: 2016-11-01 14:46:02.0
Updated: 2016-11-03 12:47:32.0
Pushed: 2017-05-25 19:04:23.0
Homepage: null
Size: 13633
Language: R
The Recommendation Service Manager handles multiple requests to the BestTripRecommender in parallel.
Its architecture is structured as shown below:
![](best_trip_recommender/Diagrams/recommendation service manager diagram.png)
The Application makes several requests to the Web Service, which is responsible for distributing those requests across multiple processes. Each process runs in parallel on its own port of our Virtual Machine. These processes receive the requests with the input data and apply our prediction algorithms. To do so, they need to access our database, which contains historical data about bus trips. This data is processed and the output is sent back to the application.
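As a rough illustration of the distribution step, the sketch below pairs incoming requests with worker ports in round-robin order. The round-robin policy, the base port `8000`, and the function names are assumptions made for this example; the source does not state how the Web Service actually chooses a process.

```python
from itertools import cycle


def worker_ports(base_port, num_processes):
    """Return one port per worker process, counting up from base_port (hypothetical layout)."""
    return [base_port + i for i in range(num_processes)]


def assign_requests(requests, ports):
    """Pair each request with a worker port in round-robin order (assumed policy)."""
    # zip stops when the finite list of requests is exhausted,
    # so cycling over the ports forever is safe here.
    return list(zip(requests, cycle(ports)))


ports = worker_ports(8000, 3)
assignments = assign_requests(["req-a", "req-b", "req-c", "req-d"], ports)
for request, port in assignments:
    print(f"{request} -> port {port}")
```

With three processes, the fourth request wraps around to the first port, so no process sits idle while another queues work.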
The detailed procedure that occurs in each of those processes is described right below.
The Best Trip Recommender is a service that allows you to create a prediction system for the public transportation services of your city. So far the system can predict the duration of a trip and its number of passengers.
The recommender architecture is structured as shown below:
The User Application is an app that provides the input data necessary for the system to generate a prediction and also receives the output data from the recommender.
The Input Data must contain the following content:
The Get N probable trips module uses the bus Timetable data to get N trips (N = 3 by default) as the basis for the prediction. It filters the timetable down to trips with the same route, weekday and stop id passed by the user, then gets the N trips closest to the time passed by the user.
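The filter-then-rank step can be sketched as follows. This is a minimal illustration, not the project's code: the field names (`route`, `week_day`, `stop_id`, `mean_time`) are hypothetical, and "closest" is assumed to mean the smallest absolute time difference, which the source does not confirm.

```python
from datetime import datetime


def seconds_of_day(hms):
    """Convert an 'HH:MM:SS' string into seconds since midnight."""
    t = datetime.strptime(hms, "%H:%M:%S")
    return t.hour * 3600 + t.minute * 60 + t.second


def get_n_probable_trips(timetable, route, week_day, stop_id, time, n=3):
    """Keep rows matching route, weekday and stop, then return the n closest in time."""
    wanted = seconds_of_day(time)
    candidates = [
        row for row in timetable
        if row["route"] == route
        and row["week_day"] == week_day
        and row["stop_id"] == stop_id
    ]
    # Rank by absolute distance to the requested time (assumed definition of "closest").
    candidates.sort(key=lambda row: abs(seconds_of_day(row["mean_time"]) - wanted))
    return candidates[:n]


# Made-up timetable rows, for illustration only.
timetable = [
    {"route": "022", "week_day": "wednesday", "stop_id": 26276, "mean_time": "09:40:00"},
    {"route": "022", "week_day": "wednesday", "stop_id": 26276, "mean_time": "09:55:00"},
    {"route": "022", "week_day": "wednesday", "stop_id": 26276, "mean_time": "10:20:00"},
    {"route": "022", "week_day": "wednesday", "stop_id": 26276, "mean_time": "11:30:00"},
    {"route": "507", "week_day": "wednesday", "stop_id": 26276, "mean_time": "10:00:00"},
]
probable = get_n_probable_trips(timetable, "022", "wednesday", 26276, "10:00:00")
print([row["mean_time"] for row in probable])
```

The route 507 row is dropped by the filter even though its time matches exactly, and the 11:30 trip is dropped by the ranking because three closer trips exist.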
The Timetable is a data store that holds summarized data about bus trips actually completed over a predefined period of time. We generated it by applying statistical methods to data from completed trips (not scheduled trips); it contains the mean, lower limit and upper limit of the times at which the buses arrived at the bus stops. The recommender uses this data to identify which of the real trips is closest in schedule to the one passed as Input Data. The Timetable data store should contain these fields:
The Feature Factory module creates new features based on the user data and the N probable trips. Currently it extracts the following features:
After feature extraction in the Feature Factory module, the data contains the features necessary to predict the number of passengers and the trip duration.
The Predict Trip Passengers Number module predicts the number of passengers on the trip based on the Prediction Data.
The Predict Trip Duration module predicts the trip duration based on the Prediction Data.
The Prediction Data contains all the historical data about completed bus trips in a certain period of time. Unlike the Timetable data, it is not summarized: it contains every trip made, each one individually. The fields of the data are:
The Output Data contains the N probable trips, each one with the following content:
    python run_api.py <num_processes> <method: lasso|svm> <best_trip_recommender_folderpath> <training_data_filepath> <test_metadata_filepath> <model_data_filepath>
Where:
    localhost:<port>/train_model
    # Curitiba examples
    localhost:<port>/get_best_trips?route=022&time=10:00:00&date=2016-10-26&bus_stop_id=26276
    localhost:<port>/get_best_trips?route=507&time=17:00:00&date=2017-02-04&bus_stop_id=26255
    # Campina Grande example
    localhost:<port>/get_best_trips?route=0500&time=14:23:00&date=2016-09-01&bus_stop_id=97
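A client can assemble these request URLs programmatically rather than by hand. The sketch below builds one with Python's standard library; the helper name `best_trips_url`, the `http` scheme, the `localhost` default and port `8000` are assumptions for illustration, since the source leaves the port as `<port>`.

```python
from urllib.parse import urlencode


def best_trips_url(port, route, time, date, bus_stop_id, host="localhost"):
    """Build a get_best_trips request URL (hypothetical helper, not part of the project)."""
    query = urlencode(
        {"route": route, "time": time, "date": date, "bus_stop_id": bus_stop_id},
        safe=":",  # keep the colons in HH:MM:SS readable instead of percent-encoding them
    )
    return f"http://{host}:{port}/get_best_trips?{query}"


# Same parameters as the first Curitiba example; the port number is made up.
url = best_trips_url(8000, route="022", time="10:00:00", date="2016-10-26", bus_stop_id=26276)
print(url)
```

Passing the route as a string preserves leading zeros such as `022` and `0500`, which would be lost if it were treated as a number.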