Building a single source of truth for predictive applications on AWS
Coordinating diverse real-time data sources
The rail operator was struggling to link and coordinate its many data sources. As a result, train station destination boards, for example, were out of sync with the information shown in the passenger app.
Four years ago, the rail operator decided it needed to develop a unified platform for travel information. At the center of the system it wanted to use Big Data technology to harvest data from existing systems. The plan was to create a central data platform to act as the “Single Point of Truth” (SPOT). From this platform, data is distributed to all connected channels to ensure that passengers get accurate and consistent information on their journeys.
In order to build the platform, a team of the rail operator’s engineers and external service providers was pulled together – including those from The unbelievable Machine Company (*um). The biggest challenge they faced was the rail company’s highly complex, safety-critical environment.
16 million stop events every day
Every day, around 16 million stop events occur in the rail operator’s network; each one is generated when a train stops and starts again somewhere on the network. In addition, there are 1.2 million train-running messages and 4.4 million train-location messages, all of which follow an international standard protocol. Before a train enters a station, for example, the correct platform has to be confirmed.
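To give a rough sense of scale, the daily figures above can be converted into average per-second rates. The calculation below is only a back-of-the-envelope sketch: it assumes the load is spread evenly across 24 hours, which real rail traffic is not, so peak rates would be higher.

```python
# Daily message volumes quoted in the text.
DAILY_VOLUMES = {
    "stop_events": 16_000_000,
    "train_running_messages": 1_200_000,
    "train_location_messages": 4_400_000,
}

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(daily_count: int) -> float:
    """Average events per second, assuming an even spread over the day."""
    return daily_count / SECONDS_PER_DAY


rates = {name: average_rate_per_second(count) for name, count in DAILY_VOLUMES.items()}
total_daily = sum(DAILY_VOLUMES.values())
total_rate = average_rate_per_second(total_daily)

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per second")
print(f"total: {total_daily:,} per day, about {total_rate:.0f} per second")
```

Taken together, that is 21.6 million messages a day, or an average of about 250 per second.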
Due to the enormous volumes of data involved, plus the capacity requirements for near-real-time data and access requirements from different locations, it was obvious to the team that it would need to develop a new application operating in the cloud.
Another major challenge was locating trains. Not all trains are clearly marked, and a GPS signal is available for only a few models. This made it difficult to identify which train was at which point on its journey, and therefore extremely complex to link the data to the SPOT. Eventually, the team chose to use machine learning (ML) to identify trains, for example when they split or couple carriages.
Some very complex data interfaces also proved a challenge. Protocols and data formats from proprietary solutions were partly obsolete, which made consolidation even more difficult. At the same time, technological consistency was important. An automated CI/CD (Continuous Integration and Continuous Delivery) pipeline virtually eliminated system interruptions and unavailability.
Deploying the single point of truth
The primary task was to implement the SPOT, which distributes the information consistently across all of the rail operator’s information channels and touchpoints.
The solution is built on Big Data technologies running on Amazon Web Services (AWS). The rail operator had already decided to go with AWS, as its infrastructure and services met the operator’s specific requirements. On top of the AWS Cloud, the rail operator used further AWS services and Hadoop stack technologies to create a data lake that stores huge amounts of data in the cloud.
The team started by developing the system architecture and a timetable builder based on it. The builder generates a complete target timetable by combining the customer timetable and the operating timetable. This target timetable serves as the baseline to which short-term timetable changes and real-time data, such as train position messages from the track sensors, are applied.
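The idea of combining two timetables into one target timetable can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Stop` fields, the rule that the operating timetable lists every stop (including non-public ones), and the train and station names. It is not the rail operator’s actual data model or merge logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stop:
    train_id: str
    station: str
    departure: str                 # "HH:MM", kept as a string for brevity
    platform: Optional[str] = None
    public: bool = False           # True if passengers may board or alight


def build_target_timetable(customer: list[Stop], operating: list[Stop]) -> list[Stop]:
    """Merge both timetables into one entry per (train_id, station)."""
    merged: dict[tuple[str, str], Stop] = {}

    # Start from the operating timetable, assumed here to list every stop.
    for stop in operating:
        merged[(stop.train_id, stop.station)] = stop

    # Overlay the customer timetable: its times are what passengers were told,
    # while the platform falls back to the operating entry if it has none.
    for stop in customer:
        key = (stop.train_id, stop.station)
        base = merged.get(key)
        platform = stop.platform or (base.platform if base else None)
        merged[key] = Stop(stop.train_id, stop.station, stop.departure, platform, public=True)

    return sorted(merged.values(), key=lambda s: (s.train_id, s.departure))


# Hypothetical input: one depot movement and one passenger stop.
operating = [Stop("T100", "Depot", "07:40", "D1"), Stop("T100", "A", "08:00", "3")]
customer = [Stop("T100", "A", "08:00")]
target = build_target_timetable(customer, operating)
```

In this example the target timetable keeps the depot movement as a non-public stop and enriches the passenger stop at station A with the platform from the operating timetable.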
Microservices are used to consolidate data from various sources, evaluate the information and then stream it consistently to information channels, such as platform displays and kiosk systems at stations or the rail operator’s navigator.
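The consolidate-then-distribute pattern described here can be sketched in a few lines. This is a simplified, in-process illustration under stated assumptions: the update fields (`train_id`, `station`, `delay_min`, `ts`), the "latest timestamp wins" rule, and the channel stand-ins are all hypothetical, and a real deployment would use separate services and a streaming platform rather than plain function calls.

```python
from typing import Callable, Iterable

Update = dict  # e.g. {"train_id": ..., "station": ..., "delay_min": ..., "ts": ...}
Channel = Callable[[Update], None]


def consolidate(sources: Iterable[Iterable[Update]]) -> list[Update]:
    """Keep only the most recent update per (train_id, station) across all sources."""
    latest: dict[tuple[str, str], Update] = {}
    for source in sources:
        for update in source:
            key = (update["train_id"], update["station"])
            if key not in latest or update["ts"] > latest[key]["ts"]:
                latest[key] = update
    return list(latest.values())


def distribute(updates: Iterable[Update], channels: Iterable[Channel]) -> None:
    """Send every consolidated update to every channel, so all of them agree."""
    channels = list(channels)
    for update in updates:
        for publish in channels:
            publish(update)


# Hypothetical feeds reporting on the same train with different freshness.
sensor_feed = [{"train_id": "T100", "station": "A", "delay_min": 5, "ts": 2}]
dispatch_feed = [{"train_id": "T100", "station": "A", "delay_min": 2, "ts": 1}]

# Stand-ins for real channels: each one just records what it was sent.
platform_display: list[Update] = []
mobile_app: list[Update] = []

distribute(consolidate([sensor_feed, dispatch_feed]),
           [platform_display.append, mobile_app.append])
```

Because every channel receives the same consolidated record, the platform display and the app in this sketch cannot disagree about the train’s delay, which is the point of routing everything through a single source.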
Synchronizing with display boards
The rail operator is carrying out an initial pilot of the service at a limited number of stations. The travel information in the rail operator’s mobile app will eventually synchronize with railway stations’ display boards across its region, ensuring rail customers have the correct information at all times. The project is ongoing, with plans to add further features to the app in the future.
The rail operator’s goal is to create a future-proof traveler information system utilizing innovative systems and agile methods.
Eventually, all of its customers will have consistent and reliable information, 24/7, on all the rail services it operates across the region via its app. This improvement in the quality and accuracy of the information distributed to the rail operator’s customers is crucial for its future success and validation.