Hello there. It's my first blog post in English and I'm going to tell you how we built simple in-memory storage for animated cars. We show animated cars on the main screen of "Namba Taxi for clients" applications. This post is about the completed journey, algorithms and not about Go.
The beginning
The story begins in 2015 with the graduation work of our mobile developer. The work topic was 'Driver application for taxi service'. In the application, he animated the car. It looked like this.
We thought. Why not using this on the screen when the client can track driver location. And our first challenge was a lack of data. We get new driver location every 15 seconds. We can't just decrease update interval, because when driver mobile application sends data to us it gets the current order, next order and if there are any open alerts. Alert works like SOS button. When the driver presses it other drivers are in hurry to help him. And when we decrease update interval we can get more traffic to our systems. We weren't sure that we can handle it.
First steps
Our first try was simple and stupid:
- Make the request and save coordinates.
- Make another request and animate the car.
As you can guess, there were problems with it. We can't animate a car properly and it moves through fields, forests, lakes and quarters. It looks like this
As the solution of the problem, we used OpenStreetMap Routing Machine (OSRM) for building routes and our algorithm improved. We have the same timeout.
- We make a request.
- Save coordinates.
- Send saved coordinates to the backend.
- Build route via OSRM.
- Return it to client app and animate marker.
It seems that it works now, but we faced with different problem with one-way roads
For instance, driver stays at the intersection at red point. But his device has bad accuracy and on location update driver stay on the opposite side of the crossroad. At client app we get these coordinates, save them and send to the backend. OSRM build a legit route and returns to the app and it looks ridiculous. Because marker moves very fast.
We solved this problem in a naive way. We check the shortest distance between two points and not build a route for distances less than 20 meters. With that algorithm after few days of testing, we decided to release our application and get feedback from customers.
And you know, we live in not ideal world so we decided to go to second iteration, because several things needed improvements.
- First thing was a trip cost calculator. All calculations were on the driver side. In this case we save server resources because we don't send useless requests. On the another hand we have only one trust point. It's a driver's mobile app. So we need to duplicate data and save It on the server side.
- Furthermore, we realized that 1 track per 15 seconds it's so little and lots of customers don't get sight of new feature because car starts moving In 15 seconds after the screen is opened. I know not so many people, who can look at the screen where nothing is going on for 15 seconds.
- Moreover, we have lots of problems with GPS module on the driver side. GPS problems are associated with driver's device.
- Finally, we wanted animated cars on the main screen.
Several issues to solve
- Start to collect more tracks from drivers
- Show animated cars on the main screen
- Store intermediate trip cost on the server side
- Save mobile data
- Collect each track per one second
I want to tell a few words about saving mobile data. We need it because in our country we have a very cheap taxi receipt. We use taxi like public transport. For instance, you can get from one side of the city to another just for 2 euros. It's like metro in Paris. Also mobile internet cost is too high. If we save 100 bytes per second, then we'll save 2000$ at company scale.
What the data in track?
- Driver location (Latitude, Longitude)
- Driver session that we give upon login
- Order information (OrderID and trip cost)
We decided that one track should be less than 100 bytes. And we started to look transport protocol to solve this issue
As you can see we watched for several protocols:
- HTTP
- WebSockets
- TCP
- UDP
And for us ideal option was UDP, because:
- We send only datagrams
- We don't need guarantees
- Minimalism
- Save lots of data
- We have only 20 bytes overhead
- Not blocked in mobile networks in our country
As for data serialization, we watched on:
- JSON
- MsgPack
- Protobuf
📖 Want to know why Uber's app loads so quickly? Discover the strategic battle between Protocol Buffers and JSON, and how it affects everything from battery life to user experience.
We chose protocol buffers because it's so efficient on little data.
And as you can see the nearest competitor is heavier twice.
What do we have in total?
- We have 42 bytes of payload
- + 20 bytes of IP headers
- = 62 bytes per track.
- PROFIT.
But when we get data also we need to store it, right?
Data storage
We need to store this data:
- Driver's session to identify driver
- Cab number to perform search for client's mobile application
- Order id and trip cost to save trip data from driver's mobile application on server
- Last location to perform search
- N last locations to build route
Used storages
- Percona — to store all data. We store drivers, orders, fares and everything.
- Redis as key-value storage for caching purposes.
- Elasticsearch for geocoding
As mentioned above we have 600 online drivers and using these storages to save data seems non-expedient. So we need geo index
We watched for two geo indices:
- KD-tree
- R-tree.
We had requirements for geo index:
- We need to search N nearest points.
- We need a balanced tree to provide the best search in the worst scenario
KD-tree
And KD-tree doesn't fit our needs because it's unbalanced and can search only one nearest point. We can implement k-nearest neighbors over kd-tree, but we won't reinvent the wheel because R-tree already solves this issue.
R-tree
It looks like this. We can perform search N nearest points and it's balanced tree. We chose this one
You can get it there for Go programming language. https://github.com/dhconnelly/rtreego.
Also we need an expire mechanism because we need to invalidate drivers and show actual information to operators. For instance, remove driver via 900 seconds of inactivity. And also we need LRU data structure for storing last locations. Because we initialize storage for N items. When we try to add an item, when storage already filled. We remove the least recently used item and we add a new one. So here is our storage architecture
- We store all data in-memory.
- We use R-tree to perform a search of nearest drivers.
- Also we use two maps for storing drivers and perform search by cab number or by session
Final algorithm
Here is the final algorithm on the backend:
- Get data by UDP
- Try to get driver from storage
- If doesn't exist — get driver from redis
- Check and validate data
- Set driver to storage
- If doesn't exist — initialize LRU
- Update r-tree
HTTP endpoints
We implemented these endpoints to integrate to our systems
- Return nearest drivers
- Remove driver from storage(by cab number or session)
- Get information about trip
- Get information about driver
Conclusion
In the end, I want to give these conclusions that we used in our backend system:
- UDP+Protobuf for data savings
- In-memory storage
- R-tree for nearest drivers
- LRU cache for storing last locations
- OSRM for map matching and building routes
As an example, you can watch here. It's too simple but implements lots of features described in the article.