Database Optimisation on Cassandra
An account of how running into problems helped us learn more about Cassandra and improve response times.
By Deepshi Garg
We handle a lot of orders at Gojek, processing upwards of 100 million transactions a month. In a SuperApp ecosystem of this scale, it is not unusual for thousands of users per minute to request all their completed or cancelled bookings over a two-month period.
That… is a lot of data 😅
To deal with this load, we needed a database that could handle very large amounts of data with high write efficiency. We could compromise a bit on read latencies (no one needs their history urgently). Going by the CAP theorem, we preferred high availability over strong consistency; we could afford eventual consistency. With these factors in mind, we made the switch from Postgres to Cassandra.
This post is an account of how we deepened our understanding of Cassandra through all the problems we faced.
Modelling the schema
We needed a schema to store a mapping of customer_ids or driver_ids to booking_details. Going by what we would have done if it had been an RDBMS like Postgres, we normalised the store. The idea was that we could serve the booking history for drivers as well as customers without having to duplicate the booking_details column. A simplified form of our schema looks like this:
Table1 : Customer_booking_mapping
CREATE TABLE customer_booking_index (
    customer_id text,                 -- column types here are illustrative
    booking_creation_time timestamp,
    booking_number text,
    ...
    ...
    PRIMARY KEY (customer_id, booking_creation_time)
) WITH CLUSTERING ORDER BY (booking_creation_time DESC);
Table2 : Driver_booking_mapping
// similar table for drivers
Table3 : Booking_details
CREATE TABLE order_details (
    booking_number text,              -- column types here are illustrative
    booking_details text,
    ...
    ...
    PRIMARY KEY (booking_number)
);
To fetch the last n bookings for a customer, this schema would essentially require two DB calls (nope, joins in Cassandra are not recommended for large tables). We know the customer_id, so the first call goes directly to the node where that customer's data lives. But all the booking_numbers for a customer can be spread across multiple nodes, leading to a scattered DB call. This would defeat the purpose of a distributed data store and give very high read latency.
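To make the problem concrete, here is a rough sketch of the two calls this schema forces (table and column names follow the simplified schema above; the customer ID, booking numbers, and LIMIT are made-up values):

-- Call 1: single-partition read, lands on the node(s) owning this customer
SELECT booking_number
FROM customer_booking_index
WHERE customer_id = 'C123'
LIMIT 20;

-- Call 2: each returned booking_number hashes to its own partition,
-- so this read fans out across many nodes
SELECT booking_details
FROM order_details
WHERE booking_number IN ('B1', 'B2', 'B3');

The second call is the expensive one: the coordinator may have to contact a large share of the cluster just to assemble one page of booking history.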
So we had to denormalise the schema and duplicate the bookings data into two tables: one each for drivers and customers.
Table1 : Customer_booking_details
CREATE TABLE customer_booking_details (
    customer_id text,                 -- column types here are illustrative
    booking_creation_time timestamp,
    booking_number text,
    booking_detail text,
    ...
    ...
    PRIMARY KEY (customer_id, booking_creation_time, booking_number)
) WITH CLUSTERING ORDER BY (booking_creation_time DESC, booking_number ASC);
Table2 : Driver_booking_details
// similar table for drivers
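With the denormalised layout, the same "last n bookings" request becomes a single-partition read. A rough sketch (the customer ID and LIMIT are illustrative):

-- One call, one partition: a customer's bookings come back newest first,
-- thanks to the clustering order on booking_creation_time
SELECT booking_number, booking_detail
FROM customer_booking_details
WHERE customer_id = 'C123'
LIMIT 20;

Only the replicas owning that one partition are involved, which is what keeps read latency low and predictable.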
Lesson: UNLEARN RDBMS. Modelling a non-relational DB is completely different from a relational DB.
Compaction Strategy
One important reason for us to choose Cassandra was its TTL feature, where we can configure data to be deleted after a certain period of time. But this is a soft delete: a periodic maintenance process is still required to purge the expired data from disk completely.
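For illustration, this is roughly what attaching a TTL to a write looks like (the 60-day value is an example, not our production setting):

-- Expire this record 60 days (5184000 seconds) after it is written
INSERT INTO customer_booking_details (customer_id, booking_creation_time, booking_number, booking_detail)
VALUES ('C123', toTimestamp(now()), 'B1', '...')
USING TTL 5184000;

Once the TTL elapses, the data turns into tombstones; it only vanishes from disk when compaction rewrites the files that contain it.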
By default, Cassandra uses the Size Tiered Compaction Strategy, which groups data on disk into blocks by size. Periodic maintenance tasks were required to clean these blocks: each run had to iterate over all the data in every block to see what needed to be deleted. The result was a tedious job that took 6–8 hours every week, with high CPU load and, consequently, slow responses from the DB.
We went through many books and blogs, and discovered that the Time Window Compaction Strategy would work best for us. Here, data is fragmented into time-based blocks rather than size-based ones: all the data in a particular block was created within a given time window. Since we wish to delete bookings based on their creation_time, we could afford to drop entire blocks at once. This meant no maintenance jobs were required; all we had to do was configure the time window sensibly.
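A rough sketch of such a configuration (the compaction options are standard TWCS settings, but the one-day window and 60-day default TTL are illustrative values, not our production numbers):

ALTER TABLE customer_booking_details
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
}
AND default_time_to_live = 5184000;  -- roughly 60 days

With this in place, each day's bookings land in their own files, and once every record in a window has expired, the whole file can be dropped without scanning its contents.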
One shortcoming here was that once data is written, we could not edit it (an edit would create an entry in the block for the new time window, complicating the deletion again). But that was fine for our use case, because a completed booking is a frozen entity.
This motivated us to look into other tweakable configs as well: what our replication factor and consistency level should be, how many nodes to use, how much RAM to put in each node, and so on.
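As an illustration of the knobs involved (the keyspace name, datacenter name, and replication factor below are made up for the example):

-- Replication factor is set per keyspace, per datacenter
CREATE KEYSPACE booking_history
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3
};

-- Consistency level is chosen per query; in cqlsh, for instance:
CONSISTENCY LOCAL_QUORUM;

The replication factor decides how many copies of each partition exist; the consistency level decides how many replicas must acknowledge each read or write. Together they are where the availability-versus-consistency trade-off actually gets tuned.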
Lesson: Configure your DB according to your use case. Do not rely on default settings.
Adding Features
As we optimised things, a new feature request came in: integrating driver feedback with this service. This required us to serve unrated bookings, fetch the customer's rating of the driver for a booking, and show these ratings along with the booking history. We could have redesigned the schema again to incorporate this, but then our booking would no longer be a frozen entity; it would require an update with the driver rating.
With all the load testing and spikes, we eventually discovered that although every write query in Cassandra is an UPSERT, updating selective columns is more expensive than inserting new rows (or rewriting whole rows). This is mainly because Cassandra is a column-oriented store, so all the columns for a primary key are not stored directly against it. To update one column for a given primary key, Cassandra has to work through large amounts of data, effectively locking up the table.
We could not afford our main bookings table being locked for long durations. Rewriting the whole row would have been fine, but updating only some columns was not viable: a new row (or a full-row rewrite) simply stores fresh data in the DB and leaves the old version for the maintenance job to clean out later.
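For the sake of illustration, here are the two write shapes we were weighing against each other (all values are made up):

-- Selective column update: touches a single column of an existing row
UPDATE customer_booking_details
SET booking_detail = '...'
WHERE customer_id = 'C123'
  AND booking_creation_time = '2020-01-15 10:30:00+0000'
  AND booking_number = 'B1';

-- Full-row write: since every write is an upsert, this simply supersedes
-- the old row and leaves the cleanup to later compaction
INSERT INTO customer_booking_details (customer_id, booking_creation_time, booking_number, booking_detail)
VALUES ('C123', '2020-01-15 10:30:00+0000', 'B1', '...');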
And thus, we had to create an entirely new table to store feedback.
Table3 : Customer_feedback_details
CREATE TABLE customer_feedback_details (
    customer_id text,                 -- column types here are illustrative
    feedback_creation_time timestamp,
    booking_number text,
    feedback_detail text,
    ...
    ...
    PRIMARY KEY (customer_id, feedback_creation_time, booking_number)
) WITH CLUSTERING ORDER BY (feedback_creation_time DESC, booking_number ASC);
Table4 : Driver_feedback_details
// similar table for drivers
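In practice this keeps the feedback path append-only. A rough sketch of the write and the read (values are illustrative, and we assume the merge with booking history happens in the service layer):

-- Recording a rating never touches the bookings tables
INSERT INTO customer_feedback_details (customer_id, feedback_creation_time, booking_number, feedback_detail)
VALUES ('C123', toTimestamp(now()), 'B1', '{"driver_rating": 5}');

-- Fetching a customer's recent feedback is another single-partition read
SELECT booking_number, feedback_detail
FROM customer_feedback_details
WHERE customer_id = 'C123'
LIMIT 20;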
Lesson: Try to have as few update queries as possible. Cassandra is better suited to frozen DB entries, or to updates that replace the whole row.
Results
Although it was a roller coaster ride, it was all worth it. Yes, we had to redo the schema multiple times; yes, we had to embrace denormalisation and data duplication; but what we finally achieved was a stable system with excellent performance. Our current system serves writes within 1 ms and reads within 10 ms. It took multiple iterations of tuning, but we can now serve much more data (four months of completed bookings) with very low response times.
Oh, and one more thing: because Cassandra uses a distributed architecture, it is very easy to add a node, or to deal with a failing one, in times of need.
That’s all for this post. We’ll be sharing more insights as we continue to build and learn, so keep watching this space for more. Or better yet, sign up for our newsletter to get our latest updates beamed straight to your inbox.