Introducing Litmus: GOJEK’s Own Experimentation Platform

An overview of how we did feature rollouts at GOJEK, and the framework we built to make it better.


By Riteek Srivastav

The GOJEK app has been downloaded over 130 million times in Indonesia, and millions use it every day. To cater to the needs of our users, we release new features at a rapid pace. However, with such a massive audience, it becomes difficult to gauge which new features will be useful and make a genuine impact. So how do we approach the problem?

As Steve Jobs famously said, “People don’t know what they want until you show it to them.”

The best way to approach this, then, is to release multiple variations of a feature and decide what works based on the data collected. In the software world, this is what we call experimentation or A/B testing.

To help us do this, we built an experimentation platform.

TL;DR
This post provides an overview of Litmus, GOJEK’s experimentation platform for quick feature tests and roll-out to production.

Our earlier approach

Initially, we categorised users into segments. To roll out a feature to some users, we had to put them into a particular segment and gradually add or remove users depending on the feedback from the experiment. For example, we would create a segment of internal alpha users to roll a feature out only to them.


The segmentation service stores the mapping of users to segments. Internal services and the consumer app call this service with a userID, and it returns all the segments for that user.

But, what are these segments? 🤔

Segments can be visualised as tags attached to a userID. In the consumer app context, whenever a feature is rolled out, the product team can choose to roll it out to ‘pre-tagged’ segments of users, such as power, medium, and low users of that product. (These segments can be defined based on a person’s behaviour on the GOJEK app.)
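To make the segment model concrete, here is a minimal sketch of a user-to-segment mapping and a feature check built on it. This is not the actual segmentation service: the class, method, and segment names are all hypothetical, and an in-memory dictionary stands in for whatever storage the real service uses.

```python
from collections import defaultdict


class SegmentStore:
    """In-memory stand-in for a segmentation service (hypothetical)."""

    def __init__(self):
        # Maps user_id -> set of segment tags attached to that user.
        self._segments = defaultdict(set)

    def add_user(self, user_id, segment):
        self._segments[user_id].add(segment)

    def remove_user(self, user_id, segment):
        self._segments[user_id].discard(segment)

    def segments_for(self, user_id):
        # Unknown users simply have no segments; return a copy.
        return set(self._segments.get(user_id, set()))


def feature_enabled(store, user_id, required_segment):
    """A feature is on only for users tagged with the required segment."""
    return required_segment in store.segments_for(user_id)


store = SegmentStore()
store.add_user("user-1", "alpha")
store.add_user("user-1", "gofood-power")
print(feature_enabled(store, "user-1", "alpha"))  # True
print(feature_enabled(store, "user-2", "alpha"))  # False
```

Note that every change in audience requires an explicit add_user or remove_user call per user, which is where the manual effort described next comes from.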

This approach did cause some problems, though:

  1. Adding users to and removing users from segments: Whenever we release a feature, the general convention is to roll it out to an increasing percentage of users while monitoring crashes and conversions at every phase. The problem is that we had to manually add more users to the respective segments at each rollout phase, which is not very efficient.
  2. Roll out to 100% of users: Say that after rolling out to x% of users with a segment, we are confident that the feature should go to 100% of users. Here, we have two options: either put all existing users into that segment and keep adding new users to it (which isn’t a great approach for a large user base), or remove the segmentation check from the consumer app. The second option requires a fresh app release on the store, and users usually stay on older versions of the app until those versions are no longer supported.
  3. Dynamic Segments: What if someone wants to run an experiment that depends on the user’s location? The existing architecture did not support such dynamic segments based on location, time zone, etc.

In order to get rid of these problems, create faster feature tests with experiments in production, and create a seamless rollout experience, we created an experimentation platform.

Meet Litmus 🖖

Litmus is an experimentation platform (A/B or multivariate testing) for GOJEK.

It has two entities: Experiment and Release.

Teams use Experiment when they want to experiment on a feature with multiple variants. For example, in the GOJEK app, we wanted to experiment with how many products to show on the homepage. That experiment looked something like this.

Any feature run as an experiment goes through three phases: ‘pre-experiment’, ‘during-experiment’, and ‘post-experiment’.

The pre-experiment phase includes creating the experiment with data such as the number of variants and an initial traffic percent for the experiment and each of its variants.

The during-experiment phase includes testing the different variants of the feature and storing data relevant to the experiment.

In the post-experiment phase, the experiment author uses the data collected during the experiment as feedback and decides which variants should be rolled out as a feature.
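As a rough illustration of the pre-experiment and during-experiment phases, the sketch below defines an experiment with two variants and assigns users to them. The post does not describe how Litmus assigns users to variants; hashing the user ID into a stable bucket is an assumption made here for illustration, and the experiment name, variant names, and field names are hypothetical.

```python
import hashlib


def bucket(user_id, salt):
    """Map a user to a stable bucket in [0, 100) using SHA-256 (assumed scheme)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def assign_variant(user_id, experiment):
    """Return the variant name for a user, or None if the user is outside the experiment."""
    position = bucket(user_id, experiment["name"])
    upper = 0
    # Each variant owns a contiguous slice of the 0..99 bucket range.
    for variant in experiment["variants"]:
        upper += variant["traffic_percent"]
        if position < upper:
            return variant["name"]
    return None


# Pre-experiment: define the variants and their initial traffic percent.
homepage_experiment = {
    "name": "homepage-product-count",
    "variants": [
        {"name": "four-products", "traffic_percent": 10},
        {"name": "eight-products", "traffic_percent": 10},
    ],
}

# During-experiment: the same user always lands in the same variant (or in none).
print(assign_variant("some-user-id", homepage_experiment))
```

Salting the hash with the experiment name keeps assignments independent across experiments, so a user who falls into one experiment is not automatically in every other one.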

Teams use Release when they just want to release a feature gradually. One recent example was the gradual rollout of customer-to-customer chat to all our users.

Both entities support a rule and a traffic percent. Teams can define rules such as “users in Jakarta and app version >= 3.7” or “user is power gofood user or power goride user” for their experiment or release. A user will be part of the release/experiment only if they fit the rules.

Teams can also update the rules of their experiments or releases if they want to relax some constraints or add more. Traffic percent is used to control the experiment/release traffic. (It is defined as x% of the total GOJEK users.)
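The combination of a rule and a traffic percent can be sketched as follows. The two rule functions are modelled on the example rules above, but the context fields, segment names, and release name are hypothetical, and the hash-based traffic gate is an assumption rather than a description of how Litmus works internally.

```python
import hashlib


def parse_version(version):
    """Turn "3.10" into (3, 10) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def jakarta_on_recent_app(ctx):
    """Hypothetical rule: user is in Jakarta and app version >= 3.7."""
    return ctx["city"] == "Jakarta" and parse_version(ctx["app_version"]) >= (3, 7)


def power_user(ctx):
    """Hypothetical rule: user is a power gofood user or a power goride user."""
    return bool({"gofood-power", "goride-power"} & set(ctx["segments"]))


def in_traffic(user_id, release_name, traffic_percent):
    """Assumed gate: a stable hash bucket in [0, 100) compared to the traffic percent."""
    digest = hashlib.sha256(f"{release_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < traffic_percent


def is_in_release(user_id, ctx, release_name, rule, traffic_percent):
    # A user is included only if they fit the rule AND fall inside the traffic slice.
    return rule(ctx) and in_traffic(user_id, release_name, traffic_percent)


ctx = {"city": "Jakarta", "app_version": "3.10", "segments": []}
for percent in (10, 50, 100):
    print(percent, is_in_release("some-user-id", ctx, "c2c-chat", jakarta_on_recent_app, percent))
```

Because each user’s bucket is fixed, raising the traffic percent only ever adds users: anyone included at 10% is still included at 50% and at 100%, with no per-user bookkeeping.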

Clients (any microservice/mobile app) make a request to Litmus with userID and some other information (which will be used in evaluating the rule) like app version, device OS, location, etc. The request for the user with some-user-id will be something like:

Fig 1: Litmus flow diagram

The response from Litmus will be along these lines:

The properties in the response will be used to decide what variation of a particular experiment needs to be shown to the user. In the case of the above example, the booking button will be shown at the bottom, and a discount card will be shown.
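As a purely illustrative sketch of this exchange, the code below builds a request and turns the properties in a response into UI settings. It does not reflect the actual Litmus API: the field names, the property keys, the payload shape, and the default values are all hypothetical, chosen only to match the booking button and discount card example.

```python
import json

# Hypothetical fallbacks used when no experiment overrides a property.
DEFAULTS = {"booking_button_position": "top", "show_discount_card": False}


def build_request(user_id, app_version, device_os, location):
    """Assemble the context a client might send for rule evaluation (assumed fields)."""
    return {
        "user_id": user_id,
        "app_version": app_version,
        "device_os": device_os,
        "location": location,
    }


def ui_config(response_body):
    """Overlay experiment properties on the defaults, ignoring unknown keys."""
    payload = json.loads(response_body)
    config = dict(DEFAULTS)
    for experiment in payload.get("experiments", []):
        for key, value in experiment.get("properties", {}).items():
            if key in config:
                config[key] = value
    return config


request = build_request("some-user-id", "3.7", "android", "Jakarta")

# A made-up response body in the assumed shape.
response_body = json.dumps({
    "experiments": [
        {
            "name": "booking-screen",
            "properties": {
                "booking_button_position": "bottom",
                "show_discount_card": True,
            },
        }
    ]
})
print(ui_config(response_body))
# {'booking_button_position': 'bottom', 'show_discount_card': True}
```

Falling back to defaults means a client still renders a sensible screen when the user is not part of any experiment or the response is empty.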

Now, back to those issues we mentioned earlier. Here’s how Litmus deals with them:

1. Adding users to and removing users from segments: Feature rollout can be directly governed by the traffic percent and rule. When we want to increase the rollout, we can simply increase the traffic percent. There is no need to manually add users to a segment.

2. Roll out to 100% of users: As above, to roll out a feature to 100% of users, we can just set the traffic percent to 100. There is no need for a fresh app release.

3. Dynamic Segments: As explained above, the experiment/release supports rules. We can define a rule for a particular location or time for an experiment/release.

This was a very basic overview of how we designed Litmus and what it does. Watch this space for more updates about the cool things we do with it.

Thanks for reading!

We’re always building interesting things at GOJEK. Some help our customers directly, others are geared towards our 2 million+ driver partners. Many others, like Litmus, aim to make life easier for our lean team of engineers. That said, we are expanding that team. If you want to join us and build interesting things, visit gojek.jobs and apply. See you on the other side. 🙌


Thanks to Abhishek Singh, Xiao Hanyu, Rajendra Rusmana, Viney Kaushik, Muhammad Abduh, and Muhammad Reza Irvanda.