How Gojek Allocates Personalised Vouchers At Scale
Authors: Praveen Prashant, Kelvin Heng, and Deepesh Naini
How we built an ML-driven voucher allocation engine to serve millions of customers across multiple geographies.
The Idea
How can we use different vouchers to get more business from our customer base while keeping our costs low?
Gojek uses vouchers to achieve multiple business objectives.
For example, the objective might be to maximise food orders in Indonesia for a given week, while in another week it could be to maximise the resurrection of churned users in Singapore.
An ML-Driven Multi-Objective Solution
Identifying the Persuadables
Based on how they respond to being targeted with a voucher, customers broadly fall into four segments:
- The Persuadables: customers who make more transactions than they would if they were not targeted with the voucher
- The Sure Things: customers who make the same number of transactions whether or not they are targeted (i.e. zero incremental response)
- The Lost Causes: customers who do not make transactions irrespective of whether or not they are targeted (i.e. zero incremental response)
- The Do Not Disturbs: customers who make fewer transactions because they were targeted
We use customers' historical data to observe the past effects of vouchers on them. This is a typical causal inference problem: measuring the effect of a treatment (the voucher) on an outcome (for example, incremental transactions) for each customer.
However, the complexity increases because there are two response variables: the predicted uplift in the business objective, and the cost. So we use a deep-learning based causal inference algorithm that produces both predictions simultaneously for all customers, given a voucher. The "objective" in the problem formulation depends on the business use case (hence multi-objective).
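To make this concrete, a model of this kind can be sketched as a multi-task network: a shared trunk over customer and voucher features with two output heads, one per response variable. The framework (PyTorch), layer sizes, and loss weighting below are illustrative assumptions, not our production architecture.

```python
import torch
import torch.nn as nn

class UpliftCostNet(nn.Module):
    """Illustrative two-headed network: one head per response variable."""

    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        # Shared representation of customer + voucher features
        self.trunk = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Head 1: predicted uplift in the business objective (e.g. incremental orders)
        self.uplift_head = nn.Linear(hidden, 1)
        # Head 2: predicted cost of serving the voucher to this customer
        self.cost_head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor):
        h = self.trunk(x)
        return self.uplift_head(h), self.cost_head(h)

# Training (sketch): jointly minimise a weighted sum of the two losses, e.g.
#   loss = mse(pred_uplift, observed_uplift) + alpha * mse(pred_cost, observed_cost)
```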
These predictions are then fed into a knapsack optimiser that recommends a treatment (voucher) for each customer, maximising the business objective while adhering to the budget constraint. We chose a simple knapsack optimiser because it processes millions of predictions quickly.
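As a sketch of the allocation step itself, the snippet below uses a common greedy approximation to the knapsack problem: rank (customer, voucher) candidates by predicted uplift per unit cost and pick them until the budget is exhausted, with at most one voucher per customer. The function name, input format, and greedy rule are illustrative, not necessarily the exact production optimiser.

```python
def allocate(predictions, budget):
    """predictions: iterable of (customer_id, voucher_id, predicted_uplift, predicted_cost)."""
    # Consider only candidates with positive predicted uplift and cost,
    # best uplift-per-cost ratio first.
    candidates = sorted(
        (p for p in predictions if p[2] > 0 and p[3] > 0),
        key=lambda p: p[2] / p[3],
        reverse=True,
    )
    allocation, spent = {}, 0.0
    for customer_id, voucher_id, uplift, cost in candidates:
        if customer_id in allocation or spent + cost > budget:
            continue  # customer already served, or voucher does not fit the budget
        allocation[customer_id] = voucher_id
        spent += cost
    return allocation

# Example: two customers, two candidate vouchers each, and a budget of 10
preds = [
    ("c1", "v1", 3.0, 5.0), ("c1", "v2", 1.0, 2.0),
    ("c2", "v1", 2.0, 4.0), ("c2", "v2", 0.5, 1.0),
]
print(allocate(preds, budget=10.0))  # {'c1': 'v1', 'c2': 'v1'}
```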
Building At Scale
Data Transformation: How we leverage dbt
We use thousands of features for each of our hundreds of millions of customers to predict their transactional behaviour, and extracting these features depends on hundreds of source tables.
To do all this scalably, we use the data build tool (dbt) extensively for efficient data transformation. dbt is an open-source command-line tool that helps analysts and engineers transform data in their warehouse more effectively.
dbt helps us proactively monitor upstream source tables for staleness, so stale data can be caught before it is wrongly ingested into downstream tables.
dbt and its packages (special mention: dbt-expectations) provide plenty of easy built-in tests to ensure data reliability in our feature tables (for example, checking for nulls in the data). dbt also makes it easy to write custom tests.
Other dbt packages like dbt-date are packed with useful features, such as date manipulation and fuzzy text matching. dbt 1.3 and above also supports Python models for advanced data transformation!
Additionally, dbt is a Jinja-based tool, which brings code reusability and standardisation, simplifying code development and maintenance considerably (see dbt macros).
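For illustration, here is roughly what a dbt Python model (the dbt 1.3+ feature mentioned above) can look like. The model name, upstream model, and column names are made up, and the dataframe API (pandas-style here) depends on the warehouse adapter.

```python
# models/customer_order_features.py -- a minimal dbt Python model sketch
def model(dbt, session):
    dbt.config(materialized="table")

    # Reference an upstream dbt model, just like {{ ref(...) }} in SQL
    orders = dbt.ref("stg_food_orders")

    # Python-friendly aggregation (pandas-style API shown for illustration)
    features = (
        orders.groupby("customer_id")
        .agg(order_count=("order_id", "count"), total_spend=("amount", "sum"))
        .reset_index()
    )
    return features
```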
Data Observability: How we leverage elementary
We are power users of dbt and elementary. With elementary, it is extremely easy to stay on top of any data anomalies or test failures (for example, elementary can notify users of an unexpected increase in the percentage of null rows).
Elementary reports these anomalies and test failures via Slack notifications, and also provides a one-stop dashboard for model runtimes and detailed errors.
On top of this, we use CI/CD to ensure proper linting, test coverage, and dry runs for any newly added SQL code.
Configuration Management: How we leverage Hydra
As a team, we manage voucher allocations for a multitude of on-demand services across multiple geographies, so it is essential to manage customised hyperparameter configurations for these ML models in a scalable way. We use Hydra to do this with ease.
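As a small illustration, a Hydra entry point composes a config from YAML files and lets us override values per geography or service from the command line. The config keys and file layout below are assumptions for the example, not our actual schema.

```python
import hydra
from omegaconf import DictConfig, OmegaConf

# Assumes a conf/config.yaml with illustrative keys (country, service, model.*, allocation.*)
@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))  # inspect the fully composed config
    # Access values as attributes, e.g. cfg.model.learning_rate, cfg.allocation.budget

if __name__ == "__main__":
    main()

# Per-country / per-service overrides come from the command line, e.g.:
#   python run_allocation.py country=sg service=food allocation.budget=100000
```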
Scalable and Better Code
Other practices that the data science team follows to write scalable code are:
- Enforce code testing (pytest; see the sketch after this list)
- Track test coverage
- Pre-commit hooks for code style and formatting
- Automated testing in GitLab CI
- CI/CD using GitLab pipelines for efficient collaboration and smooth releases
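As a small example of the first point, a pytest test for the greedy allocation sketch above might look like the following; the module name and test cases are hypothetical.

```python
# test_allocation.py
from allocation import allocate  # hypothetical module holding the sketch above

def test_allocation_respects_budget():
    preds = [("c1", "v1", 3.0, 5.0), ("c2", "v1", 2.0, 4.0)]
    # Only the best uplift-per-cost candidate fits within the budget
    assert allocate(preds, budget=5.0) == {"c1": "v1"}

def test_no_allocation_for_non_positive_uplift():
    preds = [("c1", "v1", -1.0, 5.0)]
    assert allocate(preds, budget=10.0) == {}
```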
In-house Platforms at Gojek
The data science team at Gojek is also fortunate to have access to some world-class in-house tools. Some notable mentions are:
- Merlin: This machine learning platform truly makes ML model deployments magical!
- Campaign Portal: This behemoth engineering platform, developed by our extremely skilled engineering team, allocates vouchers to millions of customers in minutes!
Acronyms & Definitions:
- ML: Machine Learning
- Causal Inference: The process of determining whether an observed association truly reflects a cause-and-effect relationship
- Knapsack Optimiser: A solver for the knapsack problem, a combinatorial optimisation problem of selecting items to maximise total value within a capacity (here, budget) constraint
- CI/CD: Continuous integration and continuous delivery/deployment
- SQL: Structured Query Language
References:
- https://www.getdbt.com/product/what-is-dbt
- https://www.elementary-data.com/
- https://pypi.org/project/dbt-dry-run/
- https://dbt-labs.github.io/dbt-project-evaluator/0.8/
- https://hydra.cc/docs/intro/
- https://docs.pytest.org/en/8.2.x/
- https://pypi.org/project/coverage/
- https://pre-commit.com/
- https://www.gojek.io/blog/merlin-making-ml-model-deployments-magical
- https://www.bradyneal.com/causal-inference-course