A Smart Pipeline for Merchant Data

How the adoption of event-driven architecture helped GOJEK streamline ingestion of merchant data for its products.

By Sukreet Roy Choudhury

GO-PAY has been a game changer in Indonesia, accounting for about three-quarters of mobile payments in the country. It is the payment solution of choice for most Indonesians, and is supported by around 300,000 merchants.

In order to improve user convenience, it was important that GO-PAY not only be the best online payment method, but find favour offline as well.

This is where ‘Nearby’ comes in.

Nearby allows GOJEK users to discover GO-PAY merchants in the vicinity of their current location, along with the promotions running in those stores.

On paper, this sounds fairly easy, but like any other product, it had its own share of complications. The Nearby team, which I am part of, solved most of the technical challenges on the road to building a stable product. However, an even bigger challenge was posed by a basic operational component — sourcing merchant information in one place.

This post tackles how a solution originally envisioned to solve a problem for one team eventually helped streamline merchant data management across GOJEK.

Why was this a problem?

GOJEK has many products, and until recently these products (e.g. GO-FOOD, GO-PAY) each had their own merchant information system, onto which a merchant had to be onboarded. Even though we had an API for creating and updating merchants in the system, the problem was gathering the data.

Here’s some context. When we started, we onboarded ~30K merchants to Nearby manually, thanks to the efforts of content managers who collected data from different teams’ systems as CSV documents and uploaded them to the system. While it worked, one thing was obvious, and it is the same thing that sets so many stories in motion at GOJEK:

The solution was not scalable.

We needed a solution that aligned with GOJEK’s theme of eliminating human bottlenecks in order to improve operational efficiency. We also happened to have one in our own backyard.

Here’s a flashback:

In 2017, GOJEK acquired three fintech firms, one of which was Midtrans, an online payments company in Indonesia. Midtrans owned a Central Merchants Portal (CMP), and had a fair bit of experience with onboarding, payouts, and reporting. After the acquisition, GOJEK also started using CMP, which would serve as a central repository for all the merchants onboarded to the GOJEK platform (across products).

Now, CMP is supposed to be the single source of truth for all merchants in GOJEK, and it made logical sense to integrate with it instead of integrating with different products separately.

A simple way to integrate would be to ask CMP to call Nearby every time there is a merchant event, so that the data stays in sync between the two systems (similar to the observer pattern).

However, the problem with this approach is that every time a new GOJEK service is interested in the CMP data, it requires additional work on both sides:

1. API call on the CMP side to sync up the data with the new service
2. New service on the GOJEK side to backfill existing merchant data
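
To make the coupling concrete, here is a minimal Python sketch of the direct-call approach. The class and method names (`CentralMerchantPortal`, `subscribe`, `on_merchant_event`) and the event fields are hypothetical illustrations, not CMP's actual code.

```python
from typing import Callable, Dict, List

MerchantEvent = Dict[str, str]
Callback = Callable[[MerchantEvent], None]


class CentralMerchantPortal:
    """Hypothetical stand-in for CMP that calls each interested service directly."""

    def __init__(self) -> None:
        self._subscribers: List[Callback] = []

    def subscribe(self, callback: Callback) -> None:
        # Every new downstream service needs a change on the CMP side.
        self._subscribers.append(callback)

    def on_merchant_event(self, event: MerchantEvent) -> None:
        # CMP pushes the event to every service it knows about right now.
        for callback in self._subscribers:
            callback(event)


nearby_received: List[MerchantEvent] = []
new_service_received: List[MerchantEvent] = []

cmp_portal = CentralMerchantPortal()
cmp_portal.subscribe(nearby_received.append)
cmp_portal.on_merchant_event({"merchant_id": "m-1", "type": "created"})

# A service that registers later only sees events from that point on,
# so the earlier merchants have to be backfilled separately.
cmp_portal.subscribe(new_service_received.append)
cmp_portal.on_merchant_event({"merchant_id": "m-2", "type": "created"})
```

In this sketch the late subscriber never receives the event for `m-1`, which is the backfill work described in point 2.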

Hence, we wanted to solve the Nearby problem in a more generic way — by creating a platform which can be leveraged by other GOJEK product teams interested in CMP data.

The Solution

Instead of making an API call to the Nearby service, CMP emits events to Kafka.
A worker on GOJEK infrastructure then consumes events from this Kafka topic and pushes them to the Nearby service.

Now, if a new service wishes to consume the CMP data, there is zero effort required on the CMP side. All we need to do is spin up a new worker with a different consumer group that pushes data to the new service.
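
The idea of independent consumer groups can be sketched with a toy in-memory log. This is not Kafka client code: `InMemoryTopic`, the group names, and the event fields are assumptions made for illustration. In real Kafka, where a brand-new group starts reading depends on consumer configuration such as `auto.offset.reset`; the toy log below always starts a new group at the beginning.

```python
from collections import defaultdict
from typing import Dict, List


class InMemoryTopic:
    """Toy append-only log that tracks one read offset per consumer group."""

    def __init__(self) -> None:
        self._log: List[dict] = []
        self._offsets: Dict[str, int] = defaultdict(int)

    def publish(self, event: dict) -> None:
        # The producer (CMP in this post) only appends; it knows nothing about readers.
        self._log.append(event)

    def poll(self, group_id: str) -> List[dict]:
        # Each group resumes from its own offset, independent of other groups.
        start = self._offsets[group_id]
        events = self._log[start:]
        self._offsets[group_id] = len(self._log)
        return events


merchants = InMemoryTopic()
merchants.publish({"merchant_id": "m-1", "type": "created"})
merchants.publish({"merchant_id": "m-1", "type": "updated"})

nearby_batch = merchants.poll("nearby-worker")

merchants.publish({"merchant_id": "m-2", "type": "created"})

# A brand-new group starts at offset 0 and sees the full history,
# with no change to the producer.
new_service_batch = merchants.poll("new-service-worker")
nearby_next_batch = merchants.poll("nearby-worker")
```

Here the new group receives all three events while the Nearby group only receives the one it has not yet seen, and `publish` never changes.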

Now, Nearby also needs information on promotions and vouchers, and this information lies with different systems within GOJEK. As in the case of merchant information, we followed an event-driven architecture to sync information from these systems as well.

The promotion systems publish all events (creates and updates) to a Kafka topic for promotions. The voucher systems do the same to a topic for vouchers. We have a worker listening to each of the topics and updating the information on Nearby.
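
As a rough sketch of what such workers might do, the snippet below applies promotion and voucher events to an in-memory stand-in for Nearby's data. The store layout, the handler names, and the event fields (`merchant_id`, `promotion_id`, `voucher_id`, `details`) are all hypothetical. Treating creates and updates alike as upserts is an assumption of this sketch, not a description of the actual workers.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for the Nearby service's data, keyed by merchant.
nearby_store: Dict[str, dict] = {}


def apply_promotion(event: dict) -> None:
    # Creates and updates are both handled as an upsert of the latest promotion.
    merchant = nearby_store.setdefault(event["merchant_id"], {})
    merchant.setdefault("promotions", {})[event["promotion_id"]] = event["details"]


def apply_voucher(event: dict) -> None:
    # Same upsert logic, applied to the merchant's vouchers.
    merchant = nearby_store.setdefault(event["merchant_id"], {})
    merchant.setdefault("vouchers", {})[event["voucher_id"]] = event["details"]


# One handler per topic, mirroring one worker per topic.
HANDLERS: Dict[str, Callable[[dict], None]] = {
    "promotions": apply_promotion,
    "vouchers": apply_voucher,
}


def run_worker(topic: str, events: List[dict]) -> None:
    handler = HANDLERS[topic]
    for event in events:
        handler(event)


run_worker("promotions", [
    {"merchant_id": "m-1", "promotion_id": "p-1", "details": "10% cashback"},
    {"merchant_id": "m-1", "promotion_id": "p-1", "details": "20% cashback"},
])
run_worker("vouchers", [
    {"merchant_id": "m-1", "voucher_id": "v-1", "details": "free drink"},
])
```

Because each event overwrites the previous value for its key, replaying the same events in order leaves the store in the same state, which keeps redelivery harmless in this sketch.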

This solution is an example of how event-driven architecture can solve the kind of problems GOJEK faced with regard to merchant data. A similar approach can be used in situations where a team or an organisation is interested in data populated by a third party. The event store (Kafka in our case) holds the historical events, subject to the topic's retention settings, so backfilling for any new consumer is also not a problem.

No, GOJEK is not just a transport company. Our products include payments, lifestyle services, movie tickets and so much more. Transport is the tip of a very large iceberg, and we need more cool people (😋) to maintain our growth at scale. Are you chilled out, and not the kind who melts under pressure? We may have a job for you. Check out gojek.jobs to know more. 🙌