Introducing Clickstream: Gojek's Real-time Analytics Platform


By Anirudh Vyas

First things first. What is Clickstream?

Clickstream data is a trail of digital breadcrumbs left by users as they click their way through a website or mobile app. It is loaded with valuable customer information, and analysing it has become a powerful source of insight for businesses. So, we thought it was kind of apt to name our in-house real-time event ingestion and analytics platform Clickstream!

Why build Clickstream?

The Gojek SuperApp is a crazy highway of varied use cases, which need data to be delivered in real-time to the Gojek backend servers. The main drivers of this initiative were to achieve:

  1. Real-time data
    At Gojek, we’d been using third-party analytics providers, none of which provided data to the Gojek backend in real-time or even remotely close to it. The solutions were either very expensive or highly inefficient. The real-time availability of data serves various purposes like detecting driver fraud, serving ads based on impressions, etc.
  2. Data sanity
    Gojek is a humongous app, with a zillion events. Hence, it’s important to maintain event data sanity. With a team of over 200 enthusiastic developers on iOS and Android, it’s hard to keep a check or define a Standard Operating Procedure (SOP) for adding new keys to an analytic event. So, we wished to bring in type-safe and reusable event schemas that help achieve that level of data sanity.
  3. Reduced data pipeline costs
    With third-party services in place, there is always the overhead of maintaining data pipelines from the service providers' data stores to ours. Sending the data directly to our own data stores reduces the cost of porting the data and maintaining those pipelines.

Clickstream to the rescue! 🦸

How is Clickstream built?

At a high level, Clickstream has two parts:

  • Clickstream Mobile Libraries
  • Clickstream Backend aka Raccoon

Architecture

In this blog, we talk about the mobile side of things and how the libraries are built.

Mobile Library Architecture

The Crucial parts

  • The Event Processor ingests the analytics event messages (protobuf objects generated by the client app), adds relevant metadata/attributes to the provided proto, and forwards the event messages to the next stage.
  • The Event Scheduler categorises, prioritises and caches the Event objects passed on by the processor. It is also responsible for batching the events together to be sent to the server.
  • The Network Manager is responsible for setting up the required connection to the backend services and forwarding the event batches via a retry mechanism. It also handles the socket state and the retries for failed events.
  • The Configurations for each block are initialised with the library. These configurations allow for fine-grained control over the library behaviour. They also define various constraints for the library, such as the minimum operational battery percentage and the duration between retries.
  • The Contracts have been defined for each event. These contracts are nothing but protocol buffers that help the team define efficient events and ensure the reusability of the bits. More power to “DRY”. These contracts also ensure the type safety of the data being sent.
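
To make the flow between these blocks concrete, here is a minimal Python sketch of an event moving from the processor through the scheduler to the network layer. It is only an illustration under stated assumptions, not the library's actual code: the class and method names (`Event`, `track`, `enqueue`, `flush`, `send`), the fixed `batch_size` flush policy, and the use of a plain dict in place of a protobuf message are all hypothetical.

```python
from dataclasses import dataclass, field
import time
import uuid


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a protobuf message plus library metadata."""
    name: str
    payload: dict
    guid: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)


class NetworkManager:
    """Records batches in memory instead of opening a real socket."""

    def __init__(self):
        self.sent_batches = []

    def send(self, batch):
        self.sent_batches.append(batch)


class EventScheduler:
    """Caches events and flushes them in fixed-size batches (assumed policy)."""

    def __init__(self, network, batch_size=3):
        self.network = network
        self.batch_size = batch_size
        self.pending = []

    def enqueue(self, event):
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        self.network.send(batch)


class EventProcessor:
    """Attaches metadata (id, timestamp) and forwards to the scheduler."""

    def __init__(self, scheduler):
        self.scheduler = scheduler

    def track(self, name, payload):
        event = Event(name=name, payload=payload)
        self.scheduler.enqueue(event)
        return event


network = NetworkManager()
scheduler = EventScheduler(network, batch_size=3)
processor = EventProcessor(scheduler)

# Seven events with a batch size of three: two full batches, one event cached.
for position in range(7):
    processor.track("card_clicked", {"position": position})
```

In this sketch the scheduler flushes purely on batch size; a real scheduler would presumably also flush on a timer and respect the configured constraints (battery level, retry interval) mentioned above.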

A few noteworthy traits of Clickstream

  1. Simple and lightweight
  2. Remotely Configurable
  3. Support for real-time data
  4. Multiple QoS support (QoS0 and QoS1)
  5. Typesafe and reusable schemas
  6. Efficient payloads
  7. In-built data aggregation
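
The two QoS levels can be illustrated with a small Python sketch. It assumes the MQTT-style meaning of the terms: QoS0 is "at most once" (send and forget) and QoS1 is "at least once" (resend until acknowledged). The `FlakyTransport` class, the function names and the `max_retries` parameter are hypothetical and exist only for this example.

```python
class FlakyTransport:
    """Hypothetical transport that drops the first `failures` sends."""

    def __init__(self, failures):
        self.failures = failures
        self.delivered = []

    def send(self, batch_id):
        if self.failures > 0:
            self.failures -= 1
            return False  # no acknowledgement received
        self.delivered.append(batch_id)
        return True  # acknowledged by the (simulated) server


def send_qos0(transport, batch_id):
    """At most once: send and ignore whether an ack comes back."""
    transport.send(batch_id)


def send_qos1(transport, batch_id, max_retries=3):
    """At least once: resend until acked or retries are exhausted."""
    for _ in range(1 + max_retries):
        if transport.send(batch_id):
            return True
    return False


lossy = FlakyTransport(failures=2)
send_qos0(lossy, "batch-a")          # dropped and never retried
acked = send_qos1(lossy, "batch-b")  # first attempt dropped, second acked
```

The trade-off is the usual one: QoS0 is cheaper but can lose events, while QoS1 survives transient failures at the cost of retries (and, in a real network, possible duplicates that the receiver has to tolerate).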

What makes Clickstream different from off-the-shelf solutions?

  1. Clickstream's public API only accepts a Message type, not the conventional dictionary or map accepted by other third-party analytics tools. This design enforces strict schemas, which makes life easier for the PMs and the designers. Using protocol buffers, we are able to standardise events across different product groups within Gojek.
  2. An event's classification (standard/real-time/instant) can be updated on the fly, or an event can be dropped altogether, simply by updating the remote configurations.
  3. Clickstream collects its own health data. This allows us to gauge the library's health (drop rates, event and batch health funnels) via the library itself.
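
The first two points can be sketched together in a few lines of Python. This is a rough illustration, not the Clickstream API: `Message`, `AdCardEvent`, `Tracker`, `update_config` and the `DROP` marker are hypothetical names, and a dataclass stands in for a generated protobuf class.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    STANDARD = "standard"
    REAL_TIME = "real-time"
    INSTANT = "instant"
    DROP = "drop"  # hypothetical marker meaning "discard this event type"


class Message:
    """Hypothetical base type; stands in for a protobuf Message."""


@dataclass(frozen=True)
class AdCardEvent(Message):
    """Hypothetical typed event with a fixed, checked set of fields."""
    ad_id: str
    position: int


class Tracker:
    def __init__(self, remote_config):
        self.remote_config = dict(remote_config)
        self.queues = {p: [] for p in Priority if p is not Priority.DROP}

    def update_config(self, remote_config):
        """Simulates a remote configuration update arriving at runtime."""
        self.remote_config = dict(remote_config)

    def track(self, event):
        # Only typed messages are accepted; dictionaries and maps are rejected.
        if not isinstance(event, Message):
            raise TypeError("track() accepts Message instances only")
        event_type = type(event).__name__
        priority = self.remote_config.get(event_type, Priority.STANDARD)
        if priority is Priority.DROP:
            return None
        self.queues[priority].append(event)
        return priority


tracker = Tracker({"AdCardEvent": Priority.REAL_TIME})
first = tracker.track(AdCardEvent(ad_id="ad-1", position=0))

# A config update reclassifies the event type without an app release.
tracker.update_config({"AdCardEvent": Priority.DROP})
second = tracker.track(AdCardEvent(ad_id="ad-2", position=1))
```

Here `first` is routed to the real-time queue, `second` is dropped after the configuration change, and passing a plain dictionary to `track()` raises a `TypeError`.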

What use cases does Clickstream currently solve?

The data being collected with Clickstream is empowering various use cases for different teams; to mention a few:

Fraud detection 🕵️

Fraud detection is a critical part of Gojek's commitment to driver and user safety, and ensuring that safety is fundamental to Gojek as a whole. Clickstream is used to send encrypted signals that indicate whether the application being used is genuine.

Merchant Ads use case and impact 🛒

Gojek allows merchants to place their ads at various places within its applications. Merchants pay based on the number of clicks and impressions (views) their ads receive. Two problems need to be solved in real-time, both to serve the merchants and to increase revenue for Gojek:

  • Capping daily ad spend per merchant based on total clicks/impressions per day
  • Reconciling billing based on total clicks so that the merchant is charged fairly
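
As a toy illustration of why real-time counts matter here, the Python sketch below caps billable clicks per merchant per day. It is not how GoAds is implemented; the `DailySpendCap` class, its method names, and the idea of capping on click count alone are assumptions made for the example.

```python
from collections import defaultdict


class DailySpendCap:
    """Counts billable clicks per (merchant, day) and stops at the cap."""

    def __init__(self, daily_click_cap):
        self.daily_click_cap = daily_click_cap
        self.clicks = defaultdict(int)

    def record_click(self, merchant_id, day):
        """Returns True if the click is billable, False once the cap is hit."""
        key = (merchant_id, day)
        if self.clicks[key] >= self.daily_click_cap:
            return False  # cap reached: stop charging (and stop serving the ad)
        self.clicks[key] += 1
        return True

    def billable_clicks(self, merchant_id, day):
        return self.clicks.get((merchant_id, day), 0)


cap = DailySpendCap(daily_click_cap=2)
day_one = [cap.record_click("merchant-1", "day-1") for _ in range(3)]
day_two = cap.record_click("merchant-1", "day-2")  # counter starts fresh
```

If click events arrive late, the cap check runs on stale counts and the merchant is either overcharged or the ad keeps serving past its budget, which is why both problems depend on real-time delivery.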

Impact

  • GoAds registered a significant increase in the revenue booked
  • Powers a total of three Ad products
  • Enables a merchant-facing feature: the daily distribution of impressions and clicks across the campaign duration

What does Clickstream have in store for the future?

We at Gojek want to share the tools we’ve created, so we plan to make Clickstream open-source in the near future. This could help organizations big and small send and receive data in real-time with very little effort.

  1. Remote Logging and App Observability
    Gojek mobile applications are constantly evolving, and with that comes the need to keep the performance of the applications in check. With Clickstream, developers would be able to observe critical performance metrics in real-time, turn A/B tests on or off, and gather early feedback.
  2. Real-time alerts and user assistance
    Leveraging the real-time capabilities of Clickstream, Gojek’s support team would be able to reach out directly to affected users and help them with the task they wished to perform but could not because of app performance issues.

I’d like to thank the awesome people involved in making Clickstream a possibility:

Mobile Team: Abhijeet Mallick, Pooja Shukla, Santhosh Kumar Srikanthamurali, Prashant Rane and Raditya Gumay
Backend Team: Chakravarthy Varaga and Rasyid Hakim
Product Team: Rakesh Malayattil and Ravi Suhag

Stay tuned for the next blogs, where we’ll share the journey to build Clickstream mobile libraries, Raccoon, and the learnings! 🕺
