Courier: Reimagining How We Send Push Notifications

Everything you need to know about Pusher, our dedicated service for sending push notifications.

By Deepanshu


Push notifications are an integral part of the Gojek experience. We send around 100 million push notifications every day across all applications. They deliver important, time-sensitive messages to our users, such as order updates and chat messages, or prompt the user to take urgent action. Gojek has its own dedicated service, named Pusher, for sending push notifications to all of its mobile apps, such as Gojek, GoBiz, GoPartner, and GoPay.

In this blog, we will talk about the traditional architecture of Pusher and how we redesigned it to be super-reliable.

Motivation

With the rapid growth of the Gojek user base (customers, merchant partners, and driver partners), a reliable mechanism for sending push notifications has become crucial. The traditional way of sending them, via Firebase Cloud Messaging (FCM) for Android and Apple Push Notification Service (APNs) for iOS, isn’t sufficient anymore. So we needed a new way of sending push notifications that is quick and reliable.

Use cases

  • Customer order updates
    The GoFood & GoRide ordering flows rely heavily on push notifications for sending order updates to customers in real time.
  • Merchant Ordering flow
    For the merchant app, a push notification is triggered for every new order and further order updates.
  • Driver Bids
    GoFood & GoRide order bids are sent to the driver app using push notifications.
  • Chat messages
    All new chat messages are sent as push notifications to the customer app when the user is offline (app in the background).
  • Campaigns
    Campaign notifications for an ongoing promotion are also sent as push notifications.

Traditional architecture

Traditionally, the flow of sending push notifications looked like this:

  1. All Gojek services make an HTTP request to Pusher to send push notifications to the mobile apps.
  2. Pusher makes a request to the token service to get details of all registered devices for a user.
  3. Pusher enqueues a job on Kafka, which is picked up by one of the workers, and a push notification is triggered via FCM or APNs based on the OS type of the user's device.
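
As a rough illustration of step 3, a worker could group a user's device tokens by provider before sending. This is a minimal sketch, not Pusher's actual code: the `Device` type, the `"android"`/`"ios"` values, and the function name are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Device:
    token: str
    os_type: str  # assumed values: "android" or "ios"


# Android devices go through FCM, iOS devices through APNs.
PROVIDER_BY_OS = {"android": "FCM", "ios": "APNs"}


def group_tokens_by_provider(devices: List[Device]) -> Dict[str, List[str]]:
    """Group device tokens under the provider that serves each OS."""
    grouped: Dict[str, List[str]] = {}
    for device in devices:
        provider = PROVIDER_BY_OS.get(device.os_type.lower())
        if provider is None:
            continue  # unknown OS type: no provider to send through
        grouped.setdefault(provider, []).append(device.token)
    return grouped
```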

But this setup has one problem: low reliability. The delivery rate for push notifications was in the range of 75–85% across all user segments, i.e., drivers, customers, and merchants.

The case for Courier

Courier is Gojek’s in-house solution, built for creating and maintaining long-running persistent connections between mobile apps and servers using the MQTT protocol. After using Courier for a variety of use cases, ranging from food order updates to chat messages, we moved on to solving one more. This time it was about delivering push notifications via our own information superhighway.

Design Decisions

Let’s discuss a few design decisions we made along the way.

  • Quality of Service (QoS) level
  • Fallback mechanism
  • Topic Structure

Quality of Service (QoS) Level

The first decision was to choose the right QoS for this task. QoS 0 is easy to implement but does not have an acknowledgment flow. QoS 1 retries publishing the message until an acknowledgment is received, which means there could be duplication. In QoS 2, the sender and receiver engage in a four-part handshake (two request-response exchanges) to ensure that exactly one copy of the message is received (assured delivery).

We went ahead with QoS 1, as it strikes a middle ground between QoS 0 and QoS 2: it provides acknowledgments, unlike QoS 0, with lower latency than QoS 2. To handle the duplication that QoS 1 can cause, we built deduplication logic on the client side, based on a unique push notification identifier.
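
To make the deduplication idea concrete, here is a minimal sketch of how a client might drop QoS 1 redeliveries by remembering recently seen notification identifiers. The class name, the method name, and the bounded cache size are assumptions for illustration. The actual client logic is not described in this post.

```python
from collections import OrderedDict


class NotificationDeduplicator:
    """Remembers recently seen notification IDs so redeliveries are dropped."""

    def __init__(self, max_entries: int = 1000) -> None:
        # max_entries is an assumed bound that keeps memory use fixed.
        self._seen: "OrderedDict[str, None]" = OrderedDict()
        self._max_entries = max_entries

    def should_display(self, notification_id: str) -> bool:
        """Return True the first time an ID is seen, False for a duplicate."""
        if notification_id in self._seen:
            return False
        self._seen[notification_id] = None
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # evict the oldest remembered ID
        return True
```

A bounded cache trades a small risk of showing a very old duplicate for predictable memory use on the device.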

Fallback Mechanism

Next, we had to build a fallback mechanism so that, when a client is offline (not connected to Courier) or has not been rolled out, we could still send the push notification via FCM/APNs.

For maintaining a client’s online/offline status, we used VerneMQ events to create a user-liveness service, a Redis-based store that maintains the online/offline status of a user. For a user logged in with multiple devices, this store maintains the status of each device.

Similarly, we created another Redis-based cache using VerneMQ events, for maintaining the subscribed topics of a client. This helps us decide whether the client is rolled out or not.
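
The two stores described above can be sketched as follows. This is an illustrative in-memory version, assuming plain dictionaries in place of Redis and hypothetical event names (`"connected"`, `"disconnected"`, `"subscribed"`, `"unsubscribed"`) in place of the real VerneMQ event payloads.

```python
from typing import Dict, Set, Tuple

DeviceKey = Tuple[str, str]  # (user_id, device_id)


class UserLivenessStore:
    """Tracks online/offline status per device (a dict stands in for Redis)."""

    def __init__(self) -> None:
        self._online: Dict[DeviceKey, bool] = {}

    def on_event(self, event: str, user_id: str, device_id: str) -> None:
        # Event names are hypothetical stand-ins for broker events.
        if event == "connected":
            self._online[(user_id, device_id)] = True
        elif event == "disconnected":
            self._online[(user_id, device_id)] = False

    def is_online(self, user_id: str, device_id: str) -> bool:
        # Devices we have never heard from are treated as offline.
        return self._online.get((user_id, device_id), False)


class SubscriptionCache:
    """Tracks which topics each device is subscribed to."""

    def __init__(self) -> None:
        self._topics: Dict[DeviceKey, Set[str]] = {}

    def on_event(self, event: str, user_id: str, device_id: str, topic: str) -> None:
        topics = self._topics.setdefault((user_id, device_id), set())
        if event == "subscribed":
            topics.add(topic)
        elif event == "unsubscribed":
            topics.discard(topic)

    def is_rolled_out(self, user_id: str, device_id: str, push_topic: str) -> bool:
        # A device counts as rolled out if it subscribed to the push topic.
        return push_topic in self._topics.get((user_id, device_id), set())
```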

There can be cases where a user is rolled out and marked online in the user-liveness store but is not actually connected. In this case, we again fall back to FCM/APNs when an acknowledgment is not received within a particular interval.
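
One way to picture this acknowledgment timeout is a small tracker that records when each notification was published and reports the ones whose acknowledgment is overdue. This is a sketch under assumptions: the class, its method names, and the injected clock are all hypothetical, and the post does not state the actual interval.

```python
from typing import Callable, Dict, List


class AckTracker:
    """Reports notifications whose acknowledgment did not arrive in time."""

    def __init__(self, ack_timeout_s: float, clock: Callable[[], float]) -> None:
        self._timeout = ack_timeout_s  # assumed configurable interval
        self._clock = clock  # injected so the logic is easy to test
        self._pending: Dict[str, float] = {}  # notification_id -> deadline

    def published(self, notification_id: str) -> None:
        self._pending[notification_id] = self._clock() + self._timeout

    def acknowledged(self, notification_id: str) -> None:
        # An acknowledged notification no longer needs a fallback.
        self._pending.pop(notification_id, None)

    def due_for_fallback(self) -> List[str]:
        """Return and forget the IDs that should now go via FCM/APNs."""
        now = self._clock()
        expired = [nid for nid, deadline in self._pending.items() if deadline <= now]
        for nid in expired:
            del self._pending[nid]
        return expired
```

In production code the clock would typically be `time.monotonic`, so that wall-clock adjustments do not affect deadlines.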

Topic Structure

Courier uses MQTT as the underlying protocol, which is based on the publish-subscribe model. So we had to decide the structure of the topic to which clients would subscribe and Pusher would publish.

We created the topic using details like owner type and owner ID to differentiate between users. We also included the Android Device ID to uniquely identify a particular device in case the same user is logged in from multiple devices.
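
As an illustration, a topic built from these three details might look like the sketch below. The `push/<owner_type>/<owner_id>/<device_id>` layout and the function name are assumptions, since the post does not give the real topic format. The validation reflects MQTT's rules that `/` separates topic levels and `+` and `#` are subscription wildcards.

```python
def build_push_topic(owner_type: str, owner_id: str, device_id: str) -> str:
    """Build a per-device topic name (hypothetical layout)."""
    parts = [owner_type, owner_id, device_id]
    for part in parts:
        # "/" is the MQTT level separator; "+" and "#" are wildcards,
        # so none of them may appear inside a single topic level here.
        if not part or any(ch in part for ch in "/+#"):
            raise ValueError(f"invalid topic segment: {part!r}")
    return "/".join(["push", *parts])
```

For example, `build_push_topic("customer", "42", "abc123")` gives `push/customer/42/abc123`, so two devices of the same user end up on different topics.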

New architecture

The new flow looks like this:

  1. All Gojek services make an HTTP request to Pusher to send push notifications to the mobile apps.
  2. Pusher makes a request to the token service to get details of all connected devices for a user.
  3. Pusher checks the subscription cache to see whether the user has subscribed to the push notification topic, and then makes a request to the user-liveness service to get the online/offline status of the user.
  4. If the user is currently online and has subscribed to the push notification topic, Pusher publishes the push notification on the subscribed topic via Courier and enqueues a fallback job to Kafka. If the push notification status is not marked as delivered in the PN Delivery cache within a specific interval, the fallback job sends the push notification via the old FCM/APNs flow.
  5. When the user is either offline or has not subscribed to the push notification topic, Pusher directly sends the push notification via the old FCM/APNs flow.
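
The routing decision in steps 4 and 5 can be summarised in a few lines. This is a simplified sketch with hypothetical function and action names. It models only the decisions, not the actual Kafka, Courier, or FCM/APNs calls.

```python
from typing import List


def plan_delivery(is_online: bool, is_subscribed: bool) -> List[str]:
    """Decide which actions Pusher takes for one device (action names are illustrative)."""
    if is_online and is_subscribed:
        # Step 4: try Courier first, with a delayed fallback as a safety net.
        return ["publish_via_courier", "enqueue_fallback_job"]
    # Step 5: offline or not subscribed, so use the old flow directly.
    return ["send_via_fcm_apns"]


def run_fallback_job(marked_delivered: bool) -> List[str]:
    """What the fallback job does once the waiting interval has passed."""
    # Nothing to do if the delivery cache already shows the notification arrived.
    return [] if marked_delivered else ["send_via_fcm_apns"]
```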

Outcomes

After the integration with Courier, we were able to see a significant increase in the reliability of push notifications.

For the Gojek customer app, we observed a delivery rate of 95%+ for the Courier flow, while for the FCM/APNs flow it was ~83%.

For the Gojek driver & merchant apps, the delivery rate increased to 99%+ for the Courier flow, while it was ~90% for the FCM/APNs flow.

Next steps

With push notifications out of the way, stay tuned for the journey to integrate with Driver Bids at Gojek, the pulse of all we do.
