By Li Xiaohong

When our customers want to use our ride-hailing products like GoRide and GoCar, they are presented with convenient, clearly named pickup points nearby. Here’s an example:

This saves customers the hassle of calling the driver partner, explaining where they are, what colour clothes they are wearing, and so on. Our pickup points are designed to make lives easier for both customers and driver partners.

This is possible because the pickup points shown on the app are popular pickup locations around the area. What’s more, the pickup point names are displayed exactly how customers and driver partners usually refer to them.

But how do we manage to name so many pickup points accurately, and at scale?

We use past booking locations and their associated chat logs to discover named pickup points. As our previous research has explained, we first perform clustering on historical bookings to form potential pickup points, then we use a language model to select the best name. Here, we explain how we improved upon the previous statistical language model with a state-of-the-art NLP model, which makes the entire naming exercise fully scalable. This is the magic behind all the pickup points seen on the Gojek app.

How can we learn better?

As explained in our previous post, our original statistical language model selects the best pickup point name from the most probable n-grams extracted from bookings text. However, such a model doesn’t ‘understand’ the meaning of the text; it simply chooses high-frequency phrases without knowing their semantics. Sometimes it throws up street names, and sometimes even common phrases that carry no information about the location. We have to manually check every name to make sure it reflects the right point of interest (POI) before it appears on the app.
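To see why pure frequency can mislead, here is a minimal sketch of frequency-based n-gram selection. It is not our production model; the function name and the chat snippets are invented for illustration.

```python
from collections import Counter

def top_ngrams(sentences, n=2, k=3):
    """Count word n-grams across sentences and return the k most frequent."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)

# Invented chat snippets for one hypothetical pickup cluster.
chats = [
    "saya di depan lobby utara",
    "tunggu di depan lobby utara ya",
    "saya di depan ya pak",
    "di depan ya",
]

best = top_ngrams(chats, n=2, k=1)[0]
print(best)  # ('di depan', 4)
```

The most frequent bigram here is "di depan" ("in front of"), which says nothing about the location, while the more useful "lobby utara" appears only twice. A frequency-only selector has no way to tell the two apart.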

This creates a challenge, especially if we want to quickly expand the frictionless pickup experience to customers across new geographies. Hence, we decided to go a step further with a deep-learning NLP model that ‘understands’ and ‘learns’ to differentiate what is a valid pickup point name.

At Gojek, we never stop thinking and always go a step further

Meet CartoBERT

One of the most recent and impactful breakthroughs in NLP was the publication of BERT[1], a contextual language representation built on transformer models, by Google in late 2018. It obtained state-of-the-art results on a wide array of NLP tasks. In 2019, much NLP research was influenced by BERT, including XLNet, RoBERTa and ERNIE.

BERT Explained

BERT, or Bidirectional Encoder Representations from Transformers, is composed of an embedding layer, followed by groups of transformer layers.

Every word (token) in the input sentence will first get encoded into its embedding representations in the embedding layer, and then go through bidirectional transformer encoder layers. Every encoder layer will perform the multi-head attention computation on the token representation from the previous layer to create a new intermediate representation, which is then output to the next layer. The output from the final layer is the contextual representation of the input token. A pooled sentence level representation combining all token representations could be created if needed by specific downstream tasks.
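To make the attention step concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is a simplification: real BERT layers use learned query, key and value projections, several heads, residual connections and feed-forward sublayers, all of which are omitted here. The function names and toy embeddings are made up for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity projections (a simplification)."""
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Every token scores every position, left and right: this is the
        # 'bidirectional' part.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # New representation: attention-weighted mix of all token vectors.
        outputs.append([sum(w[j] * tokens[j][c] for j in range(len(tokens)))
                        for c in range(d)])
    return outputs, weights

# Toy 2-dimensional embeddings for a 3-token sentence.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual, attn = self_attention(embeddings)
```

Each row of `attn` sums to 1, and each row of `contextual` is the new, context-aware representation that would be passed to the next layer.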

With the final contextual representations at either token or sentence level, a BERT model pre-trained on a large unlabelled text corpus can be further extended to a wide variety of NLP tasks, such as text classification, question answering and Named Entity Recognition (NER).

ALBERT[2], published by Google in September 2019, improved on BERT with embedding parameter factorisation and cross-layer parameter sharing to reduce the number of parameters (by about 9 times for the base model). It also uses sentence order prediction instead of next sentence prediction as a pre-training task. In the paper, ALBERT also outperforms BERT on standard NLP tasks/datasets (SQuAD, RACE etc.) with fewer parameters.
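The effect of embedding factorisation is easy to check with a little arithmetic. The sketch below compares a single V x H embedding table with a factorised V x E table followed by an E x H projection. The sizes are the ALBERT base settings as we understand them from the paper (V = 30,000, H = 768, E = 128); treat them as illustrative assumptions rather than CartoBERT's actual configuration.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Parameter count of the embedding block, with or without factorisation."""
    if embedding_size is None:
        # BERT-style: one V x H table.
        return vocab_size * hidden_size
    # ALBERT-style: a V x E table plus an E x H projection.
    return vocab_size * embedding_size + embedding_size * hidden_size

V, H, E = 30000, 768, 128  # assumed sizes, for illustration only

full = embedding_params(V, H)
factorised = embedding_params(V, H, E)

print(full)        # 23040000
print(factorised)  # 3938304
```

This covers only the embedding block. Most of the overall reduction comes from cross-layer parameter sharing, where all transformer layers reuse one set of weights.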

Pre-train CartoBERT to learn language representation from Gojek bookings text

Inspired by ALBERT’s lightweight model and performance, we developed CartoBERT, Gojek’s very own pickup point name recognition model, based on ALBERT’s architecture.

As illustrated below, the uncased CartoBERT is pre-trained on Gojek’s own masked bookings text corpus of about 200 million sentences. Booking text is first pre-processed with data masking (to mask all customer-sensitive information), language detection and text normalisation (including text cleaning, slang and abbreviation transformations, lowercase transformation and emoji removal). The pre-processed text is used to build a subword vocabulary, which handles Out-Of-Vocabulary (OOV) tokens by decomposing them into frequent subword patterns. The CartoBERT tokenizer is then created from the subword vocabulary and used to tokenize and encode the same pre-processed bookings text into pre-training input files.
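The normalisation steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual pipeline: the phone-number pattern, the emoji ranges, the `<phone>` mask tag and the tiny slang map are all hypothetical, and language detection and subword vocabulary building are left out.

```python
import re

# Hypothetical patterns and mappings, for illustration only.
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")
EMOJI = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF]")
SLANG = {"dpn": "depan", "sy": "saya", "jl": "jalan"}

def preprocess(text):
    """Mask sensitive data, then clean, lowercase and normalise slang."""
    text = PHONE.sub("<phone>", text)        # mask sensitive data first
    text = EMOJI.sub(" ", text).lower()      # remove emoji, lowercase
    text = re.sub(r"[^\w\s<>]", " ", text)   # drop punctuation, keep the mask tag
    words = [SLANG.get(w, w) for w in text.split()]
    return " ".join(words)

cleaned = preprocess("Sy di dpn Lobby Utara \U0001F64F, telp 0812-3456-7890")
print(cleaned)  # saya di depan lobby utara telp <phone>
```

Masking runs before any other step so that sensitive values never reach the later stages in raw form.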

As with ALBERT, the model is pre-trained to ‘understand’ Gojek’s bookings text using two tasks: Masked Language Modelling, which predicts randomly masked tokens in input sentences, and Sentence Order Prediction, which predicts the order of a pair of input sentences.
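Here is a simplified sketch of how training examples for these two tasks could be built. It is an assumption-laden illustration: the 15% masking rate, the `[MASK]` token and the label convention for sentence order are our choices for the example, and real BERT/ALBERT masking is more involved (for instance, some selected tokens are replaced with random tokens or left unchanged).

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace tokens with [MASK]; labels keep the original token."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(tok)    # the model must predict this token
        else:
            masked.append(tok)
            labels.append(None)   # position not scored
    return masked, labels

def sop_example(first, second, swap):
    """Build a sentence-order example: label 1 = original order, 0 = swapped."""
    if swap:
        return (second, first), 0
    return (first, second), 1

tokens = "saya di depan lobby utara".split()
masked, labels = mask_tokens(tokens, mask_prob=0.15, seed=0)
pair, order_label = sop_example("saya di lobby utara", "pakai baju merah", swap=True)
```

In both tasks the labels come from the text itself, which is why pre-training needs no human annotation.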

Fine-tuning CartoBERT to extract pickup point names from Gojek bookings text

With the huge amount of bookings text we have at Gojek, now CartoBERT can better ‘understand’ past bookings text. Theoretically, it ‘understands’ every word of a booking text sentence.

For every token in the input sentence, CartoBERT outputs a 768-dimensional vector from the last transformer encoder layer. (We use the default hidden layer size of the ALBERT base model in CartoBERT, but this is configurable.) We use this vector to represent CartoBERT’s ‘understanding’ of the token’s meaning in the sentence context during the fine-tuning step.

As illustrated in the diagram below, when fine-tuning CartoBERT for pickup point name recognition, we replace the Masked Language Model and Sentence Order Prediction layers used in the pre-training step with a token classification layer. Taking the final token representations output by the CartoBERT transformer layers, this layer learns to predict the probability that a token belongs to a pickup point name. It is trained on labelled data created from bookings text sentences and their corresponding pickup point names. Here, we use a weighted cross-entropy loss to deal with class imbalance, as tokens tagged as part of pickup point names are a minority.
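To illustrate the weighted loss, here is a small pure-Python sketch for a two-class token classifier (class 0 = not part of a name, class 1 = part of a name). The class weights and probabilities are invented, and dividing by the total weight is one common normalisation convention, not necessarily the one we use.

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean negative log-likelihood over tokens."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(p[y])  # penalise low probability on the true class
        weight_sum += w
    return total / weight_sum

# Per-token [p(not name), p(name)] for a 4-token sentence; values are made up.
probs = [[0.9, 0.1], [0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
labels = [0, 0, 0, 1]  # only the last token belongs to a pickup point name

unweighted = weighted_cross_entropy(probs, labels, {0: 1.0, 1: 1.0})
weighted = weighted_cross_entropy(probs, labels, {0: 1.0, 1: 5.0})
```

With the higher weight on the minority class, the poorly predicted name token dominates the loss, so `weighted` comes out larger than `unweighted` and the model is pushed harder to get name tokens right.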

With this, CartoBERT is fine-tuned to extract pickup point names, if any, from bookings text sentences.
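Once the classifier outputs a per-token probability, turning it into names is a matter of grouping consecutive high-probability tokens. The sketch below shows one simple way to do that; the 0.5 threshold, the function name and the example probabilities are assumptions for illustration, and subword merging is ignored.

```python
def extract_names(tokens, name_probs, threshold=0.5):
    """Group consecutive tokens whose name probability clears the threshold."""
    names, current = [], []
    for tok, p in zip(tokens, name_probs):
        if p >= threshold:
            current.append(tok)
        elif current:
            names.append(" ".join(current))  # a run of name tokens just ended
            current = []
    if current:
        names.append(" ".join(current))      # a run that reaches the sentence end
    return names

tokens = ["saya", "di", "depan", "lobby", "utara", "ya"]
name_probs = [0.02, 0.05, 0.10, 0.97, 0.95, 0.03]  # made-up model outputs

print(extract_names(tokens, name_probs))  # ['lobby utara']
```

A sentence with no token above the threshold yields an empty list, which matches the "if any" case.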

How does the model perform?

CartoBERT gives a lift of more than 25% in pickup point name accuracy, to about 93%, where accuracy is measured as the percentage of valid pickup point names among the generated names. With this high accuracy, we have achieved full scalability of automatic generation for named pickup points to quickly cover multiple geographies without heavy reliance on human inputs.

What’s next?

We are not stopping here: we are exploring active learning to further improve CartoBERT. With active learning, we flag only uncertain predictions for human labelling, where uncertainty is measured by the sentence-level least token probability[3]. We then use the human-curated data as feedback for model learning. In this way, we can improve model learning efficiency with minimal labelling effort.
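A rough sketch of this selection step follows. It adapts the least-token-probability idea to a plain two-class token classifier: a sentence's confidence is the confidence of its least certain token, and the least confident sentences go to human labellers first. The original LTP strategy[3] is defined for a BERT-CRF model, so this is our simplified reading, and the names and numbers are illustrative.

```python
def least_token_confidence(name_probs):
    """Sentence confidence = confidence of its least certain token."""
    # For a binary decision, a token's confidence is the probability of
    # whichever class the model prefers.
    return min(max(p, 1.0 - p) for p in name_probs)

def select_for_labelling(batch, budget):
    """Return the `budget` sentences the model is least sure about."""
    ranked = sorted(batch, key=lambda item: least_token_confidence(item[1]))
    return [sentence for sentence, _ in ranked[:budget]]

# (sentence, per-token name probabilities); all values are made up.
batch = [
    ("di lobby utara", [0.02, 0.98, 0.99]),
    ("dekat pos satpam", [0.03, 0.55, 0.97]),
    ("ok pak", [0.10, 0.10]),
]

print(select_for_labelling(batch, budget=1))  # ['dekat pos satpam']
```

The second sentence is picked because one of its tokens sits at 0.55, close to a coin flip, even though the other tokens are confidently classified.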

What’s more, with the success of CartoBERT, we are considering pre-training and open-sourcing a general Bahasa Indonesia ALBERT model on an open Indonesian corpus drawn from wiki, news, Twitter etc. Currently, the options for open-source language models in Bahasa Indonesia are very limited: only pre-trained static word embeddings such as word2vec and fastText are available. It would be beneficial to the community to have a good state-of-the-art attention-based transformer model for the language. Stay tuned for more updates from the Cartography Data Science team.

Leave a clap if you liked what you read. Ping me with suggestions and feedback.

Thanks to all the amazing people who contributed to this post: Tan Funan, Zane Lim, Dang Le, Lijuan Chia, Bani Widyatmiko, Maureen Koha, Ringga Saputra, Nur Izzahudinr, Sandya Ardi, Yeni Primasari, Ardya Dipta.


[1] J. Devlin, M. Chang, K. Lee, K. Toutanova: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 (2018)

[2] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, R. Soricut: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv:1909.11942 (2019)

[3] M. Liu, Z. Tu, Z. Wang, X. Xu: LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition. arXiv:2001.02524 (2020)
