Meet Dollhouse — Overwatch for the Cloud
We built a tool to help us monitor GOJEK’s substantial GCP infrastructure. Then we open-sourced it.
By Fahri Shihab
If you’ve been following GOJEK in the media, you’d have noticed a term we bring up often when talking about growth and scale.
On the journey of building 19+ products ranging from foodtech to fintech, our total completed order volume has grown 6600x. This sort of hockey stick growth requires a lot of infrastructure to support it. In our case, this is mostly backed by Google Cloud Platform (GCP).
This post introduces Dollhouse, GOJEK’s in-house infrastructure audit and monitoring tool for GCP.
The status quo
To fully understand and secure our infrastructure, information gathering is vital. It lets us spot state changes that affect our infrastructure and could make it insecure.
Sounds easy, doesn’t it?
One of the biggest challenges we have in securing our infrastructure is knowing when something has changed: firewall changes, IAM user changes, bucket permission changes, and so on. It’s difficult to know what state a resource was in before, and what exactly changed.
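To make the "what was it before, and what changed" problem concrete, here is a minimal sketch of diffing two snapshots of firewall rules. It is only an illustration, not how Dollhouse is implemented: the `diff_snapshots` helper and the rule data are hypothetical, and the snapshots are assumed to be plain dictionaries keyed by rule name.

```python
from typing import Any, Dict, List

# A snapshot maps a resource name to its (simplified) configuration.
Snapshot = Dict[str, Dict[str, Any]]


def diff_snapshots(before: Snapshot, after: Snapshot) -> Dict[str, List[str]]:
    """Report which resources were added, removed, or changed between snapshots."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    changed = sorted(
        name
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical firewall rules, made up for this example.
before = {
    "allow-ssh": {"sourceRanges": ["10.0.0.0/8"], "ports": ["22"]},
    "allow-web": {"sourceRanges": ["0.0.0.0/0"], "ports": ["443"]},
}
after = {
    "allow-ssh": {"sourceRanges": ["0.0.0.0/0"], "ports": ["22"]},
    "allow-db": {"sourceRanges": ["10.0.0.0/8"], "ports": ["5432"]},
}

result = diff_snapshots(before, after)
print(result)
```

Even in this toy case, the interesting finding is the changed rule: `allow-ssh` went from an internal range to the whole internet, which a simple "does the rule exist" check would miss.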
To track these changes across all our projects, we depend on GCP’s logging service, Stackdriver. It captures important events, from admin activity logs to data access logs and much more.
The Infosec team wanted to deploy an audit and monitoring solution that could keep up with the scale GOJEK deals with on a daily basis. It was equally important that the solution be unintrusive and require only minimal access to the GCP environment. These criteria ruled out many great tools currently available on the market.
So we decided to take matters into our own hands, and build a solution ourselves.
Our own creation — we call it Dollhouse.
What does it do?
As Google Cloud Platform forms the core of our arsenal, we decided to focus only on Google Cloud. Dollhouse is divided into two main components:
- dollhouse-audit : Audits firewall, IAM, and service account changes, either on demand or on a continuous basis.
- dollhouse-bot : Monitors any change (insert, modify, delete) to firewalls, IAM, and service accounts, giving us near real-time alerts that can feed a basic incident response if needed.
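As a rough illustration of the monitoring side, the sketch below turns an audit log entry into an insert/modify/delete summary. It assumes the entry follows the Cloud Audit Logs layout (`protoPayload.methodName`, `resourceName`, `authenticationInfo.principalEmail`) and that firewall method names end in a verb such as `insert`, `patch`, or `delete`. The helper names and the sample entry are hypothetical, and IAM and service account events use differently shaped method names that this sketch does not handle.

```python
from typing import Any, Dict, Optional

# Assumed mapping from the last segment of a firewall methodName to a change type.
VERB_TO_CHANGE = {
    "insert": "Insert",
    "patch": "Modify",
    "update": "Modify",
    "delete": "Delete",
}


def classify_change(method_name: str) -> Optional[str]:
    """Return the change type for a method name, or None if it is not recognised."""
    verb = method_name.rsplit(".", 1)[-1].lower()
    return VERB_TO_CHANGE.get(verb)


def summarize(entry: Dict[str, Any]) -> Optional[str]:
    """Build a one-line summary of an audit log entry, or None if it is not a change."""
    payload = entry.get("protoPayload", {})
    change = classify_change(payload.get("methodName", ""))
    if change is None:
        return None
    actor = payload.get("authenticationInfo", {}).get("principalEmail", "unknown")
    resource = payload.get("resourceName", "unknown")
    return f"{change}: {resource} by {actor}"


# Hypothetical log entry, trimmed to the fields used above.
entry = {
    "protoPayload": {
        "methodName": "v1.compute.firewalls.patch",
        "resourceName": "projects/example-project/global/firewalls/allow-ssh",
        "authenticationInfo": {"principalEmail": "someone@example.com"},
    }
}

summary = summarize(entry)
print(summary)
```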
Dollhouse-audit is a simple command-line interface tool that can be run with an existing Google account configured on your terminal. It only needs viewer access on the project you want to audit. Here is a sample command and the output it gives.
You can view the output in a tabular format on your console or you can use Kibana for easier visualisation, if you fancy a dashboard.
The screenshot below depicts the number of members holding each role on a particular project. Here, you can see there are 13 owners on the project (which could be a red flag for your organisation).
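A count like this can be derived from a project's IAM policy. The sketch below assumes the policy is available as a dictionary in the standard `bindings` / `role` / `members` shape, for example as exported to JSON. The helper names, the sample policy, and the threshold of two owners are all hypothetical choices for illustration, not Dollhouse's actual logic.

```python
from typing import Any, Dict, List, Set


def members_per_role(policy: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Collect the distinct members bound to each role in an IAM policy."""
    result: Dict[str, Set[str]] = {}
    for binding in policy.get("bindings", []):
        # A role may appear in several bindings, so merge members into a set.
        result.setdefault(binding["role"], set()).update(binding.get("members", []))
    return result


def roles_over_limit(policy: Dict[str, Any], limits: Dict[str, int]) -> List[str]:
    """Return the roles whose member count exceeds the configured limit."""
    counts = {role: len(members) for role, members in members_per_role(policy).items()}
    return sorted(role for role, limit in limits.items() if counts.get(role, 0) > limit)


# Hypothetical policy, made up for this example.
policy = {
    "bindings": [
        {
            "role": "roles/owner",
            "members": ["user:a@example.com", "user:b@example.com", "user:c@example.com"],
        },
        {"role": "roles/viewer", "members": ["user:d@example.com"]},
    ]
}

flagged = roles_over_limit(policy, {"roles/owner": 2})
print(flagged)
```

What counts as "too many owners" depends on the organisation, which is why the limit is passed in rather than hard-coded.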
Dollhouse-bot is a simple Slackbot which is integrated with Stackdriver monitoring. It raises alerts for firewall rules and IAM role assignments that are problematic or need further review. It also supports bot commands that give more insight into a rule or account if needed.
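What counts as "problematic" is a policy decision. As one possible example, the sketch below flags enabled ingress rules that allow traffic from any address and builds the kind of `{"text": ...}` payload a Slack incoming webhook accepts. The rule dictionary follows the field names of the Compute Engine firewall resource (`direction`, `sourceRanges`, `allowed`), but the check itself, the helper names, and the message wording are assumptions for illustration rather than Dollhouse's actual rules. Nothing is sent over the network here.

```python
from typing import Any, Dict, Optional

# Source ranges treated as "open to the world" in this sketch.
OPEN_RANGES = {"0.0.0.0/0", "::/0"}


def is_open_ingress(rule: Dict[str, Any]) -> bool:
    """True if an enabled ingress rule allows traffic from any address."""
    if rule.get("disabled", False):
        return False
    if rule.get("direction", "INGRESS") != "INGRESS":
        return False
    if not rule.get("allowed"):
        return False
    return any(r in OPEN_RANGES for r in rule.get("sourceRanges", []))


def describe_allowed(rule: Dict[str, Any]) -> str:
    """Render the allowed protocols and ports, e.g. 'tcp:22,443; icmp'."""
    parts = []
    for item in rule.get("allowed", []):
        ports = ",".join(item.get("ports", []))
        parts.append(f"{item['IPProtocol']}:{ports}" if ports else item["IPProtocol"])
    return "; ".join(parts)


def build_alert(rule: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return a Slack-style message payload for a risky rule, or None if it looks fine."""
    if not is_open_ingress(rule):
        return None
    text = f"Firewall rule '{rule['name']}' allows {describe_allowed(rule)} from any address"
    return {"text": text}


# Hypothetical rule, made up for this example.
rule = {
    "name": "allow-ssh",
    "direction": "INGRESS",
    "sourceRanges": ["0.0.0.0/0"],
    "allowed": [{"IPProtocol": "tcp", "ports": ["22"]}],
}

alert = build_alert(rule)
print(alert)
```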
To get more details on a specific firewall rule or service account, you can message dollhouse-bot with the relevant bot command, and it will reply with the details of the rule or account in question.
That’s it! If you’d like to try out Dollhouse, you can refer to the repository here:
Big thanks to my colleague Sanjog Panda for his contribution to this blog and the Dollhouse repository 👏