Why we built ‘Proctor’ — An Automation Orchestrator
How we went about solving infrastructure automation using our in-house product, Proctor. In the process, we increased the use of automation for multiple teams inside GO-JEK.
By Akshat Shah
Amazon had an insane ask from its engineers a while back: a product a customer ‘adds to cart’ should never be lost because of technical issues. Lo and behold, DynamoDB was born.
Insane requirements drive the best innovation
Proctor, our developer-friendly automation orchestrator, was born out of similar insane requirements:
- Centralize automation across the org
- Democratize contributions to automation
- Ease utilization of automation
Demo
Let’s use Proctor to run automation and see how it works.
We want to point the proctor.gojek.com domain to the IP address 2.0.1.8. Using the proctor binary, we can run automation (aka Procs) to create domain name records.
Behind the scenes, the dns-creation Proc creates a domain name record in AWS Route 53.
To create a domain name record, we needed neither access to AWS nor knowledge of Route 53.
Getting into a soup
GO-JEK is a hyper-growth startup. We have all the challenges that come with a hyper-growth startup struggling to keep pace with rapid demand. Matt Klein’s blog The Human Scalability of DevOps summarizes this journey succinctly. Similar to what Matt underlines, we too formed a central infrastructure team in the early days to cope with maddening growth. The intent was clear: abstract as much infrastructure as possible so that application/product developers can focus on business logic. The result, though good in the short term, proved catastrophic in the long term. From the perspective of infrastructure automation, we ran into multiple problems:
- No conventions around building and using automation
- Managing dependencies of automation. For example, some automation uses AWS, while other automation uses gcloud, and so on
- Lack of conventions and dependency management made contribution to automation meager
- Managing access to powerful tools needed for running automation
- Widespread access to powerful tools like knife, AWS, gcloud resulted in some accidents
- Lack of dependency and access management made onboarding developers to use infrastructure automation an issue
- Lack of centralized automation led to duplication of efforts in the org
90% of automation was built and used by the central infrastructure team before Proctor came into the picture
Solving problems
As our CTO Niranjan Paranjape points out in his talk on scaling organizations, automation is a more efficient way to scale. Proctor, our solution for automation, solves the highlighted problems by:
Centralizing automation across the org
- Entire automation resides in a central repository and is under version control
Democratizing contribution to automation
- Automation is packaged inside docker images
- Dockerized automation means no restraints for developers contributing to automation. AWS, gcloud, chef, ansible: choose whichever tools you like! ruby, python, bash, perl: choose whichever language you like!
- proctord provisions access to powerful tools needed by automation during runtime
For building automation, your imagination is the limit!
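To make the idea of provisioning access at runtime concrete, here is a minimal Python sketch of how a service in proctord's position could assemble the environment handed to a Proc's container. It is an illustration under assumptions, not Proctor's actual data model or API: the Proc dataclass, build_job_env, the image name, and the secret store are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """Hypothetical metadata a contributor might declare for a Proc."""
    name: str
    image: str                    # docker image holding the script and its tools
    required_args: tuple = ()     # values the user supplies from the CLI
    required_secrets: tuple = ()  # credentials that only the server holds


def build_job_env(proc, user_args, secret_store):
    """Merge user arguments with server-side secrets into one container env."""
    missing = [a for a in proc.required_args if a not in user_args]
    if missing:
        raise ValueError(f"missing arguments for {proc.name}: {missing}")
    unknown = [a for a in user_args if a not in proc.required_args]
    if unknown:
        raise ValueError(f"unknown arguments for {proc.name}: {unknown}")

    env = {arg: user_args[arg] for arg in proc.required_args}
    # Secrets are added on the server, so the user never handles them.
    for secret in proc.required_secrets:
        env[secret] = secret_store[secret]
    return env


# Hypothetical example: image name and variable names are made up.
dns_creation = Proc(
    name="dns-creation",
    image="example/dns-creation:latest",
    required_args=("DOMAIN", "IP"),
    required_secrets=("AWS_ACCESS_KEY_ID",),
)
```

In this sketch the contributor only declares what the Proc needs. The person running it supplies DOMAIN and IP, while the credential stays on the server side, which is the property that lets users run automation without holding access to the underlying tools.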
Easing the utilization of automation
- Users only need a binary to run automation from their CLI
- The binary communicates with proctord, a web service responsible for running automation
- proctord spins up automation jobs in a Kubernetes cluster
- proctord streams automation logs to user’s CLI for feedback
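The request flow in the list above can be sketched in a few lines of Python. This is a simplified, in-process model, not Proctor's implementation: FakeCluster stands in for the Kubernetes cluster, Service for proctord, and cli_run for the binary. All class names, function names, and log lines are hypothetical, and there is no real networking or Kubernetes call.

```python
class FakeCluster:
    """Stand-in for a Kubernetes cluster: 'runs' a job and yields its log lines."""

    def run_job(self, image, env):
        yield f"pulling image {image}"
        for key in sorted(env):
            yield f"arg {key}={env[key]}"
        yield "job succeeded"


class Service:
    """Stand-in for the web service sitting between the CLI and the cluster."""

    def __init__(self, images, cluster):
        self.images = images    # maps a Proc name to its docker image
        self.cluster = cluster

    def execute(self, proc_name, args):
        if proc_name not in self.images:
            raise KeyError(f"unknown proc: {proc_name}")
        # Returns a lazy stream of log lines rather than a finished result.
        return self.cluster.run_job(self.images[proc_name], args)


def cli_run(service, proc_name, args, out=print):
    """What the user's binary does: submit the request, echo logs as they arrive."""
    count = 0
    for line in service.execute(proc_name, args):
        out(line)
        count += 1
    return count


if __name__ == "__main__":
    service = Service({"dns-creation": "example/dns-creation:latest"}, FakeCluster())
    cli_run(service, "dns-creation", {"DOMAIN": "proctor.gojek.com", "IP": "2.0.1.8"})
```

The log lines are produced by a generator, so the CLI side prints each one as soon as the job emits it instead of waiting for the job to finish. That mirrors the feedback loop described above, where the user watches the job's output from their terminal.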
Summarizing Proctor
To use automation, users only need a binary. The dependencies of automation and access to tools are no longer their headache. This helped us onboard developers easily and eliminate per-user access management.
For contributing to automation, developers have to add automation to the central repository. The automation is packaged inside a docker image along with its dependencies. This helped us manage dependencies of automation, increase contribution, and reduce duplicate efforts.
Impact
We launched a beta of Proctor at GO-JEK on 1 Mar 2018. Since then:
Increase in usage of automation across the org has resulted in a reduction in ad-hoc tasks for the central infrastructure team by 70% over the past 3 months.
Multiple teams now contribute to automation making this scalable.
We now build the utilities of powerful tools as automation and distribute it using Proctor, enabling controlled access and reducing accidents.
Conclusion
It took 3 components (a binary, a web service, and a centralized automation repository) to solve automation at GO-JEK 😍
Open sourcing Proctor
We’ve open sourced Proctor. You can find the source code here and below:
In subsequent blog posts, we’ll take a deep dive into the technical architecture of Proctor. We’ll answer questions such as why rundeck didn’t work for us, how we scale Proctor, and how we manage secrets and access to the tools required by automation.
I hope this blog was helpful 😀. Please leave your thoughts in the comments section below. Better yet, talk to us, join us and help us solve the millions of problems that come with scaling a Super App of 18 products. Check out gojek.jobs for more. Grab this chance! We’re expanding and could do with more talent.