Mono Repo Vs Multi Repo: Tips To Re-evaluate Codebase Structure
By Mohammad Asif
In 2017, the Android and iOS codebases for the Gojek app were structured such that each of the 18+ products had a separate git repository. It was a multi repo setup, and needed a codebase structure change. In 2018, for the Android codebase, we merged the code for all products into a single git repository, aka the mono repo setup.
Ever since, there have been discussions amongst our iOS engineers to move to a mono repo setup for the iOS codebase as well.
So, we decided to take a data-driven evaluation approach to help us decide if moving to a mono repo would be worth it for the iOS codebase.
If you’re considering moving from a multi repo setup to a mono repo and feeling stuck, this blog post will guide you on how to back your decision with data points.
To start with, we reached out to senior developers in Gojek who were a part of the codebase migrations across platforms to understand the advantages and the pain points of a mono repo. The interviews helped us identify a few key evaluation parameters which we must consider when evaluating a possible migration.
Key evaluation parameters
- Product vs Platform team workflow differences
- Code ownership
- Projected CI utilisation
- Issues with the existing multi repo workflow
- Reproducible builds
- Changes to release processes
Product vs Platform teams
At Gojek, we have been able to categorise teams of developers into two broad buckets.
What do devs in a Platform team do?
- Work on SDKs that form the base of the application. These SDKs are used by different teams to leverage different functionalities. For example: the Chat SDK is used by different products, like the Driver App and the Consumer App, to enable chat in their products/apps. The Payment SDK is used by all the products which need payment processing
- Frequently collaborate with product team developers to integrate these SDKs into products and take them live
- Make changes in all modules upon contract changes. For example: If the Chat SDK team makes an API signature change, it would be the responsibility of the Chat SDK developers to modify the signature in all modules that consume Chat SDK and raise Merge Requests to the respective repos
- Be aware of various workflows, rules, and checks on CI, which can vary from repository to repository
What do devs in Product teams do?
- Work on product-specific features by combining the platform SDKs
- Have most of their work limited to changes in their respective product repositories
- Any CI checks to trigger are specific to their repository
Example: After every change, run unit tests and ensure there are no new lint errors introduced
As you can imagine, the devs in Platform teams have to raise pull requests to multiple repositories due to the high cross-team collaborative work involved, while developers in the Product teams can work on their product in a single repository for weeks at a time and never have to touch another repository.
By moving to a mono repo:
Therefore, we can conclude that by moving to a mono repo, the devs in Platform teams would have an improved experience and the devs in Product teams would have a degraded experience compared to the existing multi repo setup.
In Gojek, we have a split of 47% of devs working in Product teams and 53% working in Platform teams. This number doesn’t help us make a good call, because we’d be improving the experience for nearly half the devs, while deteriorating the experience for the other half.
Code ownership
Here, we evaluated the impact developers would have related to code ownership:
Projected CI utilisation
With the multi repo setup on iOS, each project has its own customisable CI setup, including:
- Custom Danger rules
- Custom unit test command invocations
- Custom SwiftLint rules
For moving to a mono repo, we needed data points around how many jobs would be executed in this new setup.
To get this data point,
- We extracted data from the CI jobs executed in the Android mono repo over the past 3 months since, post migration, the iOS setup would have a similar codebase and CI setup
- We attempted to project CI utilisation for our self-hosted Mac mini runners by looking at how many CI jobs are executed on the Android mono repo
- We had been publishing multiple data points such as number of jobs, job duration, branch of the triggered job and more to our in-house ELK setup
This helped us understand the peaks, lows, and average number of jobs executed in the Android mono repo each day.
We could then project these data points onto the hypothetical iOS mono repo, since we have roughly the same number of engineers and teams working on the Android and iOS apps.
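As a rough illustration of how the peaks, lows, and averages can be derived from exported job records, here is a minimal Python sketch. It assumes each record carries an ISO 8601 start timestamp; the function names and sample data are hypothetical and are not taken from our ELK setup.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def jobs_per_day(job_start_times):
    """Count CI jobs per calendar day from ISO 8601 start timestamps."""
    return Counter(datetime.fromisoformat(ts).date() for ts in job_start_times)


def summarise(daily_counts):
    """Return the peak, low, and average number of jobs per day.

    Days without any job are not present in the counter, so they are ignored.
    """
    counts = list(daily_counts.values())
    return {"peak": max(counts), "low": min(counts), "average": mean(counts)}


# Hypothetical sample records; real data would come from the CI logs.
sample = [
    "2021-03-01T09:15:00",
    "2021-03-01T11:40:00",
    "2021-03-01T16:05:00",
    "2021-03-02T10:30:00",
    "2021-03-03T08:00:00",
    "2021-03-03T14:20:00",
]
stats = summarise(jobs_per_day(sample))
print(stats)
```

The same grouping could be extended by branch or job duration if those fields are available in the export.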
With this data, we were able to identify the following:
- Around 200 pipelines would be run each day on the develop branch of the Android mono repo
- Our cluster of Mac mini runners is able to execute 400 jobs in a day; therefore, we would be able to meet the job demands if we were to move to a mono repo
- The above is true only if we are able to keep individual job times below 30 mins for iOS builds and test runs. Once jobs take longer than 30 mins, we start seeing runner availability bottlenecks because, unlike the Android CI setup, which runs on Google Cloud Linux instances, we cannot scale our self-hosted cluster of Mac minis on the fly to handle the demand. We have to forecast usage, procure new machines, and run the first-time setup before we can meet increased demand
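To see why job duration matters so much for a fixed pool of runners, consider this back-of-the-envelope Python sketch. It assumes every runner executes one job at a time around the clock, which gives an upper bound only. The runner count of 10 is a made-up figure for illustration, not the size of our cluster.

```python
def daily_job_capacity(runners, avg_job_minutes, hours_per_day=24):
    """Upper bound on jobs a fixed pool can finish per day.

    Assumes one job per runner at a time and no idle gaps between jobs.
    """
    if avg_job_minutes <= 0:
        raise ValueError("avg_job_minutes must be positive")
    return int(runners * hours_per_day * 60 // avg_job_minutes)


def has_headroom(expected_jobs, runners, avg_job_minutes):
    """True if the pool can absorb the expected daily job count."""
    return expected_jobs <= daily_job_capacity(runners, avg_job_minutes)


# Hypothetical pool of 10 runners; the real cluster size is not stated here.
for minutes in (30, 45):
    print(minutes, daily_job_capacity(10, minutes), has_headroom(400, 10, minutes))
```

With a fixed number of machines, capacity falls in inverse proportion to the average job time, which is why a pool that copes at 30 mins per job can fall short once jobs get slower.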
Finding alternatives to improve the existing multi repo workflow
In this section, we talk about issues that were raised by multiple developers due to the multi repo workflow, and we’ll also list down a few quick-win alternatives we found to reduce the impact of these issues in day-to-day development.
Fetching the stable branch for each release across repositories:
With every release of the app, we create a git tag in each repository, pointing at the commit which was used to create the combined iOS app shipped to the App Store.
For example: If a developer wants to check out the code of a previous release, they’d have to go to each submodule and check out the specific tag manually.
Similarly, getting the latest changes across all repos, without having to update the submodule hashes in the parent repo every time a change was merged into a submodule, was also a challenge.
💡Solution: We needed a mechanism which abstracted out the git workflow and allowed devs to specify what branches/commits/tags they want checked out across repositories. To solve this, we built a Ruby script that helps developers check out a particular tag or branch across all submodules. The script parses a yml file stored in the parent repo and checks out particular branches/tags across repositories according to the data in the yml file. Internally, the script uses the default git commands to fetch changes in each submodule and check out the branch/tag if it exists.
This script has also allowed us to skip asking developers to update the submodule hashes in the parent repo whenever a change is merged into a submodule, which is a common workflow when working with git submodules.
Today, developers use this script daily to fetch the latest changes across all submodules and they do not have to manually do a git checkout across submodules.
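The idea behind the script can be sketched in a few lines. Note that this is a Python illustration, not our actual Ruby script. It assumes a flat `path: ref` file layout (a small subset of YAML, parsed by hand to stay dependency-free), and the refs in the example are invented.

```python
import subprocess


def parse_refs(text):
    """Parse flat 'path: ref' lines (a small YAML subset) into a dict."""
    refs = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path, _, ref = line.partition(":")
        refs[path.strip()] = ref.strip()
    return refs


def checkout_commands(refs):
    """Build the git commands to fetch and check out each submodule's ref."""
    commands = []
    for path, ref in refs.items():
        commands.append(["git", "-C", path, "fetch", "--tags", "origin"])
        commands.append(["git", "-C", path, "checkout", ref])
    return commands


def run(commands):
    """Run each command; a failure (e.g. a missing ref) is reported, not fatal."""
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"skipped: {' '.join(cmd)}: {result.stderr.strip()}")


# Hypothetical file contents; the refs are made up for illustration.
example = """
Transport: release/4.10
GoFood: develop
"""
for cmd in checkout_commands(parse_refs(example)):
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to print a dry run before touching any working copy.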
Git workflow challenges with multiple repositories:
Devs working on changes across multiple repositories would often find it difficult to run “git status” across repositories to identify the changes made during feature development. Similarly, to create branches or commit changes across repositories, devs had to run the same git command over and over again for each repository that had changes.
💡Suggested workaround: We suggested developers try out tools such as the Meta tool. This allows them to run common git commands such as `git status` or `git push` across all submodules once a `meta.json` file defines the list of submodules and the remote for each submodule.
Reproducible builds
You might have come across a common concern with multi repo setups: It’s difficult to get a reproducible build whenever we want to. This is because the code lives in multiple repositories, meaning, if a developer triggers a build, it will be unique only for that timeframe. By the time a developer triggers the next build, there could have been pull requests merged into dependent repositories, which would mean the next build is not the same as the build triggered, say, an hour ago.
Before we deep dive into the solution for this problem, let me quickly share what our repository setup looks like and how developers trigger builds.
We have a parent repository called “ios-ca-gojek” which has all the product-specific code added as a git submodule to this repository. Therefore, Transport, GoFood, and GoPay each have their own git repository, which is linked to the parent repository “ios-ca-gojek” using git submodules.
To create a build, developers have to trigger a GitLab pipeline on the ios-ca-gojek repository. The jobs in this repository check out the source code for the parent repo and for all the submodules and then use `Fastlane/xcodebuild` to create IPA files.
We’ve solved the problem of reproducible builds by adding a job called “prepare” that runs before the build jobs. This job checks out our source code across all repos, zips it up, and forwards it to multiple build jobs for debug and release builds. A mapping of each submodule to its commit hash is also created at the same time. This allows devs to debug issues reported by QAs by checking out the commit SHAs of a particular build on their local machines.
Preparing the job in our CI pipeline
Further, after every release, we tag the commit hashes of each submodule and the parent repo with the same tag. This allows devs to go back to a much earlier version anytime as needed.
The “prepare” job helps alleviate one of the biggest challenges of multiple repository setup i.e. reproducible builds.
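One way to produce the submodule-to-commit mapping mentioned above is to parse the output of `git submodule status`, where each line holds a one-character status flag, the commit SHA, the submodule path, and an optional description in parentheses. The following Python sketch is an assumption about how this could be done, not a copy of our “prepare” job; the SHAs and version labels in the sample are invented.

```python
import json


def parse_submodule_status(output):
    """Map each submodule path to its commit SHA from `git submodule status` output."""
    mapping = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # First character is a status flag (' ', '-', '+' or 'U'),
        # followed by the SHA and the submodule path.
        fields = line[1:].split()
        sha, path = fields[0], fields[1]
        mapping[path] = sha
    return mapping


# Hypothetical output; SHAs and version labels are invented for illustration.
sample = (
    " 3f2a9c1d5e7b4a6f8091a2b3c4d5e6f708192a3b Transport (v4.10.0)\n"
    "+9b8c7d6e5f4a3b2c1d0e9f8a7b6c5d4e3f2a1b0c GoFood (v4.10.0-3-g9b8c7d6)\n"
)
mapping = parse_submodule_status(sample)
print(json.dumps(mapping, indent=2))
```

Storing this mapping next to the build artefact lets anyone check out exactly the commits that went into a given build.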
Changes to release processes
We follow a code freeze (aka release train) process in Gojek. Every two weeks we fork a branch for the current release from develop. This process is the same in Android and iOS apps.
With the multi repo setup in iOS, instead of forking from develop on a single repository, we have to do this across 44 submodules at the same time.
Hence, to solve this, we’ve created Ruby scripts that are manually triggered to create these forked branches on every release.
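A simplified Python sketch of what such a branch-cut script does is shown below. It is not our Ruby implementation. The release branch name and the remote name `origin` are assumptions made for the example.

```python
def fork_commands(submodule_paths, release_branch, base="develop"):
    """Build git commands that cut a release branch from the base branch in every repo."""
    commands = []
    for path in submodule_paths:
        # Refresh the remote-tracking base branch, create the release branch
        # from it, then publish the new branch to the remote.
        commands.append(["git", "-C", path, "fetch", "origin", base])
        commands.append(["git", "-C", path, "branch", release_branch, f"origin/{base}"])
        commands.append(["git", "-C", path, "push", "origin", release_branch])
    return commands


# Hypothetical release branch name.
plan = fork_commands(["Transport", "GoFood", "GoPay"], "release/4.11")
for cmd in plan:
    print(" ".join(cmd))
```

Printing the plan first gives the release manager a chance to review it before anything is pushed.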
The release process would be improved significantly if we move to a mono repo setup, since there’s only a single branch to fork, and it’ll also be easier for developers to go back to older versions with a single switch.
Surveying developers
To identify how important each of the above points was to individual developers, we ran a quick survey, which was open for 3 days, with MCQ-type questions. For example, we asked developers:
- “How frequently do you have to checkout code from an older release?”
- “How frequently do you have to checkout code across multiple repositories to be able to debug a particular bug?”
…and so on.
The survey showed that the majority of devs don’t really care about reproducible builds or the release process as such. They rarely have to go back to code older than a month.
The survey helped us give higher or lower weightage to the various topics we’ve discussed in this post.
Taking the final call
After evaluating the above impact areas, we assigned each one a weightage-based score according to how important it was, and used these scores to derive a number which depicted whether moving to a mono repo was worthwhile for our teams at Gojek. For Gojek, the numbers came out in favour of staying in the multi repo setup.
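If you want to try a similar scoring exercise, a weighted average is one simple way to do it. The Python sketch below assumes each impact area gets a score between -1 (favours multi repo) and +1 (favours mono repo) and a weight reflecting its importance. All numbers are invented for illustration and do not reproduce our actual scores.

```python
def weighted_score(scores, weights):
    """Weighted average of per-area scores; a positive result favours the mono repo."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same impact areas")
    total_weight = sum(weights.values())
    return sum(scores[area] * weights[area] for area in scores) / total_weight


# Hypothetical scores and weights, not the values used at Gojek.
scores = {"release process": 1, "CI utilisation": -1, "team workflow": 0}
weights = {"release process": 1, "CI utilisation": 3, "team workflow": 2}

result = weighted_score(scores, weights)
print(round(result, 2))
```

A result below zero suggests staying with the multi repo setup, and a result above zero suggests migrating. The outcome is only as good as the weights, which is why a developer survey is a useful input.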
Doing this exercise helped us take a data-driven decision around what code structure we should follow for the iOS codebase, specific to the requirements and processes at Gojek.
If you’re in a similar situation, you can try following a similar process in your org too.