GitHub execs say Copilot aims to make developers 10x more productive. Being the data-driven folks that we are, we put it to the test.
Lately, there’s been a lot of chatter about AI in our developer circles. Every peer I speak to tells me they’re excited about integrating AI-powered coding assistants into their workflows, because they see the massive potential of these tools and the enthusiasm of early adopters. But before committing to a long-term adoption strategy, they’d like to know whether AI dev tools like GitHub Copilot are worth it.
GitHub execs say they aim to make developers 10x more productive. So, being the data-driven folks that we are, back in the summer of 2023, we decided to put it to the test.
Since then, we’ve accompanied many companies through their evaluation of AI copilots, from initial pilots to large-scale deployments. We’ve helped them select the right AI pair programming tool for their organization, increase adoption to maximize developer productivity, and monitor the impact on value (velocity) and safety (quality and security).
GitHub Copilot is an AI-powered coding assistant that's been making waves since its official launch back in October 2021. With a reported 50,000+ companies adopting the technology so far, the big questions still on everyone's minds are: Does it live up to the hype? Should it become the default for every single developer?
Well, instead of relying on hearsay, we ran a good old-fashioned experiment at our company. Here's what we found.
To keep things fair and square, we randomly split our team into two cohorts: one armed with GitHub Copilot (around a third of our developers) and one without. We checked that the cohorts were not biased (e.g., that one wasn’t stacked exclusively with our most productive developers).
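The post doesn't say exactly how the cohorts were checked for bias. One common approach is re-randomization: draw random splits until a pre-pilot metric is roughly balanced between the groups. The sketch below assumes a per-developer baseline PR count is available; the function names, the 10% tolerance, and the toy data are all hypothetical, not Faros AI's actual method.

```python
import random
from statistics import mean


def assign_cohorts(developers, rng, copilot_share=1 / 3):
    """Randomly put roughly `copilot_share` of developers in the Copilot cohort."""
    shuffled = list(developers)
    rng.shuffle(shuffled)
    n_copilot = round(len(shuffled) * copilot_share)
    return shuffled[:n_copilot], shuffled[n_copilot:]


def baseline_gap(copilot, control, baseline_prs):
    """Relative gap in mean pre-pilot PR throughput between the two cohorts."""
    copilot_mean = mean(baseline_prs[d] for d in copilot)
    control_mean = mean(baseline_prs[d] for d in control)
    return abs(copilot_mean - control_mean) / control_mean


def balanced_split(developers, baseline_prs, tolerance=0.10, max_tries=1000, seed=0):
    """Re-draw random splits until the baseline gap is within `tolerance` (hypothetical threshold)."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        copilot, control = assign_cohorts(developers, rng)
        if baseline_gap(copilot, control, baseline_prs) <= tolerance:
            return copilot, control
    raise RuntimeError("no balanced split found; consider loosening the tolerance")


if __name__ == "__main__":
    devs = [f"dev{i}" for i in range(30)]
    # Toy pre-pilot PR counts per developer (made-up numbers).
    baseline = {d: 3 + i % 7 for i, d in enumerate(devs)}
    copilot, control = balanced_split(devs, baseline)
    print(len(copilot), len(control))  # 10 20
```

Stratifying by team or seniority before shuffling is an alternative when a single baseline metric isn't enough to capture "most productive developers."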
Over three months, we closely monitored various performance metrics, focusing on speed, throughput, and quality. Our goal? A clear, unbiased view of GitHub Copilot's impact.
Why these metrics? They're tangible and measurable, and they directly affect our deliverables. Together, they also give us a holistic picture: we don’t want to gain speed if there’s a huge price to pay in quality. Finally, they would give us a good indication of areas we might need to strengthen in our practices or processes if we decided to fully go down the GitHub Copilot route.
The data was pretty revealing. The group using GitHub Copilot consistently outperformed the other cohort in terms of speed and throughput over the evaluation period (May-September 2023).
Let’s start with throughput.
Over the pilot period, the GitHub Copilot cohort gradually began to outpace the other cohort in terms of the sheer number of PRs.
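Because the Copilot cohort was only about a third of the team, raw PR counts have to be normalized by cohort size before the two groups can be compared. A minimal sketch, assuming merged PRs are available as (author, merge date) pairs; the function names and sample data are hypothetical.

```python
from collections import Counter
from datetime import date


def weekly_pr_counts(prs, cohort_of):
    """Count merged PRs per (cohort, ISO year, ISO week).

    prs: iterable of (author, merged_on) pairs, where merged_on is a datetime.date
    cohort_of: dict mapping each author to a cohort label
    """
    counts = Counter()
    for author, merged_on in prs:
        iso_year, iso_week, _ = merged_on.isocalendar()
        counts[(cohort_of[author], iso_year, iso_week)] += 1
    return counts


def per_developer(counts, cohort_sizes):
    """Normalize weekly counts by cohort size, since the cohorts differ in size."""
    return {key: n / cohort_sizes[key[0]] for key, n in counts.items()}


if __name__ == "__main__":
    prs = [
        ("alice", date(2023, 5, 1)),
        ("alice", date(2023, 5, 2)),
        ("bob", date(2023, 5, 3)),
        ("carol", date(2023, 5, 8)),
    ]
    cohort_of = {"alice": "copilot", "bob": "control", "carol": "control"}
    counts = weekly_pr_counts(prs, cohort_of)
    print(per_developer(counts, {"copilot": 1, "control": 2}))
```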
Next up, I looked at speed.
I examined Median Merge Time to see how quickly code was being merged into the codebase. The GitHub Copilot cohort’s code was consistently merged approximately 50% faster, an improvement both relative to the cohort’s own previous performance and relative to the other cohort.
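A "percent faster" comparison of median merge times comes down to a short calculation. The sketch below assumes each PR is represented by its opened and merged timestamps; the helper names are hypothetical and the numbers are toy values, not the pilot's data.

```python
from datetime import datetime, timedelta
from statistics import median


def median_merge_hours(prs):
    """Median hours from PR opened to PR merged.

    prs: iterable of (opened_at, merged_at) datetime pairs
    """
    return median((merged - opened).total_seconds() / 3600 for opened, merged in prs)


def percent_faster(treatment_median, control_median):
    """How much faster (in %) the treatment cohort merges than the control cohort."""
    return 100 * (control_median - treatment_median) / control_median


if __name__ == "__main__":
    start = datetime(2023, 6, 1, 9, 0)
    # Toy data: hours-to-merge of 6, 18, 20 (Copilot) and 20, 24, 30 (control).
    copilot = [(start, start + timedelta(hours=h)) for h in (6, 18, 20)]
    control = [(start, start + timedelta(hours=h)) for h in (20, 24, 30)]
    print(percent_faster(median_merge_hours(copilot), median_merge_hours(control)))  # 25.0
```

The median is used rather than the mean so that a handful of long-lived PRs doesn't dominate the comparison.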
The most important speed metric, though, is Lead Time to production. I wanted to make sure that the acceleration in development wasn’t being negated by longer time spent in subsequent stages like Code Review or QA.
It was great to see that Lead Time decreased by 55% for the PRs generated by the GitHub Copilot cohort (similar to GitHub’s own research), with most of the time savings generated in the development (“Time in Dev”) and code review (“First Review Time”) stages.
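Attributing lead-time savings to specific stages means treating lead time as the sum of per-stage durations. A minimal sketch follows. Only "Time in Dev" and "First Review Time" come from the text; the other stage names, the helpers, and all numbers are hypothetical.

```python
# Hypothetical stage names; only "Time in Dev" and "First Review Time" come from the text.
STAGES = ("time_in_dev", "first_review_time", "review_to_merge", "merge_to_deploy")


def lead_time(stage_hours):
    """Lead time as the sum of (mean) hours spent in each stage."""
    return sum(stage_hours[s] for s in STAGES)


def stage_savings(before, after):
    """Hours saved per stage, plus overall percent reduction in lead time."""
    saved = {s: before[s] - after[s] for s in STAGES}
    reduction = 100 * (lead_time(before) - lead_time(after)) / lead_time(before)
    return saved, reduction


if __name__ == "__main__":
    # Made-up mean hours per stage, not the pilot's data.
    before = {"time_in_dev": 40, "first_review_time": 20, "review_to_merge": 10, "merge_to_deploy": 30}
    after = {"time_in_dev": 15, "first_review_time": 5, "review_to_merge": 10, "merge_to_deploy": 25}
    saved, reduction = stage_savings(before, after)
    print(saved)      # {'time_in_dev': 25, 'first_review_time': 15, 'review_to_merge': 0, 'merge_to_deploy': 5}
    print(reduction)  # 45.0
```

The sketch uses per-stage means because means add up to the mean lead time, while per-stage medians generally do not sum to the median lead time.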
The last dimension we analyzed was code quality and code security, where I looked at three metrics: Code Coverage, Code Smells, and Change Failure Rate.
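Code Coverage and Code Smells usually come straight from testing and static-analysis tooling, but Change Failure Rate is simple arithmetic: the share of deployments that cause a failure in production. A minimal sketch, assuming each deployment can be attributed to a single cohort (a simplification when changes from both cohorts ship together). The function name and data are hypothetical.

```python
from collections import defaultdict


def change_failure_rate_by_cohort(deployments):
    """Share of deployments that caused a production failure, per cohort.

    deployments: iterable of (cohort, caused_failure) pairs
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for cohort, caused_failure in deployments:
        totals[cohort] += 1
        failures[cohort] += int(caused_failure)
    return {cohort: failures[cohort] / totals[cohort] for cohort in totals}


if __name__ == "__main__":
    # Toy deployment log, not the pilot's data.
    log = [
        ("copilot", False), ("copilot", True), ("copilot", False), ("copilot", False),
        ("control", False), ("control", True),
    ]
    print(change_failure_rate_by_cohort(log))  # {'copilot': 0.25, 'control': 0.5}
```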
But why did Copilot make such a noticeable difference? The engineers in our Copilot cohort said the boost is largely due to no longer starting from a blank page. It’s easier to edit an AI-driven suggestion than to start from scratch. You become an editor instead of a journalist. In addition, Copilot is great at writing unit tests quickly.
But not all AI coding assistants are created equal, and the time savings can vary greatly depending on the tool. For example, one of our clients ran a bakeoff between two of the leading AI coding tools on the market, and one tool saved each developer three more hours per week than the other.
Now, the juicy bit: Is the performance boost worth the cost? For us, the answer's leaning towards a solid "yes." A 55% reduction in lead time with no collateral damage to code quality is a phenomenal ROI. But, of course, every team's dynamics are different. If you're weighing the costs, consider not just the subscription fee but also the potential long-term effects on productivity and code quality.
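For a first-pass cost comparison, the subscription fee can be set against the value of the time saved. This is a back-of-the-envelope sketch: the hours saved, loaded hourly cost, and seat price are inputs you would supply from your own data, and the function names are hypothetical. It deliberately ignores quality effects and rollout costs, which the text above says also belong in the decision.

```python
WEEKS_PER_MONTH = 52 / 12


def monthly_roi(hours_saved_per_week, loaded_hourly_cost, seat_price_per_month,
                weeks_per_month=WEEKS_PER_MONTH):
    """Net monthly return per seat, expressed as a multiple of the seat price."""
    value = hours_saved_per_week * weeks_per_month * loaded_hourly_cost
    return (value - seat_price_per_month) / seat_price_per_month


def break_even_hours_per_week(loaded_hourly_cost, seat_price_per_month,
                              weeks_per_month=WEEKS_PER_MONTH):
    """Weekly hours a developer must save for the seat to pay for itself."""
    return seat_price_per_month / (loaded_hourly_cost * weeks_per_month)


if __name__ == "__main__":
    # Illustrative inputs only: 1 hour/week saved, $60/hour, $20/seat/month, 4-week month.
    print(monthly_roi(1, 60, 20, weeks_per_month=4))            # 11.0
    print(break_even_hours_per_week(60, 20, weeks_per_month=4))
```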
As I mentioned, lots of my peers want to run a similar analysis at their orgs. Today it’s GitHub Copilot, tomorrow it’ll be something else.
What made generating this comparison easy for me was three-fold:
Today, Faros AI provides a complete value framework for AI evaluation and adoption, from the initial rollout to larger-scale deployments and long-term value optimization. This is much more sophisticated and comprehensive than the prototype I used last year.
So, back to our main question: Is GitHub Copilot worth the investment? Our data shouts a resounding "yes." But hey, tools are only as good as how we use them. It might be the perfect fit for some, while others might find alternative methods more suited to their workflow. Plus, if you have bottlenecks in your review, build, and test cycles, your efficiency gains may be reduced.
The next big question organizations are going to face is where to direct the developer productivity they’ve just unleashed. There’s no shortage of roadmap initiatives and technical debt for folks to sink their teeth into, but teams should be setting those priorities with intentionality.
If you’re going to embrace GitHub Copilot, you need to have a plan. Our AI Copilot Evaluation solution provides comprehensive visibility into the impact of GitHub Copilot — from pilot to rollout to optimization — so engineering leaders can communicate value and ROI with confidence. Request a demo to get started.