Is GitHub Copilot Worth It? Real-World Data Reveals the Answer
GitHub execs say Copilot aims to make developers 10x more productive. Being the data-driven folks that we are, we put it to the test.
Thomas Gerber
May 17, 2024
Lately, there’s been a lot of chatter about AI in our developer circles. Every peer I speak to is excited about integrating AI-powered coding assistants into their workflows: they see the massive potential, and early adopters are enthusiastic. But before committing to a longer-term adoption strategy, they want to figure out whether using AI dev tools like GitHub Copilot is actually worth it.
GitHub execs say they aim to make developers 10x more productive. So, being the data-driven folks that we are, back in the summer of 2023, we decided to put it to the test.
Since then, we’ve accompanied many companies through their evaluation of copilots from initial pilots to large-scale deployments. We’ve helped them select the right AI pair programming tool for their organization; increase adoption to maximize developer productivity; and monitor the impacts on value (velocity) and safety (quality and security).
Introduction
GitHub Copilot is an AI-powered coding assistant that's been making waves since its official launch back in October 2021. With a reported 50,000+ companies adopting the technology so far, the big questions still on everyone's minds are: Does it live up to the hype? Should it become the default for every single developer?
Well, instead of relying on hearsay, we ran a good old-fashioned experiment at our company. Here's what we found.
Background
To keep things fair and square, we split our team into two random cohorts — one armed with GitHub Copilot (around a third of our developers) and the other without. We made sure the cohorts were not biased in any way (e.g., that one wasn’t stacked exclusively with our most productive developers).
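If you want to replicate this kind of split yourself, here is a minimal sketch of one way to do a stratified random assignment so neither cohort ends up stacked with the most productive developers. The function name, the `(developer, prs_per_month)` input shape, and the one-third Copilot allocation are illustrative assumptions, not the exact procedure we used:

```python
import random
from collections import defaultdict

def assign_cohorts(devs_with_throughput, seed=42):
    """Stratified random split into 'copilot' and 'control' cohorts.

    `devs_with_throughput` is a list of (developer, prs_per_month) tuples
    (hypothetical shape). Developers are bucketed into terciles by historical
    throughput, then shuffled within each bucket so neither cohort is stacked
    with the most productive people. Roughly a third get Copilot.
    """
    random.seed(seed)
    ranked = sorted(devs_with_throughput, key=lambda d: d[1])
    buckets = defaultdict(list)
    for i, (dev, _) in enumerate(ranked):
        buckets[i * 3 // len(ranked)].append(dev)  # 0 / 1 / 2 = low / mid / high throughput

    cohorts = {"copilot": [], "control": []}
    for members in buckets.values():
        random.shuffle(members)
        cutoff = max(1, len(members) // 3)  # ~1/3 of each bucket gets Copilot
        cohorts["copilot"].extend(members[:cutoff])
        cohorts["control"].extend(members[cutoff:])
    return cohorts
```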
Over three months, we closely monitored various performance metrics, focusing on speed, throughput, and quality. Our goal? A clear, unbiased view of GitHub Copilot's impact.
Why these metrics? They're tangible and measurable, and they directly impact our deliverables. They also give us a holistic picture: we don’t want to gain speed if there’s a huge price to pay in quality. Finally, they would point to areas of our practices or processes we’d need to strengthen if we decided to fully go down the GitHub Copilot route.
Results
The data was pretty revealing. The group using GitHub Copilot consistently outperformed the other cohort in terms of speed and throughput over the evaluation period (May-September 2023).
Let’s start with throughput.
Over the pilot period, the GitHub Copilot cohort gradually began to outpace the other cohort in terms of the sheer number of PRs.
Pull Request Merge Rate cohort comparison, with and without GitHub Copilot
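As an illustration of the throughput metric, here is a minimal sketch of how merged-PR throughput per cohort could be computed with pandas. The `prs` DataFrame and its `author`, `cohort`, and `merged_at` columns are hypothetical; in practice these numbers came from Faros AI rather than a hand-rolled script.

```python
import pandas as pd

def weekly_merge_rate(prs: pd.DataFrame) -> pd.DataFrame:
    """Merged PRs per developer per week, split by cohort.

    Assumes one row per merged PR with hypothetical columns:
    author, cohort ("copilot" / "control"), merged_at (datetime).
    """
    prs = prs.copy()
    prs["week"] = prs["merged_at"].dt.to_period("W")
    counts = prs.groupby(["cohort", "week"]).agg(
        merged_prs=("merged_at", "size"),
        developers=("author", "nunique"),
    )
    # Normalize by cohort size so the smaller cohort isn't penalized
    counts["prs_per_dev"] = counts["merged_prs"] / counts["developers"]
    return counts.reset_index()
```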
Next up, I looked at speed.
I examined the Median Merge Time to see how quickly code was being merged into the codebase. The GitHub Copilot cohort’s code was consistently merged approximately 50% faster. The Copilot cohort improved relative to its previous performance and relative to the other cohort.
Median Merge Time cohort comparison, with and without GitHub Copilot
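For reference, Median Merge Time boils down to the median of (merge timestamp minus PR creation timestamp) per cohort. A minimal sketch, assuming a hypothetical `prs` DataFrame with `cohort`, `created_at`, and `merged_at` columns:

```python
import pandas as pd

def median_merge_time_hours(prs: pd.DataFrame) -> pd.Series:
    """Median hours from PR creation to merge, per cohort.

    Assumes hypothetical datetime columns created_at and merged_at,
    plus a cohort label per PR.
    """
    merge_hours = (prs["merged_at"] - prs["created_at"]).dt.total_seconds() / 3600
    return merge_hours.groupby(prs["cohort"]).median()
```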
The most important speed metric, though, is Lead Time to production. I wanted to make sure that the acceleration in development wasn’t being negated by longer time spent in subsequent stages like Code Review or QA.
It was great to see that Lead Time decreased by 55% for the PRs generated by the GitHub Copilot cohort (similar to GitHub’s own research), with most of the time savings generated in the development (“Time in Dev”) and code review (“First Review Time”) stages.
Lead Time comparison with cycle time breakdown, with and without GitHub Copilot
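The stage breakdown works the same way: each stage is the elapsed time between two per-PR timestamps. A minimal sketch, assuming hypothetical `first_commit_at`, `review_requested_at`, `first_review_at`, `merged_at`, and `deployed_at` columns (your exact definitions of “Time in Dev” and “First Review Time” may differ):

```python
import pandas as pd

def lead_time_breakdown_hours(prs: pd.DataFrame) -> pd.DataFrame:
    """Median Lead Time to production per cohort, split into stages.

    Assumes hypothetical per-PR timestamps: first_commit_at,
    review_requested_at, first_review_at, merged_at, deployed_at.
    """
    def hours(start: str, end: str) -> pd.Series:
        return (prs[end] - prs[start]).dt.total_seconds() / 3600

    stages = pd.DataFrame({
        "cohort": prs["cohort"],
        "time_in_dev": hours("first_commit_at", "review_requested_at"),
        "first_review_time": hours("review_requested_at", "first_review_at"),
        "review_to_merge": hours("first_review_at", "merged_at"),
        "merge_to_deploy": hours("merged_at", "deployed_at"),
        "lead_time": hours("first_commit_at", "deployed_at"),
    })
    return stages.groupby("cohort").median()
```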
The last dimension I analyzed was code quality and code security, where I looked at three metrics: Code Coverage, Code Smells, and Change Failure Rate.
- Code Coverage improved, which didn’t surprise me. Copilot is very good at writing tests.
- Code Smells increased slightly but were still beneath an acceptable threshold.
- Change Failure Rate — the most important metric to me together with Lead Time — held steady. (A sketch of how this metric can be computed follows below.)
Code Coverage comparison, with and without GitHub Copilot
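Change Failure Rate is simply the fraction of production deployments that caused a failure, per cohort. A minimal sketch, assuming a hypothetical `deployments` DataFrame with a boolean `caused_incident` column:

```python
import pandas as pd

def change_failure_rate(deployments: pd.DataFrame) -> pd.Series:
    """Fraction of production deployments that caused a failure, per cohort.

    Assumes a hypothetical boolean column caused_incident (e.g., the deploy
    triggered a rollback, hotfix, or incident ticket).
    """
    return deployments.groupby("cohort")["caused_incident"].mean()
```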
Analysis
But why did Copilot make such a noticeable difference? The engineers in our Copilot cohort said the boost was largely due to no longer starting from a blank page. It’s easier to edit an AI-generated suggestion than to start from scratch: you become an editor instead of a journalist. In addition, Copilot is great at writing unit tests quickly.
But not all AI coding assistants are created equal, and the time savings can vary greatly depending on the tool used. For example, one of our clients ran a bakeoff between two of the leading AI coding tools on the market, and one of them saved three hours more per developer per week than the other.
Cost-Benefit Analysis
Now, the juicy bit: Is the performance boost worth the cost? For us, the answer's leaning towards a solid "yes." A 55% improvement in lead time with no collateral damage to code quality is a phenomenal ROI. But, of course, every team's dynamics are different. If you're weighing the costs, consider not just the subscription fee but the potential long-term benefits in productivity and effects on code quality.
Don't have budget for Copilot? Read our guide to getting approval for AI tools outside normal budgeting cycles.
Tips for Conducting Your Own Assessment
As I mentioned, lots of my peers want to run a similar analysis at their org. Today it’s GitHub Copilot; tomorrow it’ll be something else.
What made generating this comparison easy for me was threefold:
- I’m already tracking developer productivity metrics in Faros AI, based on the data it knits together from Jira, GitHub, Buildkite, Statuspage, and PagerDuty.
- Unlike cookie-cutter metrics tools, Faros AI has a complete, flexible BI layer that made it easy for me to define my two cohorts and create a custom dashboard for this specific analysis. It took me just a few minutes to generate my GitHub Copilot analysis dashboard.
- I could easily generate a holistic view of adoption, usage, velocity, and quality metrics based on the combination of system telemetry and developer surveys. This data helped me benchmark short-term impacts and identify emerging bottlenecks.
Today, Faros AI provides a complete value framework for AI evaluation and adoption, from the initial rollout to larger-scale deployments and long-term value optimization. It’s much more sophisticated and comprehensive than the prototype I used last year.
Watch this five-minute tour of these dashboards:
Conclusion
So, back to our main question: Is GitHub Copilot worth the investment? Our data shouts a resounding "yes." But hey, tools are only as good as how we use them. It might be the perfect fit for some, while others might find alternative methods more suited to their workflow. Plus, if you have bottlenecks in your review, build, and test cycles, your efficiency gains may be reduced.
The next big question organizations are going to face is where to direct the developer productivity they’ve just unleashed. There’s no shortage of roadmap initiatives and technical debt for folks to sink their teeth into, but teams should be setting those priorities with intentionality.
If you’re going to embrace GitHub Copilot, you need to have a plan. Our AI Copilot Evaluation solution provides comprehensive visibility into the impact of GitHub Copilot — from pilot to rollout to optimization — so engineering leaders can communicate value and ROI with confidence. Request a demo to get started.