How much code is AI-generated?
AI generates more than 25% of Google’s new code. Other organizations seek similar insights to mitigate the risks of this new age of AI-driven development.
Ron Meldiner
November 22, 2024
AI-powered coding tools are transforming the software development landscape and are becoming more essential than ever. Google, a leader in AI adoption (and creator of Gemini Code Assist), has set a benchmark: AI systems now generate over 25% of new code for Google’s products. This revelation, shared by CEO Sundar Pichai, underscores the strategic value of tracking AI’s impact on productivity, quality, and efficiency. These are the insights that drive Google’s AI investments and decision-making.
But not every organization is Google. Most companies lack the internal infrastructure to capture such detailed metrics. As a result, they struggle to quantify how much code AI tools generate and how it may influence their codebase, both now and in the future.
Fortunately, incorporating data directly from the development environment can fill this gap, allowing a broader range of companies to track AI-generated contributions effectively.
Why understanding human vs. AI contribution matters
Understanding the difference between human and AI-generated code isn’t just about curiosity; it's crucial to navigating the modern software development landscape.
Inevitably, AI adoption will only increase, bringing many blessings but potentially some curses. Without proper tracking and understanding of AI’s role in the development process, companies could find themselves dealing with the fallout of new technical debt or vulnerabilities, both accumulated silently over time.
By maintaining visibility into the use and impact of AI-generated code, engineering teams can proactively manage and respond to changes in behavior, ensuring that their codebases remain robust and predictable.
Being able to tell when code is AI-generated matters for several reasons.
Key reasons for understanding human vs. AI code contribution
Long-term codebase viability
- Maintainability: The longevity and health of a codebase are deeply influenced by the origin of its content. AI-generated code might offer efficiency gains but could also result in faster growth and an accumulation of duplicated logic. Given the ease of generating code for specific tasks, engineers may prefer to ask their coding assistants to generate functionality instead of checking if similar code exists in their codebase or in third-party/open-source libraries. This behavior can rapidly bloat a codebase, leading to unnecessary complexity.
- Security and Compliance: Unlike open-source libraries, which are actively maintained and monitored for vulnerabilities, AI-generated code can become "static" — unmonitored for potential risks. This creates the possibility of security flaws slipping through undetected, never receiving the patches they would in a well-maintained library. Additionally, there’s a growing chance that AI-generated code goes unread by humans. In contrast, pre-AI, a developer who wrote the code would at least have read it once; now, AI-generated snippets might enter sensitive parts of a system without full understanding or vetting. This amplifies the need for vigilant monitoring to mitigate risks.
Code quality
- Readability and organization: The convenience of generating large sections of code through AI can sometimes lead to less readable or logically structured code. Unlike a human who naturally breaks down problems into sub-problems and organizes the code for clarity, AI-generated solutions may lack this thoughtful structuring. Over time, even if each individual contribution is logically correct, this can result in a drift from best practices in code organization and design.
- Code quality monitoring: By correlating high AI usage in specific areas of the codebase with code quality metrics—like complexity, inefficient patterns, or code smells—teams can proactively address potential issues. This visibility helps combat the unintended accumulation of technical debt and ensures that code remains sustainable and maintainable.
Strategic workforce implications
- Mentorship and training: AI is reshaping the development landscape, impacting how junior developers learn and grow. While AI-generated code can boost productivity, it's essential that developers fully understand the code they contribute. Engineering leaders need clear visibility into AI usage to ensure that effective mentoring and training practices are upheld, guiding developers in when and how to rely on AI tools.
- Propagating best practices: AI practices that work well in specific teams or parts of the codebase should be shared across the organization. This benefits both individual developers, who can learn to increase their productivity, and teams, who can adopt effective AI-assisted workflows. Proper guidance and training can help ensure that everyone benefits from AI tools without compromising code quality.
Personal professional evolution
As AI tools continue to play a bigger role in development, developers need to monitor their reliance on these tools to ensure they're not losing essential coding skills.
Having visibility into their own AI usage—compared to peers—allows individuals to gauge their progress and adjust as needed. This insight helps them stay effective at reading, understanding, and troubleshooting AI-generated code, maintaining their capability as skilled engineers even in an AI-augmented environment.
Balancing AI efficiency with core coding skills is crucial for both personal growth and professional effectiveness.
A panel within the IDE shows developers the impact of AI on their daily work
Why is it hard to tell when code is AI-generated?
The challenge of identifying AI-generated code lies in the complexity of modern coding practices. Developers are no longer limited to manually typing every line of code; instead, they draw on a variety of tools and resources:
- IntelliSense and autocomplete: Features in IDEs accelerate coding by suggesting completions for partially typed code.
- Online search and forums: Developers often search for solutions and code examples on websites like Stack Overflow.
- Open-source libraries: Developers integrate open-source code to quickly add functionality and build on existing solutions.
- Coding assistants: Pair programming tools like GitHub Copilot, Amazon Q Developer, Google Gemini, Codeium, Tabnine, and Sourcegraph’s Cody offer AI-driven code suggestions in real time.
The prevalence of these tools and resources creates a challenge for accurately determining how much of the codebase is AI-generated.
Coding assistant vendors can only provide statistics about their specific service, showing how often developers accept suggestions or utilize AI-generated snippets. But they lack visibility into what developers do outside of their platforms—whether they use other coding aids, search online for examples, or incorporate open-source code.
Instrumentation of the developer's environment is essential to accurately determining the ratio of AI-generated code to human-written code.
By capturing data directly from the development process, it's possible to get a holistic view of all code contributions, whether they come from coding assistants, traditional autocomplete tools, manual typing, or external sources. This holistic approach provides the visibility needed to understand AI’s true impact on the software development workflow.
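As a rough illustration of what such instrumentation might record, the sketch below models each code insertion as an event tagged with its origin and computes the AI share of inserted characters. The `CodeEvent` type, the `Source` categories, and the sample numbers are hypothetical and are not taken from any particular tool.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Hypothetical origins of inserted code, mirroring the list above."""
    AI_ASSISTANT = "ai_assistant"
    AUTOCOMPLETE = "autocomplete"
    MANUAL = "manual"
    PASTE = "paste"  # e.g. copied from a forum or an open-source project


@dataclass(frozen=True)
class CodeEvent:
    source: Source
    chars: int  # number of characters inserted by this event


def ai_share(events: list[CodeEvent]) -> float:
    """Fraction of inserted characters that came from an AI assistant."""
    total = sum(e.chars for e in events)
    if total == 0:
        return 0.0
    ai = sum(e.chars for e in events if e.source is Source.AI_ASSISTANT)
    return ai / total


# Made-up sample session: 1,000 characters inserted in total.
events = [
    CodeEvent(Source.MANUAL, 400),
    CodeEvent(Source.AI_ASSISTANT, 300),
    CodeEvent(Source.AUTOCOMPLETE, 100),
    CodeEvent(Source.PASTE, 200),
]
print(f"AI share: {ai_share(events):.0%}")  # 300 / 1000 -> 30%
```

Counting characters is only one possible unit; lines or tokens would work the same way, as long as every source is measured consistently.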
AI coding assistant APIs don’t answer these questions
Only a few modern coding assistants offer APIs that provide a glimpse into their usage—and when they do, it’s typically in aggregate across the entire engineering organization or sub-group.
Coding assistants provide:
- Acceptance rates: The percentage of AI-generated suggestions accepted by developers.
- Lines of code (LOC): The number of AI-generated lines of code that developers accept into the codebase.
- Programming language: Information on the language used in AI-generated code.
While these statistics are useful, they leave significant gaps in understanding how AI is transforming software development:
- What percentage of new code is AI-generated? Acceptance rates alone don't provide a full picture. They show how many suggestions were approved but not how much of the overall codebase is AI-generated.
- What types of code is AI creating? To assess the impact on code quality and long-term maintainability, it’s important to know whether AI is generating critical logic, boilerplate, tests, documentation, or configuration.
- Where in the codebase is AI making contributions? Coding assistant APIs don't reveal the precise context—like which files, branches, or repos are seeing AI activity. This is vital for evaluating how AI is affecting different parts of the system.
- How current is the data? Coding assistant metrics are often not delivered in real time, which limits their usefulness in guiding the development process as it unfolds. Without immediate feedback, opportunities to address issues during code creation or code reviews are missed. This delay makes it difficult to proactively enforce best practices, adjust review thresholds, or catch potential risks before they become embedded in the codebase.
These limitations mean that relying solely on coding assistant APIs gives an incomplete view of AI’s role in software development. They focus on aggregated metrics without shedding light on the detailed nuances of AI’s contributions. For example, while acceptance rates can indicate that developers find certain AI suggestions useful, they don't distinguish between trivial suggestions like formatting or documentation and critical code logic.
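The gap between acceptance rate and share of new code can be shown with simple arithmetic. In the sketch below, two hypothetical teams have identical acceptance rates and accept the same number of AI-generated lines, yet AI accounts for very different fractions of their new code. All names and figures are invented for illustration.

```python
def acceptance_rate(suggestions_shown: int, suggestions_accepted: int) -> float:
    """Share of AI suggestions that developers accepted."""
    if suggestions_shown == 0:
        return 0.0
    return suggestions_accepted / suggestions_shown


def ai_share_of_new_code(ai_lines: int, total_new_lines: int) -> float:
    """Share of all newly written lines that came from accepted AI suggestions."""
    if total_new_lines == 0:
        return 0.0
    return ai_lines / total_new_lines


# Hypothetical teams: same assistant usage, different total output.
team_a = {"shown": 100, "accepted": 30, "ai_lines": 150, "new_lines": 500}
team_b = {"shown": 100, "accepted": 30, "ai_lines": 150, "new_lines": 3000}

for name, t in (("A", team_a), ("B", team_b)):
    rate = acceptance_rate(t["shown"], t["accepted"])
    share = ai_share_of_new_code(t["ai_lines"], t["new_lines"])
    print(f"Team {name}: acceptance={rate:.0%}, AI share of new code={share:.0%}")
# Team A: acceptance=30%, AI share of new code=30%
# Team B: acceptance=30%, AI share of new code=5%
```

The denominator, total new lines from all sources, is exactly the figure a coding assistant vendor cannot see on its own.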
IDE data completes the AI picture
To fully understand AI's impact on software development, collecting data directly from the developer's environment is key.
Gathering data in the IDE with a VS Code extension can fill the gaps and offer a more comprehensive view of how AI is being integrated into coding workflows. Here's how tracking AI usage in the IDE can overcome the limitations of coding assistant APIs:
Real-time tracking: Capturing AI’s role as code is written
Data collected directly in the IDE allows organizations to capture how code is being written as it happens. Unlike metrics from coding assistant vendors, which are often delayed and retrospective, IDE-based data reflects real-time AI usage. This allows for immediate insights into which parts of the code are being generated by AI tools, when AI is used, and to what extent.
Enhanced visibility for developers
By tracking AI usage directly in the IDE, developers can gain real-time feedback about their coding practices. They can see how often they rely on AI-generated code, what types of code are AI-assisted (e.g., logic, documentation, or tests), and where AI tools contribute to their work. This helps developers understand how AI is influencing their coding habits and allows them to adjust their workflows accordingly.
Context for code reviews
As code changes are made and pull requests (PRs) are submitted, IDE-based data can annotate the PR with metadata about AI involvement. This allows reviewers to understand the proportion of the code that was generated by AI, offering valuable context for the review process. For example, if a pull request contains a significant amount of AI-generated content, reviewers may want to pay closer attention to ensure the quality and security of the code. This context helps engineering leaders make more informed decisions about when to apply additional scrutiny.
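One way such an annotation could be derived is sketched below: per-file line counts are rolled up into a PR-level AI ratio, and the PR is flagged when the ratio crosses a threshold. The `FileChange` type, the `summarize_pr` helper, and the 50% threshold are assumptions made for this example, not a description of any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileChange:
    path: str
    added_lines: int
    ai_lines: int  # subset of added_lines attributed to an AI assistant


def summarize_pr(changes: list[FileChange], review_threshold: float = 0.5) -> dict:
    """Roll per-file counts up into metadata a reviewer could see on the PR."""
    added = sum(c.added_lines for c in changes)
    ai = sum(c.ai_lines for c in changes)
    ratio = ai / added if added else 0.0
    return {
        "added_lines": added,
        "ai_lines": ai,
        "ai_ratio": round(ratio, 2),
        # Hypothetical policy: flag PRs where AI wrote at least the threshold share.
        "needs_extra_review": ratio >= review_threshold,
    }


# Made-up pull request touching two files.
pr = [
    FileChange("src/auth.py", added_lines=120, ai_lines=90),
    FileChange("tests/test_auth.py", added_lines=80, ai_lines=30),
]
print(summarize_pr(pr))
# {'added_lines': 200, 'ai_lines': 120, 'ai_ratio': 0.6, 'needs_extra_review': True}
```

A real policy would likely also weigh which files are touched, since AI-generated tests and AI-generated authentication logic carry different risk.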
Aggregated insights for the organization
IDE-based data collection can also be aggregated and analyzed at a macro level across the organization. This allows for insights into broader trends, such as:
- AI Content Breakdown: What types of AI-generated code are most prevalent in the codebase—boilerplate, logic, tests, documentation, configuration?
- Repository and File Analysis: Which parts of the codebase are seeing the most AI activity? Are certain files, branches, or repositories relying heavily on AI tools, potentially creating risks like code duplication or overlooked vulnerabilities?
- Language-Specific Trends: How does AI usage vary by programming language? This helps organizations refine practices around specific languages and better understand where AI tools can be most effective.
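The breakdowns above amount to grouping the same records by different keys. The following minimal sketch assumes each record carries a repository, language, content type, total line count, and AI-attributed line count; the field names and sample data are hypothetical.

```python
from collections import defaultdict


def ai_share_by(records: list[dict], key: str) -> dict[str, float]:
    """Group records by `key` and return the AI share of lines per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [ai_lines, all_lines]
    for r in records:
        totals[r[key]][0] += r["ai_lines"]
        totals[r[key]][1] += r["lines"]
    return {group: ai / total for group, (ai, total) in totals.items() if total}


# Made-up records, e.g. one per file or per commit.
records = [
    {"repo": "web", "language": "TypeScript", "kind": "tests", "lines": 200, "ai_lines": 150},
    {"repo": "web", "language": "TypeScript", "kind": "logic", "lines": 300, "ai_lines": 50},
    {"repo": "api", "language": "Python", "kind": "logic", "lines": 500, "ai_lines": 100},
]

print(ai_share_by(records, "language"))  # {'TypeScript': 0.4, 'Python': 0.2}
print(ai_share_by(records, "kind"))      # {'tests': 0.75, 'logic': 0.1875}
```

In this invented data set, AI contributes most heavily to tests, which is the kind of pattern an organization-level view is meant to surface.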
Next steps to anticipate AI risk and avoid surprises
Gathering data directly in the IDE makes it far easier to tell when code is AI-generated. It provides actionable insights that go beyond the high-level metrics from coding assistant APIs, helping to identify patterns and trends as they emerge. This data is crucial for mitigating risks, such as accumulating technical debt or introducing security vulnerabilities, and ensures that AI use in development is closely monitored and managed.
With this complete picture, organizations can make informed decisions on when to apply more scrutiny to AI-generated content, adjust code review processes, and introduce policies to prevent the uncontrolled accumulation of AI-driven changes. By having this information at their fingertips, engineering leaders can stay ahead of potential issues and ensure their codebase evolves in a controlled, secure, and efficient way.
If you're ready to gain deeper insight into AI's role in your development process, anticipate risks, and avoid surprises in your codebase, the Faros AI VSCode extension is a great place to start.
Bonus: If you use Faros AI to visualize AI's impact on productivity, you can also centralize this data as part of your more holistic analytics.
Get started with the Faros AI VSCode copilot extension.