Over the past year, AI coding assistants have evolved from novelty to necessity. Tools like Claude Code, Cursor, and GitHub Copilot have become part of my daily workflow - not just for autocomplete, but as genuine collaborators that understand context, suggest architecture, and write substantial chunks of production code.
I've had the opportunity to use these tools across a wide variety of projects: legacy codebases that grew organically over a decade, greenfield applications built from scratch, complex enterprise systems with intricate domain logic, and simple utilities that needed to ship fast. Along the way, one thing became clear: the principles we've always known about code quality matter even more when working with AI agents.
The Spectrum of Codebases
Working with AI agents across different project types revealed clear patterns.
Mature codebases presented unique challenges. Not because the AI couldn't understand established code - it actually did pretty well at parsing even sparsely documented systems. The challenge was navigating organic growth. Codebases that evolved over years naturally accumulate different approaches as teams change and requirements shift. Sometimes the AI would mirror this diversity, generating code that followed whichever pattern it happened to see most recently in the context window. Other times, it would ignore project-specific patterns entirely and fall back to generic best practices from its training data—introducing a style that was technically sound but inconsistent with the existing codebase.
Greenfield projects were the opposite experience. Starting fresh meant I could establish clear patterns from day one, and the AI would follow them consistently. Once it saw how I structured a controller, a service, or a test file, it could replicate that pattern quite accurately. The investment in initial architecture paid off quickly.
Complex enterprise systems revealed an interesting dynamic. AI agents excelled at the mechanical parts: implementing CRUD operations, writing validation logic, and generating API endpoints that followed established patterns. Where they needed more guidance was with nuanced domain logic that required deep understanding of business rules. This is where human expertise remained essential.
Simple applications were where AI worked best. Small utilities, scripts, and focused tools could often be built almost entirely through AI collaboration. The limited scope meant the agent could hold the entire context in mind and produce coherent, complete solutions.
How AI Agents Have Improved
The agents I'm using today are noticeably better than what I started with months ago. A few observations:
Context understanding has improved a lot. Early on, I'd constantly need to remind the AI about project structure, existing utilities, or established patterns. Now, with tools like Claude Code that can explore the codebase autonomously, the agent often understands the project better than I remember it myself.
Code generation has become more reliable. The ratio of code I can use directly versus code I need to fix has shifted substantially. Where I used to treat AI suggestions as rough drafts, I now regularly commit generated code after a quick review.
Pattern adherence is stronger. When a codebase follows consistent patterns, modern agents pick up on them and replicate them accurately. They respect naming conventions, follow established file structures, and match the style of surrounding code.
That said, I'm not claiming AI agents are flawless. They still hallucinate occasionally, still make assumptions that don't match reality, and still sometimes suggest solutions that are technically correct but architecturally wrong. The difference is the baseline quality has risen enough that these issues are exceptions rather than the norm.
The Code Quality Connection
Classical software engineering practices - the things we've always known we should do - translate directly to better AI collaboration. The patterns aren't just good for humans; they're good for agents too.
Static Analysis and Type Systems
Type systems have become guardrails for AI-generated code. When I work in TypeScript with strict mode enabled, the AI rarely produces code with type errors. The type system acts as a continuous feedback mechanism: if the generated code doesn't compile, we know immediately and can correct course.
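To make that concrete, here's a minimal sketch (the function and field names are made up for illustration) of the kind of slip strict mode catches. With "strict": true, which enables strictNullChecks, an unguarded access to a possibly-undefined value fails to compile instead of failing at runtime:

```typescript
// Hypothetical example: under strict mode, the compiler rejects
// unguarded access to a value that might be undefined.
interface User {
  id: string;
  nickname?: string; // optional: may be undefined
}

function greet(user: User): string {
  // return `Hi, ${user.nickname.toUpperCase()}`;
  //   ^ error under strict mode: 'user.nickname' is possibly 'undefined'.
  return `Hi, ${(user.nickname ?? user.id).toUpperCase()}`; // compiles: undefined is handled
}

console.log(greet({ id: "u_42" })); // "HI, U_42"
```

When the AI produces the commented-out version, the compiler flags it immediately, and the fix is usually a one-line follow-up prompt.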
Linting rules serve a similar purpose. ESLint and Prettier ensure that even AI-generated code follows project conventions. No more debates about formatting, no more inconsistencies creeping in. The tooling catches deviations before they land.
Automated Tests
This is where my perspective shifted most significantly. I used to think of tests primarily as safety nets for human developers. Now I see them as essential infrastructure for AI collaboration.
Tests verify AI-generated code immediately. Instead of manual testing or hoping the code works, I can run the test suite and get instant feedback on whether the generated code functions correctly. This shortens the iteration loop considerably.
But tests also serve another purpose: they're examples the AI can learn from. When the agent sees how existing features are tested, it can replicate that pattern for new features. Test files become a form of documentation that's both human-readable and AI-parseable.
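For instance, a small spec like the one below gives the agent a template it can mirror for the next feature. The service and its behavior are hypothetical, and I'm assuming Vitest here, though the same file reads almost identically with Jest:

```typescript
import { beforeEach, describe, expect, it } from "vitest";

// Hypothetical service under test; any consistently structured module works the same way.
import { SlugService } from "./slug.service";

describe("SlugService", () => {
  let service: SlugService;

  beforeEach(() => {
    service = new SlugService();
  });

  it("turns a display name into a URL-safe slug", () => {
    expect(service.slugify("Acme GmbH & Co.")).toBe("acme-gmbh-co");
  });

  it("rejects empty input", () => {
    expect(() => service.slugify("")).toThrowError();
  });
});
```

An agent that has seen a handful of specs shaped like this will produce the next one in the same shape, which is exactly the consistency you want.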
Here's an observation that changed my workflow: there's no excuse for skipping tests when you have access to LLMs. The argument that tests take too long to write falls apart when an AI can generate a comprehensive test suite in minutes. I've started treating test coverage as a prerequisite for AI collaboration, not an afterthought.
Architecture and Best Practices
Clear module boundaries help AI understand scope. When a codebase has well-defined responsibilities, the AI generates code that respects those boundaries. It won't accidentally put billing logic in the authentication module because the structure itself communicates intent.
Consistent patterns enable reliable replication. If every controller follows the same structure, every service has the same lifecycle, and every test file uses the same conventions, the AI can extend the codebase reliably. Consistency reduces ambiguity, which leads to more reliable code generation.
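As a rough sketch of what such a repeatable shape might look like in a NestJS codebase (the Projects domain, service, and DTO names are invented for illustration, not taken from any real project):

```typescript
import { Body, Controller, Get, Param, Post } from "@nestjs/common";

import { ProjectsService } from "./projects.service"; // hypothetical service
import { CreateProjectDto } from "./dto/create-project.dto"; // hypothetical DTO

// Every resource controller follows the same shape: one route prefix,
// thin handlers, and all business logic delegated to the service layer.
@Controller("projects")
export class ProjectsController {
  constructor(private readonly projectsService: ProjectsService) {}

  @Get()
  findAll() {
    return this.projectsService.findAll();
  }

  @Get(":id")
  findOne(@Param("id") id: string) {
    return this.projectsService.findOne(id);
  }

  @Post()
  create(@Body() dto: CreateProjectDto) {
    return this.projectsService.create(dto);
  }
}
```

When every controller in the repository looks like this, "add an endpoint for X" becomes a task the agent can complete without guessing.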
Things I Learned While Working With AI Agents
A few practical takeaways from months of AI-assisted development:
Start with architecture. Investing time in clear structure at the beginning pays dividends throughout the project. The AI amplifies whatever patterns you establish.
Type everything. The stricter your type system, the fewer errors the AI will produce. TypeScript with strict mode, Zod for runtime validation, proper interfaces for all data structures (see the sketch after this list).
Harden behavior with tests. Every feature should be backed by tests that catch drift. When AI modifies code, the test suite immediately surfaces whether existing behavior broke. Tests become the source of truth that keeps both humans and AI honest.
Keep modules focused. Small, single-responsibility modules are easier for both humans and AI to understand. Large files with multiple concerns confuse everyone.
Document constraints explicitly. If there are patterns the AI should follow or anti-patterns it should avoid, write them down. The AI will read documentation and follow it.
Review everything. AI agents are quite capable, but they're not infallible. Treat generated code like you'd treat a pull request from a new team member - trust but verify.
Use the feedback loop. When the AI generates incorrect code, figure out why. Often it's because the codebase has conflicting patterns or missing context. Fix the underlying issue, not just the symptom.
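Circling back to the "type everything" point above, here's a minimal sketch of the pattern I mean, assuming Zod for runtime validation. The schema itself is a made-up example:

```typescript
import { z } from "zod";

// Runtime validation and static types come from a single definition.
export const createInviteSchema = z.object({
  email: z.string().email(),
  role: z.enum(["admin", "member"]),
  expiresInDays: z.number().int().positive().default(7),
});

// The static type is inferred from the schema, so the two can never drift apart.
export type CreateInviteInput = z.infer<typeof createInviteSchema>;

// Anything the AI (or a human) passes in gets checked at the boundary.
export function parseInvite(payload: unknown): CreateInviteInput {
  return createInviteSchema.parse(payload); // throws a ZodError on invalid input
}
```

One definition gives the compiler, the runtime, and the AI the same source of truth about what the data looks like.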
What This Led To
These observations across multiple projects pointed toward a clear conclusion: the codebases where AI collaboration worked best shared common characteristics. Strong typing, comprehensive tests, clear architecture, consistent patterns, and explicit documentation.
This realization also changed how I approach mature codebases. I've migrated several legacy systems to modern, AI-friendly stacks—and depending on the size of the codebase, this has become an almost trivial task with LLM assistance. The AI excels at understanding existing business logic and translating it to a new structure, especially when the target architecture is clean and well-documented. What used to be a months-long rewrite project can now often be accomplished in a fraction of the time.
This led me to create a template that embodies these principles. AgentReady Stack is a B2B platform boilerplate specifically designed for AI-assisted development. You can either buy the code or simply study the documentation and take some inspiration from it.
The tech stack is a Turborepo monorepo with three applications: a NestJS backend with MongoDB, a React Router v7 frontend with Material-UI, and an Oclif-based CLI. Types flow automatically from backend DTOs through OpenAPI to a generated API client, so there's a single source of truth for types across the entire stack.
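As a rough illustration of that flow (this is not the template's actual code, just the general shape), a backend DTO annotated for OpenAPI might look like the following. The @nestjs/swagger decorators feed the OpenAPI spec, and an OpenAPI client generator turns that spec into typed client code for the frontend and CLI:

```typescript
import { ApiProperty } from "@nestjs/swagger";
import { IsEmail, IsString, MinLength } from "class-validator";

// Illustrative DTO: validation decorators enforce the contract at runtime,
// while the OpenAPI metadata carries the same contract into the generated client.
export class CreateOrganizationDto {
  @ApiProperty({ example: "Acme Inc." })
  @IsString()
  @MinLength(2)
  name!: string;

  @ApiProperty({ example: "billing@acme.test" })
  @IsEmail()
  billingEmail!: string;
}

// The generated API client exposes a matching request type, so a mismatch
// between backend and consumers fails at compile time rather than in production.
// (Exact generated names depend on the generator; this only shows the idea.)
```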
Out of the box, it includes a complete authentication system with email verification and password reset, multi-tenant organization management with role-based membership, invitation flows, and API key management for CLI authentication. The kind of features that most B2B platforms need.
More importantly for AI collaboration: the template includes an AGENTS.md file as well as Claude Code- and Cursor-specific documentation that AI agents read, consistent patterns throughout all three applications, and comprehensive test coverage that catches when generated code breaks existing behavior. The structure is designed so that an AI can study the existing organizations module and reliably extend it for new features like projects, products, or whatever the domain requires.
The goal isn't to replace human developers with AI. It's to enable the kind of rapid iteration that becomes possible when humans and AI agents can collaborate effectively. When the foundation is solid, adding features becomes a conversation rather than a struggle.
Looking Ahead
AI-assisted development is still evolving rapidly. The agents I'll be using six months from now will likely be more capable than today's. But I'm increasingly convinced that the fundamentals won't change: clean code, strong types, comprehensive tests, and clear architecture will remain essential—perhaps more essential than ever.
The tools are getting better at writing code. Our job is to create environments where that code has the best chance of being correct.