AI coding agents are having their moment. Devin made headlines. SWE-bench became the benchmark everyone cites. Open-source alternatives like SWE-agent, OpenHands, and Aider are iterating fast. But beneath the demos and benchmark scores, the practical question remains: what can these systems actually do in production today?
We have been using AI agents in our own development workflow for months now. Here is an honest breakdown of where things stand.
// What an AI coding agent actually is
Strip away the marketing and an AI coding agent is an LLM in a loop with tools. It can read files, write files, execute commands, and observe the results. The loop continues until the task is done or the agent decides it is stuck.
The difference from a chatbot is agency — the model decides what to do next, not the user. It plans, acts, observes, and adjusts. This is powerful when it works. It is also where things go wrong when the model misunderstands the task or lacks context about the codebase.
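That plan-act-observe loop can be sketched in a few lines. Everything below is a stand-in, not any real framework's API: `fake_llm` plays the model, and the tool table is a toy — the control flow is the point.

```python
# Minimal sketch of an LLM-in-a-loop agent. `fake_llm` and the tool table
# are hypothetical stand-ins for a real model and real file/shell tools.

def run_agent(task, llm, tools, max_steps=20):
    """Loop: the model picks the next action until it finishes or gives up."""
    history = [("user", task)]
    for _ in range(max_steps):
        action = llm(history)                           # model decides what to do next
        if action["tool"] == "finish":
            return action["args"]                       # agent declares the task done
        result = tools[action["tool"]](action["args"])  # act
        history.append(("tool", result))                # observe
    return "stuck: step budget exhausted"

# Stub model: reads a file, then finishes with its contents upper-cased.
def fake_llm(history):
    last_role, last_content = history[-1]
    if last_role == "user":
        return {"tool": "read_file", "args": "config.txt"}
    return {"tool": "finish", "args": last_content.upper()}

tools = {"read_file": lambda path: f"contents of {path}"}
print(run_agent("shout the config", fake_llm, tools))  # CONTENTS OF CONFIG.TXT
```

A production agent differs in scale, not shape: a real model behind `llm`, real tools behind the table, and much more careful handling of errors and context.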
// What works well today
- Well-scoped tasks: Bug fixes with clear reproduction steps, adding a new endpoint to an existing pattern, writing tests for existing code. When the task is bounded and the codebase conventions are clear, agents perform surprisingly well.
- Boilerplate and scaffolding: Generating CRUD operations, setting up project structures, implementing standard patterns. Agents excel at repetitive work that follows established templates.
- Code migration: Updating import paths, converting class components to hooks, migrating API versions. Mechanical transformations with clear before/after patterns.
- Documentation and tests: Writing docstrings, generating unit tests, creating README files. Tasks where the source of truth already exists in the code.
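The migration category is the easiest to picture concretely. Here is a toy sketch of the kind of mechanical before/after transform agents handle well — the module names are invented for illustration:

```python
# Hypothetical sketch of a mechanical code migration: rewriting an old
# import path to a new one. OLD and NEW are made-up module names.

import re

OLD, NEW = "legacy.utils", "core.utils"

def migrate_imports(source: str) -> str:
    """Rewrite `import legacy.utils` / `from legacy.utils import x` forms."""
    return re.sub(rf"\b{re.escape(OLD)}\b", NEW, source)

before = "from legacy.utils import slugify\nimport legacy.utils as u\n"
print(migrate_imports(before))
```

An agent doing this for real would also run the test suite after each file to confirm the transform did not break anything — which is exactly why bounded, verifiable tasks suit agents.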
// What does not work yet
- Ambiguous requirements: “Make it faster” or “improve the UX” are still too vague. Agents need precise specs to produce useful output. The planning phase is where humans remain essential.
- Large-scale refactors: Changes that touch dozens of files and require understanding system-wide implications. Context windows are bigger than ever, but they still cannot hold an entire production codebase.
- Novel architecture: Designing a new system from scratch where the right patterns are not obvious. Agents are strong pattern matchers, not original thinkers. They reproduce what they have seen, which is great for known problems and unreliable for novel ones.
- Debugging subtle issues: Race conditions, memory leaks, performance regressions — problems that require building a mental model of runtime behavior. Agents can try fixes, but they lack the deep system understanding that experienced developers bring.
// The orchestration problem
The single biggest limitation of current agents is not model quality — it is orchestration. An individual task might succeed, but chaining 20 tasks together reliably is a different problem entirely. Context management, error recovery, task dependencies, and progress tracking all need infrastructure around the model.
This is why we see frameworks like Claude Code, Cursor, and Windsurf investing heavily in the scaffolding around the model. The agent itself is table stakes. The system that keeps it on track is where the real engineering challenge lives.
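To make the orchestration problem concrete, here is a minimal sketch of the scaffolding involved: dependency ordering, retries, and progress tracking around an agent. `run_task` stands in for invoking an agent on one bounded task, and a `RuntimeError` is the assumed failure signal — none of this reflects any particular framework.

```python
# Hedged sketch of chaining agent tasks: run in dependency order, retry
# failures, record progress. `run_task` is a hypothetical agent invocation.

from graphlib import TopologicalSorter

def orchestrate(deps, run_task, max_retries=2):
    """deps maps each task name to the set of tasks it depends on."""
    progress = {}
    for name in TopologicalSorter(deps).static_order():
        for attempt in range(max_retries + 1):
            try:
                progress[name] = run_task(name)
                break
            except RuntimeError:
                if attempt == max_retries:
                    # Halt the chain: downstream tasks depend on this one.
                    raise
    return progress

# Toy usage: three tasks in a chain.
deps = {"tests": {"endpoint"}, "endpoint": {"schema"}, "schema": set()}
completed = []

def run_task(name):
    completed.append(name)
    return "done"

progress = orchestrate(deps, run_task)
print(completed)  # ['schema', 'endpoint', 'tests']
```

Even this toy version has to answer the hard questions — what counts as failure, when to give up, what downstream tasks do when an upstream one dies — and those answers, not the model, are where the engineering effort goes.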
// Our approach
We treat AI agents as junior developers who are fast, tireless, and good at following instructions — but who need clear specs, bounded tasks, and review checkpoints. The human role shifts from writing every line to defining the work, reviewing output, and handling the genuinely hard parts.
The best results come from investing in the spec. A well-written task description with clear acceptance criteria will outperform a vague prompt, regardless of the model's size or capability. Garbage in, garbage out still applies.
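What "investing in the spec" looks like in practice can be sketched as a small structure. This `TaskSpec` and its field names are our illustration, not a standard schema:

```python
# Hypothetical sketch of a task spec handed to an agent: bounded scope,
# explicit acceptance criteria. Field names are invented for illustration.

from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    title: str
    context: str                      # which files/modules are in scope
    acceptance_criteria: list[str]    # how the reviewer decides "done"
    out_of_scope: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        criteria = "\n".join(f"- {c}" for c in self.acceptance_criteria)
        return (f"Task: {self.title}\nContext: {self.context}\n"
                f"Acceptance criteria:\n{criteria}")

spec = TaskSpec(
    title="Add a /health endpoint",
    context="api/routes.py, follow the existing route pattern",
    acceptance_criteria=["returns 200 with JSON {'status': 'ok'}",
                         "covered by a unit test"],
)
print(spec.to_prompt())
```

The structure forces the human to do the thinking up front — scope, context, and a checkable definition of done — which is precisely the part agents cannot do for themselves yet.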
// The bottom line
AI coding agents are not replacing developers. They are changing what developers spend their time on. The teams that figure out the right division of labor — humans for architecture and judgment, agents for execution and iteration — will ship faster than those waiting for a fully autonomous solution that does not exist yet.