G.STANCUTA
Published · 2026 · 01 · 30 · 8 min read

Drive the Agent: How to Actually Ship Production Code with AI

  • ai-agents
  • developer-tools
  • workflow
  • field-notes

The AI agent is not the operator. You are. Here is the mental model, the memory architecture, and the verification discipline that turns a chaotic LLM session into a reliable shipping loop.

Everyone wants to talk about what the model can do. Nobody wants to talk about the part that actually determines whether code ships: the operator. That is you. The agent is a very capable executor with no long-term memory, no situational awareness, and no stake in whether the pull request works. You have all three. The moment you forget that, you start producing slop at scale instead of software.

I have been shipping production code with AI coding agents for long enough to have a repeatable system. It is not magic. It is discipline applied to a new kind of tool.

Isometric diagram of a human operator driving an AI agent through a decomposed task pipeline
The operator shapes every step. The agent executes.

Spec, Not Wish

The quality of your output is bounded by the quality of your input. This is not a metaphor; it is a mechanical fact about how language models work. Vague prompts produce vague code. A wish sounds like: "add authentication to the app." A spec sounds like: "implement JWT-based auth using the existing User model in src/models/user.ts, protect all routes under /api/v1/ except /api/v1/auth/login and /api/v1/auth/refresh, store the refresh token in an HTTP-only cookie, and write unit tests for the middleware in src/middleware/__tests__/auth.test.ts."

The difference is not style. It is signal density. Every constraint you omit is a decision the agent makes for you, and the agent has no idea what your architecture looks like, what your team agreed on last Tuesday, or which third-party library you banned six months ago. You do. Put it in the prompt.
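One way to keep yourself honest is to treat the spec as a structured object and render the prompt from it, so a missing constraint is visible before the agent sees anything. This is a minimal sketch, not a feature of any agent tool: the `TaskSpec` shape and the `missingFields` and `renderSpec` helpers are hypothetical names introduced for illustration.

```ts
// Hypothetical spec shape: every field is a decision you make, not the agent.
interface TaskSpec {
  goal: string;
  files: string[];        // files the agent may create or modify
  constraints: string[];  // architecture and library decisions
  acceptance: string[];   // checkable exit conditions
}

// Returns the names of empty fields so an underspecified task is caught early.
function missingFields(spec: TaskSpec): string[] {
  const missing: string[] = [];
  if (spec.goal.trim() === '') missing.push('goal');
  if (spec.files.length === 0) missing.push('files');
  if (spec.constraints.length === 0) missing.push('constraints');
  if (spec.acceptance.length === 0) missing.push('acceptance');
  return missing;
}

// Renders the spec as plain text; throws if anything was left for the agent to guess.
function renderSpec(spec: TaskSpec): string {
  const missing = missingFields(spec);
  if (missing.length > 0) {
    throw new Error(`Spec is still a wish, missing: ${missing.join(', ')}`);
  }
  const list = (items: string[]) => items.map((item) => `- ${item}`).join('\n');
  return [
    `Goal: ${spec.goal}`,
    `Files:\n${list(spec.files)}`,
    `Constraints:\n${list(spec.constraints)}`,
    `Acceptance criteria:\n${list(spec.acceptance)}`,
  ].join('\n\n');
}

// The auth example from above, expressed as a spec.
const authSpec: TaskSpec = {
  goal: 'Implement JWT-based auth using the existing User model',
  files: ['src/models/user.ts', 'src/middleware/__tests__/auth.test.ts'],
  constraints: [
    'Protect all routes under /api/v1/ except /api/v1/auth/login and /api/v1/auth/refresh',
    'Store the refresh token in an HTTP-only cookie',
  ],
  acceptance: ['Unit tests for the middleware pass'],
};

const authPrompt = renderSpec(authSpec);
```

The point of the throw is the same as the point of the section: an empty `constraints` or `acceptance` list is a decision you have silently handed to the agent.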

Decompose Into Checkable Steps

Long agentic tasks fail at the boundary between subtasks. The agent loses the thread, hallucinates an import, or quietly changes the interface it was supposed to preserve. The fix is decomposition: break the work into steps where each step has a clear, verifiable exit condition.

Not "build the feature." Instead: (1) write the data layer with types and tests, stop, (2) write the service layer against those types, stop, (3) wire the HTTP handler, stop, (4) run the full test suite and report failures. Each stop is a checkpoint where you inspect the output before proceeding. You are the orchestrator. You decide when step N is good enough to start step N+1.

  1. Define the data contract first (types, schema, fixtures).
  2. Implement the core logic against that contract.
  3. Wire the integration layer (HTTP, event bus, CLI).
  4. Verify in a separate pass with a fresh context window.

Keeping steps small also keeps the context window clean. A 200-line file change is reviewable. A 2000-line change is a liability.
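The stop-and-inspect rhythm can be written down explicitly. The sketch below assumes each step carries a check you can evaluate; `Step`, `RunResult`, and `runUntilFailure` are hypothetical names, and the checks are stubbed with plain functions where a real setup would run tests, a typecheck, or your own review.

```ts
// A step counts as done only when its exit condition holds.
interface Step {
  label: string;
  exitCondition: string;   // human-readable description of "done"
  check: () => boolean;    // stand-in for a real check (tests, typecheck, manual review)
}

interface RunResult {
  completed: string[];       // labels of steps whose check passed
  stoppedAt: string | null;  // first failing step, or null if all passed
}

// Walks the steps in order and stops at the first failing check,
// so step N+1 never starts on top of an unverified step N.
function runUntilFailure(steps: Step[]): RunResult {
  const completed: string[] = [];
  for (const step of steps) {
    if (!step.check()) {
      return { completed, stoppedAt: step.label };
    }
    completed.push(step.label);
  }
  return { completed, stoppedAt: null };
}

const pipeline: Step[] = [
  { label: 'data contract', exitCondition: 'types compile, fixtures load', check: () => true },
  { label: 'core logic', exitCondition: 'unit tests pass', check: () => true },
  { label: 'integration', exitCondition: 'handler tests pass', check: () => false },
  { label: 'verification', exitCondition: 'fresh-context review is clean', check: () => true },
];

const pipelineResult = runUntilFailure(pipeline);
// pipelineResult.completed is ['data contract', 'core logic']
// pipelineResult.stoppedAt is 'integration'
```

Nothing after the failing step runs. That is the whole design: a failed checkpoint hands control back to you instead of letting the agent build on a broken layer.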

The Architecture Lives in Your Head (and in Markdown)

Here is the blunt truth about AI coding agents: they have no memory between sessions. The agent that wrote your authentication middleware yesterday has no idea it exists today. It will happily invent a different pattern, import a conflicting library, or violate the convention you established, not because it is careless but because you never told it.

This is where markdown memory files become the most important part of your stack. Not clever prompts. Not system messages. Persistent, version-controlled markdown files that live in the repository and get loaded into context at the start of every session. The agent reads them, and suddenly it has the institutional knowledge it would otherwise lack.

The markdown file is not documentation for humans. It is working memory for the agent. Write it accordingly.

The pattern is simple: one AGENTS.md at the project root, with sections for architecture decisions, naming conventions, commands to run, forbidden patterns, and known gotchas. Keep it updated as the codebase evolves. Treat it with the same discipline you would a README.md that actually affects runtime behavior, because in your AI workflow, it does.

````md
# AGENTS.md: Project Memory for AI Coding Agents

## Architecture

- Monorepo: apps/web (Next.js), apps/api (Hono on Bun), packages/shared (types + utils)
- Database: Postgres via Drizzle ORM. Schema lives in packages/db/schema.ts.
- Auth: JWT in Authorization header for API, HTTP-only cookie for web client. Use the helpers in packages/shared/src/auth.ts. Do NOT roll your own.
- State: Zustand stores in apps/web/src/stores/. No Redux, no Context API for app state.

## Conventions

- File naming: kebab-case for files, PascalCase for components/classes.
- All API routes return { data, error, meta }. Use the ResponseEnvelope type from packages/shared.
- Services live in src/services/, handlers in src/handlers/. Services contain business logic, handlers do HTTP concerns only.
- Tests co-located: src/services/__tests__/auth-service.test.ts next to src/services/auth-service.ts.

## Commands

```bash
bun run dev          # start all apps in parallel
bun run test         # run full test suite
bun run db:migrate   # apply pending migrations
bun run typecheck    # tsc --noEmit across all packages
```

## Forbidden Patterns

- Do NOT use `any` in TypeScript. Use `unknown` and narrow.
- Do NOT import from `apps/api` in `apps/web` or vice versa. Use packages/shared.
- Do NOT use `console.log` in production code. Use the logger in packages/shared/src/logger.ts.

## Known Gotchas

- Drizzle's `db.query.*` API requires the relational schema to be passed at client init time. See packages/db/client.ts.
- Hono middleware runs in declaration order. Auth middleware must be registered before route handlers.
- The web app uses the App Router. Do not use getServerSideProps or pages/api. Those are dead.
````
That file is loaded at the start of every agent session. The agent now knows the forbidden patterns, the helper locations, the test file naming convention, and the commands to run. It acts on that knowledge far more reliably because it is reading it, not inferring it from the shape of the codebase.
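The file only helps if it stays complete, so a small structural check is cheap insurance. This sketch assumes the five section headings used in the example above; `REQUIRED_SECTIONS`, `level2Headings`, and `missingSections` are hypothetical helpers, and reading the file from disk or feeding it to the agent is left to your tooling.

```ts
// Section titles taken from the example AGENTS.md; adjust to your own file.
const REQUIRED_SECTIONS = [
  'Architecture',
  'Conventions',
  'Commands',
  'Forbidden Patterns',
  'Known Gotchas',
] as const;

// Collects the text of every level-2 heading ("## Title") in a markdown string.
// Simplification: lines inside fenced code blocks are not treated specially.
function level2Headings(markdown: string): string[] {
  return markdown
    .split('\n')
    .map((line) => /^##\s+(.+?)\s*$/.exec(line))
    .filter((match): match is RegExpExecArray => match !== null)
    .map((match) => match[1]);
}

// Returns the required sections that the memory file does not contain.
function missingSections(markdown: string): string[] {
  const present = new Set<string>(level2Headings(markdown));
  return REQUIRED_SECTIONS.filter((section) => !present.has(section));
}

const sampleMemory = [
  '# AGENTS.md',
  '## Architecture',
  '- Monorepo',
  '## Commands',
  '## Known Gotchas',
].join('\n');

const memoryGaps = missingSections(sampleMemory);
// memoryGaps is ['Conventions', 'Forbidden Patterns']
```

Run something like this in CI or a pre-commit hook and a deleted section becomes a failed check instead of a silently dumber agent.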

Verify in a Separate Pass

This is the rule most people skip, and it is the one that costs them the most. Do not ask the same agent context that wrote the code to also verify it. The context is biased. The agent has a model of what it intended to write, and that model will cause it to miss what it actually wrote.

Start a fresh session. Load the AGENTS.md. Load the diff or the files that changed. Give the agent a verification prompt: "Review the following changes against the conventions in AGENTS.md. Report any violations, missing tests, type errors, and logic bugs. Do not suggest improvements, only report failures." The fresh context has no attachment to the previous work. It reads what is actually there.
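The verification prompt is mechanical enough to assemble in code. A sketch under stated assumptions: `VerifyInput` and `buildVerifyPrompt` are hypothetical names, the diff is whatever your tooling produces (for example the output of `git diff`), and sending the prompt to a fresh session is left out because it depends on the agent you use.

```ts
interface VerifyInput {
  memory: string;  // contents of AGENTS.md
  diff: string;    // the changes under review, e.g. output of `git diff`
}

// The instruction text is the verification prompt quoted above.
const VERIFY_INSTRUCTION =
  'Review the following changes against the conventions in AGENTS.md. ' +
  'Report any violations, missing tests, type errors, and logic bugs. ' +
  'Do not suggest improvements, only report failures.';

// Builds the prompt from the memory file and the diff only.
// Nothing from the write session (chat history, stated intent) is included.
function buildVerifyPrompt(input: VerifyInput): string {
  if (input.diff.trim() === '') {
    throw new Error('Nothing to verify: the diff is empty.');
  }
  return [
    VERIFY_INSTRUCTION,
    '--- AGENTS.md ---',
    input.memory.trim(),
    '--- CHANGES ---',
    input.diff.trim(),
  ].join('\n\n');
}

const verifyPrompt = buildVerifyPrompt({
  memory: '## Conventions\n- File naming: kebab-case for files.',
  diff: '+ export const AuthService = {};',
});
```

The function signature is the discipline: it has no parameter for the implementer's conversation, so the verifier cannot inherit the implementer's assumptions.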

Schematic of a two-pass verification loop: write session and separate verify session connected by a markdown memory file
Write pass and verify pass share one source of truth: the markdown memory file.

Pair that with an automated verification step that the agent can run itself. The loop below is not clever; it is just disciplined:

```ts
// scripts/verify-loop.ts
// Run with: bun run scripts/verify-loop.ts
// Iterates: typecheck → test → lint. Prints first failure and exits.

import {execSync} from 'child_process';

const steps = [
  {name: 'typecheck', cmd: 'bun run typecheck'},
  {name: 'test',      cmd: 'bun run test --bail'},
  {name: 'lint',      cmd: 'bun run lint'},
] as const;

for (const step of steps) {
  console.log(`\n--- ${step.name} ---`);
  try {
    execSync(step.cmd, {stdio: 'inherit'});
    console.log(`${step.name}: PASS`);
  } catch {
    console.error(`${step.name}: FAIL, stopping.`);
    process.exit(1);
  }
}

console.log('\nAll checks passed.');
```

The agent runs this script after every non-trivial change. If it fails, the agent reports the failure back to you. You decide whether to iterate or roll back. The script is deterministic. The agent is not making judgment calls about whether the tests "mostly pass."

Keep the Architecture in Your Head

The markdown file offloads conventions and commands. It does not offload architectural judgment. You still need to know why the service layer exists, why auth is in a shared package, why you chose Drizzle over Prisma. The agent can follow patterns. It cannot reason about whether the patterns are still appropriate for the problem you are now solving.

When a task requires an architectural decision, make it yourself before you open the agent session. Write the decision into the AGENTS.md before asking the agent to implement it. The sequence is: you decide, you document, the agent implements. Not: the agent implements, you discover the decision it made, you either accept it or undo it.

  • You own the dependency graph. The agent does not know what adding a new package costs.
  • You own the data model. Schema changes are hard to reverse. Never delegate them without a written spec.
  • You own the security boundary. The agent will implement whatever auth flow you describe, including broken ones.
  • You own the release. The agent has no stake in whether production stays up.
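Part of this ownership can be enforced mechanically by flagging any change that touches a path you own. A sketch only: the path rules are assumptions drawn from the example AGENTS.md above (package.json for dependencies, packages/db/schema.ts for the data model, packages/shared/src/auth.ts for auth), and `OwnedArea` and `needsOperatorReview` are hypothetical names.

```ts
interface OwnedArea {
  area: string;
  matches: (path: string) => boolean;
}

// Assumed mapping from ownership areas to repository paths.
const OWNED_AREAS: OwnedArea[] = [
  {
    area: 'dependency graph',
    matches: (p) => p === 'package.json' || p.endsWith('/package.json'),
  },
  { area: 'data model', matches: (p) => p === 'packages/db/schema.ts' },
  { area: 'security boundary', matches: (p) => p === 'packages/shared/src/auth.ts' },
];

// Groups changed files by the operator-owned area they touch.
// An empty map means the change stays inside territory the agent may edit freely.
function needsOperatorReview(changedFiles: string[]): Map<string, string[]> {
  const flagged = new Map<string, string[]>();
  for (const file of changedFiles) {
    for (const owned of OWNED_AREAS) {
      if (owned.matches(file)) {
        flagged.set(owned.area, [...(flagged.get(owned.area) ?? []), file]);
      }
    }
  }
  return flagged;
}

const flaggedChanges = needsOperatorReview([
  'apps/api/src/services/auth-service.ts',
  'apps/api/package.json',
  'packages/db/schema.ts',
]);
// flaggedChanges has two entries:
//   'dependency graph' → ['apps/api/package.json']
//   'data model'       → ['packages/db/schema.ts']
```

A check like this does not make the decision for you. It only guarantees that a dependency, schema, or auth change cannot land without you noticing it.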

The Loop in Practice

A real session looks like this. You identify a task. You write a spec with enough constraint that the agent has no ambiguity about file locations, interface boundaries, and acceptance criteria. You load AGENTS.md and the relevant source files. You give the agent step one of the decomposed work. It produces an output. You read it. If it is good, you move to step two. When all steps are done, you start a fresh session for the verification pass. The verifier reports failures. You feed failures back to the implementer context or fix them yourself.

The whole loop is maybe thirty percent slower than just yolo-prompting and hoping. It produces code that does not embarrass you in review and does not wake you up at 3 AM.

The shift in mental model is simple: stop asking what the model can do and start asking what you, as the operator, are providing. Spec quality, decomposition quality, memory file quality, verification discipline. Those are the variables you control. Control them, and the agent becomes a genuine force multiplier. Ignore them, and you have an expensive autocomplete.

© 2026 Gabriel Stancuta · jumpinotech.com — Architected with AI, built to run itself.