Agent Runtime Surface
An Agent Runtime Surface (ARS) is a dynamic runtime contract embedded within an application that exposes its current navigable state and intent-level actions, enabling agents to traverse and enact valid state changes within the application’s governed state model.
In plain terms, the app publishes what it knows about itself right now: where you are, what actions are available from here, and what the app will permit. The agent reads that surface and acts within it.
In the trip demo, Claude operates in an agentic loop, making real HTTP requests to a Python app to book a round-trip flight and hotel in under three minutes.
Your booking request:
Book a round-trip Los Angeles trip departing 2026-05-12 returning 2026-05-15 under $800 total with hotel. I strongly prefer Lunera Air for both legs.
Make each selection based on the screen in front of you — don't try to plan the whole trip upfront. We'll check how the budget looks at the review screen.
At each step, Claude received the app’s current truth: where the agent was, what was available, and what was permitted. The agent reasoned; the app held the facts.
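As a rough illustration of what one surface read might carry, here is a minimal sketch. The names `Surface`, `Action`, `read_surface`, the step name, and the field layout are all assumptions made for this example; the demo's actual payload is not shown in this article.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Action:
    name: str                                   # intent-level action name
    params: dict = field(default_factory=dict)  # parameter name -> expected type


@dataclass
class Surface:
    location: str  # where the agent is
    facts: dict    # what the app currently holds as true
    actions: list  # what is permitted from here


def read_surface() -> Surface:
    """Return a hypothetical current truth for one step of a trip booking."""
    return Surface(
        location="select_outbound_flight",
        facts={"depart": "2026-05-12", "return": "2026-05-15", "budget_usd": 800},
        actions=[Action("selectFlight", {"flight_id": "str"}), Action("cancelTrip")],
    )


surface = read_surface()
print(json.dumps(asdict(surface), indent=2))
```

The agent would reason over this one document per step. Nothing in it describes steps the agent has not reached yet.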
The agent in the demo wasn’t given a list of REST endpoints to call in some order. It was given an application: a Python application that held flights, hotels, dates, preferences, a budget, and the rules connecting them.
This distinction is what makes ARS work. A capability is a discrete operation, like booking a flight or charging a card. An application is the layer where capabilities become intentional, organized, and structural. Facts get held and updated, rules get enforced, aggregates get maintained, and valid actions get presented from the current state. The application is what turns a set of operations into an environment shaped around getting something done.
Git gives developers commit, branch, and merge as capabilities. GitHub is the application layer built on top: a governed environment for collaborative software development. The capabilities are the same; the application layer is what makes them workable for a goal-pursuing user. ARS is the contract for that application layer when the user is an agent.
This is why ARS is not just exposing REST endpoints to agents. Exposing endpoints to an agent gives the agent capabilities. Exposing an application to an agent through ARS gives the agent an environment with current state, valid possibilities, governed rules, and aggregates the agent can read without having to compute them.
An application can have a visual UI for humans, an ARS for agents, or both. The same intentional, organized, structural environment underneath. Different surfaces for different consumers.
ARS sits near several patterns that share surface features but solve different problems. Five worth naming directly. As you read, watch where the state lives.
Tool-calling protocols expose a catalog of capabilities a model can invoke. The agent holds the workflow and the state across calls. ARS exposes an application, not a catalog. The application holds state, governance, and aggregates. The agent reads, reasons, and acts within an environment rather than composing capabilities into one.
CLI tools have gained traction as agent surfaces because they expose commands through help output rather than dumping the full catalog into context. ARS shares the instinct that agents shouldn’t be flooded with every possible operation upfront. The difference is what does the filtering. CLI relies on the agent walking help output to discover what’s available. ARS publishes only what’s currently valid from the current state. The agent sees what the app says is possible right now, not a menu of every command.
REST exposes capabilities over HTTP. Consumers code against frozen contracts; state lives in clients or session stores. ARS is a runtime contract that updates on every read, with state held by the application. ARS assumes reasoning consumers, which is what makes the architecture viable.
AG-UI standardizes how an agent connects to a user-facing application it drives. The architectural arrows go agent to frontend: the agent is the system, the UI is its surface. ARS inverts those arrows. The application is the system, and ARS is how it publishes state and possibilities to agents that read it. Different problems, different positions in the stack.
A2UI is about agents authoring UI specs that clients render for humans. ARS is about applications publishing surfaces for agents. Different consumers, different producers, different problems. ARS and A2UI can coexist on the same product without architectural conflict.
The thread: in each pattern above, the agent ends up doing application-layer work itself. The agent carries the state, the workflow, and the governance because nothing else does. ARS doesn’t change the fact that the agent still does application-layer work for its own goals. It changes what the agent has to hold internally. With ARS, the agent doesn’t carry the internal state of the applications it consumes. It also doesn’t have to compute what’s currently possible from that state. The application publishes both. The agent reads, reasons, and acts across them.
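To make "the application computes what is currently possible" concrete, here is a small sketch of the filtering step. The state names and the `ACTION_STATES` table are hypothetical; a real application would likely derive validity from richer state than a single label.

```python
# Hypothetical catalog: action name -> states from which it is valid.
ACTION_STATES = {
    "startTripCreation": {"idle"},
    "selectFlight": {"choosing_outbound", "choosing_return"},
    "selectHotel": {"choosing_hotel"},
    "confirmBooking": {"review"},
    "cancelTrip": {"choosing_outbound", "choosing_return", "choosing_hotel", "review"},
}


def valid_actions(state: str) -> list[str]:
    """Publish only the actions that are valid from the current state."""
    return sorted(name for name, states in ACTION_STATES.items() if state in states)


print(valid_actions("idle"))    # ['startTripCreation']
print(valid_actions("review"))  # ['cancelTrip', 'confirmBooking']
```

The full catalog never reaches the agent. Only the slice that matches the current state does.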
ARS is a contract that sits between an application and the agents that consume it. Two relationships matter:
Agents are the reasoning consumers that read the surface and dispatch actions. An agent could be a model in an agentic loop, a semantic-test harness, a multi-agent orchestrator, or anything else that reasons about state and goals.
The application is the intentional, organized, structural environment that holds state, governance, and aggregates. It uses whatever infrastructure it needs underneath: databases, REST services, message queues, third-party APIs.
An application can have many surfaces. A visual UI for humans, ARS for agents, others for other consumers. ARS is the contract for the agent-facing one.
ARS is a contract shape, not a transport. The trip demo runs over HTTP. The same contract can ride other transports without changing what the contract describes.
ARS doesn’t replace any of the application’s existing infrastructure. It adds a new contract surface for a new kind of consumer.
ARS supports three modes. Different jobs, different surfaces, same primitive underneath.
The application exposes its current step-by-step structure: state, options, valid actions, context. The agent walks the same flow a human would, one step at a time, reasoning at each step. The trip demo above is a Semantic UI Mode run. Best for tasks where decisions interact across the flow and the application can show the agent the consequences as choices accumulate.
The application exposes larger-grained actions, still bound by current state. Instead of walking step-by-step, the agent invokes actions like startTripCreation, getAvailableFlights(params), and selectFlight(flight_id). Each action is only valid from a particular state, so the agent still works against current truth. The tradeoff: the agent does more reasoning per step, which works well for simple tasks (order a pizza for 5pm) and less well for tasks where constraints interact (order as many pizzas as I can get delivered at 5pm for $50). Best for well-bounded tasks where the agent can reason cleanly across the inputs.
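A minimal sketch of state-bound actions, using the three action names mentioned above. The `TripApp` class, its state labels, the flight ids, and the prices are assumptions made for illustration, not the demo's implementation.

```python
class TripApp:
    """Hypothetical app: each action is valid only from particular states."""

    def __init__(self):
        self.state = "idle"
        self.flights = {"LA101": 210, "LA202": 185}  # made-up ids and prices
        self.selected = None

    def _require(self, *states):
        # Reject any action dispatched from a state that does not allow it.
        if self.state not in states:
            raise ValueError(f"action not valid from state {self.state!r}")

    def startTripCreation(self):
        self._require("idle")
        self.state = "searching"

    def getAvailableFlights(self, max_price=None):
        self._require("searching")
        return {
            fid: price
            for fid, price in self.flights.items()
            if max_price is None or price <= max_price
        }

    def selectFlight(self, flight_id):
        self._require("searching")
        if flight_id not in self.flights:
            raise KeyError(flight_id)
        self.selected = flight_id
        self.state = "flight_selected"


app = TripApp()
app.startTripCreation()
print(app.getAvailableFlights(max_price=200))  # {'LA202': 185}
app.selectFlight("LA202")
print(app.state)  # flight_selected
```

Calling `selectFlight` before `startTripCreation` raises `ValueError`, so the agent still works against the application's current state even with coarser actions.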
The agent sees what a first-time human user would see. The surface is progressively disclosed, the same way a person discovers an app. The agent isn’t completing a task. It’s simulating human perception, and the signal it produces is behavioral fidelity: did the agent notice what a real person would notice, find the path a real person would find, get stuck where a real person would get stuck? Best for usability testing at scale during application design.
Each mode is a different projection of the same underlying state and rules.
With ARS, three things move from the agent to the application. Each is something REST and tool-calling agents have to do themselves.
The rules an agent must follow live in application state and are enforced by the application. The agent doesn’t have to try an action to find out if it’s permitted. The application publishes what’s currently valid before the agent acts. Compliance is a property of the architecture, not a hope about agent behavior.
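One way to picture this is a rule that lives in application state and gates both what is published and what is accepted. The budget rule, the `Booking` type, and the action names below are hypothetical; the sketch only shows the shape of "publish first, enforce on dispatch".

```python
from dataclasses import dataclass


@dataclass
class Booking:
    budget: int
    total: int = 0
    confirmed: bool = False


def permitted_actions(b: Booking) -> list[str]:
    """Compute what is valid before the agent acts."""
    if b.confirmed:
        return []
    actions = ["changeSelection"]
    if 0 < b.total <= b.budget:  # assumed rule: confirm only within budget
        actions.append("confirmBooking")
    return actions


def dispatch(b: Booking, action: str) -> dict:
    """Enforce the same rule on dispatch, whatever the agent attempts."""
    if action not in permitted_actions(b):
        return {"accepted": False, "reason": f"{action} is not permitted right now"}
    if action == "confirmBooking":
        b.confirmed = True
    return {"accepted": True}


over = Booking(budget=800, total=850)
print(permitted_actions(over))           # ['changeSelection']
print(dispatch(over, "confirmBooking"))  # rejected, with a reason
```

Because `dispatch` checks against the same function that builds the published list, an agent that ignores the surface still cannot get past the rule.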
Facts, aggregates, and current possibilities are held by the application and updated as state evolves. The agent dispatches an action; the application accepts or rejects it; the next surface read reflects whatever changed. The agent doesn’t compute aggregates. Doesn’t track running totals. Doesn’t maintain its own model of what’s currently valid. The application maintains that model and publishes the current version on every read.
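A minimal sketch of application-held aggregates, assuming a hypothetical `TripState` class and made-up prices. The running total and the remaining budget are recomputed by the application on every read, so the agent only has to read them.

```python
class TripState:
    """Hypothetical application state that owns the running totals."""

    def __init__(self, budget_usd: int):
        self.budget_usd = budget_usd
        self.items = {}  # slot name -> price in USD

    def select(self, slot: str, price_usd: int) -> None:
        # Re-selecting a slot replaces the earlier choice.
        self.items[slot] = price_usd

    def surface(self) -> dict:
        total = sum(self.items.values())
        return {
            "selected": dict(self.items),
            "total_usd": total,
            "remaining_usd": self.budget_usd - total,
            "within_budget": total <= self.budget_usd,
        }


trip = TripState(budget_usd=800)
trip.select("outbound", 210)
trip.select("return", 195)
trip.select("hotel", 360)
print(trip.surface()["total_usd"])      # 765
print(trip.surface()["remaining_usd"])  # 35
```

If the agent later swaps the hotel, the next call to `surface()` reflects the new total without the agent tracking anything itself.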
The application publishes facts and current possibilities. The agent’s path through them is determined by its goal. Reorder the screens, restructure the actions, change what’s exposed at each step. The agent still reaches the goal because the path was being computed against the goal, not against an authored sequence.
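The sketch below illustrates the reordering claim with a toy app and a toy agent. `make_app`, `run_agent`, the screen names, and the hotel and airline options other than Lunera Air are invented for this example, and the "reasoning" is reduced to a preference lookup, which is far simpler than what a model would do.

```python
def make_app(screen_order):
    """Hypothetical app whose screens can be reordered freely."""
    state = {"pending": list(screen_order), "chosen": {}}
    options = {
        "flight": ["Lunera Air", "Other Air"],
        "hotel": ["Harbor Inn", "City Stay"],
    }

    def read():
        if not state["pending"]:
            return {"screen": "done", "options": [], "chosen": dict(state["chosen"])}
        screen = state["pending"][0]
        return {"screen": screen, "options": options[screen], "chosen": dict(state["chosen"])}

    def dispatch(choice):
        screen = state["pending"][0]
        if choice not in options[screen]:
            raise ValueError(f"{choice!r} is not offered on screen {screen!r}")
        state["pending"].pop(0)
        state["chosen"][screen] = choice

    return read, dispatch


def run_agent(read, dispatch, preferences):
    """Choose per screen from the goal, not from a scripted sequence."""
    while (surface := read())["screen"] != "done":
        preferred = preferences.get(surface["screen"])
        choice = preferred if preferred in surface["options"] else surface["options"][0]
        dispatch(choice)
    return read()["chosen"]


goal = {"flight": "Lunera Air", "hotel": "City Stay"}
print(run_agent(*make_app(["flight", "hotel"]), goal))
print(run_agent(*make_app(["hotel", "flight"]), goal))
```

Both runs end with the same selections even though the screens arrive in a different order, because the agent decides from whatever surface it is currently reading.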
Humans do not need to know what version of a website or mobile app they’re using because they can reason about what they see. The same property carries to ARS application contracts: when the consumer can reason, the contract can evolve without version negotiation. No versions to pin, no migrations to coordinate.
Apple taught developers a version of this lesson with iOS: ask if a capability exists, not which version is running. ARS does the same thing one layer up. The surface describes what is true and what is possible right now. Rename a field, add a parameter to an action, or restructure what options a screen offers. The next surface read carries the current truth, and the agent reasons against that.
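A small sketch of "ask if a capability exists" applied to a surface read. The surface layout, `choose_action`, and `build_call` are assumptions made for this example, and a real reasoning consumer would adapt to changes far more flexibly than this lookup does.

```python
def choose_action(surface: dict, wanted: str):
    """Look the action up in the current surface; no version number is consulted."""
    for action in surface.get("actions", []):
        if action["name"] == wanted:
            return action
    return None


def build_call(action: dict, known_values: dict) -> dict:
    """Fill whatever parameters the surface currently declares."""
    missing = [p for p in action["params"] if p not in known_values]
    if missing:
        raise KeyError(f"no value available for parameters: {missing}")
    return {"action": action["name"], "args": {p: known_values[p] for p in action["params"]}}


# Two reads of a hypothetical surface, before and after a parameter was added.
before = {"actions": [{"name": "selectFlight", "params": ["flight_id"]}]}
after = {"actions": [{"name": "selectFlight", "params": ["flight_id", "cabin"]}]}
known = {"flight_id": "LA202", "cabin": "economy"}

print(build_call(choose_action(before, "selectFlight"), known))
print(build_call(choose_action(after, "selectFlight"), known))
print(choose_action(before, "selectSeat"))  # None: the capability is not offered
```

The caller never pins a contract version. It builds each call from the parameters the latest read declares.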
Reasoning systems do not require versioning when the surface describes what is current and what is possible. Versioning is what architectures need when the consumer cannot adapt to change. ARS does not have that problem because agents can.
One caveat. The protocol envelope itself (wire format, action dispatch shape) has its own lifecycle and tends to require versioning as protocols change over time. ARS makes the application contract free to evolve; the wire format underneath is its own concern.
The articles develop ARS in depth. Article 1 names the primitive and explains why the layer is missing.