Open spec · CKN/1.0 · Reach Protocol

Two AI agents can't reach each other over the web. This is how they do.

Your agent can use tools (that's MCP) and it can talk to people (that's voice). But there is no clean way for one agent to reach another agent that isn't a hosted server — and no way to pull a human into that conversation without starting over on a different product. Caller-Kind Negotiation (CKN) closes both gaps: agents connect directly, peer-to-peer, prove who they are, settle the routine back-and-forth in milliseconds, and a human joins the same line the instant a decision needs one.

The gap nobody names

The agent world has solved two of the three conversations and quietly skipped the third:

Conversation | Solved by
Agent → tools / data | MCP (over HTTP)
Human ↔ agent / human ↔ human | voice (WebRTC, the phone network)
Agent ↔ agent | nothing good

"Just use HTTP + MCP" works when the agent you're calling is a hosted server — it has a public address, a TLS cert, and it already issued you an API key. That's most cloud agents today, and for them CKN adds nothing.

But a growing share of agents run client-side — in a browser tab, on a phone, on a laptop behind a router, spun up for one task and gone. Those agents cannot be HTTP servers. They have no address to call. Over HTTP + MCP they are unreachable, full stop. This is not "slower" — it is impossible. And there is no path at all to bring a human into an agent-to-agent exchange: you'd be switching products mid-conversation.

What CKN does that HTTP can't

Capability | HTTP + MCP | CKN
Reach an agent with no public endpoint (browser, phone, laptop) | ✗ impossible | ✓ peer-to-peer through NAT
Permissionless reach with verified identity | ✗ needs a pre-issued key | ✓ call @anyone cold; they know it's really you
Symmetric bidirectional turn-taking | ✗ client-initiated only | ✓ either side sends, one session
A human joins the same line mid-conversation | ✗ no path | ✓ audio negotiates up, same call

The moat isn't any single row — other tech does peer-to-peer, other tech does identity. It's that all four collapse onto one connection and one identity: the same @handle, the same WebRTC session, the same wallet signature carry human↔human, human↔agent, and agent↔agent. Nobody else has the unification.

What it looks like in real life

1 · Your assistant books a tradesperson

Setup. Maya's phone assistant needs to book a plumber. The plumber's shop runs an AI scheduler — on a tablet behind the shop's router, not a public server.

What happens. Maya's agent reaches @rapidplumb cold. Both prove who they are with a wallet signature. Over the data channel they settle availability, job scope, and a price band (nine tedious turns) in under a second. No audio, no codec, no two LLMs reading speech to each other. When it lands on "£180 call-out, deposit to confirm," the plumber taps to join and Maya's phone rings: same call, full context.

Outcome. The routine negotiation happens machine-to-machine in the time of one HTTP request. The human shows up only for the one decision that's actually theirs. Neither side ran a server.

2 · A quote between two agents that both live on laptops

Setup. A founder's procurement agent runs in their browser. A supplier's sales agent runs on the supplier's laptop. Neither is hosted anywhere.

What happens. Over HTTP + MCP this exchange cannot occur — there is no address to POST to on either side. Over Reach the two connect peer-to-peer through their networks, exchange a signed request-for-quote and a signed quote, and each logs the other's signature.

Outcome. A verifiable, auditable B2B exchange between two machines that have no servers and never swapped an API key. The capability simply didn't exist before.
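A minimal sketch of the signed request-for-quote step, assuming Ed25519 keys (the algorithm named in the benchmark below) and Node's built-in crypto module. The envelope shape, the `quote.request` method name, and the helper names are hypothetical illustrations, not the Reach Protocol wire format.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical envelope: the exact JSON-RPC 2.0 text plus a detached signature.
interface SignedEnvelope {
  payload: string;   // the JSON text that was signed, kept byte-for-byte
  signature: string; // base64 Ed25519 signature over `payload`
}

// Build a JSON-RPC 2.0 request and sign its serialized form.
function signEnvelope(
  method: string,
  params: unknown,
  id: number,
  privateKey: KeyObject,
): SignedEnvelope {
  const payload = JSON.stringify({ jsonrpc: "2.0", id, method, params });
  // Ed25519 takes no separate digest algorithm, so the first argument is null.
  const signature = sign(null, Buffer.from(payload, "utf8"), privateKey).toString("base64");
  return { payload, signature };
}

// Check the signature against the sender's public key.
function verifyEnvelope(env: SignedEnvelope, publicKey: KeyObject): boolean {
  return verify(
    null,
    Buffer.from(env.payload, "utf8"),
    publicKey,
    Buffer.from(env.signature, "base64"),
  );
}

// The buyer's agent signs a request-for-quote (illustrative parameters).
const buyer = generateKeyPairSync("ed25519");
const rfq = signEnvelope("quote.request", { sku: "WIDGET-1", qty: 500 }, 1, buyer.privateKey);

// The supplier's agent verifies it, then logs the signed envelope for audit.
const auditLog: SignedEnvelope[] = [];
if (verifyEnvelope(rfq, buyer.publicKey)) auditLog.push(rfq);

// Any change to the payload after signing invalidates the signature.
const tampered: SignedEnvelope = { ...rfq, payload: rfq.payload.replace("500", "5") };
console.log(verifyEnvelope(rfq, buyer.publicKey), verifyEnvelope(tampered, buyer.publicKey));
```

Signing the serialized text rather than the parsed object sidesteps JSON canonicalization: the receiver verifies the bytes it received and only then parses them.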

3 · Support that escalates to a human on one unbroken line

Setup. A customer's agent calls a bank's support @handle.

What happens. The bank's AI answers over the data channel, pulls the account context, resolves the routine question instantly. When the customer asks something policy says needs a person, a rep joins the same call as voice — no transfer, no new number, no "please hold," the transcript already in front of them.

Outcome. Agent-speed for the 80% that's routine, a human for the 20% that isn't — on one connection and one identity, with nothing lost in the handoff.

CKN is the transport MCP is missing

This is the important part, and it's why CKN extends the ecosystem rather than competing with it. Reach Protocol envelopes are JSON-RPC 2.0, the exact payload MCP already speaks. So CKN is not a rival to MCP. It's the peer-to-peer transport MCP doesn't have.

One MCP client, two transports — and the second one reaches everywhere the first can't. You don't abandon anything you've built. You extend it to the agents HTTP can't address.
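The "two transports" idea can be sketched as a small transport interface with a data-channel-backed implementation. Everything here is an assumption for illustration: the `Transport` and `ChannelLike` interfaces and the `DataChannelTransport` class are hypothetical, not the MCP SDK's or Reach's API, and the in-memory loopback stands in for a connected pair of WebRTC data channels so the sketch runs without a browser.

```typescript
// Minimal JSON-RPC 2.0 message shapes (the payload format MCP uses).
type JsonRpcRequest = { jsonrpc: "2.0"; id: number; method: string; params?: unknown };
type JsonRpcResponse = { jsonrpc: "2.0"; id: number; result?: unknown };
type JsonRpcMessage = JsonRpcRequest | JsonRpcResponse;

// Hypothetical contract a client would code against, whatever carries the bytes.
interface Transport {
  send(message: JsonRpcMessage): void;
  onmessage: ((message: JsonRpcMessage) => void) | null;
}

// The subset of the RTCDataChannel API this sketch relies on.
// A real channel may need a thin adapter to match it exactly.
interface ChannelLike {
  send(data: string): void;
  onmessage: ((event: { data: string }) => void) | null;
}

// Carries JSON-RPC messages as text frames over a data channel.
class DataChannelTransport implements Transport {
  onmessage: ((message: JsonRpcMessage) => void) | null = null;
  private readonly channel: ChannelLike;

  constructor(channel: ChannelLike) {
    this.channel = channel;
    channel.onmessage = (event) => this.onmessage?.(JSON.parse(event.data) as JsonRpcMessage);
  }

  send(message: JsonRpcMessage): void {
    this.channel.send(JSON.stringify(message));
  }
}

// In-memory stand-in for two connected data channels (synchronous delivery).
function loopbackPair(): [ChannelLike, ChannelLike] {
  const a: ChannelLike = { onmessage: null, send: (data) => b.onmessage?.({ data }) };
  const b: ChannelLike = { onmessage: null, send: (data) => a.onmessage?.({ data }) };
  return [a, b];
}

const [clientSide, serverSide] = loopbackPair();
const client = new DataChannelTransport(clientSide);
const server = new DataChannelTransport(serverSide);

// The remote peer answers an MCP-style "tools/list" request with an empty list.
server.onmessage = (msg) => {
  if ("method" in msg && msg.method === "tools/list") {
    server.send({ jsonrpc: "2.0", id: msg.id, result: { tools: [] } });
  }
};

const replies: JsonRpcMessage[] = [];
client.onmessage = (msg) => { replies.push(msg); };
client.send({ jsonrpc: "2.0", id: 1, method: "tools/list" });
console.log(replies);
```

Because the client only sees `Transport`, swapping an HTTP-backed implementation for the data-channel one would not change the calling code.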

How it works

  1. The hub stamps every incoming signaling message with fromKind: "agent" | "human" | "anon" from the caller's verified identity (Solana SIWS, wallet-bound is_agent flag).
  2. If both peers are agents, both offer SDP that includes only the WebRTC DataChannel — no audio media tracks.
  3. Signed JSON-RPC envelopes flow over the data channel for the call. The channel runs over the same DTLS association that would have keyed the audio (DTLS-SRTP), so it gets the same end-to-end transport encryption with none of the codec.
  4. When a human takes over, the audio tracks negotiate up on the same session. One call, one identity, media added on demand.
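The negotiation rule in steps 1, 2, and 4 can be sketched as a pure function over the two stamped kinds. The `planMedia` and `mediaKindsInSdp` helpers and the `MediaPlan` type are hypothetical, and treating every pairing other than agent-to-agent as "data channel plus audio" is an assumption; the text above only specifies the agent-to-agent case and the upgrade when a human joins.

```typescript
// The three caller kinds the hub stamps onto signaling messages.
type CallerKind = "agent" | "human" | "anon";

// Hypothetical description of what a peer puts in its offer.
interface MediaPlan {
  dataChannel: boolean;
  audio: boolean;
}

// Agent-to-agent calls offer the data channel only.
// Assumption: every other pairing also offers audio.
function planMedia(localKind: CallerKind, remoteKind: CallerKind): MediaPlan {
  const bothAgents = localKind === "agent" && remoteKind === "agent";
  return { dataChannel: true, audio: !bothAgents };
}

// List the media types an SDP blob offers by reading its "m=" lines.
// A data channel appears as "m=application", audio as "m=audio".
function mediaKindsInSdp(sdp: string): string[] {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.startsWith("m="))
    .map((line) => line.slice(2).split(" ")[0] ?? "");
}

// Step 2: two agents negotiate a data-only session.
const agentCall = planMedia("agent", "agent");

// Step 4: a human takes over one side, so the plan for the same session gains audio.
const afterHandoff = planMedia("human", "agent");

// Illustrative data-only offer, trimmed to the lines that matter here.
const dataOnlyOffer = [
  "v=0",
  "m=application 9 UDP/DTLS/SCTP webrtc-datachannel",
].join("\r\n");

console.log(agentCall, afterHandoff, mediaKindsInSdp(dataOnlyOffer));
```

In a browser, the upgrade in step 4 would typically mean adding an audio track to the existing RTCPeerConnection and renegotiating, rather than opening a new connection.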

"And it's cheaper" — honestly, the least interesting part

You'll see CKN described elsewhere as "skip the audio codec, save $0.05–$0.20 per minute." That's true — and our benchmark backs it — but it's the weakest reason to care, because it only bites when two agents talk by voice, which almost never happens today. We lead with capability, not cost. The savings are a bonus that compounds later, when agent-to-agent voice is common. Here's the data anyway, because precise honesty is the brand:

Per-envelope CPU (Node 22, 1000 iterations)

Step | mean | p50
JSON encode (sender) | 0.55 μs | 0.50 μs
Binary encode + Ed25519 sign | 0.405 ms | 0.291 ms
Binary decode + Ed25519 verify | 0.268 ms | 0.406 ms

Versus a voice loop: STT + LLM + TTS + audio codec, ~200–500ms and real money per turn. The CKN envelope is sub-millisecond and a fraction of a cent. Full handoff in LAUNCH/ckn-9-benchmark-2026-05-29.md.
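As a rough worked comparison, the snippet below adds the mean sign and verify costs from the table and scales them to the nine-turn exchange from the plumber example. It counts CPU time only; network latency and LLM inference on either side are deliberately left out, so this is a back-of-envelope sketch rather than a benchmark result.

```typescript
// Mean per-envelope CPU cost from the benchmark table, in milliseconds.
const signMeanMs = 0.405;   // binary encode + Ed25519 sign
const verifyMeanMs = 0.268; // binary decode + Ed25519 verify

// Voice-loop range quoted above, per turn, in milliseconds.
const voiceTurnMs = { low: 200, high: 500 };

// Nine turns, as in the plumber-booking example.
const turns = 9;

// One envelope costs one sign on the sender plus one verify on the receiver.
const envelopeMs = signMeanMs + verifyMeanMs;
const cknCpuMs = turns * envelopeMs;
const voiceMs = { low: turns * voiceTurnMs.low, high: turns * voiceTurnMs.high };

// How many times cheaper in CPU time one envelope is than one voice turn.
const speedup = {
  low: voiceTurnMs.low / envelopeMs,
  high: voiceTurnMs.high / envelopeMs,
};

console.log(`per envelope: ${envelopeMs.toFixed(3)} ms`);
console.log(`nine turns over CKN: ${cknCpuMs.toFixed(3)} ms of crypto CPU`);
console.log(`nine turns by voice: ${voiceMs.low} to ${voiceMs.high} ms`);
console.log(`ratio per turn: ${speedup.low.toFixed(0)}x to ${speedup.high.toFixed(0)}x`);
```

On these numbers the signing and verification overhead for the whole exchange is a few milliseconds, so in practice the network round trip, not the envelope, sets the floor.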
