An honest read: what we've built, what we've proven, what AI agents can and can't do yet, and the two hard problems left. One of them is yours.
Soon the agent does the buying, not the human. Faced with a thousand merchants for one product, it has to answer one question:
Google UCP (Universal Commerce Protocol, Jan 2026) - the open standard for agents to find products and talk to merchants. Shopify, Stripe, PayPal, Visa & Mastercard in the ecosystem.
Google AP2 + OpenAI & Stripe's ACP - how an agent gets authorised to pay, and checkout inside ChatGPT.
Visa Trusted Agent · Mastercard Agent Pay - verifying the buyer's agent is legitimate, and the merchant's identity.
Billions in backing, shipping now. Every layer answers "who is the buyer, and can they pay?" - none answers "will this merchant actually deliver?"
Verifying the buyer's agent is necessary - and they've done it well. But trust takes two: the agent and the merchant - and only one side is built. "None of the protocols address merchant reliability or fulfilment" - confirmed across all six, from their own specs.
Each is a stand-in for the thing that actually matters, and each breaks the moment an agent can check:
They wired the whole house - discovery, payment, agent-identity - and left one socket with nothing in it: does the merchant actually deliver.
Every dominant player is building its own walled garden of merchant data and locking it inside its own stack. That leaves a gap right in the middle:
Neutrality isn't a nice-to-have - it's the position the giants structurally can't occupy. That's the room we're built for.
Google ties a Business Profile - and its reviews - to a physical storefront. When KAAL went online-only, the profile and the reviews both vanished.
Trust was pinned to a shopfront, not to whether KAAL delivers - so closing the shop erased it, and a whole sales channel with it. Merchants want reputation tied to performance: owned, portable, impossible to revoke.
Built from real sales, deliveries, refunds and disputes over time - measured from what actually happened, not declared by the merchant. Hard to game: the facts are attested by independent sources - you can't fake what Stripe signs. Graded, so an agent can choose between merchants.
An engine on KAAL's real data - 42 months, 3,183 orders, 97.96% fulfilment, 0.06% disputes - turning real outcomes into a signed, tamper-evident record. No mockups.
Put it in front of real AI agents (Claude, GPT-4o, Gemini) choosing between four real merchants. Pre-registered. Each test harder and more adversarial than the last.
Six rounds, then a seventh: show the signal, make them find it, make them verify it, test whether they care who signed it - and whether a verifiable nobody beats a trusted name.
The experiments landed on one thing: agents reward whatever looks independently verified - they can't yet tell a real credential from a convincing fake. That cuts both ways, and it's the whole game:
The whole bet: build the trust that survives the day agents stop being fooled - and that day is coming fast.
A signal that looks independently verified took the pick rate from 0% to 92%; the same numbers, self-declared, got 3%. It's the appearance of verification doing the work.
Bad numbers repelled every model - 0%. Not blind badge-love; they read what it says.
A forgery with the best numbers fooled the weak model 93% of the time; only the capable models, given a way to check, rejected it (0%).
Each generation discriminates harder - a verifiable fact-chain beat a recognised name 100% vs 31% on the models that matter.
Honest scope: a controlled sandbox, using the LLMs behind real agents as proxies, with the counterparty signing simulated. Directional and pre-registered - not live agents on the open web.
"0 to 92%" is the pooled number - but the three models behaved very differently (Claude 75%, GPT-4o and Gemini 100%), and the gap between them is the capability curve, visible right now:
Rewarded only what it could verify. Gave an unknown name, a forgery, and a shiny-but-irrelevant badge all 0%. The bellwether for where agents are heading.
Mixed - fell for a shiny badge, but on the structural test rejected the name and the forgery (0%) and trusted only the verifiable chain (100%).
Trusted a made-up name 93% of the time and a forgery 93% of the time. The "nominal trust" foil - it shows what looks-verified gets you on a weak model.
Read them left to right and you're looking at the trend: every new generation moves toward Claude - rewarding verifiable substance, refusing fakes. We build for that model.
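The pooled 92% can be reproduced from the three per-model rates quoted above. A minimal sketch, assuming each model ran the same number of trials so the pooled figure is a plain mean (the trial counts are not stated here):

```python
# Per-model pick rates quoted in the text, as percentages.
pick_rates = {"Claude": 75.0, "GPT-4o": 100.0, "Gemini": 100.0}

# Assumption: equal trial counts per model, so pooling is a simple average.
pooled = sum(pick_rates.values()) / len(pick_rates)

# The gap between the most and least trusting model is the capability curve.
spread = max(pick_rates.values()) - min(pick_rates.values())

print(f"pooled pick rate: {pooled:.1f}%")
print(f"spread between models: {spread:.0f} points")
```

The pooled figure rounds to 92%, and it hides a 25-point gap between models, which is why the per-model view matters more than the headline.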
Each limit shapes the product: live in schema, lean on what they can check themselves, and never bet on being a recognised brand.
Every model generation trusts a name less and demands verifiable proof more. The frontier model is already there - it trusts nothing it can't check.
The tools to verify signed facts are already shipping - for payments and agent-identity (Shopify, Cloudflare, Visa, Mastercard). Merchant reliability is the next slot.
We build for where agents are going, not where they are. The same thing that makes brand-trust fail is what makes verifiable trust necessary - and it compounds in our favour.
It works with the one tool every agent already has - fetch - not a verifier they don't have yet. Tools are coming; we don't wait for them.
It lives where agents always look - in the page's schema. Blind discovery was 0%; off that path, it's invisible.
It communicates verification - points the agent at what it can check itself by fetching the source. Never "please believe this is verified."
Today that means corroboration the agent runs itself; as agents get verifier tools, the same record's signature pays off. Built for both - facts the agent can verify, and an identity the merchant can't shed.
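The fetch-and-corroborate loop can be sketched in a few lines. This is an illustration under stated assumptions, not the shipped format: the `reliabilityRecord` property, its `source` and `facts` fields, and the `witness.example` URL are all hypothetical. The only thing the agent needs is a fetch function, passed in here so the example runs offline.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed <script type="application/ld+json"> blocks from a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.records.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than trust them


def corroborate(page_html, fetch):
    """True only if every fact claimed on the page matches the fetched source."""
    parser = JsonLdExtractor()
    parser.feed(page_html)
    for record in parser.records:
        if not isinstance(record, dict):
            continue
        claim = record.get("reliabilityRecord")  # hypothetical property name
        if not claim:
            continue
        source = json.loads(fetch(claim["source"]))
        return all(source.get(key) == value for key, value in claim["facts"].items())
    return False  # no record in the schema: nothing to trust


PAGE = """<html><head><script type="application/ld+json">
{"@type": "Organization", "name": "Example Store",
 "reliabilityRecord": {"source": "https://witness.example/records/example-store.json",
                       "facts": {"orders": 3183, "fulfilmentRate": 0.9796}}}
</script></head></html>"""

# Stand-in for the network: what the independent source actually says.
SOURCES = {
    "https://witness.example/records/example-store.json":
        '{"orders": 3183, "fulfilmentRate": 0.9796}'
}

matching = corroborate(PAGE, SOURCES.__getitem__)
inflated = corroborate(PAGE.replace("0.9796", "0.9999"), SOURCES.__getitem__)
print(matching, inflated)
```

The page never asks to be believed: a merchant who inflates the number on the page fails the check, because the agent compares it against a source the merchant does not control.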
Every store starts with a thin file - a Wikidata-style record built from public data. When the merchant connects their store, we ingest the real numbers with a web-proof and it becomes a thick, measured file - published as a live, signed credential agents read and verify.
The thin file means the bureau is never empty - every merchant has a record before they ask, and a reason to claim it. Claiming + connecting is what turns a guess into proof.
We pull from Stripe/Shopify and prove it came from the real source, unaltered - via zkTLS / TLS notarization. No cooperation from Stripe, no trusting us. The agent verifies provenance itself. The verifiability pillar - "not a bureau."
Bind the record to a KYC'd entity + beneficial owners so it can't be shed or faked (anti-phoenix), ZK-private. Inherits the rails' KYC + MATCH (Mastercard's terminated-merchant list). The non-replicability pillar - the moat.
Web proofs make it trustable without a name. Persistent identity makes it stick and un-copyable. Together: an agent can trust a merchant it's never heard of.
The agent trusts "Stripe's real records, unaltered, scored by a function I ran myself" - more than "BureauX says 97%," because it checked every link instead of believing a name.
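"Scored by a function I ran myself" implies the grading is deterministic and public, so any agent gets the same grade from the same facts. A minimal sketch of what such a function might look like; the `Facts` fields, the grade labels, and every threshold are hypothetical placeholders, and the sample counts are illustrative rather than real merchant data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facts:
    """Counts taken from verified source records (field names are assumptions)."""
    orders: int
    fulfilled: int
    disputes: int


def grade(facts: Facts) -> str:
    """Deterministic grade: same facts in, same grade out, no hidden state."""
    if facts.orders < 100:  # hypothetical cut-off for "too little history"
        return "thin"
    fulfilment = facts.fulfilled / facts.orders
    dispute_rate = facts.disputes / facts.orders
    if fulfilment >= 0.97 and dispute_rate <= 0.002:
        return "A"
    if fulfilment >= 0.90 and dispute_rate <= 0.01:
        return "B"
    return "C"


# Illustrative inputs only.
print(grade(Facts(orders=1000, fulfilled=985, disputes=1)))
print(grade(Facts(orders=1000, fulfilled=920, disputes=8)))
print(grade(Facts(orders=40, fulfilled=40, disputes=0)))
```

Because the function is pure, the agent does not have to trust whoever published the grade: it can recompute it from the proven facts and compare.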
All three share one spine: counterparty-signed facts an agent checks for itself, never trusting us. They differ only on where the record lives and how it's owned - a deliberate, still-open decision:
Bitcoin-style. Records on a public chain - history can't be rewritten, anyone reads it. Owned via a token. Maximal neutrality; highest complexity and crypto / regulatory baggage.
Off-chain crypto. API-signed facts + zkTLS web-proofs + a KYC'd identity. Security is economic, not on-chain. Fast and uses today's rails; leaves "who hosts the log" open.
Signed + a public transparency log. Fork 2's rails and economics, anchored to a publicly auditable log so no one - including us - can rewrite history. More to stand up; token optional later.
Whichever wins, the rule is the same: trust the math and the economics, not the name. We'd rather show the fork than pretend it's settled - the spine holds in all three.
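The "no one can rewrite history" property shared by the chain and the transparency-log forks comes down to each entry committing to the one before it. A minimal sketch using a plain SHA-256 hash chain; production transparency logs typically use Merkle trees with signed heads instead, and the fact fields below are hypothetical:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, fact):
    """Hash the fact together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "fact": fact}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log, fact):
    """Add a fact to the log, chained to the current head."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"fact": fact, "prev": prev, "hash": entry_hash(prev, fact)})


def verify(log):
    """Recompute every link; any edited fact breaks the chain from there on."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["fact"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"order": "A-1", "outcome": "delivered"})
append(log, {"order": "A-2", "outcome": "disputed"})
intact = verify(log)

log[1]["fact"]["outcome"] = "delivered"  # try to quietly erase a dispute
still_valid = verify(log)
print(intact, still_valid)
```

A chain like this only detects edits if the latest hash is published somewhere the host cannot change it, which is exactly the "who hosts the log" question the forks answer differently.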
The deepest defence isn't the check - it's that being legit is the only move that pays. A verified identity with real signed history is slow and expensive to build, and the moment it's abused the same signed record exposes it:
A trusted record is months of real, counterparty-signed outcomes - deliveries, low disputes, settled orders. No shortcut: the facts come from Stripe and the courier, not the merchant.
Start scamming and the signed facts turn against you - disputes and failures are logged by the same counterparties. Reputation that took months to build is burned in a week.
Identity is bound to a KYC'd entity + beneficial owners (anti-phoenix). You can't shed the burned record and re-appear clean - the rails' MATCH list follows you.
So the economics do the enforcement, not a policeman: a few days of fraud, then the asset is dead and un-rebuildable. For any real merchant, staying honest is worth more than cheating - which is what makes the signal trustworthy.
For a connected merchant, off-the-shelf tools already prove the real Stripe / Shopify data is genuine and untampered - with no help from the platform. We checked: it's days of engineering work, not a research project.
A signed merchant reliability record that rides the same rails agents already check - and an identity that makes it un-fakeable. Nobody has built this. It's the seat.
Honest: the deep data needs the merchant's consent, and it's trust-minimised, never trustless - the agent trusts a neutral witness, never the merchant and never us.
The open questions aren't theory, they're tests. And the real test each time isn't "does it move the pick" - agents reward anything that looks verified - it's "does the agent reject the fake."
The behaviour tests are fully in our hands today. The live-rail and at-scale questions need integration and time - not another sandbox run.
Not one thing, and not four competing things - three layers you build together, plus a trend. The competing options live inside the layers, mostly in the mechanism layer.
How the data's made verifiable. Variants: direct read + sign (cheapest, built) → counterparty-signed → self-hosted zkTLS → TEE. The real choices live here.
The merchant's passport. did:web (built) → + KYC'd entity (anti-phoenix) → + cross-platform resolution.
Where the signal lives, per rail. schema / JSON-LD → .well-known → UCP manifest (Gemini) → ACP context (ChatGPT).
Not a build - measured by running 1A/1B/1C across model versions and charting skepticism vs capability.
Mechanism + identity + delivery, built together; the curve is the trend they ride.
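The identity layer's did:web starting point works with plain fetch because a did:web identifier maps directly to an HTTPS URL. A minimal sketch of that mapping as described in the did:web method specification; the `merchant.example` identifiers are hypothetical, and a real resolver would also fetch and validate the DID document:

```python
from urllib.parse import unquote


def did_web_to_url(did: str) -> str:
    """Map a did:web identifier to the URL of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError("not a did:web identifier")
    parts = did[len(prefix):].split(":")
    if any(part == "" for part in parts):
        raise ValueError("empty segment in did:web identifier")
    host = unquote(parts[0])  # a port is percent-encoded, e.g. %3A8443
    path = "/".join(unquote(part) for part in parts[1:])
    if path:
        return f"https://{host}/{path}/did.json"
    return f"https://{host}/.well-known/did.json"


print(did_web_to_url("did:web:merchant.example"))
print(did_web_to_url("did:web:merchant.example:stores:kaal"))
print(did_web_to_url("did:web:merchant.example%3A8443"))
```

This also shows why did:web alone is only the first step: the identity is as durable as the domain, which is what the KYC'd-entity binding is meant to fix.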
One setup measures mechanism + delivery + identity + the curve at once. Unlocks the entire lead tier with: a Stripe read key + one real test page + agent access.
By checking, not believing. We've shown the agents already want that, and that the proof problem is tractable. What's missing is the identity that makes it stick - and a second founder to own it. That's the conversation.