Why AI Agents Struggle with the Web — and How to Fix It

September 4, 2024 Daniel Bentes

AI agents are supposed to make life easier. But the way they interact with web services is broken. They scrape data from websites, simulate a browser, or call APIs, and each method has serious problems.

Web scraping is a hack. It’s like building a house with duct tape. It works until something moves, which on the web is all the time. Web pages change, and your code breaks. It’s inefficient, error-prone, and a constant battle to keep working. And even when it does work, it often feels like it shouldn’t. Scraping can violate a site’s terms of service, and it’s usually a terrible experience for both the AI and the people who run the sites.

Simulated browser interactions have similar problems. Clicking buttons and filling in forms on a user’s behalf is resource-intensive, which adds computational overhead and slows response times. That degrades the user experience and limits how far AI-driven processes can scale. Browser simulations are also vulnerable to dynamic content, pop-ups, and CAPTCHA challenges, any of which can interrupt an automated run and leave a task incomplete or failed.

API integrations, the alternative, aren’t much better. APIs are supposed to be the right way to connect with web services, but they’re all over the place. Different structures, different data formats, different rules for access. There’s no standard, so every integration feels like starting from scratch. It’s a mess of bespoke code and constant updates just to keep things running.

Worst of all, there’s no compensation for the content creators. AI pulls data from everywhere, but there’s no built-in way to give anything back. Writers, developers, and companies see their work used by AI without payment or recognition. And then there are the security and privacy issues. Scraping bypasses all the usual protections, and even APIs can have weak spots. When AI agents interact with web services, it often feels like the Wild West: chaotic, lawless, and risky.

The UIM Protocol

The Unified Intent Mediator (UIM) protocol aims to fix this. But it’s important to be clear: UIM is not a magic bullet. It’s an evolving concept, not a finished product. It’s an attempt to standardize how AI agents interact with web services, creating a framework that’s more reliable, secure, and fair.

UIM introduces the idea of “intents.” Intents are like little action cards that web services can offer up, defining specific tasks like searching for products, retrieving data, or placing orders. Each intent comes with metadata: what it does, what parameters it needs, and how it can be executed. Because every intent is described in the same way, an agent can check a call against the metadata before sending it instead of relying on service-specific glue code.
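
To make the “action card” idea concrete, here is a minimal sketch of an intent as a data structure. It is an illustration, not UIM’s actual schema: the field names (`uid`, `description`, `endpoint`, `parameters`), the `example.com` URL, and the `validate` helper are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Parameter:
    # Hypothetical parameter description; not taken from the UIM spec.
    name: str
    py_type: type
    required: bool = True


@dataclass(frozen=True)
class Intent:
    # Hypothetical intent metadata: what it does, what it needs, where to call it.
    uid: str
    description: str
    endpoint: str
    parameters: tuple[Parameter, ...] = ()

    def validate(self, args: dict[str, Any]) -> list[str]:
        """Return a list of problems; an empty list means the call is well-formed."""
        known = {p.name: p for p in self.parameters}
        errors = []
        # First report required parameters that are absent.
        for p in self.parameters:
            if p.required and p.name not in args:
                errors.append(f"missing required parameter: {p.name}")
        # Then report unknown names and values of the wrong type.
        for name, value in args.items():
            if name not in known:
                errors.append(f"unknown parameter: {name}")
            elif not isinstance(value, known[name].py_type):
                expected = known[name].py_type.__name__
                errors.append(f"wrong type for {name}: expected {expected}")
        return errors


# An example intent a shop might publish (all values are made up).
search_products = Intent(
    uid="example.com:search-products:v1",
    description="Search the product catalog",
    endpoint="https://api.example.com/intents/search-products",
    parameters=(
        Parameter("query", str),
        Parameter("max_results", int, required=False),
    ),
)
```

An agent could call `search_products.validate({"query": "shoes"})` before making a request; an empty list means the arguments match the published metadata.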

There’s also a focus on security. UIM uses Policy Adherence Tokens (PATs), which are like digital permission slips. They encapsulate the rules, permissions, and billing for each interaction, automating compliance and reducing the overhead of manually managing access.
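
One way to picture a “digital permission slip” is a signed bundle of claims that the service checks on every call. The sketch below uses an HMAC signature from Python’s standard library. The token layout, the claim names (`allowed_intents`, `expires_at`, `price_per_call_usd`), and the signing scheme are assumptions for illustration only and should not be read as the format UIM specifies.

```python
import base64
import hashlib
import hmac
import json


def issue_pat(secret: bytes, claims: dict) -> str:
    """Service side: encode the claims and sign them (hypothetical format)."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def check_pat(secret: bytes, token: str, intent_uid: str, now: float) -> bool:
    """Service side: accept a call only if the token is genuine, unexpired,
    and grants access to the requested intent."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return False  # not in "<payload>.<signature>" form
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; verify before decoding anything.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    allowed = intent_uid in claims.get("allowed_intents", [])
    unexpired = now < claims.get("expires_at", 0)
    return allowed and unexpired


# Example: a token that permits one intent until timestamp 100 (made-up values).
secret = b"service-signing-key"
token = issue_pat(secret, {
    "allowed_intents": ["example.com:search-products:v1"],
    "expires_at": 100,
    "price_per_call_usd": 0.01,  # billing terms travel with the permission
})
```

The point of the design is that rules, permissions, and billing terms live in one verifiable object, so the service does not have to look them up separately for each request.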

Discovery is another key part of UIM. Instead of AI agents guessing where to find things, UIM introduces methods like DNS TXT records and `agents.json` files to guide them. It’s like a map, pointing agents to the right APIs and making it easier to connect without guesswork or risk.
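
The discovery flow might look roughly like this: read a pointer from a DNS TXT record, fetch the `agents.json` file it names, and list the intents inside. The sketch below skips the network calls and works on sample strings. The `uim-` key prefix, the `uim-agents-file` key, and the layout of `agents.json` are hypothetical and chosen only for this example.

```python
import json


def parse_txt_records(records: list[str]) -> dict[str, str]:
    """Keep only 'key=value' TXT records whose key starts with 'uim-' (assumed prefix)."""
    found = {}
    for record in records:
        key, sep, value = record.partition("=")
        if sep and key.startswith("uim-"):
            found[key] = value
    return found


def intents_from_agents_json(text: str) -> dict[str, str]:
    """Map each intent's uid to its endpoint (assumed agents.json layout)."""
    doc = json.loads(text)
    return {item["uid"]: item["endpoint"] for item in doc.get("intents", [])}


# Sample data standing in for a DNS lookup and an HTTP fetch.
txt_records = [
    "v=spf1 include:_spf.example.com ~all",  # unrelated record, ignored
    "uim-agents-file=https://example.com/agents.json",
]
agents_json = """
{
  "intents": [
    {"uid": "example.com:search-products:v1",
     "endpoint": "https://api.example.com/intents/search-products"},
    {"uid": "example.com:place-order:v1",
     "endpoint": "https://api.example.com/intents/place-order"}
  ]
}
"""

pointers = parse_txt_records(txt_records)
intents = intents_from_agents_json(agents_json)
```

In a real agent, the TXT records would come from a DNS resolver and the JSON from an HTTP request to the URL in `pointers`; both steps are left out here to keep the example self-contained.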

Why It Matters

If UIM works as intended, it could change how agents and web services work together. For one, it would standardize interactions between AI agents and web services. No more duct-taped scraping scripts or custom API integrations. AI could interact in a consistent, predictable way, making development faster and maintenance easier.

UIM also introduces the possibility of fair compensation models. Service providers could charge for access to their intents, creating a built-in way to get paid for the content and services AI agents use. This is a big deal because it aligns incentives: web services get paid, and AI agents get reliable access. Everyone wins.

Security would also improve. With standardized tokens and structured interactions, the whole process becomes safer. Web services know exactly who’s accessing what, and users’ data is better protected.

The Realities

But let’s not get carried away. The UIM protocol is still a concept, not a fully implemented system. For it to succeed, it will need widespread adoption, and that’s a big ask. Every web service has its own way of doing things, and getting them to agree on a new standard won’t be easy.

There are also technical challenges. Standardizing interactions sounds great, but the devil is in the details. UIM would need to work across a vast range of services, each with unique requirements and constraints. It’s a long road from a good idea to a working reality.

And then there’s the economic side. Monetization is a selling point of UIM, but it’s also a potential flashpoint. How much should services charge? Who sets the rates? Will developers push back against paying for access that used to be free? These are questions without easy answers.

A Step Forward

Despite the hurdles, UIM represents a meaningful step in the right direction. It’s an attempt to bring order to a chaotic space, to create a fairer, more secure, and more efficient system for AI-web interactions. It’s not a guarantee of success, but it’s a shot worth taking.

The current methods are broken, and it’s clear something needs to change. Whether UIM is the final answer or just the first draft of a better approach, it’s a conversation we need to have. Because as AI continues to grow, the way it interacts with the web will define the next era of digital services.