If you're like most companies, you're under pressure to invest in and deploy great AI systems. And at this point, you've probably sunk a sizable share of your budget into doing exactly that.
But far too few people think about what happens next: proving the investment's worth, its impact on employees, and its value for shareholders.
93% of enterprise AI budget goes to the technology itself: infrastructure, models, deployment. The remaining 7% goes to understanding whether any of it is actually working. Many builders and leaders are realizing they're missing the analytics layer that makes it all work.
Here's what's come up in conversations with these leaders, and why not everything is what it seems.
Objection 1: "We could just build agent measurement tools ourselves."
Of course you could. It's 2026. We know that. And "ourselves" doesn't even mean a team of engineers. A PM with Cursor and a free afternoon can vibe-code something that analyzes their agent conversations.
That'll get you a working prototype and a cool demo, plus a slow-burning quality issue you won't feel until it's time to scale and you're staring at a leaky bucket of non-returning users.
We faced the same challenge internally: "When we first used an LLM to analyze agent conversations, the results looked great until we checked them. Our solution was confidently miscategorizing issues and drifting from its instructions in ways we didn’t predict,” Danielle Goh, Sr. Product Manager for Agent Analytics, said. “Getting to output we'd actually make a product decision around took months of iterating and improving our methodology."
You need to trust what you build, and that's where things get scary and expensive. Every time your homegrown classifier tells you "40% of users are asking about X" or "this use case is trending," you're doing the math in the back of your head. Is that real? Is it looking at the right user segment? Did classification drift when the model updated? Did we miss an entire category of new issues because we didn't know how to ask for it? Is this a genuine pattern or is the LLM confidently hallucinating a trend?
That's the actual risk of building your own AI measurement tool. And eventually, it becomes a business cost: the sprint to build it, the maintenance burden, and the hours you spend auditing your own tool's outputs instead of acting on them.
Before Ticketmaster's Brian Muehlenkamp had Agent Analytics, he was doing exactly this: manually reading through chat logs one by one, trying to determine whether an interaction went well or not. "It was very manual, very one by one," he said. "To the point where I just wasn't giving it the due diligence because the juice wasn't worth the squeeze."
Once he had reliable, structured data he could actually trust, he used it to identify a high-volume issue theme, improved the agent's knowledge base to address it, and drove a 53% reduction in rage prompt rate. That's a systematic, repeatable quality win you can't get from eyeballing logs, uploading static spreadsheets, or spending hours querying your data.
Focus your efforts on building the agent that wins you the market. We've already built, and proven, the tool that helps you do it.
Objection 2: "We already have an AI observability tool, like LangSmith or Arize."
Good. Keep using it. Dev observability tools are excellent at what they were built to do: traces, latency, evals, token costs, hallucination detection. If your agent goes down at 2 am, those are the tools your team will reach for. They tell you whether the system is healthy, and that question matters.
But there's a second question nobody's observability stack was designed to answer: are my users actually getting value?
Those are not the same question. An agent can have perfect uptime, clean traces, and a 1.2-second response time while a user rephrases their question nine times, gives up in frustration, and never comes back. Your dev tool considers that interaction a completed session, whereas Agent Analytics flags it as an at-risk user experience.
The other problem, again, is data isolation. Currently, most users don't experience agents as a standalone system. They navigate your traditional SaaS product, hit a friction point, open the agent, and either get what they need or bail. What happens across the full user journey tells you far more about whether your agent is working. When agent data lives in a separate observability tool, siloed from the rest of your product analytics, you lose that picture entirely. You can't connect agent usage to retention, conversion, or churn. The agent becomes a black box inside a black box.
Dev tools monitor your LLMs. Agent Analytics monitors what users experience, live in production. They're complementary, and the best teams use both. Here's how that looks in practice.
Objection 3: “We have PII and security concerns.”
Completely fair. Whether you're in a highly regulated industry or just got approval for your first big AI investment, the security hurdle can be a long, complex stretch of red tape to navigate.
To make this hurdle easier to clear, you decide what level of information you send to Agent Analytics via the Conversations API. Before anything reaches Pendo, your team can redact, transform, and sanitize the data on your end. The user prompt "Book travel to New York City on March 30 for John Smith" becomes "booking travel." You keep the intent signal, the adoption data, the retention correlation. The sensitive details don't go anywhere.
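To make the redaction step concrete, here's a minimal sketch of what "sanitize before you send" can look like. Everything about the endpoint (URL, headers, payload fields) is an illustrative assumption rather than Pendo's actual Conversations API schema, and the rule-based `sanitize` helper stands in for whatever redaction logic your team already trusts. The structure is the point: the raw prompt is replaced with an intent label before the request ever leaves your infrastructure.

```typescript
// Illustrative sketch only: the endpoint path, headers, and payload fields below are
// assumptions for demonstration, not Pendo's documented Conversations API contract.

type ConversationEvent = {
  conversationId: string;
  role: "user" | "agent";
  text: string; // already sanitized: the raw prompt never leaves your side
};

// Your redaction step: swap the raw prompt for a coarse intent label.
// A production pipeline might use an NER model or an internal entity service;
// a simple rule-based pass keeps this sketch short.
function sanitize(rawPrompt: string): string {
  if (/\b(book|reserve)\b[\s\S]*\b(travel|flight|hotel)\b/i.test(rawPrompt)) {
    return "booking travel";
  }
  // Fall back to a generic label rather than forwarding unrecognized raw text.
  return "other";
}

async function sendUserTurn(
  apiKey: string,
  conversationId: string,
  rawPrompt: string
): Promise<void> {
  const event: ConversationEvent = {
    conversationId,
    role: "user",
    // "Book travel to New York City on March 30 for John Smith" -> "booking travel"
    text: sanitize(rawPrompt),
  };

  await fetch("https://example-pendo-endpoint.invalid/conversations", {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-api-key": apiKey },
    body: JSON.stringify(event),
  });
}
```

The same pattern extends to agent responses and metadata: decide the granularity on your side, then send only what survives the pass.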
For most teams, that's enough to clear the bar and start using Agent Analytics. If you or your legal team need more details, visit trust.pendo.io and our data collection and security docs. Or read how we collect agent data directly from Pendo's Chief Information Security Officer.
Objection 4: "We only have a few users. It's too early."
This is the one that really gets to us, because it's so reasonable yet so backwards.
If you currently have five users on your agent, ask yourself: how are you deciding when you're ready to go from five to 20? What does that decision actually look like? What data will you use?
If the honest answer is "one user tells us it's great" or "we get no complaints," that's blind optimism, not a legitimate product decision. And optimism, important as it is, doesn't scale.
Let’s revisit our friend at Ticketmaster, Brian. He started with a pilot group of 30 to 35 power users, and he knew they'd like it because they were the ones he'd already demoed it to. But liking it and having the data to prove it are different things.
Agent Analytics gave him the metrics to confirm adoption was real, surface the emergent issues, and build the conviction to open access to anyone who wanted it. The result: 81.6% user retention among people who tried the agent, and roughly 60% user base growth beyond the initial pilot. "It gave me the confidence to roll to phase two much faster than I anticipated, because I had the metrics to support it," he said.
If you want to hear Brian walk through the whole thing — including how he uses Pendo's MCP (Model Context Protocol) server to run a continuous improvement loop — he's joining us live on May 19. Save your spot here.
Other PMs like Brian are also finding success with Agent Analytics: Pushpay found that users were abandoning agent conversations after just three or four prompts. Seeing exactly where drop-off was happening let their team redesign that specific experience based on real behavior, and users went from spending one to two minutes finding critical information down to ten seconds.
None of these improvements came from waiting until there were "enough" users. In every case, the teams started in early beta: measuring from day one, catching friction before it became a pattern, and making quick calls before they lost users' trust and adoption. The right time to start was at user one. But right now works too.
Pendo Agent Analytics is purpose-built for AI agent interactions in production: use case classification, issue detection, and impact on broader user and product outcomes. Everything a thumbs-up/thumbs-down rating alone will never tell you.
See how teams are identifying failing agent experiences in minutes. View Agent Analytics in action