AI agents are being deployed across your product, engineering, IT, and customer success teams. Great. But do you know if they’re helping?
It may sound like a simple question, but it's one teams everywhere are being asked. After a few years of investment, leaders want to know: what's the payoff? How are we better off?
To help you answer this question, we talked to one of our own experts in the field, Pierce Healy. He co-founded Zelta.ai and is now Pendo's Sr. Director of AI Products. Pierce is pretty much our resident AI expert, helping our customers and coworkers (including me) understand and use AI better.
Here’s what he had to say about this topic.
Step 1: Pick your north star metric
The most common mistake teams make when measuring AI value is benchmarking against human effort or time saved. It’s totally normal to think, "this would have taken a human 5 hours, so we saved 5 hours."
But in many cases, the agent is doing something a human couldn't have done at all, like building a prototype in a language no one on the team knows, or extracting themes from thousands of support tickets and identifying instant improvements to resolve them. “The baseline is effectively infinite, which makes the math meaningless,” Pierce explained.
Instead, you should start with the outcome your team already cares about and measure that. This varies by team and your agent’s job to be done:
- Customer success cares about ticket resolution rate, time-to-resolution, and escalation volume.
- Engineering teams care about the velocity of new features shipped and the time from spec to production.
- IT teams may track speed of provisioning, reduction in manual requests, changes in tech debt and spend, and what software agents could displace.
- Sales needs to track upsell conversion rate or lead scoring accuracy.
Pierce’s advice: Whatever team you’re on, pick one metric. Make sure it's something you can track before and after the agent is deployed. If you can't define what "better" looks like, you're not ready to measure ROI.
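The before/after comparison Pierce describes can be as simple as a few lines of arithmetic. Here's a minimal sketch, using made-up weekly time-to-resolution numbers; the variable names and values are illustrative assumptions, not real data.

```python
# Hypothetical sketch: compare one north-star metric before and after
# agent deployment. The metric here is weekly average time-to-resolution
# in hours; all numbers are illustrative.
from statistics import mean

before_deploy = [18.2, 17.5, 19.1, 18.8]  # four weeks pre-agent
after_deploy = [14.0, 13.2, 12.8, 12.5]   # four weeks post-agent

baseline = mean(before_deploy)
current = mean(after_deploy)
pct_change = (current - baseline) / baseline * 100

print(f"Baseline: {baseline:.1f}h, current: {current:.1f}h "
      f"({pct_change:+.1f}%)")
```

The point isn't the code, it's the discipline: you need a baseline captured before deployment, or there's nothing to compare against.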
Step 2: Tag your agent and set up reporting
Now, you need to answer the question: “Can a dashboard or report show me that the agent is moving the metric I defined in Step 1?”
Most teams skip this step and lose visibility into how the agent is producing (or failing to produce) the outcomes picked in Step 1. Good agent instrumentation looks like:
- Behavioral signals over self-reported evals. Evals, aka scoring an agent's output against expected answers before it’s live, are useful for engineering to align on what "correct" looks like, but they're easy to game, subjective at scale, and not based on what actually happens in the wild. What you actually want to see is downstream behavior change. Are fewer tickets being escalated to Tier 2? Are engineers fixing fewer bugs? That's what to look for.
- Set up your measurement tools. While most AI platforms include some native tracing and logging out of the box, they don't provide full visibility into how those agents are being used. To understand effectiveness, you need full visibility into how people are interacting with the agent: what they asked, where the conversation broke down, and what they did next. Agent Analytics is the only tool that can surface that behavioral layer in your agent ecosystem.
- Track agent scope alongside outcomes. For background agents (the kind running workflows without a user choosing to "use" them), adoption looks different. Instead of measuring how many users clicked into it, measure whether the agent's scope is expanding or contracting based on performance, and whether it's being trusted to act more autonomously over time.
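To make the "downstream behavior change" idea concrete, here's a hedged sketch of deriving one behavioral signal, Tier-2 escalation rate, from ticket records, split by whether the agent handled the first touch. The record shape and field names are assumptions for illustration; in practice this data would come from your ticketing system or an analytics tool.

```python
# Hypothetical sketch: compute a downstream behavioral signal
# (escalation rate) from ticket records. All fields and values
# are illustrative assumptions.
tickets = [
    {"agent_first_touch": False, "escalated": True},
    {"agent_first_touch": False, "escalated": False},
    {"agent_first_touch": True,  "escalated": False},
    {"agent_first_touch": True,  "escalated": True},
    {"agent_first_touch": True,  "escalated": False},
]

def escalation_rate(rows):
    """Share of tickets in `rows` that were escalated to Tier 2."""
    return sum(r["escalated"] for r in rows) / len(rows)

human_first = [t for t in tickets if not t["agent_first_touch"]]
agent_first = [t for t in tickets if t["agent_first_touch"]]

print(f"Human-first escalation rate: {escalation_rate(human_first):.0%}")
print(f"Agent-first escalation rate: {escalation_rate(agent_first):.0%}")
```

A signal like this is hard to game precisely because it's observed behavior, not a pre-launch score.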
Step 3: Translate metrics into what leadership really cares about
You have your north star metric and instrumentation. Now, it’s time to create your business case.
- For product leaders: ROI is less about cost and more about strategic validity. Do users find this helpful? Is it a more efficient way to achieve our goals? Is this capability keeping us relevant for the next 3–5 years? Are we unlocking new market segments or use cases we couldn't serve before? Frame agent value as market positioning, not headcount reduction.
- For IT and operations leaders: They want to know about efficiency. Tie your agent's performance directly to service-level improvements, like resolution rates, time-to-provision, and reduction in manual steps.
- For executives and the board: Frame this around business outcomes, like: "our customer escalation rate dropped 22% in Q2." The agent is the "how" in that story.
As AI agents become even more commonplace inside organizations, leadership will continue to need a way to manage them the way they manage human teams: seeing what each agent is doing, what access it has, and whether its scope should expand or contract based on performance.
Think of it as agent management: the same way a manager reviews a new hire's work before giving them more autonomy, you'll want visibility into each agent's effectiveness before widening its permissions.
Ready to see what your agents are actually doing? Get a demo of Agent Analytics.