Dec 09, 2024
“AI” can mean many things, but for organizations using artificial intelligence to improve existing, large-scale operations, the applicable technology is machine learning (ML), which is a central basis for — and what many people mean by — AI. ML has the potential to improve all kinds of business processes: It generates predictive models that improve targeted marketing, fraud mitigation, financial risk management, logistics, and much more. To differentiate from generative AI, initiatives like these are also sometimes called predictive AI or predictive analytics. You might expect that the performance of these predictive ML models — how good they are, and how much value they deliver — would be front and center. After all, generating business value is the whole point.
But you would be wrong. When it comes to evaluating a model, most ML projects report on the wrong metrics — and this often kills the project entirely.
In this article, adapted from The AI Playbook: Mastering the Rare Art of Machine Learning Deployment, I’ll explain the difference between technical and business metrics for benchmarking ML. I’ll also show how to report on performance in business terms, using credit card fraud detection as an example.
When evaluating ML models, data scientists focus almost entirely on technical metrics like precision, recall, and lift, a kind of predictive multiplier that tells you how many times better than guessing the model predicts. But these metrics are critically insufficient. They tell us the relative performance of a predictive model — in comparison to a baseline such as random guessing — but provide no direct reading on the absolute business value of a model. Even the most common, go-to metric, accuracy, falls into this category. (Also, it’s usually irrelevant and often misleading.)
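To make the distinction concrete, here is a minimal sketch of how these technical metrics are computed from a model’s outcomes. All the counts are hypothetical, loosely themed on the fraud-detection example used in this article:

```python
# Hypothetical outcomes from a fraud-detection model evaluated on
# 100,000 transactions (these numbers are illustrative, not from the article):
tp, fp, fn, tn = 500, 1_500, 500, 97_500

precision = tp / (tp + fp)                    # share of flagged cases that were truly fraud
recall    = tp / (tp + fn)                    # share of all fraud the model caught
base_rate = (tp + fn) / (tp + fp + fn + tn)   # fraud rate you'd hit by guessing at random
lift      = precision / base_rate             # how many times better than guessing

print(f"precision={precision:.2f} recall={recall:.2f} lift={lift:.1f}x")
```

Note what these numbers do and don’t say: a lift of 25 means the model’s flags are 25 times richer in fraud than random guessing, but nothing here is denominated in dollars.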
Instead, the focus should be on business metrics — such as revenue, profit, savings, and number of customers acquired. These straightforward, salient metrics gauge the fundamental notions of success. They relate directly to business objectives and reveal the true value of the imperfect predictions ML delivers. They’re core to building a much-needed bridge between business and data science teams.
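A hedged sketch of what that translation looks like for the credit card fraud example: converting the same kind of confusion-matrix counts into savings. Every dollar figure below is an assumption for illustration, not a real benchmark:

```python
# Hypothetical model outcomes (same illustrative counts as before):
tp, fp = 500, 1_500                  # frauds caught, false alarms raised

avg_fraud_loss = 500.0               # assumed average loss per fraudulent charge
review_cost    = 5.0                 # assumed cost to investigate one flagged case

blocked_losses = tp * avg_fraud_loss             # fraud losses the model prevented
review_costs   = (tp + fp) * review_cost         # cost of investigating every flag
net_savings    = blocked_losses - review_costs   # value versus deploying no model

print(f"net savings: ${net_savings:,.0f}")
```

Unlike lift or precision, a figure like this speaks directly to the stakeholders who decide whether the model gets deployed.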
Unfortunately, data scientists routinely omit business metrics from reports and discussions, despite their importance. Instead, technical metrics dominate the ML practice — both in terms of technical execution and in reporting results to stakeholders. Technical metrics are pretty much the only kind of metric that most data scientists are trained to work with and most ML tools are programmed to handle.
Data scientists know better but generally don’t abide — in good part because ML software tools generally serve up only technical metrics. According to the 2023 Rexer Analytics Data Science Survey, data scientists rank business KPIs, such as ROI and revenue, as the most important metrics yet say technical metrics are the most commonly measured.
The AI industry has this backward. As Katie Malone astutely put it in Harvard Data Science Review, “The quantities that data scientists are trained to optimize, the metrics they use to gauge progress on their data science models, are fundamentally useless to and disconnected from business stakeholders without heavy translation.”
Fixating on technical metrics doesn’t just compromise an ML project’s value: Often, this entrenched habit utterly sabotages the project, for two big reasons. First, during model development, the data scientist is benchmarking on metrics that don’t directly measure business value — so their model is not maximizing value. If you’re not measuring value, you’re not pursuing value.
Second, when the data scientist delivers an ML model for deployment, the business stakeholders lack visibility into the potential business value the model could realize. They have no meaningful read on how good the model is. When business leaders ask for straightforward business metrics like profit or ROI, the data scientist is typically ill-equipped to report on these measures. So without a basis for making an informed decision, they typically make the tough choice between authorizing deployment on a leap of faith or, in essence, canceling the project. This latter case of cold feet dominates: Most new ML projects fail to deploy. An IBM Institute for Business Value study found that ROI on enterprisewide AI initiatives averaged just 5.9% as of late 2021 (and that’s lower than the cost of capital, meaning you’d be better off just investing the money in the market). Getting the metrics discussion right by including business metrics is central to overcoming the great challenges to launching ML projects.