Your AI Comms Bill Is About to Look Very Different: What Law Firms Need to Ask Their Vendors

Posted on 17 June 2026

For most law firms, AI spend started as a line in the innovation budget. Pilot licences, a transcription tool trialled by one practice group, a meeting capture product rolled out to the litigation team. By 2026 that has changed. AI features are live inside the platforms fee earners use every day: phone systems, Microsoft Teams, meeting rooms, contact centre software. The costs have moved from exploratory to operational, and a lot of firms are finding they hadn’t budgeted for what AI actually costs once it runs at scale across a full matter load.

The Training/Inference Problem Nobody Explained

When law firms talk about AI costs, they tend to think about the upfront investment: platform fees, onboarding, training fee earners to use new tools. Those are real costs, but they’re largely fixed. The cost that compounds is inference, the compute that runs every time an AI feature does something. Every client call transcribed, every matter meeting summarised, every interaction routed through an AI-assisted intake workflow: that’s inference. It runs continuously, at volume, and it scales directly with how much your people use it.

Aragon Research estimates that by 2026 inference will account for two-thirds of all AI computing power consumed by enterprises. Lenovo’s 2026 Total Cost of Ownership analysis found that for sustained AI inference workloads, on-premises infrastructure can reach cost breakeven against cloud providers in as little as four months. Those figures describe what happens when a platform that looked affordable during a pilot runs at production volume across a firm with 200 fee earners, multiple offices, and back-to-back client calls.
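For readers who want to see how a breakeven figure like that is arrived at, here is a minimal sketch in Python. It assumes a simple model: one upfront on-premises cost, then a flat monthly running cost on each side. The function name and every number are hypothetical placeholders for illustration; none of them come from Lenovo's analysis or from any vendor's price list.

```python
import math


def breakeven_month(upfront_cost, on_prem_monthly, cloud_monthly):
    """Return the first whole month in which cumulative on-premises spend
    is no higher than cumulative cloud spend, or None if that never happens.

    Simplified model (an assumption): costs are flat from month to month.
    """
    monthly_saving = cloud_monthly - on_prem_monthly
    if monthly_saving <= 0:
        # Cloud is as cheap or cheaper to run each month, so the upfront
        # cost is never recovered.
        return None
    return math.ceil(upfront_cost / monthly_saving)


# Hypothetical figures in pounds, for illustration only.
months = breakeven_month(upfront_cost=60_000,
                         on_prem_monthly=1_500,
                         cloud_monthly=9_000)
print(f"Breakeven after {months} months")
```

The useful part is the shape of the calculation rather than the output: the higher the sustained cloud bill, the faster the upfront cost is recovered.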

Most firms didn’t choose the wrong platform. The cost model was never properly explained to them, because vendors have little incentive to explain it.

What Happens When Usage Doubles

Here’s a practical test. Ask your communications vendor a single question: what happens to my costs when usage doubles?

A platform built with AI inference economics in mind will have a clear answer. The pricing model accounts for scale. The architecture doesn’t rely on passing every transcription request, every meeting summary, every AI-assisted call routing decision through an external cloud API at per-call rates that compound with adoption. The vendor can tell you, plainly, what your bill looks like when utilisation doubles across the firm.

A platform with AI bolted on (an existing product that added AI features as the market demanded them) will give you a murkier answer. The inference costs sit upstream, in a dependency on a third-party model provider, and the vendor is passing those costs through with a margin on top.
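The difference between those two answers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's actual pricing: the two pricing models, the function names, the rates, the margin and the included-minutes allowance are all hypothetical.

```python
def metered_monthly_cost(minutes, rate_per_minute, vendor_margin):
    """Pass-through pricing: an upstream per-minute inference charge
    with the vendor's margin added on top."""
    return minutes * rate_per_minute * (1 + vendor_margin)


def capacity_monthly_cost(platform_fee, minutes, included_minutes, overage_rate):
    """Capacity pricing: a fixed fee covering an allowance of minutes,
    with a per-minute charge only above that allowance."""
    overage_minutes = max(0, minutes - included_minutes)
    return platform_fee + overage_minutes * overage_rate


# Hypothetical monthly usage: AI-processed call and meeting minutes.
current_minutes = 100_000
doubled_minutes = 2 * current_minutes

for label, minutes in (("current", current_minutes), ("doubled", doubled_minutes)):
    metered = metered_monthly_cost(minutes, rate_per_minute=0.02, vendor_margin=0.25)
    capacity = capacity_monthly_cost(3_000.0, minutes,
                                     included_minutes=250_000, overage_rate=0.01)
    print(f"{label}: metered £{metered:,.2f}, capacity £{capacity:,.2f}")
```

Under these assumptions the metered bill doubles with usage, while the capacity bill stays flat until the allowance is exhausted. A vendor who can talk through their own version of this calculation has a clear answer to the doubling question.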

AudioCodes make this point clearly in their own analysis of inference economics, drawing on Aragon Research. Their Meeting Insights On-Prem product, which has a dedicated deployment for legal firms, is a useful illustration: meeting intelligence software designed from the ground up to run within a firm’s own infrastructure, specifically to avoid the runaway inference costs that come with cloud-dependent AI at scale. The architecture reflects a deliberate choice about where inference runs and who bears the cost. That’s what the vendor question is designed to reveal.

Why Law Firms Face a Harder Version of This Problem

For law firms, inference economics isn’t only a cost question. It’s a professional obligations question.

Legal professional privilege means that client communications, including calls, meeting transcripts and matter discussions, carry confidentiality obligations that don’t simply transfer to a cloud platform because a vendor’s terms of service say so. When AI inference runs externally, processing client call recordings and matter meeting summaries on infrastructure the firm doesn’t control, the firm’s ability to maintain privilege and satisfy its SRA obligations becomes materially harder to evidence.

Under UK GDPR and the Data Protection Act 2018, law firms are data controllers for their clients’ personal data. The Data (Use and Access) Act 2025, which came into force in February 2026, introduced further clarifications on automated processing that firms need to account for, particularly where AI tools are making or informing decisions based on client data. Where that processing happens, and under whose governance, is no longer a detail buried in a vendor contract. It’s a compliance question the SRA and the ICO are increasingly likely to ask.

Lexcel-accredited firms face a specific version of this: the standard’s information management requirements name a responsible owner for data governance. If that person can’t answer where AI inference runs, on whose infrastructure, and under what retention and access controls, the accreditation is harder to evidence than the register suggests. A related post covers the broader compliance picture: Shadow AI in Professional Services: Why Meeting Data Is Your Biggest Compliance Blind Spot.

The Portfolio Question, Not the Product Question

The right frame for this isn’t cloud versus on-premises. Most firms will end up with both, and that’s probably sensible. Cloud infrastructure suits variable, experimental, and capability-testing workloads. Predictable, high-volume, client-data-sensitive workloads of the kind that run continuously across a busy firm are increasingly better served by architecture that doesn’t meter every inference call back to a hyperscaler and doesn’t route privileged client conversations through external infrastructure unnecessarily.

A useful rule of thumb: when cloud AI costs approach 60-70% of what equivalent on-premises infrastructure would cost over the same period, an on-premises evaluation is worth running. For communications workloads running across every fee earner, every working day, that threshold arrives sooner than most technology budgets assume.
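That rule of thumb is simple enough to write down as a check. The sketch below is a minimal illustration: the function name is hypothetical, the default threshold uses the lower end of the 60-70% range, and the cost figures are made up. Both costs need to cover the same period and the same workload for the comparison to mean anything.

```python
def on_prem_review_due(cloud_cost, on_prem_cost, threshold=0.6):
    """Return True when cloud AI spend has reached the given share of the
    estimated cost of equivalent on-premises infrastructure."""
    if on_prem_cost <= 0:
        raise ValueError("on_prem_cost must be a positive estimate")
    return cloud_cost / on_prem_cost >= threshold


# Hypothetical annual figures in pounds, for illustration only.
print(on_prem_review_due(cloud_cost=65_000, on_prem_cost=100_000))
print(on_prem_review_due(cloud_cost=40_000, on_prem_cost=100_000))
```

A firm that wants to act later rather than sooner can pass `threshold=0.7` instead. Either way the check only signals that an evaluation is worth running, not that a move is justified.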

When a platform renewal comes up, whether that’s your voice and telephony infrastructure, your contact centre, or the AI layer sitting across both, the inference cost question belongs on the agenda alongside the feature comparison. Ask what the architecture looks like. Ask where the AI runs. Ask what your bill looks like at twice the current volume. The answers will tell you more about long-term cost and risk than any headline price or feature checklist.

Marlin has worked with law firms across the UK for over 25 years, including on the kind of communications infrastructure problems that surface when AI enters an existing technology estate. If you’d like to work through these questions in the context of your own firm’s setup, we’re a good place to start.