There's a question that almost never comes up in AI vendor evaluations, and it should be the first one asked: Who benefits when this system learns from my firm's work?
For every major legal AI SaaS product on the market, the answer is the same: everyone except you.
When your associates run queries through Harvey, CoCounsel, or Lexis+ Protégé, they aren't just getting answers. They're generating training signal. Every search pattern, every document uploaded, every correction made, every workflow completed feeds back into a system that improves — not for your firm specifically, but for the entire customer base. Including the firm across the street competing for the same client.
This isn't a conspiracy. It's a business model. And it's worth understanding clearly, because the incentive structure has consequences that most managing partners haven't fully considered.
How Does Shared Learning Work in Legal AI?
Harvey is now used by roughly 100,000 lawyers across Am Law firms. It has raised over $186 million and carries a valuation north of $1 billion. That valuation didn't materialize from the technology alone. It was built, in part, on something investors find irresistible: a growing corpus of proprietary usage data from the most sophisticated legal practices in the world.
Every time your litigation team uses Harvey to analyze a complex damages theory, that interaction — the query, the refinement, the feedback — becomes part of the system's improvement loop. Harvey gets marginally better for everyone. Your firm contributed the insight. A hundred thousand other lawyers share in the result.
Thomson Reuters followed the same logic when it acquired Casetext — and its CoCounsel product — for $650 million in 2023. Then in February 2026, it acquired Noetica. These aren't standalone purchases. They're pieces of an aggregation strategy built on the premise that the more client data flows through a centralized system, the more defensible the platform becomes.
LexisNexis Protégé processes queries through a 200-billion-document repository. Each query teaches the system something about what lawyers actually need — which arguments they search for, which authorities they trust, how they frame issues. That behavioral data is extraordinarily valuable. And it accrues entirely to LexisNexis.
What Is the Dilution Problem with SaaS AI?
The standard rebuttal is that shared learning benefits you too. Your vendor's model improves, and you benefit from those improvements. That's technically true, and functionally irrelevant.
Here's why. When your firm contributes a learning signal to a platform used by tens of thousands of lawyers, the improvement is distributed across the entire user base, and the marginal benefit that returns to you is diluted to nearly zero. You contribute a dollar of insight and get back a fraction of a cent, and every other firm on the platform, including your competitors, gets the same fraction.
This is the SaaS learning paradox: you pay for the product, you contribute to its improvement, and the improvement you see is negligible because it's been averaged across thousands of firms with different practice areas, different strategies, and different needs.
The vendor, meanwhile, captures 100% of the aggregated value. That's what investors are buying when they fund these companies at billion-dollar valuations. Not the technology. The data flywheel. Your data flywheel.
How Does Proprietary AI Change the Learning Dynamic?
Now consider the alternative structure. A firm deploys AI trained on its own work product, running on its own infrastructure, learning exclusively from its own interactions. Every query, every correction, every workflow refinement compounds — for that firm alone.
Your immigration team's search patterns become a strategic asset. Your M&A group's drafting preferences sharpen the system's output specifically for how they work. Your litigation associates' research habits train the model to anticipate their needs, not the needs of a generic lawyer at a generic firm.
Over twelve months, the difference is material. A proprietary system that has processed ten thousand interactions for a single firm develops a level of contextual intelligence that no shared platform can replicate — because shared platforms, by design, optimize for the average across all clients, not the specific needs of any one.
This is the compounding advantage: 100% of the learning accrues to you. Not a hundred-thousandth of it. All of it. Every interaction makes the system more valuable to your firm, and only to your firm.
Who Owns the Moat in Legal AI?
In competitive strategy, a moat is what makes a business defensible — what prevents competitors from replicating your advantage. In the current legal AI market, the moat belongs to the vendor. And the firms using the product are the ones building it.
Harvey's moat isn't its model architecture. Architectures are commoditizing fast. Its moat is the behavioral data from 100,000 lawyers that no competitor can replicate. Thomson Reuters' moat isn't its document library — it's the query patterns that reveal what that library is actually used for. Every time your firm uses these tools, you're deepening a competitive advantage that belongs to someone else.
The strategic question is whether you want to keep building someone else's moat, or start building your own.
A firm with proprietary AI owns its moat. The institutional knowledge embedded in the system (the patterns, the preferences, the accumulated intelligence of how that specific firm practices law) is an asset that appreciates over time and that no competitor can replicate. It can't be carried over to a rival's platform. It isn't shared. It's yours.
Why Does This Matter Now?
The firms that recognize this dynamic early have an asymmetric opportunity. Proprietary AI is still relatively rare in legal. The firms that deploy it now, and begin accumulating firm-specific intelligence, will open a gap that widens every month. The compounding cuts both ways: a proprietary system keeps getting better for the firm that owns it, while every contribution a firm makes to a shared platform keeps benefiting its competitors just as much.
The longer you wait, the more institutional knowledge you've donated to a platform you don't own, training a model you don't control, building a moat that protects someone else's business.
The data is already yours. The question is whether you're going to keep giving it away.
RAGbase Legal builds proprietary AI systems for law firms — infrastructure where every interaction compounds for your firm alone. If you're evaluating where your data goes when your lawyers use AI, we should talk.
Frequently Asked Questions
Does Harvey AI train on my firm's data?
What is the shared learning problem in legal AI?
How does proprietary AI protect my firm's competitive advantage?
What is a data moat in AI?
RAGbase Legal builds proprietary AI systems for law firms — deployed on the firm's own infrastructure, zero data retention, full code ownership. 80+ enterprise deployments.
See How RAGbase Legal Works on Your Data
Free 3-5 day proof of concept. Your data, your infrastructure, working results.