The government is using ChatGPT to hunt $100 billion in Medicaid waste. The tool's known weakness is the catch.

Key takeaways

HHS's AERO program (Audit Enforcement and Risk Oversight), announced May 21, 2026, uses ChatGPT and other AI to scan at least five years of audit history across all 50 states for any entity spending more than $1 million a year in federal funds.
The target is an estimated $100 billion to $200 billion in annual waste and fraud; consequences include loss of federal funding, and HHS notified all 50 governors and treasurers.
Triaging large volumes of already-public audit documents is a strong fit for AI as a first-pass reviewer whose output a human then verifies.
The risk is the model's documented habit of confident error: an AI mistake that helps justify a funding cut is a different order of problem than an error in a chatbot.
Critics argue enforcement has disproportionately targeted Democrat-led states; the administration says AI lets it finally work through a backlog of audits that used to go unread.

The federal government is now using ChatGPT to look for fraud. In a program announced on May 21, the Department of Health and Human Services began running state and grantee audit reports through ChatGPT and other AI tools across all 50 states, chasing what the official leading it estimates is $100 billion to $200 billion a year in wasteful or fraudulent spending. As a use of AI, this is one of the more sensible ones in government. As a basis for decisions that can cut a state’s funding, it inherits the one weakness everyone already knows the tool has.

What the program actually does

The initiative is called AERO, for Audit Enforcement and Risk Oversight, and it is led by HHS assistant secretary for financial resources Gustav Chiarello. First reported by The Wall Street Journal, it uses ChatGPT and other AI tools to scan at least five years of audit history across all 50 states on a rolling basis, covering any entity that spends more than $1 million a year in federal funds. HHS sent letters to all 50 state governors and treasurers putting them on notice. The targets are chronic noncompliance, repeat deficiencies, and delinquent audits, and the penalty is loss of funding.

The framing from officials is that audits used to pile up unread. As Chiarello put it, the reports would land with a thud and no one did anything, and AI lets them dig into the backlog. It marks a shift from the old pay-and-chase model, paying first and clawing back later, toward flagging problems earlier.

Why the use case is reasonable, and where it isn’t

There is a version of this that is genuinely well-matched to what AI does. The system is reading large volumes of already-public audit documents to surface patterns a human backlog would never reach, which is close to the ideal task for a language model: a fast first pass over text that a person then verifies. Finding which of thousands of audits deserve a human’s attention is exactly the kind of triage these tools do well.

The catch is what happens after the flag. Critics note that the same tools frequently make mistakes, can carry unintended biases, and should be used with caution. The stakes here are not a draft email; they are a state’s federal funding. The administration has already withheld hundreds of millions from Minnesota and more than $1 billion from California, and critics argue enforcement has disproportionately targeted Democrat-led states. Whatever one makes of that dispute, a confident AI error that helps justify a funding cut is a different order of problem than a confident AI error in a chatbot.

The precedent worth watching

AERO does not sit alone. The same government now forcing frontier models offline over safety concerns is also building AI into how it polices public money. The program builds on a February request for comment on broader fraud-detection rules, and the enforcement machine around it is already large: CMS cited a 2025 baseline of $5.7 billion in Medicare payments suspended and thousands of billing privileges revoked. What makes this one notable is not the dollar figure but the method. A federal agency has put a known-fallible language model into the workflow that decides which states get scrutinized and, downstream, which get paid.

The right guard is the same one emerging across every serious institutional use of these tools: treat the AI as a first-pass reviewer whose output a human verifies before it carries a consequence, never as the authority that delivers the consequence itself. Used that way, AERO is a reasonable answer to a real backlog. Used the other way, it turns a model’s most documented flaw into a lever over public money. Which version this becomes is the part to watch, because the program is rolling, not a one-time scan.
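To make that guard concrete, here is a minimal sketch of a workflow in which a model-generated flag cannot carry a consequence until a named human has confirmed it. Nothing here reflects how AERO is actually built; the `Flag` record, its fields, and the `confirm` and `can_carry_consequence` helpers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flag:
    # All fields are hypothetical; they are not taken from any HHS system.
    entity: str                        # state or grantee the flag refers to
    reason: str                        # e.g. "repeat deficiency"
    verified_by: Optional[str] = None  # human reviewer who confirmed the flag


def confirm(flag: Flag, reviewer: str) -> Flag:
    """Return a copy of the flag recording which human checked it."""
    return Flag(flag.entity, flag.reason, verified_by=reviewer)


def can_carry_consequence(flag: Flag) -> bool:
    """A first-pass flag alone never qualifies; only a verified one does."""
    return flag.verified_by is not None


ai_flag = Flag("Example Grantee", "repeat deficiency")
print(can_carry_consequence(ai_flag))                      # False
print(can_carry_consequence(confirm(ai_flag, "analyst")))  # True
```

The design choice is that verification is a recorded fact on the flag, not an assumption about process, so any downstream step can refuse to act on an unverified flag.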

Edited by Aditya Marin Gasga

Frequently asked questions

Is the US government really using ChatGPT to find Medicaid fraud?

Yes. In a program called AERO, announced May 21, 2026, HHS began using ChatGPT and other AI tools to analyze at least five years of state and grantee audit reports across all 50 states, targeting an estimated $100 billion to $200 billion in annual waste and fraud. The program was first reported by The Wall Street Journal.

What happens if the AI flags a state?

The program targets chronic noncompliance and unresolved audit deficiencies, and consequences can include loss of federal funding. HHS notified all 50 state governors and treasurers, and the administration has already withheld funds from some states.

Is using AI for this appropriate?

It depends on how it is used. Triaging large volumes of already-public audit documents is a reasonable fit for AI as a first-pass review. The risk is relying on AI output, which can contain confident errors and biases, as the basis for funding decisions without human verification.

What is the broader context?

AERO builds on a February request for comment on broader fraud-detection rules and on a large existing enforcement apparatus, including a 2025 baseline, cited by CMS, of $5.7 billion in suspended Medicare payments. Critics have also argued that recent funding enforcement has disproportionately affected Democrat-led states.

About Aditya Marin Gasga

Founding Editor

Aditya Marin Gasga is the founding editor of The Counter Brief and Head of Growth at Demand Nexus, its parent company, where he works on sourcing qualified pipeline across SDR, content, and paid channels. His background is in performance marketing and demand generation. He studied business administration at Northumbria University.
