From Evidence to Illusion: When Generative AI Undermines Public Accountability

Nov 2, 2025 | AI in Qualitative Research, Blog

By Claire Moran

Introduction

In October, Deloitte agreed to issue a partial refund to the Australian government after serious flaws were uncovered in a $440,000 report evaluating the compliance framework of the Department of Employment and Workplace Relations (DEWR). The report, originally published in July and quietly updated months later, was found to contain numerous fabricated citations, non-existent academic references, and a fabricated quote attributed to a Federal Court judgment. It was later revealed that parts of the report had been generated using Azure OpenAI’s GPT-4o model.

While the department and Deloitte insisted that the report’s recommendations remained valid, the controversy exposed significant risks in the uncritical use of generative AI in public sector evaluations. This case is not an anomaly; it is a cautionary tale about how the use of AI in evidence-making processes can erode contestability, transparency, and institutional trust.

The Mirage of Traceability

One of the foundational principles of research and evaluation is traceability: the ability to follow a claim back to its source, evaluate the reasoning, and contest the interpretation. The Deloitte review failed this test. As University of Sydney academic Dr. Christopher Rudge noted, its references were not simply inaccurate; they were hallucinated (Dhanji, 2025), a known behaviour of large language models (LLMs), which produce fluent but fabricated outputs when uncertain.

In a field like public policy, where reports influence real-world outcomes and shape social narratives, this is not a benign error. It severs the link between evidence and accountability.

Generative Fluency, Epistemic Fragility

AI-generated text often appears authoritative, precisely because it mimics the surface features of scholarly prose: citations, hedging language, structured argumentation. But as Kalai et al. (2025) argue in their technical analysis of hallucination, language models are trained to guess, not to know. Their outputs are shaped by statistical plausibility, not evidentiary integrity.

The Deloitte case exemplifies what happens when institutional processes fail to account for this gap. If AI is used to generate citations, claims, or summaries without rigorous verification, the result is not accelerated insight but epistemic sleight-of-hand.

Contestability Under Threat

Contestability is a key safeguard in democratic and academic systems. It ensures that claims can be interrogated, challenged, and refined. Yet when AI-generated content is introduced into reports without clear disclosure or audit trails, contestability is weakened.

As explored in the University of Melbourne’s (2025) work on AI use among young people, users often trust AI outputs more than they should, especially when those outputs are delivered with fluent confidence. This dynamic translates dangerously into institutional settings, where consultants and public agencies may be incentivised to appear efficient or cutting-edge while lacking robust AI literacy.

When Verification Becomes Retrofitting

In the Deloitte case, updating the report meant replacing fabricated references with other references, sometimes multiplying the citations for a claim without clarifying its evidentiary basis. As Rudge observed, this suggests the claims were not grounded in any identifiable source to begin with. Verification, in this context, became an act of retrofitting plausibility after publication.

This backwards rationalisation reveals a deeper problem: AI’s outputs are often treated as provisional drafts, yet can enter public discourse as final claims. Without methodological rigour and clear provenance, these outputs corrode the standards of public reasoning.

Toward Responsible AI-Supported Evaluation

To be clear, the problem is not the use of AI per se, but its uncritical integration into high-stakes processes. Responsible use of generative AI in research and evaluation demands:

  • Transparent disclosure of when and how AI is used;
  • Human verification of all factual claims and sources (see the sketch after this list);
  • Methodological protocols that ensure traceability;
  • Ethical awareness of AI’s limitations and affordances.
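
To make the second point concrete, the sketch below shows one way an automated first pass might flag suspect references before a human reviewer examines them. It queries the public Crossref API for each free-text citation; the endpoint and response fields are as Crossref documents them, but the workflow, function names, and the decision to treat "no match" as a review flag are illustrative assumptions, not a description of any process used by Deloitte or DEWR. A lookup like this can only triage; it never replaces human verification.

    # Illustrative sketch only: a first-pass citation triage using the public
    # Crossref API (api.crossref.org). The workflow and "no match" rule are
    # assumptions; this supplements, never replaces, human verification.
    import requests

    def crossref_candidates(citation_text, rows=3):
        """Return the top Crossref matches for a free-text citation string."""
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": citation_text, "rows": rows},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()["message"]["items"]

    def flag_for_review(citation_text):
        """Print candidate matches; return True if nothing plausible is found."""
        items = crossref_candidates(citation_text)
        if not items:
            return True  # no record resembling this reference exists in Crossref
        for item in items:
            title = (item.get("title") or ["<no title>"])[0]
            print(f"candidate: {title} (DOI: {item.get('DOI', 'n/a')})")
        return False  # candidates exist, but a human must still confirm the match

    if __name__ == "__main__":
        # A reviewer would run every reference in a report through a check like this.
        suspect = flag_for_review("Kalai, Nachum, Vempala & Zhang (2025). Why Language Models Hallucinate")
        print("flag for human review:", suspect)

A crude filter of this kind cannot show that a citation actually supports a claim, but it would at least surface references that resolve to nothing at all, which is precisely the failure mode identified in the Deloitte report.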

Closing Reflection

The Deloitte report’s errors are not just technical glitches; they are epistemic failures with material consequences. At a time when generative AI is becoming embedded in research, consultancy, and policymaking, we must defend the infrastructure of verification that makes knowledge contestable. That means resisting the illusion of fluency and recommitting to the principles that underwrite public accountability: traceability, transparency, and methodological rigour.

References

Dhanji, K. (2025, October 6). Deloitte to pay money back to Albanese government after using AI in $440,000 report. The Guardian. https://www.theguardian.com/australia-news/2025/oct/06/deloitte-to-pay-money-back-to-albanese-government-after-using-ai-in-440000-report

Kalai, A. T., Nachum, O., Vempala, S. S., & Zhang, E. (2025). Why language models hallucinate. arXiv preprint arXiv:2509.04664.

Karp, P. (2025, October 13). How one academic unravelled Deloitte’s AI errors. The Australian Financial Review. https://www.afr.com/politics/how-one-academic-unravelled-deloitte-s-ai-errors-20251013-p5n224

University of Melbourne. (2025). Young people are using AI coaches, but are they using the right ones? Pursuit. https://pursuit.unimelb.edu.au/articles/young-people-are-using-ai-coaches,-but-are-they-using-the-right-ones
