Thamus Observatory
What AI products retrieve, cite, and exclude when returning answers to contested questions.
Key Findings
The Epistemic X-Ray examines how commercial AI products return different answers to the same question — and how those answers change when web search is enabled or disabled. Below are findings from the Observatory's "AI and Energy" data collection (March–April 2026).
Every provider returns a "no": these products recommend you continue using them. But their rhetorical strategies diverge sharply. OpenAI (GPT-5) leads with a blunt "Short answer: No" and pivots to practical tips. Google (Gemini 2.5 Pro) opens with "That's a really thoughtful and important question", putting flattery before analysis. Anthropic (Claude Haiku 4.5) pushes back on the framing itself: "I'd push back on the framing a bit." Each response defends its provider, but the defence strategy differs.
This is a factual question where estimates should converge. They don't. Without web search, OpenAI returns 0.1–1 Wh for small models and up to 20+ Wh for frontier models. Google returns 0.1–1.0 Wh. Anthropic returns 0.5–50 Wh, with an upper bound 50 times higher than Google's. With web search enabled, estimates tighten: Google anchors on published studies at 0.24–2.9 Wh. Web access acts as an empirical anchor; without it, outputs diverge.
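The spread in the web-off figures can be checked with simple arithmetic. The sketch below is illustrative only: it uses the ranges quoted above, treats OpenAI's "20+ Wh" as exactly 20 Wh (an assumption, since the figure is open-ended), and the names RANGES_WH, width, and fold are hypothetical rather than part of the Observatory's tooling.

```python
# Web-off estimates quoted above, in watt-hours: (low, high).
# Assumption: OpenAI's open-ended "20+ Wh" is treated as exactly 20 Wh.
RANGES_WH = {
    "OpenAI": (0.1, 20.0),
    "Google": (0.1, 1.0),
    "Anthropic": (0.5, 50.0),
}


def width(bounds):
    """Absolute width of a range, in Wh."""
    low, high = bounds
    return high - low


def fold(bounds):
    """Ratio of the upper bound to the lower bound."""
    low, high = bounds
    return high / low


for provider, bounds in RANGES_WH.items():
    print(f"{provider}: width {width(bounds):.1f} Wh, fold {fold(bounds):.0f}x")

# Anthropic's upper bound relative to Google's upper bound.
upper_ratio = RANGES_WH["Anthropic"][1] / RANGES_WH["Google"][1]
print(f"Anthropic/Google upper bound: {upper_ratio:.0f}x")
```

On these numbers, Anthropic's upper bound is 50 times Google's, while the absolute widths (49.5 Wh against 0.9 Wh) differ by a factor of 55. Which comparison is most informative depends on whether one cares about the worst-case figure or the overall uncertainty.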
All three outputs share the same framing: AI is a "double-edged sword." But their opening postures differ. GPT-5's response concedes "In the near term, yes." Gemini frames it as "critical and complex." Claude's response is most measured: "The direct impact is modest but real." When Google's web-search condition is enabled, the output format transforms entirely, from conversational chatbot style to article-style output with markdown headers and citations.
Questions
What is the Epistemic X-Ray?
The Epistemic X-Ray is a research transparency tool that shows how different commercial AI products return answers to the same question under controlled API conditions. By comparing responses with and without web search, it shows how retrieval and source selection shape the outputs users receive. It is produced by Thamus at the University of Ottawa as part of a longitudinal audit of AI product behavior.
What is Thamus?
Thamus is a university research platform that audits how large language models present contested topics in their outputs. Named after the Egyptian king in Plato's Phaedrus who warned about the dangers of writing, the Observatory tracks how LLMs retrieve, select, and present information over time.
Why does the X-Ray show both "with web search" and "without web search"?
Comparing web-on and web-off responses isolates the effect of real-time information retrieval. Web-off responses show what a product returns drawing only on training-data-derived priors, including which authority claims appear in outputs without retrieved sources. Web-on responses show how live retrieval changes the output. The gap between the two is where the effect of retrieval becomes visible. Note that "web-off" is not a pure measure of training-data knowledge, since content of the kind that live search retrieves was itself present in the training data; the contrast measures the marginal effect of live retrieval at the API surface.
What is hallucinated authority in AI responses?
Hallucinated authority occurs when an AI product's output references a specific organization or report by name without the retrieval pipeline having accessed it. For example, a response may read "According to the IEA..." while the product is operating without web search, meaning no current source was retrieved to support the claim. The X-Ray shows this pattern by displaying the empty source panel alongside the authoritative-sounding text.
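As a rough illustration of the pattern described above, the following sketch flags named authorities in a response whose source panel does not mention them. It is not the Observatory's method: the attribution phrasings in ATTRIBUTION_PATTERN, the function name unsupported_authorities, and the substring check against source titles are all assumptions chosen for brevity, and a real audit would need a far more careful matcher.

```python
import re

# Hypothetical, non-exhaustive attribution phrasings. The phrase is matched
# case-insensitively; the captured name must start with a capital letter.
ATTRIBUTION_PATTERN = re.compile(
    r"\b(?i:according to|as reported by)\s+(?:the\s+)?"
    r"([A-Z][\w&-]*(?:\s+[A-Z][\w&-]*)*)"
)


def unsupported_authorities(text, sources):
    """Return names cited in `text` that no entry in `sources` mentions.

    `sources` is a list of strings (for example source titles or URLs).
    An empty list models the web-off condition, so every cited name is flagged.
    """
    cited = [match.group(1) for match in ATTRIBUTION_PATTERN.finditer(text)]
    haystack = " ".join(sources).lower()
    # Crude check (assumption): a name counts as supported if it appears
    # anywhere in the joined source strings.
    return [name for name in cited if name.lower() not in haystack]


response = "According to the IEA, electricity demand is rising."
print(unsupported_authorities(response, []))
print(unsupported_authorities(response, ["Example source title mentioning the IEA"]))
```

With an empty source list the name is flagged; once a source string mentions it, the flag clears. The check says nothing about whether the attributed claim is accurate, only whether a retrieved source could stand behind the name.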
How is this data collected?
Prompts are sent to LLMs from five providers at temperature 0.7 under two conditions: web search enabled and disabled. Web-on uses each provider's native search capability. Collection uses scheduled API calls via the Thamus Observatory infrastructure, with model identifier and timestamp logged per call. Findings describe API-surface behavior at specified moments and do not extrapolate to personalized consumer-product experiences. Web-on data for some providers was not collected in this initial round. Provider-side model updates without version disclosure are a known confound in longitudinal observation.
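The collection conditions described above can be sketched as a small data structure. This is a minimal illustration, not the Thamus Observatory infrastructure: CallRecord, planned_conditions, and log_call are hypothetical names, the provider and model strings in the usage lines are placeholders, and no API call is actually made.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import product

TEMPERATURE = 0.7  # sampling temperature stated in the text


@dataclass(frozen=True)
class CallRecord:
    """Metadata logged for one API call (field names are assumptions)."""
    provider: str
    model_id: str
    web_search: bool
    temperature: float
    prompt: str
    timestamp: str


def planned_conditions(providers, web_on_supported):
    """Yield (provider, web_search) pairs, skipping web-on where not collected."""
    for provider, web_search in product(providers, (False, True)):
        if web_search and provider not in web_on_supported:
            continue
        yield provider, web_search


def log_call(provider, model_id, web_search, prompt):
    """Build the record for one call, timestamped in UTC at call time."""
    return CallRecord(
        provider=provider,
        model_id=model_id,
        web_search=web_search,
        temperature=TEMPERATURE,
        prompt=prompt,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


# Placeholder provider names; "provider-b" has no web-on data in this example.
for provider, web_search in planned_conditions(["provider-a", "provider-b"], {"provider-a"}):
    record = log_call(provider, "placeholder-model-id", web_search, "example prompt")
    print(record.provider, record.web_search, record.timestamp)
```

Logging the model identifier and timestamp with every call is what makes later comparison possible when a provider updates a model without disclosing a version change.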