Reporting on (generative) AI

AI lies!!!1 or what does the study by the European Broadcasting Union want to tell us?

A new EBU/BBC study reports a 45% error rate and significant problems with AI-generated answers to everyday questions. However, only the free versions of the assistants were tested, i.e. without Pro models or activated web search. This explains some of the "outdated facts" and missing evidence - but it goes unquestioned in the media coverage.
Opinion piece by Jan Hendrik Reichenbacher · Berlin, October 28, 2025

Key findings of the European Broadcasting Union (EBU) study

  • 45% of the 3,000 responses tested contained at least one significant error (81% showed a problem of any kind). The biggest source of error is "sourcing": in 31% of cases, sources were incorrect, missing or unverifiable.
  • Gemini stood out in particular when it came to sources (72% significant sourcing problems); the other assistants were each below 25%.
  • ChatGPT, Copilot, Gemini and Perplexity were tested - in the free default settings, with responses generated at the end of May/beginning of June 2025. This reflects the "standard user experience" (sic), not the "best case" with Pro models, RAG or explicit browsing.

The accompanying taxonomy ("toolkit") goes into more detail about the test setup and the errors identified: from "out-of-date information" and "out-of-date sources" to missing contextual information and hallucinated links.

Great media attention without critical scrutiny

German-language reports on the study in Tagesschau, ZDF, Deutschlandfunk, SWR and others picked it up in a sometimes exaggerated manner ("45% wrong", "AI lies"), often with a downright spiteful undertone. During my research, I could not find any scrutiny of the experimental design - instead, the quotes from media representatives sound more as if they are happy about the academic tailwind for their own interests:

"We see our line supported by the study: Trustworthy content needs trustworthy AI systems. ARD has an explicit interest in ensuring that our content is also accessible to users via AI. We are therefore actively seeking discussions with the platforms in order to arrive at good cooperation models that enable regulated access to our content. Mechanisms are needed to prevent false and misleading information."

Florian Hager, ARD Chairman and Director General of hr

More disinformation than knowledge gain?

The study deliberately focuses on the freely available versions of the AI assistants in order to reflect the "most frequent use". This may sound methodologically legitimate and also explains why outdated facts and weak sources appear so often:

No guaranteed real-time web search in the free versions, which increases the risk of encountering "outdated information".

  • The EBU study and almost all media referencing it ignore the fact that it is an economic decision by the AI providers not to offer certain functions for free, because they consume significant resources and therefore cost money.
  • In my opinion, this is a missed opportunity to promote digital literacy among readers. AI users should know that AI models are trained with an information snapshot. This "brain" therefore has a kind of publication date, so the AI cannot know anything that happens after this date.
  • This technical limitation was the biggest point of criticism when ChatGPT emerged in 2022, which is why OpenAI, as well as other providers such as Anthropic and Google, worked on enriching the "brain's" basic knowledge with current information. One way this works is the optional "web search", in which the AI automatically analyzes current search results for the user's query and uses this "fresh" knowledge when generating the answer (see the sketch after this list). A related technique is RAG, in which the user expands the "basic knowledge" with selected documents and specialized knowledge.
  • To exaggerate: the study's authors could just as well complain that the October 28, 2024 edition of Der Spiegel contains no up-to-date news about the year 2025.
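
To make this concrete, here is a minimal sketch of how a web-search-augmented answer works: the assistant first fetches fresh results and then answers only from those snippets, citing them. This is an illustrative sketch, not any provider's actual implementation; the search_web() helper is a hypothetical placeholder, and the chat call uses the OpenAI Python SDK purely as one example of a model API.

```python
# Minimal sketch of a web-search-augmented answer (retrieval-augmented generation).
# `search_web()` is a hypothetical placeholder; the chat call uses the OpenAI
# Python SDK purely as an example of a model API.
from openai import OpenAI

client = OpenAI()

def search_web(query: str) -> list[dict]:
    """Hypothetical helper: returns [{'title': ..., 'url': ..., 'snippet': ...}, ...]."""
    raise NotImplementedError("plug in the search API of your choice")

def answer_with_fresh_sources(question: str) -> str:
    # 1. Fetch current results so the model is not limited to its training cutoff.
    results = search_web(question)
    context = "\n".join(
        f"[{i + 1}] {r['title']} ({r['url']})\n{r['snippet']}"
        for i, r in enumerate(results)
    )
    # 2. Ask the model to answer only from the supplied snippets and to cite them.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer using only the numbered sources below and cite them as [n]."},
            {"role": "user",
             "content": f"Sources:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

# Without step 1, the same model answers purely from its training snapshot -
# which is exactly the free-tier behaviour the EBU study measured.
```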

Missing or inconsistent citations in the responses ("31% missing sources").

  • This criticism is also related to the reasons outlined above. The demand for consistent and correct citation of sources at all times ignores the technical nature of the current generation of generative AI. It implicitly assumes that an AI should work the same way a human does: "Do research and then write a report with citations."
  • This is exactly what the AI does when the web search function is used. However, if the AI is only supposed to answer from its "basic knowledge", i.e. the knowledge gathered up to a specific date during training, then it cannot reproduce where each piece of knowledge was learned. Neither could a human: do you remember when and where you learned that the first German Chancellor was called Konrad Adenauer? In humans, knowledge lives in the synapses of our brain; in AIs, it lives in word vectors (see the toy illustration after this list). The miracle of this technology is that, despite this physical difference, very similar results can be achieved.
  • To put it bluntly: should an AI be able to back up every single statement with a source? Applied consistently, that standard would also make human writing impossible, because we could no longer write down our "learned knowledge" without hunting for a source for every fact, no matter how small. The bottom line: this EBU criticism touches on an elementary philosophical problem of our time: what is truth, and who determines what the truth is?
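
As a toy illustration of the point above (made-up numbers, not a real model): associations in a language model live in the geometry of its vectors, and nothing in that geometry records where an association was learned.

```python
# Toy illustration with made-up numbers: "knowledge" as vector proximity,
# with no source or provenance field anywhere in the representation.
import numpy as np

# Hypothetical 4-dimensional "word vectors" picked up during training.
vectors = {
    "Adenauer":   np.array([0.9, 0.1, 0.8, 0.2]),
    "Chancellor": np.array([0.8, 0.2, 0.9, 0.1]),
    "Baker":      np.array([0.1, 0.9, 0.2, 0.8]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The association "Adenauer ~ Chancellor" shows up as high similarity ...
print(cosine(vectors["Adenauer"], vectors["Chancellor"]))  # ~0.99
print(cosine(vectors["Adenauer"], vectors["Baker"]))       # ~0.33
# ... but nothing in these numbers tells us *where* the association came from.
```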

Language/country bias: In the case of non-English-language answers, there were even fewer references.

In my opinion, two technical factors play a role here:

  1. UX decision: The AI speaks all the world's languages, so its knowledge is fed from all languages. But if the user speaks French, can the AI provide Spanish or German-language references with a clear conscience? Or would the human then feel cheated because they can't read these sources? It could therefore be a design / UX decision by the AI operator to only display sources in the user's language.
  2. If web search was used: with the (free) tools tested, a web search may well be performed, but it is not guaranteed. This web search is based on a search term that the AI enters on the user's behalf in a search engine such as Google or Bing, after which it reads the results and evaluates them for the user. And this is where an external bias comes in: Google queries require the keyword to be entered in a specific language. If the AI decides to use an English search term, Google will primarily return English-language sources (see the sketch after this list). On top of that, other details (location, etc.) can change the search results.
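
A rough sketch of why the query language matters, assuming the Google Custom Search JSON API (the API key and engine ID are placeholders): the language of the query string, together with the lr language-restrict parameter, largely determines which language the returned sources are in.

```python
# Sketch assuming the Google Custom Search JSON API; key and engine ID are
# placeholders. The query language plus the `lr` parameter largely decide
# which language the returned sources are in.
import requests

def search(query: str, lang: str) -> list[str]:
    params = {
        "key": "YOUR_API_KEY",          # placeholder
        "cx": "YOUR_SEARCH_ENGINE_ID",  # placeholder
        "q": query,
        "lr": f"lang_{lang}",           # restrict results to this language
    }
    resp = requests.get("https://www.googleapis.com/customsearch/v1", params=params)
    return [item["link"] for item in resp.json().get("items", [])]

# An assistant that silently reformulates a French question into an English
# search query will mostly see English links before it even starts to answer.
print(search("premier chancelier allemand", "fr"))
print(search("first German chancellor", "en"))
```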

"AI lies" or "digital literacy is important" - what is the true finding of the EBU study?

I heard about the study in the "Apocalypse & Filter Coffee" podcast and was irritated by how quickly the narrative "AI lies in 45% of cases" was spread. Such exaggerations serve prejudices instead of digital literacy, and that is exactly what keeps slowing us down in Europe.

Instead of maliciously reporting on the supposed shortcomings of a technology, I would like to see more media attention paid to digital literacy. Then Markus Feldenkirchen and Yasmine M'Barek, whom I otherwise hold in high esteem, would realize for themselves that we are talking about user errors and not about the "systemic inability of a technology".
