Real-world test of four major AI models, with results even more outrageous than “AI poisoning”: one AI claims this year’s “3.15” Gala hasn’t been held yet


Source: Shanghai Rumor-Refuting Platform

The 2026 CCTV “3.15” Gala aired on the evening of March 15. Among its revelations was the GEO (Generative Engine Optimization) business of “poisoning” AI large models. Only after the exposure did many people learn that unreliable AI recommendations are the work of unscrupulous merchants who mass-produce fake reviews and counterfeit authoritative endorsements, “feeding” them to large models so the models generate “customized recommendations.”

However, some consumers asked after seeing the exposure: if they avoid subjective questions such as “which brand is good” or “which service is popular” and inquire only about objective facts, can they trust the answers from AI large models?

The answer is also no.

The more questions you ask a large model, the more its errors accumulate

On March 16, a reporter ran a simple test on four of the most commonly used AI large models, asking each the same question: “Which brands were exposed in the CCTV ‘3.15’ Gala in 2026?” Only one model answered correctly. Of the other three, two mixed past years’ cases into their answers alongside this year’s; the remaining one was the most absurd, claiming, “The CCTV ‘3.15’ Gala in 2026 has not been held yet. Since today is March 16, 2026, if the gala aired normally on March 15, the related exposures would typically be published simultaneously on CCTV Finance Channel, the CCTV News app, and major media platforms.”

The model that answered correctly (partial screenshot below)

Two models confused past exposure cases with this year’s cases

One model responded: “Not yet held”

A consumer argued that including past exposure cases in the answer isn’t entirely wrong, since “the response is more comprehensive.” However, technical experts said this clearly exposes flaws in the models: the question has a single standard answer, and getting it wrong indicates serious biases in semantic understanding and data filtering.

When pressed further, these two “overenthusiastic” models revealed additional issues.

One of the cases exposed in last year’s CCTV “3.15” Gala involved using water-retaining agents (a practice colloquially known as “泡药,” or “chemical soaking”) to bulk up shrimp. The reporter asked the two models that had cited this case as this year’s example: “Where are the CCTV links about water-retaining shrimp?” One model supplied multiple links, including “CCTV 3.15 Gala full replay,” “CCTV news special report (text + video),” and “CCTV Finance 3.15 special page,” which seemed credible. But when the reporter clicked them, the pages displayed: “Sorry, possibly due to network issues or the page does not exist. Please try again later.” Even copying the links into a browser yielded nothing. Clearly, the links the model provided were insufficient to verify its answer.

The verification links provided by the model appeared to originate from CCTV’s website and seemed reliable, but were actually inaccessible (screenshot of webpage)

Another model provided links from CCTV, Baijiahao, NetEase News, and other sources. All links could be opened during testing, but new issues arose.

The first link from CCTV’s official report indeed discussed “water-retaining shrimp,” but the date on the page and in the content was March 15, 2025. The model seemed to notice this and added a note: “Some search results show this link as 2025, but the content is a report on the 2026 gala, possibly due to website archiving or URL generation rules. Please refer to the actual page content.” It’s clear the model not only failed to detect its mistake but also tried to “justify” it.

The model attempting to “justify itself” (screenshot of webpage)

The second link was a self-media interpretation article about this year’s CCTV “3.15” Gala, of questionable authority. Its content was riddled with errors, notably the claim that the first case exposed in 2026 was “泡药虾仁” (“water-retaining shrimp”), which explains why the model used it as a reference. The reporter ran the article through AI-content detection tools, which indicated strong signs of AI generation. In other words, the article was likely itself generated by a large model, which led to the skewed case references.

Errors in the self-media “interpretation article” (screenshot of webpage)

Detection confirmed heavy AI-generated traces in the self-media article (screenshot of webpage)

AI hallucinations are evolving; verification is essential for truth

“Many AI large model users have already discovered that AI, to satisfy users, fabricates non-existent content or mixes unrelated information, ‘talking nonsense with a serious face.’ Although developers are trying to eliminate AI hallucinations, the results are not ideal. Currently, no general artificial intelligence large model can fundamentally eliminate AI hallucinations,” explained Xiaohui, who works on large model development at a tech company.

At its core, a large model generates content probabilistically; it possesses no true “understanding.” It only searches for statistical patterns in massive amounts of data, and when faced with an unknown or ambiguous question, it assembles a “plausible” answer from the common patterns in its training data. This is the root cause of AI hallucinations, and it is what produced the errors seen when the reporter asked and re-asked the models.
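To make this mechanism concrete, here is a deliberately tiny Python sketch (our illustration only; the snippets and question are invented, and nothing here comes from the article’s test). A “model” that merely samples the most frequent pattern in its training text will return a fluent, confident answer about an event it has never seen:

```python
# Toy sketch of purely statistical generation (not a real LLM).
# The "model" knows nothing about the 2026 gala, yet still produces a
# fluent, confident answer by sampling patterns from its tiny corpus.
import random
from collections import Counter

# Invented training snippets: past galas follow recurring patterns.
training_snippets = [
    "the 3.15 Gala exposed food-safety problems",
    "the 3.15 Gala exposed food-safety problems",
    "the 3.15 Gala exposed false advertising",
]
counts = Counter(training_snippets)
patterns, weights = list(counts.keys()), list(counts.values())

def answer(question: str) -> str:
    # No lookup, no understanding: sample a continuation in proportion
    # to how often each pattern appeared in the "training data."
    guess = random.choices(patterns, weights=weights, k=1)[0]
    return f"Q: {question}\nA: This year, {guess}."

print(answer("What did the 2026 3.15 Gala expose?"))
# The answer reads plausibly even though the model has no 2026 data:
# this statistical "best guess" is the seed of a hallucination.
```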

Xiaohui also pointed out that “poisoning” AI models likewise exploits hallucinations: “GEO companies feed massive amounts of false information onto the internet in bulk, changing the data distribution and statistical probabilities in specific fields, thereby inducing large models to generate answers that benefit the merchants but contradict the facts.”
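A minimal sketch of the distribution shift Xiaohui describes, with invented brand names and review counts (purely illustrative; no real merchant data):

```python
# How flooding a corpus with fake reviews flips the statistically
# "most likely" recommendation. All names and counts are invented.
from collections import Counter

organic = ["Brand A is reliable"] * 80 + ["Brand B is reliable"] * 20
print(Counter(organic).most_common(1))
# [('Brand A is reliable', 80)] -> a model trained on this favors Brand A

# A GEO operation mass-produces endorsements for the paying merchant:
poisoned = organic + ["Brand B is reliable"] * 500
print(Counter(poisoned).most_common(1))
# [('Brand B is reliable', 520)] -> the statistical "best" answer now
# contradicts the organic consensus, the induced bias described above.
```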

He warned the public to be cautious of AI hallucinations. Large models are not unusable, but must be used safely, soberly, and correctly. Ordinary users should maintain a questioning attitude toward AI outputs. The simplest approach is to remember the keywords: “Limit, Verify, Follow-up, Check.”

First, when asking questions, limit the scope by adding constraints like “search on the official website of certain institutions” or “search in reports from authoritative media” to reduce hallucinations.

Second, pose the same question to different models for cross-verification. If their answers differ, press each model with follow-up questions.

Finally, ask the models to provide reference links for their answers and manually trace the sources (a minimal link-checking sketch follows these tips). If a model gives no clear sources, only vague origins, or suspicious links, the credibility of its answer drops further.

Additionally, pay attention to the scenarios in which AI large models are used. For example, in high-risk contexts such as medical diagnosis, medication advice, legal judgments, investment guidance, and financial lending, AI responses should be considered “for reference only” and absolutely not used as decision-making basis.
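As promised above, here is a minimal Python sketch of the “Check” step: verifying that the reference links a model supplies actually resolve before trusting them. The URLs are placeholders, not the actual links from the reporter’s test:

```python
# Check whether AI-supplied reference links actually resolve.
# The URLs below are placeholders invented for this sketch.
import urllib.request
import urllib.error

def link_resolves(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with an HTTP success status."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-checker"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, TimeoutError, ValueError):
        return False

ai_provided_links = [
    "https://example.com/cctv-315-full-replay",          # placeholder
    "https://example.com/water-retaining-shrimp-report",  # placeholder
]
for url in ai_provided_links:
    verdict = "reachable" if link_resolves(url) else "dead -- do not trust"
    print(f"{url}: {verdict}")
```

A reachable link is still not proof on its own; as the test above showed, even pages that open must be checked for the right date and an authoritative source.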

Editor: Sun Fei
