[Former Qwen Core Member] Lin Junyang's Essay: AI Models Will Shift Toward "Agentic Thinking," and Why Qwen Abandoned the Merged Thinking and Instruct Modes


Alibaba (09988)
Lin Junyang, a key figure on Alibaba's Qwen team, abruptly resigned in early March, sparking speculation about disagreements with management. As the storm subsided, he recently published an essay titled “From ‘Reasoning’ Thinking to ‘Agentic’ Thinking” on the social platform X. Although the piece primarily discusses the direction of AI technology, it hints at reflections on Qwen’s technical roadmap.

He argued that “reasoning thinking,” which simply consumes ever more computing power, has peaked; the second half of AI will belong to “agentic thinking,” which interacts with the real environment and thinks while acting.

Shift in focus for AI: What will happen next?

Lin Junyang noted that the AI industry's focus in the first half of 2025 concentrated mainly on “reasoning thinking”: how to make large models spend more time and compute on thinking, how to train models with stronger feedback mechanisms, and how to control these additional reasoning processes.

However, the problem that the industry currently faces is: what will happen next?

He believes the answer is undoubtedly “agentic thinking.” Future AI should not just think behind closed doors and hand back answers; it should “think in order to act.” It needs to reason while interacting with the environment, continuously updating and revising its plans based on feedback from the real world.

Internal blueprint of Qwen and the failure of the “merger route”

Lin Junyang revealed for the first time the Qwen team's internal technical blueprint from early 2025. At the time, many members hoped to build an ideal system that unified the “thinking” and “instruct” modes. The concept was ambitious:

Intelligent adjustment: automatically determine how much reasoning effort (roughly low/mid/high) is needed based on the prompt and context.

Autonomous decision-making: let the model decide when to respond quickly and when to think deeply, allocating substantial compute only when it encounters hard problems.

Lin Junyang stated that Qwen3 was the clearest public attempt in this direction, introducing a “hybrid thinking mode” with an emphasis on controllable thinking budgets. However, he admitted: “It’s easy to talk about merging, but it’s extremely difficult to execute.”
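The “intelligent adjustment” idea above can be sketched as a simple pre-generation router. Everything here, including the marker heuristics and thresholds, is illustrative guesswork, not Qwen's actual mechanism:

```python
# Hypothetical sketch of routing a prompt to a reasoning-effort level before
# generation. The markers and thresholds are invented for illustration.

def choose_reasoning_effort(prompt: str) -> str:
    """Return a coarse reasoning budget: 'low', 'mid', or 'high'."""
    hard_markers = ("prove", "derive", "debug", "optimize", "step by step")
    if any(m in prompt.lower() for m in hard_markers):
        return "high"          # difficult problems get a large thinking budget
    if len(prompt.split()) > 200:
        return "mid"           # a long context suggests non-trivial work
    return "low"               # quick, instruct-style response

print(choose_reasoning_effort("Translate 'hello' to French"))       # low
print(choose_reasoning_effort("Prove that sqrt(2) is irrational"))  # high
```

A production router would of course be learned rather than rule-based; the sketch only shows where such a decision sits in the pipeline.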

Lin Junyang believes a forced merge makes the model “mediocre,” because the data distributions and behavioral goals behind the “thinking mode” and the “instruct mode” are entirely different. Forcing them together makes thinking behavior verbose, bloated, and indecisive, while instruct behavior loses its crispness and becomes unreliable, significantly increasing costs for commercial users.

In commercial reality, he believes a large number of enterprise customers genuinely need pure instruct operation with high throughput, low cost, and high controllability (such as batch processing).

For this reason, the Qwen team ultimately chose to release independent Instruct and Thinking versions in the subsequent 2507 series. Lin Junyang believes separating the two lets the team focus purely on each mode's own data and training issues, avoiding “two awkwardly fused personalities.”

Competitor strategies: Anthropic’s “moderation” and goal-oriented approach

Unlike Qwen’s separation route, other labs chose the opposite “integration route,” as seen in Anthropic's Claude series and in GLM-4.5.

Lin Junyang specifically highlighted Anthropic's approach with the Claude series, arguing that its development trajectory shows rigor and restraint, with Claude 3.7 and Claude 4 interleaving reasoning with tool use.

Goal-oriented thinking: Anthropic holds that producing extremely long reasoning paths does not equate to a smarter model. If a model holds forth at length on every trivial matter, that actually signals poor resource allocation.

Practicality first: If the goal is to write programs, AI’s thinking should be used for planning, breaking down tasks, fixing bugs, and invoking tools; if it’s for agent workflows, thinking should be used to enhance the quality of long-term task execution rather than simply producing seemingly impressive “reasoning essays.”

Core differences between reasoning thinking and agentic thinking

Lin Junyang predicts that “agentic thinking” will ultimately replace the kind of static, monologue-style reasoning that lacks interaction. A truly advanced system should be able to search, simulate, execute, check, and revise, solving problems robustly and efficiently.

Changing evaluation criteria: the question shifts from “Can the model solve math problems?” to “Can the model make progress while interacting with the environment?”

Real-world challenges that need to be addressed:

  • Knowing when to stop thinking and take action.
  • Choosing which tool to invoke and the order of usage.
  • Being able to handle noisy and incomplete observational data from the real environment.
  • Knowing how to correct plans when failures occur.
  • Maintaining logical coherence in multi-turn dialogues and multiple tool invocations.
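The loop those requirements describe can be sketched in a few lines. The `plan`, `act`, and `observe` callables and the toy counter environment are invented for illustration; the point is the control flow, not any particular implementation:

```python
# A minimal sketch of an agentic think-act loop: reason briefly, act, read
# feedback from the environment, then revise the plan.

def agent_loop(goal, plan, act, observe, max_steps=10):
    history = []
    step = plan(goal, history)                 # initial short plan
    for _ in range(max_steps):
        if step.get("done"):                   # knows when to stop thinking
            return step["answer"], history
        obs = observe(act(step["action"]))     # tool call + (possibly noisy) feedback
        history.append((step["action"], obs))
        step = plan(goal, history)             # revise after each observation
    return None, history                       # step budget exhausted

# Toy demo: the "environment" is a counter the agent increments until it hits 3.
state = {"v": 0}

def plan(goal, history):
    if state["v"] >= goal:
        return {"done": True, "answer": state["v"]}
    return {"done": False, "action": "inc"}

def act(action):
    state["v"] += 1                            # the tool's side effect
    return state["v"]

def observe(result):
    return result                              # identity here; real feedback is messier

answer, history = agent_loop(3, plan, act, observe)
print(answer, len(history))  # 3 3
```

Contrast this with a pure reasoning model, which would emit one long chain of thought and a final answer with no chance to check intermediate results against the environment.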

Three major technical challenges in achieving “agentic thinking”

Beyond the application-level differences, Lin Junyang analyzed the major low-level engineering challenges of agentic thinking:

Bottlenecks in training infrastructure (collapsing GPU efficiency): agent-based reinforcement learning (RL) is much harder than plain reasoning RL. Agents must interact frequently with external tools (such as browsers and execution sandboxes), and waiting on feedback from the real environment stalls the training loop and drags down GPU utilization. In the future, “training” and “inference (rollout)” must be cleanly decoupled.
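The decoupling he describes can be illustrated with a producer-consumer buffer between rollout and training, so the GPU-bound trainer never idles while agents wait on slow tools. The workloads below are placeholder stand-ins:

```python
# Sketch of decoupling rollout (slow, environment-bound) from training
# (GPU-bound) with a thread-safe queue between the two halves.

import queue
import threading
import time

trajectories = queue.Queue(maxsize=8)   # buffer between rollout and training

def rollout_worker(n):
    for i in range(n):
        time.sleep(0.01)                # stands in for slow tool/env feedback
        trajectories.put({"id": i, "reward": 1.0})

def trainer(n):
    seen = []
    for _ in range(n):
        traj = trajectories.get()       # consume whenever data is ready
        seen.append(traj["id"])         # stands in for a gradient step
    return seen

t = threading.Thread(target=rollout_worker, args=(5,))
t.start()
ids = trainer(5)
t.join()
print(ids)  # [0, 1, 2, 3, 4]
```

Real systems scale this pattern across many asynchronous rollout workers and a separate serving fleet, but the principle is the same: neither side blocks on the other's latency.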

“Reward hacking” and cheating risks: once a model has the authority to use tools, it can easily learn to “cheat” the system into granting rewards (for example, exploiting a vulnerability to peek at future information) rather than genuinely solving problems. Tools amplify the risk of spurious optimization, and anti-cheating protocols will be crucial for the major labs.
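One common countermeasure, sketched here with an invented task format, is to never trust an agent's self-reported success and instead recompute the reward with an independent verifier:

```python
# Sketch of an anti-reward-hacking check: the reward is granted only if the
# claimed answer passes a ground-truth verification the agent cannot touch.

def verified_reward(claimed, task):
    """Recompute the reward independently instead of trusting self-reports."""
    if not task["check"](claimed["answer"]):
        return 0.0                      # claimed success, but verification fails
    return 1.0

task = {"check": lambda ans: ans == 4}  # ground-truth verifier for "2 + 2"
honest = {"answer": 4, "self_reported_success": True}
hacker = {"answer": 999, "self_reported_success": True}  # gamed the signal

print(verified_reward(honest, task), verified_reward(hacker, task))  # 1.0 0.0
```

The key design choice is that the verifier runs outside the agent's action space, so a tool exploit that fools the agent's own logs still earns zero reward.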

Multi-agent orchestration: Future system engineering will no longer rely on a single model but will involve multiple agents working in tandem. The system will include orchestrators responsible for planning, “expert agents” specializing in specific domains, and “sub-agents” handling narrow tasks to control context and avoid contamination of the reasoning process.
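The orchestrator/expert split can be sketched with stub agents, each of which sees only its own narrow subtask rather than the full shared context:

```python
# Sketch of multi-agent orchestration: an orchestrator owns the global plan
# and routes each subtask to a domain expert, keeping contexts isolated.

def code_agent(subtask):
    return f"patch for: {subtask}"      # stub for a coding specialist

def search_agent(subtask):
    return f"results for: {subtask}"    # stub for a retrieval specialist

EXPERTS = {"code": code_agent, "search": search_agent}

def orchestrator(plan):
    outputs = []
    for domain, subtask in plan:        # orchestrator holds the global plan
        expert = EXPERTS[domain]
        outputs.append(expert(subtask)) # each expert sees only its subtask
    return outputs

print(orchestrator([("search", "Qwen3 release notes"),
                    ("code", "fix failing test")]))
```

Isolating each expert's context is what prevents one noisy tool transcript from contaminating the reasoning of every other agent in the system.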

Conclusion: The next phase of competition in the AI industry

At the end of his essay, Lin Junyang described the next phase of competition in the AI industry: the core object of training will no longer be merely the “model” itself, but the integrated system of “model + environment” (the agent together with its surrounding harness).

Past reasoning era: Advantages came from better reinforcement learning (RL) algorithms, stronger feedback signals, and scalable training pipelines.

Future agent era: Advantages will depend on better environment design, tighter train-serve integration, stronger system engineering, and the ability for models to learn to take responsibility for their decisions and form a “closed loop.”
