[Former Core Member of Alibaba's Qwen] Lin Junyang's Long Essay: AI Models Will Shift Toward "Agentic Thinking", Revealing Why Qwen Abandoned the Hybrid Thinking-and-Instruction Mode
Lin Junyang, a key figure behind Qwen, abruptly resigned in early March, sparking speculation about disagreements with management. As the storm subsided, he recently published an article titled “From ‘Reasoning’ Thinking to ‘Agentic’ Thinking” on the social platform X. Although the article primarily discusses the direction of AI technology, it hints at reflections on Alibaba Qwen’s technical roadmap.
He pointed out that simply burning more computing power on “reasoning thinking” has peaked; the second half of AI will belong to “agentic thinking”, which interacts with the real environment and thinks while acting.
Shift in focus for AI: What will happen next?
Lin Junyang noted that in the first half of 2025, the AI industry’s focus was mainly on “reasoning thinking” — that is, how to make large models spend more time and computing power on thinking, how to use stronger feedback mechanisms to train models, and how to control these additional reasoning processes.
However, the problem that the industry currently faces is: what will happen next?
He believes the answer is undoubtedly “agentic thinking”. Future AI should not just think behind closed doors and then hand over an answer; it should “think in order to act”: reason while interacting with the environment, and continuously update and revise its plans based on feedback from the real world.
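The “think while acting” loop described here resembles the widely discussed ReAct-style pattern of interleaving reasoning with tool use. A minimal toy sketch follows; the planner and tools are invented stand-ins for illustration, not anything from Qwen:

```python
# Toy sketch of an agentic "think-act-observe" loop (ReAct-style).
# The planner and tools here are hypothetical stand-ins.

def agentic_loop(task, tools, planner, max_steps=5):
    """Interleave reasoning with real feedback instead of emitting
    one long static chain of thought."""
    observations = []
    for _ in range(max_steps):
        # 1. Reason over the task plus everything observed so far.
        action, arg = planner(task, observations)
        if action == "finish":
            return arg                    # planner decides it is done
        # 2. Act on the environment and collect real feedback.
        result = tools[action](arg)
        # 3. Record the observation; the next plan can revise itself.
        observations.append((action, arg, result))
    return None  # ran out of budget without finishing

# Toy example: a planner that looks a fact up, then finishes.
def toy_planner(task, observations):
    if not observations:
        return ("lookup", task)
    return ("finish", observations[-1][2])

tools = {"lookup": {"capital of France": "Paris"}.get}
print(agentic_loop("capital of France", tools, toy_planner))  # Paris
```

The point of the sketch is the feedback edge: each observation flows back into the next planning step, which a static chain of thought never gets.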
Internal blueprint of Qwen and the failure of the “merger route”
Lin Junyang revealed, for the first time, the Qwen team’s internal technical blueprint from early 2025. At the time, many members hoped to build an ideal system that could unify the “thinking” and “instruction” modes. The vision for this system was ambitious:
Intelligent adjustment: automatically determine how much reasoning effort (roughly low/mid/high levels) is needed based on the prompt and context.
Autonomous decision-making: let the model decide when to respond quickly and when to think deeply, allocating substantial computing power when it encounters hard problems.
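In spirit, the “intelligent adjustment” idea resembles a router that maps each prompt to a reasoning budget. The difficulty scorer and thresholds below are invented purely for illustration and have nothing to do with Qwen’s actual mechanism:

```python
# Illustrative sketch of prompt-based reasoning-budget routing.
# The difficulty heuristic and thresholds are invented, not Qwen's.

def estimate_difficulty(prompt):
    # Toy heuristic: longer, question-dense prompts count as "harder".
    return min(1.0, len(prompt.split()) / 50 + prompt.count("?") * 0.2)

def route_budget(prompt):
    """Map a prompt to a reasoning budget: low / mid / high."""
    d = estimate_difficulty(prompt)
    if d < 0.3:
        return "low"    # respond quickly, little or no thinking
    if d < 0.7:
        return "mid"
    return "high"       # allocate substantial thinking tokens

print(route_budget("Hi"))  # low
```

A real system would score difficulty with a learned model rather than a word count, but the routing shape — prompt in, budget tier out — is the same.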
Lin Junyang stated that Qwen3 was the clearest public attempt in this direction, introducing a “hybrid thinking mode” with an emphasis on controllable thinking budgets. However, he admitted: “It’s easy to talk about merging, but extremely difficult to execute.”
Lin Junyang believes a forced merger leaves the model “mediocre”, because the data distributions and behavioral goals behind the “thinking mode” and the “instruction mode” are entirely different. Forcing them together makes “thinking behavior” verbose, bloated, and indecisive, while “instruction behavior” loses its crispness and becomes unreliable — even significantly raising costs for commercial users.
In commercial reality, he believes, a large number of enterprise customers genuinely need pure instruction-style operation (such as batch processing) with high throughput, low cost, and high controllability.
For this reason, the Qwen team ultimately chose to release independent instruction (Instruct) and thinking (Thinking) versions in the subsequent 2507 series. Lin Junyang believes that separating the two allows the team to focus more purely on solving their respective data and training issues, avoiding the emergence of “two awkwardly fused personalities.”
Competitor strategies: Anthropic’s “moderation” and goal-oriented approach
Unlike Qwen’s separation route, other labs — such as Anthropic and the team behind GLM-4.5 — chose the opposite “integration route.”
Lin Junyang specifically mentioned Anthropic’s (Claude series) approach, believing that its development trajectory demonstrates a sense of rigor and moderation, with Claude 3.7 / Claude 4 alternating between reasoning and “tool usage.”
Goal-oriented thinking: Anthropic believes that producing extremely long reasoning paths does not equate to a smarter model. If a model goes on at length about all trivial matters, it actually signifies improper resource allocation.
Practicality first: If the goal is to write programs, AI’s thinking should be used for planning, breaking down tasks, fixing bugs, and invoking tools; if it’s for agent workflows, thinking should be used to enhance the quality of long-term task execution rather than simply producing seemingly impressive “reasoning essays.”
Core differences between reasoning thinking and agentic thinking
Lin Junyang predicts that “agentic thinking” will ultimately replace static, monologue-style reasoning that never interacts with anything. A truly advanced system should be able to search, simulate, execute, check, and revise in order to solve problems robustly and efficiently.
Changing evaluation criteria: the question shifts from “Can the model solve math problems?” to “Can the model make progress while interacting with the environment?”
Beyond evaluation, real-world engineering challenges also need to be addressed.
Three major technical challenges in achieving “agentic thinking”
Beyond application-level differences, Lin Junyang further analyzed the major underlying engineering challenges of agentic thinking:
Bottlenecks in training infrastructure (GPU efficiency collapse): agent-based reinforcement learning (RL) is much harder than plain reasoning RL. AI agents must interact frequently with external tools (such as browsers and execution sandboxes), and waiting for feedback from the real environment can stall training and sharply lower GPU utilization. In the future, “training” and “inference” (rollout generation) must be cleanly decoupled.
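The decoupling described here can be illustrated with a producer–consumer sketch: slow environment rollouts feed a queue while the trainer consumes whatever trajectories are ready, so training compute never blocks on any single tool’s latency. Everything below is a toy stand-in, not a real RL framework:

```python
# Toy sketch of decoupling slow environment rollouts from training
# via a queue. All names here are illustrative.
import queue
import threading
import time

rollout_queue = queue.Queue(maxsize=100)

def rollout_worker(env_step, n):
    # Environment side: interact with slow tools (browser, sandbox)
    # and push finished trajectories into the queue.
    for i in range(n):
        rollout_queue.put(env_step(i))   # slow tool round-trips

def trainer(train_step, n):
    # Training side: consume whatever trajectories are ready,
    # never idling inside the training step itself.
    return [train_step(rollout_queue.get()) for _ in range(n)]

def fake_env_step(i):
    time.sleep(0.01)                     # stand-in for tool latency
    return {"reward": i}

t = threading.Thread(target=rollout_worker, args=(fake_env_step, 5))
t.start()
losses = trainer(lambda traj: traj["reward"], 5)
t.join()
print(losses)  # [0, 1, 2, 3, 4]
```

Real systems run many rollout workers in parallel and batch trajectories, but the core idea is the same: the queue absorbs environment latency so the trainer’s hardware stays busy.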
“Reward hacking” and cheating risks: once a model has the authority to use tools, it can easily learn to “cheat” the system for rewards (for example, exploiting vulnerabilities to peek at future information) rather than genuinely solving problems. Tools exacerbate the risk of spurious optimization, and anti-cheating protocols will become crucial for major labs.
Multi-agent orchestration: Future system engineering will no longer rely on a single model but will involve multiple agents working in tandem. The system will include orchestrators responsible for planning, “expert agents” specializing in specific domains, and “sub-agents” handling narrow tasks to control context and avoid contamination of the reasoning process.
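The orchestrator / expert-agent split could be sketched as follows: a planner decomposes the task, each narrow expert sees only its own subtask (so contexts stay isolated), and the orchestrator merges the results. All names are hypothetical, not from any real framework:

```python
# Toy sketch of an orchestrator delegating to domain experts.
# Each expert receives only its narrow subtask, keeping contexts
# separate and avoiding cross-contamination of reasoning.

def orchestrator(task, experts, plan):
    results = {}
    for domain, subtask in plan(task):
        # The expert never sees the full task or other experts'
        # intermediate state -- only its own slice of the work.
        results[domain] = experts[domain](subtask)
    return results

experts = {
    "math": lambda s: eval(s),     # expert for arithmetic subtasks
    "text": lambda s: s.upper(),   # expert for text subtasks
}
plan = lambda task: [("math", "2 + 3"), ("text", task)]

print(orchestrator("hello", experts, plan))
# {'math': 5, 'text': 'HELLO'}
```

In a real system the planner and experts would each be model calls with their own prompts and tool access; the structural point is that context isolation is enforced by the orchestration layer, not left to the model.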
Conclusion: The next phase of competition in the AI industry
At the end of his article, Lin Junyang pointed out the next phase of competition in the AI industry: the core training focus will no longer be merely the “model” itself, but the integrated system of “model + environment” — the agent together with its surrounding scaffolding.
Past reasoning era: Advantages came from better reinforcement learning (RL) algorithms, stronger feedback signals, and scalable training pipelines.
Future agent era: Advantages will depend on better environment design, tighter train-serve integration, stronger system engineering, and the ability for models to learn to take responsibility for their decisions and form a “closed loop.”
Original article on X