Token Usage Up Tenfold in a Year: No Wonder Providers Are Raising Prices

Driven by surging global AI demand, tight computing resources, and rising hardware costs, large-model vendors and cloud service providers have raised prices one after another in recent weeks.

On March 11, Tencent Cloud was the first to adjust its billing, raising the input price of the Hunyuan series model Tencent HY2.0 Instruct from 0.0008 yuan per thousand tokens to 0.004505 yuan per thousand tokens, an increase of 463%. At the same time, it ended the free public beta for third-party models such as GLM 5 and MiniMax and moved them to paid commercial service.
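The 463% figure can be checked with a few lines of arithmetic. The sketch below is only an illustration: the two prices come from the article, while the function name `percent_increase` is a hypothetical helper introduced here.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Input prices quoted in the article, in yuan per thousand tokens.
old_price = 0.0008
new_price = 0.004505

increase = percent_increase(old_price, new_price)
multiple = new_price / old_price

print(f"Increase: {increase:.0f}%")   # Increase: 463%
print(f"Multiple: {multiple:.2f}x")   # Multiple: 5.63x
```

A 463% increase means the new price is about 5.63 times the old one, not 4.63 times; the two numbers differ by the original 100%.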

On March 16, Zhipu launched its new model GLM-5-Turbo and raised API prices by 20%; compared with the earlier GLM-4.7, the average increase is 83%.

On March 18, Alibaba Cloud and Baidu Smart Cloud both announced price hikes. According to the official announcements, Alibaba Cloud will raise prices for services based on the T-Head (Pingtouge) Zhenwu 810E compute card and for CPFS (Intelligent Computing Edition), by up to 34%. Baidu Smart Cloud stated that prices for AI computing products and services will rise by about 5%–30%, and prices for parallel file storage by approximately 30%. Both cloud providers attributed the increases to “explosive global AI demand.”

Overall global AI demand is hard to estimate, but data disclosed by OpenRouter, the world’s largest API aggregation platform, offers a useful indication. OpenRouter aggregates multiple AI models behind a unified interface, so that ordinary users and developers can call different large language models (LLMs) such as GPT-4, Claude, Gemini, and DeepSeek through a single API.

OpenRouter data show that in the week of March 24, 2025, calls to large models on the platform consumed 1.62 trillion tokens. By the week of March 9, 2026, roughly a year later, weekly usage had soared to 16.90 trillion tokens, a more than tenfold increase. Notably, growth has accelerated further since the official release of OpenClaw (“Lobster”) on January 30, 2026. In the week of its release, token usage on the platform was 8.25 trillion; in just over a month, it doubled to 16.90 trillion.
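As a rough check on the "tenfold" and "doubled" claims, the growth multiples can be computed directly from the three weekly totals quoted above. This is a minimal sketch; the dictionary keys and the helper `growth_multiple` are illustrative names, not anything published by OpenRouter.

```python
# Weekly token totals quoted in the article, in trillions of tokens.
weekly_tokens = {
    "week of 2025-03-24": 1.62,
    "OpenClaw release week": 8.25,
    "week of 2026-03-09": 16.90,
}


def growth_multiple(start: float, end: float) -> float:
    """How many times larger `end` is than `start`."""
    return end / start


latest = weekly_tokens["week of 2026-03-09"]
year_over_year = growth_multiple(weekly_tokens["week of 2025-03-24"], latest)
since_release = growth_multiple(weekly_tokens["OpenClaw release week"], latest)

print(f"Year over year: {year_over_year:.1f}x")        # Year over year: 10.4x
print(f"Since OpenClaw release: {since_release:.2f}x")  # Since OpenClaw release: 2.05x
```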

A Galaxy Securities research report states that in the first week of March 2026, the platform processed 14.8 trillion tokens, double the level at the beginning of the year, with agent-driven workflows accounting for more than half of total output.

In addition, four Chinese large models ranked among the most-used models by token usage in the week of March 9. MiniMax M2.5 led with 1.75 trillion tokens, followed by Step 3.5 Flash with 1.34 trillion and DeepSeek V3.2 with 1.04 trillion, while Kimi K2.5 ranked ninth with 0.56 trillion. Since the week of February 9, when call volume for Chinese models first surpassed that of US models, MiniMax M2.5 has held the top position for five consecutive weeks.
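To put these model totals in context, they can be compared with the 16.90 trillion platform total reported for the same week. The sketch below assumes that the per-model figures and the platform total are measured the same way and cover the same week, which the article does not state explicitly; the variable names are illustrative.

```python
# Per-model token usage for the week of March 9, in trillions (from the article).
model_tokens = {
    "MiniMax M2.5": 1.75,
    "Step 3.5 Flash": 1.34,
    "DeepSeek V3.2": 1.04,
    "Kimi K2.5": 0.56,
}

# Platform-wide total for the same week, in trillions (from the article).
platform_total = 16.90

# Share of the platform total for each model, in percent.
shares = {name: tokens / platform_total * 100.0 for name, tokens in model_tokens.items()}
combined_share = sum(model_tokens.values()) / platform_total * 100.0

for name, share in shares.items():
    print(f"{name}: {share:.1f}% of platform tokens")
print(f"Four models combined: {combined_share:.1f}%")
```

Under that assumption, the four models together account for a little under 28% of the week's tokens, with MiniMax M2.5 alone at roughly a tenth.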

OpenRouter counts only calls made through its own platform and excludes users who call the model providers’ APIs directly. Even so, as the world’s largest API aggregation platform, its data still reflect the strength of Chinese large models and the enormous demand for them.

(Article source: Oriental Fortune Research Center)
