The trend of AI + Web3 integration: decentralization of computing power and data becomes the focus

AI+Web3: Towers and Squares

TL;DR

  1. Web3 projects with AI concepts have become targets for capital attraction in both primary and secondary markets.

  2. Web3's opportunities in the AI industry lie in using distributed incentives to coordinate potential long-tail supply across data, storage, and computing, and, at the same time, in supporting open-source models and a decentralized marketplace for AI Agents.

  3. AI's main applications in the Web3 industry are on-chain finance (cryptocurrency payments, trading, and data analysis) and development assistance.

  4. The utility of AI+Web3 is reflected in the complementarity of the two: Web3 is expected to counteract the centralization of AI, while AI is expected to help Web3 break boundaries.


Introduction

Over the past two years, AI development has accelerated as if someone had pressed a fast-forward button. The butterfly effect set off by ChatGPT has not only opened up a new world of generative artificial intelligence but has also stirred up a wave in Web3 on the other shore.

Buoyed by the AI narrative, financing in an otherwise slowing cryptocurrency market has picked up noticeably. In the first half of 2024 alone, 64 Web3+AI projects completed financing rounds, with the AI-based operating system Zyber365 raising as much as $100 million in its Series A.

The secondary market is even more buoyant. Data from cryptocurrency aggregation websites shows that in just over a year, the total market value of the AI sector reached $48.5 billion, with 24-hour trading volume close to $8.6 billion. The boost from advances in mainstream AI technology is evident: after OpenAI released its Sora text-to-video model, the average price of the AI sector rose by 151%. The AI effect has also spilled over into Meme, one of crypto's biggest money magnets: GOAT, the first memecoin built on the AI Agent concept, quickly gained popularity and reached a valuation of $1.4 billion, kicking off the AI Meme craze.

Research and discussion around AI+Web3 are equally heated, from AI+DePIN to AI memecoins and now to AI Agents and AI DAOs; FOMO can barely keep pace with the speed at which new narratives rotate.

AI+Web3, a pairing laden with hot money, hype, and visions of the future, is easy to dismiss as a marriage arranged by capital. Beneath the gorgeous robe, it is hard to tell whether this is a playground for speculators or the eve of a genuine breakout.

To answer that question, the key consideration for both sides is whether each actually becomes better with the other involved: can one benefit from the other's paradigm? In this article, standing on the shoulders of earlier work, we examine how Web3 can play a role at each layer of the AI technology stack, and what new vitality AI can bring to Web3.

Part 1: What opportunities does Web3 have in the AI stack?

Before delving into this topic, we need to understand the technology stack of large AI models:

Expressing the entire process in simpler language: "Large models" are like the human brain. In the early stages, this brain belongs to a newborn baby who has just arrived in the world, needing to observe and absorb vast amounts of external information to understand this world. This is the "collection" phase of data. Since computers do not possess human senses such as vision and hearing, before training, the large-scale unlabelled information from the outside world needs to be transformed into a format that computers can understand and use through "preprocessing."

After inputting the data, the AI constructs a model with understanding and prediction capabilities through "training", which can be viewed as the process of a baby gradually understanding and learning about the outside world. The model's parameters are like the language abilities that the baby adjusts continuously during the learning process. When the content of learning begins to branch out or when communication with others provides feedback and corrections, it enters the "fine-tuning" stage of the large model.

As children grow up and learn to speak, they can understand meaning and express their feelings and thoughts in new conversations. This stage resembles the "inference" of large AI models: the model can predict and analyze new language and text inputs. Just as children use their language skills to express feelings, describe objects, and solve problems, large AI models, once trained and deployed, are applied in the inference phase to specific tasks such as image classification and speech recognition.
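To make the lifecycle above concrete, here is a minimal, purely illustrative sketch of the collection, preprocessing, training, fine-tuning, and inference stages. It uses scikit-learn on a toy dataset and stands in for a real large-model pipeline only conceptually; every name and step here is chosen for illustration.

```python
# Toy illustration of the five stages described above; not a real LLM pipeline.
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# 1. "Collection": obtain raw data (here, a bundled toy dataset).
X, y = load_digits(return_X_y=True)

# 2. "Preprocessing": convert raw inputs into a format the model can use.
X = StandardScaler().fit_transform(X)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.2, random_state=0)

# 3. "Training": fit the base model's parameters on the bulk of the data.
model = SGDClassifier(random_state=0)
model.fit(X_train, y_train)

# 4. "Fine-tuning": incrementally adjust the already-trained model on new feedback.
model.partial_fit(X_new[:50], y_new[:50])

# 5. "Inference": apply the trained model to unseen inputs.
print(model.predict(X_new[50:60]))
```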

AI Agents move closer to the next form of large models: capable of independently executing tasks and pursuing complex goals, possessing not only the ability to think but also memory, planning, and the ability to use tools to interact with the world.

Currently, in response to the pain points of AI across various stacks, Web3 has initially formed a multi-layered, interconnected ecosystem that covers all stages of the AI model process.


1. Basic Layer: The "Airbnb" of Computing Power and Data

Computing Power

Currently, one of the highest costs of AI is the computing power and energy required to train and infer models.

For example, Meta's LLaMA 3 requires 16,000 NVIDIA H100 GPUs (a top-tier graphics processing unit designed for artificial intelligence and high-performance computing workloads) and about 30 days to complete training. With the 80GB version priced at $30,000 to $40,000 per unit, this implies a computing hardware investment (GPUs plus network chips) of $400 million to $700 million. Meanwhile, a month of training consumes 1.6 billion kilowatt-hours, with energy expenditures of nearly $20 million per month.
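As a sanity check, the hardware figure above follows directly from the article's own numbers; the sketch below only reproduces that arithmetic and derives the electricity price implied by the roughly $20 million monthly energy bill (the unit price is a derived assumption, not a stated fact).

```python
# Back-of-the-envelope check using only the figures quoted in the text.
num_gpus = 16_000                               # H100 GPUs cited for LLaMA 3
gpu_price_low, gpu_price_high = 30_000, 40_000  # USD per 80GB H100

print(f"GPUs alone: ${num_gpus * gpu_price_low / 1e6:.0f}M "
      f"to ${num_gpus * gpu_price_high / 1e6:.0f}M")
# Networking chips and other infrastructure push the cited total to ~$400M-$700M.

monthly_kwh = 1.6e9        # kWh consumed per month of training, per the article
monthly_energy_usd = 20e6  # ~$20M per month, per the article
print(f"Implied electricity price: ${monthly_energy_usd / monthly_kwh:.4f}/kWh")
```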

Relieving the pressure on AI computing power was also the earliest point where Web3 intersected with AI: DePIN (decentralized physical infrastructure networks). A data website currently lists over 1,400 such projects, with representative GPU computing power sharing projects including io.net, Aethir, Akash, Render Network, and so on.

The main logic is as follows: the platform lets individuals or entities with idle GPU resources contribute their computing power in a permissionless, decentralized way. Through an online marketplace of buyers and sellers, similar to Uber or Airbnb, it raises the utilization of underused GPUs and gives end users more cost-effective computing resources. At the same time, a staking mechanism ensures that resource providers face penalties if they violate quality-control rules or go offline.
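The marketplace-plus-staking logic can be sketched in a few lines. The following is a deliberately simplified, hypothetical illustration of the matching and slashing flow; it does not reflect any specific project's contract design, and the 10% penalty rate is an assumption.

```python
# Hypothetical sketch of a stake-and-slash GPU marketplace; not real contract code.
from dataclasses import dataclass

SLASH_RATE = 0.10  # assumed penalty: 10% of stake per quality/uptime violation

@dataclass
class Provider:
    address: str
    stake: float        # tokens locked as a quality and uptime guarantee
    available: bool = True

def rent_gpu(providers):
    """Match a renter with the first available provider that has stake at risk."""
    for p in providers:
        if p.available and p.stake > 0:
            p.available = False
            return p
    return None

def report_violation(provider):
    """Slash a provider that failed quality control or dropped offline."""
    penalty = provider.stake * SLASH_RATE
    provider.stake -= penalty
    provider.available = True
    return penalty

providers = [Provider("0xA1", stake=1_000.0), Provider("0xB2", stake=500.0)]
job = rent_gpu(providers)
print("Matched provider:", job.address)
print("Penalty applied:", report_violation(job))
```

The point of the stake is economic: a provider that misbehaves loses more than it would have earned from the job, which is what lets a permissionless market enforce quality without a central operator.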

Its characteristics are:

  • Aggregating idle GPU resources: the suppliers are mainly third-party independent small and medium-sized data centers, surplus computing resources from operators such as cryptocurrency mining farms, and mining hardware built for PoS consensus mechanisms, such as Filecoin and Ethereum miners. Some projects are also working to lower the hardware entry barrier, such as exolab, which uses local devices like MacBooks, iPhones, and iPads to build a computing network for running large model inference.

  • Facing the long-tail market of AI computing power:

a. "From a technical perspective, a decentralized computing power market is more suitable for inference processes. Training relies more on the data processing capabilities brought by large-scale GPU clusters, while inference requires relatively lower GPU computing performance, such as Aethir, which focuses on low-latency rendering tasks and AI inference applications."

b. "From the demand side perspective," small to medium computing power demanders will not train their own large models separately, but will instead choose to optimize and fine-tune around a few leading large models, and these scenarios are inherently suitable for distributed idle computing power resources.

  • Decentralized Ownership: The technological significance of blockchain lies in the fact that resource owners always retain control over their resources, allowing for flexible adjustments based on demand, while also generating profits.

Data

Data is the foundation of AI. Without data, computation is as useless as floating weeds, and the relationship between data and models is like the saying "Garbage in, Garbage out"; the quantity and quality of data determine the final output quality of the model. For the training of current AI models, data determines the model's language ability, comprehension ability, and even values and human-like performance. Currently, the data demand dilemma for AI mainly focuses on the following four aspects:

  • Data hunger: AI model training relies on large amounts of data input. Public information shows that OpenAI trained GPT-4 with a parameter count reaching the trillion level.

  • Data Quality: With the integration of AI and various industries, the timeliness of data, diversity of data, professionalism of vertical data, and the incorporation of emerging data sources such as social media sentiment have raised new requirements for its quality.

  • Privacy and compliance issues: Currently, countries and enterprises are gradually recognizing the importance of high-quality datasets and are imposing restrictions on data scraping.

  • High data processing costs: data volumes are large and processing is complex. Public data shows that over 30% of AI companies' R&D costs go to basic data collection and processing.

Currently, Web3 solutions are reflected in the following four aspects:

  1. Data Collection: The availability of free real-world data for scraping is rapidly diminishing, and the expenditures of AI companies for data are increasing year by year. However, this expenditure has not been reflected back to the true contributors of the data, as platforms fully enjoy the value creation brought by the data. For instance, a certain platform has achieved a total revenue of $203 million through data licensing agreements with AI companies.

The vision of Web3 is to let the users who actually contribute data share in the value it creates, and to obtain more private, more valuable data from users at low cost through distributed networks and incentive mechanisms.

  • Grass is a decentralized data layer and network, where users can run Grass nodes to contribute idle bandwidth and relay traffic to capture real-time data from across the internet and earn token rewards.

  • Vana introduces a unique data liquidity pool (DLP) concept, where users can upload their private data (such as shopping records, browsing habits, and social media activity) to a specific DLP and flexibly choose whether to authorize its use by specific third parties;

  • In PublicAI, users can post on X with #Web3 as a category tag and @PublicAI to contribute data for collection.

  2. Data Preprocessing: In AI data processing, the collected data is usually noisy and error-ridden, so it must be cleaned and transformed into a usable format before model training, involving repetitive tasks such as standardization, filtering, and handling missing values (see the sketch after this list). This stage is one of the few manual steps in the AI industry and has given rise to the profession of data annotator. As models' data-quality requirements rise, so does the bar for data annotators, and this task is naturally suited to Web3's decentralized incentive mechanisms.
  • Currently, Grass and OpenLayer are both considering incorporating data labeling as a key segment.

  • Synesis proposed the concept of "Train2earn", emphasizing data quality, where users can earn rewards by providing labeled data, annotations, or other forms of input.

  • The data labeling project Sapien gamifies the labeling tasks and allows users to stake points to earn more points.
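As referenced in the preprocessing item above, here is a minimal pandas sketch of the three cleaning steps mentioned there: handling missing values, standardization, and filtering. The column names and quality threshold are hypothetical.

```python
# Minimal sketch of preprocessing: missing values, standardization, filtering.
import pandas as pd

raw = pd.DataFrame({
    "text":  ["  Hello World ", "SPAM!!! click here", None, "web3 meets ai"],
    "score": [0.9, 0.1, 0.7, None],   # hypothetical annotator quality score
})

# 1. Handle missing values: drop rows with no text, impute missing scores.
clean = raw.dropna(subset=["text"]).copy()
clean["score"] = clean["score"].fillna(clean["score"].mean())

# 2. Standardize: normalize whitespace and casing.
clean["text"] = clean["text"].str.strip().str.lower()

# 3. Filter: keep only samples above a (hypothetical) quality threshold.
clean = clean[clean["score"] >= 0.5]
print(clean)
```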

  3. Data Privacy and Security: It is important to clarify that data privacy and security are two different concepts. Data privacy concerns the handling of sensitive data, while data security protects data from unauthorized access, destruction, and theft. Accordingly, the advantages of Web3 privacy technology and its potential application scenarios show up in two areas: (1) training on sensitive data; (2) data collaboration, where multiple data owners can jointly participate in AI training without sharing their raw data.

The current common privacy technologies in Web3 include:

  • Trusted Execution Environments (TEE), such as Super Protocol;

  • Fully Homomorphic Encryption (FHE), for example BasedAI, Fhenix.io, or Inco Network;

  • Zero-knowledge technology (ZK), such as the Reclaim Protocol, which uses zkTLS to generate zero-knowledge proofs for HTTPS traffic, allowing users to securely import activity, reputation, and identity data from external websites without exposing sensitive information.

However, the field is still in its early stages, and most projects remain exploratory. One current dilemma is that computing costs are far too high; for example:

  • The zkML framework EZKL takes about 80 minutes to generate a proof for a 1M-parameter nanoGPT model.

  • According to data from Modulus Labs, the overhead of zkML is over 1000 times higher than pure computation.

  4. Data Storage: Once data is obtained, it needs somewhere to be stored on-chain, along with the LLMs produced from that data. With data availability (DA) as the core issue, Ethereum's throughput before the Danksharding upgrade was 0.08 MB, while training AI models and running real-time inference typically require 50 to 100 GB of data throughput per second. This gap of several orders of magnitude leaves existing on-chain solutions struggling to cope.
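To put the two figures side by side, the sketch below reads the 0.08 MB number as per-second DA throughput (an assumption about the article's unit) and compares it with the 50 to 100 GB/s requirement.

```python
# Rough order-of-magnitude comparison of the throughput figures cited above.
onchain_mb_per_s = 0.08        # pre-Danksharding DA throughput (assumed per second)
ai_needs_gb_per_s = (50, 100)  # throughput cited for training / real-time inference

for gb in ai_needs_gb_per_s:
    ratio = (gb * 1024) / onchain_mb_per_s
    print(f"{gb} GB/s is roughly {ratio:,.0f}x the on-chain throughput")
```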