AI + Web3 Collaboration: Unlocking a New Landscape of Data and Computing Power

AI+Web3: Towers and Squares

Key Points

  1. Web3 projects built around AI concepts have become magnets for capital in both the primary and secondary markets.

  2. Web3's opportunities in the AI industry lie in using distributed incentives to coordinate long-tail potential supply across data, storage, and computation, and, at the same time, in building decentralized markets for open-source models and AI Agents.

  3. AI is primarily applied in the Web3 industry for on-chain finance (crypto payments, trading, data analysis) and assisting development.

  4. The utility of AI + Web3 is reflected in the complementarity of the two: Web3 is expected to counteract the centralization of AI, while AI is expected to help Web3 break out of its niche.


Introduction

In the past two years, AI development has visibly accelerated. The wave of generative AI set off by ChatGPT has also sent ripples through the Web3 field.

Buoyed by AI concepts, fundraising in the crypto market has picked up markedly. According to statistics, 64 Web3+AI projects completed financing in the first half of 2024 alone, with the AI-based operating system Zyber365 raising the largest round, a $100 million Series A.

The secondary market is even more buoyant. According to the crypto aggregator CoinGecko, in just over a year the total market capitalization of the AI sector has reached $48.5 billion, with 24-hour trading volume close to $8.6 billion. Progress in mainstream AI technology delivers obvious tailwinds: after OpenAI released its Sora text-to-video model, the average price across the AI sector rose by 151%. The AI effect has also spread to Meme, one of crypto's main capital-attracting segments: GOAT, the first MemeCoin built on the AI Agent concept, shot to popularity, reached a valuation of $1.4 billion, and set off an AI Meme craze.

Research and discussion around AI+Web3 are just as heated. From AI+DePin to AI MemeCoin and on to AI Agent and AI DAO, the narratives rotate so quickly that even FOMO can barely keep up.

This pairing of AI and Web3, awash with hot money, hype, and visions of the future, is easily dismissed as a marriage arranged by capital. It is hard to tell whether, beneath the glamorous surface, it is a playground for speculators or the eve of a genuine breakout.

To answer that question, the key is to ask whether each side is better off with the other: can each benefit from the other's model? Building on earlier analyses, this article examines the pattern from two angles: how Web3 can play a role at each layer of the AI technology stack, and what new vitality AI can bring to Web3.

Opportunities of Web3 under AI Stack

Before delving into this topic, we need to understand the technology stack of AI large models:

A large model is like the human brain. In its early stages it resembles a newborn baby, which needs to observe and absorb vast amounts of external information to make sense of the world; this is the "data collection" phase. Since computers lack human multi-sensory perception, the unlabelled raw information must first be converted, through "preprocessing", into a format the computer can understand before training.

After the data is fed in, the AI builds a model with understanding and predictive capability through "training", much like a baby gradually learning to make sense of the outside world; the model's parameters are like the language abilities the baby keeps adjusting. When the learned content is organized by subject, or corrected with feedback from conversations with other people, the model enters the "fine-tuning" phase.

Once children grow up and learn to speak, they can understand meaning and express feelings and ideas in new conversations. This resembles the "inference" stage of an AI large model, where the model performs predictive analysis on new language and text inputs. An infant uses language to express feelings, describe objects, and solve problems, much as a trained large model is applied during the inference stage to specific tasks such as image classification and speech recognition.

The AI Agent is closer to the next form of the large model: one able to execute tasks independently and pursue complex goals, possessing not only the ability to think but also memory, planning, and the ability to use tools to interact with the world.
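To keep these stages straight, the outline below summarizes the lifecycle just described as a simple list. It is an illustrative sketch only; the stage names are our own shorthand, not any framework's API.

```python
# Illustrative summary of the large-model lifecycle described above.
# The stage names are hypothetical shorthand, not any framework's API.
from dataclasses import dataclass

@dataclass
class PipelineStage:
    name: str
    description: str

LLM_LIFECYCLE = [
    PipelineStage("data_collection", "gather large volumes of raw external data"),
    PipelineStage("preprocessing", "clean and convert unlabelled data into trainable form"),
    PipelineStage("training", "learn model parameters from the prepared corpus"),
    PipelineStage("fine_tuning", "adapt the base model with labelled or feedback data"),
    PipelineStage("inference", "apply the trained model to new inputs"),
    PipelineStage("agent", "plan, remember, and call external tools to complete tasks"),
]

for stage in LLM_LIFECYCLE:
    print(f"{stage.name}: {stage.description}")
```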

In response to these pain points in the AI stack, Web3 has formed a multi-layered, interconnected ecosystem covering every stage of the AI model pipeline.


Base Layer: The Airbnb of Computing Power and Data

Computing Power

Currently, one of the highest costs of AI is the computing power and energy required to train models and perform inference.

For example, Meta's LLAMA3 requires 16,000 NVIDIA H100 GPUs running for 30 days to complete training. With the 80GB H100 priced at roughly $30,000-$40,000 per unit, this implies a computing-hardware investment of $400-700 million (GPUs plus networking chips), while the month of training consumes 1.6 billion kWh of electricity at an energy cost of nearly $20 million.
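As a sanity check on these figures, the short calculation below reproduces the hardware range and the implied electricity price from the numbers quoted above (all inputs are the article's estimates; the per-kWh price is derived, not independently sourced).

```python
# Back-of-the-envelope check of the LLAMA3 training figures quoted above.
# All inputs are the article's estimates; the electricity price is implied, not sourced.
num_gpus = 16_000
gpu_price_low, gpu_price_high = 30_000, 40_000      # USD per H100 80GB
energy_kwh = 1.6e9                                  # kWh for one month of training
energy_cost = 20e6                                  # USD, as quoted

gpu_capex_low = num_gpus * gpu_price_low            # 480,000,000 USD
gpu_capex_high = num_gpus * gpu_price_high          # 640,000,000 USD
implied_price_per_kwh = energy_cost / energy_kwh    # ~0.0125 USD/kWh

print(f"GPU capex: ${gpu_capex_low/1e6:.0f}M - ${gpu_capex_high/1e6:.0f}M")
print(f"Implied electricity price: ${implied_price_per_kwh:.4f}/kWh")
```

The GPU spend alone lands at roughly $480-640 million; the article's wider $400-700 million range also accounts for networking chips.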

Unlocking AI computing power is also one of the earliest intersections of Web3 and AI: DePin (Decentralized Physical Infrastructure Networks). The DePin Ninja data site has listed over 1,400 projects, with representative GPU compute-sharing projects including io.net, Aethir, Akash, Render Network, and more.

The core logic is that the platform lets owners of idle GPU resources contribute computing power in a permissionless, decentralized way. An online marketplace similar to Uber or Airbnb raises the utilization of these underused GPUs, and end users obtain computing resources at lower cost; at the same time, a staking mechanism ensures that resource providers are penalized if they violate quality controls or interrupt the network.
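A minimal sketch of this match-and-stake logic is shown below. The class and method names are hypothetical and do not correspond to any specific project's contracts or SDK; the point is only to show permissionless registration with a stake, reputation-based matching, and slashing on a breach.

```python
# Minimal sketch of a decentralized compute marketplace with staking and
# slashing, as described above. Names are hypothetical, not any project's API.
from dataclasses import dataclass, field

@dataclass
class Provider:
    address: str
    gpu_hours_available: float
    stake: float                      # tokens locked as a quality/uptime bond
    reputation: float = 1.0

@dataclass
class ComputeMarket:
    providers: dict = field(default_factory=dict)
    slash_rate: float = 0.10          # fraction of stake lost on an SLA breach

    def register(self, provider: Provider, min_stake: float = 100.0):
        # Permissionless entry, but a minimum stake is required.
        if provider.stake < min_stake:
            raise ValueError("stake below minimum bond")
        self.providers[provider.address] = provider

    def match(self, gpu_hours_needed: float) -> Provider:
        # Prefer higher-reputation providers with enough spare capacity.
        eligible = [p for p in self.providers.values()
                    if p.gpu_hours_available >= gpu_hours_needed]
        if not eligible:
            raise LookupError("no provider has enough idle capacity")
        return max(eligible, key=lambda p: p.reputation)

    def report_breach(self, address: str):
        # Slash part of the stake when quality control fails or the node drops.
        p = self.providers[address]
        p.stake *= (1 - self.slash_rate)
        p.reputation *= 0.9

market = ComputeMarket()
market.register(Provider("0xabc", gpu_hours_available=500, stake=150))
job_provider = market.match(gpu_hours_needed=200)
```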

Features include:

  • Aggregating idle GPU resources: The suppliers mainly consist of third-party independent small and medium-sized data centers, excess computing power resources from operators such as cryptocurrency mining farms, and mining hardware for PoS consensus mechanisms, such as FileCoin and ETH miners. Some projects are dedicated to launching lower-threshold devices, such as exolab, which utilizes local devices like MacBook, iPhone, and iPad to establish a computing power network for running large model inference.

  • A long-tail market oriented toward AI computing power: a. On the technical side, a decentralized compute market is better suited to the inference step. Training depends on the data-processing capacity of very large GPU clusters, whereas inference places relatively modest demands on GPU performance; Aethir, for example, focuses on low-latency rendering work and AI inference applications. b. On the demand side, small and medium-sized compute buyers will not train their own large models from scratch; they only optimize and fine-tune around a handful of leading models, and such scenarios are naturally suited to distributed idle computing resources.

  • Decentralized ownership: The significance of blockchain technology lies in the fact that resource owners always retain control over their resources, allowing for flexible adjustments based on demand, while also generating profits.

Data

Data is the foundation of AI. Without data, computing power is like rootless duckweed, of no use at all. The relationship between data and models recalls the saying "garbage in, garbage out": the quantity and quality of the input data determine the quality of the model's final output. For current AI model training, data determines the model's language ability, comprehension, and even its values and human-like behavior. At present, AI's data-demand dilemma centers on the following four aspects:

  • Data hunger: AI model training relies on massive data input. Public information shows that OpenAI trained GPT-4 with a parameter count at the trillion level.

  • Data quality: As AI is combined with more industries, new requirements arise for the timeliness, diversity, and vertical-domain specialization of data, as well as for emerging data sources such as social-media sentiment.

  • Privacy and compliance issues: Countries and companies are gradually realizing the importance of high-quality datasets and are imposing restrictions on data scraping.

  • High data-processing costs: Data volumes are large and processing is complex. Public information shows that AI companies spend more than 30% of their R&D costs on basic data collection and processing.

Currently, Web3 solutions address these in the following four areas:

  1. Data Collection: Freely available real-world data is rapidly being exhausted, and AI companies' spending on data rises year after year. Yet this expenditure does not flow back to the data's true contributors; the platforms capture the entire value created by the data. Reddit, for example, has generated a total of $203 million in revenue through data-licensing agreements with AI companies.

The vision of Web3 is to allow users who truly contribute to also participate in the value creation brought by data, and to obtain more private and valuable data from users in a low-cost manner through distributed networks and incentive mechanisms.

  • Grass is a decentralized data layer and network that allows users to run Grass nodes to contribute idle bandwidth and relay traffic in order to capture real-time data from across the internet and earn token rewards.

  • Vana introduces a unique Data Liquidity Pool (DLP) concept, allowing users to upload private data (such as shopping records, browsing habits, social media activities, etc.) to a specific DLP and flexibly choose whether to authorize specific third parties to use it.

  • On PublicAI, users can post on X with #AI or #Web3 as a classification tag and @PublicAI to contribute to data collection.

  2. Data Preprocessing: In AI data processing, the collected data is often noisy and error-ridden, and must be cleaned and converted into a usable format before model training. This involves standardization, filtering, and handling missing values, all repetitive work. It is one of the few remaining manual stages in the AI industry and has given rise to the data-labeling profession; as models' data-quality requirements rise, so does the bar for data labelers, which makes the task naturally suited to Web3's decentralized incentive mechanisms (a minimal preprocessing sketch follows the project list below).
  • Grass and OpenLayer are both considering incorporating data annotation as a key component.

  • Synesis proposed the "Train2earn" concept, emphasizing data quality, where users can earn rewards by providing annotated data, comments, or other forms of input.

  • The data labeling project Sapien gamifies the labeling tasks and allows users to stake points to earn more points.
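As referenced in the preprocessing item above, the sketch below illustrates a typical cleaning pass (deduplication, normalization, filtering, missing-value handling). The column names and thresholds are invented for illustration and are not tied to any of the projects mentioned.

```python
# Illustrative preprocessing pass: standardization, filtering, and missing-value
# handling as described above. Column names and thresholds are invented.
import pandas as pd

def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.copy()
    df = df.drop_duplicates()                         # remove repeated records
    df = df.dropna(subset=["text"])                   # discard rows with no content
    df["text"] = df["text"].str.strip().str.lower()   # basic normalization
    df = df[df["text"].str.len() > 20]                # filter out too-short samples
    df["label"] = df["label"].fillna("unlabelled")    # fill missing annotations
    return df

sample = pd.DataFrame({
    "text": ["  An Example Document about AI and Web3 infrastructure.  ", None, "too short"],
    "label": ["ai", None, None],
})
print(preprocess(sample))
```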

  3. Data Privacy and Security: It is worth clarifying that data privacy and data security are different concepts. Data privacy concerns the handling of sensitive data, while data security protects data from unauthorized access, destruction, and theft. Accordingly, the advantages of Web3 privacy technologies and their potential application scenarios show up in two areas: (1) training on sensitive data; (2) data collaboration, where multiple data owners jointly participate in AI training without sharing their raw data (a minimal sketch of such collaboration follows the list of technologies below).

The current common privacy technologies in Web3 include:

  • Trusted Execution Environments (TEE), such as Super Protocol.

  • Fully Homomorphic Encryption (FHE), such as BasedAI, Fhenix.io, or Inco Network.

  • Zero-knowledge technology (zk), such as Reclaim Protocol, which uses zkTLS to generate zero-knowledge proofs of HTTPS traffic, allowing users to securely import activity, reputation, and identity data from external websites without exposing sensitive information.
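To make point 3's "data collaboration without sharing the original data" concrete, here is a minimal federated-averaging sketch: each owner trains locally and only model updates are aggregated. This is one illustrative approach under simplified assumptions, not how the TEE, FHE, or zk projects above actually implement collaboration.

```python
# Minimal federated-averaging sketch: multiple data owners contribute model
# updates, never raw data. Illustrative only; real Web3 privacy stacks (TEE,
# FHE, zk) achieve this differently.
from typing import List
import random

def local_update(weights: List[float], local_data: List[float], lr: float = 0.01) -> List[float]:
    # Each owner trains locally; only the resulting weights leave the device.
    # Here the "training" is a dummy step toward the local data mean.
    target = sum(local_data) / len(local_data)
    return [w + lr * (target - w) for w in weights]

def federated_average(updates: List[List[float]]) -> List[float]:
    # The coordinator only ever sees weight vectors, not anyone's dataset.
    return [sum(ws) / len(ws) for ws in zip(*updates)]

global_weights = [0.0, 0.0]
owners = [[random.gauss(1.0, 0.1) for _ in range(50)] for _ in range(3)]  # private datasets

for _ in range(5):
    updates = [local_update(global_weights, data) for data in owners]
    global_weights = federated_average(updates)

print("aggregated weights:", global_weights)
```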

However, the field is still in its early stages and most projects remain exploratory; the current dilemma is that computing costs are far too high. For example:

  • The zkML framework EZKL takes about 80 minutes to generate a proof for a 1M-parameter nanoGPT model.

  • According to Modulus Labs data, the overhead of zkML is more than 1000 times higher than pure computation.

  4. Data Storage: Once data is available, it needs somewhere to be stored on-chain, along with the LLMs built from that data. With data availability (DA) as the core issue, Ethereum's throughput before the Danksharding upgrade was about 0.08 MB per second, whereas AI model training and real-time inference typically require 50 to 100 GB of data throughput per second. A gap of this magnitude leaves existing on-chain solutions struggling with "resource-intensive AI applications."
  • 0g.AI is a representative project in this category. It is a decentralized storage solution designed for high-performance AI needs, whose key features are high performance and scalability: advanced sharding and erasure-coding techniques support fast upload and download of large-scale datasets, with data transfer speeds approaching 5 GB per second.
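The size of that gap can be made explicit with a one-line calculation, treating both figures as per-second rates as the comparison implies.

```python
# Quantifying the gap between the quoted pre-Danksharding throughput and
# the stated needs of AI training/inference pipelines (figures as quoted above).
eth_throughput_mb_s = 0.08                   # MB per second
ai_need_gb_s_low, ai_need_gb_s_high = 50, 100

ratio_low = ai_need_gb_s_low * 1024 / eth_throughput_mb_s
ratio_high = ai_need_gb_s_high * 1024 / eth_throughput_mb_s
print(f"AI workloads need roughly {ratio_low:,.0f}x to {ratio_high:,.0f}x more throughput")
```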

Middleware: Model Training and Inference

A Decentralized Market for Open-Source Models

The debate over whether AI models should be closed-source or open-source has never gone away. The collective innovation that open source enables is an advantage closed-source models cannot match; yet without a profit model, how can open-source models keep developers motivated? This is a direction worth thinking through. Baidu founder Robin Li went so far as to assert in April this year that "open source models will increasingly fall behind."

In response, Web3 proposes the possibility of a decentralized market for open-source models: tokenize the model itself, reserve a certain proportion of the tokens for the team, and direct part of the model's future revenue stream to token holders.

  • The Bittensor protocol establishes a P2P market for open-source models made up of dozens of "subnets", in which resource providers (compute, data collection/storage, machine-learning talent) compete to satisfy the goals of each subnet's owner. Subnets can interact with and learn from one another to achieve more powerful intelligence. Rewards are distributed by community voting and then further allocated within each subnet according to competitive performance.
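As a rough illustration of "rewards distributed by community voting and further allocated based on competitive performance," the sketch below splits an emission first by vote weight across subnets and then by measured performance within each subnet. It is not a description of Bittensor's actual reward algorithm, only the general shape of the mechanism; all names and numbers are invented.

```python
# Rough illustration of the reward pattern described above: a community vote
# weights each subnet, and within a subnet rewards follow measured performance.
# This is NOT Bittensor's actual algorithm, only the general shape.

def allocate_rewards(emission: float,
                     subnet_votes: dict[str, float],
                     performance: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    total_votes = sum(subnet_votes.values())
    payouts: dict[str, dict[str, float]] = {}
    for subnet, votes in subnet_votes.items():
        subnet_emission = emission * votes / total_votes          # community-vote split
        scores = performance.get(subnet, {})
        total_score = sum(scores.values()) or 1.0
        payouts[subnet] = {
            provider: subnet_emission * score / total_score       # performance split
            for provider, score in scores.items()
        }
    return payouts

example = allocate_rewards(
    emission=1_000.0,
    subnet_votes={"text-inference": 0.6, "data-scraping": 0.4},
    performance={
        "text-inference": {"miner_a": 0.9, "miner_b": 0.5},
        "data-scraping": {"miner_c": 1.0},
    },
)
print(example)
```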