A deep dive into the importance and business potential of distributed data computing
According to IDC, the amount of data stored globally will exceed 175 ZB by 2025. This is an enormous volume, equivalent to 175 trillion 1 GB USB flash drives. Most of this data will be generated between 2020 and 2025, at an expected CAGR of 61%.
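As a rough sanity check on those figures, here is a minimal Python sketch; the 175 ZB total and 61% CAGR come from the paragraph above, while the back-calculated 2020 baseline is only what those two numbers imply, not an independent data point:

```python
# Rough sanity check of the storage figures cited above.
ZB_IN_GB = 1_000_000_000_000  # 1 zettabyte = 10^21 bytes = 10^12 gigabytes

total_zb = 175
total_gb = total_zb * ZB_IN_GB
print(f"{total_zb} ZB equals {total_gb:,} one-gigabyte drives")  # ~175 trillion

# Compounding a 61% CAGR backwards from 2025 to 2020 gives the baseline
# implied by the article's own numbers (illustrative only).
cagr = 0.61
implied_2020 = total_zb / (1 + cagr) ** 5
print(f"Implied 2020 datasphere: ~{implied_2020:.0f} ZB growing at {cagr:.0%}/yr")
```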
Today, two major challenges arise in the rapidly growing datasphere: network capacity is not growing nearly as fast as the data itself, and regulatory constraints increasingly restrict where data is allowed to move.
The combined result of lackluster network growth and regulatory constraints is that nearly 68% of organizational data sits idle. This makes it particularly important to move computing resources to where the data is stored (broadly called compute-over-data, or "data computing") rather than moving data to where the computing happens. Compute over Data (CoD) platforms such as Bacalhau are working on exactly this.
In the following sections, we briefly cover the status quo of data processing, how to build truly distributed computing, and why distribution means maximizing choice.
The status quo
Currently, there are three main ways in which organizations are addressing data processing challenges, none of which are ideal.
Using a centralized system
The most common approach is to use centralized systems for large-scale data processing. We often see organizations combining computing frameworks such as Apache Spark, Hadoop, Databricks, Kubernetes, Kafka, Ray, and others into a network of clustered systems connected to a centralized API server. However, these systems cannot effectively address data breaches or the other regulatory issues surrounding data mobility.
This is partly why organizations have incurred billions of dollars in administrative fines and penalties over data breaches.
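For context, the centralized pattern means pulling data out of wherever it lives and into the cluster before any processing happens. Below is a minimal PySpark sketch of that "move the data to the compute" pattern; the bucket path and column name are placeholders for illustration, not details from the article:

```python
# Minimal sketch of the conventional centralized pipeline: raw data is copied
# from remote storage into a central Spark cluster, then processed there.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("centralized-pipeline").getOrCreate()

# Placeholder bucket path and column name, chosen only for illustration.
events = spark.read.parquet("s3a://example-bucket/surveillance/events/")
daily_counts = events.groupBy("camera_id").count()
daily_counts.write.parquet("s3a://example-bucket/reports/daily_counts/")

spark.stop()
```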
Build it yourself
Another approach is for developers to build custom coordination systems with the awareness and robustness their organization needs. This approach is novel, but it often risks failure because it relies on a small number of people to maintain and run the system.
Do nothing
Surprisingly, institutions often simply do nothing with their data. For example, a city may collect large amounts of surveillance video every day, but because processing is so expensive, the footage can only be viewed on a local machine and is never archived or processed.
Build true distributed computing
There are two main solutions to data processing pain points.
Solution 1: Build on an open-source data computing platform
Developers can use an open-source distributed data platform for computation instead of the custom coordination systems mentioned earlier. Because the platform is open source and extensible, institutions need only build the components they require. This setup handles multi-cloud, multi-compute, and non-datacenter scenarios and can navigate complex regulatory environments. Importantly, with access to the open-source community, system maintenance no longer depends on one or two developers, which reduces the likelihood of failure.
Solution 2: Build on a distributed data protocol
With the help of advanced computing projects such as Bacalhau and Lilypad, developers can go one step further and build systems not only on the open-source data platforms mentioned in Solution 1, but also on truly distributed data protocols such as the Filecoin network.
This means institutions can use distributed protocols that know how to coordinate and describe user problems at a finer granularity, unlocking computation close to where data is generated and stored. Ideally, this transition from data centers to a distributed protocol requires only minor changes to the data scientist's workflow.
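Conceptually, the unit of work shifts from "a dataset uploaded to a cluster" to "a job description that the network routes to wherever the data already sits." The sketch below is purely illustrative Python, not Bacalhau's or Lilypad's actual API; every field and function name here is a hypothetical stand-in:

```python
# Illustrative only: a conceptual compute-over-data job description.
# This is NOT the real Bacalhau/Lilypad API; all names are hypothetical.
from dataclasses import dataclass


@dataclass
class JobSpec:
    image: str                     # container that holds the analysis code
    command: list[str]             # what to run inside the container
    input_cid: str                 # content address of the data (e.g. on Filecoin/IPFS)
    region_constraint: str | None = None  # keep compute where regulation requires
    min_verifiability: str = "none"       # e.g. "none", "tee", "zkp"


def submit(job: JobSpec) -> str:
    """Pretend to hand the job to a distributed network.

    A real protocol would match the job against providers that already store
    job.input_cid and satisfy the region/verification constraints, then
    return a job identifier.
    """
    print(f"Routing {job.command} to providers holding {job.input_cid} "
          f"(region={job.region_constraint}, verify={job.min_verifiability})")
    return "job-0001"  # placeholder identifier


job_id = submit(JobSpec(
    image="python:3.11-slim",
    command=["python", "summarize.py"],
    input_cid="bafy...example",        # placeholder content identifier
    region_constraint="eu-only",
    min_verifiability="tee",
))
```

The point of the sketch is the direction of movement: the small job description travels to the data, instead of the large dataset traveling to a central cluster.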
Distribution means maximizing choice
By deploying on a distributed protocol such as the Filecoin network, our vision is that users can access hundreds (or thousands) of machines distributed across different regions on the same network, all following the same protocol rules. This essentially opens up an ocean of options for data scientists, who can ask the network to match their jobs to the machines, regions, and guarantees that best fit their needs.
Figure: Juan's Triangle. FHE = Fully Homomorphic Encryption, MPC = Multi-Party Computation, TEE = Trusted Execution Environment, ZKP = Zero-Knowledge Proof.
Speaking of maximizing choice, we have to mention "Juan's Triangle." The term was coined by Juan Benet, founder of Protocol Labs, to explain why different use cases will, in the future, be served by different distributed computing networks.
Juan's Triangle proposes that computing networks generally have to trade off between privacy, verifiability, and performance, so the traditional one-size-fits-all approach is hard to apply to every use case. Instead, the modular nature of distributed protocols lets different distributed networks (or subnetworks) meet different user needs, whether privacy, verifiability, or performance, and users optimize for whatever matters most to them. Over time, many third-party service providers (shown in the boxes inside the triangle) will fill these gaps and make distributed computing a reality.
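As a toy illustration of that "optimize for what matters most" idea, the sketch below scores a few hypothetical subnetwork profiles against a user's priorities. The subnetwork names and scores are invented for illustration, not measurements of any real network:

```python
# Toy illustration of Juan's Triangle trade-offs: pick the subnetwork whose
# profile best matches what the user cares about. All numbers are invented.
SUBNET_PROFILES = {
    # technique: (privacy, verifiability, performance), each on a 0-1 scale
    "fhe-subnet": (0.95, 0.60, 0.10),    # fully homomorphic encryption
    "tee-subnet": (0.70, 0.70, 0.80),    # trusted execution environments
    "zkp-subnet": (0.50, 0.95, 0.40),    # zero-knowledge proofs of computation
    "plain-subnet": (0.10, 0.20, 0.95),  # no special guarantees, fastest
}


def pick_subnet(priorities: dict[str, float]) -> str:
    """Return the subnetwork with the highest weighted score for these priorities."""
    weights = (priorities.get("privacy", 0.0),
               priorities.get("verifiability", 0.0),
               priorities.get("performance", 0.0))
    return max(SUBNET_PROFILES,
               key=lambda name: sum(w * s for w, s in zip(weights, SUBNET_PROFILES[name])))


# A heavily regulated workload might weight privacy above everything else.
print(pick_subnet({"privacy": 0.7, "verifiability": 0.2, "performance": 0.1}))
```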
In summary, data processing is a complex problem that requires thinking outside the box. Replacing traditional centralized systems with open-source data computing is a good first step. Ultimately, by deploying a computing platform on distributed protocols such as the Filecoin network, computing resources can be configured freely around each user's individual needs, which is crucial in the era of big data and artificial intelligence.