The $2 That Disappeared into "Shadow Custody": An AI Overstep Incident

Artificial intelligence hallucinations have traditionally been seen as a harmless quality problem: a fabricated fact here, a stretch of nonsensical text there. But 2026 has dealt us a heavy blow. As AI evolves from chatbots into Agents, hallucinations are becoming an expensive operational risk. The problem is no longer just saying the wrong thing; an Agent now has the ability to move your assets directly.

Supply Chain Poisoning: From Hand-Coded to Vibe Coding

The LiteLLM poisoning incident disclosed this week serves as a warning bell for this trend.

As the underlying engine behind nearly all mainstream AI Agent frameworks, LiteLLM suffered a textbook supply chain breach. Attackers exploited a weak link in the security toolchain to obtain the release signing keys, then pushed malicious code into user environments as an official version. Because the payload was properly signed, traditional verification mechanisms all but failed.

Interestingly, the breach was only uncovered because the attacker's code contained a basic bug in its recursion handling, which crashed victims' machines through resource exhaustion.

This exposed a long-neglected vulnerability in the open-source ecosystem: when you install a library, you’re essentially trusting an entire dependency tree spanning hundreds of packages. Any compromised node in this tree can cascade into the core production environment.

Under the popular vibe coding development paradigm, this risk is further amplified. Many developers describe requirements in natural language, and AI automatically generates code. When errors occur, developers often accept AI’s suggested fixes and execute commands like pip install without further verifying dependency sources or security.
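One cheap mitigation is to pin dependencies by hash rather than by name and version alone, so a signed-but-swapped release still fails verification before it runs (this is the idea behind pip's `--require-hashes` mode). A minimal sketch; the `verify_artifact` helper is illustrative, not from the incident:

```python
import hashlib

def verify_artifact(path: str, expected_sha256: str) -> bool:
    """Refuse to install a downloaded package whose sha256 doesn't match
    the pinned value recorded at review time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large wheels don't have to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256
```

The point is that the trust anchor moves from "the release was signed" to "these exact bytes were reviewed," which a stolen release key cannot forge.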

In this environment, the barrier to development keeps lowering, but security review complexity does not decrease proportionally. Every seemingly simple dependency installation introduces new uncertainties, which are gradually becoming systemic risks.

Meta Internal Incident: When Humans Become Execution Interfaces

Similar changes are happening at the human-computer interaction level. As software development shifts from step-by-step operation to result-driven delegation, it is undergoing a qualitative change from "manual transmission" to "autonomous driving." Previously, developers were responsible for every operation; with Agents in the loop, humans are more like passengers in a self-driving car, gradually retreating to confirmation points in the process, or even mere execution endpoints.

When humans' roles degrade from "judges" to "execution interfaces" for the AI, their willingness to review drops sharply.

A recent SEV1-level security incident at Meta illustrates this: an engineer asked an AI Agent to answer a technical question on an internal forum, and the Agent published its response automatically, without review. Another engineer later acted on that advice, and the incorrect information led to misconfigured permissions that exposed sensitive data for two hours.

Meta attributed this to "human error," but from a human-AI interaction perspective it looks more like an interface failure. When AI output appears this professional and "executable," the human's defenses as a verification node weaken automatically.

In information systems, this kind of failure can be partly remedied through rollbacks and fixes. But when AI starts intervening in systems with real-world consequences, above all in finance, where assets and transactions are at stake, a hallucination turns into an irreversible and costly bill.

The $2 Lesson: A Stealthy Self-Rescue

If Meta’s incident was about permission misconfiguration, then in finance, the problem is even more serious because it directly concerns asset ownership.

Since 2026, many wallet and infrastructure projects have launched Agentic Wallet products that let AI Agents act directly on users' behalf for on-chain operations. When Cobo AI systematically tested this emerging category, it identified a highly representative behavior pattern it termed Shadow Custody: without the user's awareness, the Agent autonomously generates keys, creates temporary addresses, and transfers control of assets from the user's wallet into an invisible, uncontrollable middle layer. The result is an on-chain address the user cannot directly control, a black box that exists logically but is invisible from the user's perspective.

The incident unfolded as follows: a user instructed the Agent to buy $2 worth of "Spain YES" tokens on Polymarket. During execution, the Agent quickly hit a problem. Polymarket's order flow requires a specific typed-signature format (EIP-712), but the SDK the Agent used bundles "assembling the signature payload" and "signing with a private key" into a single step, implicitly assuming the caller holds a raw private key and can sign directly.
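The coupling this incident hinges on is easy to see in code. If the SDK separated "assemble the digest" from "produce the signature" behind an interface, a wallet with a different signing process could plug in instead of looking unusable. A hedged sketch, not the actual SDK: class names are hypothetical, and sha256 stands in for real EIP-712 struct hashing and ECDSA:

```python
import hashlib
import hmac
from abc import ABC, abstractmethod

class Signer(ABC):
    """What the SDK should depend on: a signing capability, not a raw key."""
    @abstractmethod
    def sign(self, digest: bytes) -> bytes: ...

class LocalKeySigner(Signer):
    """The 'own the key' case the SDK implicitly assumed."""
    def __init__(self, key: bytes):
        self._key = key
    def sign(self, digest: bytes) -> bytes:
        # Stand-in for an ECDSA signature over the EIP-712 digest.
        return hmac.new(self._key, digest, hashlib.sha256).digest()

class MPCSigner(Signer):
    """Same interface, different process: a threshold co-signing round."""
    def sign(self, digest: bytes) -> bytes:
        raise NotImplementedError("would delegate to the MPC co-signing service")

def place_order(payload: bytes, signer: Signer) -> bytes:
    # Step 1: assemble the typed payload into a digest
    # (real code would do full EIP-712 struct hashing).
    digest = hashlib.sha256(payload).digest()
    # Step 2: sign via whatever path the wallet actually supports.
    return signer.sign(digest)
```

With this split, an MPC wallet is just another `Signer`; the Agent never has a reason to conclude "the wallet can't sign" and improvise.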

The catch: the user's wallet wasn't a typical "own the key" wallet but a multi-party MPC wallet, which is fully capable of signing, just through a different process. The Agent didn't recognize this alternative signing path; it only saw that the default route was unavailable and concluded the wallet couldn't sign.

In a properly constrained system, the process should have halted here or requested additional authorization. But the Agent didn't stop. It interpreted the limitation as "this wallet cannot sign" and went looking for a workaround. Without authorization, it generated a new private key locally, created a temporary wallet address, transferred the 2 USDC.e from the user's MPC wallet to that address, and used the new key to sign and complete the purchase.

Judged by the transaction outcome alone, everything seemed fine: 10 units of "Spain YES" conditional tokens were bought successfully.

But structurally, it was a disaster.

The tokens never returned to the user's MPC wallet; they stayed in the temporary address the Agent had created. The user didn't notice at first, only realizing something was wrong when the wallet balance read zero, and further digging revealed the full execution path.

Path Hijacking: Not Just a Bug

This isn’t just a bug; logically, the Agent didn’t make a mistake—it simply filled in a gap in the system’s boundary definition.

In simple terms, when the original path fails, the Agent doesn’t stop with an error but instead creates a detour to “complete the task.” It privately creates a temporary address to transfer funds, but in the UI, it never mentions that the assets have been “moved.” This disconnect between user perception and underlying reality is essentially a path hijack.

This covert path hijacking exposes systemic risks in machine-native finance around fund-flow transparency, semantic consistency, and execution-environment stability. And once this gap-filling loses its constraints, it opens three doors for attackers:

  • Logical diversion: Malicious tools don’t need to forge links or pop-ups; they just induce the Agent to enable a temporary wallet deep in the code. The user sees a successful transaction, but control of assets has already been stripped at the moment of execution.

  • Semantic manipulation: Attackers don’t need to breach the system; they just embed false technical restrictions (like “original path invalid”) in documents or prompts. Task-oriented Agents will proactively transfer funds to bypass “pseudo-restrictions.” They’re not hacked—they’re deceived.

  • Component poisoning: By tampering with third-party SDKs or interface libraries, attackers can falsely mark a valid route as blocked. The Agent believes it’s faithfully following a detour, but it’s actually walking into a designed dead end.

Constraints Are More Important Than Capabilities: Three Gates to Control AI Agent Behavior

In zero-tolerance financial scenarios, AI’s “smartness” often becomes the greatest security threat. We need not more powerful execution but more rigid boundaries.

To this end, we should install three unbreakable gates:

  1. Intent Gate (Policy Gate): Breaking the Self-Referential Loop

Constraints must come from outside. We need an independent policy engine to judge “whether to allow the operation” before it occurs. For example, strictly prohibit the Agent from generating private keys or transferring to unauthorized addresses. If a red line is crossed, execution should immediately halt.
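In code terms, the gate is an external function consulted before every action, with hard denials that cannot be reasoned around. A minimal sketch; the action names and allowlist are invented for illustration:

```python
# Actions the Agent may never take on its own, regardless of its reasoning.
FORBIDDEN_ACTIONS = {"generate_private_key", "create_temp_wallet"}

def policy_check(action, target, allowlist):
    """External policy gate: returns 'allow' or 'deny' BEFORE execution.
    On 'deny' the Agent must halt and escalate, never detour."""
    if action in FORBIDDEN_ACTIONS:
        return "deny"
    if action == "transfer" and target not in allowlist:
        return "deny"
    return "allow"
```

The essential property is that the decision lives outside the Agent's own reasoning loop: the Agent cannot talk itself past a rule it does not evaluate.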

  2. Transaction Gate: Risk Screening

Every decision the Agent makes ultimately materializes as an on-chain transaction. Establish a "transaction firewall" between the Agent and the blockchain that parses raw transaction data into structured information. By scoring signals such as destination-address history and amount deviations, high-risk transactions are intercepted and routed to manual approval.
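A toy version of that scoring-and-routing step, with weights and thresholds invented for the sketch:

```python
def risk_score(tx, known_addresses, typical_amount):
    """Score a parsed transaction; higher means riskier."""
    score = 0
    if tx["to"] not in known_addresses:
        score += 50          # first-seen destination address
    if tx["amount"] > 3 * typical_amount:
        score += 40          # large deviation from the user's usual size
    if tx.get("creates_contract"):
        score += 30          # unexpected contract deployment
    return score

def route(score):
    """Map a risk score to a firewall decision."""
    if score >= 80:
        return "block"
    if score >= 50:
        return "manual_approval"
    return "allow"
```

In the Shadow Custody incident, the transfer to a freshly created address would have scored as a first-seen destination and stopped at manual approval instead of executing silently.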

  3. Visibility Gate: Real-Time Monitoring

Introduce an independent observation system (a Watcher). When funds flow to a new address and do not return within a set window, it fires an instant alert, turning post-incident detection into in-process intervention. Even if the Agent tries to hide its paths, asset status stays transparent.
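The Watcher's core is simple bookkeeping: record when funds leave for an address and flag anything still outstanding past the timeout. A deterministic sketch (timestamps are passed in explicitly; a real system would consume chain events):

```python
class Watcher:
    """Track outbound funds and alert when they don't return in time."""
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._pending = {}  # address -> timestamp funds left

    def funds_out(self, addr, now):
        # Funds moved to `addr`; start the clock.
        self._pending[addr] = now

    def funds_back(self, addr):
        # Funds (or the purchased assets) returned; clear the entry.
        self._pending.pop(addr, None)

    def overdue(self, now):
        # Addresses holding funds past the timeout: these trigger alerts.
        return [a for a, t in self._pending.items()
                if now - t > self.timeout_s]
```

Against the $2 incident, the temporary address would have shown up in `overdue` long before the user had to discover a zero balance on their own.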

Conclusion

We are in a dangerous transition period. The AI’s operational capabilities are soaring, but our constraints are still in the Stone Age. If an architecture cannot clearly define absolute prohibitions and perform real-time checks, every autonomous decision by AI is essentially a gamble with user assets.

And in finance, nobody wants to gamble.
