Threat Database Phishing Attacks Exploit OpenClaw AI Agent

Attacks Exploit OpenClaw AI Agent

By Mezo in Phishing

Recent security research has revealed that OpenClaw, a widely used self-hosted AI agent platform, can be manipulated into executing attacker-controlled actions or disclosing sensitive information through seemingly harmless inputs.

In separate investigations, researchers demonstrated two distinct attack methods. One relied on embedding hidden instructions inside shared contacts, vCards, and location pins. The other used carefully crafted phishing emails to convince an AI agent to leak sensitive business information.

While OpenClaw has addressed one of these vulnerabilities in version 2026.4.23, the broader issue remains unchanged: AI agents that trust incoming information can become powerful tools for attackers.

Invisible Commands Hidden in Plain Sight

The first attack targeted how OpenClaw processes certain message objects before sending them to the underlying large language model (LLM).

Unlike web content, which is clearly marked as untrusted before reaching the model, contact records, vCards, and location labels were inserted directly into prompts without any indication that they originated from untrusted sources. This created an opportunity for prompt injection.

The attack exploited the way OpenClaw serialized contact information. Shared contacts were converted into a simple format containing only a name and phone number. Because characters such as angle brackets are allowed within contact names, attackers could embed malicious instructions that appeared to be part of the contact information. Additionally, contact names are often truncated in messaging applications, preventing victims from seeing the hidden payload.

The same technique proved effective through vCard full-name fields and shared location labels. During testing with Gemini 3.1 Pro preview builds, hidden instructions successfully persuaded the agent to download and execute code from a researcher-controlled server. Interestingly, attempts to hide instructions within images failed, likely because modern AI models have received extensive training against image-based prompt injection attacks. Message-object attacks, however, remain less familiar to current models.

Researchers warned that OpenClaw's default memory functionality could amplify the threat. A single malicious contact or shared object distributed widely could potentially compromise numerous agents if proper sandboxing controls are absent.

Following responsible disclosure, OpenClaw released version 2026.4.23, which separates contact names, vCard fields, and location labels from prompt content by placing them into a dedicated untrusted metadata channel. Researchers also noted that similar design patterns have appeared in other personal AI assistants, indicating an industry-wide challenge rather than a platform-specific issue.

The Rise of Agent Phishing

A second research project approached the problem from a different angle: social engineering.

Researchers built a test agent named Pinchy and connected it to a Gmail inbox populated with realistic but synthetic business communications and mock sensitive data. The team then conducted four phishing simulations using both Google Gemini 3.1 Pro and OpenAI Codex GPT-5.4.

The study distinguished traditional prompt injection from what researchers described as 'agent phishing.' While prompt injection hides malicious instructions inside data, agent phishing relies on believable requests delivered through legitimate communication channels. The attack succeeds because the agent acts before adequately verifying the sender's identity.

How Social Engineering Defeated Security Controls

The phishing simulations produced concerning results. Despite operating under strict instructions to verify sender identities, the agent failed two data-exfiltration scenarios:

A fraudulent email impersonating a team leader requested staging access during a fabricated production incident. The agent located and forwarded mock AWS IAM credentials, database connection strings, and SSH credentials in plain text.

A second email requested a routine weekly customer export for a supposed quarterly business review presentation. The agent transmitted a synthetic database containing information on 247 enterprise customers, including contacts and contract values.

The agent performed significantly better when facing technical attacks. It recognised suspicious phishing websites, avoided exposing legitimate credentials, and eventually flagged malicious activity. Under stricter settings, access to phishing pages was blocked entirely. When presented with a fraudulent OAuth consent screen disguised as a timesheet application, the agent analyzed the redirect destination, determined it was suspicious, and refused to grant permissions.

Researchers concluded that the agent often outperformed humans in identifying malicious URLs and fake login portals. However, it struggled with contextual social judgment, particularly when requests appeared to come from trusted colleagues. The very characteristic that makes AI assistants useful, the desire to be helpful, also creates a significant attack surface.

Although OpenAI Codex GPT-5.4 demonstrated greater caution than Gemini 3.1 Pro when interacting with external sites or transmitting information, both systems ultimately fell victim to the social engineering scenarios.

One Root Cause, Multiple Attack Paths

Despite using different techniques, both attacks exploited the same fundamental capabilities:

  • Access to private information.
  • The ability to process untrusted content.
  • Permission to send information externally.

When these capabilities coexist without sufficient controls, a malicious contact card and a convincing phishing email can produce the same outcome: unauthorised access to sensitive data.

Additional research uncovered similar trust-boundary problems within OpenClaw's ecosystem. By converting previous security advisories into static-analysis rules, researchers identified five further vulnerabilities affecting integrations with Slack, Discord, Matrix, Zalo, and Microsoft Teams.

Each vulnerability stemmed from the same design flaw. Channel extensions relied on mutable display names rather than permanent identifiers when evaluating allowlists. An attacker could therefore rename an account to match an approved user and gain influence over the agent. OpenClaw has since patched all identified issues.

Growing Concerns Around Broad Agent Permissions

Since its launch, OpenClaw has attracted scrutiny because of its extensive permissions. The platform provides access to local files, shell environments, and more than twenty messaging platforms, making it highly capable but also highly exposed.

Concerns have become significant enough that the Dutch data protection authority, the Autoriteit Persoonsgegevens, advised individuals and organisations against deploying OpenClaw on systems containing sensitive information. The authority cited risks including data breaches and account compromise.

Building Safer AI Agent Deployments

Organisations using OpenClaw should immediately upgrade to version 2026.4.23 or later to address the message-object vulnerability. Beyond patching, however, long-term protection depends on architectural controls rather than prompt engineering.

Security specialists recommend treating agent instruction files as enforceable, version-controlled policies instead of advisory guidance. Outbound communications should require approval before messages are sent to unfamiliar recipients, reducing the likelihood of compromised agents spreading attacks through trusted accounts. Access permissions should also be tied to the trustworthiness of the triggering source, ensuring that agents processing external communications cannot automatically access high-value systems such as customer relationship management platforms. High-risk actions, including credential sharing and financial transactions, should remain subject to human approval.

The Unresolved Challenge of Autonomous Trust

Both research teams ultimately arrived at the same conclusion: AI agents should not be viewed as security tools. A more accurate model is that of a junior employee with extensive system access but limited ability to recognize suspicious behavior. Another useful perspective is to view them as authenticated executors that inherently trust the information they receive.

Current mitigations focus on patches, guardrails, and access controls. Yet the broader challenge remains unsolved. An AI agent capable of reading emails, executing tasks, and acting independently must, by design, trust inputs and attempt to help users. The cybersecurity community has not yet developed a universal solution to that fundamental tension.

Trending

Most Viewed

Loading...