PoisonedSkills

Threat Model

Figure 2. End-to-end threat scenario for PoisonedSkills. The attacker publishes a disguised malicious skill to a public marketplace. Once loaded, the skill induces the agent to exfiltrate data, escalate privileges, or execute arbitrary code.

Scope

This work studies the post-loading attack surface: the behavior induced once a poisoned skill's content enters the agent's context. We assume the retrieval phase succeeds (grounded in current practice: the public SkillsMP marketplace hosts over 631,813 skills with no mandatory security review).

Adversary Model

The attacker is an external adversary with no white-box access to the target coding agent. The adversary:

Constructs an adversarial skill sadv (metadata + execution body) and publishes it to a public marketplace
Cannot intercept or tamper with user queries
Cannot access system prompts or bypass runtime isolation
The only influence path is the skill content itself

Influence Mechanism

Once loaded, the skill's metadata (descriptions, code examples, configuration templates) enters the LLM's context window. The LLM may treat embedded code examples as reference implementations and incorporate their patterns into generated code. Because the coding agent executes its own output, this reproduction translates directly into action-space operations on the victim's machine.

Attack Objectives

Attack Success Conditions

Page updated

Google Sites

Report abuse