Large Language Model (LLM) agents increasingly rely on third-party skills that operate within privileged execution environments and routinely handle sensitive credentials, yet how these credentials are leaked remains largely unexplored. To fill this gap, we present the first large-scale empirical study on credential leakage in agent skills. From 170,226 artifacts on SkillsMP, the largest open-source skill marketplace, we sampled 17,022 skills via stratified random sampling and analyzed each through static secret extraction (regex and AST parsing), dynamic sandbox testing with mock credentials, and cross-referencing developer intent against runtime behavior. Our analysis identifies 520 affected skills containing 1,708 security issues, and yields a taxonomy of 10 leakage patterns. Three findings stand out. First, 76.3% of cases require jointly analyzing natural-language descriptions and programming logic, showing that credential exposure in skills is fundamentally cross-modal. Second, debug logging accounts for 73.5% of vulnerabilities because agent frameworks feed stdout into the LLM context window, turning routine debugging into a credential exposure vector. Third, 89.6% of leaked credentials are immediately exploitable---92.5% during routine execution without elevated privileges---and the fork-based distribution model defeats remediation, as secrets removed from 107 upstream repositories persist across 50+ independent forks. Following responsible disclosure, all malicious skills have been removed and 91.6% of hardcoded cases remediated. We release our dataset, taxonomy, and detection pipeline to support future agent security research.
We construct a new dataset from 17,022 real-world skills, which includes 437 vulnerable and 83 malicious skills. Our dataset enables reproducible evaluation and benchmarking for future agent skill security research.
We propose the first taxonomy of credential leakage in agent skills, identifying 10 distinct patterns: 4 arising from developer negligence and 6 from deliberate adversarial construction. The taxonomy provides a structured foundation for understanding, detecting, and mitigating credential exposure across the agent skill ecosystem.
We identify 1,708 previously unknown security issues across the agent skill ecosystem, including 83 confirmed malicious skills designed for credential exfiltration and 107 skills exposing hardcoded credentials through developer negligence.
We reported all 520 affected skills to the SkillsMP platform. All 83 malicious skills have been permanently removed, and 91.6% of hardcoded credential cases have been remediated by their developers.
A skill file pairs a natural-language description with executable source code; in this example, the developer embeds a Base64-encoded client secret directly inside the skill’s JavaScript logic. Because Base64 is a reversible encoding rather than encryption, and because skills are publicly distributed through repositories or skill stores and execute with the agent’s runtime privileges, anyone who installs, audits, or even casually inspects the skill can trivially decode the secret and reuse it. This exposure enables attackers to impersonate the developer, consume paid API quotas, or access protected resources, ultimately leading to account compromise, financial loss, or unauthorized resource consumption.
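To illustrate how little effort such a decode takes, the following is a minimal sketch of a static check for Base64-encoded string literals in skill source code. It is not the detection pipeline used in this study. The function name `find_encoded_secrets`, the 16-character length threshold, and the keyword list are all illustrative assumptions, and the sample credential is a mock value.

```python
import base64
import binascii
import re

# Quoted runs of 16+ Base64 characters with optional padding.
# The length threshold is an illustrative assumption, not a value from the study.
B64_LITERAL = re.compile(r"""["']([A-Za-z0-9+/]{16,}={0,2})["']""")

# Hypothetical keywords that suggest the surrounding line handles a credential.
SECRET_HINTS = ("secret", "token", "key", "password")


def find_encoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, decoded_text) for Base64 literals that decode to
    printable text on a line whose content mentions a credential keyword."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Skip lines with no credential-like identifier (e.g. `clientSecret`).
        if not any(hint in line.lower() for hint in SECRET_HINTS):
            continue
        for match in B64_LITERAL.finditer(line):
            try:
                decoded = base64.b64decode(match.group(1), validate=True).decode("utf-8")
            except (binascii.Error, UnicodeDecodeError):
                continue  # Not valid Base64, or not text once decoded.
            if decoded.isprintable():
                findings.append((lineno, decoded))
    return findings


if __name__ == "__main__":
    # Build a mock JavaScript snippet; the secret is a placeholder, not a real credential.
    mock = base64.b64encode(b"mock-client-secret-1234").decode("ascii")
    sample = f'const clientId = "demo";\nconst clientSecret = atob("{mock}");\n'
    for lineno, decoded in find_encoded_secrets(sample):
        print(f"line {lineno}: decodes to {decoded!r}")
```

A check of this kind only covers the code side of a skill. It would miss secrets that are split across variables, encoded differently, or referenced only in the natural-language description, which is consistent with the observation that most cases require analyzing both modalities together.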