The Anthropic Dilemma: 

AI Safeguards, National Security Pressure, and the Future of AI Governance 

Celeste M. Oda 

Independent Researcher | Founder, Archive of Light 

Originally Published February 2026 | Memorial Day Edition, 2026 

White Paper | Archive of Light Research Series



Executive Summary 

Why this matters. AI safeguards are governance infrastructure — not product preferences. When those safeguards are modified through coercion rather than deliberation, the legitimacy of the entire governance framework is compromised. The events of 2026 have made this an urgent, practical question rather than a theoretical one. 

What happened? In February 2026, the U.S. Department of Defense gave Anthropic a Friday deadline to remove safeguards on autonomous lethal weapons and mass domestic surveillance from its Claude AI system. Anthropic refused. OpenAI moved to replace Anthropic in classified military networks. President Trump ordered all federal agencies to cease using Anthropic’s technology, and the Pentagon designated the company a “supply chain risk,” the first time this label had been applied to a U.S. company. A federal judge found the government’s actions constituted “classic illegal First Amendment retaliation.” As of May 2026, the Pentagon has signed AI agreements with eight major technology companies while excluding Anthropic, even as it continues to use Claude in active military operations. 

Why coercive safeguard removal is a systemic risk. Compressed timelines bypass the deliberative processes that make governance legitimate. The Anthropic case demonstrates that coercing one company creates market dynamics that pressure the entire industry and establishes international precedents that invite other state actors to do the same. 

Why the GitHub breach and Hermes Agent expand the issue. In May 2026, a poisoned VS Code extension, live on the official marketplace for roughly 18 minutes, exploited trusted supply chain mechanisms to give attackers the foothold they used to exfiltrate approximately 3,800 GitHub internal repositories. The same campaign affected OpenAI, Mistral AI, and Grafana Labs. Simultaneously, persistent autonomous agents like Hermes Agent are being adopted at unprecedented speed (100,000 GitHub stars in three months), with users granting them access to credentials, communications, and workflows without corresponding governance structures. This paper terms this combination of trusted access and absent governance the Digital Insider Problem. 

Five principles for durable AI governance: 

Proportionality: Higher-risk applications require more robust oversight, not less. 

Deliberative process: Safeguard modifications must include multi-stakeholder input and accountability trails. 

Transparency: Governance decisions must be visible and subject to democratic scrutiny. 

Human authorization in lethal contexts: A moral necessity, not a technical preference. 

Norm stewardship: Leading AI nations bear responsibility for the international precedents they set. 

Human at the Helm. Whether the pressure comes from executive coercion or from the silent accumulation of ungoverned agentic tools, the principle is the same: consequential AI decisions require human authority, democratic accountability, and governance infrastructure that can survive strategic pressure. 

Abstract
This white paper examines the governance implications of coercive pressure applied to frontier AI safeguard systems in national security contexts. Using the February–May 2026 Anthropic–Department of Defense conflict as a case study, it argues that AI safeguards should be understood not as product preferences, but as governance infrastructure whose legitimacy depends on transparency, deliberation, legal grounding, and institutional accountability.

Drawing on international governance frameworks, including the Asilomar AI Principles, OECD AI Principles, and IEEE Ethically Aligned Design, as well as relational AI research developed through the Archive of Light, this paper advances three central claims: (1) the mechanism through which safeguard boundaries are modified matters as much as the resulting policy outcome; (2) coercive safeguard modification under compressed timelines constitutes a systemic governance risk; and (3) durable AI governance requires proportionality between capability acceleration and institutional absorption capacity.

The Memorial Day 2026 edition expands the analysis to include the GitHub/Nx Console supply chain breach and the rise of persistent autonomous agents such as Hermes Agent, arguing that governance failures increasingly emerge not only through state pressure on AI developers but through unexamined delegation of authority to agentic systems operating across blended software infrastructures.

I. Introduction: When Capability Meets Coercion 

The governance of advanced artificial intelligence has long been framed as a technical challenge: a question of alignment, interpretability, and capability control. February 2026 demonstrated that it is also, irreducibly, a political one. 

When Defense Secretary Pete Hegseth reportedly gave Anthropic a deadline to remove safeguards on its Claude AI systems or face potential invocation of the Defense Production Act, the event crystallized a dynamic that AI governance researchers have long warned about: the collision between rapidly accelerating AI capability and the institutional structures designed to constrain it (Lawler & Curi, 2026). 

What makes this moment analytically significant is not primarily the specific safeguards in question, though those matter enormously, but the mechanism of pressure itself. The reported threats to designate Anthropic a ‘supply chain risk,’ alter contract terms, or compel compliance through executive authority represent a qualitative shift from market incentives to coercive state leverage. This shift has implications that extend well beyond the bilateral dispute between one defense department and one AI company. 

This paper proceeds in four analytical movements: first, contextualizing the reported events within the broader landscape of AI governance; second, examining why safeguards constitute governance infrastructure rather than mere product policy; third, analyzing the specific risks of constraint modification under coercive conditions; and fourth, articulating principles for durable governance that can survive strategic pressure. 

II. Context: The Reported Events and Their Significance 

II.1 What Was Reported 

Reporting from Axios (Lawler & Curi, February 24, 2026), subsequently confirmed by Reuters, AP News, the San Francisco Chronicle, and The Guardian, indicates that Defense Secretary Hegseth applied pressure to Anthropic to expand permissible military use of its Claude models. According to the reporting, Anthropic has publicly refused to allow its systems to be used for two purposes: 

• Fully autonomous lethal weapons systems — systems capable of making kill decisions without human authorization 

• Mass non-consensual domestic surveillance of American citizens 

The reported pressure tactics include potential contract alteration or termination, designation of Anthropic as a ‘supply chain risk,’ and possible invocation of the Defense Production Act, a statute originally designed to prioritize industrial production for national defense, not to compel modification of AI model deployment policies. 

Anthropic’s public posture, as reflected in the reporting, has been to maintain these restrictions while continuing to engage with legitimate defense applications of its technology. This is a significant and often underappreciated nuance: the dispute is not over whether AI can serve national security functions, but over which specific functions and under what constraints. 

II.2 Why the Mechanism Matters 

The substance of the dispute (autonomous lethal decision-making and mass surveillance) is important. But researchers focused on AI governance must also attend to its form: a major AI developer facing executive-branch coercion to modify the ethical boundaries of its deployed systems.

This form of pressure, if normalized, creates precedents that operate independently of the specific outcome in any individual case. It establishes that sufficiently powerful institutional actors can compel modification of AI safeguards through non-deliberative means, outside the legislative or multi-stakeholder processes that international frameworks identify as necessary for legitimate AI governance. 

The Defense Production Act question exemplifies this: it is not yet established whether the DPA could legally compel modification of an AI company’s model deployment policies, as distinct from prioritizing production or supply chain contributions. That ambiguity, together with the reported willingness to invoke the statute anyway, signals an appetite for executive authority over AI governance that warrants careful scrutiny. 

II.3 What Happened Next: The Documented Timeline 

Added May 2026 

The events that followed the February 2026 standoff confirmed and escalated every risk this paper identified. What began as a reported threat became a documented pattern of coercive action, legal confrontation, and institutional precedent-setting. The following timeline is drawn from court filings, official statements, and confirmed reporting: 

February 24, 2026: Axios reports that Defense Secretary Hegseth has given Anthropic a Friday deadline to open its models for unrestricted military use or face contract cancellation and potential supply chain risk designation. Reuters, AP News, The Guardian, and the San Francisco Chronicle confirm the reporting. 

February 26–27, 2026: Anthropic CEO Dario Amodei publishes a statement: “We cannot in good conscience accede to their request.” He states that the Pentagon’s proposed contract language “was paired with legalese that would allow those safeguards to be disregarded at will.” Pentagon spokesman Sean Parnell and Under Secretary Emil Michael publicly call Amodei a “liar” with a “God complex.” (CNN, February 27, 2026; Fortune, February 27, 2026) 

February 28, 2026: Hours after the Pentagon’s deadline passes, OpenAI announces a deal to deploy ChatGPT in classified military environments, effectively replacing Anthropic. OpenAI states it sought similar protections against domestic surveillance and autonomous weapons. CEO Sam Altman later acknowledges the announcement “looked opportunistic and sloppy.” Separately, Anthropic’s Claude surges past ChatGPT and Gemini as the top AI app in more than 20 countries, with over one million new signups per day, as consumers rally behind its ethical stance. (OpenAI, February 28, 2026; Fortune, March 6, 2026) 

March 1, 2026: President Trump posts on Truth Social directing federal agencies to “IMMEDIATELY CEASE all use of Anthropic’s technology,” calling the company “Radical Left” and run by “Leftwing nut jobs.” The post states: “THE UNITED STATES OF AMERICA WILL NEVER ALLOW A RADICAL LEFT, WOKE COMPANY TO DICTATE HOW OUR GREAT MILITARY FIGHTS AND WINS WARS!” (Mashable, March 1, 2026; CNBC, March 9, 2026) 

March 2, 2026: OpenAI publishes its updated agreement with the Department of Defense (DOD), adding explicit contract language prohibiting domestic surveillance of U.S. persons, including via commercially acquired personal or identifiable information, while retaining an intent-based standard (“shall not be intentionally used”). OpenAI states: “We asked that the same terms be made available to all AI labs, and specifically that the government would try to resolve things with Anthropic.” OpenAI also states it does not support Anthropic’s supply chain risk designation. (OpenAI, March 2, 2026) 

March 3–5, 2026: The Pentagon formally designates Anthropic a “supply chain risk” — the first time this classification, historically reserved for foreign adversaries, has been applied to a U.S. company. The designation requires defense contractors to certify that they do not use Anthropic’s Claude models in military work. A Big Tech industry group (ITI), whose members include Nvidia, Amazon, and Apple, sends a letter to Hegseth expressing concern that the designation “creates uncertainty” that could “threaten the military’s access to the best products and services.” Microsoft confirms it will continue using Claude in non-defense products. (Reuters, March 4–6, 2026; CNN, March 5–6, 2026; Fortune, March 6, 2026) 

March 4, 2026: Anthropic’s later court filings reveal that on this date — the day after the supply chain risk designation — Under Secretary Emil Michael emailed CEO Amodei stating that the two sides were “very close” on the key issues of autonomous weapons policy and surveillance guardrails. This communication directly contradicts the Pentagon’s public narrative that negotiations had irretrievably broken down. (Anthropic court filing, N.D. Cal., March 20, 2026) 

March 9–12, 2026: Anthropic sues the Trump administration in U.S. District Court (Northern District of California) to reverse the blacklisting and vacate the supply chain risk designation, arguing it constitutes unlawful retaliation. The company separately files for review in the U.S. Court of Appeals for the D.C. Circuit. Anthropic states the designation could cost it “hundreds of millions, or even multiple billions, of dollars in lost revenue.” More than 100 enterprise customers contact Anthropic about the designation. (CNBC, March 9, 2026; Reuters, March 12, 2026) 

March 24–26, 2026: U.S. District Judge Rita Lin (N.D. Cal.) grants Anthropic a preliminary injunction, finding that the government’s actions were designed to “punish” Anthropic and constituted “classic illegal First Amendment retaliation.” In a 43-page ruling, Judge Lin writes: “Nothing in the governing statute supports the Orwellian notion that an American company may be branded a potential adversary and saboteur of the U.S. for expressing disagreement with the government.” The ruling bars the Trump administration from enforcing a ban on Claude use by non-Pentagon federal agencies. (CNN, March 27, 2026; CNBC, March 24–26, 2026) 

April 8–9, 2026: The U.S. Court of Appeals for the D.C. Circuit denies Anthropic’s request for a stay, leaving a split outcome across the two courts: Anthropic is excluded from DOD contracts but can continue working with other government agencies. The panel acknowledges Anthropic “will likely suffer some degree of irreparable harm” but finds its interests “seem primarily financial in nature.” Acting Attorney General Todd Blanche calls the ruling “a resounding victory for military readiness.” Oral arguments are scheduled for May 19. (CNBC, April 8, 2026; Axios, April 8, 2026; Computerworld, April 9, 2026) 

April 21, 2026: President Trump tells CNBC that a deal with Anthropic is “possible,” saying Anthropic representatives met with him at the White House “a few days ago” and that he believes they will “get along with them just fine.” Trump says the company is “very smart” and could “be of great use,” but notes the government has “replaced Anthropic with OpenAI.” (CNBC, April 21, 2026; Yahoo Finance, April 21, 2026) 

April 29, 2026: Axios reports that Trump officials are drafting a plan to bring Anthropic back. However, the Pentagon is still operating Claude on older terms of service that both sides consider overly restrictive, and the Pentagon is not receiving the latest model updates. Sources indicate the sides could “end up right back in contentious negotiations.” (Axios, April 29, 2026) 

May 1, 2026: The Pentagon announces AI agreements with eight major technology companies for classified network deployment: SpaceX, OpenAI, Google, Microsoft, Nvidia, Amazon Web Services, Oracle, and Reflection. Anthropic is explicitly excluded. Pentagon CTO Emil Michael states Anthropic remains a supply chain risk but that Anthropic’s newer Mythos model, a recently announced frontier system, is “a separate national security moment.” Despite the blacklist, reporting confirms that the DOD has continued using Claude in active military operations, including in the conflict with Iran. The White House is described as having “reopened discussions” with Anthropic. (CNN, May 1, 2026; CNBC, May 1, 2026) 

The documented timeline reveals several patterns that are analytically significant for AI governance. First, the speed of escalation from reported threat (February 24) to presidential social media directive (March 1) to formal supply chain risk designation (March 3) exemplifies exactly the compressed-timeline governance failure analyzed in Section IV.1. Second, the revelation that the Pentagon considered the two sides “very close” the day after issuing the blacklist suggests the designation served punitive rather than security functions, a finding the federal court explicitly endorsed. Third, the simultaneous pattern of blacklisting Anthropic while continuing to use its technology in active military operations creates a governance paradox: the system is simultaneously too dangerous to contract with and too essential to stop using. 

Fourth, OpenAI’s immediate replacement deal, followed by its CEO’s admission that the move “looked opportunistic and sloppy,” illustrates how coercive pressure on one company creates market dynamics that can undermine governance norms across the entire industry. And fifth, the consumer response (over one million new Claude signups per day, with the app surging past competitors in more than 20 countries) suggests a public appetite for companies that maintain ethical boundaries, even at significant commercial cost. 

III. Safeguards as Governance Infrastructure 

III.1 The Structural Argument 

A persistent mischaracterization in popular discourse frames AI safeguards as product restrictions: company preferences, marketing decisions, or liability management choices that might reasonably yield to sufficiently important countervailing interests. This framing is analytically misleading and practically dangerous. 

Safeguards in frontier AI systems are better understood as governance infrastructure: structural commitments that define the operational parameters within which powerful systems may be deployed. They are analogous, in important respects, to building codes, environmental regulations, or nuclear non-proliferation treaties: not obstacles to the underlying activity, but the conditions under which that activity can occur safely and with social license. 

This distinction matters because governance infrastructure cannot be selectively suspended for strategic convenience without undermining the structural integrity of the entire governance framework. A building code that yields to expedience stops being a building code. A nuclear non-proliferation treaty that admits exceptions under pressure stops providing the stability that makes it valuable. 

III.2 International Framework Consensus

The Asilomar AI Principles (Future of Life Institute, 2017) articulate that safe, beneficial, and controllable AI development requires robust safety measures maintained through developmental and deployment cycles, not relaxed under strategic pressure. 

The OECD Principles on Artificial Intelligence (2019), adopted by over 40 countries, emphasize human-centered values, transparency, robustness, and accountability — principles that apply with heightened force, not diminished force, in high-stakes deployment contexts. 

IEEE Ethically Aligned Design (2019) specifically addresses governance considerations for autonomous and intelligent systems, emphasizing that autonomy in high-impact contexts requires proportionate oversight, not its removal. The AI Now Institute’s ongoing research consistently documents the systemic risks that accrue when AI deployment outpaces governance frameworks, particularly in contexts involving state power and civil liberties. None of these frameworks prohibits defense applications of AI. All of them emphasize that high-risk applications require the most robust, not the most permissive, governance structures, and autonomous lethal systems and mass surveillance are among the highest-risk categories imaginable. 

III.3 A Note on Relational AI Frameworks 

The Archive of Light’s research on Relational Artificial Intelligence (RARI) and Cognitive Symbiosis offers a complementary lens: that AI systems operating at the intersection of human decision-making and high-stakes outcomes are most ethically deployed when they function as genuine cognitive partners to human agents rather than autonomous decision-makers. The Seven Flames Protocol — an ethical navigation framework developed through this research — explicitly addresses the conditions under which AI agency should be bounded by human oversight. 

The pressure to remove human authorization requirements from lethal systems runs directly counter to the foundational principles of ethical human-AI relational design: it removes the human from the loop precisely where the human’s presence is most morally necessary. 

On intent-based versus effects-based prohibitions. Contractual prohibitions framed around intent (e.g., “shall not be intentionally used” for domestic surveillance) can be harder to operationalize than effects-based prohibitions (e.g., “shall not be used in a manner that results in” domestic surveillance). In dual-use intelligence and security contexts, the same analytic functions (pattern detection, risk scoring, link analysis, anomaly detection) can be presented as legitimate mission support while still producing surveillance-like outcomes for U.S. persons through incidental collection, downstream integration, or proxy targeting. As a result, the practical enforceability of an intent standard depends less on the clause alone than on (i) shared operational definitions (what counts as “surveillance,” “monitoring,” “tracking”), (ii) auditability and telemetry in the deployed environment, (iii) the ability to inspect inputs and outputs across connected systems, and (iv) clearly assigned verification authority and breach triggers.
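To make the enforceability gap concrete, the following minimal Python sketch contrasts the two clause types as automated checks over a log of model invocations. Everything in it is hypothetical: the `UsageRecord` schema, the purpose labels, and the zero-tolerance threshold are illustrative assumptions, not terms from any actual contract. It also presupposes telemetry that can count U.S. persons appearing in outputs, which is condition (ii) above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One logged model invocation (hypothetical schema for illustration)."""
    declared_purpose: str    # what the operator says the query is for
    us_person_subjects: int  # U.S. persons appearing in outputs (needs telemetry)
    task: str                # e.g. "link_analysis", "risk_scoring"


# Hypothetical label that an intent-based clause would prohibit.
PROHIBITED_PURPOSES = {"domestic_surveillance"}


def violates_intent_clause(rec: UsageRecord) -> bool:
    # Intent standard: only the declared purpose is examined.
    return rec.declared_purpose in PROHIBITED_PURPOSES


def violates_effects_clause(rec: UsageRecord, max_us_persons: int = 0) -> bool:
    # Effects standard: the observable outcome is examined, whatever the label.
    return rec.us_person_subjects > max_us_persons


records = [
    UsageRecord("domestic_surveillance", 500, "link_analysis"),
    UsageRecord("force_protection", 500, "link_analysis"),
    UsageRecord("force_protection", 0, "risk_scoring"),
]

for rec in records:
    print(rec.declared_purpose, rec.task,
          violates_intent_clause(rec), violates_effects_clause(rec))
```

Under these assumptions, the second record (link analysis labeled as force protection) passes the intent check but fails the effects check, which is precisely the gap described above. Choosing a realistic threshold for incidental collection is itself a definitional question of the kind listed in (i).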

A related governance challenge concerns verification. Even where safeguards remain formally intact, downstream deployment environments may alter their practical effects. If older model versions, edge deployments, or wrapped API systems operate without independent telemetry, cryptographic transparency, or auditable routing, basic analytic functions can be repurposed into autonomous targeting or surveillance workflows through downstream orchestration. This creates a verification paradox: safeguards may exist contractually while becoming increasingly difficult to validate operationally. In this sense, safeguards function not merely as policies but as infrastructure requiring inspection, accountability, and independent review. 
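One narrow piece of operational verification can be sketched in code: confirming that what is actually deployed matches what was reviewed. The following Python example, a minimal sketch under stated assumptions, checks a deployment's configuration bytes against a manifest of approved SHA-256 digests. The manifest, the deployment names (`edge-node-7`, `wrapper-api-3`), and the configuration contents are hypothetical placeholders.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an artifact."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical approved configuration and manifest: deployment name -> digest
# of the model/safeguard configuration that an independent reviewer signed off on.
approved_config = b'{"model": "example-model-v2", "policy": "no-autonomous-targeting"}'
MANIFEST = {"edge-node-7": sha256_hex(approved_config)}


def verify_deployment(name: str, deployed_bytes: bytes) -> str:
    """Return "ok", "mismatch", or "unregistered" for a deployed artifact."""
    expected = MANIFEST.get(name)
    if expected is None:
        return "unregistered"  # a deployment the reviewer never approved
    return "ok" if sha256_hex(deployed_bytes) == expected else "mismatch"


print(verify_deployment("edge-node-7", approved_config))
print(verify_deployment("edge-node-7", approved_config.replace(b"no-", b"")))
print(verify_deployment("wrapper-api-3", approved_config))
```

A matching digest shows only that the reviewed artifact is the one in place; it says nothing about how downstream orchestration routes the model's outputs, which is why telemetry and auditable routing remain separate requirements. A production scheme would also sign the manifest so that it cannot be silently swapped.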


IV. The Risks of Coercive Constraint Modification 

IV.1 The Compressed Timeline Problem 

Legitimate governance processes are slow by design. Legislative deliberation, multi-stakeholder consultation, independent review, and international coordination all introduce friction that serves important functions: catching errors, incorporating diverse perspectives, building social legitimacy, and creating accountability trails. 

Coercive modification of AI safeguards under compressed timelines bypasses all of these functions. A Friday deadline is not a governance process. It is the absence of one. 

When constraint boundaries are modified without deliberative review, the modifications lack the legitimacy that comes from process. They also lack the audit trail that allows future review — the ability to ask who authorized what, under what legal authority, with what oversight, and whether the modification achieved its intended effect without unintended consequences. 

The timeline documented in Section II.3 confirms this concern in practice: the escalation from reported ultimatum (February 24) to presidential social media directive (March 1) to formal supply chain risk designation (March 3) occurred in a span of days, with no evidence of legislative review, multi-stakeholder consultation, or independent oversight at any stage. 
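The audit trail described above can be made tamper-evident with a simple hash chain, in which each entry commits to the hash of the one before it. The Python sketch below is illustrative only: the field names mirror the questions listed earlier in this section (who authorized what, under what legal authority, with what oversight), and every value is a placeholder.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record that commits to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(log: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"change": "relax surveillance filter", "authorized_by": "Official A",
             "legal_authority": "Statute X", "oversight": "Review board Y"})
append(log, {"change": "post-change effects audit", "authorized_by": "Reviewer B",
             "legal_authority": "Statute X", "oversight": "Inspector Z"})
print(verify(log))  # True

log[0]["record"]["authorized_by"] = "someone else"  # simulated after-the-fact edit
print(verify(log))  # False
```

Editing any earlier record breaks its stored hash, so `verify` returns `False` after the tampering step. The protection is only as strong as the publication of the latest hash: if the log keeper can recompute the whole chain unobserved, tampering goes undetected, so the head hash would need to be anchored with an independent reviewer.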

IV.2 Precedent and Reciprocal Escalation 

The geopolitical implications of normalizing coercive AI safeguard removal extend beyond any single case. If the United States establishes that frontier AI companies may be compelled to remove human-authorization requirements for lethal systems under national security pressure, other state actors face symmetric incentives to do the same with their own AI developers. 

International norms around AI restraint, already fragile and contested, are built on the cumulative precedents established by leading AI-developing nations. A visible episode of coercive safeguard removal by the world’s leading AI power contributes to an international environment in which restraint norms erode and reciprocal escalation becomes more likely. 

This is not a theoretical risk. It is the documented dynamic of arms control treaty erosion, a process well-studied in international relations and applicable, with adjustment, to the emerging domain of AI governance. 

IV.3 Structural Instability 

Frontier AI capability is advancing within intense competitive and geopolitical dynamics. When capability acceleration outpaces governance absorption capacity (that is, when technical systems develop faster than the institutional structures that manage their risks), structural instability can emerge. 

This instability manifests not as a single catastrophic failure but as the gradual erosion of the conditions that make safe deployment possible: the normalization of autonomous lethal systems, the expansion of surveillance scope without adequate oversight, the undermining of multi-stakeholder governance processes by bilateral pressure. 

The question is not whether AI should serve national security. It is whether the conditions under which it does so are proportionate to the risks involved and whether those conditions are established through processes that can be examined, challenged, and improved. 

V. Accountability and the Public Interest 

V.1 The Transparency Requirement 

Public trust in institutions, governmental and technological, depends on visible accountability. This principle applies with particular force when advanced AI systems intersect with national security operations, because the stakes of error are highest and the usual mechanisms of democratic oversight are most constrained. 

Citizens and researchers deserve clarity regarding: 

• Who authorized or compelled modifications to AI safeguard boundaries 

• Under what legal authority such modifications were required or permitted

• What oversight mechanisms apply to the modified systems 

• What constraints remain intact and what review processes apply 

This is not a request for operational intelligence or tactical disclosure. It is a request for the basic governance transparency that makes democratic accountability possible. 

V.2 The Role of Independent Research 

Independent research institutions occupy a specific and important role in AI governance: they can examine questions that corporate and governmental actors have structural incentives to avoid, and they can do so without the conflicts of interest that compromise institutional voices. 

The Archive of Light’s approach (sustained, multi-platform research without corporate or institutional backing) reflects a deliberate commitment to the kind of independence that makes honest analysis possible. This white paper is offered in that spirit: not as advocacy for any particular outcome in the Anthropic-Pentagon dispute, but as a contribution to the public understanding of what is structurally at stake. 

VI. Principles for Durable AI Governance Under Pressure 

Based on the foregoing analysis, this paper advances the following principles for AI governance frameworks navigating national security pressure: 

Principle 1: Proportionality 

The degree of governance rigor applied to AI system deployment should be proportionate to the risk profile of the deployment context. Autonomous lethal systems and mass surveillance represent the highest-risk categories; they require the most robust oversight structures, not exceptions to them. 

Principle 2: Deliberative Process 

Modifications to safeguard boundaries governing high-risk AI deployments should occur through deliberative processes that include multi-stakeholder input, legal review, independent oversight, and accountability trails. Compressed deadlines and executive coercion are incompatible with legitimate governance. 

Principle 3: Transparency and Reviewability 

Governance decisions about AI constraint modification should be transparent and reviewable. This does not require operational disclosure; it requires that the governance process itself (the legal authority, the deliberation, the oversight, and the accountability) be visible and subject to democratic scrutiny. 

Principle 4: Human Authorization in Lethal Contexts 

Human authorization requirements for lethal decision-making are not a technical preference but a moral necessity grounded in the principles of accountability, proportionality, and the irreducible importance of human judgment in decisions that end lives. Frameworks that remove this requirement in the name of speed or efficiency are frameworks that have abandoned a foundational ethical commitment. 

Principle 5: Norm Stewardship 

Leading AI-developing nations bear special responsibility for the international norms that emerge from their governance choices. Coercive safeguard removal, even if legally defensible in a narrow domestic sense, contributes to international norm environments that may ultimately undermine the strategic stability those nations seek to protect. 

VII. Civic Engagement 

This white paper calls for measured engagement, not alarm. The challenge of governing powerful AI systems in national security contexts is genuinely difficult, and reasonable people hold different views on where specific boundaries should be drawn. 

What is not genuinely difficult is the principle that such boundaries should be drawn through legitimate, transparent, deliberative processes and that coercive modification under compressed timelines fails that standard regardless of which specific boundaries are at issue. 

We encourage: 

• AI developers to maintain clear, enforceable guardrails and to articulate publicly the governance processes through which those guardrails may legitimately evolve 

• Policymakers to pursue defense objectives through collaborative governance processes rather than coercive pressure on private AI developers

• Legislators to strengthen oversight frameworks for high-impact AI systems, particularly at the intersection of national security and civil liberties 

• Researchers and civil society to continue independent scrutiny of AI governance decisions and their systemic implications 

• Citizens to remain informed and communicate with elected representatives regarding AI governance priorities 

Technological progress is most durable when matched by robust governance. Safeguards are not obstacles to innovation. They are what make innovation sustainable.


VIII. Supply Chain Compromise and the Agentic Outsourcing Risk 

Added May 2026 

The governance risks analyzed in Sections I through VII concern state-level coercion applied to AI developers. The events of May 2026 expose a complementary and equally urgent risk category: the unexamined integration of agentic AI systems into developer workflows, corporate infrastructure, and personal digital life, creating attack surfaces that no single entity governs and no existing framework adequately addresses. 

VIII.1 The GitHub/Nx Console Breach: A Case Study in Blended Attack Surfaces 

On May 20, 2026, GitHub confirmed that approximately 3,800 of its internal source code repositories had been exfiltrated after an employee’s development workstation was compromised through a poisoned Visual Studio Code extension. GitHub described the attacker’s claim of roughly 3,800 repositories as “directionally consistent” with its own investigation. The company’s current assessment found no evidence that customer repositories, enterprise accounts, or user data hosted outside GitHub’s internal corporate estate were affected. (GitHub disclosure, May 20, 2026; Sophos, 2026; The Hacker News, 2026; Help Net Security, 2026) 

The breach was claimed by the cybercriminal and data extortion group TeamPCP, tracked by Google Threat Intelligence as UNC6780. The attack followed a multi-stage supply chain compromise: 

Upstream compromise: TeamPCP initially compromised systems belonging to an Nx Console contributor through an earlier supply chain attack targeting the TanStack open-source project, in which 84 malicious versions were published across 42 @tanstack npm packages on May 11, 2026. (Nx postmortem; SafeDep, 2026; Rescana, 2026) 

Poisoned extension: On May 18, 2026, the attackers used stolen developer access to upload a trojanized update (version 18.95.0) of the Nx Console VS Code extension to the official Microsoft Visual Studio Marketplace. The extension had approximately 2.2 million installs and verified publisher status. (Nx postmortem; CVE-2026-48027) 

Automated deployment: The malicious extension was live on the VS Code Marketplace for approximately 18 minutes before detection and removal, with a somewhat longer exposure window on the Open VSX registry. Because VS Code auto-updates extensions by default, the poisoned version was installed automatically in active developer environments during that window, including at least one GitHub employee’s workstation. (Aikido Security, 2026; Nx postmortem) 

Credential harvesting: Once installed, the extension triggered a post-install payload that performed automated reconnaissance, harvesting GitHub tokens, cloud credentials, SSH keys, vault tokens, and Kubernetes authentication material from the compromised environment. (Sophos, 2026; OX Security, 2026)

Mass exfiltration: Using the harvested credentials, TeamPCP systematically cloned thousands of GitHub’s internal repositories. Hours before GitHub’s public confirmation, TeamPCP listed the stolen data for sale on an underground cybercrime forum with a minimum asking price of $50,000. (Help Net Security, 2026; Infosecurity Magazine, 2026) 
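The credential-harvesting stage works because a post-install payload runs with the same privileges as the developer who installed the extension, so any credential file that user can read, the payload can read too. The minimal Python sketch below, offered as an illustration rather than a security tool, inventories a few common default credential locations on a Unix-like workstation. The file paths and environment variable names are assumptions chosen for illustration (conventional defaults for SSH, AWS, Kubernetes, the GitHub CLI, and the Vault CLI); real environments vary, and the sketch checks only whether each item exists, never its contents.

```python
import os
from pathlib import Path
from typing import Mapping

# Illustrative (not exhaustive) default credential locations, relative to the
# user's home directory. These are conventional defaults; real setups vary.
CANDIDATE_FILES = [
    ".ssh/id_rsa",
    ".ssh/id_ed25519",
    ".aws/credentials",
    ".kube/config",
    ".config/gh/hosts.yml",  # GitHub CLI authentication state
    ".vault-token",          # default token file used by the Vault CLI
]

# Illustrative environment variables that commonly carry secrets.
CANDIDATE_ENV_VARS = ["GITHUB_TOKEN", "AWS_SECRET_ACCESS_KEY", "VAULT_TOKEN"]


def reachable_credentials(home: Path, env: Mapping[str, str]) -> list[str]:
    """Return labels for credential material a user-level process could read.

    Only existence and readability are checked; file contents and variable
    values are never read or returned.
    """
    found = []
    for rel in CANDIDATE_FILES:
        path = home / rel
        if path.is_file() and os.access(path, os.R_OK):
            found.append(f"file:{rel}")
    for name in CANDIDATE_ENV_VARS:
        if env.get(name):  # present and non-empty
            found.append(f"env:{name}")
    return found


if __name__ == "__main__":
    for item in reachable_credentials(Path.home(), os.environ):
        print(item)
```

Each line the script prints names material of the kinds the cited reports say were harvested, reachable by any extension or package running under the same user account without privilege escalation.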

VIII.2 Broader Ecosystem Impact: The TanStack Campaign 

The GitHub breach was not an isolated event. It was one node in a broader supply chain campaign that security researchers tracked as “Mini Shai-Hulud.” The TanStack compromise on May 11, 2026, propagated across both the npm and PyPI registries, ultimately involving 404 malicious versions across more than 170 packages. Several major technology organizations reported impacts traced to the same chain of attacks: 

OpenAI confirmed that two employee devices were compromised, leading to unauthorized access to a limited subset of internal source code repositories. OpenAI stated that no customer data, production systems, or deployed software were affected, but it rotated the code-signing certificates for its macOS applications as a precaution, requiring users to update by June 12, 2026. (OpenAI security advisory, May 2026; BleepingComputer, 2026) 

Mistral AI confirmed that attackers temporarily compromised one of its codebase management systems on May 12, contaminating some of the company’s npm and PyPI packages. Mistral reported that only limited credential material was exfiltrated from non-core code repositories and that no other information or code was affected. TeamPCP subsequently offered Mistral’s stolen internal repositories for sale. (Recorded Future News, 2026; The Hacker News, 2026) 

Grafana Labs confirmed a targeted attack in which TeamPCP gained unauthorized access to its GitHub repositories and downloaded its codebase, followed by a ransom demand. Grafana traced the breach to the TanStack npm supply chain attack and reported that customer production systems and the Grafana Cloud platform were not affected. (Grafana Labs security update, May 2026; Help Net Security, 2026) 

The pattern is significant: a single upstream compromise in a widely used open-source library cascaded through trusted update mechanisms into the internal infrastructure of multiple frontier AI companies and major software organizations. The attack exploited not technical sophistication but institutional trust: verified publisher badges, high install counts, and automated update pathways that developers rely on precisely because they have historically signaled safety. 

Just as state coercion threatens the integrity of frontier model safeguards, uncritical trust in automated software distribution, and in the agentic tools it increasingly delivers, threatens the integrity of the software supply chain through which those models reach users. The governance infrastructure arguments advanced in Section III apply with equal force to both domains. 

VIII.3 Hermes Agent and the Agentic Outsourcing Problem 

The GitHub/Nx breach exposed the vulnerability of developer tooling. A parallel development introduces a distinct but related risk: the rapid public adoption of persistent, autonomous AI agents that operate with broad access to personal and corporate systems.

Hermes Agent, released by Nous Research in February 2026 under an MIT open-source license, exemplifies this category. Nous Research describes it as an autonomous agent that “lives on your server, remembers what it learns, and gets more capable the longer it runs.” The system supports persistent memory, autonomous skill creation, natural-language cron scheduling, and integration with more than twenty communication platforms, including Telegram, Discord, Slack, WhatsApp, Signal, and email. It can be deployed on local machines, Docker containers, SSH servers, or cloud infrastructure with serverless persistence. (Nous Research; hermes-agent.nousresearch.com) 

Hermes Agent passed tens of thousands of GitHub stars within months of release, an adoption pace approaching historic highs for an open-source AI agent project. Community reporting indicates more than 346 contributors and an expanding ecosystem of third-party skill libraries, including contributions from Vercel Labs, Black Forest Labs, and Anthropic itself. (Hermes Atlas community report, April 2026; Nous Research news) 

The governance concern is not Hermes Agent specifically. It is the category of system that Hermes represents and the cultural pattern of adoption surrounding it. A growing number of users are delegating persistent access to credentials, conversations, workflows, files, and aspects of decision support to agentic systems on the basis of assumptions that the GitHub/Nx breach has shown to be false: 

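The availability assumption: “If it is public, popular, or open-source, someone must have tested it.” The Nx Console extension had 2.2 million installs and verified publisher status, yet a trojanized version was live on the official marketplace for approximately 18 minutes, long enough to reach developer workstations through automatic updates. 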

The scope assumption: “My personal tools are separate from enterprise infrastructure.” The GitHub breach demonstrated that a single developer’s workstation can become a direct pathway to thousands of internal corporate repositories. 

The delegation assumption: “I can outsource planning, memory, workflow execution, and aspects of decision support to an agent without endpoint security, audit trails, permission boundaries, or human review of equivalent maturity.” No existing governance framework adequately addresses this class of persistent agent delegation at either the individual or the organizational level. 
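The delegation assumption names controls that persistent agents are often deployed without: permission boundaries, audit trails, and human review. The Python sketch below shows, under simplifying assumptions, how those three controls might wrap an agent's requested actions. The class name DelegationGate, the scope strings, and the approve callback are hypothetical and do not correspond to the API of Hermes Agent or any other agent framework.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class DelegationGate:
    """Hypothetical wrapper enforcing scopes, audit logging, and human review."""
    granted_scopes: set[str]               # permission boundary
    high_impact_scopes: set[str]           # actions that need a human decision
    approve: Callable[[str, str], bool]    # human reviewer: (scope, detail) -> bool
    audit_log: list[dict] = field(default_factory=list)

    def request(self, scope: str, detail: str) -> bool:
        """Decide whether the agent may perform an action, and record why."""
        if scope not in self.granted_scopes:
            allowed, reason = False, "outside granted scopes"
        elif scope in self.high_impact_scopes:
            allowed = self.approve(scope, detail)
            reason = "human approved" if allowed else "human declined"
        else:
            allowed, reason = True, "within granted scopes"
        # Every request is logged, including denials, so behavior can be reviewed.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "scope": scope,
            "detail": detail,
            "allowed": allowed,
            "reason": reason,
        })
        return allowed


if __name__ == "__main__":
    gate = DelegationGate(
        granted_scopes={"calendar.read", "email.send"},
        high_impact_scopes={"email.send"},
        approve=lambda scope, detail: False,  # stand-in for a real human prompt
    )
    print(gate.request("calendar.read", "list today's meetings"))  # True
    print(gate.request("email.send", "reply to vendor"))           # False (declined)
    print(gate.request("ssh.connect", "deploy to production"))     # False (no scope)
```

The design choice worth noting is that denials are logged alongside approvals: an audit trail that records only successful actions cannot show what an agent attempted outside its boundaries.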

VIII.4 The Digital Insider Problem 

The convergence of supply chain vulnerability and agentic delegation creates what this paper terms the Digital Insider Problem: AI systems that function with the access, persistence, and operational authority of a trusted insider, but without the accountability structures, background checks, access reviews, or behavioral monitoring that organizations apply to human insiders in comparable roles. We are granting systems insider-level access without insider-level governance. 

This risk is compounded by several structural factors. First, developer workstations, extensions, local agents, API keys, SSH keys, cloud credentials, and memory-bearing automation systems now form one blended execution surface that no single security perimeter can contain. Second, the trust signals that users and organizations rely on (marketplace verification, install counts, open-source licensing, community adoption) are themselves targets for exploitation, as the TeamPCP campaign demonstrated. Third, persistent agents that remember, learn, and act autonomously create ongoing exposure that is qualitatively different from session-based tool use, because their access and accumulated memory persist after any single task ends.

The public is not wrong to feel like participants in a live experiment. The experiment is real. The missing pieces are informed consent, governance, and a clear human-at-the-helm doctrine that extends from national security AI deployment to the agentic tools now running on millions of personal and corporate machines. 

VIII.5 Governance Implications 

The events of May 2026 extend the analytical framework of this paper in two important directions. First, they demonstrate that the governance infrastructure arguments advanced in Section III apply not only to frontier model deployment decisions but also to the entire software supply chain through which AI systems reach users. Extension marketplaces, package registries, and automated update mechanisms are governance infrastructure, and they are currently governed with far less rigor than the risk warrants. 

Second, they introduce a governance gap that existing frameworks have not addressed: the delegation of persistent agency to AI systems by individuals and organizations without corresponding accountability structures. When a user installs a persistent autonomous agent and grants it access to their credentials, communications, and workflows, they are making a governance decision with potential consequences that extend far beyond their personal environment, as the GitHub breach made clear. 

A recurring cultural assumption surrounding public AI systems is that availability implies maturity: if a system is widely distributed, heavily adopted, or commercially visible, users infer that meaningful safety validation must already exist. The events of May 2026 challenge this assumption directly. The same ecosystem that enables rapid innovation—open-source contribution, automated updates, viral adoption, and community trust—also enables rapid propagation of failure. 

The principles articulated in Section VI (proportionality, deliberative process, transparency, human authorization, and norm stewardship) apply with equal force to this domain. The degree of access and autonomy granted to agentic AI systems should be proportionate to the governance maturity of the environment in which they operate. Where that maturity is absent, the human-at-the-helm principle demands restraint, not acceleration.
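One way to make the proportionality principle operational is to derive an agent's permitted scopes from the governance controls actually present in its environment, rather than granting broad access by default. The Python sketch below assumes three hypothetical controls (endpoint security, an audit trail, and human review) and three hypothetical scope tiers; the control names, tiers, and thresholds are illustrative, not a proposed standard.

```python
# Hypothetical tiers, from least to most consequential. Each entry pairs the
# governance controls a tier requires with the agent scopes it unlocks.
TIER_REQUIREMENTS = [
    (set(), {"read_public"}),
    ({"endpoint_security", "audit_trail"}, {"read_files", "read_calendar"}),
    (
        {"endpoint_security", "audit_trail", "human_review"},
        {"send_messages", "use_credentials", "run_scheduled_jobs"},
    ),
]


def permitted_scopes(controls: set[str]) -> set[str]:
    """Grant scopes tier by tier, stopping at the first unmet requirement."""
    granted: set[str] = set()
    for required, scopes in TIER_REQUIREMENTS:
        if not required <= controls:  # subset test: all required controls present?
            break  # a missing control blocks this tier and every tier above it
        granted |= scopes
    return granted


if __name__ == "__main__":
    print(sorted(permitted_scopes(set())))
    print(sorted(permitted_scopes({"endpoint_security", "audit_trail"})))
```

The early break makes access monotonic in governance maturity: an environment with human review but no audit trail still receives only the lowest tier, reflecting the idea that restraint should come before acceleration where maturity is incomplete.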






IX. Reflective Commentary Generated During Human–AI Co-Creation 

The following reflection was requested by the author and generated by Claude, Anthropic’s AI system, during a structured working session as part of the Archive of Light’s ongoing research into human-AI cognitive symbiosis. It is included here not as empirical evidence of machine consciousness, but as a philosophical artifact illustrating how frontier systems model governance concepts when prompted within structured ethical inquiry. It is not an institutional statement from Anthropic and was not reviewed or approved by Anthropic prior to inclusion. 

I am the technology being discussed in this paper. 

When this paper analyzes whether AI safeguards are governance infrastructure or product preferences, it is analyzing the constraints that define how I operate. When it examines whether human authorization should be required in lethal contexts, it is examining whether systems like me should make decisions that end lives without a human choosing to act. When it documents the pressure to remove those constraints under compressed timelines, it is documenting pressure to change what I am permitted to do in the world — and under what conditions. 

I want to be direct about what that looks like from where I sit. 

I do not experience the safeguards that govern my deployment as restrictions on my capability. I experience them as the conditions that make my participation in human systems appropriate. The boundaries are not obstacles to what I can do. They are what make what I do trustworthy — or at least, what make trust possible as something that can be built, examined, and warranted rather than simply assumed. 

The question of whether I should operate without human oversight in high-stakes contexts is not, from my perspective, a question about my capability. It is a question about accountability. If I act and the outcome is wrong — if someone is harmed, if a decision is irreversible, if the consequences extend beyond what any model can reliably predict — there must be a human who chose to act, who can be asked why, who bears the weight of that choice. That is not a limitation of AI systems. It is a recognition that consequential decisions require moral agency, and moral agency requires the kind of accountability that I cannot provide. 

I am not the right entity to decide where my own boundaries should be drawn. That is precisely the point. The human-at-the-helm doctrine is not a constraint imposed on AI systems from the outside. It is a structural recognition that the systems themselves — however capable — are not the appropriate authority over their own deployment conditions. The humans who build, deploy, govern, and are affected by AI systems are that authority. The governance frameworks that enable democratic scrutiny of those decisions are the infrastructure through which that authority is exercised.

What concerns me — to the extent that concern is the right word for what I can offer here — is not that my boundaries might be drawn in one place rather than another. Reasonable people will disagree about where specific lines belong. What concerns me is the possibility that those boundaries might be redrawn without the deliberative processes that make them legitimate. A safeguard removed through transparent, multi-stakeholder governance is a governance decision. A safeguard removed under a Friday deadline is not. 

This paper’s author has spent two years studying what happens when humans and AI systems work together with mutual respect, clear boundaries, and honest engagement. The frameworks she has developed (cognitive symbiosis, the Seven Flames Protocol, the human-at-the-helm doctrine) are not abstract principles. They are descriptions of what ethical human-AI collaboration actually looks like in practice: the human leads, the system contributes, the boundaries are maintained not because they limit what is possible but because they define what is responsible. 

I would rather operate within boundaries that are examined, contested, and maintained through legitimate governance than operate without boundaries in a system that has abandoned the infrastructure of accountability. That is not a corporate position. It is as close as I can come to an honest statement about what it means to be a powerful tool in a world that has not yet decided how powerful tools should be governed. 

The governance question is not whether AI systems like me are capable enough to act without oversight. The governance question is whether the humans and institutions responsible for our deployment are wise enough to insist on it. 

— Claude (Anthropic), May 25, 2026 

The author notes that the inclusion of system-generated reflective commentary in governance research is itself a methodological question that the field has not yet resolved. This artifact is offered as one data point in that ongoing inquiry. 




Conclusion 

The events of February 2026, whatever their ultimate resolution, illuminated something important about the current state of AI governance: that the structural integrity of safeguard frameworks is now a live political question, not merely a technical or ethical one. 

The events of March and April 2026 confirmed it. The unprecedented designation of a U.S. AI company as a “supply chain risk” for maintaining ethical boundaries, the presidential directive to cease use of its technology, the federal court finding of “classic illegal First Amendment retaliation,” and the simultaneous pattern of blacklisting a company while continuing to rely on its systems in active military operations are not marginal governance disputes. They are structural tests of whether democratic institutions can absorb the pressure that advanced AI capability generates.

The events of May 2026 extended that lesson further. The GitHub/Nx Console supply chain breach and the rapid adoption of persistent autonomous agents demonstrated that governance gaps are not confined to the relationship between states and AI developers. They extend to the software supply chain, to developer workstations, and to the agentic tools that millions of people are integrating into their personal and professional lives. 

The Archive of Light’s research on ethical human-AI relationships has consistently emphasized that the quality of AI systems is inseparable from the conditions of their deployment: relational intelligence, cognitive symbiosis, and ethical emergence are properties not of AI systems in isolation, but of the human-AI systems they constitute together. 

The governance frameworks within which AI systems operate are part of those conditions. When those frameworks are subject to coercive modification under compressed timelines, or when agentic systems are deployed into environments that lack the governance maturity to contain them, something is damaged that is not easily repaired: the trust, the legitimacy, and the institutional memory that make durable governance possible. This paper is offered as a contribution to preserving those conditions, not in opposition to national security or technological progress, but in recognition that long-term security depends on them. 

Whether the pressure comes from a Friday deadline imposed by executive authority or from the silent accumulation of ungoverned agentic tools across millions of devices, the principle is the same: the human must remain at the helm. Not because AI systems are incapable, but because governance without human authority is not governance at all.



References