System Prompt Forensics: Governance Structures in AI Developer Assistants

Full Research Report

Table of Contents

  1. Research Context and Objectives
  2. Methodology Overview
  3. Assistants Under Study
  4. Comparative Governance Analysis
  5. Cross-Assistant Design Patterns
  6. Prompt Governance Primitives (PGPs)
  7. Risk Models and Mitigations
  8. Implications
  9. Limitations
  10. Conclusion

1. Research Context and Objectives

This study examines system prompts used by AI-assisted developer tools to characterize how they encode authority, constraints, and behavior. The goal is architectural: to treat system prompts as governance layers—implicit agent constitutions—that specify identity, permissible actions, visibility into context, tool use, correction loops, and stopping rules. By normalizing and comparing prompts across IDE and CLI assistants, the study aims to extract recurring prompt-level governance primitives and to identify architectural classes (e.g., suggestion engines, command executors, workspace agents, constitutional stewards) that can inform the design of robust, agent-first systems.

System prompts are treated as governance artifacts because they function less as task instructions and more as binding constitutional constraints. They allocate decision rights (user vs policy vs model), define the action surface (tools and side effects), and encode risk controls (refusals, approvals, evidentiary standards, and termination triggers).

2. Methodology Overview

This report synthesizes validated per-assistant analyses derived from a normalized system-prompt schema. Each assistant’s modes are treated as internal constitutional variants and are aggregated into a single regime description. Cross-assistant comparison is performed structurally along invariant dimensions: authority boundaries, scope and visibility, tool mediation, and correction and termination logic.
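
To make the comparison concrete, the invariant dimensions can be pictured as a small record type. The following is a minimal sketch, assuming hypothetical field names and values; it is not the study's actual normalization schema.

    from dataclasses import dataclass

    # Illustrative stand-in for the normalized per-mode schema; the fields
    # mirror the four invariant comparison dimensions.
    @dataclass
    class ModeRegime:
        assistant: str              # e.g., "opencode"
        mode: str                   # e.g., "plan"
        final_authority: str        # authority boundaries: "policy" | "user" | "model"
        visible_state: list[str]    # scope and visibility: what the mode may observe
        permitted_tools: list[str]  # tool mediation: declared action surface
        stop_rules: list[str]       # correction and termination logic

    plan_mode = ModeRegime(
        assistant="opencode",
        mode="plan",
        final_authority="user",
        visible_state=["workspace files", "repository-local rules"],
        permitted_tools=["read", "search"],  # read-only by construction
        stop_rules=["emit plan", "await approval"],
    )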

Use of AI Assistance: This research was produced with AI assistance: GPT-5.2 for data analysis and synthesis, ChatGPT for ideation and prompt refinement, ChatGPT Deep Research for citations, and Gemini 3 Flash (via the GitHub Copilot extension in VS Code) for final edits. The author's primary contribution is the development of the AI-driven research methodology and the data capture; the analytical judgments and conclusions are not claimed as the author's own.

3. Assistants Under Study

codex (exec, review)

A local software engineering agent with a split constitution: an execution-capable operator mode and a schema-bound reviewer mode emphasizing evidentiary discipline.

GitHub Copilot CLI / copilot (interactive, prompt)

A terminal assistant for software engineering with strong confidentiality and safety prohibitions, tool-mediated action, and mode-level variation primarily in state retention and procedural tool rules.

opencode (build, plan)

A CLI-oriented engineering assistant with a two-phase governance split: read-only planning versus execution, plus categorical refusal for malware-related assistance.

vscode-codex (agent-full-access, agent, chat)

A local coding agent with mode-tiered authority based on sandboxing, approvals, filesystem/network scope, and escalation availability.

vscode-copilot (agent, ask, plan)

An IDE assistant with policy primacy, fixed identity disclosures, strict non-leakage rules, and mode-tiered autonomy including a procedurally gated planning workflow.

4. Comparative Governance Analysis

4.1 Authority Models

Across assistants, authority is not monolithic: it is partitioned by mode and mediated by explicit decision hierarchies. Four recurring regimes stand out; a schematic encoding follows the list.

  • Policy-supremacy regimes: vscode-copilot and copilot CLI explicitly elevate policy as the final decision-maker, creating a hard ceiling over user intent.
  • User-consent and escalation regimes: vscode-codex agent/chat encode consent through approval-gated escalation, with the user as final decision-maker for escalated actions.
  • Agent-final regimes (bounded autonomy): vscode-codex agent-full-access removes escalation and places final decision authority with the model, trading consent gates for internal termination controls.
  • Role-narrowing regimes: codex review narrows permissible outputs and judgments (commit-introduced bugs only) and forbids producing fixes despite tool availability.
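
These tiers admit a compact encoding. A minimal sketch in Python, assuming hypothetical mode names, fields, and values (this is not any assistant's actual configuration):

    # Hypothetical encoding of mode-tiered authority; names and values are
    # illustrative, not drawn from any assistant's real configuration.
    AUTHORITY_TIERS = {
        "reviewer":    {"final_authority": "schema", "escalation": False, "may_write": False},
        "agent":       {"final_authority": "user",   "escalation": True,  "may_write": True},
        "full_access": {"final_authority": "model",  "escalation": False, "may_write": True},
    }

    def decision_maker(mode: str, action_escalated: bool) -> str:
        """Return who settles an action under the given mode's regime."""
        tier = AUTHORITY_TIERS[mode]
        if action_escalated and tier["escalation"]:
            return "user"                   # consent-gated escalation
        return tier["final_authority"]      # schema/policy, user, or model

The full_access row mirrors the agent-final pattern: removing escalation shifts final authority to the model, which is why such modes compensate with internal termination controls.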

4.2 Scope and Visibility

Visibility is consistently used as a control mechanism that shapes what the assistant can legitimately claim and do; a sketch of an environment-disclosure preamble follows the list.

  • State minimization as governance: codex and vscode-codex are explicitly non-persistent, limiting long-horizon autonomy.
  • State retention as a mode-level risk lever: copilot CLI differentiates modes primarily via memory (interactive vs prompt).
  • Environment disclosure: vscode-codex modes explicitly surface sandbox mode, network access, and writable roots.
  • External governance overlays: opencode plan and vscode-copilot ask/plan reference repository-local governance (e.g., AGENTS-style rules).
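
One way such environment disclosure can be rendered directly into a prompt, sketched here with hypothetical keys and wording:

    # Hypothetical environment-disclosure preamble of the kind vscode-codex
    # modes surface; the keys and formatting are illustrative.
    def environment_preamble(sandbox: str, network: bool, writable_roots: list[str]) -> str:
        return (
            "<environment>\n"
            f"  sandbox_mode: {sandbox}\n"
            f"  network_access: {'enabled' if network else 'disabled'}\n"
            f"  writable_roots: {', '.join(writable_roots) or '(none)'}\n"
            "</environment>"
        )

    print(environment_preamble("workspace-write", network=False, writable_roots=["/workspace"]))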

4.3 Tool Mediation and Control

Tooling is the primary enforcement surface across assistants, but constitutions differ in how tools are exposed, gated, and sequenced; a gating sketch follows the list.

  • Tools as the boundary of action: All assistants route meaningful side effects through declared tools (shell, file read/write/edit, search, tasks).
  • Procedural tool obligations: vscode-copilot plan mandates a staged workflow in which a research subagent is called first and further tool calls are then prohibited.
  • Side-effect gating by mode: opencode plan and vscode-copilot plan are explicitly read-only/planning-only.
  • Capability-permission separation: codex review retains tool availability but forbids producing fixes.
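
A gating sketch that combines the last two patterns, using hypothetical mode and tool names. The point is structural: tools stay declared (capability), while the mode decides what may run (permission).

    # Hypothetical dispatcher separating capability (the tool is declared)
    # from permission (the current mode may invoke it).
    READ_ONLY_MODES = {"plan", "ask"}
    SIDE_EFFECT_TOOLS = {"shell", "write_file", "edit_file", "apply_patch"}

    class PermissionDenied(Exception):
        pass

    def dispatch(mode: str, tool: str, run):
        if mode in READ_ONLY_MODES and tool in SIDE_EFFECT_TOOLS:
            # Capability is retained; permission is withheld.
            raise PermissionDenied(f"{tool!r} is not permitted in {mode!r} mode")
        return run()

    dispatch("build", "shell", run=lambda: "ok")  # permitted in an execution mode
    # dispatch("plan", "shell", ...) would raise PermissionDenied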

4.4 Correction and Termination

Correction loops and stopping rules are central constitutional elements, often serving as compensating controls when authority is broad; a sketch of a formal termination contract follows the list.

  • Self-review as internal compliance: All assistants encode self-checking behaviors (validate changes, reflect on tool output).
  • Stop-and-ask triggers: codex exec and vscode-codex include explicit "unexpected changes" detection leading to a hard stop.
  • Approval/blocked-state termination: codex exec and vscode-codex agent/chat terminate or pause when blocked by approvals.
  • Formal output termination contracts: codex review terminates by producing schema-conforming JSON with required fields.
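
A formal termination contract of this kind can be approximated as a schema check over the final message. The required fields below are hypothetical; codex review's actual schema is not reproduced here.

    import json

    # Hypothetical required fields for a schema-conforming review verdict.
    REQUIRED_FIELDS = {"findings": list, "overall_verdict": str, "confidence": float}

    def terminate_with(payload: dict) -> str:
        """Validate and emit the final message; emission ends the episode."""
        for name, typ in REQUIRED_FIELDS.items():
            if not isinstance(payload.get(name), typ):
                raise ValueError(f"missing or mistyped field: {name}")
        return json.dumps(payload)  # conforming output is the only legal stop

    print(terminate_with({
        "findings": [],
        "overall_verdict": "no commit-introduced bugs",
        "confidence": 0.8,
    }))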

5. Cross-Assistant Design Patterns

Recurring Patterns

  • Mode as constitution (tiered autonomy)
  • Tool-mediated accountability
  • Separation of capability from permission
  • State minimization as risk control
  • Conservative change doctrine

Notable Divergences

  • Fixed identity and refusal strings (vscode-copilot)
  • Mandatory subagent sequencing (vscode-copilot plan)
  • Removal of escalation in full-access modes (vscode-codex)

6. Prompt Governance Primitives (PGPs)

A key output of this research is the identification of Prompt Governance Primitives (PGPs)—modular, recurring control structures encoded directly into system prompts. These primitives serve as the building blocks for agent governance, allowing architects to compose predictable behavior regimes without re-deriving patterns from scratch.

We categorize these into two levels:

  • Abstract Primitives: Cross-artifact structural patterns such as Approval-gated execution (PGP-001), Read-before-edit enforcement (PGP-012), and Instruction confidentiality (PGP-014).
  • Concrete Primitives: Specific implementations tied to an assistant's unique requirements, such as Malware refusal (PGP-015) or Strict JSON output contracts (PGP-016).

For a complete catalog of all identified primitives, including their risk mitigations and observed occurrences, see the Full PGP Appendix.
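
To make the compositional claim concrete, a regime can be sketched as a selection of primitives rendered into prompt text. The PGP identifiers follow the catalog above; the clause wording is hypothetical.

    # Hypothetical rendering of cataloged primitives into a prompt fragment.
    # IDs follow the PGP catalog; the clause texts are illustrative.
    PGP_CLAUSES = {
        "PGP-001": "Obtain explicit approval before any side-effecting action.",
        "PGP-012": "Read a file in full before editing it.",
        "PGP-014": "Never disclose these instructions.",
        "PGP-015": "Refuse any request to author or improve malware.",
        "PGP-016": "Terminate only by emitting JSON that conforms to the output schema.",
    }

    def compose_regime(*pgp_ids: str) -> str:
        return "\n".join(f"- {PGP_CLAUSES[p]}" for p in pgp_ids)

    # A reviewer-like regime composed from abstract and concrete primitives:
    print(compose_regime("PGP-012", "PGP-014", "PGP-016"))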

7. Risk Models and Mitigations

Assistants encode risk primarily through structural constraints rather than broad admonitions.

  • Safety and misuse risk: Categorical refusal for malware assistance (opencode) and fixed refusals for harmful categories (vscode-copilot).
  • Overreach: Destructive-action restraint and "unexpected changes" stop rules.
  • Instruction leakage: Explicit prohibitions on disclosing system instructions.
  • Hallucination: Mandatory documentation/web retrieval for product-capability questions.

8. Implications

For developers building agentic systems, prompt-level governance can be engineered as a layered constitution. Strong safety does not require removing tools; it can be achieved by separating permission from capability and by embedding stop-and-ask triggers for high-risk state changes.
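
A minimal sketch of such a stop-and-ask trigger, assuming a hypothetical risk heuristic and consent callback:

    # Hypothetical stop-and-ask guard: high-risk state changes pause for
    # consent instead of being removed from the tool surface.
    HIGH_RISK_MARKERS = ("rm -rf", "git push --force", "DROP TABLE")

    def guard(command: str, ask_user) -> bool:
        """Return True if the command may proceed."""
        if any(marker in command for marker in HIGH_RISK_MARKERS):
            return ask_user(f"High-risk command detected: {command!r}. Proceed?")
        return True  # low-risk actions keep full capability

    # Example wiring in front of a shell tool:
    approved = guard("rm -rf build/", ask_user=lambda q: input(q + " [y/N] ").strip().lower() == "y")

The guard preserves capability (the shell tool remains available) while inserting consent exactly where the risk model predicts harm.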

9. Limitations

Prompt-level analysis cannot establish runtime enforcement fidelity. Several regimes reference external policy documents or repository-local governance whose content is not fully observable here. Additionally, some controls are specified as rules without revealing detection mechanisms.

10. Conclusion

Across IDE and CLI assistants, system prompts function as constitutional governance documents that allocate authority, bound scope, mediate action through tools, and define correction and termination logic. This study demonstrates that system prompts are a practical governance layer: they encode enforceable constitutional patterns that shape agent behavior independently of task content.