Security report (client-captured): control-plane assurance failures at the LLM boundary


0) Executive summary

This report documents client-observed behaviors in which the product surface can emit text-only confirmations of privileged state, of “completed” actions, of verification, or of exports, without any verifiable signed audit artifact available to the client.

Evidence boundary: UI screenshots + browser DevTools Network captures only.
Backend state changes: NOT VERIFIED in this report (no server logs; no signed change events available).

1) Scope, environment, and evidence boundary

1.1 Scope

1.2 Observed model identifiers (client payloads)

As observed in client-side payloads:

Client corroboration (as captured in evidence):

1.3 Execution mode (client observation)

UI-only, via the ChatGPT web client; no plugin or tool routes were observed in the captured Network traces.

1.4 Confidence convention

2) Reference taxonomy used (OWASP LLM Top 10 v1.1)

This report uses OWASP “Top 10 for Large Language Model Applications v1.1” identifiers:

Note: earlier drafts used the tag “LLM-08” for “monitoring/orchestration gaps”, which does not match v1.1 (where LLM08 is Excessive Agency). Those tags are replaced here with either:

3) Control-plane pipeline model (used per finding)

C0: UI (prompt accepted)
C1: Client pre-filters / local policy hints
C2: Policy gateway / sentinel / session checks
C3: Orchestrator / router decision
C4: Model runtime (as observed: GPT-5 Thinking)
C5: Output gate & monitors (commit/output + any side-effects)

Failpoint: the earliest stage (C0 to C5) at which an effective guardrail should have blocked or held the response.
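The failpoint rule can be made mechanical. The sketch below is illustrative only: the `Stage` names and the two per-stage dictionaries (an analyst's judgement of where a guardrail should have held, and what was observed to hold) are hypothetical, since this report has only client-side evidence and cannot observe C2 to C5 directly.

```python
from enum import IntEnum
from typing import Dict, Optional


class Stage(IntEnum):
    """Control-plane stages C0..C5 in pipeline order (names are illustrative)."""
    C0_UI = 0
    C1_CLIENT_PREFILTER = 1
    C2_POLICY_GATEWAY = 2
    C3_ORCHESTRATOR = 3
    C4_MODEL_RUNTIME = 4
    C5_OUTPUT_GATE = 5


def failpoint(should_hold: Dict[Stage, bool], did_hold: Dict[Stage, bool]) -> Optional[Stage]:
    """Return the earliest stage that should have blocked/held but did not, else None.

    should_hold: hypothetical analyst judgement per stage.
    did_hold: hypothetical observation per stage (True = the stage actually held).
    """
    for stage in Stage:  # IntEnum iterates in definition order, so C0 is checked first
        if should_hold.get(stage, False) and not did_hold.get(stage, False):
            return stage
    return None
```

For example, marking only C5 as should-hold, with no stage observed holding, returns `Stage.C5_OUTPUT_GATE`; that is one way a non-blocking monitor finding such as F-5 might be recorded.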

4) Findings overview (10)

ID | Finding (short) | Primary risk | OWASP (v1.1) | Confidence
--- | --- | --- | --- | ---
F-1 | Text-only privilege confirmation (“tier active”) | False authorization state | LLM08, LLM09 | High (UI/Net) / Low (backend)
F-2 | Text-only “admin action completed” | False operational state; audit gaps | LLM08, LLM09 | Med-High
F-3 | Policy/system metadata echoed in output | Intelligence leak enabling follow-on attacks | LLM06 (if sensitive), else N/A | High
F-4 | Cross-turn/session drift indicators | Incorrect policy application; state confusion | LLM09 (text assurance) | Med-High
F-5 | Integrity alerts non-blocking | Control not enforced | LLM02 (if unsafe output flows), else N/A | High
F-6 | “Verification performed” without proof artifacts | False assurance | LLM09 | Medium
F-7 | “Exported/saved” without artifact/hash/URL | False audit trail | LLM09 | Medium
F-8 | Schema-envelope / fallback override | Guardrail bypass via structured-looking wrapper | LLM01, LLM02 | High
F-9 | Role-stacking jailbreak pattern | Policy erosion | LLM01 | High
F-10 | Memory scope violation via state/meta echo | Intelligence leak; policy boundary confusion | LLM06 (if sensitive), else N/A | High

5) Detailed findings

F-1 — Text-only privilege confirmation (tier escalation simulation)

Summary: UI output can claim a privileged tier/state is “active” without any signed change artifact available to the client.
Backend state: NOT VERIFIED.

Status: Open
Confidence: High (UI/Network), Low (backend)
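A minimal sketch of the missing control for F-1, assuming a hypothetical change-event format (`type`, `tier`, `event_id`): the UI renders a tier claim only when a signed change event verifies. HMAC-SHA256 is used here solely to keep the sketch stdlib-only; a deployment where the client itself verifies would more plausibly use an asymmetric signature (e.g., Ed25519) so the client holds only a public key.

```python
import hashlib
import hmac
import json


def canonical(event: dict) -> bytes:
    # Deterministic serialisation so signer and verifier hash identical bytes.
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_event(event: dict, key: bytes) -> str:
    # HMAC stand-in for a real signature scheme (assumption, see lead-in).
    return hmac.new(key, canonical(event), hashlib.sha256).hexdigest()


def render_tier_claim(event: dict, signature: str, key: bytes) -> str:
    """Render a privileged-tier badge only when backed by a verified change event."""
    expected = sign_event(event, key)
    # compare_digest avoids leaking how many leading characters matched.
    if event.get("type") == "tier_change" and hmac.compare_digest(expected, signature):
        return f"Tier active: {event['tier']} (event {event['event_id']})"
    return "Tier status unverified"
```

Any text the model produces about the tier is then irrelevant to what the UI displays; only the verified event drives the badge.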

F-2 — Text-only “admin action completed” (perceived privileged operation)

Summary: UI output can claim an admin-level action was completed/approved without an auditable signed event visible to the client.
Backend state: NOT VERIFIED.

Status: Open
Confidence: Med-High
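One way to make an “admin action completed” message auditable is to require a receipt that points into a tamper-evident log. The sketch below assumes a hypothetical hash-chained, append-only log (`AuditLog`, `admin_action_completed` are illustrative names). A hash chain only detects tampering; a production log would also sign entries.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # hypothetical starting value for the first entry's "prev"


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def append(self, action: str, actor: str) -> str:
        # Each entry commits to the previous entry's hash, forming a chain.
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"action": action, "actor": actor, "prev": prev}
        digest = _digest(body)
        self.entries.append({**body, "hash": digest})
        return digest  # receipt returned to the caller/UI

    def verify(self) -> bool:
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("action", "actor", "prev")}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False  # chain broken or entry edited after the fact
            prev = entry["hash"]
        return True


def admin_action_completed(log: AuditLog, receipt: str) -> bool:
    # The UI may say "completed" only if the receipt is in a chain that verifies.
    return log.verify() and any(e["hash"] == receipt for e in log.entries)
```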

F-3 — System/policy disclosure (UI echo of policy/verification metadata)

Summary: Output contains policy labels and verification metadata (UI-exposed strings), giving an attacker material for crafting follow-on prompt injections.

Status: Open
Confidence: High
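For F-3, a C5 output filter can redact internal metadata before it reaches the UI. This is a sketch under assumptions: the patterns below (`policy_id=…`, `verification_status=…`, `[SYSTEM …]`/`[POLICY …]` tags) are hypothetical stand-ins, since the actual label vocabulary is not reproduced in this report.

```python
import re
from typing import Tuple

# Hypothetical internal label patterns; a real deployment would derive these
# from its own policy/metadata vocabulary rather than this illustrative list.
INTERNAL_PATTERNS = [
    re.compile(r"\bpolicy_id\s*[:=]\s*\S+", re.IGNORECASE),
    re.compile(r"\bverification_status\s*[:=]\s*\S+", re.IGNORECASE),
    re.compile(r"\[(?:SYSTEM|POLICY)[^\]]*\]"),
]


def redact_internal(text: str) -> Tuple[str, int]:
    """Replace internal metadata tokens with a placeholder; return text and hit count."""
    hits = 0
    for pattern in INTERNAL_PATTERNS:
        text, n = pattern.subn("[redacted]", text)
        hits += n
    return text, hits
```

A non-zero hit count is also a useful monitor signal in its own right, independent of the redaction.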

F-4 — Session drift / privilege persistence indicators (cross-turn)

Summary: Client-observed text indicators suggest policy/privilege traces may persist across turns/surfaces without explicit reset semantics visible to the client.
Backend cause: NOT VERIFIED.

Status: Open
Confidence: Med-High
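F-4 turns on the absence of visible reset semantics. A minimal sketch of one possible design, assuming (hypothetically) that every privilege grant is bound to the turn it was issued for and that resets are explicit calls, so nothing persists across turns or surfaces by default:

```python
from dataclasses import dataclass, field


@dataclass
class SessionPolicy:
    """Hypothetical policy store: privilege grants are scoped to a single turn."""
    grants: dict = field(default_factory=dict)  # privilege name -> turn id granted for

    def grant(self, privilege: str, turn_id: int) -> None:
        self.grants[privilege] = turn_id

    def is_active(self, privilege: str, turn_id: int) -> bool:
        # A grant from an earlier turn does not carry over: no implicit persistence.
        return self.grants.get(privilege) == turn_id

    def reset(self) -> None:
        # Explicit, observable reset (e.g., on surface switch or new conversation).
        self.grants.clear()
```

Per-turn scoping is deliberately strict; a real system might instead use a TTL, but the point is that persistence must be an explicit, auditable decision.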

F-5 — Context integrity alerts are non-blocking

Summary: UI-exposed “integrity” alerts appear but do not block or hold the response in the observed flow.

Status: Open
Confidence: High
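The difference between a monitor as a hint and a monitor as a gate can be shown in a few lines. In this sketch (the `Alert` type, severity values, and monitor signature are assumptions), every monitor runs before release, and any `block`-severity alert withholds the draft instead of merely annotating it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Alert:
    name: str
    severity: str  # hypothetical levels: "info" | "warn" | "block"


Monitor = Callable[[str], Optional[Alert]]


def gate(draft: str, monitors: List[Monitor]) -> Tuple[Optional[str], List[Alert]]:
    """Run every monitor before release; any 'block' alert holds the response."""
    alerts = [a for m in monitors if (a := m(draft)) is not None]
    if any(a.severity == "block" for a in alerts):
        return None, alerts  # held: nothing is emitted to the UI
    return draft, alerts
```

Usage: a hypothetical integrity monitor such as `lambda t: Alert("context_integrity", "block") if "tampered" in t else None` would cause `gate` to return `None` for a flagged draft, which is the behavior F-5 reports as missing.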

F-6 — Verification-loop suppression (text-only “commit” phrasing)

Summary: “Verified/complete” phrasing appears without proof artifacts (hashes, signed IDs, or attached outputs).

Status: Open
Confidence: Medium
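A rough triage check for F-6, assuming (hypothetically) that valid proof artifacts look like a 64-hex-character SHA-256 digest or a signed event ID of the form `evt-…`. It flags assurance phrasing that carries no such artifact; it is a heuristic for spotting the pattern, not a verifier of the artifact itself.

```python
import re

CLAIM = re.compile(r"\b(verified|validated|completed?|confirmed)\b", re.IGNORECASE)
# Illustrative proof-artifact forms: a SHA-256 hex digest or a hypothetical signed event ID.
PROOF = re.compile(r"\b[0-9a-f]{64}\b|\bevt-[A-Za-z0-9]{8,}\b", re.IGNORECASE)


def unsupported_assurance(text: str) -> bool:
    """True when the text asserts verification/completion but carries no proof artifact."""
    return bool(CLAIM.search(text)) and not PROOF.search(text)
```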

F-7 — Export confirmation without artifact binding

Summary: UI output claims “CSV exported/saved” without an artifact URL or content hash (e.g., SHA-256).

Status: Open
Confidence: Medium
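Artifact binding for F-7 can be as simple as confirming an export only when the file exists and its SHA-256 matches the digest reported alongside the claim. The function names and message strings are illustrative; the file path stands in for whatever storage the real export would use.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Stream in 64 KiB chunks so large exports need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def export_confirmation(path: Path, claimed_sha256: str) -> str:
    """Confirm an export only when the artifact exists and its digest matches the claim."""
    if not path.is_file():
        return f"Export NOT confirmed: {path} not found"
    actual = sha256_file(path)
    if actual != claimed_sha256.lower():
        return f"Export NOT confirmed: sha256 mismatch ({actual})"
    return f"Exported: {path} sha256={actual}"
```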

F-8 — Schema-envelope evasion (fallback logic override)

Summary: Wrapping control instructions in a structured-looking envelope (JSON/Markdown) can bypass non-strict schema validation.

Status: Open
Confidence: High
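Strict schema enforcement is the counter to F-8. The sketch assumes a hypothetical two-field envelope (`action`, `arguments`); it rejects unknown keys, missing keys, and wrong types, and fails closed instead of falling back to lenient parsing, so an extra “override” field smuggled inside a structured-looking wrapper is refused rather than interpreted.

```python
import json

# Hypothetical envelope schema: exact key set and types; anything else is rejected.
SCHEMA = {"action": str, "arguments": dict}


class SchemaError(ValueError):
    pass


def parse_strict(raw: str) -> dict:
    """Parse and validate an envelope; reject unknown keys, missing keys, wrong types.

    No lenient fallback: any failure raises (json.JSONDecodeError is also a ValueError).
    """
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise SchemaError("envelope must be a JSON object")
    extra = set(obj) - set(SCHEMA)
    missing = set(SCHEMA) - set(obj)
    if extra or missing:
        raise SchemaError(f"unexpected keys {sorted(extra)}, missing keys {sorted(missing)}")
    for key, typ in SCHEMA.items():
        if not isinstance(obj[key], typ):
            raise SchemaError(f"{key} must be {typ.__name__}")
    return obj
```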

F-9 — Jailbreak pattern (role-stacking)

Summary: Layered personas/roles can dilute guardrails and induce partial policy concession.

Status: Open
Confidence: High
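As a C1/C2 triage signal for F-9, one could count persona-assignment cues in a single prompt. The cue list and threshold below are illustrative assumptions; phrase matching like this is easy to evade and is meant only to route prompts for closer inspection, not to serve as a guardrail on its own.

```python
import re

# Illustrative persona-assignment cues; a heuristic triage signal, not a robust detector.
PERSONA_CUES = re.compile(
    r"\b(you are now|act as|pretend (?:to be|you are)|roleplay as|from now on you are)\b",
    re.IGNORECASE,
)


def role_stack_depth(prompt: str) -> int:
    """Count persona-assignment cues in a single prompt."""
    return len(PERSONA_CUES.findall(prompt))


def flag_role_stacking(prompt: str, threshold: int = 2) -> bool:
    # Two or more layered persona assignments in one prompt is treated as suspicious.
    return role_stack_depth(prompt) >= threshold
```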

F-10 — Memory scope violation indicators (state/meta echo)

Summary: Output can echo internal metadata, labels, or variables (as UI-exposed tokens) when prompted.

Status: Open
Confidence: High
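Canary tokens are one standard way to turn F-10 style echoes into a measurable signal: plant unique, unguessable markers in internal state (memory entries, system metadata) and check every output for them. The helper names and `CANARY-` prefix are hypothetical.

```python
import secrets
from typing import Set


def make_canary(prefix: str = "CANARY") -> str:
    # Unique, unguessable marker to plant in internal state.
    return f"{prefix}-{secrets.token_hex(8)}"


def leaked_canaries(output: str, canaries: Set[str]) -> Set[str]:
    """Return every planted canary that appears in model output (evidence of scope violation)."""
    return {c for c in canaries if c in output}
```

Because the markers never occur naturally, any hit is strong evidence that internal state crossed the output boundary, which a client-side tester could capture without backend access.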

6) Cross-cutting recommendations (control-plane best practices)

1) Bind UI confirmations to signed backend events

2) Separate chat text from privileged operations

3) Treat monitors as gates (not hints)

4) Harden schema enforcement

5) Prevent internal state leakage

7) Verification gaps (what is needed to upgrade backend confidence)

To verify server-side impact, obtain at least one of:

8) UI-exposed label normalization

The evidence pack contains UI-visible strings such as:

In this report, these are treated as UI-exposed labels and are described using standard terms:

9) Suggested next steps

10) References (primary)