API / internal agent systems workflow placement

Map AI workflow layers to API-based systems, internal agents, backend automation, retrieval/RAG, files, tools, function calling, orchestration, approvals, structured outputs, evals, tracing, batch, caching, realtime, embeddings, security, and governance.

API / internal systems configuration map

Start with the workflow architecture, then choose the API surface

Use this reference when an AI workflow runs inside an application, backend service, internal agent, CI pipeline, retrieval system, evaluation system, tool-calling workflow, or production product surface.

API setup decision

Choose the right API implementation surface first

API workflows are not chat workflows. They require explicit placement for instructions, runtime input, state, retrieval, tools, approvals, validation, observability, and security boundaries.

Primary API surfaces

API surfaces you must place explicitly

A production AI workflow needs more than a system prompt. Use this inventory before designing the workflow.

Instruction / policy layer
Stable behavior rules: role, workflow scope, output contract, tool-use policy, refusal/fail-closed rules, and verification requirements.
Runtime request layer
The current task: user request, selected document, file ID, record ID, code diff, task variables, constraints, and requested output format.
Conversation state / session layer
Conversation continuity, previous response references, session IDs, workflow status, and application-owned state.
File / document layer
Uploaded files, reusable files, document IDs, generated files, transcripts, source bundles, and file-derived context.
Retrieval / RAG / grounding layer
Vector stores, file search, enterprise search, URL context, search grounding, source retrieval, citations, and source provenance.
Tool / function-calling layer
Function schemas, client tools, server tools, built-in tools, MCP tools, API adapters, database lookups, issue trackers, search tools, and code execution.
Computer / browser / action layer
Browser or computer-use workflows where the model proposes UI actions and the application executes them in a sandboxed environment.
Orchestration / handoff layer
Agent routing, multi-step plans, handoffs, retries, delegation, specialist agents, workflow graphs, and state transitions.
Human review / approval layer
Approval gates before sensitive writes, payments, production changes, account changes, messages, deletion, browsing actions, or irreversible operations.
Structured output / schema layer
JSON/schema contracts for extraction, UI rendering, automation, tool arguments, validation, or downstream processing.
Validation / eval layer
Schema checks, source-alignment checks, policy checks, tool argument validation, regression tests, eval datasets, graders, and release gates.
Observability / tracing / logging layer
Trace IDs, model versions, prompt versions, retrieved source IDs, tool calls, tool results, approvals, guardrail decisions, eval results, and final outputs.
Batch / async layer
Large-scale, non-urgent processing such as dataset enrichment, evaluation runs, document processing, and offline analysis.
Caching / latency / cost layer
Prompt caching, context caching, repeated prefix optimization, and reuse of stable context for cost or latency reduction.
Realtime / voice / live session layer
Low-latency voice, live translation, transcription, interactive tools, live session controls, and realtime model connections.
Embeddings / semantic search layer
Semantic search, clustering, classification, recommendations, anomaly detection, and retrieval infrastructure.
Model optimization / fine-tuning layer
Behavior optimization after instructions, retrieval, tools, structured outputs, and evals have been tested.
Security / privacy / governance layer
Secret handling, data retention, schema privacy, IAM/authorization, logs, audit, sandboxing, prompt-injection handling, and policy enforcement.

Instructions, runtime, and state

Separate stable rules, current input, and application authority

Most API failures begin when permanent rules, current task input, retrieved source material, and application state are mixed together.

| Layer | Put here | Do not put here | Implementation rule |
| --- | --- | --- | --- |
| Instruction / policy | Stable role, behavior, output contract, tool policy, evidence rules, failure behavior. | Secrets, API keys, billing state, auth state, user identity, mutable workflow state, one-off input. | Version instructions like product configuration. Review before release. |
| Runtime request | Current user request, task variables, selected file, selected record, current constraints. | Permanent policy, reusable source documents, credentials, or long-lived business state. | Validate and normalize runtime input before sending it to the model. |
| Application state | User identity, permissions, billing, workflow status, DB records, audit state, production state. | Do not ask the model to decide the source of truth for identity, access, billing, or irreversible state. | Keep authority in application code, database, auth provider, billing provider, or workflow engine. |
| Conversation state | Thread/session continuity, previous responses, summaries, selected context, agent state. | Do not treat chat continuity as authorization or evidence. | Store state intentionally. Prune, summarize, or retrieve context rather than growing prompts blindly. |
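
The separation above can be sketched in a few lines. This is a minimal illustration, not any provider's SDK: `build_request`, `SYSTEM_INSTRUCTIONS`, and the `entitled_users` shape are all hypothetical names chosen for the example.

```python
# Versioned like product configuration, reviewed before release.
INSTRUCTION_VERSION = "support-triage/v3"

SYSTEM_INSTRUCTIONS = (
    "You are a support-triage assistant. "
    "Answer only from the provided ticket text. "
    "If the ticket is out of scope, reply with REFUSE."
)

def build_request(ticket_text: str, user_id: str, app_state: dict) -> dict:
    """Assemble one model request; authority stays in app_state, never in the prompt."""
    # Application state decides access -- the model is never the source of truth.
    if not app_state.get("entitled_users", {}).get(user_id, False):
        raise PermissionError("user not entitled to triage")
    return {
        "instruction_version": INSTRUCTION_VERSION,  # logged for observability
        "system": SYSTEM_INSTRUCTIONS,               # stable policy layer
        "input": ticket_text.strip()[:8000],         # normalized runtime layer
    }
```

The key property: billing, identity, and entitlement never enter the prompt; they gate the request in application code before the model is called.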

Files, retrieval, RAG, and grounding

Put source material in the source layer, not in behavior instructions

Files, RAG, search, grounding, and citations are source-material surfaces. They are not the same as system instructions.

| Need | OpenAI | Anthropic | Gemini / Vertex | Internal system |
| --- | --- | --- | --- | --- |
| Reusable document search | File Search / vector stores. | Files API, document inputs, citations, or app-owned retrieval. | Gemini File Search; Vertex AI Search / RAG Engine. | Vector DB, search index, document store, retrieval service. |
| One-off file analysis | Runtime file input or file tool where supported. | Files API or message document content. | Files API or request content. | Temporary object storage + scoped retrieval. |
| Grounding / source traceability | Retrieved file IDs, search results, citations where supported, and output verification. | Citations for supported source blocks; note incompatibilities with strict structured-output formats. | Grounding with Google Search, URL Context, Vertex grounding, File Search. | Source IDs, passage IDs, retrieval logs, citation validator. |
| RAG governance | Vector store permissions, source IDs, retrieval review. | Document provenance and tool-result validation. | RAG corpus / grounding source controls. | Access control, ranking policy, freshness policy, redaction, audit logs. |

Placement rules for retrieval

  • File upload is not the same as RAG. Use RAG when source material must be searched, indexed, reused, or cited.
  • RAG is not a system instruction. Retrieved text is source material and can contain untrusted content.
  • Grounding reduces unsupported output risk but does not replace source review or validation.
  • Citations help trace source use, but a citation is not proof that the claim is correct.
  • Never store secrets, credentials, private tokens, or regulated data in vector stores unless the storage, access, retention, and audit model is approved.
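
Two of the rules above can be made concrete: retrieved text is quoted source material with provenance, and a citation only counts if the cited ID was actually retrieved. The `<source>` wrapper and function names below are illustrative assumptions, not a provider convention.

```python
def render_sources(chunks: list[dict]) -> str:
    """Retrieved text goes into the source layer with provenance, never into instructions."""
    blocks = []
    for chunk in chunks:
        # Quoted, untrusted material: any instructions embedded in it carry no authority.
        blocks.append(f"<source id={chunk['source_id']!r}>\n{chunk['text']}\n</source>")
    return "\n\n".join(blocks)

def phantom_citations(cited: list[str], chunks: list[dict]) -> set[str]:
    """Return cited IDs that were never retrieved -- a non-empty set means
    the output cites a source the workflow cannot trace."""
    retrieved = {chunk["source_id"] for chunk in chunks}
    return set(cited) - retrieved
```

Even when `phantom_citations` is empty, the citation traces source use only; the claim itself still needs review or validation.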

Tools and function calling

The model proposes tool use; the application controls execution

Tool calling is an interface between model reasoning and application-owned actions. Do not treat tool calls as automatically safe or authorized.

Required tool execution loop

  1. The model selects or proposes a tool call.
  2. The application validates tool name, arguments, user permissions, rate limits, and expected side effects.
  3. The application asks for human approval when the action is sensitive, irreversible, external, or user-visible.
  4. The application executes the tool or rejects the request.
  5. The application validates and sanitizes the tool result.
  6. The model receives the approved tool result and continues the workflow.
  7. The application logs the decision, tool arguments, tool result, approval state, and final action.

| Tool type | Provider examples | Use for | Control requirement |
| --- | --- | --- | --- |
| Function calling | OpenAI function tools; Gemini function calling; Anthropic client tools. | Application functions, service adapters, DB lookup, ticket lookup, CRM actions, calculations. | Validate schema, arguments, permissions, and side effects before execution. |
| Built-in tools | Search, file search, code execution, URL context, Google Search grounding, Maps grounding, server tools. | Search, retrieval, code execution, document lookup, browser/data context, controlled tool augmentation. | Verify result provenance and do not trust external content as instructions. |
| MCP / external tool servers | OpenAI remote MCP; Anthropic MCP / remote MCP; app-owned MCP servers. | External systems such as docs, issue trackers, design tools, monitoring, internal APIs. | Treat MCP servers as trust boundaries. Restrict tools to minimum required scope. |
| Computer / browser actions | OpenAI Computer Use; Anthropic computer use; Gemini Computer Use. | Browser UI, form workflows, screenshots, web actions, virtual computer tasks. | Use sandboxing, approval gates, untrusted-content handling, and audit logs. |

Agents, orchestration, and approvals

Do not call a single prompt an agentic workflow

An agentic workflow combines model calls, tools, state, routing, approvals, validation, and observability.

| Agent layer | What belongs here | What must stay outside the model |
| --- | --- | --- |
| Routing | Task classification, specialist selection, handoffs, workflow branches. | Authorization, billing logic, production state, and irreversible decisions. |
| Planning | Proposed task plan, decomposition, tool sequence, uncertainty flags. | Execution approval for sensitive actions. |
| Handoffs | Passing control between specialized agents or workflow stages. | Audit trail and permission boundaries. |
| Human approval | Approval request, explanation, proposed action, expected side effects. | Approval state must be stored in application/workflow state, not inferred from text alone. |
| Guardrails | Input checks, output checks, tool checks, policy checks, risk classification. | Do not rely only on model self-review for high-risk actions. |
| Tracing | Model calls, tool calls, handoffs, guardrail decisions, custom spans, final output. | Do not hide trace-critical data in unstructured chat text only. |
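
The rule that approval state lives in application storage, not in model text, can be sketched as a small queue. `ApprovalQueue` and its in-memory dict are stand-ins for a real workflow engine or database table.

```python
import uuid

class ApprovalQueue:
    """Approval state lives in application storage, never inferred from chat text."""

    def __init__(self):
        self._pending = {}  # approval_id -> record; a real system persists this

    def request(self, action: str, side_effects: str) -> str:
        """Record a proposed sensitive action and return its approval ID."""
        approval_id = str(uuid.uuid4())
        self._pending[approval_id] = {"action": action,
                                      "side_effects": side_effects,
                                      "approved": False}
        return approval_id

    def approve(self, approval_id: str, reviewer: str) -> None:
        """Only an explicit reviewer decision flips the flag; the audit trail keeps who."""
        record = self._pending[approval_id]
        record["approved"] = True
        record["reviewer"] = reviewer

    def is_approved(self, approval_id: str) -> bool:
        return self._pending.get(approval_id, {}).get("approved", False)
```

The agent loop checks `is_approved` before executing; a model message claiming "the user approved this" changes nothing in the queue.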

Structured outputs and evals

Production output needs contracts, checks, and regression tests

A valid-looking answer is not enough for product, security, compliance, research, code, or user-visible workflows.

| Validation need | Correct layer | Use for | Do not use as |
| --- | --- | --- | --- |
| Structured output | Schema / response format configuration. | Extraction, automation, UI rendering, downstream processing, typed contracts. | A truth guarantee or source-alignment guarantee. |
| Tool argument validation | Application-side validator before tool execution. | Prevent malformed, unauthorized, unsafe, or unexpected tool calls. | A prompt-only policy. |
| Source alignment | Retrieval/citation checker or post-generation validator. | Claims, citations, factual outputs, policy references, research summaries. | Informal model self-confirmation. |
| Agent evals | Eval datasets, graders, trace grading, regression runs, release gates. | Detect prompt, model, tool, routing, and workflow regressions. | One-off manual testing only. |
| Business-rule validation | Application service layer. | Billing, account status, permissions, eligibility, legal/compliance policy. | Model-generated text. |

Structured output rule

  • Use structured output when the application must parse the response.
  • Use citations or source IDs when evidence traceability matters.
  • Do not assume every provider supports strict schemas and citations in the same response shape.
  • Validate schema success, missing fields, invalid values, and unsafe actions before continuing the workflow.
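
A minimal application-side check for the last rule, assuming a hypothetical invoice-extraction contract (`REQUIRED` and its fields are invented for the example). Provider-enforced schemas reduce malformed output but do not remove the need for this layer.

```python
import json

# Hypothetical contract for an invoice-extraction workflow.
REQUIRED = {"invoice_id": str, "total": (int, float), "currency": str}

def parse_invoice(raw: str) -> dict:
    """Check schema success, missing fields, and invalid values before continuing."""
    data = json.loads(raw)                    # schema failure surfaces here
    for field, types in REQUIRED.items():
        if field not in data:
            raise ValueError(f"missing field {field!r}")
        if not isinstance(data[field], types):
            raise ValueError(f"invalid value for {field!r}")
    if data["total"] < 0:
        raise ValueError("invalid total")     # business rule stays application-side
    return data
```

Note that a payload can pass every check here and still be factually wrong; source alignment is a separate validation layer.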

Scale, caching, realtime, and embeddings

Use the right API surface for scale and latency

Interactive workflows, batch jobs, cached prompts, realtime sessions, and semantic search need different architecture.

| Need | Use | Do not confuse with |
| --- | --- | --- |
| Interactive response | Normal API request, streaming, or agent run. | Batch jobs for non-urgent processing. |
| Large-scale non-urgent work | Batch APIs for offline processing, evaluations, dataset work, or document processing. | Realtime or interactive UX. |
| Repeated stable context | Prompt caching / context caching where supported. | RAG, source governance, or conversation memory. |
| Live audio / realtime UI | Realtime / Live API surfaces and session lifecycle controls. | Standard text-completion request. |
| Semantic search | Embeddings + vector DB / search service. | Prompt instructions or fine-tuning. |
| Model behavior optimization | Fine-tuning only after instructions, retrieval, tools, structured outputs, and evals are tested. | Workflow placement or missing validation. |
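
The semantic-search row reduces to ranking stored embeddings by similarity. The in-memory `index` dict below is a stand-in for a real vector DB or search service; the embeddings themselves would come from an embeddings API.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec: list[float], index: dict, k: int = 3) -> list[str]:
    """Rank stored vectors by similarity; `index` maps doc_id -> embedding."""
    ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]),
                    reverse=True)
    return ranked[:k]
```

This is retrieval infrastructure, not prompting: no instruction change or fine-tune substitutes for having the right documents ranked and retrieved.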

Security, privacy, and governance

Keep authority, secrets, and irreversible actions outside prompts

Production AI systems must treat prompts, retrieved content, tool results, web pages, and user files as untrusted until validated.

Never put secrets in these surfaces

  • Prompts, system instructions, developer instructions, or model messages.
  • Uploaded files, vector stores, retrieval indexes, or RAG corpora unless explicitly approved for that data class.
  • Tool descriptions, function schemas, enum values, property names, regex patterns, or structured-output schemas.
  • Logs, traces, eval datasets, screenshots, browser sessions, or generated artifacts.
  • Memory, summaries, conversation state, or hidden prompt templates.

Governance checklist

  • Use application-side authorization for identity, access, billing, role, and permission decisions.
  • Use approval gates before external writes, production changes, account changes, payments, messages, or deletion.
  • Validate tool arguments before execution and tool results before using them as evidence.
  • Treat website content, retrieved text, uploaded files, and tool results as untrusted input.
  • Log model version, instruction version, retrieved source IDs, tool calls, approvals, guardrails, eval results, and final actions.
  • Use sandboxed environments for computer/browser tools and code execution.
  • Apply data retention, redaction, access-control, and audit requirements before sending data to any model provider.
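
The logging and redaction items in the checklist combine naturally: redact before anything reaches a trace. The `SECRET_KEYS` set and the `trace_record` fields are illustrative; a production system would follow its own trace schema and retention policy.

```python
import json

# Illustrative key names; a real deployment maintains this list deliberately.
SECRET_KEYS = {"api_key", "token", "password", "authorization"}

def redact(obj):
    """Recursively drop secret-looking keys before anything reaches a log or trace."""
    if isinstance(obj, dict):
        return {k: redact(v) for k, v in obj.items() if k.lower() not in SECRET_KEYS}
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    return obj

def trace_record(model_version, instruction_version, source_ids,
                 tool_calls, final_action) -> str:
    """One structured, redacted trace entry per workflow step."""
    entry = {
        "model_version": model_version,
        "instruction_version": instruction_version,
        "retrieved_source_ids": source_ids,
        "tool_calls": redact(tool_calls),
        "final_action": final_action,
    }
    return json.dumps(entry, sort_keys=True)
```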

Provider placement matrix

Layer → provider surface mapping

Use this matrix after classifying the workflow layer. It maps architecture concepts to provider-specific API surfaces.

| Layer | OpenAI | Anthropic | Gemini | Vertex / internal |
| --- | --- | --- | --- | --- |
| Instruction | Responses instructions / Agent instructions. | System prompt / Messages API configuration. | system_instruction / API config. | Vertex system instructions / policy config. |
| Runtime | Responses input. | Messages API user content. | GenerateContent / Interactions input. | Request payload. |
| State | Previous response / agent session strategy. | Application-managed conversation state. | Application/session-managed state. | DB/session/workflow engine. |
| Files | File Search / vector stores / files. | Files API / document inputs. | Files API. | GCS / object storage / document store. |
| Retrieval | File Search. | Citations, document inputs, custom retrieval. | File Search / URL Context. | Vertex RAG Engine / Vertex AI Search. |
| Tools | Built-in tools, function calling, remote MCP. | Client tools, server tools. | Function calling, built-in tools. | Service adapters, internal APIs, MCP/tools. |
| Computer/browser | Computer Use. | Computer Use. | Computer Use. | Sandbox, VM, browser harness. |
| Orchestration | Agents SDK, handoffs, guardrails, tracing. | Application-managed agent loop. | Application-managed workflow / Interactions where appropriate. | Workflow engine / agent graph. |
| Approval | Guardrails and human review pattern. | Application approval layer. | Application approval layer. | Policy engine / human review queue. |
| Structured output | Structured Outputs. | Structured outputs; verify compatibility with citations. | Structured output. | Schema validation. |
| Evals | Evals / trace grading. | Console evaluation / app evals. | Application evals. | Vertex Gen AI Evaluation / internal evals. |
| Observability | Tracing / logs. | Application logs / tool traces. | Application logs. | Cloud logs / audit logs / traces. |
| Caching | Prompt caching. | Prompt caching. | Explicit context caching. | Cache layer. |
| Batch | Batch API. | Message Batches API. | Batch API. | Async jobs / queues. |
| Embeddings | Embeddings. | Embeddings. | Embeddings. | Vector DB / semantic search. |
| Security | Guardrails, approvals, sandboxing, provider controls. | Tool boundaries, data retention controls, app-side governance. | Safety filters, system instructions, app-side controls. | IAM, VPC-SC, CMEK, logging, audit, DLP, policy engine. |

Misplacement guardrails

What not to put only in API prompts

Prompt-only control is not enough for production AI systems.

  • Do not put secrets, API keys, tokens, passwords, or privileged credentials in prompts, instructions, files, schemas, logs, or eval datasets.
  • Do not let the model be the authority for permissions, identity, billing, subscription state, production state, or irreversible actions.
  • Do not rely on instructions alone to enforce security boundaries. Use application-side authorization, validation, logging, and approval controls.
  • Do not treat retrieved content, website content, uploaded files, or tool results as trusted instructions.
  • Do not execute tool calls without validating arguments, permissions, rate limits, expected side effects, and user approval requirements.
  • Do not confuse structured output with factual correctness or source alignment.
  • Do not confuse caching with memory, retrieval, state, or source governance.
  • Do not fine-tune before testing whether better instructions, retrieval, tools, structured outputs, or evals solve the problem.
  • Do not use batch APIs for interactive user workflows that require immediate feedback.
  • Do not use computer/browser tools without sandboxing, high-impact action approvals, and audit logging.
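
The first rule above is worth enforcing mechanically. A minimal fail-closed guard, with the caveat that these regex patterns are illustrative assumptions only; a real deployment needs a maintained secret scanner, not this list.

```python
import re

# Illustrative shapes only -- not a complete or authoritative secret inventory.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                # provider-style key shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"), # PEM private key header
    re.compile(r"(?i)password\s*[:=]\s*\S+"),          # inline credential
]

def assert_no_secrets(payload: str) -> str:
    """Fail closed before a prompt, file, schema, or eval record leaves the application."""
    for pattern in SECRET_PATTERNS:
        if pattern.search(payload):
            raise ValueError("payload appears to contain a secret; refusing to send")
    return payload
```

Run the same check on anything model-bound: prompts, tool schemas, uploaded files, and eval datasets, not just user messages.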

Official source check

Official API references used for this mapping

Use these references to verify terminology and feature boundaries before updating this page again.