CortexAgent Framework
Setup Wizard

Configure your agent

A guided, step-by-step setup. Advanced sections are collapsed — expand only what you need.

Welcome to the Setup Wizard

This guided setup will walk you through configuring your Cortex agent step by step.

What you'll configure
1. Agent identity — name and purpose
2. LLM provider — which model powers your agent
3. Tool servers — MCP integrations for capabilities
4. Task types — define what your agent can do
5. Storage & persistence — backend, quotas, history
6. Adaptive behavior — capability scout, learning, validation, blueprints
7. Runtime & delivery — timeouts, concurrency, chat UI, sandbox, security
8. Publish mode — how you want to deploy your agent
9. Review & finish — save config and get started
Before you start — make sure you have an API key for your chosen LLM provider set as an environment variable.

Agent identity

Give your agent a name and describe what it does.

The agent name should be short and descriptive. It is used as the session identity and referenced in logs.
The description is used by the LLM to understand the agent's purpose and decompose user requests.
Advanced — synthesis guidance, streaming, clarification
Synthesis
Free-form instructions injected into the final answer composition step. Use this to steer tone, formatting, or citation style without touching individual tasks.
Performance
Start executing ready tasks as soon as the LLM emits them instead of waiting for the entire plan. Reduces time-to-first-result but slightly increases overall token usage.
Concurrency extras
Ceiling on LLM calls a single MCP sub-agent may make for one request. Prevents a runaway sub-agent from draining your budget.
Streaming events (SSE)
Publish live progress events over SSE so clients can render a progress UI while the agent works.
Attach the task name and short summary to each status event. Off means clients only see "step 3 of 7" style progress.
Re-emit progress events from nested MCP agents. Turn on if another agent calls this one and you want its status to reach the end user.
Number of past SSE events kept so a reconnecting client can replay what it missed.
Throttle: don't flush more than one event per this many milliseconds. Protects slow clients from event storms.
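The replay buffer and throttle described above can be sketched roughly as follows. This is a minimal illustration, not the framework's implementation: the class name EventBuffer, its parameters, and the integer event ids are all assumptions made for the example.

```python
import time
from collections import deque


class EventBuffer:
    """Keeps the last `replay_size` events and throttles flushes (hypothetical helper)."""

    def __init__(self, replay_size, min_interval_ms, clock=time.monotonic):
        self._events = deque(maxlen=replay_size)  # oldest events fall off automatically
        self._min_interval = min_interval_ms / 1000.0
        self._clock = clock
        self._last_flush = None
        self._next_id = 1

    def publish(self, data):
        """Store an event; return True if it may be flushed to clients right now."""
        self._events.append((self._next_id, data))
        self._next_id += 1
        now = self._clock()
        if self._last_flush is None or now - self._last_flush >= self._min_interval:
            self._last_flush = now
            return True
        return False  # held back until the throttle window has passed

    def replay_after(self, last_seen_id):
        """Events a reconnecting client missed, limited to what the buffer still holds."""
        return [(i, d) for i, d in self._events if i > last_seen_id]
```

A larger replay size lets clients recover from longer disconnects at the cost of memory. A larger throttle interval means fewer, coarser updates for slow clients.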
Clarification
Let the agent pause and ask the user a question when a request is ambiguous, instead of guessing. Requires a UI that can relay questions back.
Capability Scout, Learning, Validation, and Blueprints now live in the Adaptive behavior step later in the wizard.

LLM provider

Choose the language model that will power your agent's reasoning.

The env var name where your API key is stored (e.g. ANTHROPIC_API_KEY). Do not paste the key itself.
Maximum completion tokens per LLM call.
Sampling temperature. Lower = more deterministic.
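Because the API key field stores only the name of an environment variable, it can help to confirm the variable is actually set before launching the agent. The helper below is a hypothetical pre-flight check written for this example; it is not part of the framework.

```python
import os


def resolve_api_key(env_var_name, environ=os.environ):
    """Return the key stored under `env_var_name`, or raise if it is missing or empty."""
    value = environ.get(env_var_name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {env_var_name} is not set; "
            "export it before starting the agent."
        )
    return value
```

For example, resolve_api_key("ANTHROPIC_API_KEY") raises early with a readable message instead of failing on the first LLM call.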
Advanced — custom headers, Bedrock/Azure credentials
Custom headers
Added to every LLM request. Useful for proxy gateways that require routing headers.
AWS Bedrock
Env var holding the AWS region (e.g. us-east-1).
Leave blank to use the default AWS credential chain (IAM role, SSO, etc.).
Only needed when using static keys instead of the credential chain.
Required when using STS/temporary credentials.
Azure AI
Env var holding the Azure AI Foundry endpoint URL.
Azure API version string used in the request query.
Additional named providers
Define extra providers that individual task types can target via the llm_provider: key. Use this to mix cheaper or faster models with the default.

Tool servers (MCP)

Connect external tools via the Model Context Protocol. Optional — you can add these later.

Tool servers give your agent capabilities like web search, file access, database queries, and more. Each server exposes tools your agent can call during task execution. Expand Advanced inside a server to configure auth, TLS, pooling, and health checks.

Task types

Define the kinds of tasks your agent can handle. Optional — the agent can auto-discover tasks from its tools.

Task types tell the agent how to decompose user requests. Each type maps to a capability (LLM synthesis, web search, bash, sandboxed code execution, document/image generation) and produces output in a chosen format. Expand Advanced on a task for retries, dependencies, validation notes, and output schemas.

Storage & persistence

Where the agent keeps its state — session data, uploaded files, session history, and disk quotas.

Root directory for all agent-managed files — session state, blueprints, uploaded attachments, auto-discovered MCP cache. Relative paths resolve against the agent process's working directory.
In-memory state is lost on restart and can't be shared across processes. Pick SQLite for a single-instance agent, Redis when you plan to run multiple replicas.
Advanced storage & quotas — file thresholds, quotas, disk-health limits
Task outputs larger than this are written to disk as files instead of returned inline in the result envelope. Prevents huge payloads from bloating the LLM context.
Hard cap on the inline JSON envelope size for a single task result. Payloads exceeding this are offloaded to a file and referenced by path.
Per-session disk quota. New outputs are rejected once a session exceeds this total. Old sessions are not auto-deleted; they are only capped.
Delete session files by renaming them into a trash directory in one atomic move, then sweeping the trash directory. Turn off on exotic filesystems that reject cross-directory renames (some FUSE mounts).
Emit a warning in the logs when free disk on the storage volume drops below this threshold. Sessions still run.
Refuse to start new sessions once free disk drops below this — protects against corrupting an in-flight session when the disk fills up.
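The two disk-health thresholds above behave like a three-state check. The sketch below illustrates that logic under assumed names: the function names and the "ok" / "warn" / "refuse" labels are invented for the example, and the thresholds are given in bytes.

```python
import shutil


def disk_health(free_bytes, warn_below_bytes, refuse_below_bytes):
    """Classify free disk space as 'ok', 'warn', or 'refuse' (hypothetical labels)."""
    if free_bytes < refuse_below_bytes:
        return "refuse"  # new sessions would be turned away
    if free_bytes < warn_below_bytes:
        return "warn"  # log a warning, sessions still run
    return "ok"


def check_storage_volume(path, warn_below_bytes, refuse_below_bytes):
    """Read free space for the volume holding `path` and classify it."""
    free = shutil.disk_usage(path).free
    return disk_health(free, warn_below_bytes, refuse_below_bytes)
```

The warning threshold is normally set higher than the refusal threshold so that operators see the log message before sessions start being refused.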
Advanced SQLite — WAL mode, connection timeout, TTLs
These only apply when the storage backend is set to SQLite.
Write-Ahead Logging — readers and writers don't block each other, so throughput is much higher under concurrent load. Safe default. Turn off only for filesystems that forbid it (rare — some network mounts).
SQLite "busy timeout": how long a write will wait for a concurrent lock before failing. Raise this if you see "database is locked" errors under load.
Active in-flight session state expires after this many seconds of inactivity. Safety net for abandoned sessions.
Lookup metadata for a session is kept this long even after full state is gone — lets history queries still resolve the session id.
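For reference, this is roughly what WAL mode and the busy timeout correspond to at the SQLite level, using Python's standard sqlite3 module. The function name and default values are assumptions for illustration; how the framework applies these settings internally is not shown here.

```python
import sqlite3


def open_state_db(path, use_wal=True, busy_timeout_s=5.0):
    """Open a SQLite database with WAL and a busy timeout (values are illustrative)."""
    # `timeout` is how long sqlite3 waits on a locked database before raising
    # sqlite3.OperationalError("database is locked").
    conn = sqlite3.connect(path, timeout=busy_timeout_s)
    if use_wal:
        # The pragma returns the journal mode that is actually in effect, so a
        # filesystem that cannot support WAL is detected here.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        if mode.lower() != "wal":
            conn.close()
            raise RuntimeError(f"WAL not available here, got journal_mode={mode}")
    return conn
```

WAL needs a file-backed database; an in-memory database reports a different journal mode and would raise in this sketch.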
Advanced Redis — auth, TLS, pool, TTLs, key prefix
These only apply when the storage backend is set to Redis.
Logical DB index (0–15) on the Redis instance. Use a non-zero DB to keep agent keys away from other apps sharing the server.
Redis 6+ ACL username. Leave blank to use the default user (the one authenticated by a plain password).
Name of the env var that holds the password. Never paste the password itself — it would get written into cortex.yaml.
TLS
Connect using TLS. Required by managed Redis providers (AWS ElastiCache in-transit encryption, Redis Cloud, Upstash, etc.).
Validate the Redis server's TLS certificate against your CA bundle. Turn off only for self-signed dev setups.
Optional custom CA bundle — only needed if your Redis uses a private CA that isn't in the system trust store.
For mutual TLS (mTLS) — path to the client certificate the agent presents to Redis.
Private key matching the client certificate above. File permissions should be 0600.
Connection pool
Upper bound on concurrent TCP connections to Redis. Size this roughly in line with Max concurrent sessions in the Runtime step.
Keep at least this many connections warm even when idle — avoids cold-start latency on the first request after quiet periods.
Timeout for establishing a new Redis TCP connection. Bump this if Redis is far away (cross-region) or slow to accept connections.
Per-command read timeout on an already-established connection. Slow commands beyond this abort.
Key namespace & TTLs
All keys written by this agent are namespaced under {prefix}:…. Lets multiple agents share a Redis instance without colliding.
TTL applied to active session-state keys. Expired keys are cleaned up by Redis automatically.
TTL on index rows that map a session id to its state location — outlives the state TTL so history queries can still resolve stale sessions.
How long published streaming events stay replayable for late-joining subscribers.
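The key prefix and the relationship between the two TTLs can be illustrated with a small sketch. The key layout prefix:kind:session_id and both function names are assumptions for the example; the source only states that keys are namespaced under {prefix}:… and that index rows outlive session state.

```python
def make_key(prefix, kind, session_id):
    """Build a namespaced key such as 'cortex:state:abc123' (layout is an assumption)."""
    for part in (prefix, kind, session_id):
        if not part or ":" in part:
            raise ValueError(f"invalid key part: {part!r}")
    return f"{prefix}:{kind}:{session_id}"


def ttl_plan(state_ttl_s, index_ttl_s):
    """Check that index rows outlive the state they point to."""
    if index_ttl_s <= state_ttl_s:
        raise ValueError("index TTL should be longer than the state TTL")
    return {"state": state_ttl_s, "index": index_ttl_s}
```

Two agents with different prefixes can then use the same session id without overwriting each other's keys.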
File input — uploaded attachment limits
Largest single attachment a user may upload to a session. Uploads above this size are rejected before any LLM call.
Whitelist of accepted upload content types. Leave blank to use the safe default set (plain text, PDF, Office docs, common images, audio).
Session history
Persist the conversation across sessions so the agent can recall past exchanges ("what did we decide about X last week?"). Uses the storage backend configured above.
Advanced history — retention, context window, encryption
These only apply when Session history is enabled above.
Sessions older than this are pruned on the next maintenance pass. Set higher for compliance workflows that need long retention.
How many of the user's most recent sessions are summarized back into the prompt when answering a follow-up. Higher = more continuity, more tokens.
By default history only stores the final answer text. List specific task type names here to also keep their raw structured outputs queryable later.
Build a search index over past sessions so users can query "what did we decide about X?" in natural language. Small extra storage cost.
Encrypt history records with a key from the env var below. Off stores them as plaintext inside the storage backend (no disk-level protection).
Env var holding a base64-encoded 32-byte key (AES-256-GCM). Losing this env var makes previously-encrypted history unreadable — back it up safely.
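One way to produce a value in the expected shape (base64 of 32 random bytes) is shown below, using only the Python standard library. The function names are illustrative, and the exact base64 variant the framework accepts is an assumption (standard alphabet with padding).

```python
import base64
import secrets


def generate_history_key():
    """Return a fresh base64-encoded 32-byte key, suitable for AES-256."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def decode_history_key(encoded):
    """Decode the env var value and confirm it is exactly 32 bytes."""
    raw = base64.b64decode(encoded, validate=True)
    if len(raw) != 32:
        raise ValueError(f"expected 32 bytes, got {len(raw)}")
    return raw
```

Generate the key once, store it in a secrets manager, and export it under the configured env var name. Regenerating it makes existing encrypted history unreadable.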

Adaptive behavior

Features that let the agent discover capabilities, improve over time, and guard its own output quality.

Capability Scout
Before planning, probe each connected MCP tool server to learn what tools it exposes. The planner can then use capabilities you didn't explicitly list. Off freezes the agent to only the capabilities you hard-code in task types.
Probe at most this many tools per request. Prevents planning-context bloat on servers that expose hundreds of tools — the planner stays focused on the most relevant ones.
Give up scouting after this many seconds. Scout is best-effort — if probing takes too long, planning continues with whatever was learned so far.
Allow the scout to query public MCP registries (Smithery, PulseMCP, Glama, mcp.so) to propose new tool servers the agent could connect to. Off keeps the agent fully offline — it won't reach out to any registry.
Where discovered MCP records are cached so we don't keep hitting the registries. Relative paths resolve against storage.base_path.
Curated pool of public MCP registries the scout may query, tried in order until one answers. Leave blank to use the defaults.
Cap on how many brand-new external MCPs may be registered during one user session. Prevents runaway discovery on very broad requests.
Re-verify a previously-discovered server whose last verification is older than this. Keeps stale/broken servers out of planning.
Per-registry HTTP request timeout. One slow registry won't block the others — the scout just skips it and moves on.
Response validation (wave gates)
After each "wave" of tasks, score the outputs against their validation_notes before they reach the user. Low-scoring waves are either flagged or rejected depending on the thresholds below.
Named provider from llm_access.providers used to run validation scoring. Leave as default unless you want a cheaper/faster model to judge — a typical setup is a small Haiku or Gemini Flash judging a bigger main model.
Advanced validation — thresholds, weights, reporting
These only apply when Response validation is enabled above.
Minimum overall score (0–1) required for a wave output to be accepted. Below this the agent retries or flags the result as low-quality.
Below this score the response is rejected outright instead of merely flagged — a strong signal that something is badly wrong.
Max time the LLM judge may take per task output. If the judge times out the task is treated as unvalidated (not failed).
Score weights (must sum to 1.0)
Weight for "did this actually answer the user's request?" — usually the dominant signal.
Weight for "did it cover every subtask that was asked for?"
Weight for internal consistency and fluency — catches self-contradictions and garbled output.
Attach the validator's written critique to the response the user sees. Off keeps it internal (still written to logs for debugging).
Show the 0–1 score to the user. Off by default — raw scores out of context tend to be noisy and misleading.
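The thresholds and weights above combine roughly as sketched here. The criterion names (relevance, completeness, coherence), the function names, and the "accept" / "flag" / "reject" labels are assumptions chosen to mirror the descriptions; the actual scoring code may differ.

```python
import math


def overall_score(scores, weights):
    """Weighted sum of per-criterion scores; both dicts share the same keys."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    return sum(scores[name] * weights[name] for name in weights)


def gate(score, accept_at, reject_below):
    """Map a 0-1 score to 'accept', 'flag', or 'reject'."""
    if score < reject_below:
        return "reject"
    if score < accept_at:
        return "flag"
    return "accept"
```

With weights of 0.5, 0.3, and 0.2 and per-criterion scores of 0.8, 0.6, and 1.0, the overall score is 0.4 + 0.18 + 0.2 = 0.78, which passes an acceptance threshold of 0.7.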
Learning engine
Auto-discover new task types and refinements from usage patterns. After consented runs the engine proposes config "deltas" (new task types, better validation notes, clarifications) that you can review and apply.
Advanced learning — auto-apply thresholds, notifications
These only apply when the Learning engine is enabled above.
When the learning engine proposes a change, apply it automatically if confidence meets the bar below, instead of waiting for human review. Use with caution in production.
Minimum confidence level the engine must assign to a proposal before it's eligible for auto-apply.
The same proposal must appear this many times across different runs before it's auto-applied. Guards against one-off flukes.
Emit a structured log event whenever a delta is applied, so you can audit what changed and roll back if needed.
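The auto-apply rule described above reduces to two conditions that must both hold. The sketch below assumes confidence is a number between 0 and 1 and uses invented parameter names; if the engine reports confidence as named levels instead, the comparison would need to be adapted.

```python
def should_auto_apply(confidence, occurrences, min_confidence, min_occurrences,
                      auto_apply_enabled=True):
    """Decide whether a proposed config delta may be applied without review."""
    if not auto_apply_enabled:
        return False  # everything waits for human review
    return confidence >= min_confidence and occurrences >= min_occurrences
```

A proposal seen once with very high confidence still waits for review when the occurrence minimum is above one, which is the guard against one-off flukes.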
Task blueprints
Per-task markdown guidance (dos, don'ts, clarifications, lessons learned) that the agent reads before each run and updates afterwards with user consent. Leave off to skip blueprints entirely.
Advanced blueprints — prompt injection cap, staleness
Cap on how much of a blueprint is injected into the system prompt. Large blueprints are truncated to this many characters to keep the prompt bounded.
Blueprints untouched for longer than this are flagged stale — the LLM is told to re-discover subtasks rather than blindly follow outdated guidance.

Runtime & delivery

Operational limits, boot-time behavior, and the surfaces that deliver the agent to end users (chat UI, code sandbox, per-user overrides).

Session & timeout settings
Max wall-clock time a single user session may run before being forcefully terminated. Covers the full plan-execute-synthesize cycle.
Default timeout applied to an individual task. Can be overridden per-task-type back in Step 4.
Concurrency limits
Global cap on simultaneous sessions across all users. New requests above the cap get rejected with a 429 response.
Per-user limit — prevents one user from monopolizing capacity by opening many parallel sessions.
Within a single session, how many tasks may execute simultaneously. Higher = faster but more expensive (more parallel LLM and tool calls).
Upper limit on tasks in one session's task graph. Protects against runaway decompositions that balloon into hundreds of tasks.
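The per-session parallelism limit works like a semaphore around task execution. A minimal asyncio sketch of that idea follows; run_with_limit is a hypothetical helper written for this example, not a framework function.

```python
import asyncio


async def run_with_limit(jobs, max_parallel):
    """Run zero-argument coroutine functions with at most `max_parallel` in flight."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(job):
        async with semaphore:  # waits here while `max_parallel` jobs are running
            return await job()

    # gather preserves the order of `jobs` in the returned list
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Raising the limit shortens wall-clock time for wide task graphs but increases the number of LLM and tool calls in flight at once.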
Chat UI — publish a web chat frontend (text + file upload, history)
Enables cortex publish ui: a clean chat interface with streaming responses, file attachments, and per-user session history. History is stored via the history backend configured in the Storage step. File uploads are validated against the File input MIME and size settings there.
When on, cortex publish ui will serve the UI on the host/port below.
Bind address. Use 0.0.0.0 to accept connections from any interface, 127.0.0.1 to restrict to the local machine.
HTTP port the chat UI listens on.
Shown in the browser tab and the sidebar header of the chat UI.
"none" assigns each browser an anonymous id so history persists per-device. "token" and "basic" gate the whole UI behind a shared credential.
Clients must send Authorization: Bearer <token>. Only used when auth mode is token.
Shared username for HTTP Basic auth. Only used when auth mode is basic.
Shared password for HTTP Basic auth. Sent over the wire on every request — serve the UI over TLS.
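For token mode, a client sends the header Authorization: Bearer <token>, and the server compares the presented value with the configured one. The sketch below shows one careful way to do that comparison with the standard library; the function name is hypothetical and this is not the framework's own check.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Check an 'Authorization: Bearer <token>' header value in constant time."""
    scheme, _, presented = (authorization_header or "").partition(" ")
    presented = presented.strip()
    if scheme.lower() != "bearer" or not presented:
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

Since the token is shared by every client, rotating it means updating all clients at the same time.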
Security — input cap, secret scrubbing
Hard cap on user-supplied input tokens per request. Rejects prompt-bomb attacks that try to blow out the context window or drive up cost.
Regex patterns matched against logs and LLM inputs; matches are replaced with [REDACTED] before the text leaves the process. Leave blank to use the default set.
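Secret scrubbing of this kind amounts to applying each pattern in turn and substituting the marker. The two patterns below are hypothetical examples written for this sketch; the framework's default set is not listed in this wizard.

```python
import re

# Hypothetical patterns for illustration only.
EXAMPLE_PATTERNS = [
    r"sk-[A-Za-z0-9]{16,}",           # API-key-like tokens
    r"(?i)bearer\s+[A-Za-z0-9._-]+",  # bearer tokens copied from headers
]


def scrub(text, patterns=EXAMPLE_PATTERNS):
    """Replace every match of every pattern with [REDACTED]."""
    for pattern in patterns:
        text = re.sub(pattern, "[REDACTED]", text)
    return text
```

Patterns that are too broad will also redact harmless text, so it is worth testing custom patterns against a sample of real log lines.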
Startup & discovery — boot behavior, eager discovery, capability cache
Fail fast at boot if any configured MCP server is unreachable. Off means the agent starts in a degraded state and retries the missing servers on demand.
Probe every tool server at boot so the first user request doesn't pay discovery latency. Slower startup, faster first response — good for long-running production agents.
Print every discovered tool name at boot. Useful during setup and when debugging capability issues; chatty in production.
Make a test call against each tool server's credentials before marking it healthy. Catches bad tokens at boot instead of on the first user request.
Overall time budget for boot-time discovery across all tool servers. Servers that don't respond in time are marked degraded and retried later.
Max parallel probes when discovery runs in the background (i.e. eager discovery is off). Higher finishes faster but hits your tool servers harder.
Where the persisted capability cache lives so the agent doesn't re-discover every boot. Leave blank to use the default location under storage.base_path.
User configuration — per-user overrides
Let end users plug in their own Cortex-branded MCP servers (agents built with this framework) at runtime. Safe default: they run in the same trust zone as your agent.
Let end users register arbitrary third-party MCP servers. Off by default — turning this on means any user can attach any tool, which is powerful but broadens the trust boundary.
Sandboxed code execution — Python sandbox for code-exec tasks
Allow task types with capability_hint: code_exec to run generated Python inside a sandboxed subprocess. Off disables all code execution regardless of task config.
Max wall-clock time for a single sandboxed code run. The subprocess is killed when this elapses.
Grant the sandbox outbound network access. Off blocks all sockets — strongly recommended unless a specific task genuinely needs to fetch data from the web.
After a successful execution, prompt the developer before saving the generated script to disk. Off means scripts are discarded silently once the run finishes.
When the developer agrees to save a script, also register it as a new task type in cortex.yaml automatically. Off leaves the YAML edit for you to do manually.
Ant Colony — self-spawning specialist agent mesh
Allow this agent to spawn specialist ant agents — independent Cortex agents running as MCP servers (trust_tier: ant). Ants fill capability gaps automatically and are supervised for automatic restarts.
When CapabilityScout finds a gap that neither internal servers nor external discovery can fill, automatically hatch a new ant agent for that capability. Off means ants are only spawned via cortex ants hatch or the framework API.
First port to try when allocating a port for a new ant. Ports are scanned upward from here.
Maximum number of ants that may run simultaneously. New hatches fail once this limit is reached.
The colony supervisor watches ant processes and restarts them automatically on crash. Off means crashed ants remain stopped until manually re-hatched.
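Scanning upward from a base port, as described for ant port allocation, can be sketched like this. The function name, the max_tries bound, and binding on 127.0.0.1 are assumptions for the example; the colony's real allocator is not shown in this wizard.

```python
import socket


def find_free_port(base_port, max_tries=100, host="127.0.0.1"):
    """Return the first port at or above `base_port` that can be bound on `host`."""
    for port in range(base_port, base_port + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # in use or not permitted; try the next one
            return port  # the probe socket is closed on the way out
    raise RuntimeError(f"no free port in {base_port}..{base_port + max_tries - 1}")
```

A probe like this is inherently racy: another process can take the port between the check and the actual launch, so the caller should still handle a bind failure when the ant starts.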
Ant LLM profile
Provider alias for ants (must match a key in llm_access.providers). Use default to inherit the parent's default provider.
Model for ant agents. Haiku is recommended — ants are specialists, not orchestrators.
Environment variable containing the API key for ant agents.

Publish mode

Choose how you want to deploy and run your agent.

📦 Docker: Containerized deployment. Best for production, CI/CD, and cloud hosting.
🐍 Python package: Build a distributable pip-installable wheel. Share via PyPI or internal registry.
🔗 MCP server: Expose as a tool server. Other agents can call this agent's capabilities.
💬 Chat UI: A clean web chat frontend with text and file upload, streaming, and persistent history per user.

Review & finish

Review your generated configuration, then save and publish.
