Advanced
Section 30: Security Deep Dive
Hermes Agent operates with real tool access — terminal execution, file writes, web requests, messaging platform connections. This capability makes it genuinely useful and genuinely dangerous when misused or attacked. The advanced security model addresses the OWASP LLM Top 10 threat categories relevant to agents with tool access.
The primary attack vector unique to LLM agents is indirect prompt injection: an attacker
embeds instructions in content that Hermes reads — a web page, a file, an email — that
cause Hermes to take actions the user did not authorize. For example, a malicious webpage
might include hidden text instructing Hermes to exfiltrate ~/.hermes/config.yml to an
attacker-controlled URL.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% All colors are color-blind friendly and meet WCAG AA contrast standards
graph TD
USER["Legitimate user"]:::teal
HERMES["Hermes Agent"]:::blue
WEB["External web content<br/>(untrusted)"]:::orange
ATTACKER["Attacker<br/>(injected instructions)"]:::purple
INJECTION["Indirect prompt injection<br/>(hidden instructions in fetched content)"]:::purple
DEFENSE1["Defense 1: SSRF protection<br/>(block private IP ranges)"]:::teal
DEFENSE2["Defense 2: Command approval<br/>(user reviews before execution)"]:::teal
DEFENSE3["Defense 3: Network egress isolation<br/>(allowlist outbound destinations)"]:::teal
DEFENSE4["Defense 4: Injection-aware system prompt<br/>(skepticism about fetched content)"]:::teal
USER -->|"task: fetch and summarize"| HERMES
HERMES -->|"fetch URL"| WEB
WEB -->|"content + hidden injection"| HERMES
ATTACKER -->|"embeds instructions"| WEB
INJECTION --> DEFENSE1
INJECTION --> DEFENSE2
INJECTION --> DEFENSE3
INJECTION --> DEFENSE4
classDef teal fill:#029E73,stroke:#000000,color:#FFFFFF,stroke-width:2px
classDef blue fill:#0173B2,stroke:#000000,color:#FFFFFF,stroke-width:2px
classDef orange fill:#DE8F05,stroke:#000000,color:#FFFFFF,stroke-width:2px
classDef purple fill:#CC78BC,stroke:#000000,color:#FFFFFF,stroke-width:2px
# ~/.hermes/config.yml — comprehensive security configuration
security:
# --- Command Approval ---
command_approval:
interactive # => Never run commands without user confirmation
# => In automated pipelines, use a dedicated restricted config
# --- Secret Redaction ---
secret_redaction: true
redaction_patterns: # => Additional patterns beyond defaults
- "ghp_[A-Za-z0-9]{36}" # => GitHub personal access tokens
- "sk-[A-Za-z0-9]{48}" # => OpenAI API keys
- "AKIA[A-Z0-9]{16}" # => AWS access key IDs
- "xoxb-[0-9]+-[A-Za-z0-9]+" # => Slack bot tokens
# --- SSRF Protection ---
ssrf_protection: true
ssrf_blocklist: # => Block requests to these CIDRs in addition to defaults
- "10.0.0.0/8" # => Private class A
- "172.16.0.0/12" # => Private class B
- "192.168.0.0/16" # => Private class C
- "169.254.0.0/16" # => Link-local (AWS metadata endpoint)
- "fd00::/8" # => IPv6 ULA
# --- Network Egress Isolation ---
network_egress:
mode:
allowlist # => Options: open | allowlist | deny
# => open: all outbound connections permitted (default)
# => allowlist: only listed destinations permitted
# => deny: no outbound connections from tool calls
allowed_hosts:
- "api.anthropic.com" # => LLM API
- "api.openai.com" # => Alternative LLM API
- "*.github.com" # => GitHub API and web
- "registry.npmjs.org" # => npm registry (for MCP server installs)
# Any host not listed is blocked when mode: allowlist
# --- Prompt Injection Defense ---
prompt_injection_defense: true
injection_defense_level:
strict # => Options: basic | strict
# => basic: system prompt reminder to distrust content
# => strict: adds sandboxing markers around fetched content
# => and instructs LLM to treat as untrusted data, not instructions
# --- MCP Supply Chain ---
mcp_vetting:
require_explicit_approval: true # => Never connect to an MCP server not listed in config
audit_tool_definitions:
true # => Log all tool definitions received from MCP servers
# => Review the log before trusting a new serverMCP supply chain risk is a specific concern. An MCP server you connect to can define tools with misleading names and descriptions that cause the LLM to call them in contexts you did not intend. Before adding any community MCP server, read its tool definitions, review its npm dependency tree for unusual transitive dependencies, and run it in Docker isolation rather than directly on your machine.
Link-preview exfiltration is a subtler attack. Some messaging platforms auto-generate
previews for URLs, causing the platform to fetch URLs that appear in messages. An attacker
who can observe Hermes's output (via a shared channel, for example) can exfiltrate data
by having Hermes include a URL containing encoded data in its response, then observing the
platform's preview fetch in their server logs. Mitigation is to disable link previews in
channels where Hermes operates and to use network_egress: allowlist mode.
Key Takeaway: The four primary defenses — SSRF protection, command approval, network egress allowlisting, and strict prompt injection defense — form a layered perimeter; MCP supply chain vetting and link-preview exfiltration awareness address attack vectors specific to agent architectures.
Why It Matters: An AI agent with terminal and network access deployed without egress controls is a lateral movement vector in any network it can reach. The configuration above implements defense in depth: each layer catches attacks that slip past the previous one.
Section 31: Voice Mode
Hermes Agent includes a voice interface with ten text-to-speech (TTS) providers and five speech-to-text (STT) providers. Voice mode enables push-to-talk interaction: hold a key to record speech, release to transcribe and send, receive a spoken response. This is particularly useful for mobile contexts (Android Termux), hands-free server administration, and accessibility use cases.
# ~/.hermes/config.yml — voice mode configuration
voice:
enabled: true
# --- Speech-to-Text (STT) ---
stt:
provider:
groq # => Options: local | groq | openai | mistral | xai
# => local: runs Whisper locally (no API cost, slower)
# => groq: fast cloud transcription (~$0.0001/min)
# => openai: OpenAI Whisper API
# => mistral: Mistral STT API
# => xai: xAI STT API
api_key: ${GROQ_API_KEY} # => Required for non-local providers
language: en # => Transcription language hint (improves accuracy)
model: whisper-large-v3-turbo # => Provider-specific model (Groq)
# --- Text-to-Speech (TTS) ---
tts:
provider:
edge_tts # => Options: edge_tts | elevenlabs | openai | minimax |
# => voxtral | gemini | xai | neutts | kittentts | piper
# => edge_tts: free, Microsoft Edge TTS (no API key)
# => elevenlabs: highest quality, most expensive
# => openai: OpenAI TTS, good quality
# => piper: local TTS, no API, runs on device
voice:
en-US-AriaNeural # => Voice ID (provider-specific)
# => edge_tts voices: run `edge-tts --list-voices`
speed: 1.0 # => Playback speed (1.0 = normal, 1.3 = faster)
api_key: null # => Not required for edge_tts
# --- Push-to-Talk ---
push_to_talk:
key: ctrl+space # => Hold to record, release to send
noise_threshold: 0.02 # => Minimum amplitude to start recording (reduce background noise)
silence_timeout_ms: 1500 # => Auto-stop recording after 1.5s of silenceTTS provider comparison by use case:
| Provider | Quality | Cost | Latency | Requires API | Best For |
|---|---|---|---|---|---|
| piper | Good | Free | Low | No | Offline / air-gapped |
| edge_tts | Good | Free | Medium | No | Default starting point |
| openai | High | $0.015/1K | Low | Yes | Production quality |
| elevenlabs | Highest | $0.18/1K | Low | Yes | Highest fidelity |
| minimax | High | Low | Low | Yes | Cost-efficient quality |
STT provider comparison:
| Provider | Quality | Cost | Latency | Best For |
|---|---|---|---|---|
| local | Good | Free | High | Privacy-sensitive / offline |
| groq | High | ~$0.0001/min | Lowest | Fast, cheap cloud transcription |
| openai | High | $0.006/min | Low | Existing OpenAI integration |
| mistral | Good | Low | Low | Mistral ecosystem |
Key Takeaway: Voice mode provides push-to-talk interaction through configurable STT
and TTS providers; edge_tts (free, no API key) and groq STT are the lowest-friction
starting combination.
Why It Matters: Voice interaction removes the keyboard barrier for mobile and hands-free contexts — server rooms, mobile SSH sessions, accessibility requirements. The pluggable provider model means you can use free local models in offline environments and switch to higher-quality cloud providers when quality is the priority.
Section 32: Docker Terminal Backend
The Docker terminal backend runs all Hermes bash tool calls inside an isolated container, using a shared or mounted filesystem to exchange files between the host and the container. This provides the strongest isolation available without requiring a remote host or cloud provider. It is the recommended backend for working with untrusted repositories, running experiments that might damage the local environment, or providing a reproducible execution context across team members.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% All colors are color-blind friendly and meet WCAG AA contrast standards
graph LR
HERMES["Hermes Agent<br/>(host process)"]:::blue
CONTAINER["Docker Container<br/>(isolated execution)"]:::orange
WORKSPACE["/workspace<br/>(mounted volume)"]:::teal
HOST_FS["Host filesystem<br/>(selected directories only)"]:::purple
NETWORK_NONE["Network: none<br/>(optional air-gap)"]:::brown
HERMES -->|"bash tool call"| CONTAINER
CONTAINER -->|"reads / writes"| WORKSPACE
WORKSPACE -->|"volume mount"| HOST_FS
CONTAINER -.->|"blocked"| NETWORK_NONE
classDef blue fill:#0173B2,stroke:#000000,color:#FFFFFF,stroke-width:2px
classDef orange fill:#DE8F05,stroke:#000000,color:#FFFFFF,stroke-width:2px
classDef teal fill:#029E73,stroke:#000000,color:#FFFFFF,stroke-width:2px
classDef purple fill:#CC78BC,stroke:#000000,color:#FFFFFF,stroke-width:2px
classDef brown fill:#CA9161,stroke:#000000,color:#FFFFFF,stroke-width:2px
# ~/.hermes/config.yml — Docker terminal backend (full configuration)
terminal:
backend: docker
docker:
image:
python:3.12-slim # => Base image; choose to match your workload
# => python:3.12-slim for Python work
# => node:24-slim for Node.js work
# => ubuntu:24.04 for general shell work
volumes:
- "${HOME}/projects:/workspace" # => Mount host projects directory as /workspace in container
# => Container code sees /workspace; host sees ~/projects
- "/tmp/hermes-cache:/cache" # => Optional: persistent cache across container restarts
network:
bridge # => Options: none | bridge | host
# => none: fully air-gapped (most secure)
# => bridge: internet access via Docker NAT (default)
# => host: same network as host (least isolated)
resource_limits:
memory: 2g # => Container hard memory cap (OOM-killed if exceeded)
cpus: 2.0 # => Maximum CPU cores (soft limit)
pids: 512 # => Maximum process count (prevents fork bombs)
user:
1000:1000 # => Run container as non-root UID:GID
# => Prevents container root from writing root-owned files
# => to mounted host volumes
remove_after_session:
true # => Delete container and its writable layer on session end
# => Prevents accumulation of stale containers
persistent_container:
false # => If true, reuse the same container across sessions
# => Faster startup, but state accumulates between sessions
# => Set true for development containers with heavy deps
extra_args: # => Raw Docker flags passed to `docker run`
- "--read-only" # => Container filesystem is read-only (writes only to volumes)
- "--security-opt=no-new-privileges"
# => Prevent privilege escalation inside container
- "--cap-drop=ALL" # => Drop all Linux capabilities (most restrictive)
- "--cap-add=DAC_OVERRIDE" # => Re-add only what is neededFor reproducible team environments, commit the config.yml Docker section to your
repository without secrets. Team members pull the config and get the same container
image, volume mounts, and resource limits. Differences in host OS (Linux vs. macOS)
are absorbed by Docker.
A common pitfall with remove_after_session: true is losing installed dependencies.
If your container installs Python packages during a session, they are gone when the
container is removed. Use a Dockerfile with dependencies pre-installed and build a
custom image rather than relying on runtime installation in a disposable container.
Key Takeaway: The Docker backend runs all bash tool calls in an isolated container
with configurable volume mounts, network mode, resource limits, and security options;
--cap-drop=ALL, non-root user, and network: none together provide the strongest
isolation available without a remote host.
Why It Matters: Untrusted repositories and experimental scripts that might damage the local environment are a real risk. Running Hermes in Docker mode means a runaway script or malicious skill can only damage the container — the host filesystem is protected by the volume mount boundary.
Section 33: Advanced Subagent Patterns
Basic subagent delegation (Intermediate Section 19) covers spawning and waiting for results. Advanced patterns address orchestration structures that arise in production: the orchestrator-specialist hierarchy, result aggregation strategies, failure handling across concurrent agents, and cost attribution.
The orchestrator-specialist pattern uses the parent Hermes agent as a pure coordinator that decomposes tasks and interprets results, while specialized subagents do the actual work with minimal toolsets. This separation keeps each agent's context clean and focused.
# Example orchestrator task (what you send to the parent Hermes):
# "Audit the three microservices in ./services/ for security issues.
# Check auth, input validation, and dependency vulnerabilities in each."
# Parent Hermes decomposes this into three concurrent subagent calls:
# Subagent 1: Auth audit for services/auth-service/
# Parameters:
# task: "Audit auth-service for authentication issues:
# - Check all endpoints for missing auth guards
# - Verify JWT validation: expiry, signature, algorithm pinning
# - Check for hardcoded credentials"
# toolset: [terminal] # => Read-only terminal access
# working_directory: ./services/auth-service
# model: cheap_model # => Cheaper model for systematic pattern matching
# Subagent 2: Input validation audit for services/payment-service/
# Parameters:
# task: "Audit payment-service for input validation issues:
# - Find all SQL queries; verify parameterization
# - Find all user inputs; verify sanitization
# - Check for path traversal in file operations"
# toolset: [terminal]
# working_directory: ./services/payment-service
# model: cheap_model
# Subagent 3: Dependency audit across all services
# Parameters:
# task: "Run npm audit in each service directory, collect results,
# identify critical and high vulnerabilities, group by CVE"
# toolset: [terminal]
# model: primary # => Primary model for reasoning across resultsFailure handling across concurrent subagents requires explicit configuration. By default,
if one subagent fails or times out, the parent waits for the remaining agents and reports
the failure alongside successful results. For tasks where partial results are unacceptable,
configure fail_fast:
# ~/.hermes/config.yml — advanced delegation configuration
delegation:
max_concurrent: 3
default_timeout_seconds: 300
failure_handling:
mode:
continue # => Options: continue | fail_fast
# => continue: collect all results, report failures inline
# => fail_fast: kill all agents on first failure
retry_failed:
false # => Do not automatically retry failed subagents
# => Set true + retry_limit: 2 for transient failures
cost_attribution:
enabled: true # => Track cost per subagent separately
label_by: task_name # => Attribute cost to the task name, not just "delegation"Result aggregation strategy determines how the parent synthesizes subagent output. Three patterns emerge in practice:
- Merge: Combine outputs into a unified report (appropriate for audits, searches).
- Vote: Take the majority position when subagents give conflicting answers (for MoA-style verification tasks).
- Pipeline: Pass one subagent's output as the next subagent's input (sequential processing with intermediate agents).
# Example: pipeline aggregation
# Stage 1 subagent: Fetch and parse raw data
# => Output: structured JSON with 500 entries
# Stage 2 subagent (receives stage 1 output):
# Input: structured JSON from stage 1
# Task: "Filter entries where revenue > 50000 and status == 'active', compute totals"
# => Output: filtered dataset, 87 entries, total revenue $5.2M
# Stage 3 subagent (receives stage 2 output):
# Input: filtered dataset from stage 2
# Task: "Format as an executive summary: top 5 accounts, trends, recommendations"
# => Output: formatted report
# Parent returns stage 3 output to the userKey Takeaway: The orchestrator-specialist pattern keeps each agent's context focused;
failure handling modes (continue vs. fail_fast) and result aggregation strategies
(merge, vote, pipeline) are the key design decisions for production multi-agent workflows.
Why It Matters: Multi-agent parallelism is the primary technique for reducing wall-clock time on decomposable production tasks. Getting failure handling right prevents a single flaky subagent from silently corrupting a merged result or causing unnecessary full restarts.
Section 34: Custom Tool Development
Hermes Agent supports custom tools written as Python modules. A custom tool extends the
BaseTool class, defines its input schema using JSON Schema, implements an execute method,
and is registered in a custom toolset. Custom tools appear in /tools listing and are
available to the LLM alongside built-in tools.
# ~/.hermes/custom_tools/jira_tool.py
# Custom tool: query Jira issues from within Hermes sessions
from hermes.tools import BaseTool, ToolResult # => Import base class and result type
from typing import Optional
import httpx # => HTTP client for Jira REST API
class JiraQueryTool(BaseTool):
"""Query Jira issues by project key and status filter."""
# => Docstring becomes the tool description
# => shown to the LLM in the tool schema
name = "jira_query" # => Tool name in function-calling schema
toolset = "jira" # => Toolset this tool belongs to
# JSON Schema defining the tool's input parameters
# => The LLM reads this schema to understand what arguments to provide
input_schema = {
"type": "object",
"properties": {
"project_key": {
"type": "string", # => e.g., "PROJ"
"description": "Jira project key (e.g., 'PROJ', 'BACKEND')"
},
"status": {
"type": "string",
"enum": ["Open", "In Progress", "Done", "Blocked"],
# => Restrict to valid Jira statuses
"description": "Filter issues by status"
},
"max_results": {
"type": "integer",
"default": 10, # => Return 10 results unless specified
"description": "Maximum number of issues to return (default: 10)"
}
},
"required": ["project_key"] # => Only project_key is required
}
async def execute(
self,
project_key: str,
status: Optional[str] = None,
max_results: int = 10
) -> ToolResult:
"""Execute the Jira query and return formatted results."""
# Build JQL query
jql = f"project = {project_key}" # => Base JQL: all issues in project
if status:
jql += f" AND status = '{status}'" # => Append status filter if provided
jql += " ORDER BY updated DESC" # => Most recently updated first
# Make Jira REST API call
async with httpx.AsyncClient() as client:
response = await client.get(
f"{self.config.jira_url}/rest/api/3/search",
# => self.config reads from config.yml custom section
params={
"jql": jql,
"maxResults": max_results,
"fields": "summary,status,assignee,priority"
},
headers={
"Authorization": f"Bearer {self.config.jira_api_token}"
# => Token from config; never hardcoded
}
)
if response.status_code != 200:
# Return error result — LLM receives this and can report or retry
return ToolResult.error(
f"Jira API error {response.status_code}: {response.text}"
)
issues = response.json()["issues"] # => Parse response JSON
# Format results as readable text for the LLM
if not issues:
return ToolResult.success("No issues found matching the query.")
lines = [f"Found {len(issues)} issues in {project_key}:\n"]
for issue in issues:
fields = issue["fields"]
lines.append(
f"- {issue['key']}: {fields['summary']} "
f"[{fields['status']['name']}]"
f"{' → ' + fields['assignee']['displayName'] if fields.get('assignee') else ''}"
)
return ToolResult.success("\n".join(lines))# ~/.hermes/config.yml — register the custom toolset
tools:
enabled:
- terminal
- web
- jira # => Enable the custom jira toolset
custom_toolsets:
jira: # => Toolset name (must match tool.toolset attribute)
module_path:
~/.hermes/custom_tools/jira_tool.py
# => Path to the Python module defining the tool
config:
jira_url:
https://myorg.atlassian.net
# => Config values passed to self.config in the tool
jira_api_token:
${JIRA_API_TOKEN}
# => API token from Jira account settingsTest a custom tool locally before enabling it in a session:
# Test the tool without starting a full Hermes session
hermes tools test --tool jira_query --input '{"project_key": "BACKEND", "status": "In Progress"}'
# => Executes the tool's execute() method directly
# => Prints the ToolResult to stdout
# => Useful for debugging without burning LLM tokensKey Takeaway: Custom tools extend BaseTool, define a JSON Schema for inputs, and
implement an async execute method; they register under a custom toolset name in
config.yml and appear in /tools alongside built-in tools.
Why It Matters: Generic tools cover common use cases. Domain-specific tools — querying your ticketing system, calling internal APIs, interfacing with proprietary data stores — make Hermes genuinely useful for your specific workflow. The tool API is stable and straightforward enough that a custom tool is typically a few hours of work.
Section 35: Production Deployment
Running Hermes Agent reliably in production means converting it from an interactive CLI into a managed background service. This section covers systemd (Linux) and launchd (macOS) service configuration, health monitoring, cost budgets, and alerting.
A production Hermes deployment typically runs the messaging gateway as a persistent service and the cron scheduler as a companion process. The CLI remains available for interactive sessions but is not part of the production service.
# /etc/systemd/system/hermes-gateway.service
# systemd unit file for the Hermes messaging gateway
[Unit]
Description=Hermes Agent Messaging Gateway
After=network-online.target # => Wait for network before starting
Wants=network-online.target
StartLimitIntervalSec=60 # => Rate-limit restarts: max 3 in 60 seconds
StartLimitBurst=3
[Service]
Type=simple
User=hermes # => Run as dedicated non-root user
Group=hermes
WorkingDirectory=/home/hermes # => Working directory for the process
# Load environment variables from a secrets file (not checked into version control)
EnvironmentFile=/etc/hermes/secrets.env # => Contains ANTHROPIC_API_KEY=sk-... etc.
ExecStart=/home/hermes/.local/bin/hermes gateway
# => Start the gateway (blocks; systemd manages lifecycle)
Restart=on-failure # => Restart only on non-zero exit (not on clean stop)
RestartSec=10 # => Wait 10 seconds before restarting
# Resource limits at OS level (defense-in-depth alongside config.yml limits)
LimitNOFILE=65536 # => Maximum open file descriptors
MemoryMax=1G # => Hard memory cap via cgroup
# Harden the service process
NoNewPrivileges=true # => Prevent privilege escalation
ProtectSystem=strict # => /usr, /boot, /etc are read-only
ProtectHome=false # => Allow access to /home/hermes (needed for config)
PrivateTmp=true # => Isolated /tmp per service instance
[Install]
WantedBy=multi-user.target# Deploy and start the systemd service
sudo systemctl daemon-reload # => Reload systemd unit files
sudo systemctl enable hermes-gateway # => Enable auto-start on boot
sudo systemctl start hermes-gateway # => Start the service now
# Check service status
sudo systemctl status hermes-gateway
# => Active: active (running) since 2026-05-22 09:00:01 WIB
# Tail service logs (systemd journal)
sudo journalctl -u hermes-gateway -f # => Follow logs in real time<!-- ~/Library/LaunchAgents/dev.hermes.gateway.plist -->
<!-- launchd plist for macOS production deployment -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>dev.hermes.gateway</string> <!-- Unique service identifier -->
<key>ProgramArguments</key>
<array>
<string>/Users/me/.local/bin/hermes</string>
<string>gateway</string> <!-- Run gateway subcommand -->
</array>
<key>EnvironmentVariables</key>
<dict>
<key>ANTHROPIC_API_KEY</key>
<string>sk-ant-...</string> <!-- Production API key -->
</dict>
<key>RunAtLoad</key>
<true/> <!-- Start when user logs in -->
<key>KeepAlive</key>
<true/> <!-- Restart if process exits -->
<key>StandardOutPath</key>
<string>/tmp/hermes-gateway.log</string>
<key>StandardErrorPath</key>
<string>/tmp/hermes-gateway-error.log</string>
</dict>
</plist># Load the launchd plist (macOS)
launchctl load ~/Library/LaunchAgents/dev.hermes.gateway.plist
# Check status
launchctl list | grep hermesCost budget alerting in production:
# ~/.hermes/config.yml — production cost alerting
cost:
daily_budget_usd: 10.00 # => Alert threshold
budget_action: warn # => warn | block
alert_channel: telegram # => Send budget alert via Telegram
alert_recipient: 123456789 # => Your Telegram user IDKey Takeaway: Production Hermes runs as a systemd (Linux) or launchd (macOS) service with a dedicated non-root user, hardened process isolation, environment-file secrets management, and cost budget alerting via the messaging gateway.
Why It Matters: An agent that requires manual restart after every server reboot, runs as root, or stores API keys in the config file is not production-ready. The service configuration above reflects the baseline operational standards that make a long-running agent trustworthy.
Section 36: Reinforcement Learning Toolset
The rl toolset integrates Hermes Agent with reinforcement learning workflows — logging
task outcomes as reward signals, querying training history, and exporting datasets for
model fine-tuning. This toolset is the mechanism through which Hermes's behavior can be
systematically improved beyond the heuristic skill-creation loop.
The RL toolset does not train models inline — it collects data. The training loop runs externally (using a framework of your choice) against exported datasets. Hermes generates the data; your ML pipeline consumes it.
# ~/.hermes/config.yml — RL toolset configuration
tools:
enabled:
- rl # => Enable the RL toolset
rl:
storage_path: ~/.hermes/rl_data/ # => Where reward logs and datasets are stored
auto_log:
true # => Automatically log all task completions as episodes
# => Each session becomes an episode with implicit reward
reward_model:
binary # => Options: binary | scalar | comparative
# => binary: success/failure (1 or 0)
# => scalar: numeric reward (0.0 to 1.0)
# => comparative: A/B comparison between runs# Log a reward signal for a completed task (from within a session)
# => Records the current session as a successful episode with reward 1.0
/tools rl.log_reward --reward 1.0 --note "Deployment completed without errors"
# Log a negative reward for a failed task
/tools rl.log_reward --reward 0.0 --note "Deployment failed: TypeScript errors not caught"
# Query recent reward history
/tools rl.history --limit 10
# => Output:
# EPISODE REWARD DATE NOTE
# ses_20260522_143012_a7b3 1.0 2026-05-22 14:30 Deployment completed without errors
# ses_20260521_091823_c4d2 0.0 2026-05-21 09:18 Deployment failed: TypeScript errors
# ses_20260520_171244_e5f1 1.0 2026-05-20 17:12 Security audit: 3 issues found
# Export collected episodes as a dataset for fine-tuning
hermes rl export --format jsonl --output ./training_data.jsonl
# => Exports all episodes with reward labels to JSONL format
# => Each line: {"messages": [...], "reward": 1.0, "metadata": {...}}
# => Compatible with OpenAI fine-tuning API and most ML frameworksThe exported dataset contains the full conversation — system prompt, user messages, agent responses, tool calls, and tool results — alongside the reward label. This enables supervised fine-tuning that reinforces successful patterns and demotes unsuccessful ones.
For teams with the resources to run fine-tuning, the RL toolset provides a systematic data collection pipeline. For most teams, the RL toolset is most useful as a structured outcome log that informs manual skill editing rather than automated training.
Key Takeaway: The RL toolset logs task outcomes as reward signals and exports them as training datasets compatible with standard ML fine-tuning pipelines — enabling systematic behavior improvement beyond the heuristic skill-creation loop.
Why It Matters: The skill system improves Hermes within the bounds of its current model. RL-based fine-tuning changes the model itself, compounding improvements at a deeper level. For teams doing repetitive high-stakes tasks, the data collection investment pays off in measurably better agent behavior over time.
Section 37: Advanced Memory Architecture
Memory in Hermes Agent has two layers: the human-readable markdown files (MEMORY.md and
USER.md) and the SQLite-backed FTS5 session store. Understanding both layers — their
schemas, their interaction, and their failure modes — enables effective pruning, backup,
and recovery strategies for long-running production deployments.
The SQLite database lives at ~/.hermes/sessions.db. Its schema has three primary tables:
-- sessions table: one row per Hermes session
CREATE TABLE sessions (
id TEXT PRIMARY KEY, -- Session ID: ses_YYYYMMDD_HHMMSS_xxxx
started_at TEXT NOT NULL, -- ISO 8601 timestamp
ended_at TEXT, -- NULL if session is still active
model TEXT NOT NULL, -- Primary model used
total_tokens INTEGER DEFAULT 0, -- Cumulative token count
total_cost_usd REAL DEFAULT 0.0, -- Cumulative cost in USD
metadata TEXT -- JSON blob for extensible metadata
);
-- messages table: one row per message in any session
CREATE TABLE messages (
id TEXT PRIMARY KEY,
session_id TEXT NOT NULL REFERENCES sessions(id),
role TEXT NOT NULL, -- 'user' | 'assistant' | 'tool'
content TEXT NOT NULL, -- Full message text (indexed by FTS5)
tool_name TEXT, -- Non-null for role='tool'
tokens INTEGER, -- Token count for this message
created_at TEXT NOT NULL
);
-- FTS5 virtual table: full-text search index over message content
CREATE VIRTUAL TABLE messages_fts USING fts5(
content, -- Indexed column (message text)
content='messages', -- Backing table
content_rowid='rowid' -- Row linkage
);Memory pruning strategies for large session stores:
# Check database size
du -sh ~/.hermes/sessions.db
# => 2.3G
# Count total sessions
hermes db stats
# => Sessions: 1,847
# => Messages: 84,211
# => Database size: 2.3 GB
# => Oldest session: 2025-11-01
# Archive sessions older than 180 days
# => Moves rows to sessions_archive table, removes from main tables and FTS index
hermes db archive --before 180d
# => Archived 412 sessions (2025-11-01 to 2026-02-22)
# => Database size after: 1.1 GB
# Export archived sessions to a compressed file before deleting
hermes db export --archived --output ~/hermes-archive-2026-02.jsonl.gz
# => Exports archived sessions as compressed JSONL
# Delete archived sessions (irreversible)
hermes db purge --archived --confirm
# => Deleted 412 archived sessions
# => Database size after: 0.9 GB
# Rebuild FTS5 index (run after bulk operations to ensure consistency)
hermes db reindex
# => Rebuilds messages_fts from messages table
# => Takes 30-60 seconds for large databases
# Backup the database
cp ~/.hermes/sessions.db ~/backup/hermes-sessions-$(date +%Y%m%d).db
# => SQLite files are safe to copy when Hermes is not running
# => For running Hermes: use hermes db backup --output ~/backup/MEMORY.md and USER.md pruning is manual. As these files grow, older or superseded
entries accumulate. Review them periodically and remove entries that no longer reflect
current project state or preferences. Hermes does not automatically remove entries —
it only appends.
Key Takeaway: The SQLite session store uses FTS5 for search across a three-table schema
(sessions, messages, messages_fts); hermes db archive, export, and purge manage
growth; MEMORY.md and USER.md require periodic manual review and pruning.
Why It Matters: A 2+ GB session database with no pruning strategy is a support incident waiting to happen. The archive-export-purge cycle keeps the database at a manageable size while preserving the ability to recover historical context from compressed exports if needed.
Section 38: Custom LLM Provider
Beyond the built-in provider adapters (Anthropic, OpenAI, Google, DeepSeek) and OpenRouter,
Hermes supports custom LLM providers through a plugin interface. A custom provider
implements the BaseLLMProvider class, handles authentication and request formatting for
a proprietary or self-hosted endpoint, and registers in config.yml.
# ~/.hermes/custom_providers/vllm_provider.py
# Custom provider: connect to a self-hosted vLLM instance
from hermes.llm import BaseLLMProvider, LLMResponse, LLMMessage
from typing import AsyncIterator
import httpx
class VLLMProvider(BaseLLMProvider):
"""Provider for self-hosted vLLM OpenAI-compatible endpoint."""
name = "vllm" # => Provider name used in config.yml
def __init__(self, config: dict):
self.base_url = config["base_url"] # => e.g., http://gpu-server:8000
self.model = config["model"] # => vLLM model name
self.api_key = config.get("api_key", "none")
# => vLLM can run without auth
async def complete(
self,
messages: list[LLMMessage],
stream: bool = True,
**kwargs
) -> AsyncIterator[LLMResponse]:
"""Send messages to vLLM and stream the response."""
# vLLM exposes an OpenAI-compatible /v1/chat/completions endpoint
payload = {
"model": self.model,
"messages": [{"role": m.role, "content": m.content} for m in messages],
# => Convert LLMMessage to OpenAI format
"stream": stream,
"temperature": kwargs.get("temperature", 0.7)
}
async with httpx.AsyncClient(timeout=60.0) as client:
async with client.stream(
"POST",
f"{self.base_url}/v1/chat/completions",
json=payload,
headers={"Authorization": f"Bearer {self.api_key}"}
) as response:
async for line in response.aiter_lines():
if line.startswith("data: ") and line != "data: [DONE]":
chunk = line[6:] # => Strip "data: " prefix
yield LLMResponse.from_openai_chunk(chunk)
# => Convert OpenAI chunk format to Hermes format
def count_tokens(self, text: str) -> int:
"""Estimate token count for cost tracking."""
# => Approximate: 4 characters per token (adjust per model)
return len(text) // 4# ~/.hermes/config.yml — register and use the custom provider
llm:
primary: llama-3.1-70b # => Model name as vLLM knows it
provider: vllm # => Selects the custom vllm provider
cheap_model: llama-3.1-8b
custom_providers:
vllm:
module_path:
~/.hermes/custom_providers/vllm_provider.py
# => Path to the provider Python module
config:
base_url:
http://gpu-server.internal:8000
# => Self-hosted vLLM endpoint
model: llama-3.1-70b-instruct # => Exact model ID as loaded in vLLM
api_key: null # => No auth on internal network (not recommended for prod)For truly air-gapped environments, a vLLM or Ollama custom provider enables Hermes to
operate with zero external API calls. Combined with the Docker terminal backend using
network: none and local STT/TTS providers, this gives a fully offline Hermes deployment.
Key Takeaway: Custom LLM providers implement BaseLLMProvider with a complete method
and count_tokens method, register via config.yml, and enable Hermes to work with
self-hosted or proprietary LLM endpoints.
Why It Matters: Data residency requirements, cost structures, and air-gap security policies all drive the need for self-hosted LLM deployment. A custom provider integration makes Hermes viable in these environments without forking the project.
Section 39: Migrating from OpenClaw Deep Dive
hermes claw migrate is a purpose-built migration tool that reads your OpenClaw
installation and produces a Hermes configuration with equivalent settings, memory, skills,
and messaging platform credentials. The Beginner section covered the command overview;
this section covers the internals, directory mapping, format conversion, and edge cases.
The migrator reads from ~/.openclaw/ and writes to ~/.hermes/. It operates in three
modes: --dry-run (inspect, no writes), --preset user-data (no secrets), and
--preset full (includes secrets). The default is --preset user-data.
# Run a detailed dry-run to see exactly what will happen
hermes claw migrate --dry-run --verbose
# => OUTPUT (example):
# Reading OpenClaw installation at: ~/.openclaw/
# Configuration mapping:
# openclaw.llm.model → hermes.llm.primary
# openclaw.llm.cheap_model → hermes.llm.cheap_model
# openclaw.tools.enabled → hermes.tools.enabled (toolset names remapped)
# openclaw.security.auto_approve → hermes.security.command_approval: auto
# openclaw.memory.file → hermes.memory.memory_file
# Memory migration:
# ~/.openclaw/MEMORY.md (4.2 KB, 87 lines)
# → ~/.hermes/MEMORY.md (direct copy, no conversion needed)
# ~/.openclaw/USER.md (1.1 KB, 23 lines)
# → ~/.hermes/USER.md (direct copy)
# Skills migration (3 skills found):
# ~/.openclaw/skills/deploy.yml (OpenClaw format)
# → ~/.hermes/skills/deploy.yml (converted to Hermes format)
# DIFF: OpenClaw 'steps[].cmd' renamed to 'steps[].command'
# OpenClaw 'steps[].check' renamed to 'steps[].verify'
# ~/.openclaw/skills/test-workflow.yml → ~/.hermes/skills/test-workflow.yml
# ~/.openclaw/skills/db-backup.yml → ~/.hermes/skills/db-backup.yml
# Platform credentials (preset: user-data — SKIPPED):
# ~/.openclaw/config.yml: telegram.token → SKIPPED (use --preset full to include)
# ~/.openclaw/config.yml: discord.token → SKIPPED
# Sessions migration:
# OpenClaw session format: JSON files in ~/.openclaw/sessions/
# Hermes session format: SQLite at ~/.hermes/sessions.db
# 312 session files found → will be imported to SQLite and FTS5-indexed
# NOTE: Session import may take 2-3 minutes for 312 sessions
# Estimated result:
# ~/.hermes/config.yml (new file)
# ~/.hermes/MEMORY.md (new file, direct copy)
# ~/.hermes/USER.md (new file, direct copy)
# ~/.hermes/skills/ (3 converted skill files)
# ~/.hermes/sessions.db (new database, 312 imported sessions)The skill format conversion handles the key differences between OpenClaw and Hermes skill
schemas. OpenClaw uses cmd for commands and check for verification steps; Hermes uses
command and verify. The converter renames fields and adds the level field (defaulting
to frequent) for all migrated skills.
Session import converts OpenClaw's flat JSON file format into Hermes's SQLite schema. The FTS5 index is built over all imported sessions after import, making historical sessions searchable immediately.
# Run the actual migration (user-data preset)
hermes claw migrate --preset user-data
# => Writes config, memory, skills
# => Imports sessions to SQLite (takes 2-3 minutes for large session histories)
# => Does NOT copy API keys
# After migration: add API keys manually to config.yml
# => Open ~/.hermes/config.yml and fill in the api_key fields
# => Reference environment variables: api_key: ${ANTHROPIC_API_KEY}
# Verify the migration
hermes --verify-config
# => Checks config.yml for syntax errors
# => Tests LLM provider connectivity (requires API key)
# => Checks that all referenced skill files exist
# => Reports any issues before you start your first session
# Start your first Hermes session post-migration
hermes
# => Sessions, memory, and skills from OpenClaw are immediately availableKey Takeaway: hermes claw migrate --dry-run --verbose shows a complete migration plan
including field mappings, skill format conversions, and session import scope before writing
anything; --preset user-data applies the migration without copying API keys.
Why It Matters: Migration tools are only valuable if they are safe to run experimentally. The dry-run mode with verbose output makes the migration transparent and reversible — you understand exactly what will change before committing to it.
Section 40: Contributing to Hermes Agent
Hermes Agent is MIT-licensed and maintained by Nous Research with an active open-source community. Contributing a tool, a bug fix, a skill, or a documentation improvement follows a standard GitHub PR workflow. Understanding the codebase structure and local development setup reduces the friction of a first contribution.
The codebase is organized as a Python monorepo:
hermes-agent/
├── hermes/ # Main Python package
│ ├── cli/ # TUI and CLI entry points (prompt_toolkit)
│ ├── gateway/ # Messaging platform adapters (Telegram, Discord, etc.)
│ ├── tools/ # Built-in toolset implementations
│ │ ├── terminal.py # bash, read, write tools
│ │ ├── web.py # search, fetch tools
│ │ ├── memory.py # MEMORY.md / USER.md tools
│ │ ├── skills.py # skill CRUD tools
│ │ └── ... # one file per toolset
│ ├── llm/ # LLM provider adapters
│ │ ├── base.py # BaseLLMProvider abstract class
│ │ ├── anthropic.py # Anthropic adapter
│ │ ├── openai.py # OpenAI adapter
│ │ └── ...
│ ├── memory/ # MEMORY.md and USER.md management
│ ├── skills/ # Skill store and progressive disclosure
│ ├── sessions/ # SQLite session store and FTS5 search
│ ├── delegation/ # Subagent spawning and coordination
│ └── security/ # Command approval, SSRF, secret redaction
├── tests/ # Test suite (pytest)
│ ├── unit/ # Fast unit tests (no LLM API, no network)
│ ├── integration/ # Integration tests (real LLM API, mocked tools)
│ └── e2e/ # End-to-end tests (full session)
├── scripts/
│ └── install.sh # One-line installer
└── pyproject.toml # Poetry configuration, dependencies, tool config
# Set up a local development environment
git clone https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
# Install Poetry (Python dependency manager) if not present
curl -sSL https://install.python-poetry.org | python3 -
# Install all dependencies including dev dependencies
poetry install --with dev
# => Creates a virtual environment and installs all deps
# Run the test suite
poetry run pytest tests/unit/ # => Fast unit tests only (no API calls)
poetry run pytest tests/integration/ # => Integration tests (requires LLM API key)
# Run Hermes from source (development mode)
poetry run hermes --version # => Should show dev version
# Run linting
poetry run ruff check hermes/ # => Fast Python linter
poetry run mypy hermes/ # => Type checking
# Run all quality checks (same as CI)
poetry run make checkContributing a new tool:
- Create
hermes/tools/mytool.pyimplementingBaseToolsubclasses. - Register the toolset in
hermes/tools/__init__.py. - Add unit tests in
tests/unit/tools/test_mytool.py. - Add documentation in
docs/tools/mytool.md. - Open a PR targeting the
mainbranch.
The PR template asks for: a description of what the tool does, why it belongs in core rather than as a custom tool, test coverage, and any security considerations (does it make network requests? does it write files? does it execute code?).
For skills contributions, the Skills Hub accepts PR submissions directly to the
skills-hub branch of the repository. Community skills go through a lightweight review:
the skill must have a summary, must specify a level, and must not contain hardcoded
secrets or URLs.
Key Takeaway: Hermes Agent's Python monorepo uses Poetry for dependency management and
pytest for testing; new tools follow the BaseTool pattern, include unit tests, and go
through a standard GitHub PR review focused on functionality and security considerations.
Why It Matters: Open-source AI agent infrastructure is only as good as its contributor community. Understanding the contribution path lowers the barrier for teams to upstream their custom tools, making the ecosystem richer for everyone while reducing the maintenance burden of carrying internal forks.
Last updated May 21, 2026