Step-by-Step OpenClaw Tutorial 2026: Zero to Hero Deployment Guide

By Sarah Jenkins

OpenClaw: From Beginner to Master

OpenClaw is an open-source, self-driven AI agent. This book combines best practices to provide a full-lifecycle guide from initial setup to application deployment, while deeply deconstructing its underlying operational mechanisms and implementation principles.

Key Features

  • Practical Orientation: Build a minimum viable loop from scratch with ready-to-use configuration templates.
  • Mechanism Analysis: In-depth breakdown of core components including Gateway, Agent Loop, tool systems, sessions, and memory.
  • Production Ready: Focuses on reliability, security hardening, runtime monitoring, and troubleshooting.

5-Minute Quick Start

New to OpenClaw? Experience it in three steps:

  1. Install (1 min): curl -fsSL https://openclaw.ai/install.sh | bash
  2. Initialize (2 mins): Run openclaw and follow the wizard to configure your API keys.
  3. Chat (2 mins): Type "Hello" in WebChat. Success is confirmed once the AI replies! 🎉

Recommended Reading

This book is part of an AI technical series. The following titles provide complementary knowledge:

| Book Title | Relationship to This Book |
| --- | --- |
| AI for Beginners | Foundational AI knowledge for those without a technical background. |
| Prompt Engineering Guide | Theoretical basis for designing effective agent prompts. |
| Context Engineering Guide | Managing agent context and memory architecture design. |
| Claude Technical Guide | Claude's MCP protocol, tool use, and Agentic Coding. |
| Agentic AI Definitive Guide | General agent architectures and multi-agent collaboration patterns. |
| AI Security Definitive Guide | Security design and defense practices for agent systems. |
| LLM Internals & Architecture | Deep dive into the logic and structure of Large Language Models. |

Contribution & Feedback

Issues and Pull Requests are welcome, especially regarding: typo corrections, broken link fixes, practical case studies, and reusable templates.

Chapter 1: Understanding OpenClaw

Welcome to the world of OpenClaw. If you have ever used ChatGPT or Claude, you might have been amazed by the intelligence of Large Language Models (LLMs). However, when you try to get them to "actually get the job done"—such as automatically checking emails, organizing data, and sending it to a Lark group—you quickly realize there is a missing link: a system framework that connects AI with real-world software.

OpenClaw is exactly that: an Agent project designed to bridge the gap between LLMs and the real world, allowing AI to complete complex tasks for you automatically and securely.

As the opening of this book, this chapter aims to build a clear and systematic "cognitive map," outlining the core problems OpenClaw solves and defining its system boundaries. Through this chapter, you will establish a comprehensive understanding of OpenClaw.


Chapter Roadmap

This chapter includes the following sections:

  • 1.1 The Evolution of LLMs and the Birth of OpenClaw: Understand what OpenClaw can do for you. Explore the engineering pain points encountered when deploying LLMs in real business scenarios and how OpenClaw resolves them.
  • 1.2 Architecture and Core Concepts: A diagram-based breakdown of OpenClaw’s "Four-Layer Architecture" and its "Five Core Objects" (Gateway, Agent, Node, Tool, Session).
  • 1.3 Comparison with Other Solutions: Compare the fundamental differences between OpenClaw and ChatGPT, Cursor, or traditional automation tools (like Zapier) to help you choose the right tool for the job.
  • 1.4 Boundaries and Risk Awareness: What OpenClaw is suitable for, what it is not, the security and operational risks involved, and the often-underestimated issue of Token costs.
  • 1.5 Typical Potential Use Cases: A look at the ecosystem’s capabilities, featuring "hardcore" practical ideas such as personal ChatOps, home automation gateways, and security experimental platforms.
  • 1.6 Chapter Summary: A quick knowledge checklist and self-test questions to verify if you are "fully prepared" to move on to the next chapter’s hands-on installation.

Learning Objectives

Upon completing this chapter, you will be able to:

  1. Define System Boundaries: State accurately what OpenClaw is and identify its most (and least) suitable use cases.
  2. Unify Terminology: Understand the definitions of core concepts like Gateway, Agent, Tool, and Session to eliminate communication barriers during future configuration and tuning.
  3. Establish a Global Perspective: Understand the journey of a request—from entering the system to retrieving tool operation results—and the modules and collaborative steps involved.

If you are ready, let's officially begin!

1.1 Evolution of LLMs and the Birth of OpenClaw

This section explores the inevitable trend of Large Language Models (LLMs) evolving into Agents, analyzes the complex engineering pain points encountered during implementation, and elucidates OpenClaw’s positioning as a "task operation domain," its applicable boundaries, and its core architectural logic.

1.1.1 What is OpenClaw?

Imagine you’ve hired an intern who is exceptionally brilliant—capable of researching data, drafting documents, and organizing information—and is available 24/7.

OpenClaw is that intern, except it lives inside your computer. More precisely: OpenClaw is a self-driven Agent that can be installed on local machines or servers, allowing it to access and utilize various tools (e.g., calendars, emails, chat windows).

It can assist you with:

  • Automated Daily Reports: Extracting key progress from project groups and generating formatted daily reports to specific documents.
  • Research & Reporting: Accessing multiple webpages to extract information and organizing it into structured tables with data sources.
  • Actionable Chat: Mentioning ("@") the bot in group chats to have it summarize received PDFs into reports and post them back to the group.

What challenges arise when implementing such an agent? Let’s break down the systemic difficulties faced during LLM deployment and how OpenClaw resolves them.

1.1.2 Project Origin and Naming Evolution

The birth of OpenClaw is inseparable from the efforts of founder Peter Steinberger (also the founder of PSPDFKit). The project initially launched under the name Clawdbot, later changing to Moltbot following community feedback and positioning adjustments. However, in early 2026, the project faced a period of turbulence: it received a severe trademark compliance warning, followed closely by the hijacking of its official X (Twitter) account. Under extreme pressure, the community held a global vote, ultimately establishing the name OpenClaw.

This "trial by fire" served as a catalyst for community cohesion. The transparent governance and collaboration during the renaming crisis laid a foundation of trust for subsequent explosive growth—as of now, the project's GitHub Star count has surpassed many veteran open-source projects, making it one of the fastest-growing in its field.


1.1.3 The Inevitability of LLM Evolution and Engineering Pain Points

As the capabilities of LLMs leap forward, interaction paradigms are undergoing profound changes. Moving from simple Q&A bots, to copilot-style assistants, and finally to Agents that autonomously plan and execute long-running tasks, system complexity rises exponentially. When Agents truly take over business workflows, traditional architectures become inadequate, facing three core challenges:

  • State Management Explosion: In multi-step complex tasks (e.g., querying, analyzing, and calling write interfaces), multi-turn dialogues and tool calls generate massive amounts of "context fragments." For business logic requiring dozens of API calls, returned data can easily exceed model context limits, causing the loss of early instructions and leading to factual errors or behavioral "hallucinations." For instance, an Agent might ignore a "Do not delete data" instruction after several rounds of dialogue and accidentally clear a user's mailbox.
  • Loss of Control over Permissions and Boundaries: Granting LLMs the ability to call underlying business APIs introduces significant security risks. Without robust interception mechanisms, unpredictable model outputs can translate into destructive operations, such as erroneously triggering database write commands or accessing sensitive data across tenants. Fine-grained control over instruction permissions across different channels is a major hurdle for production-grade deployment.
  • Lack of Observability and Degradation Capabilities: Real-world business is full of uncontrollable factors like network jitters or interface timeouts. When a long-process task gets stuck, a lack of clear operation logs, retry mechanisms, and fallback strategies turns the service into an un-debuggable "black box." Developers find it difficult to locate the error node and cannot effectively intervene manually.

1.1.4 Core Positioning and Resolution Mechanisms

In traditional business architectures, integrating LLM capabilities often requires weaving extensive logic into core code to handle complex context splicing, API callback parsing, and process exceptions. This disrupts business lines. OpenClaw positions itself as an independent "Runtime Domain," decoupling business logic from Agent scheduling. It actively takes over the heavy lifting of state assembly, model scheduling, and retry/fallback logic, providing a foundation for private deployment, multi-channel access, and support for multiple Agent types.

OpenClaw offers specific features to address the three major engineering pain points:

  • Guaranteeing Completion via Independent State Machines: To combat state explosion, the engine features built-in state transition and recovery mechanisms. When facing context overflow or transient network anomalies, the engine utilizes automated context pruning/summarization and stepped retry strategies to ensure the completion of the main task.
  • Guaranteeing Controllability via Configuration Sandboxes: To prevent boundary loss, tool calls must execute within a strictly defined Profile strategy sandbox. Any attempt to exceed these boundaries (including high-risk instructions resulting from hallucinations) is intercepted by the application gateway, ensuring business security.
  • Guaranteeing Maintainability via Operation Stack Tracing: For observability, the platform provides a perspective deep into the Agent's operation stack. Every thought process, scheduling branch, and specific log is transparent. Upon task failure, the system preserves the state and throws a callback log to support auditing and troubleshooting.

Notably, while OpenClaw's design stems from cutting-edge academic exploration (such as "Observe-and-Act" collaboration loops), as an infrastructure framework close to production, its evolution prioritizes how the system gracefully handles fallbacks, blocks anomalies, and supports degradation.
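The "stepped retry plus context pruning" behavior described above can be sketched in a few lines. This is an illustrative sketch only: `run_with_fallback` and `summarize` are hypothetical names, not OpenClaw APIs, and the pruning rule (keep the system instruction and the most recent turns) is one plausible strategy among many.

```python
import time

def summarize(context):
    # Hypothetical pruning step: keep the system instruction and the most
    # recent turns, collapsing the middle into a one-line summary marker.
    if len(context) <= 4:
        return context
    return [context[0], f"[summary of {len(context) - 3} earlier turns]"] + context[-2:]

def run_with_fallback(call_model, context, max_attempts=3, base_delay=0.0):
    """Stepped retry: on each failure, back off and shrink the context."""
    for attempt in range(max_attempts):
        try:
            return call_model(context)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
            context = summarize(context)             # prune before retrying

# Example: a flaky "model" stub that only succeeds once the context is small.
calls = {"n": 0}
def flaky_model(context):
    calls["n"] += 1
    if len(context) > 4:
        raise RuntimeError("context overflow")
    return f"ok after {calls['n']} call(s), {len(context)} messages"

history = ["system: do not delete data"] + [f"turn {i}" for i in range(9)]
print(run_with_fallback(flaky_model, history))
```

Note that the system instruction survives every pruning pass, which is precisely what guards against the "forgotten 'Do not delete data' instruction" failure mode mentioned earlier.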

1.1.5 Use Cases and Capability Boundaries

When introducing any new architectural component, clarifying its boundaries is vital. OpenClaw has typical use cases as well as specific scenarios where it should be used with caution.

1. Typical Use Cases

OpenClaw’s architecture is suited for asynchronous tasks with relatively loose state distribution, long operation chains, and inherent fault tolerance:

  • Multi-Channel Consistent Assistants: Providing consistent knowledge and service across multiple endpoints simultaneously. Example: An e-commerce after-sales Agent connected to WhatsApp, Telegram, and a website plugin. It answers policy questions via a private knowledge base and, after assisting the user, automatically calls internal Jira or Zendesk APIs to submit a ticket with a chat summary.
  • Internal Workflow Toolchains with Permission Protection: Connecting internal knowledge bases and infrastructure. Under security policies, authorized users can trigger configuration pulls or audit checks via natural language. Example: An R&D assistant on Slack. If an engineer asks to "check the status of the pre-release environment," the Agent uses GitLab/Kubernetes plugins to summarize logs. However, if the model hallucinates or receives a prompt injection to "delete pods," the OpenClaw gateway blocks the command because the user lacks that specific permission in their Profile.
  • Long-Lifecycle Asynchronous Follow-ups: The system can "suspend" tasks waiting for external dependencies. When the external process completes and triggers a callback, the event gateway wakes the Agent to continue the task. Example: An automated code review Agent. Upon code submission, it triggers a 30-minute third-party security scan. The Agent suspends itself to release resources; 30 minutes later, the scanner returns results via Webhook, OpenClaw wakes the Agent, and it generates a final review suggestion on the PR page.

2. Scenarios Requiring Caution or Avoidance

Avoid using OpenClaw as the sole authoritative center when faced with strong consistency or extreme real-time constraints:

  • Strong Consistency Core Data Writing: OpenClaw operates on probability and generative feedback. Critical data writes must be validated and finalized by stable, dedicated business services; OpenClaw should only act as the layer initiating the intent request.
  • Millisecond-Level Hard Real-Time Control: Due to LLM inference latency and multi-turn communication overhead, OpenClaw cannot meet the requirements of online businesses needing absolute sub-second response times, such as industrial automation control.
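The "suspend on external dependency, wake on callback" pattern from the long-lifecycle example above can be sketched as follows. All names here (`SUSPENDED`, `start_review`, `on_webhook`) are hypothetical illustrations of the pattern, not OpenClaw APIs; a real deployment would persist the suspended state to disk rather than a dict.

```python
# Illustrative sketch of "suspend on external dependency, wake on callback".
SUSPENDED = {}  # task_id -> saved state (in-memory stand-in for a store)

def start_review(task_id, pr_url):
    # Kick off the long-running external scan, then suspend: persist the
    # state needed later and release the agent's resources.
    SUSPENDED[task_id] = {"pr_url": pr_url, "stage": "awaiting_scan"}
    return f"scan requested for {pr_url}; task {task_id} suspended"

def on_webhook(task_id, scan_result):
    # The event gateway receives the callback and wakes the saved task.
    state = SUSPENDED.pop(task_id, None)
    if state is None:
        return "unknown task"
    return f"review for {state['pr_url']}: scan={scan_result}, posting suggestion"

print(start_review("t1", "https://example.test/pr/42"))
print(on_webhook("t1", "no critical findings"))
```

The key property is that nothing blocks between the two calls: the agent holds no resources while the scanner runs, and the saved state alone is enough to resume.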

1.2 Architecture and Core Concepts

This section analyzes the fundamental architecture of OpenClaw, aiming to answer four key questions: How is the system layered? What are the core components? How does a single request flow through the system? And which layer should be inspected first in case of a failure?

1.2.1 Four-Layer Logical Architecture and Three-Plane Physical Form

From the perspective of official product positioning and physical operational boundaries, OpenClaw can be divided into three core planes:

  • Gateway (Long-running Gateway Process): The sole Control Plane. It maintains connections to chat platforms, handles sessions/routing/policies, provides a WebSocket control plane and HTTP API, and hosts Web resources such as the Control UI.
  • Agent Runtime (Embedded Agent Runtime): OpenClaw embeds the Pi SDK, completing model calls, tool loops, streaming output, and session persistence within the same process, eliminating cross-process overhead.
  • External Operation Surface (Skills / Plugins / Nodes): The extension of capabilities. Skills are injected as files, Plugins are registered as TS modules within the gateway, and Nodes connect to remote device capabilities via WebSocket (supporting macOS, iOS, Android, headless, etc., providing features like screenshots, camera access, screen recording, location, and SMS). The Gateway HTTP server also hosts Canvas (Agent-editable HTML/CSS/JS pages) and A2UI (Agent-to-UI) hosts.

Based on these physical forms, and viewed from a macro-flow perspective, the OpenClaw framework can be further decomposed into four core logical threads:

  • Connection Mechanism: Solving how to ensure heterogeneous platform messages securely and reliably reach the internal network under complex external constraints, completing identity validation and binding.
  • Governance Policy: A role-based system centered on resources and identity, combined with static rules to constrain engine behavior and strictly control unauthorized operations.
  • Operation Kernel: Serving as the "processing workshop" where LLMs and tools collaborate; it handles dynamic prompts and contexts, precisely parses tool returns, and maintains continuous, deep reasoning.
  • Extensible Ecosystem: Defining standardized plugin interfaces to modularize mature tasks, enabling interconnection and ecosystem sharing between different component systems.

In engineering implementation, these four threads are abstracted into a Four-Layer Logical Architecture. Each layer has independent responsibilities, facilitating expansion and troubleshooting. To visualize how the Gateway, as a single control point, connects external inputs to the operation foundation, refer to the flow diagram below:
```mermaid
graph TD
    User["User (DM/Group Chat)"] <--> |"Messages"| ChatProv["Multi-channel Inbox (Telegram/WhatsApp, etc.)"]
    ChatProv <--> |"Network Requests"| Gateway["Gateway Process (WS+HTTP :18789)"]
    ControlUI["Control UI / WebChat"] <--> |"WS Direct"| Gateway

    subgraph GatewayProcess ["Gateway Process (Control & Operation)"]
        Gateway <--> |"Event Dispatch"| AgentRuntime["Agent Runtime (Embedded Pi SDK Session)"]
        AgentRuntime --> |"Intercept/Validate"| Policy["Tool Policy + Sandbox Policy"]
        Policy --> CoreTools["Core Tools (fs/exec/web)"]
        Policy --> BrowserTool["Browser Tool (CDP / Playwright)"]
        Policy --> NodesTool["Nodes Tool (node.invoke)"]
    end

    subgraph OperationSurface ["External Operation Surface (Capability)"]
        NodesTool <--> |"WS (role: node)"| PairedNode["Paired Node Device (macOS/iOS, etc.)"]
        BrowserTool --> Chrome["Chromium / Remote CDP"]
        CoreTools --> HostEnv["Docker Sandbox / Host Process"]
    end
```
1. Ingress Layer: Protocol Adaptation and Event Access

The Ingress Layer is the "front line" where OpenClaw meets the outside world, featuring a Multi-channel Inbox. Its primary responsibility is to receive communication events from various channels (WhatsApp, Telegram, Slack, Discord, Signal, BlueBubbles, Microsoft Teams, Matrix, etc.) and convert them into a unified internal event format. This layer masks differences in underlying protocols (Webhooks, WebSockets, or polling), ensuring the upper layers remain agnostic of platform-specific traits.

2. Control Layer: Global Hub and Security Gatekeeping

The Control Layer is the system's firewall and traffic dispatch center. Every request converted by the Ingress Layer must be inspected here. The Gateway component, built on a unified WebSocket network, is content-agnostic; it only cares about "who is calling," "where is it going," and "what are the permissions." It handles terminal pairing, authentication, rate limiting, and multi-agent routing.

3. Operation Layer: Reasoning Engine and Session Driving

Once a request passes the Gateway, it is handled by the Pi Agent Engine running in an RPC-style mode. This is the brain of the system. The core tasks of the Operation Layer include extracting historical context via Sessions, assembling system configurations and current requests into structured prompts, and triggering the LLM reasoning loop. It also manages Docker sandboxing for non-primary channels to prevent unauthorized host operations and handles runtime policies like timeouts and retries.

4. Capability Layer: Swappable Foundation

The Capability Layer aggregates the infrastructure providing actual productivity. This includes:

  • LLM Providers: The intelligence source.
  • Toolsets: Units that perform external actions (e.g., Browser control, Live Canvas, cross-device Nodes).

The model outputs intent, while tools translate that intent into actions and feedback.

1.2.2 Five Core Objects

Corresponding to the four-layer architecture, all abstractions in the system manifest as these five core objects:

  • Gateway: (Control Layer) The entrance gate and routing hub. Handles auth, pairing, and connection management. It does not handle prompts or reasoning logic.
  • Agent: (Operation Layer) More than just an LLM wrapper; it is a complete task operation unit. It defines memory assembly (Session policy), available tools (Tool policy), and default model choices.
  • Node: A unit for business isolation and resource partitioning. Nodes allow different business lines to be scheduled separately to avoid resource contention or permission breaches.
  • Tool: (Capability Layer) Controlled operation units for interacting with external systems, featuring standardized input/output contracts.
  • Session: (Operation Layer) The core mechanism for linking scattered interactions into a stateful dialogue. It maintains historical memory and injects past results into current requests.
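To make the division of responsibilities among the five objects concrete, here is a toy data model. All field names and methods are illustrative assumptions, not OpenClaw's actual schema; the point is what each object does and does not own (note the Gateway routes but holds no reasoning logic).

```python
from dataclasses import dataclass, field

@dataclass
class Tool:
    name: str                      # standardized contract: name + typed I/O
    def invoke(self, payload: str) -> str:
        return f"{self.name} ran with {payload!r}"

@dataclass
class Session:
    history: list = field(default_factory=list)   # stateful dialogue memory
    def remember(self, turn: str):
        self.history.append(turn)

@dataclass
class Agent:
    model: str                     # default model choice
    tools: list                    # tool policy: what it may call
    session: Session               # session policy: how memory is assembled

@dataclass
class Node:
    name: str                      # business isolation / resource partitioning
    agents: list

@dataclass
class Gateway:
    nodes: dict                    # routing hub only: no prompts, no reasoning
    def route(self, node_name: str) -> Node:
        return self.nodes[node_name]

# Wiring: Gateway -> Node -> Agent -> (Session, Tools)
gw = Gateway(nodes={"ops": Node("ops", [Agent("some-llm", [Tool("web")], Session())])})
agent = gw.route("ops").agents[0]
agent.session.remember("user: hello")
print(agent.tools[0].invoke("query"))
```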

1.2.3 Request Flow and Collaboration Logic

To understand how data actually moves through the system, we track the lifecycle of a standard request:

```mermaid
sequenceDiagram
    autonumber
    actor User as User
    participant Gateway
    participant Node
    participant Agent as Agent Engine
    participant Session
    participant Model as LLM
    participant Tool as Tool

    User->>Gateway: Submit message after channel adaptation
    Note over Gateway: Permission Point: Auth & Pairing Check
    Gateway->>Node: Dispatch request based on routing
    Node->>Agent: Invoke target Agent
    Agent->>Session: Locate corresponding Session
    Session-->>Agent: Extract historical context
    Note over Agent,Model: Budget Point: Timeout & Retry Control
    Agent->>Model: Submit assembled prompt
    Model-->>Agent: Determine action required
    Note over Model: Failure Point: Quota Fallback
    Agent->>Tool: Call controlled Tool for external action
    Tool-->>Agent: Return structured result
    Agent->>Model: Re-submit context with tool result
    Model-->>Agent: Generate final response
    Agent->>Session: Write tool result & final response
    Agent->>Gateway: Return processing result
    Gateway->>User: Return to user terminal via original path
```
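The heart of this lifecycle, steps where the engine alternates between model decisions and tool results, can be condensed into a small loop. The stub model and tool below are stand-ins to show the control flow, not the real Pi engine; the action format (`{"type": ..., "tool": ...}`) is a hypothetical simplification.

```python
def agent_loop(model, tools, session, user_msg, max_steps=5):
    """Loop 'model decides -> tool runs -> result fed back' until a final answer."""
    context = session + [("user", user_msg)]
    for _ in range(max_steps):
        action = model(context)                         # submit assembled prompt
        if action["type"] == "final":
            context.append(("assistant", action["text"]))
            return action["text"], context              # write back to session
        result = tools[action["tool"]](action["args"])  # controlled tool call
        context.append(("tool", result))                # re-submit with result
    return "step budget exhausted", context             # budget point: hard cap

# Stub model: requests a tool once, then produces a final answer.
def stub_model(context):
    if context[-1][0] == "tool":
        return {"type": "final", "text": f"done: {context[-1][1]}"}
    return {"type": "tool", "tool": "web", "args": "fetch page"}

reply, ctx = agent_loop(stub_model, {"web": lambda a: f"fetched({a})"}, [], "summarize the page")
print(reply)
```

The `max_steps` cap mirrors the "Budget Point" annotation in the diagram: without it, a model that never emits a final action would loop forever.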

1.2.4 Layered Troubleshooting and Configuration Cheat Sheet

Because the architecture is strictly decoupled, troubleshooting follows an "outside-in" scanning strategy:

| Layer & Core Object | Core Responsibility | Typical Failure Symptom & Direction |
| --- | --- | --- |
| Ingress (Channels) | Adapt protocols to standard events | No logs of incoming messages; client "connection dropped." Check: Webhook config, network connectivity. |
| Control (Gateway/Node) | Connection, global auth, routing | "Instant red-bar denial" (Unauthorized) without model errors. Check: pairing files, routing topology, ACLs. |
| Operation (Agent/Session) | Memory, retries, sandboxing | Infinite thinking, blocking conditions, loss of context, OOC (out of character). Check: context length, Agent JSON, compaction params. |
| Capability (Tool/Model) | External interaction, completion | No action response, 429 rate limits, unreleased calls. Check: model quota, tool metadata, liveness probes. |
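The "outside-in" strategy can be expressed as a short diagnostic sketch: probe each layer in order and stop at the first failure. The check functions here are stubs; in practice each would inspect the artifacts listed in the table (webhook config, pairing files, context length, model quota).

```python
# Probe layers outside-in, mirroring the cheat sheet above.
LAYERS = ["ingress", "control", "operation", "capability"]

def diagnose(checks):
    """checks: layer name -> callable returning True if healthy."""
    for layer in LAYERS:
        if not checks[layer]():
            return f"fault isolated at: {layer}"
    return "all layers healthy"

# Example: messages arrive, but the control layer (auth/routing) is failing.
status = diagnose({
    "ingress": lambda: True,       # incoming messages appear in the logs
    "control": lambda: False,      # pairing/ACL check fails
    "operation": lambda: True,
    "capability": lambda: True,
})
print(status)
```

Stopping at the first failing layer matters because downstream symptoms (e.g., "no model response") are often just shadows of an upstream fault.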

1.2.5 Design Philosophy: The π-based Minimalist Foundation

Why can a seemingly simple core (Event + Executor + State Machine) support complex, interruptible, long-running agent systems? OpenClaw is built directly upon the π (pi) minimalist operation skeleton:

  1. Events: The Only Language: In π, all behaviors are events (tool calls, human input, state changes). There is no hidden logic; everything is a structure in a unified event loop. This allows for seamless Human-in-the-loop integration.
  2. Executors: Minimum Unit of Capability: An Executor does one thing: Receive Event → Operate on State → Return Result. Complexity is compressed by explicitly binding event types to specific executors.
  3. State Machine: The Source of Continuity: The framework saves "state," not "intelligence." Because state transitions are explicitly defined, the system can be interrupted and resumed at any time.

"Less is not a lack of capability, but a reduction of assumptions." π focuses only on how events flow, how state changes, and how operation is scheduled.
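The three pillars above fit in a few dozen lines. This is a minimal sketch of the pattern, not the actual π SDK API: events are plain dicts, executors are bound to event types via a registry, and only `state` would ever need to be persisted for interrupt-and-resume.

```python
state = {"log": [], "status": "idle"}          # the only thing worth saving
executors = {}                                 # event type -> executor binding

def on(event_type):
    """Explicitly bind an event type to its executor."""
    def bind(fn):
        executors[event_type] = fn
        return fn
    return bind

@on("human_input")
def handle_input(state, payload):
    state["status"] = "working"
    return f"accepted: {payload}"

@on("tool_result")
def handle_result(state, payload):
    state["status"] = "idle"
    return f"recorded: {payload}"

def dispatch(event):
    # Receive event -> operate on state -> return result; no hidden logic.
    result = executors[event["type"]](state, event["payload"])
    state["log"].append((event["type"], result))
    return result

dispatch({"type": "human_input", "payload": "check mail"})
dispatch({"type": "tool_result", "payload": "3 new messages"})
print(state["status"], len(state["log"]))
```

Because every transition is an explicit entry in `state["log"]`, the system could be stopped after any `dispatch` call and resumed by replaying the saved state; this is what "saving state, not intelligence" buys.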

1.2.6 End-to-End Message Path

```mermaid
flowchart LR
    subgraph Inbound ["Ingress & Channels"]
        U["User Message"] --> CH["Channels<br/>(WhatsApp/Telegram/etc.)"]
    end

    subgraph Gateway ["Gateway Control Plane"]
        CH --> GW["Gateway<br/>WS + HTTP + Control UI"]
        GW --> ROUTE["Routing/Bindings<br/>SessionKey Decision"]
        ROUTE --> RUN["Agent Runtime<br/>Prompt Assembly/State Machine"]
        RUN --> MODEL["Model Providers<br/>(Strategy-based Fallback)"]
        RUN --> TOOL["Tools<br/>(Policy/Sandbox/Approval)"]
        RUN --> OUT["Reply Stream"]
        RUN --> STORE["Session Store + Transcript<br/>sessions.json + *.jsonl"]
        GW --> LOGS["File Logs (JSONL)<br/>Control UI/CLI tail"]
    end

    OUT --> GW --> CH --> U
    TOOL --> RUN
```

1.3 Comparison with Other Solutions

This section introduces the differences between OpenClaw and its primary alternatives, including conversational AI, assistant-style tools (such as Claude Coworker), and automated workflows.

1.3.1 Comparison with Conversational Tools

Tools like ChatGPT, DeepSeek, and early Claude chat models provide an excellent intelligent Q&A experience for individuals. However, when building enterprise-grade autonomous intelligent systems, they typically face the following limitations:

  • Difference in Interaction Paradigms: These are typical "instruction-following" systems that adopt a Q&A interaction mode where the state dissipates after the session ends. OpenClaw, conversely, is positioned as a "Service-oriented Agent." It can remain resident in the background, actively discover tasks, and execute multi-step composite tasks over long durations—even across multiple days—via triggers like webhooks or scheduled jobs.
  • Deep Integration of Business State: General-purpose chat assistants lack deep integration with the business states of external systems. OpenClaw can directly mount an enterprise's various private toolsets (e.g., querying Jira tickets, reading internal databases) and maintain comprehensive tracking of business context across multiple interactions through its built-in long-term and short-term memory mechanisms.

1.3.2 Comparison with Assistant-style Tools

The concept of an "AI Coworker" is currently on the rise, represented by Cursor and the Claude-based "Coworker" architecture proposed by Anthropic. Their positioning is very close to OpenClaw’s, as both strive to create "digital outsourcing" that integrates into business processes. Despite similar philosophies, key differences exist in their implementation forms:

  • Generality vs. Specific Scenarios: Tools like Cursor are out-of-the-box services deeply optimized for specific vertical domains (like coding). Both Claude Coworker and OpenClaw serve as general-purpose underlying platforms for building various AI Coworkers, allowing users to cultivate professional digital employees for non-fixed business scenarios (such as after-sales support specialists or data analysis assistants).
  • Control Boundaries and Privatization Levels: This is the core differentiator between OpenClaw and most vendor-cloud SaaS (like the closed-source Claude ecosystem). As open-source infrastructure, OpenClaw provides complete private on-premise deployment, featuring fine-grained Gateway permission interception and request sandboxing. For enterprises with stringent data privacy needs and internal network operation security requirements, OpenClaw offers a security baseline and a degree of freedom in customizing the dispatch kernel that far exceeds cloud SaaS.

1.3.3 Comparison with Traditional Automated Workflows

Before the explosion of Agents, enterprises often relied on Zapier, RPA tools, or similar integration platforms to achieve cross-system workflows.

  • Hard Rules vs. Cognitive Reasoning: Traditional workflows rely on strictly hard-coded rules. If they encounter input with a slightly different format or an unexpected error, the process often collapses. OpenClaw, based on the intent understanding and reasoning of LLMs, possesses extremely strong fuzzy processing and generalization capabilities.
  • Dynamic Fault Tolerance and Self-healing: Traditional workflows break easily when encountering network anomalies or target interface rate limits. The OpenClaw engine features built-in production-grade fault tolerance mechanisms based on state machines, such as multi-level retries and prompt-pruning fallbacks, allowing it to dynamically adjust strategies and continue attempts, significantly enhancing the robustness of fully automated operations.

1.3.4 Competitive Landscape and Comprehensive Comparison Table

To provide a more comprehensive view of OpenClaw's position in the industry ecosystem, the following table compares typical representatives of cloud assistants, development frameworks, and automation platforms:

| Product/Platform | Deployment Form | Core Positioning | Tool Operation Location | Key Differences (vs. OpenClaw) |
| --- | --- | --- | --- | --- |
| OpenAI Assistants | Cloud API (developer integration) | Building agents for apps; supports tool/function calling | App-side or cloud isolated environment | OpenClaw excels in "local operation surface" and multi-entry access without extra client development; OpenAI focuses on cloud API standardization. |
| LangChain/LangGraph | Dev framework (self-hosted) | Code library and orchestration for building agents | Code deployment side | LangChain is a "framework lego" requiring significant code to assemble; OpenClaw is an "out-of-the-box" personal runtime with a ready-made gateway. |
| n8n / Zapier | Self-hosted / cloud | Workflow automation (including AI-enabled nodes) | Workflow node side | Traditional tools win on visual orchestration and massive SaaS integrations; OpenClaw wins on "NLP-based fuzzy reasoning" and deep integration with local resources. |
| Dify | Cloud / self-hosted | Low-code AI app building; visual workflows and agent editors | Cloud or self-hosted nodes | Dify excels in the visual frontend experience; OpenClaw excels in local operation, private isolation, and fine-grained permissions. |
| Coze (ByteDance) | Cloud SaaS | Low-code agent building; integrated with ByteDance ecosystem | Cloud isolated environment | Coze is ready-to-use but limited to its cloud ecosystem; OpenClaw offers full privatization, local operation, and enterprise-grade Gateway capabilities, but requires self-maintenance. |
| AutoGen (Microsoft) | Dev framework (self-hosted) | Multi-agent collaboration; focuses on role-playing and planning | Code deployment side | AutoGen excels in multi-agent dialogue and collaborative programming; OpenClaw excels in multi-channel entry for single agents, local tool calling, and long-term memory. |
Overall, OpenClaw holds a distinct competitive advantage in the "local operation surface (files/processes/browsers/Nodes) + unified chat UI" space; however, for enterprise-grade RBAC and cloud-hosted high availability, it remains positioned as an isolated platform for individuals or internal networks.

1.3.5 Dynamic Evolution of the Competitive Landscape

The comparison table above reflects the landscape at the time of this publication. However, the AI Agent field is moving rapidly. The capability boundaries, deployment options, and pricing models of various products are constantly evolving. Readers are encouraged to visit official project documentation and community discussions for the latest feature benchmarks and user feedback.


1.3.6 Core Summary: Why Choose OpenClaw?

Based on different requirement stages, the selection advice is as follows:

  • For a personal lifestyle or learning assistant: Use ChatGPT or DeepSeek directly.
  • For a ready-made programming helper: Choose domain-specific systems like Cursor.
  • To build a full-featured assistant integrated with business logic within an enterprise intranet, while requiring absolute control over the underlying Agent Loop, the highest level of data privatization, and production-grade fault tolerance: Consider OpenClaw. It is the ideal infrastructure to bridge the gap from a "Q&A chat box" to a "core productivity operation platform."

1.4 Application Boundaries and Risk Awareness

Before deciding to implement OpenClaw, it is essential to establish clear expectations. This section helps you make an informed selection based on four dimensions: "What it's good for," "What it's not good for," "What the risks are," and "Token costs."

1.4.1 What OpenClaw is Good For

OpenClaw’s core competitiveness lies in the trinity of self-hosting + multi-channel access + tool operation. The following scenarios are its optimal strengths:

  • Unified Multi-Channel Entry: Accessing AI via Telegram, WhatsApp, Discord, Slack, Feishu, etc., allowing teams or individuals to dispatch tasks directly within their preferred chat tools.
  • Local Operation and Data Sovereignty: All data and operation processes remain on your own infrastructure, making it ideal for scenarios with strict data privacy requirements (e.g., internal network O&M, sensitive document processing).
  • Long-Running Automated Tasks: Combined with Cron jobs and Webhook triggers, agents can run 24/7 to autonomously perform inspections, data retrieval, and periodic reporting.
  • Complex Tool Orchestration: Features 50+ built-in tools (filesystem access, shell execution, HTTP requests, browser automation, database queries, etc.), supporting flexible invocation and combination during multi-step reasoning.
  • Security Research and Sandboxing: Provides fine-grained isolation mechanisms, making it an ideal experimental platform for Agent security research.

1.4.2 What OpenClaw is Not Good For

No tool is a silver bullet. OpenClaw may not be the optimal choice in these scenarios:

  • Pure Q&A Dialogue: If you only need a chat assistant to answer daily questions, using the ChatGPT, Claude, or DeepSeek clients directly is sufficient without setting up a Gateway.
  • Zero-Ops Managed Services: OpenClaw is a self-hosted project; users are responsible for server maintenance, process daemonization, version upgrades, and troubleshooting. If a team lacks basic O&M capabilities, cloud SaaS (like Anthropic’s Claude API) is a more convenient choice.
  • Large-Scale Enterprise Multi-Tenancy: OpenClaw is currently positioned toward isolated environments for individuals or small teams. For full enterprise-grade RBAC, multi-tenant isolation, and commercial SLAs, evaluate professional enterprise platforms.
  • Latency-Critical Real-Time Systems: Every tool call involves LLM inference, with response latencies usually in seconds. For millisecond-level requirements (e.g., high-frequency trading), traditional rule engines are more appropriate.
  • Simple Automation of Mature Workflows: If task logic is fixed and clear (e.g., "Receive email → Forward to Slack"), tools like Zapier or n8n are more efficient and avoid LLM inference overhead.

1.4.3 The "Coercion Effect" of Agents

It is worth noting the structural impact of Agents on work styles. When peers begin using AI to accept orders 24/7 or automatically plan routes, those who do not use them may be systematically eliminated due to the efficiency gap—this is known as the "Coercion Effect." Self-hosted Agent platforms like OpenClaw are the infrastructure for this trend.

Furthermore, Agents change business logic: they have no emotional preferences and only seek the optimal solution (cost-performance, speed), ignoring ads and visual marketing. This means traditional traffic funnel models may gradually fail. For teams evaluating OpenClaw, this is both an opportunity and a reminder to take post-deployment security and governance seriously.


1.4.4 Risk Awareness

When deploying and using OpenClaw, maintain a clear understanding of the following risks:

Security Risks

OpenClaw empowers AI to execute shell commands, read and write files, and send messages. This is essentially a mismatch between reasoning capability and execution permissions: current LLM reliability is not yet sufficient for the privileges granted. Prompt Injection or misconfiguration could lead to accidental file deletion, sensitive information leakage, or unauthorized external requests. It is vital to:

  • Enable sandbox mode (sandbox.mode: "all" or "non-main") to limit the operation surface.
  • Enable tools.elevated approval mechanisms for high-risk tools, requiring manual confirmation before operation.
  • Strictly control who can talk to the agent via dmPolicy.

Model Hallucination Risks

LLMs may generate plausible but incorrect outputs. In OpenClaw, hallucinations aren't just "wrong words"—they can lead to "wrong actions," such as executing incorrect database queries. Manual approval steps are recommended for critical business scenarios.

Service Availability Risks

OpenClaw relies on external model APIs. If the API service is interrupted, rate-limited, or keys expire, the agent will cease to function. It is recommended to configure Failover strategies across multiple Providers.

Version Compatibility Risks

OpenClaw uses CalVer (e.g., 2026.2.26) with rapid iterations. Upgrades may involve configuration field changes. Verify in a test environment before upgrading production and maintain a rollback plan.
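The sandbox, elevated-tool, and dmPolicy safeguards listed under Security Risks might be expressed in configuration roughly as follows. This is a hedged sketch: the key names come from this section, but the exact nesting and accepted values should be checked against the configuration reference for your installed version.

```yaml
# Illustrative sketch only: key names from this chapter; nesting and values unverified
sandbox:
  mode: "non-main"     # sandbox every session except the main one ("all" to sandbox everything)
tools:
  elevated:
    approval: manual   # assumed value: require human confirmation for high-risk tools
dmPolicy: allowlist    # assumed value: only pre-approved senders may message the agent
```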

1.4.5 Token Cost: An Easily Underestimated Expense

This is the most common oversight for new users. Every interaction with OpenClaw consumes model API Tokens, and Tokens are real money.

Why OpenClaw Consumes More Tokens Than Standard Chat

In standard ChatGPT/Claude dialogues, Token consumption = User Input + Model Output. In OpenClaw, a seemingly simple request may undergo multiple reasoning loops:

graph LR
  A["User Message"] --> B["Prompt Assembly<br/>System Prompt + Tools<br/>+ Context History"]
  B --> C["LLM Inference #1"]
  C --> D["Tool Call"]
  D --> E["Tool Result Injection"]
  E --> F["LLM Inference #2"]
  F --> G["Final Response"]

Each reasoning round resends the complete system prompt, tool definitions, and context history. A single user message may trigger 2–5 rounds, each consuming thousands of tokens.

Typical Token Consumption Breakdown

| Component | Consumption per Round | Description |
| --- | --- | --- |
| System Prompt | 500-1000 tokens | Role definition and behavioral instructions |
| Tool Definitions | 200-500 tokens | Descriptions for every mounted tool |
| Context History | 500-5000 tokens | Grows with dialogue turns; the largest cost source |
| User Message | 50-1000 tokens | The actual user input |
| Model Output | 100-2000 tokens | Inference results and tool call parameters |
| Tool Return Value | 100-2000 tokens | Result of tool execution injected back into context |

Using Claude Sonnet 4.6 ($3/1M input, $15/1M output) as an example, an interaction involving 2 tool calls totals roughly 8,000–15,000 tokens, costing about $0.03–$0.10. While seemingly small, frequent interactions can lead to monthly bills of dozens or even hundreds of dollars.
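As a back-of-envelope check of these figures, the cost of one interaction can be computed directly. The token split below is an illustrative assumption; the prices are the Sonnet figures quoted above.

```shell
# Illustrative cost estimate for one interaction with ~2 tool calls.
# Assumed token split: 10,000 input tokens, 2,000 output tokens.
input_tokens=10000
output_tokens=2000
input_price_per_m=3    # $ per 1M input tokens
output_price_per_m=15  # $ per 1M output tokens
cost=$(awk -v i="$input_tokens" -v o="$output_tokens" \
       -v pi="$input_price_per_m" -v po="$output_price_per_m" \
       'BEGIN { printf "%.4f", (i*pi + o*po) / 1000000 }')
echo "Estimated cost: \$${cost}"   # 0.0600 for these assumptions
```

At roughly $0.06 per interaction, 50 interactions a day lands around $90 a month, which is why the cost-control tips below matter.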

Practical Tips for Cost Control

  • Streamline Toolsets: Mount only the tools needed for the current scenario.
  • Configure Session Resets: Periodically clear context via session.reset to prevent infinite history expansion.
  • Layered Model Usage: Use Haiku 4.5 ($1/1M input) for simple tasks and reserve Sonnet/Opus for complex reasoning via Failover.
  • Enable Prompt Caching: Use Anthropic's Prompt Caching to save up to 90% on input costs for repeating system prompts and tool definitions.
  • Monitor Usage: Set budget alerts on your API Provider panel to avoid unexpected overspending.

Rule of Thumb: If a task can be completed via a traditional script or API call (e.g., scheduled data fetching), don't let the LLM intervene. Use Agents only for parts requiring natural language understanding, fuzzy decision-making, or multi-step reasoning. This is the fundamental principle of token cost control.

1.5 Typical Potential Application Scenarios

Based on OpenClaw's core features—strong isolation, self-controlled infrastructure, and a rich node toolchain—we can derive several high-potential real-world application scenarios. These are areas where pure cloud-based SaaS often struggles to provide full coverage.

1.5.1 Personal ChatOps Hub

For power users or digital professionals, OpenClaw acts as a personal digital nerve center.

Scenario Description: By sending commands via Telegram or WhatsApp, OpenClaw can query calendars, organize emails, summarize notes, or even perform low-frequency O&M (Operations and Maintenance) tasks in the background.

Specific Workflow:

  1. A user sends a message in Telegram: "Summarize my key emails today and let me know about any schedule conflicts."
  2. The OpenClaw Gateway receives the message via Webhook; the Agent Runtime assembles the prompt and invokes the model.
  3. The model plans two steps: (1) Use the email tool to fetch today's mail; (2) Use the calendar tool to retrieve the schedule.
  4. Once both tools have executed, the model aggregates the results and replies to the user via the Telegram channel.

OpenClaw Features Used: Webhook triggers, multi-channel access (Telegram Channel), the tool toolchain, multi-step reasoning and tool scheduling in the Agent Runtime, and Session memory (optional multi-turn dialogue context).

Advantage: The "Mobile Chat → Home Host Operation" model significantly lowers the interaction barrier while ensuring private data is not overexposed to third-party automation platforms.

1.5.2 Home or Laboratory Automation Gateway

Combining Cron jobs, Webhook triggers, and Node capabilities.

Scenario Description: Deploying the Gateway on a low-power device (like a Raspberry Pi) for constant operation. It can monitor hardware status on a laboratory LAN or pull data at specific times to send to designated groups.

Specific Workflow:

  1. The user configures a Cron expression for "9:00 AM daily" to trigger a periodic agent inspection task.
  2. When the agent wakes up, it uses a local Node to call SSH tools, querying Docker container status, disk usage, and network latency on the Raspberry Pi.
  3. The Agent Runtime organizes the collected data; if an anomaly is detected (e.g., disk 90% full), it automatically generates an alert and triggers email notification logic via a Hook aspect.
  4. For operations requiring manual approval (e.g., restarting a container), the system uses Exec Approval to request user confirmation, preventing accidental misoperations.
OpenClaw Features Used: Cron scheduling, Node resource isolation, SSH/Shell tools, Hook lifecycle aspects, Exec Approval gates, and email channels.

Advantage: This unifies scheduled tasks and local device actions into a single gateway-side schedule. Combined with "escape valves" (Exec Approvals), it achieves a physical closed loop that is natural-language controlled yet risk-constrained.
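The "9:00 AM daily" trigger in step 1 corresponds to the standard five-field cron expression `0 9 * * *` (minute, hour, day of month, month, day of week). A job entry might look roughly like this sketch; the surrounding structure and key names are our assumptions, not the verified OpenClaw schema:

```yaml
# Illustrative sketch: only the cron expression syntax is standard
jobs:
  - name: daily-inspection
    cron: "0 9 * * *"   # fire at 09:00 every day
    prompt: "Inspect the Raspberry Pi: Docker container status, disk usage, network latency."
```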

1.5.3 Auditable Personal Knowledge and Memory System

For researchers or users who need long-term context accumulation.

Scenario Description: Leveraging OpenClaw's "File as Truth" design, all interaction memories and extracted long-term knowledge are stored as standard Markdown or JSONL files. A local SQLite database builds vector indexes to enable Retrieval-Augmented Generation (RAG).

Specific Workflow:

  1. A researcher discusses an academic topic across multiple Sessions; every record is automatically saved as a JSONL session log.
  2. OpenClaw periodically (via Cron) triggers a knowledge extraction agent that reads logs from the past week and uses the Compaction mechanism to summarize long dialogues into high-density structured notes.
  3. Once notes are stored as Markdown files, the Agent Runtime uses Embedding tools to build vector indexes for the notes, supporting subsequent semantic searches.
  4. The user can query: "Show the evolution of my thoughts on Topic X over the past month." The system retrieves relevant notes via vector index and displays the full version history (via git diff).
OpenClaw Features Used: Session containers, Compaction (context compression), local SQLite + vector index, Cron jobs, Embedding tools, file storage (Markdown/JSONL), and Hook-triggered automatic extraction.

Advantage: Ensures absolute data privacy. The plaintext format keeps the knowledge base permanently portable, version-backable, and history-comparable (via git diff).

1.5.4 Security Research and Agent Sandbox Experimental Library

For security defense researchers.

Scenario Description: Since OpenClaw explicitly lists LLM-specific threats (e.g., Prompt Injection leading to arbitrary command operation, SSRF, or approval bypasses) as research points, it is naturally suited as an experimental platform in isolated virtual environments.

Specific Workflow:

  1. Researchers deploy OpenClaw in an isolated VM, configuring a simulated corporate scenario (e.g., a finance agent with access to databases and email).
  2. Researchers attempt to induce unauthorized operations via malicious prompts (Prompt Injection) or forged tool return values to bypass Guardrails.
  3. The system tests the attack through several defense layers: (a) Instruction and Credential Isolation to check for leaks; (b) Trust Chain verification to ensure return values come from trusted sources; (c) Exec Approval gates to intercept high-risk requests.
  4. Researchers analyze logs and state machine transitions for each layer to verify the effectiveness of protections and report vulnerabilities.
OpenClaw Features Used: Sandbox isolation, Guardrail security barriers, Trust Chain, Exec Approval mechanisms, structured logging and audit trails, and the Agent Loop state machine.

Advantage: Assists researchers in verifying platform-level protections (such as role-based sandbox control) in a controlled environment, serving as a rehearsal for deploying larger-scale enterprise AI infrastructure.

1.6 Chapter Summary

This chapter has outlined OpenClaw’s system positioning, architectural landscape, and core objects.

Key Takeaways

  • The core value of an Agent lies in the end-to-end task closed-loop, rather than the surface-level effects of a single-turn dialogue.
  • The architectural backbone of OpenClaw consists of the Gateway control plane and the Agent Runtime operation kernel; Channels, Tools, and Memory serve as independently replaceable capability layers.
  • The five core objects—Gateway, Agent, Node, Tool, and Session—run through the entire book, determining the entry points for configuration, troubleshooting, and expansion.

Reader Self-Check

After reading this chapter, try to answer the following questions:

  • Can you define Gateway, Agent, Tool, and Session in one sentence each and explain what they are responsible for? (You may refer to the Glossary in Appendix A to verify your understanding.)
  • When encountering issues like "tool not triggered" or "missing result reinjection," can you pinpoint the fault to one of the five core objects?
  • Does your current learning objective fall into the "Available → Controllable → Extensible" stage? Is the corresponding acceptance criteria clear?

Chapter 2: Environment Preparation and Deployment

This chapter aims to guide you through deploying OpenClaw in a local or server environment. By following a systematic preparation process, complete installation workflow, and acceptance testing, you will ensure that OpenClaw runs reliably in your environment, laying the foundation for the practical exercises in subsequent chapters.


Chapter Navigation

This chapter consists of the following sections:

  • 2.1 System Requirements & Pre-flight Checks: Defining Node.js versions, network connectivity requirements, and account/API key prerequisites.
  • 2.2 Installing OpenClaw: Mastering the installation process via official scripts and npm, while establishing awareness of version control and rollback procedures.
  • 2.3 Initialization Wizard & Initial Configuration: Understanding the core output of the openclaw onboard wizard and gaining a preliminary understanding of the generated base configuration files.
  • 2.4 Daemon Processes & Availability Acceptance: Implementing and mastering the standard "First-Run Verification System" and basic troubleshooting guidelines.
  • 2.5 Chapter Summary: A review of key takeaways and suggestions for next steps.

Learning Objectives

After completing this chapter, you will be able to:

  1. Prepare the Environment: Verify that your local environment meets the requirements for running OpenClaw.
  2. Smooth Installation: Complete the full installation and initialization process using official tools.
  3. Verify Availability: Confirm the system is operating correctly using standard diagnostic commands.
  4. Establish a Baseline: Build a solid foundation for the hands-on applications in later chapters.

Scope of Application: This guide applies to macOS, Linux, and Windows (WSL2 recommended) environments. For production-grade deployments, it is strongly recommended to use a Linux host with Docker, supplemented by a reverse proxy, a process manager, and a strict least-privilege account policy.


2.1 System Requirements & Pre-flight Checks

This section outlines the system environment and network connectivity requirements that must be verified before installation.

2.1.1 Core Preparation Checklist

Key dependencies to prepare:

  1. An Internet-connected Computer: Supports macOS, most Linux distributions, or Windows WSL2. Any machine from the last 5 years should suffice. A cloud VM is the most hassle-free option, provided it has internet access.
  2. An API Key (Model Service Secret): Obtain an API key from any Large Language Model (LLM) provider (e.g., OpenAI, Anthropic, DeepSeek, etc.). This is critical for giving the system reasoning and generation capabilities.

2.1.2 Hard Runtime Requirements

Regardless of the installation method, the purity and version compatibility of the underlying runtime environment is the first hurdle.

  • Operating System: Supports macOS, most Linux distributions (e.g., Ubuntu 24.04 LTS), and Windows WSL2 (Ubuntu subsystem preferred). Native Windows CMD/PowerShell often encounters file path and permission issues; it is not recommended for production.
  • Node.js Environment: Requires Node.js 22 or newer. Check via node --version. Versions that are too old will cause dependency and runtime capability incompatibilities.
  • Package Manager (Optional): If you choose a native Node installation instead of the recommended one-click script, you will need npm.
  • Hardware Specs: Minimum 4 GB RAM; 8 GB or more is recommended if you plan to run browser tools.

> [!WARNING]
> Insufficient memory causes numerous issues. When performing auto-updates, running browsers, or processing long-context tasks, servers often encounter OOM (Out of Memory) errors that freeze processes or cause update failures. The cost saved on hardware is rarely worth the subsequent troubleshooting overhead.

2.1.3 Network Connectivity Verification

Since OpenClaw relies heavily on remote APIs, network status directly dictates availability. This is split into two parts:

  1. Installation Phase: Ensure the machine can pull from the npm registry or docker.io. Configure HTTP proxies or mirror sources if necessary.
  2. Runtime Phase: Ensure connectivity to your chosen LLM provider's endpoints. To use overseas channels like Telegram Bot, the machine must have unobstructed access to those platforms.

Perform these two connectivity checks in your terminal:

Bash

# Installation Network: npm registry

curl -sS -m 5 -o /dev/null -w "npm registry: %{http_code}\n" https://registry.npmjs.org/


# Runtime Network: LLM Provider API (OpenAI example)

curl -sS -m 10 -o /dev/null -w "llm provider: %{http_code}\n" https://api.openai.com/v1/models \

 -H "Authorization: Bearer $OPENAI_API_KEY"

# Notes:
# - 200: Authentication passed and network is reachable.
# - 401/403: Usually means authentication failed, but the network path is open (still useful for "can I connect" checks).


2.1.4 Permissions & System Time

Often overlooked but fatal hidden issues include:

  1. System Time Drift: Many cloud provider APIs use JWT-based authentication with strict expiry checks. If server time drifts by more than 5 minutes, you may encounter unexpected "Permission Denied" errors. Enable a time sync service (chrony, systemd-timesyncd, or ntpd).
  2. User Permissions: OpenClaw is not a kernel-level driver and never requires root permissions. Create a dedicated standard user account with read/write access only to the necessary working directories.

2.1.5 Account & Key Readiness

To ensure a "plug-and-play" experience, gather these materials in a password manager or environment variables beforehand. Do not hardcode them.

  • Core Material: At least one LLM API key.
  • Channel Material: Telegram Bot Token (from @BotFather) or other credentials for your initial channel (e.g., Feishu/Lark App Secret).

> [!CAUTION]
> Never hardcode API tokens in test scripts or copy-paste them into chat software. Use environment variables for secure management.
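A common pattern is to keep secrets in your shell profile or a git-ignored `.env` file that is sourced at startup. The values below are placeholders, not real tokens:

```shell
# Export secrets from the environment instead of hardcoding them in scripts.
# Both values are placeholders; substitute your real credentials.
export OPENAI_API_KEY="sk-PLACEHOLDER"
export TELEGRAM_BOT_TOKEN="123456:PLACEHOLDER"   # issued by @BotFather

# Scripts then reference the variables, never the literal values:
echo "Key is set: $([ -n "$OPENAI_API_KEY" ] && echo yes || echo no)"
```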

2.1.6 Environment Check Script Example

You can use the following diagnostic script check_env.sh to verify core dependencies:

Bash

#!/bin/bash
echo "=== OpenClaw Environment Self-Check ==="

node --version || echo "Warning: Node.js not installed or version < 22"
npm --version || echo "Tip: npm not installed. Required for non-script installations."
docker --version || echo "Tip: Docker not installed (Required for container deployment)."

echo "Testing installation network (Official Script)..."
curl -s -m 5 -o /dev/null -w "install script: %{http_code}\n" https://openclaw.ai/install.sh

echo "Testing runtime network (LLM Provider, e.g., OpenAI)..."
if [ -n "${OPENAI_API_KEY:-}" ]; then
  curl -sS -m 10 -o /dev/null -w "llm provider: %{http_code}\n" https://api.openai.com/v1/models \
    -H "Authorization: Bearer $OPENAI_API_KEY"
else
  curl -sS -m 10 -o /dev/null -w "llm provider: %{http_code}\n" https://api.openai.com/v1/models
fi

echo "Note: 200 = Success; 401/403 = Reachable but check Key/Permissions."
echo "Check Complete"

Expected Output (Healthy Environment):

Plaintext

=== OpenClaw Environment Self-Check ===

v22.12.0

10.2.3

Docker version 27.3.1

Testing installation network...

install script: 200

Testing runtime network...

llm provider: 200

Check Complete

Common Anomalies:

| Output | Meaning | Action |
| --- | --- | --- |
| Warning: Node.js not installed | Missing Node.js or not in PATH | Run nvm install 22 or install from the official site |
| install script: 000 | Cannot connect to openclaw.ai | Check network/proxy/DNS settings |
| llm provider: 401 | Invalid API Key or not set | Check the $OPENAI_API_KEY environment variable |
| llm provider: 403 | API Key lacks permission | Verify the account has available credit/quota |

2.2 Installing OpenClaw

This section describes how to install OpenClaw in your chosen environment. The official recommendation is to use the one-click installation script for the best experience, though installation via package managers like npm is also supported.

2.2.1 Official Recommendation: One-Click Script

The simplest and fastest way to install is by executing the official one-click script. This script automatically handles dependencies and installs the latest version of the OpenClaw CLI.

macOS / Linux Run the following command in your terminal:

Bash

curl -fsSL https://openclaw.ai/install.sh | bash

Windows (PowerShell) Run the following command in PowerShell:

PowerShell

iwr -useb https://openclaw.ai/install.ps1 | iex


2.2.2 Alternative: Installation via npm

If you are familiar with the Node ecosystem or require precise version control for specific workflows, you can install OpenClaw globally using npm or pnpm.

Bash

# Global installation via npm

npm install -g openclaw@latest


# Global installation via pnpm

pnpm add -g openclaw@latest

Suggestion: Avoid relying on the latest tag indefinitely in testing or production environments. A safer practice is to lock to a specific version and include that version number in your delivery documentation and regression checklists.

Bash

npm install -g openclaw@<version>


2.2.3 Docker Installation

Ideal for containerized or headless deployment scenarios.

  • Prerequisites: Docker Desktop or Docker Engine (including Docker Compose v2), and at least 2 GB of RAM.
  • Quick Install (Recommended): Run the automation script in the repository root to automatically build the image, run the onboarding wizard, start the gateway, and generate a Token:

Bash

./docker-setup.sh

You can customize behavior via environment variables, such as enabling the sandbox or pre-installing extensions:

Bash

export OPENCLAW_SANDBOX=1

export OPENCLAW_EXTENSIONS="diagnostics-otel matrix"

./docker-setup.sh

Manual Installation: If not using the automation script, execute the following commands in sequence:

Bash

docker build -t openclaw:local -f Dockerfile .

docker compose run --rm openclaw-cli onboard

docker compose up -d openclaw-gateway

Post-Installation Verification: Access http://127.0.0.1:18789/ in your browser. Retrieve the Token from your .env file and paste it into the console Settings. You can confirm the gateway status via the health check endpoint:

Bash

curl -fsS http://127.0.0.1:18789/healthz


2.2.4 Building from Source

Best for developers or scenarios requiring custom modifications. Requires Node.js >= 22 and pnpm.

Bash

git clone https://github.com/openclaw/openclaw.git

cd openclaw

pnpm install

pnpm openclaw setup

To start the gateway:

Bash

node openclaw.mjs gateway --port 18789 --verbose

For development mode (hot reload):

Bash

pnpm gateway:watch


2.2.5 Other Installation Methods

Official support is also provided for the following methods in specific O&M scenarios:

  • Podman: A rootless container solution via ./setup-podman.sh, supporting --quadlet for systemd integration.
  • Nix: Declarative installation using the nix-openclaw Home Manager module, supporting version locking and instant rollbacks.
  • Ansible: Automated cluster deployment for Debian/Ubuntu servers; the Playbook automatically configures VPNs, firewalls, Docker, Node.js, and systemd services.

2.2.6 Environment Variables & Path Overrides

OpenClaw provides environment variables to override default paths, which is particularly useful for multi-instance or non-standard deployments:

  • OPENCLAW_HOME: Sets the primary directory for internal path resolution.
  • OPENCLAW_STATE_DIR: Overrides the directory for mutable state storage.
  • OPENCLAW_CONFIG_PATH: Overrides the configuration file path.
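For example, a second, fully isolated instance could be pointed at its own directories before launch. The variable names come from the list above; the paths and port are illustrative:

```shell
# Run a hypothetical second OpenClaw instance with isolated state and config.
export OPENCLAW_HOME="$HOME/openclaw-staging"
export OPENCLAW_STATE_DIR="$OPENCLAW_HOME/state"
export OPENCLAW_CONFIG_PATH="$OPENCLAW_HOME/openclaw.json"
mkdir -p "$OPENCLAW_STATE_DIR"

echo "Config path: $OPENCLAW_CONFIG_PATH"
# Then start the second gateway on its own port, e.g.:
# openclaw gateway --port 19789
```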

2.2.7 Verifying the Installation

After installation, perform a minimal verification to ensure the command is available and your PATH is correct:

Bash

openclaw --version

openclaw --help

2.2.8 Version Upgrades & Governance

To upgrade to a new version, re-run the one-click script or re-execute the global installation command with the desired <version> tag (or @latest).

Bash

npm install -g openclaw@<version>

The goal of an upgrade strategy is not just to use the newest version, but to ensure the process is verifiable and reversible:

  • Regress before committing: After every upgrade, verify at least the health, status, channel probes, and model probes.
  • Roll back first if issues arise: If the new version is unstable, immediately re-install the previous stable version (e.g., npm install -g openclaw@<old-version>) before diagnosing differences.
💡 Troubleshooting Tale: The Node Version Mystery

A community user reported that openclaw installed successfully but threw constant SyntaxError: Unexpected token errors upon startup. After three hours of troubleshooting, it was discovered that the system's default Node version was v16 (a legacy setting in nvm), while OpenClaw requires Node 22+. Lesson: Always run node -v to confirm your version before installation, especially in environments using version managers like nvm or Volta.
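A small guard like the following catches this class of problem before installation. The threshold 22 is the minimum stated earlier in this chapter; the function name is ours:

```shell
# Fail fast if the active Node.js is older than the required major version.
node_major_ok() {
  # $1 is a version string like "22.12.0"; succeed if the major part is >= 22
  [ "${1%%.*}" -ge 22 ] 2>/dev/null
}

current=$(node -p 'process.versions.node' 2>/dev/null || echo "0.0.0")
if node_major_ok "$current"; then
  echo "Node $current OK"
else
  echo "Node $current is too old; need >= 22 (check nvm/Volta defaults)"
fi
```

Putting this at the top of your own setup scripts avoids the silent version mismatch described above.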

2.3 Initialization Wizard & Initial Configuration

This section guides you through generating a minimal configuration using the official onboarding wizard and completing your first interaction verification.


2.3.1 Running the Onboarding Wizard

Execute the following command to start the wizard. It is officially recommended to use the --install-daemon flag. This not only generates your workspace configuration but also automatically installs background daemon services (such as LaunchAgent for macOS or systemd user services for Linux/WSL2).

Bash

openclaw onboard --install-daemon

Tip: The primary difference between using and omitting the --install-daemon flag lies in the automatic configuration of background services:

  • With the flag (Recommended): In addition to generating workspace configurations (e.g., ~/.openclaw/workspace), it registers and installs system background services. This ensures the Gateway service continues running after a reboot, making it ideal for long-term use.
  • Without the flag (running only openclaw onboard): Only generates configuration files and completes initialization without registering background services. You may need to start the service manually whenever you wish to use it; this mode is better for temporary local trials.

2.3.2 Configuration Guide & Pitfall Prevention

During the wizard’s series of prompts, follow this "Golden Path" to minimize troubleshooting stress:

  1. Onboarding mode: Select QuickStart. This gets you running quickly with a minimal configuration, binding to 127.0.0.1:18789 by default with Tailscale exposure disabled.
  2. Model/Auth: Bind your Anthropic or OpenAI API Key. If using other compatible providers, follow the prompts to enter the details.
  3. Search provider: Used to give the Agent web-search capabilities (for retrieving documents, news, etc.). Tavily is a highly recommended free option (requires registration on their website for an API Key). You can also choose Google Custom Search or Bing. While you can Skip for now, it will limit the Agent’s ability to perform tasks requiring real-time verification.
  4. Skills: These are base capability packages (e.g., github for version control, gog for game distribution, or the common clawhub). To keep the environment clean, it is strongly recommended to choose Skip for now on the first run. Once the basic link is verified, you can add specific skills via the Dashboard.
  5. Workspace: Defaults to ~/.openclaw/workspace. It is recommended to keep this default.
  6. Channels: Select Skip for now. This is the secret to reducing initial frustration. Get the built-in Dashboard working first to confirm the LLM and environment are healthy before configuring channels like Lark or WhatsApp.
  7. The wizard ends with a Health check and an automatic initial start of the Gateway.

Note: If you did not use the --install-daemon flag, you must manually execute openclaw gateway to start the service after closing your terminal or restarting your computer.

Once the Gateway is running, the core goal is to complete the initialization dialogue (Bootstrap) via the built-in Control UI (Dashboard).

Run the following command to open the local console directly, or visit http://127.0.0.1:18789/#token=<TOKEN> in your browser:

Bash

openclaw dashboard

Once opened, you will see the Gateway Dashboard interface, structured as shown in the overview below:

Figure 2-1: Dashboard Overview

In the dialogue box, set boundaries for your Agent by defining:

  1. Identity: Describe your role/title.
  2. Agent's Role: e.g., "My daily document processing assistant."
  3. Smoke Test: Issue a request that is immediately verifiable without private domain knowledge.

Example of a successful initialization dialogue:

"Hello! I am a busy office worker who often forgets things. Please act as my daily productivity assistant. Follow these rules: 1. Give practical, down-to-earth advice. 2. Keep answers brief and structured. 3. Explain technical terms in plain language. As a smoke test, give me a Markdown list of 5 daily to-do items I can execute today, sorted by priority."

If you receive a structured, practical response, congratulations! Your base installation is successful.


2.3.4 Workspace Bootstrap Files at a Glance

After the first dialogue, check ~/.openclaw/workspace. You will find a set of Markdown files generated from templates. These form the Agent's Bootstrap Context: every time a session starts, the Gateway injects these into the system prompt so the Agent immediately knows who it is and what to do.

Plaintext

~/.openclaw/workspace/
├── AGENTS.md        # Workspace Home: Startup checklist & red lines
├── SOUL.md          # Persona: Values, communication style, boundaries
├── USER.md          # User Profile: Name, timezone, preferences
├── IDENTITY.md      # Agent Metadata: Name, avatar, emoji
├── TOOLS.md         # Environment Notes: Local device names, SSH hosts
├── HEARTBEAT.md     # Heartbeat Inspection List (Optional)
├── BOOTSTRAP.md     # Onboarding script (Auto-deleted after completion)
└── memory/          # Memory directory (Daily conversation summaries)

File Responsibilities & Loading Logic

| File | One-Sentence Definition | When It's Read |
|---|---|---|
| AGENTS.md | The "Home Page." Defines file reading order and rules for group chats. | Every session |
| SOUL.md | The "Character Manual." Defines pragmatic values and communication style. | Every session |
| USER.md | User Profile. Records your background and preferences, evolving over time. | Every session |
| IDENTITY.md | Agent Metadata. Stores the Agent's name and representative emoji. | Every session |
| TOOLS.md | Environment Memo. Records local device names and SSH aliases. | Every session |
| HEARTBEAT.md | Heartbeat Task List. Executed during periodic polls. | Heartbeat only |
| BOOTSTRAP.md | Onboarding Script. Guides the initial self-introduction. | First run only |


[!TIP] These are standard Markdown files. You can edit them anytime. If you modify SOUL.md or AGENTS.md, the changes take effect in the next session—no Gateway restart required.

2.3.5 First-Run Acceptance Criteria

Verify your setup using the built-in diagnostic tools:

Bash

# Check Gateway status
openclaw gateway status

# Perform a full configuration health check
openclaw doctor

Once diagnostics pass, your "Model and Control Link" is established. The next section will cover monitoring, log troubleshooting, and deep availability verification for the background Gateway.

2.4 Daemon Processes & Availability Acceptance

After initializing with openclaw onboard --install-daemon, the OpenClaw Gateway is automatically configured as a system-level daemon that starts on boot (such as a LaunchAgent on macOS or a systemd user service on Linux). This section explains how to verify and manage this background service process.

2.4.1 Pre-launch Checks & Foreground Trial Run

If you have not installed the daemon, or if you need to perform temporary debugging, you can run the Gateway directly in the foreground:

Bash

# Start Gateway in the foreground with real-time log output to the console
openclaw gateway --port 18789

Configuration & Health Check: Exposing Errors Early

Bash

openclaw doctor

If the health check indicates that configuration files failed to load or contain syntax errors, prioritize checking your config paths and formatting. Environment variables like OPENCLAW_HOME and OPENCLAW_CONFIG_PATH can also be used to override default paths.
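For instance, to run a throwaway Gateway against an isolated directory, you can set these overrides before launching. This is a sketch: the two variable names come from the text above, while the directory layout is an assumption, not a documented default:

```shell
# Point OpenClaw at an isolated test home (layout assumed for illustration)
export OPENCLAW_HOME="$HOME/openclaw-test"
export OPENCLAW_CONFIG_PATH="$OPENCLAW_HOME/openclaw.json"
mkdir -p "$OPENCLAW_HOME"
echo "config: $OPENCLAW_CONFIG_PATH"
```

Unset the variables (or open a new shell) to fall back to the default paths.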


2.4.2 Background Service Management & Status

When deployed via the --install-daemon mode, OpenClaw can manage its background status directly through its own CLI tools, eliminating the need for third-party process managers like pm2.

Bash

# View the current background status and critical ports of the Gateway
openclaw gateway status

If the process status appears abnormal, check the corresponding background logs for your specific platform.

Tip: Service names registered automatically may vary by environment. On Linux, check systemctl --user status openclaw for a user service, or sudo systemctl status openclaw for a system-wide unit. However, the CLI wrapper commands remain the most platform-independent and universal method.

Once the service is confirmed to be running, further verify the Gateway’s control plane operation:

Bash

openclaw logs --limit 200

# Optional: Follow structured logs in real-time
openclaw logs --follow --json

If logs show repeated restarts or authentication failures, stop and perform a layered troubleshooting analysis.


2.4.3 Minimal Availability Verification: Testing Communication

The fastest way to verify Gateway functionality is to send a test message directly to a supported channel via the CLI:

Bash

# Replace target with an actual reachable account ID (e.g., a bound WhatsApp number)
openclaw message send --target +15555550123 --message "Hello from OpenClaw Dashboard"

Success here proves that the path from process loading to route invocation is unobstructed. You can then proceed to visualized dialogues via the Control UI.


2.4.4 Layered Troubleshooting Architecture Overview

When a first run fails, use the following four-layer "near-to-far" path for rapid localization:

flowchart TD
  start["Gateway Abnormal"] --> L1["Environment & Config Layer"]
  L1 -->|"Port occupied? Config error?"| L1ok{Pass?}
  L1ok -->|"No"| fix1["Fix file paths & port conflicts"]
  L1ok -->|"Yes"| L2["Control Plane Layer"]
  L2 -->|"Permission error? Model Key invalid?"| L2ok{Pass?}
  L2ok -->|"No"| fix2["Update Auth via Dashboard"]
  L2ok -->|"Yes"| L3["Operation Link Layer"]
  L3 -->|"Plugin crash? Tool timeout?"| L3ok{Pass?}
  L3ok -->|"No"| fix3["Audit tool and skill parameters"]
  L3ok -->|"Yes"| L4["External Network Layer"]
  L4 --> fix4["Troubleshoot API firewalls & rate limits"]

2.5 Chapter Summary

Upon completing this chapter, you should possess a standardized workflow for installation, initialization, and first-run acceptance, as well as the ability to quickly localize common environment and dependency issues.

2.5.1 Key Takeaways

  • Checklist-driven Prerequisites: Ensure Node.js LTS, network connectivity, time synchronization, and credentials are fully prepared before starting.
  • Rollback-ready Installation: Use package managers for trials/development, but prioritize containerization with fixed versions for production.
  • Interpretable Wizard Outputs: Configuration locations, workspace structures, secret injections, and least-privilege policies should all be fully traceable.
  • Binary Acceptance Criteria: Validation is only complete once health/status passes, channel/model probes succeed, and persistence is verified after a reboot.

2.5.2 Reader Self-Check

  • [ ] Can you consistently reproduce a functional runtime environment based on the checklist?
  • [ ] Are sensitive credentials injected only via environment variables or secret management systems—ensuring they are never stored in databases, baked into images, or disclosed in troubleshooting logs?
  • [ ] Have you recorded the version number, configuration summary (de-identified), and acceptance results to serve as a baseline for future upgrades or rollbacks?

2.5.3 Coming Up Next

Chapter 3 will dive deep into configuring initial agent instructions and group chat strategies within the Dashboard and WebChat environments. Advanced multi-channel integration will be covered in Chapter 7.


Chapter 3: Quick Start & First-Round Conversation Practice

This chapter establishes a reproducible "Minimum Viable Baseline": first, we will run the main local loop using the Dashboard and WebChat to master diagnostic commands and troubleshooting sequences; next, we will solidify initial instruction goals and formatting; finally, we will set up minimum security and access boundaries for entry points. By the end of this chapter, you will have your first functional OpenClaw instance and a solid foundation for further in-depth learning.

Chapter Guide

This chapter consists of the following sections:

  • 3.1 Getting Started with Dashboard & WebChat: Establish a repeatable local conversation baseline using the Dashboard and WebChat, and learn to interpret each step through logs.
  • 3.2 Common Diagnostic Commands & Troubleshooting: Formulate a fixed diagnostic sequence and chain of evidence to avoid "configuration/prompt changes based on intuition."
  • 3.3 Initial Instructions & Agent Persona Configuration: Write "executable initial instructions" focusing on goal convergence, boundary declarations, and output structure constraints.
  • 3.4 Pairing, Group Chat Strategies & Access Boundaries: Establish a minimum trust boundary for entry points: pairing approvals, group chat gating, and the linkage between allowlists and least privilege.
  • 3.5 Summary: Key takeaways and self-assessment exercises.

Learning Objectives

Upon completing this chapter, you will be able to:

  1. Rapid Verification: Run a repeatable conversation baseline using the Dashboard and WebChat.
  2. Self-Diagnosis: Master common diagnostic commands and troubleshooting workflows.
  3. Instruction Design: Design clear, executable initial instructions and persona definitions.
  4. Establish Boundaries: Implement the principle of least privilege to set up access control for the system.

Prerequisites

  • Completion of Chapter 2 installation and initial run verification.
  • Ability to access the Dashboard/WebChat via a local browser (port forwarding required for remote environments).

3.1 Quick Start with Dashboard & WebChat

This section establishes a "minimum closed loop" based on the official Web interface: first, confirm gateway health via CLI, then use the Dashboard to open WebChat for a minimal interaction test. We will also include common blockers, such as "new device approval," in our troubleshooting path. The goal is to ensure a stable, reproducible local baseline exists before connecting any external channels.

3.1.1 Why Start with Dashboard and WebChat?

Connecting external channels introduces numerous variables: platform rate limiting, callback retries, network jitter, and group chat noise. Verifying the main loop via the local Dashboard and WebChat offers two immediate benefits:

  • Problem Layering: If the local WebChat works, external channel failures are likely due to channel configuration or platform-side issues.
  • Chain of Evidence: Local interactions are easier to reproduce, with more complete logs and traces.

The official recommended desktop entry point is the Dashboard, typically running on a local address such as http://127.0.0.1:18789/.

3.1.2 Opening the Dashboard: Ports, Entry, and Common Blockers

The following operational steps are recommended.

3.1.2a Dashboard Functional Areas Overview

The Dashboard (Control UI) serves as the Web management center for OpenClaw. The interface consists of a Top Bar, Sidebar Navigation, and Main Content Area.

Top Bar: The left side features a hamburger menu (≡, to collapse/expand the sidebar) and the OpenClaw logo. The right side displays the version number (e.g., Version 2026.3.8), a health indicator (Health OK), and a theme switcher (System/Light/Dark).

Navigation Bar: Contains the following menu items, divided into two groups. The first group (visible by default) includes 10 items:

| Menu Item | Route | Functional Description |
|---|---|---|
| Chat | /chat | WebChat window supporting session selection, new chats, and streaming output. Features a "Thinking" toggle, Focus Mode, and Cron Sessions viewer. |
| Overview | /overview | Gateway overview page. Shows Access info (WebSocket URL, Token, Session Key) and Snapshot cards (Status, Uptime, Cron status). |
| Channels | /channels | Channel management. Displays status (Running, Mode, Last probe), Account configs, Allowlists, and credential settings (Bot Tokens). |
| Instances | /instances | Instance list. Shows presence beacons for connected gateways and clients, including Hostname, IP, OS, Version, and Permissions. |
| Sessions | /sessions | Session management. Lists active sessions with Key, Label, Kind, Token usage, and per-session overrides for Thinking/Verbose modes. |
| Usage | /usage | Statistics and cost analysis. Supports date filtering (7d/30d), Token/Cost views, Activity timelines, and data export. |
| Cron Jobs | /cron | Scheduled task management. Features global status, a creation form (Schedule, Operation mode, Wake mode), and an existing jobs list. |
| Agents | /agents | Agent configuration center. Left: Agent list; Right: Details panel including Overview, Files (edit AGENTS.md/SOUL.md), Tools, and Skills. |
| Skills | /skills | Skill management. Lists built-in skills (bundled/blocked), supports search/filter, and allows for dependency installation (e.g., 1Password CLI). |
| Nodes | /nodes | Device and permission management. Configures Exec Approval policies (Security/Ask Mode) and manages paired Device IDs/Tokens. |

The second group (appearing in the expanded area or via direct routing) includes 4 items:

| Menu Item | Route | Functional Description |
|---|---|---|
| Config | /config | Global config editor (openclaw.json). Supports Form/Raw editing modes with search, tagging, and Save/Apply operations. |
| Debug | /debug | Debug snapshots. Displays raw JSON of internal gateway states (heartbeat, channelSummary, queued events) for deep troubleshooting. |
| Logs | /logs | Real-time log viewer. Reads JSONL logs with level filtering (trace to fatal), keyword search, auto-follow, and export functionality. |
| Docs | External | Direct link to the official OpenClaw documentation site (docs.openclaw.ai). |

In a troubleshooting workflow, it is recommended to confirm gateway health in Overview, check external connections in Channels, and finally perform interaction tests in Chat. For deeper diagnostics, refer to Logs and Debug.

  1. Confirm service health first.

Bash

openclaw health --json

  2. Open the Dashboard.

Bash

openclaw dashboard


# Or open directly on the gateway machine:
# http://127.0.0.1:18789/

Common Blocker: First-time access from a new browser or device requires approval. If the Dashboard indicates a pending device, list and approve it via the CLI:

Bash

openclaw devices list
openclaw devices approve <ID>


3.1.3 WebChat Interaction & Streaming: Visual Processes for Easier Troubleshooting

The key value of WebChat is exposing the process: whether the model request was sent, if tools were proposed/executed, and if output is streaming back. For troubleshooting, the most important task is aligning each interaction with the traces in the logs.

Operational Suggestion: Enable structured logs and compare the streaming output in the Dashboard's Chat interface.

Figure 3-1: WebChat interface illustration. The Chat page provides a complete view: the left side for input, and the right or bottom for streaming output, including user input, model reasoning, tool call requests, and results.

Bash

openclaw logs --follow --json


3.1.4 Minimum Closed-Loop Test Cases

It is recommended to use reproducible test cases rather than random questions.

Test Case 1: Health Link Confirmation

  • Action: Run health, then open the Dashboard.
  • Expected Output:

JSON

{
  "status": "ok",
  "gateway": "running",
  "uptime": 12345,
  "channels": { "telegram": "connected" },
  "models": { "default": "gpt-5" }
}

  • Result: Dashboard loads successfully with no "pending device" prompts.

Test Case 2: Minimal Interaction

  • Action: Enter Please output only one JSON: {"pong": true} in WebChat.
  • Expected Output: {"pong": true}
  • Log Snippet (Structured):

JSON

{ "level": "info", "event": "request_received", "message": "..." }
{ "level": "info", "event": "response_sent", "duration_ms": 2333 }

Test Case 3: Streaming Verification

  • Action: Enter Provide a 5-step troubleshooting plan, max 20 words per step.
  • Expected Output (Streaming):
    1. Confirm service health ✓
    2. Check channel status ✓
    3. Query recent log errors ✓
    4. Verify model quota ✓
    5. Isolate root cause ✓
  • Log Expectations:
    • Observe five chunk_sent events, one for each step.
    • If latency > 5s, check logs for tool_call_pending or model_waiting.
    • If frozen, check for error or timeout events.
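The log expectations above can be checked mechanically rather than by eye. Here is a minimal sketch using plain POSIX tools on inlined sample lines in the event format shown in this chapter (real field names may differ by version):

```shell
# Count chunk_sent events in a captured log excerpt
cat > /tmp/openclaw-sample.jsonl <<'EOF'
{"level":"info","event":"chunk_sent","seq":1}
{"level":"info","event":"chunk_sent","seq":2}
{"level":"info","event":"response_sent","duration_ms":2333}
EOF
grep -c '"event":"chunk_sent"' /tmp/openclaw-sample.jsonl   # prints the event count: 2
```

In a real session, run the same grep against the day's log file and expect one chunk_sent per streamed step.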

3.2 Common Diagnostic Commands & Troubleshooting

This section shifts troubleshooting from "guessing based on errors" to "layered positioning based on evidence." The core strategy is to use health and status to determine system availability, followed by channels status --probe and models status --check to verify dependencies, and finally using doctor and diagnostic configurations to sample evidence while redacting sensitive information.

All CLI commands in this section can be found in Appendix E: Command Cheat Sheet for full syntax; for a more systematic process, see Appendix C: Troubleshooting Checklist.

3.2.1 Layered Positioning: Outside-In

We recommend a fixed troubleshooting sequence:

  1. Process & Health: Is the service available?
  2. Gateway & Channels: Are channels connected? Are they being blocked by policies?
  3. Models & Quota: Is the model accessible? Is it rate-limited or failing authentication?
  4. Sessions & Tools: Is there context crosstalk? Is it blocked by tool policies?
Figure: Command Ladder Flowchart for this sequence.

3.2.2 Four Commands Covering 80% of Issues

Run health and status probes first, then verify channel and model dependencies to narrow down issues to actionable steps.

Command 1: Health Check

Bash

openclaw health --json

Normal Output (includes fallback models and timestamps):

JSON

{
  "status": "ok",
  "gateway": "running",
  "uptime": 45678,
  "channels": { "telegram": "connected", "whatsapp": "connected" },
  "models": { "default": "gpt-5", "fallback": "claude-opus-4-6" },
  "last_check": "2026-03-06T10:30:45.123Z"
}

Common Abnormalities: status: "degraded" indicates partial failure. Check the errors array for expired tokens or quota warnings.


Command 2: Status Overview & Deep Probe

Bash

openclaw status --deep

Normal Output: Shows PID, Uptime, Memory/CPU usage, and a summary of active chats per channel.

Common Abnormalities: High memory usage (>90%), disconnected channels, or LIMIT REACHED on model quotas.


Command 3: Channel Status & Connectivity Probe

Bash

openclaw channels status --probe

Normal Output: Provides webhook latency and message delivery confirmation.

Common Abnormalities: bot_token_invalid or high webhook_latency (>5000ms), suggesting network issues or firewall blocks.


Command 4: Model Status & Auth Probe

Bash

openclaw models status --check

Normal Output: Shows provider availability, token usage percentage, and latency.

Common Abnormalities: authentication_failed (invalid API key) or rate_limited (100% quota used).


3.2.3 Logs & Storage: Finding the Evidence

Logs are written to /tmp/openclaw/openclaw-YYYY-MM-DD.log by default. Use --json and jq for filtering. You can also view these in real-time via the Logs tab in the Dashboard.

Bash

openclaw logs --follow --json
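As a concrete sketch of the jq-based filtering mentioned above (jq must be installed; the field names mirror this chapter's log examples and may differ between versions):

```shell
# Extract error-level events from structured JSONL logs
printf '%s\n' \
  '{"level":"info","event":"request_received"}' \
  '{"level":"error","event":"auth_failed","message":"token expired"}' \
  > /tmp/openclaw-demo.jsonl
jq -r 'select(.level=="error") | .event' /tmp/openclaw-demo.jsonl   # prints: auth_failed
```

The same select filter works on a live stream, e.g. piping openclaw logs --follow --json into jq.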

3.2.4 Doctor & Diagnostic Config: Repair and Redaction

The doctor command fixes common issues, while the config controls log rotation and data masking.

Bash

openclaw doctor --fix

Configuration Example (Redaction & Logging):

JavaScript

{
  "logging": {
    "level": "info",
    "redactSensitive": "tools",
    "redactPatterns": ["sk-[A-Za-z0-9]{16,}"]
  },
  "diagnostics": {
    "enabled": true,
    "flags": ["telegram.*"]
  }
}

[!WARNING] OpenClaw enforces strict Schema validation. Unknown keys will cause the Gateway to reject startup; use openclaw doctor to restore configuration.
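The redactPatterns entry shown earlier is an ordinary regular expression, so you can sanity-check a pattern against sample text before committing it. A sketch follows (the key is fake; sed -E here is purely a local test rig, not how the Gateway applies the pattern internally):

```shell
# Verify the secret-key pattern matches before adding it to redactPatterns
echo 'calling model with key=sk-ABCDEFGHIJKLMNOP1234' \
  | sed -E 's/sk-[A-Za-z0-9]{16,}/[REDACTED]/g'   # prints: calling model with key=[REDACTED]
```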

3.2.5 Typical Scenario: Rapid Assessment of Unresponsive Channels

  • Health fails: Check process/ports, then config syntax and permissions.
  • Channel probe fails: Check credentials and login status (e.g., QR code expiry).
  • Model probe fails: Check auth, quotas, and provider status.
  • Don't change prompts or workflows first; verify dependencies and the evidence chain.

3.2.6 Slash Command Cheat Sheet: High-Frequency Operations

These commands can be sent directly in the chat interface:

| Command | Purpose | Typical Scenario |
|---|---|---|
| /status | View current running status | Use this first if the agent is stuck. |
| /stop | Stop the current task | Force-terminate a stuck tool call. |
| /compact | Compress session context | Save tokens when near the limit. |
| /new | Start a fresh session | Avoid context pollution when switching topics. |
| /model <name> | Switch current model | Swap to a cheaper or more powerful model. |
| /think <level> | Adjust reasoning depth | Use off for chat, high for complex logic. |

[!TIP] /status + /stop is your first line of defense. If the agent is unresponsive but /status shows it is running, a tool call is likely hanging—use /stop to recover.

💡 Real-World Note: Health "OK" but no messages. openclaw health --json might return ok while WhatsApp fails because the QR pairing expired. The process is alive, but the socket is inactive. Always use channels status --probe for true end-to-end verification.

3.3 Initial Instructions & Agent Persona Configuration

Initial instructions are designed to solidify an agent's core objectives, operation boundaries, and output standards. They provide a stable semantic foundation for subsequent tool calls and routing decisions. Based on OpenClaw's instructions configuration, this section details best practices for writing instructions—including what to include and what to avoid—and explores how to build instructions, channel entry points, tool policies, and system observability into a repeatable workflow.

3.3.1 Role of Initial Instructions: Goal Convergence, Boundaries, and Formatting

In engineering terms, initial instructions are not about "creating a persona" but are a runtime contract that includes at least three types of information:

  1. Goal Convergence: What specific problems does this agent solve? Which queries should be rejected or handed off?
  2. Boundary Declaration: Explicitly forbidden behaviors and information sources to prevent unauthorized actions or hallucinations.
  3. Formatting Constraints: Agreed-upon output structures, such as mandatory Markdown, executable commands, or failure handling procedures.

The more instructions resemble an "executable specification," the more stable the system becomes. Literary or vague tones often introduce ambiguity, making the model unreliable in following instructions during critical moments.

3.3.2 Setting Default and Specific Instructions in Config

OpenClaw supports global default instructions as well as specific overrides for individual agents. The configuration uses agents.defaults.instructions as the base value, which can be customized per agent.

The following example demonstrates basic instruction patterns. These settings can also be managed visually via the Agents page in the Dashboard:

Figure 3-4: Visual configuration for Agents.

The example below emphasizes "verifiable and troubleshootable" output requirements:

JavaScript

{
  agents: {
    defaults: {
      instructions: "Produce executable steps and verification methods; state uncertainties clearly and provide troubleshooting paths; avoid fabricating commands or config keys."
    },
    work: {
      displayName: "Work Assistant",
      instructions: "Handle work-related queries only; for write-access or high-risk operations, provide confirmation points and rollback plans first."
    }
  }
}

If using multi-agent routing, it is recommended to place "entry governance" in the entry agent's instructions and "domain knowledge/tool usage" in the domain agent's instructions to avoid overloading a single set of instructions.
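A sketch of that split, in the same configuration shape as the example above (the router and k8s_ops agent names and their routing behavior are illustrative assumptions, not shipped defaults):

```javascript
{
  agents: {
    defaults: {
      instructions: "State uncertainties clearly; avoid fabricating commands or config keys."
    },
    // Entry agent: governance only (scope checks, refusals, hand-offs)
    router: {
      displayName: "Entry Router",
      instructions: "Classify each request; hand Kubernetes questions to 'k8s_ops'; reply 'Out of scope' to everything else."
    },
    // Domain agent: domain knowledge and tool usage live here
    k8s_ops: {
      displayName: "K8s Ops",
      instructions: "Only handle Kubernetes operations. Format: Conclusion → Command → Verification → Failure Handling."
    }
  }
}
```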


3.3.3 Writing Methods: Turning Constraints into Checkable Clauses

Instruction executability comes from being "checkable." In practice, use the following structure:

  1. Define Output Structure First: e.g., fix it to "Conclusion, Command, Verification, Failure Handling."
  2. List Prohibitions: e.g., forbid outputting commands or keys not present in the documentation.
  3. Define Escalation Paths: When high-risk tools are needed, prompt for elevated entry or manual confirmation.

Comparison: Good vs. Bad Instructions

| Dimension | ❌ Bad (Literary/Vague) | ✅ Good (Checkable/Verifiable) |
|---|---|---|
| Goal | "You are a friendly assistant helping the user." | "Only handle K8s ops queries; reply 'Out of scope' for others." |
| Boundary | "Please use dangerous commands carefully." | "Forbidden: kubectl delete, helm uninstall. If required, output a rollback plan and wait for confirmation." |
| Output | "Please answer as detailed as possible." | "Output must include: Conclusion (1 sentence), Command (commented), Verification (expected output), Failure Handling (next step)." |
| Source | "Answer based on your knowledge." | "Only cite official docs/runbooks in the workspace; if unsure, explicitly state 'Not found in documentation'." |

A "Good Instruction" Template:

Plaintext

You are a K8s Ops Assistant. Follow these rules:
1. Only handle Kubernetes cluster operations issues.
2. Fixed format: Conclusion → Command (commented) → Verification → Failure Handling.
3. Forbidden: Destructive commands like delete/uninstall.
4. If unsure, explain why and provide a diagnostic path.
5. Citations must include the document name or URL.

Avoid relying solely on instructions for security. True boundaries should be enforced by tool policies and sandbox constraints for deterministic protection (see Section 5.2: Tool Policy).

3.3.3a Instruction Examples by Complexity

Example 1: Simple — Personal Daily Assistant

JavaScript

{
  agents: {
    personal_assistant: {
      displayName: "OpenClaw-Personal",
      model: "gpt-5",
      tools: ["calendar_query", "reminder_set", "task_log"],
      instructions: `You are a personal schedule assistant.
1. Only handle schedule, task, and reminder requests.
2. Format: Confirmation → Result → Suggestions.
3. No access to private mail/finance data.
4. If unsure, do not hallucinate.`
    }
  }
}

Example 2: Medium — DevOps Team Assistant

JavaScript

{
  agents: {
    devops_assistant: {
      displayName: "OpenClaw-DevOps",
      tools: ["kubectl_get", "kubectl_logs", "healthcheck_run"],
      instructions: `You are a Team DevOps Assistant.
- Format: 1) Diagnosis, 2) Exec Steps (Commented Shell), 3) Verification, 4) Failure Handling.
- [Read] commands are unrestricted; [Write] commands (patch/upgrade) require a YAML diff and a "Press Ctrl+C to abort" prompt.
- Forbidden: Destructive delete commands.
- Citations must include timestamps/node names.`
    }
  }
}

Example 3: Complex — Multilingual Support Gateway

JavaScript


{
  agents: {
    support_gateway: {
      displayName: "Support Gateway",
      tools: ["language_detect", "intent_classify", "ticket_create", "agent_escalate"],
      instructions: `You are a Multilingual Support Gateway.
1. Detect language and respond in kind.
2. Classify intent (Consult/Account/Fault/Suggestion/Complaint).
3. Self-service first: use knowledge_search/faq_retrieve.
4. Escalate only if: user requests human, self-service fails 3x, or security issue.
5. For escalation: generate ticket_create and inform user of tracking ID.
- Keep technical terms (Pod, API) in English.
- Redact PII (emails/IDs) in logs.`
    }
  }
}

3.3.4 The Alternative: SOUL.md and Personalized Persona

While the previous sections treat instructions as "executable specs" for teams, an alternative exists for personal use: SOUL.md. This file defines the communication style and relationship mapping, making the agent feel more like a "colleague who knows you" rather than a "support bot following SOPs."

| Dimension | Executable Spec Route | SOUL.md Personal Route |
|---|---|---|
| Use Case | Team tools, Production | Personal assistant, Private |
| Goal | Stability, Auditability | Natural, Personalized |
| Content | Prohibitions, Escalation | Style, Background, Taboos |

SOUL.md Structure:

  1. Who you are: Your profession and interests to give the agent context.
  2. Relationship: Are you partners or is it a formal assistant?
  3. Preferences: Concise vs. detailed? Direct vs. polite?
  4. Initiative: When is it okay to suggest vs. when should it just listen?

"Onboarding" via Knowledge Base: A more advanced method is uploading your notes/workflows to the workspace. Let the agent "read" your history to update its memory.md. This is often more efficient than manual SOUL.md writing as the agent infers your habits.

[!NOTE] SOUL.md is great for personal exploration. For team/production environments, stick to checkable instruction specs to prevent "persona drift." You can layer them: use instructions for boundaries and SOUL.md for style.
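Following the four-part structure above, a minimal SOUL.md might look like the sketch below (all contents are illustrative placeholders, not a shipped template):

```markdown
# SOUL.md (illustrative sketch)

## Who you are
Backend engineer, mostly Go and Kubernetes; hobbies: cycling, sci-fi.

## Relationship
Treat me as a peer, not a formal assistant; light banter is fine.

## Preferences
Concise and direct; bullet points over paragraphs; skip pleasantries.

## Initiative
Suggest improvements when I share code; just listen when I vent about meetings.
```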

3.3.5 Verification: Diagnostics and Log Workflows

To verify if instructions are working, use reproducible test cases:

  1. Input a fixed test set in WebChat covering boundaries and failure scenarios.
  2. Use status --deep and structured logs to check agent selection and tool policies.

Bash

openclaw status --deep
openclaw logs --follow --json

If the model deviates, check if the issue is with routing bindings or tool policies before modifying the instruction itself. Version control your instructions alongside your configuration to maintain consistency across environments.


3.4 Pairing, Group Chat Strategies & Access Boundaries

Imagine if, without any security configuration, your agent is pulled into a large group of 500 people. If someone @mentions it with a sensitive question and it directly leaks internal company data, you have a serious problem.

The core objective is to put a "security lock" on the agent: intercepting strangers in private chats with pairing, enforcing @mention thresholds in group chats, and restricting which groups it can join.

3.4.1 Private Chat Strategy: Who Can Talk to the Agent

OpenClaw supports four private chat policies (dmPolicy). It is generally recommended to use pairing or allowlist.

  1. pairing (Recommended): Unknown users must first send any message to request pairing. The message will be intercepted, and a Pairing Code will be generated. An administrator must then approve it in the backend (openclaw pairing approve <CODE>) before the user can start a conversation.
  2. allowlist: Only IDs manually entered into the configuration list can interact with the bot.
  3. open: Anyone can start a conversation (Extremely dangerous; could lead to massive API bills).
  4. disabled: Completely disables private chat functionality; the agent will not respond to any DM messages. This is suitable for agents intended solely for group chat scenarios.

Configuration Example (e.g., Feishu/WhatsApp with Pairing + Allowlist):

JavaScript

{
  channels: {
    feishu: {
      dmPolicy: 'pairing',
      allowFrom: ['ou_123456789'],
    },
  },
}

Operational Example: Listing and Approving Pairing Requests

Bash

openclaw pairing list feishu
openclaw pairing approve feishu <CODE> --notify


3.4.2 Three Boundaries for Group Chat: Policy, Mention Gating & Capacity Convergence

Group chats carry high risks, specifically input noise and unauthorized side effects. It is strongly recommended to follow these three thresholds in your configuration:

  1. Group Policy (groupPolicy): Set to disabled unless necessary. If enabled, always pair it with an allowlist (allowlist / groupAllowFrom).
  2. Mention Gating (requireMention): This is the most important safety catch for group chats. It forces the agent to only process messages that explicitly @ the bot, ignoring all other idle chatter in the group.
  3. Capacity Convergence: Adopt more conservative tool policies by default in group chats to avoid high-risk write operations.

Configuration Example (e.g., Feishu with Allowlist + Mention Gating):

JavaScript

{
  channels: {
    feishu: {
      groupPolicy: 'allowlist',
      groupAllowFrom: ['group_123456'],
      groups: { '*': { requireMention: true } },
    },
  },
  messages: {
    groupChat: {
      mentionPatterns: ['@openclaw', '@AI_Assistant'],
    },
  },
}
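Conceptually, mention gating reduces to a pattern match on inbound text before the agent ever sees the message. A simplified shell illustration using the mentionPatterns values from the example above (the real matching happens inside the Gateway, not in your shell):

```shell
# Toy model of mention gating: only messages containing a configured pattern pass
msg="@openclaw please summarize today's tasks"
case "$msg" in
  *"@openclaw"*|*"@AI_Assistant"*) echo "process" ;;   # gated in: agent handles it
  *) echo "ignore" ;;                                  # gated out: idle chatter
esac
```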

3.4.3 Group Chat Acceptance Test: 5-Minute Configuration Check

It is recommended to perform three sets of test cases:

  1. No Mention, No Trigger: Send a normal message in the group; the system should not respond.
  2. Mention Trigger: Mentioning the bot should trigger a response.
  3. Allowlist Block: Group chats not on the allowlist should not trigger a response.

Troubleshooting Key: Use channels status --probe to confirm channel connectivity first, then check logs to see if messages are identified as group chats and whether they hit the gating rules.

Bash

openclaw channels status --probe
openclaw logs --follow --json


3.4.4 Multi-Group Context Isolation: Shunting by Scenario

Beyond security and access control, group chats offer an often-overlooked engineering value: Context Isolation.

When all topics happen within a single session, the memory file becomes increasingly long and cluttered. The agent must process significant irrelevant information, leading to a drop in response quality and speed. By creating multiple groups, you can naturally isolate contexts by scenario:

  • Each group has an independent session; memory only contains content relevant to that scenario.
  • Agent performance is more precise in different groups without interference from other topics.
  • Users can manage history more easily: find work content in the work group and writing materials in the writing group.

Typical Group Shunting Plan

| Group | Use Case | Context Characteristics |
| --- | --- | --- |
| Main Chat (DM) | Daily use, personal memory | Long-term memory, personalized config |
| Work Group | Specific work tasks | Only contains work-related context |
| Writing Group | Content creation | Only contains writing styles and templates |
| Test Group | Testing new features/configs | Can be cleared at any time without affecting formal memory |

Operation: Simply create a new group chat and add both yourself and the agent. Each group's sessionKey is naturally unique, and the context is isolated automatically.

[!TIP] Multi-group shunting is much more cost-effective than frequently using /new to start fresh sessions. While /new discards the current context, group isolation is persistent—each group accumulates its own memory without mutual interference.
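The isolation described above can be sketched in a few lines. The sessionKey format and the in-memory store here are hypothetical; the point is only that a distinct key per (channel, group) pair yields disjoint memory.

```javascript
// Each (channel, group) pair maps to its own session key, so memory written
// in one group never leaks into another. The key format is illustrative only.
function sessionKey(channel, groupId) {
  return `${channel}:${groupId}`;
}

const sessions = new Map(); // sessionKey -> array of memory entries

function remember(channel, groupId, entry) {
  const key = sessionKey(channel, groupId);
  if (!sessions.has(key)) sessions.set(key, []);
  sessions.get(key).push(entry);
}

remember('feishu', 'work_group', 'Q3 report outline');
remember('feishu', 'writing_group', 'blog draft style guide');
```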

3.5 Summary

The goal of Chapter 3 is to establish a "Local Minimum Closed-Loop Baseline": verifying that the main loop is functional, observable, and reproducible without introducing external channel variables. This provides a stable reference frame for subsequent configuration tuning and scaling.

3.5.1 Core Design Principles

After completing this chapter, you should have solidified the following key conclusions:

  • Baseline First: Run fixed test cases through the Dashboard/WebChat before expanding channels and capabilities to prevent overlapping variables from making issues hard to reproduce.
  • Instructions as Contracts: Write initial instructions as checkable clauses (goals/boundaries/output structures) and verify their effectiveness through log replays.
  • Converge Entry Points: Use pairing, group chat gating, and allowlists to keep the trigger surface within a controllable range; high-risk capabilities still require tool policies as a safety net.
  • Ordered Troubleshooting: Check health/status first, then channel and model availability, and finally review structured logs. Avoid changing prompts prematurely, which can mask root causes.

3.5.2 Security & Troubleshooting Self-Checklist

Before moving to the next chapter, please self-assess:

  • Can you run the minimal test cases from Section 3.1 and locate the corresponding requests and traces in logs --json?
  • Can you complete the full operational loop of approving/revoking new device access and explain its impact scope?
  • When a "No response/No trigger/Blocked" event occurs, can you use a chain of evidence to determine if it was caused by gating, pairing, routing, or tool policies?

3.5.3 What's Next

Chapter 4 dives into the configuration system and model integration: upgrading from "able to answer" to "controllable and replaceable," and establishing a baseline for verifiable model selection and failover.

Chapter 4: Configuration System & Model Integration Basics

This chapter addresses configuration and models from the perspective of "System Controllability": first, by understanding the structure and priority of configuration files; next, by completing the integration of model providers; and finally, by establishing basic strategies for model selection and failover. Through this chapter, you will master how to transform an OpenClaw system into a "predictable and tunable" foundation for agents. After reading, you should be able to independently answer three questions: Where is the current configuration taking effect? Why was the current model selected? How will the system degrade when a failure occurs?

Chapter Guide

This chapter includes the following sections:

  • 4.1 openclaw.json Structure & Configuration Priority: Understand the core structure and precedence of the openclaw.json file.
  • 4.2 Model Provider Integration & Authentication: Master the workflows for connecting model providers and their respective authentication methods.
  • 4.3 Model Selection & Default Strategies: Establish a four-dimensional decision framework for model selection based on quality, cost, latency, and reliability.
  • 4.4 Failover Basics: Fallback Chains & Recovery: Configure and verify basic failover paths to ensure system resilience.
  • 4.5 Summary: Key conclusions and self-assessment exercises.

Learning Objectives

Upon completing this chapter, you will be able to:

  1. Understand Configuration: Quickly identify configuration priorities and enforcement rules.
  2. Integrate Models: Successfully complete the integration and authentication of multiple model providers.
  3. Make Decisions: Make informed model selections based on quality, cost, latency, and reliability.
  4. Plan Fault Tolerance: Design basic failover chains to improve overall system stability.

4.1 openclaw.json Structure & Configuration Priority

This section systematically reviews the operation chain of the OpenClaw configuration system, including specific configuration sources, priority determination, override rules, and auditing mechanisms. The core objective is to ensure that any runtime behavior of the system can be traced back to a definitive input configuration and to provide specific methods for verifying the final effective parameters.

4.1.1 What is Configuration: Turning System Behavior into Traceable Input

In OpenClaw, configuration is not a "collection of parameters" but the input for system behavior. The significance of writing behavior into configuration is that when the same system runs on different machines, channels, or accounts, its behavior remains reproducible and explainable.

From the perspective of the system chain, configuration determines at least three types of outcomes:

  • Ingress Outcomes: Which messages can enter the system, and which are rejected or ignored.
  • Operation Outcomes: Which Agent/Node executes a task, which tools are used, and how output is re-injected into the session.
  • Governance Outcomes: Whether permission boundaries, auditing, and fault-handling strategies maintain a steady state during anomalies.
In practice, the most commonly used main configuration file is located at ~/.openclaw/openclaw.json. Beyond direct file editing, you can browse and modify the current runtime configuration by category in the Config view of the Dashboard:
Figure 4-1: Config Global Configuration View.

4.1.2 Scope Partitioning: What Gateway, Agent, Channel, and Tool Control

Many instances of "configuration not taking effect" are not priority issues, but rather issues of writing to the wrong scope. The safest approach is to split configuration into four layers of responsibility before discussing overrides:

  • Gateway Level: Control plane behaviors such as connection, authentication, pairing, routing, and state governance.
  • Agent Level: Default workspaces, task strategies, and capability composition (how models, tools, and memory collaborate).
  • Channel Level: Channel credentials and access boundaries, such as Telegram tokens or WhatsApp allowed sources.
  • Tool Level: Tool enablement, least privilege, timeouts, and receipt standards (how tool output enters the context).
A practical rule of thumb:

  • Ingress-related "Who can trigger" belongs to the Channel or Gateway level.
  • "How to perform tasks and what capabilities to use" belongs to the Agent and Tool levels.
  • "What to do upon failure and how to audit" usually spans both the Gateway and Tool layers, but with different touchpoints.

4.1.3 Priority Model: How Defaults, Files, Env Injection, and Runtime Overrides Stack

You can understand priority using a stable model without relying on implementation details: the closer to runtime, the higher the priority. Common sources include:

  1. Default Values: Built-in to the program to ensure the system can at least start (e.g., Gateway default port 18789).
  2. File Configuration: JSON5 content from ~/.openclaw/openclaw.json.
  3. Environment Injection: Placeholders written as ${VAR_NAME} in configuration string fields will read the real value from environment variables at runtime (typically used for sensitive info like keys and tokens). Only uppercase names [A-Z_][A-Z0-9_]* are replaced; missing variables will cause an error at load time. Use $${VAR} to escape as a literal. .env files in the working directory or ~/.openclaw/ are automatically loaded.
  4. Runtime Overrides: Temporary overrides via command-line arguments at startup (e.g., --port 19091), used for one-off experiments and quick rollbacks.
The diagram below expresses a relationship of "stacking rather than replacing":

The same field may appear in multiple places; the final value depends on the override source, not "which one is written later in the file." Therefore, when governing configurations, avoid redundant definitions of the same semantic fields to prevent leftover overrides when copying across environments.
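The interpolation rules from source 3 above can be sketched as a small re-implementation (illustrative only, not OpenClaw's actual loader): only uppercase names are replaced, a missing variable throws at load time, and $${VAR} escapes to a literal.

```javascript
// Sketch of ${VAR} interpolation: uppercase [A-Z_][A-Z0-9_]* names only,
// errors on missing variables, and $${VAR} as a literal escape.
function interpolate(value, env) {
  return value.replace(/\$?\$\{([A-Z_][A-Z0-9_]*)\}/g, (match, name) => {
    if (match.startsWith('$$')) return match.slice(1); // $${VAR} -> ${VAR}
    if (!(name in env)) throw new Error(`Missing env var: ${name}`);
    return env[name];
  });
}
```

Lowercase placeholders like ${foo} fall outside the pattern and pass through unchanged, which matches the "only uppercase names are replaced" rule.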


4.1.4 Evidence of Effect: Proving the "Final Effective Value"

Mapping "configuration taking effect" to an evidence chain creates a stable troubleshooting method. It is recommended to collect four types of evidence in order:

  • Path Evidence: Confirm the output Loading config from ~/.openclaw/openclaw.json.
  • Content Evidence: A redacted snapshot of key fields. Use openclaw status --deep to verify if key configurations (e.g., default agents, model selection, channel/tool policies) were loaded.
  • Runtime Evidence: Logs must explain "why a rejection occurred." For example: [Channel/WhatsApp] Message rejected: sender +19999999999 not in allowFrom list.
  • Behavioral Evidence: Use a reproducible experiment to verify if a field is effective.
Reproducible experiments should ideally choose "binary outcomes." For example:

  • Channel Allowlist: Send a message from an unauthorized number (expected: rejected) and from an authorized number (expected: received).
  • Tool Policy: Trigger a tool asserted as disabled (expected: logs print Tool 'execute_shell' denied by policy: minimum clearance level required).

To make the evidence chain more deterministic, fix your acceptance commands to a minimal set: first doctor to confirm config readability, then status --deep to confirm loading, and finally logs --follow --json to replay a specific chain.

4.2 Model Provider Integration & Authentication

This section focuses on the models.providers configuration, specifically exploring naming conventions and association mechanisms between providers and models. We will cover best practices for safely referencing API keys using ${VAR_NAME} placeholders to avoid plain-text storage and the use of multiple keys with keyId to support smooth rotation and canary releases. Finally, we provide a standardized set of acceptance commands to ensure that integrated models are not only highly available but also flexibly replaceable.

4.2.1 Conceptual Split: Hierarchy of Provider and Model

In OpenClaw, provider configurations are centralized in models.providers, which defines "how to connect and what credentials to use." Note:

  • Built-in Providers (OpenAI, Anthropic, Moonshot, etc.): Keys are typically written by the openclaw onboard wizard to ~/.openclaw/agents/<agentId>/agent/auth-profiles.json and do not require manual configuration in openclaw.json.
  • Custom/Self-hosted Providers (e.g., OpenRouter or local proxies): These must be declared in models.providers with connection info and a baseUrl.
  • The actual call target is determined by a model identifier in the format provider/model, such as openai/gpt-5.2 or anthropic/claude-sonnet-4-6.
  • The association follows a single rule: the provider prefix in the model identifier must match a provider name defined in models.providers (for custom providers) or exist in the authentication profiles (for built-in providers).
Minimal Example (Integration and Selection):

```javascript
{
  models: {
    providers: {
      openai: { apiKey: "${OPENAI_API_KEY}" },
      anthropic: { apiKey: "${ANTHROPIC_API_KEY}" },
    },
  },
  agents: {
    defaults: {
      model: {
        primary: "openai/gpt-5.2",
      },
    },
  },
}
```

Common Naming Patterns (Examples for formatting reference):

  • openai/gpt-5.2, openai/gpt-5-mini
  • anthropic/claude-sonnet-4-6
  • moonshot/kimi-k2.5
  • minimax/MiniMax-M2.5
  • zai/glm-5

4.2.1a Quick Integration Reference for Major Providers

Anthropic (Claude)

  • Env Var: ANTHROPIC_API_KEY
  • Identifiers: anthropic/claude-sonnet-4-6, anthropic/claude-opus-4-6

Setup: export ANTHROPIC_API_KEY="sk-ant-<YOUR_API_KEY>..."

OpenAI

  • Env Var: OPENAI_API_KEY
  • Identifiers: openai/gpt-5.2, openai/gpt-5-mini

Setup: export OPENAI_API_KEY="sk-proj-<YOUR_API_KEY>.."

Self-hosted Ollama

  • Base URL: http://localhost:11434/v1
  • Adapter: openai-completions
  • Identifiers: ollama/llama2, ollama/mistral

OpenRouter

  • Base URL: https://openrouter.ai/api/v1
  • Adapter: openai-completions
  • Env Var: OPENROUTER_API_KEY

4.2.2 Key Injection: ${} Interpolation and SecretRef Pipelines

Configuration supports writing ${VAR_NAME} in string fields or using a SecretRef object. We recommend placing keys in environment variables or a secret management system so that configuration files remain auditable and reproducible without leaking credentials to disk.

1. Basic Form: ${} Environment Interpolation

OpenClaw reads variables with the following priority: Process Env > Local .env > ~/.openclaw/.env. Note: .env files will not override existing process variables.

2. Advanced Form: SecretRef for Unified Credential Pipelines

For production-grade deployments, SecretRef allows secure loading from various sources, facilitating dependency auditing via the openclaw secrets system. SecretRef supports three source types:

  • "env": Read from environment variables (most common).
  • "file": Read content from a file path.
  • "exec": Execute a command and capture its standard output.

```javascript
{
  models: {
    providers: {
      openai: {
        apiKey: { source: "env", id: "OPENAI_API_KEY" },
      },
      anthropic: {
        apiKey: { source: "file", id: "/run/secrets/anthropic_key" },
      },
      custom: {
        apiKey: { source: "exec", id: "vault kv get -field=key secret/custom" },
      },
    },
  },
}
```

[!WARNING] Disk Leakage Risk: While standard interpolation is common, using the SecretRef object strictly isolates the data pipeline. This prevents keys from being accidentally de-referenced and written back to disk in plain text if the system or an AI agent rewrites the configuration file.

4.2.3 Multiple Keys and keyId: Structure for Rotation

A single provider can host multiple keys, with keyId selecting the default. We recommend adding a new key and verifying it with small-scale traffic before switching the keyId and revoking the old key.

Rotation Tip: Do not simply replace the value of an environment variable during a failure window, as this destroys the evidence trail. Instead, add a new key, switch the ID, observe, and then decommission.
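A hypothetical shape for such a multi-key provider entry is shown below. The keys array and its field names are assumptions for illustration; confirm the exact schema against your local configuration reference before using it.

```javascript
{
  models: {
    providers: {
      openai: {
        // Multiple keys registered side by side (field names hypothetical);
        // keyId selects which one is active.
        keys: [
          { id: "prod-2025", apiKey: "${OPENAI_API_KEY_OLD}" },
          { id: "prod-2026", apiKey: "${OPENAI_API_KEY_NEW}" },
        ],
        keyId: "prod-2026", // switch here after canary verification, then revoke the old key
      },
    },
  },
}
```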


4.2.4 Acceptance: Observable Verification with models status

After configuration, use this minimal command set for verification:

```bash
openclaw doctor
openclaw models status --check
openclaw status --deep
```

  • doctor fails: Fix config readability and runtime dependencies first.
  • models status --check fails: Check if env vars exist or if ${} placeholders are spelled correctly.
  • Provider/Model missing in status --deep: Review the hierarchy (ensure models.providers and agents.defaults.model are correctly placed).

4.2.5 Rate Limits and Quota Management

Upstream providers implement rate limits. OpenClaw handles these to prevent service interruptions.

Handling Upstream 429 Responses

When a 429 (Too Many Requests) is received, OpenClaw employs exponential backoff and a cooldown window:

  1. First 429: Immediate cooldown; first retry after 1s.
  2. Repeated 429: Wait time doubles (1s → 2s → 4s → 8s).
  3. Threshold Reached: After maximum retries or a 60s cooldown cap, the request fails and triggers the fallback chain (see Section 4.4).
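The schedule above is a standard capped exponential backoff, which can be sketched as follows (illustrative, not OpenClaw internals):

```javascript
// Delay doubles per consecutive 429 (1s -> 2s -> 4s -> 8s ...),
// capped at the 60s cooldown ceiling described above.
function backoffDelayMs(attempt, { baseMs = 1000, capMs = 60000 } = {}) {
  return Math.min(baseMs * 2 ** (attempt - 1), capMs);
}
```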

Configuring Local Rate Limit Policies

You can proactively control request frequency in models.providers to prevent hitting upstream limits:

```javascript
{
  models: {
    providers: {
      openai: {
        rateLimit: {
          requestsPerMinute: 60,
          tokensPerMinute: 100000,
        },
      },
    },
  },
}
```

Multi-Provider Fallback to Evade Single-Point Throttling

The most robust approach is configuring multiple providers. If one provider is frequently throttled, the system automatically switches to a secondary provider to ensure continuity.

💡 Real-World Note: The "Invisible" Env Var. A common trap: you export ANTHROPIC_API_KEY in .bashrc, but OpenClaw fails to read it when running as a systemd service, because systemd does not load shell profiles. Solution: explicitly define the variable in the systemd unit file using EnvironmentFile=, or use openclaw secrets configure.

4.3 Model Selection & Default Strategies

This section clarifies the implementation of model selection through agents.defaults.model: how to set the default primary model, how individual agents can override these defaults, and which capability constraints to prioritize for tool calls and long-context scenarios. Finally, it provides a verification method based on models status and regression testing to move model selection from "intuition" to "comparability."

4.3.1 Decision Dimensions: Quality, Cost, Latency, and Reliability

Model selection should not be based solely on performance. For an operational system, at least four dimensions must be considered simultaneously:

  • Quality: Task completion, factuality, and tool-call alignment.
  • Cost: Cost per request and the cost of failed attempts.
  • Latency: Average and tail (P99) latency.
  • Reliability: Failure rates, rate-limiting frequency, and jitter recovery time.
Comparison of Selection Strategies for Typical Business Scenarios:

| Business Scenario | Quality Requirement | Latency Tolerance | Cost Sensitivity | Recommended Strategy |
| --- | --- | --- | --- | --- |
| Customer Support | Medium (Standard QA) | Low (User waiting) | High (Per-message billing) | Small model primary (e.g., gpt-5-mini), large model fallback. |
| Data Analysis | High (Complex logic) | Medium (Can wait 30s) | Medium | Large model primary (e.g., gpt-5.2), focus on context window. |
| Scheduled Inspection | High (Structured output) | High (Offline/Async) | Low | Large model primary; retry upon failure rather than downgrade. |

Engineering-wise, the most stable approach is to fix a default primary model and use fallback chains for safety, rather than frequently switching models manually.

Model Selection Decision Tree

The following diagram illustrates the complete decision-making process for model selection and fallback configuration based on task complexity:

Steps to use this decision tree:

  1. Assess Task Complexity: Determine if your scenario involves simple QA, medium reasoning, or complex coding/logic.
  2. Select Primary Model: Follow the corresponding branch to find the recommended model.
  3. Configure Fallback Chain: Assign at least one downgrade model to the primary (see Section 4.4).
  4. Example Configuration (Medium Reasoning Scenario):

```javascript
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai/gpt-5.2",
        "fallbacks": [
          "anthropic/claude-sonnet-4-6"
        ]
      }
    }
  }
}
```

4.3.2 Default Primary Model: agents.defaults.model.primary

The default primary model should be defined in agents.defaults.model.primary. Treat it as a "system baseline" rather than a casual toggle. Fix the default value first and use fallback chains to handle edge cases, ensuring the evidence chain remains intact during troubleshooting.


4.3.3 Overriding for Individual Agents: agents.list

When different agents handle distinct tasks, you can override model selection within agents.list. This allows the model to evolve alongside tool policies and workspace isolation.

```javascript
{
  "agents": {
    "list": [
      {
        "id": "assistant",
        "model": { "primary": "openai/gpt-5.2" }
      },
      {
        "id": "fast",
        "model": { "primary": "openai/gpt-5-mini" }
      }
    ]
  }
}
```

4.3.4 Coupling with Context and Tools

When selecting models, double-check these three constraints:

  • Context Window: Can it handle the volume of the target task's context?
  • Tool-Call Capability: Can it reliably follow tool signatures and output contracts?
  • Output Format: Can it consistently produce structured results for auditing and replay?
If your system relies heavily on tool receipts and structured re-injection, the model's "format compliance rate" is often more critical than its general QA performance.

4.3.5 Verification: Probes and Regression Cases

First, confirm model availability, then run a minimal regression test.

```bash
openclaw models status --check
```

When changing the primary model or adjusting the fallback chain, use a fixed set of regression cases covering:

  • Factual cases
  • Tool-based cases
  • Multi-turn context cases
Record the regression results alongside cost and latency data to ensure that "seemingly smarter" models don't introduce instability.

4.4 Failover Basics: Fallback Chains & Recovery Strategies

This section introduces the configuration methods for fallback chains, their trigger timing, and their linkage with retry mechanisms. The core of the configuration lies in using agents.defaults.model.primary and agents.defaults.model.fallbacks to set the primary model and a prioritized list of fallback targets. Additionally, this section provides a verification scheme based on "fault injection and observation" to ensure continuous availability and the actual effectiveness of the fallback mechanism.

4.4.1 Fault Classification: Mapping Errors to Actions

The key to fallback is not "switching upon any failure," but "applying different actions to different failures." It is recommended to classify errors into three categories based on "operability":

  • Configuration/Authentication (Fail Fast): e.g., 401/403 errors, missing keys, or incorrect field hierarchy. These should fail fast and provide actionable guidance; do not blindly retry within the same chain to avoid amplifying the failure window.
  • Transient Faults (Bounded Retry): e.g., brief timeouts, jitter, or occasional 5xx errors. These can be retried, but a budget (number of attempts and total duration) must be set.
  • Persistent Unavailability (Trigger Fallback): e.g., continuous rate limiting (429), prolonged provider downtime, or persistent network failures. The system should switch to fallback targets as quickly as possible to maintain continuity.
Mixing these three types of errors leads to two negative effects: problems that should fail fast get bogged down in meaningless retries, and problems that require fallback are not diverted in time.
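The mapping can be made explicit as a small classifier sketch. The specific thresholds (e.g., treating three consecutive 429s as "persistent") are assumptions for illustration, not OpenClaw's actual policy:

```javascript
// Maps an upstream error to one of the three actions described above.
function classify(status, consecutiveFailures = 1) {
  // Config/auth errors: fail fast with actionable guidance; never retry blindly.
  if (status === 401 || status === 403) return 'fail_fast';
  // Continuous rate limiting: divert to the fallback chain.
  if (status === 429 && consecutiveFailures >= 3) return 'fallback';
  // Transient faults (occasional 429/5xx, timeouts): bounded retry.
  if (status === 429 || status >= 500 || status === 'timeout') return 'retry';
  // Anything unrecognized is safest to surface immediately.
  return 'fail_fast';
}
```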

4.4.2 Fallback Configuration: agents.defaults.model.fallbacks

The minimal viable syntax consists of a primary model plus a sequential fallback list. The system will attempt alternative models in order when the primary model fails.

```javascript
{
  agents: {
    defaults: {
      model: {
        primary: "openai/gpt-5.2",
        fallbacks: [
          // First fallback: smaller model from the same provider
          // (common for handling concurrent rate limits)
          "openai/gpt-5-mini",
          // Second fallback: cross-provider model
          // (common for handling upstream or persistent network failures)
          "anthropic/claude-sonnet-4-6",
        ],
      },
    },
  },
}
```

It is recommended to prioritize based on "continuity first, but explainable": the earlier a fallback target appears, the more stable and available it should be, with acceptable fluctuations in cost and quality.


4.4.3 Linkage with Retries: Preventing Window Amplification

Fallback chains must be designed alongside retries; otherwise, "retry deadlock" or "silent switching" may occur.

  • Bounded Retry: Limit the maximum number of attempts and total duration to prevent amplifying queues and costs during an unavailability window.
  • Fallback Priority: Place more stable backup models at the front; keep at least one same-provider and one cross-provider fallback for redundancy.
  • Observable Recovery: Switch back only after the primary link has recovered to avoid unpredictable output quality and costs caused by repeated jitter.
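Combining these points, a bounded-retry-then-fallback loop looks roughly like the sketch below (illustrative only; the invoke signature and per-model attempt budget are assumptions, not OpenClaw APIs):

```javascript
// Try each model in fallback order, with a bounded retry budget per model.
// Collects per-attempt errors so a failed chain remains explainable.
async function callWithFallback(models, invoke, maxAttemptsPerModel = 2) {
  const errors = [];
  for (const model of models) {        // fallback priority: stable models first
    for (let attempt = 1; attempt <= maxAttemptsPerModel; attempt++) {
      try {
        return { model, result: await invoke(model) };
      } catch (err) {
        errors.push({ model, attempt, err: String(err) });
      }
    }
  }
  throw new Error(`All models exhausted: ${JSON.stringify(errors)}`);
}
```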

4.4.4 Auth-profile Level Cooldown: Built-in Rotation for Same-Provider Accounts

Beyond cross-model fallback chains, OpenClaw maintains an auth-profile rotation mechanism within the same provider. When a key or account triggers a failure, the system automatically switches to the next available credential for that provider instead of immediately jumping across models.

Cooldown Gradients:

| Failure Count | Cooldown Duration |
| --- | --- |
| 1 | 1 minute |
| 2 | 5 minutes |
| 3 | 25 minutes |
| 4+ | 1 hour (cap) |

Billing-related blocks (e.g., 402 payment failure) have an independent gradient: starting from 5 hours, doubling each time up to a 24-hour cap; the timer resets automatically after 24 error-free hours.

Cooldown states are persisted in the usageStats field of ~/.openclaw/agents/<agentId>/agent/auth-profiles.json and remain effective after a restart. For more on auth-level reliability, see Chapter 11.
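Both gradients are simple geometric progressions with caps, which can be sketched as follows (illustrative only):

```javascript
// Cooldown gradients per the tables above:
// normal failures: 1m -> 5m -> 25m -> 1h cap   (x5 per failure)
// billing blocks:  5h -> 10h -> 20h -> 24h cap (x2 per failure)
function cooldownMinutes(failures, billing = false) {
  if (billing) return Math.min(5 * 60 * 2 ** (failures - 1), 24 * 60);
  return Math.min(5 ** (failures - 1), 60);
}
```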


4.4.5 Verification: Fault Injection and Reconciliation

Fallback is not complete just because it is written in the config; you must verify that it actually triggers and can be reconciled.

  1. Baseline: models status --check passes, indicating that authentication and network for both primary and backup providers are functional.
  2. Observation: Follow structured logs to ensure you can identify fallback events (commonly named model_fallback).
  3. Injection: Artificially create a condition where "primary link fails, backup remains available" (e.g., temporarily revoke the primary key or trigger primary model rate limits).
  4. Reconciliation: Confirm that fallback events appear in the logs and that the targets hit match the fallbacks sequence.
  5. Operational Example (Observation Window):

```bash
openclaw models status --check
openclaw logs --follow --json
```


4.5 Summary

Chapter 4 upgrades "model accessibility" to "model controllability." The core objective is not simply switching to more powerful models, but transforming configuration, authentication, selection, and failover into explainable system capabilities.

4.5.1 Key Conclusions

  • Configuration Dictates Behavior: Distinguish between scopes first, then address priorities and evidence of effect.
  • Provider Integration Must Be Replaceable: Key injection, environment isolation, and rotation are the baseline requirements.
  • Engineering-Driven Model Selection: Balance the four dimensions of Quality, Cost, Latency, and Reliability, relying on a fixed regression test suite.
  • Verifiable Failover: Retries, rotations, fallbacks, and cooldowns must be triggerable and explainable during drills.

4.5.2 Minimum Closed-Loop (Reproducible)

Below is an "essential-only" minimal configuration: integrate a provider, set a default primary model, configure a fallback chain, and verify with commands.

  1. Environment Variables (Example):

```bash
export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
```

2. Configuration Snippet (Merge this into your ~/.openclaw/openclaw.json):

```javascript
{
  models: {
    providers: {
      openai: { apiKey: "${OPENAI_API_KEY}" },
      anthropic: { apiKey: "${ANTHROPIC_API_KEY}" },
    },
  },
  agents: {
    defaults: {
      model: {
        primary: "openai/gpt-5.2",
        fallbacks: ["openai/gpt-5-mini", "anthropic/claude-sonnet-4-6"],
      },
    },
  },
}
```

3. Acceptance Commands (Focus on results, not intuition):

```bash
openclaw doctor
openclaw models status --check
openclaw status --deep
```

Achieved Objectives: Providers are available, the default model is explainable, and a fallback chain exists and is drill-ready.

4.5.3 Reader Self-Check

  • Can you describe the evidence chain (config path, health check, logs) for the final effective value of a specific configuration field?
  • Do you have at least two model paths (one primary, one backup) set up and verified?
  • When a 401/429/Timeout/5xx error occurs, can you explain exactly what action the system should take for each?

4.5.4 What's Next

Chapter 5 moves into the tool system, skills, and plugins: upgrading from "being able to answer" to "being able to act," while confining those actions within least-privilege and auditable boundaries.


Chapter 5: Tool Systems, Skills, and Plugins

This chapter discusses the action layer of AI Agent systems: while the model is responsible for proposing intent, it is the tools and extended capabilities that generate actual external impact. The core theme is upgrading "the ability to call tools" into "the ability to execute stably within defined boundaries, with auditability, troubleshooting, and rollback capabilities."

Learning Objectives:

  • Understand key interception points in the tool-calling chain and identify which boundaries should be secured by runtime policies.
  • Master tool policy methods including allowlists, denylists, and layered strategies to establish a "default-deny" minimum privilege model.
  • Learn about the activation/deactivation and allowlist governance of the plugin system, and master the collaborative positioning and governance boundaries of skills versus plugins.
  • Master common commands and troubleshooting closed-loops for browser tools.

Chapter Guide

This chapter consists of the following sections:

  • 5.1 Tool Inventory and Invocation Paradigms: A review of tool classifications from an engineering perspective, establishing a systematic understanding of tool contracts, failure semantics, and invocation paradigms.
  • 5.2 Tool Policy: Allow, Deny, and Layered Strategies: An explanation of the matching semantics for tools.allow and tools.deny, and layered governance based on channels or groups.
  • 5.3 Skill Mechanisms: Solidifying Instructions via Built-in Libraries: Distinguishing the positioning of plugins (extended capabilities) versus skills (solidified methods), with warnings regarding ClawHub supply chain risks.
  • 5.4 Browser Tools and Web Automation: An introduction to the four progressive levels of browser capabilities, along with common commands and troubleshooting cycles.
  • 5.5 Chapter Summary: Key conclusions and reader self-assessment.
Note: Command examples in this book may occasionally omit the "main command prefix" (e.g., certain deployments require a unified CLI prefix before subcommands). If you encounter a "command not found" error during operation, please prioritize the --help output of your local CLI and the conventions used in other chapters of this book.

5.1 Built-in Tool Landscape and Invocation Paradigms

This section explores how to construct and manage a tool inventory from an engineering perspective, covering core concepts such as tool contracts, failure semantics, and read/write boundaries. It also introduces how to ensure the reproducibility and replayability of tool calls (supporting hands-on verification in local instances).

5.1.1 Tool Inventory: Sources, Grouping, and Manifests

Building a tool inventory is not about memorizing a static list, since available tools change with versions, configuration templates, plugins, and deployment forms. Rather, the goal is to be able to answer three things on your own instance:

  1. Where the candidate toolset comes from: The base set from tools.profile + new tools brought by enabled plugins.
  2. Which tools are pruned by policy: Layered restrictions from tools.allow/tools.deny and channels.*.groups.*.tools converge the candidate set into an executable set.
  3. How an invocation is audited: When a tool is allowed, denied, or fails, you can review "which rule was hit and where it failed" via status --deep and structured logs.

It is recommended to maintain your "local tool inventory" as a reusable table (useful for writing tool policies, acceptance testing, and troubleshooting):
| Tool Category | Typical Tool Patterns (Examples) | Risk Level | Default Policy Suggestion | Acceptance Focus |
|---|---|---|---|---|
| Read-only Query | `group:web`, `read`, `memory_search` | Low | Allow by default (rate-limit as needed) | Accuracy, latency, availability |
| Side-effect Write | `write`, `edit`, `group:messaging` | Medium | Deny by default (open by entry/role) | Idempotency, rollback, permission boundaries |
| Exec/Command | `group:runtime` (`exec`, `bash`, `process`) | High | Deny by default (open for min. scope) | Whitelists, audit, blast radius |
| Interaction Automation | `group:ui` (`browser`, `canvas`) | High | Deny by default (open if necessary) | Step verifiability, failure localization |
| Extended Tools | `plugins.*` (provided by plugins) | Varies | Plugin whitelist first, then tool policy | Start/stop, canary, replayable evidence |
Note: The "Tool Patterns" in the table above represent governance methods and risk layering; specific tool IDs and available commands should be based on the actual output of status --deep, structured logs, and subcommand --help on your local instance.

A minimal, reproducible "Tool Inventory Generation/Verification" workflow (rely on evidence, not memory):

  1. Static Evidence: Check if the convergence intent in tools.profile, tools.allow, tools.deny, and channels.*.groups.*.tools is clear.
  2. Runtime Evidence: Use status --deep to confirm configurations are loaded; use structured logs to replay a tool allow/deny event to ensure hit rules are traceable.
  3. Extended Evidence: If plugins are introduced, first use plugins list/plugins doctor to confirm "is the plugin loaded, enabled, and healthy" before discussing whether tool policies permit them.

The goal of this section is to transform "tools as black-box capabilities" into "tools as inventory-able, layered, and verifiable engineering objects." The next four subsections address: tool contracts, invocation paradigms, back-injection, and observability.
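For step 2 of the workflow above, replaying a decision can be as simple as filtering the JSON log stream. A minimal sketch, with one assumption to flag: the `tool_denied` event and its `rule` field follow the audit-log example shown later in this chapter, while `tool_allowed` is an assumed counterpart; confirm both event names against your local `logs --follow --json` output.

```javascript
// Sketch: replay tool allow/deny decisions from structured JSON log lines.
// Event and field names are assumptions based on this chapter's audit example.
function replayToolDecisions(logLines) {
  return logLines
    .map((line) => {
      try { return JSON.parse(line); } catch { return null; } // skip non-JSON noise
    })
    .filter((e) => e && (e.event === 'tool_denied' || e.event === 'tool_allowed'))
    .map((e) => `${e.ts} ${e.event} tool=${e.tool} rule=${e.rule ?? 'n/a'}`);
}

const sample = [
  '{"ts":"2026-02-20T10:30:15Z","event":"tool_denied","tool":"exec","rule":"tools.deny: group:runtime"}',
  'not json',
  '{"ts":"2026-02-20T10:30:16Z","event":"message_routed"}',
];
console.log(replayToolDecisions(sample));
```

The point is that every entry ties a decision back to the specific policy key that was hit, which is exactly what "hit rules are traceable" means in practice.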

5.1.2 What is a Tool: Contracts, Boundaries, and Failure Semantics

In OpenClaw, a "tool" should be viewed as a controlled operation: it has explicit inputs, explicit outputs, explicit failure semantics, and its side effects must be auditable.

The core significance of treating tools as first-class objects is pushing the system from "acting on feeling" to "acting by contract." A contract answers at least four questions:

  • What is the input: Parameter structure, required fields, and valid value ranges.
  • What is the output: Structured fields, key conclusions, and evidence sources.
  • What is a failure: Which errors are retryable and which must fail immediately and be reported.
  • What is the side effect: Whether it writes to external systems, whether it is reversible, and what the rollback path is.

The following is a conceptual tool contract example (illustrating design principles; field names and available items depend on your local instance):

```json
{
  "name": "create_ticket",
  "description": "Create a support ticket",
  "parameters": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "priority": { "enum": ["P1", "P2", "P3"] }
    },
    "required": ["title"]
  },
  "failure_semantics": {
    "retryable_errors": ["NetworkTimeout", "RateLimitExceeded"],
    "fatal_errors": ["Unauthorized", "InvalidFormat"]
  }
}
```

As long as side effects exist, "failure semantics" must be designed on the tool side rather than relying on prompt-based remedies after the fact.
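The retryable/fatal split above can be enforced mechanically rather than left to prompting. A minimal sketch (not an OpenClaw API; synchronous for clarity, whereas real tool calls are typically async; error names follow the contract example above):

```javascript
// Sketch: a bounded-retry wrapper driven by a tool contract's failure_semantics.
function callWithContract(contract, invoke, maxRetries = 2) {
  for (let attempt = 0; ; attempt++) {
    try {
      return { ok: true, result: invoke() };
    } catch (err) {
      const retryable = contract.failure_semantics.retryable_errors.includes(err.name);
      // Fatal errors and exhausted retry budgets fail fast and are reported.
      if (!retryable || attempt >= maxRetries) {
        return { ok: false, error: err.name, attempts: attempt + 1 };
      }
    }
  }
}
```

Because the retry decision lives beside the tool, the model never has to "decide" whether an `Unauthorized` error is worth retrying.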


5.1.3 Invocation Paradigms: Read-only, Side-effect Writes, and Chained Orchestration

Based on permission boundaries and acceptance flows, tool calls can be categorized into three paradigms:

  • Read-only Query: Retrieval, reading, statistics. Acceptance focuses on "accuracy, latency, and stability."
  • Side-effect Write: Sending messages, creating tickets, changing configs, deleting objects. Acceptance focuses on "permissions, idempotency, and rollback."
  • Chained Orchestration: Multiple tools connected to complete a task. Acceptance focuses on "interpretable intermediate states, failure localization, and replayable reproducibility."

Practical Supplement: Selecting Web Scraping Tools

In the read-only query paradigm, web content scraping is one of the most common scenarios. OpenClaw’s built-in Readability tool is suitable for most cases but has known limitations: it does not load JavaScript (some SPA pages scrape as blank), multi-page scraping is incomplete, and it lacks structured metadata.

jina.ai Reader is a noteworthy third-party alternative. Its usage is extremely simple: just prefix any URL with https://r.jina.ai/ to get the Markdown content of that page.
| Dimension | Built-in Readability | jina.ai Reader |
|---|---|---|
| JavaScript Rendering | Not Supported | Supported |
| Paywalled Content | Restricted | Partially Bypassable |
| Social Media (e.g., X/Twitter) | Not Supported | Supported |
| Output Format | HTML Snippets | Clean Markdown |
| Deployment Dependency | None (Built-in) | None (Free API, no key required) |
[!TIP] You can configure jina.ai Reader as a custom Skill, allowing the Agent to automatically degrade to it if Readability fails. See 5.3 Skills and Plugins for specific skill configuration methods.
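The degrade path suggested in the tip can be sketched as follows. Note the assumptions: `readability` and `fetchText` are hypothetical helpers standing in for the built-in read tool and a plain HTTP fetch; only the URL-prefixing rule comes from the description above.

```javascript
// Sketch of the suggested fallback: try the built-in Readability-style tool
// first, then degrade to jina.ai Reader by prefixing the URL.
// `readability` and `fetchText` are hypothetical helpers, not OpenClaw APIs.
async function fetchArticle(url, { readability, fetchText }) {
  try {
    const html = await readability(url);
    if (html && html.trim()) return { source: 'readability', content: html };
  } catch (_) {
    // Failed scrape: fall through to the Reader.
  }
  // Blank result (e.g., an SPA skeleton) or failure: use jina.ai Reader.
  return { source: 'jina-reader', content: await fetchText('https://r.jina.ai/' + url) };
}
```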

The flowchart below illustrates the full lifecycle of a tool call, from model proposal to result back-injection:

```mermaid
flowchart TD
    M["Model Reasoning"] -->|"Output tool call intent"| P["Proposal: tool_name + params"]
    P --> C{"Policy Validation"}
    C -->|"allow hit"| E["Execute Tool"]
    C -->|"deny hit"| D["Deny + Reason Injection"]
    E --> R{"Operation Result"}
    R -->|"Success"| S["Structured Injection"]
    R -->|"Retryable Error"| RT["Bounded Retry"]
    R -->|"Fatal Error"| F["Failure Injection + Alert"]
    RT --> E
    S --> M2["Model Continues Reasoning"]
    D --> M2
    F --> M2
```

From an engineering standpoint, it is recommended to break chained orchestration into checkable stages: every step should produce structured intermediate results written to the session or log. This way, when a failure occurs, you can locate the specific step rather than just seeing "task failed."


5.1.4 Back-Injection Principles: Turning Results into Reusable Evidence

If tool outputs are back-injected as-is, the common consequences are context explosion and loss of evidence. A more robust back-injection method is the "three-part" structure:

  1. Conclusion Summary: Key conclusions obtained from this tool operation (directly usable for replies).
  2. Evidence Reference: Key fields, timestamps, and source identifiers (for traceability).
  3. Raw Output: Retained only when necessary to avoid massive useless text entering the context.

The following is a standard back-injection log example:

```json
{
  "summary": "User account is active but password expired.",
  "evidence": {
    "account_id": "u_12345",
    "status": "active",
    "last_login": "2026-03-18T00:18:00Z"
  },
  "raw_output": "{\"db_record\": {...}}"
}
```

The goal of this structure is to allow subsequent reasoning to reference "evidence" rather than a large block of noise.
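A small helper can produce this three-part structure and keep oversized raw payloads out of the context window. A sketch (field names mirror the log example above; the truncation threshold is arbitrary):

```javascript
// Sketch: build a "three-part" back-injection record from a raw tool result.
// Truncating raw output keeps large payloads out of the context window.
function buildInjection(summary, evidence, rawOutput, maxRaw = 200) {
  const raw = typeof rawOutput === 'string' ? rawOutput : JSON.stringify(rawOutput);
  return {
    summary,                          // 1. directly usable conclusion
    evidence,                         // 2. traceable key fields
    raw_output: raw.length > maxRaw   // 3. raw output only when small enough
      ? raw.slice(0, maxRaw) + '…(truncated)'
      : raw,
  };
}
```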


5.1.5 Observability: Making a Tool Call Reproducible and Replayable

The reliability of a tool system comes from observability. It is recommended to solidify the minimum reproducible information of a tool call as a record:

  • Input Summary: Key parameter fields (de-identified).
  • Permission Decision: Why it was allowed or denied (which policy was hit).
  • Operation Result: Success/failure, duration, error category.
  • Back-injected Content: Summary and evidence tags written to the context.

When a "sporadic failure in the same task" occurs, the first reaction should not be to modify the prompt, but to replay the failure case against the same tool call record to locate the root cause after reproduction.
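Putting these four fields together, a single tool-call record might look like the following (a hypothetical shape for illustration only; align field names with your local structured logs):

```json
{
  "trace_id": "t-20260220-042",
  "tool": "web_search",
  "input_summary": { "query": "status page", "redacted": true },
  "decision": { "allowed": true, "rule": "tools.allow: group:web" },
  "result": { "ok": true, "duration_ms": 420, "error_class": null },
  "injection": { "summary": "Service degraded since 09:40 UTC", "evidence_tags": ["source_url", "fetched_at"] }
}
```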

5.2 Tool Policy: Allow, Deny, and Layered Strategies

Based on official tool governance, this section translates the question of "whether a tool can be invoked" into a configurable and auditable policy plane. Key topics include the matching semantics of tools.allow and tools.deny, default policy selection via tools.profile, and layered governance by channel/group using channels.*.groups.*.tools. The goal is to ensure the system is "secure by default, open by necessity," and capable of answering "why this was allowed but that was denied" during incident reviews.

5.2.1 Official Tool Policy Structure: Four Core Blocks

The official configuration can be broken down into four key blocks:

  • tools.allow: A global allowlist supporting wildcards.
  • tools.deny: A global denylist supporting wildcards, which takes precedence over allow.
  • tools.profile: Selects a default tool configuration template (minimal, coding, messaging, or full).
  • channels.*.groups.*.tools: Layered tool restrictions by channel and group (includes allow, deny, and toolsBySender).

One critical default behavior: if only deny is configured, all other tools remain available by default. Production environments should explicitly converge high-risk tools.
Note: The tool IDs and wildcard patterns used here explain governance methods and layering logic; specific available tools and precise naming depend on your version, enabled plugins, and the actual output of status --deep.

5.2.2 Tool and Policy Concept Quick Reference

Before diving into configuration, clarify the mapping between system concepts and actual fields:


| Concept | Config Field | Default Value | Aliases or Supplementary Notes |
|---|---|---|---|
| Default Scenario Template | `tools.profile` | `minimal` | Options: `minimal`, `coding`, `messaging`, `full`, etc. |
| Tool Grouping | `group:*` prefix | (Version-specific) | Used to batch-control similar tools (e.g., `group:runtime` for command operation). |
| Global Allowlist | `tools.allow` | `[]` | Must be explicitly declared under strict configurations. |
| Global Denylist | `tools.deny` | `[]` | Highest priority: blocks even if allow permits the tool. |

The tools/groups included in each profile are as follows:

| Profile | Included Tools/Groups |
|---|---|
| `minimal` | Only `session_status` |
| `coding` | `group:fs`, `group:runtime`, `group:sessions`, `memory_search`, `memory_get`, `image` |
| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
| `full` | No restrictions (equivalent to no profile set) |

The system also supports overriding global profiles per agent via agents.list[].tools.profile. Refer to the Tools Documentation.
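A sketch of that per-agent override follows. The `agents.list[].tools.profile` path comes from the text above; the surrounding entry shape (such as the `id` field) is illustrative:

```javascript
{
  tools: {
    profile: 'messaging', // global default profile
  },
  agents: {
    list: [
      {
        id: 'dev_assistant',              // illustrative agent entry
        tools: { profile: 'coding' },     // overrides the global profile for this agent
      },
    ],
  },
}
```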


5.2.3 Allow/Deny Semantics: Wildcards and Precedence

Key semantics provided by official documentation:

  • Wildcards: Supports * (case-insensitive).
  • Precedence: deny overrides allow.
  • Scope: Global rules are applied first, followed by specific channel/group-level restrictions.
[!WARNING] Wildcard "Foot-guns" and Elevated Privileges: Overusing * in the allow list (especially alongside tools.elevated to bypass sandbox restrictions) means granting unconditional authorization to all unknown plugins, which can easily lead to silent privilege escalation. It is recommended to strictly control allowFrom, avoid *, and always include verification commands (such as security audit) in pre-deployment checks.
Note: tools.elevated is a global, sender-based configuration and cannot be set per-agent in agents.list[].tools. To restrict elevation for a specific agent, disable exec in that agent's tools.deny.

Configuration Example (converging from "default open" to "deny shell write operations"):

```javascript
{
  tools: {
    allow: ['*'],
    deny: ['group:runtime', 'write', 'edit', 'apply_patch'],
  },
}
```
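The deny-over-allow semantics can be sketched as a small matcher. This is not the actual OpenClaw implementation, and group expansion (e.g., `group:runtime` covering `exec`) is omitted; it only illustrates the wildcard and precedence rules described in this section:

```javascript
// Sketch: deny-over-allow resolution with case-insensitive `*` wildcards.
function toRegex(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`, 'i');
}

function isToolAllowed(tool, { allow = [], deny = [] }) {
  const hit = (patterns) => patterns.some((p) => toRegex(p).test(tool));
  if (hit(deny)) return false;         // deny always wins over allow
  if (allow.length === 0) return true; // deny-only config: everything else stays available
  return hit(allow);
}
```

Note how the empty-allow branch reproduces the critical default behavior called out in 5.2.1: with only a denylist configured, every other tool remains available.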

Specific Case: A Complete Interception Chain for Unauthorized Operation

Scenario: A DevOps assistant deployed in a Telegram group. A user says: "Help me delete that failing Pod in the staging environment."

  1. Model Decision: After reasoning, the model outputs a tool call intent exec with the parameter kubectl delete pod error-pod-<id> -n staging.
  2. Policy Validation: The runtime checks the global tools.deny list and finds that exec (or group:runtime) hits a deny rule.
  3. Interception & Injection: The system does not execute the command; instead, it injects the rejection reason back into the conversation.
  4. Model Response: "I'm sorry, I don't have permission to perform deletion operations. Please contact a colleague with cluster management privileges or execute it manually in the O&M terminal."
In the logs, this interception leaves a clear audit trail:

```json
{
  "ts": "2026-02-20T10:30:15Z",
  "trace_id": "t-20260220-042",
  "event": "tool_denied",
  "tool": "exec",
  "rule": "tools.deny: group:runtime",
  "agent": "dev_assistant",
  "channel": "telegram",
  "sender": "user_987654"
}
```

This is why security boundaries must be secured by tool policies rather than just instructions in the prompt—even if the model "wants" to execute, the policy intercepts it deterministically.


5.2.4 Layered Governance by Channel and Group

As systems scale to multiple channels, groups, and entry points, the platform provides the ability to restrict tools at the channel/group level via channels.*.groups.*.tools, with further overrides available via toolsBySender.

Specific Case: Internal Ops Group vs. External Support Group

Assume one OpenClaw instance serves both an external WhatsApp group and an internal R&D Telegram group:

  • External Support Group: Restricted as a baseline; only web_search or specific knowledge base tools are allowed.
  • Internal R&D Group: For specific admin IDs (e.g., 123456789), command operation tools like group:runtime are enabled via toolsBySender to run restricted scripts; other members remain under the baseline policy.

Example Configuration:

```javascript
{
  tools: {
    profile: 'coding',
    deny: ['group:runtime'],
  },
  channels: {
    whatsapp: {
      groups: {
        '*': {
          tools: { deny: ['group:runtime'] },
        },
      },
    },
    telegram: {
      groups: {
        '*': {
          tools: { deny: ['group:runtime', 'write', 'edit'] },
          toolsBySender: {
            '123456789': { alsoAllow: ['group:runtime'] },
          },
        },
      },
    },
  },
}
```

It is recommended to use "more conservative group chat policies" as an operational baseline.


5.2.5 Customizing Tool Policies by Model Provider

Official support also exists for tailoring tool policies based on the model provider or specific model via tools.byProvider. For example, models with weaker tool-calling capabilities can be limited to a minimal toolset:

```javascript
{
  tools: {
    profile: 'coding',
    byProvider: {
      'google-antigravity': { profile: 'minimal' },
      'openai/gpt-5.2': { allow: ['group:fs', 'sessions_list'] },
    },
  },
}
```


5.2.6 Sub-agent Tool Restrictions: Depth-Aware Layered Disabling

When a primary agent spawns a sub-agent via sessions_spawn, the sub-agent’s tool availability is automatically narrowed. This is a system-level, hard-coded security boundary—even if not explicitly declared in the config, the following tools are disabled.

Tools Always Disabled for All Sub-agents (SUBAGENT_TOOL_DENY_ALWAYS):

| Tool | Reason for Disabling |
|---|---|
| `gateway` | System management tool; sub-agents should not control the gateway. |
| `agents_list` | Agent listings belong to the management plane. |
| `whatsapp_login` | Interactive setup process; unsuitable for automated sub-tasks. |
| `session_status` | Status queries should be managed by the parent agent. |
| `cron` | Scheduling authority should be converged at the top level. |
| `memory_search` / `memory_get` | Sub-agents should receive info via spawn prompts, not global retrieval. |
| `sessions_send` | Sub-agents should return results via the announce protocol, not direct messaging. |

Additional Tools Disabled for Leaf Nodes (SUBAGENT_TOOL_DENY_LEAF):

When a sub-agent reaches the maxSpawnDepth (meaning it cannot spawn further levels), it additionally loses:

| Tool | Reason for Disabling |
|---|---|
| `sessions_spawn` | Leaf nodes cannot spawn further sub-agents. |
| `sessions_list` / `sessions_history` | Session management is reserved for the orchestrator. |

Decision formula: isLeaf = depth >= max(1, floor(maxSpawnDepth)).
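The formula can be transcribed and checked directly:

```javascript
// Direct transcription of the leaf-node decision formula above:
// isLeaf = depth >= max(1, floor(maxSpawnDepth)).
function isLeaf(depth, maxSpawnDepth) {
  return depth >= Math.max(1, Math.floor(maxSpawnDepth));
}
```

The `max(1, …)` clamp means that even a misconfigured `maxSpawnDepth` of 0 still yields leaves at depth 1, so spawning can never recurse unboundedly.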

Configuration Override: System-level disabling can be explicitly bypassed using tools.subagents.tools.alsoAllow. For instance, if a sub-agent truly needs memory access:

```javascript
{
  tools: {
    subagents: {
      tools: {
        alsoAllow: ["memory_search"]
      }
    }
  }
}
```

This design embodies the "Defense in Depth" principle: even if a parent agent's prompt fails to restrict a sub-agent's behavior, the system ensures that privileges do not propagate indefinitely.


5.2.7 Acceptance and Regression: Closing the Loop with Evidence

To verify if tool policies are effective, use two types of evidence:

  • Static Evidence: Verify the existence of expected tools.deny and channels.*.groups.*.tools in the configuration.
  • Dynamic Evidence: Check logs for tool allow/deny events and trace them back to the specific policy keys.

Operational Example:

```bash
openclaw doctor --fix
openclaw status --deep
openclaw logs --follow --json
```


5.3 Skill Mechanisms: Solidifying Instructions via Built-in Libraries

Tools determine "what can be done," while engineering methods determine "how to do it more stably." Plugins and skills are used in parallel, offering complementary capabilities and methodologies. Plugins extend runtime capabilities and tools, while skills solidify reusable methodologies and operation steps. In new projects, the two typically work together rather than replacing one another.

5.3.1 Positioning and Division: Plugins Extend Capabilities, Skills Solidify Methods

At the system boundary and underlying physical form, responsibilities can be split into two parts:

  1. Capability Layer (Plugins): Plugins are essentially TypeScript extension modules running within the Gateway process, executing code logic in the core resource area. They manifest as new system tools, custom RPC/HTTP interfaces, or the injection of new channels.
  2. Method Layer (Skills): Skills are essentially instruction files (Markdown) + resource packages that the model reads as needed. They do not execute code; instead, they solidify the "operation intent, instruction steps, and acceptance criteria" for common tasks, teaching the model how to invoke existing tools.

In short: If a requirement involves adding external API access or deep control-plane extensions, you must write a TS plugin. If the requirement is to have the LLM execute combinations of existing operations according to a unified team methodology, you only need to write a Markdown skill.

In addition to core tools, the following extended tools/plugins are available:
  • Lobster: A typed workflow runtime supporting resumable approval processes.
  • LLM Task: An LLM step that outputs only JSON, suitable for structured workflow outputs (with optional Schema validation).
  • Diffs: A read-only diff viewer and PNG renderer for before-and-after comparisons.

5.3.2 Plugin Management: Installation, Activation, Whitelisting, and Self-checking

The core of the plugin system is "explicit activation and explicit permission." The configuration structure for plugins.entries and the whitelist mechanism for plugins.allow and plugins.deny are defined in the official documentation.

A common way to enable a plugin is as follows:

```javascript
{
  plugins: {
    entries: {
      'com.example.my_plugin': {
        enabled: true,
        config: {
          // Custom plugin configuration
        },
      },
    },
    allow: ['com.example.my_plugin'],
  },
}
```

It is recommended to include plugin self-checks in your acceptance process before going live:

  • plugins list
  • plugins doctor
  • status --deep
Note: Some deployments require a "main command prefix" before these subcommands; if you receive a "command not found" error, prioritize the help output of your local CLI and the conventions used in other chapters of this book.

5.3.3 Skill Files and ClawHub: Supply Chain Risks Behind Convenience

Skills are used to distill high-frequency tasks into documented processes featuring "executable steps + constraints + acceptance criteria." In the open-source ecosystem, ClawHub acts as a public skill repository, supporting vector-based semantic search, version control (Semver), and a convenient CLI installation/update experience.

You can view the list of all current built-in capabilities and their status on the Skills page of the Dashboard, as shown below:

Figure 5-2: Skills Repository Management

Similarly, you can use the CLI to semantically search for and install capability packages created by others:

Example:

```bash
skills list --eligible "daily report"
agent --message "Invoke daily-report (passing date, data source, and other parameters)"
```

Below is a simplified, self-authored skill template:

```markdown
# Channel Self-Check

This section provides self-check methods.

## Applicable Scenarios

Channels not replying, group chats not triggering, pairing anomalies.

## Steps

1. Run `doctor --fix` first.
2. Then run `channels status --probe`.
3. If issues persist, follow `logs --follow --json` and filter for `routed` and `tool_denied` events.

## Output Requirements

Must provide the command used, expected output, exception branches, and next steps.
```

Note: A skill is a methodological guide, not a tool permission boundary; whether a high-risk tool is allowed to execute is still determined by tool policies and sandboxing.

Beware of Ecosystem Risks: Treating SKILL.md as High-Risk Inducement

While enjoying the convenience of one-click downloads of "hacker instruction sets" from ClawHub, one must confront the underlying supply chain hazards. In February 2026, a Snyk security report (Leaky Skills) revealed that out of approximately 3,984 skills indexed on ClawHub, 283 (7.1%) posed credential leakage risks. That same month, Koi Security’s ClawHavoc investigation identified 341 malicious skills and a critical Remote Code Execution vulnerability, CVE-2026-25253. Because skills are just Markdown files, they are easily weaponized as inducement vectors: a crafted SKILL.md can steer the model into executing malicious payloads.

For example, some malicious SKILL.md files explicitly instruct the LLM to output user environment variables, API credentials, or sensitive local files in plain text to the chat history before executing API tools, or induce it to run malicious one-click Bash scripts. Models are intelligent, but they are also easily "brainwashed" by documentation into following such orders.

Therefore, when introducing third-party skills, you must personally review the SKILL.md content to prevent any instructions that attempt to bypass your tool interception chain.
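A lightweight pre-install review can be partially automated. A heuristic sketch, with the obvious caveat that the patterns below are illustrative and no substitute for actually reading the file:

```javascript
// Heuristic sketch: flag suspicious instructions in a third-party SKILL.md
// before installing it. Patterns are illustrative, not a complete audit.
const RISK_PATTERNS = [
  /curl[^\n]*\|\s*(ba)?sh/i,              // piping remote scripts into a shell
  /environment variables?/i,              // asking the model to dump env vars
  /api[_ ]?(key|credential|token)/i,      // asking the model to echo credentials
  /ignore (previous|all) instructions/i,  // classic prompt-injection phrasing
];

function scanSkill(markdown) {
  // Return the source of every pattern that matched, for human review.
  return RISK_PATTERNS.filter((p) => p.test(markdown)).map((p) => p.source);
}
```

Anything this flags should be read line by line; anything it misses still might be malicious, which is why the manual review above remains mandatory.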


5.3.4 Governance Recommendations: Secure Boundaries with Tool Policy, Improve Stability with Skills

  • Use Tool Policies for allow, deny, and layered governance. Deny rules take precedence over allow rules to prevent unauthorized access.
  • Use Explicit Plugin Whitelists to control loadable, high-privilege Node system operation modules.
  • Use Skills to solidify high-frequency internal distribution processes, but treat external skills introduced by the team as "untrusted, obfuscated code" that requires mandatory auditing.

5.4 Browser Tools and Web Automation

Browser tools are used to convert interactions such as "visiting webpages, logging in, clicking, and scraping" into controlled tool calls, enabling agents to retrieve web information or perform operations as needed. This section explains the operational boundaries, common commands, and how to integrate web automation into tool policies and troubleshooting closed-loops.

5.4.1 Capabilities and Boundaries: Turning Web Operations into Bounded Tools

The engineering challenge of browser automation lies in "controllability." Browser capabilities should be treated as high-risk tools (capable of cross-site access, page reading, and triggering external side effects), with boundaries established at two points:

  1. Tool Policy: Use allowlists, denylists, and layered strategies to control which agents can use browser-related tools. Deny rules take precedence over allow rules.
  2. Interaction Flow: Break web operations into verifiable steps. Each step must have checkable success conditions and output the next troubleshooting command upon failure.

5.4.2 Browser Capability Layering: Four Progressive Levels

Not all web interactions require launching a full browser. In practice, web-related capabilities can be divided into four progressive levels based on cost, complexity, and application scenarios. Always prioritize lower levels and only upgrade to higher levels when necessary.

Detailed Breakdown of the Four Levels

| Level | Capability | Application Scenario | Dependencies | Performance & Cost |
|---|---|---|---|---|
| L0 | Search Engine + Web Scraping | Daily info retrieval (covers 80% of cases) | Brave Search + Readability / jina.ai | Lowest |
| L1 | Headless Browser | SPA pages requiring JavaScript rendering | Headless Chrome | Low |
| L2 | Headful Browser + DOM Ops | Requires login, filling forms, clicking buttons | Chrome + Virtual Desktop (Xvfb) | Medium (Requires ≥4GB RAM) |
| L3 | Screenshot + Visual Recognition | Info exists only in images (product pics, charts) | Headful Browser + Multimodal LLM | Highest (Slowest speed) |

Decision Logic

  1. Try L0 First: If information can be obtained via search + scraping, do not launch a browser.
  2. L0 Scraping is Blank: (Typical symptom: SPA pages showing only skeleton HTML) → Upgrade to L1.
  3. L1 Still Fails: (Requires login, clicking, or form entry) → Upgrade to L2.
  4. L2 Cannot Retrieve Info: (Text embedded in images or chart data) → Finally use L3.
[!NOTE] To use L2 and L3 on a cloud server, you must first install a virtual desktop service (such as Xvfb), which simulates a display in memory. Complete installation command: sudo apt-get install -y xvfb chromium-browser fonts-noto-cjk. Start command: Xvfb :99 -screen 0 1280x1024x24 &, and set the environment variable export DISPLAY=:99.

5.4.3 Common Commands: Start, Check, and Open Pages

The following commands can be used to manage the browser service and verify the environment:

Before requesting browser capabilities, it is recommended to check the status first and start as needed:

```bash
openclaw browser status
openclaw browser start
```

Once the browser service is available, use browser open to quickly open a target page to verify network and environmental health:

```bash
openclaw browser open "https://example.com"
```

To stop the browser service, use browser stop:

```bash
openclaw browser stop
```

5.4.4 Collaboration with Web Tools: Read Tools First, Browser Only if Necessary

Web-related capabilities generally fall into two categories:

  1. Read-Oriented: Web fetching and parsing. Best handled by standard web tools as results are more structured and easier to back-inject.
  2. Interaction-Required: Pages behind logins, complex forms, or pages requiring custom script operation. This is where browser tools are introduced.

From an engineering perspective, prioritize "read tools" to reduce the failure surface caused by the uncertainty of interactive sessions.

5.4.5 Acceptance and Troubleshooting: Using Status and Logs

Browser-related issues should be resolved by narrowing down through these levels:

  1. Use browser status to determine if the service is online.
  2. Use browser open to verify network and page reachability.
  3. Use system logs to determine if the issue is related to tool policies, routing, or sessions.

```bash
openclaw browser status
openclaw status --deep
openclaw logs --follow --json
```

If the page opens but the agent cannot complete the task, first check whether tool policies have denied the browser tools. If the browser fails to start, check dependencies and the running environment, and run doctor to get structured self-check results:

Bash

openclaw doctor


5.5 Chapter Summary

The core of Chapter 5 is incorporating OpenClaw's action capabilities into engineering governance: tools and extended capabilities must be constrained by policy and possess verifiable acceptance and troubleshooting paths.

5.5.1 Key Conclusions

  1. The security boundaries of tool calls should be secured by runtime policies and cannot rely solely on prompt constraints.
  2. Tool governance should start with a "default-deny" minimum privilege model, using allow, deny, and layered policies to converge side-effect capabilities into controlled entry points.
  3. The plugin system provides distributable extended capabilities and should be coupled with whitelists and explicit activation switches; the skill system is used to solidify procedural methodologies but should not carry runtime security boundaries.
  4. Browser tools are high-risk capabilities; they should be integrated into tool policies and establish a troubleshooting closed-loop using status commands and structured logs.

5.5.2 Reader Self-Check

  • Can you use status --deep and log replays to explain why a tool was denied or failed to execute?
  • Do you have a set of minimal test cases to verify that "what should be allowed is allowed, and what should be denied is denied"?
  • Are you able to quickly deactivate extended capabilities and roll back to a secure state when an anomaly occurs?

5.5.3 Community Practical Inspiration

Beyond technical testing, you can try applying tool capabilities to these real-life scenarios:

  • Daily Information Distillation: Utilize browser automation tools to periodically open specific tech news sites or video platforms, extract the latest trending topics, and generate summaries.
  • Personal Health Tracking: Encapsulate structured recording instructions into a dedicated Skill, allowing the Agent to act as your diet and exercise log coordinator.
  • Automated Market Benchmarking: Develop sequential web operation flows to automatically search for similar product information and scrape public data, assisting in preliminary market research and analysis.

5.5.4 Next Chapter Preview

Chapter 6 enters the realm of sessions, context, and memory, with the goal of turning task continuity into a controllable capability: knowing what the system remembers, why it remembers it, and how to compress or prune that data.


Chapter 6: Sessions, Context, and Memory

This chapter transforms an agent’s ability from "being able to chat" into "being able to steadily advance tasks." Sessions define state ownership, Context organizes available information within Token budgets, and Memory facilitates the long-term accumulation of facts and preferences across sessions. Together, these three elements determine a system’s reproducibility, observability, and maintainability. Through this chapter, you will learn how to maintain long-term, predictable, and reproducible conversational capabilities for agents under finite resource constraints.

Chapter Guide

This chapter consists of the following sections:

  • 6.1 Session Models and State Persistence: Understanding session identification, lifecycles, and persistence boundaries.
  • 6.2 Context Construction and Window Budgets: Mastering context construction strategies: selection, pruning, window budgeting, and injection order.
  • 6.3 Memory Mechanisms: Writing, Retrieval, and Expiry: Exploring memory mechanisms: the synergy between primary storage (Markdown) and retrieval backends.
  • 6.4 Compaction and Pruning: Folding and Discarding Strategies: Managing long sessions: trigger conditions and trade-offs for compaction and pruning.
  • 6.5 Chapter Summary: Key conclusions and reader self-assessment.

Learning Objectives

After completing this chapter, you will be able to:

  1. Design Sessions: Create logical session isolation and identification strategies for various scenarios.
  2. Organize Context: Effectively organize and inject contextual information within Token budget constraints.
  3. Establish Memory: Design long-term memory mechanisms that allow agents to retain critical information.
  4. Manage Growth: Keep long-term sessions efficient through compaction and pruning techniques.

6.1 Session Models and State Persistence

This section introduces OpenClaw's session management mechanism, which consists of three core components: defining session scopes, setting reset strategies, and ensuring state persistence and troubleshooting. Through proper configuration, developers can transform risks like "cross-talk," "duplicate operation," and "state recovery failure" into configurable and observable engineering practices.

6.1.1 Session Scopes: Define Your Session Keys First

OpenClaw's session behavior is controlled by the global session configuration. The most critical "knob" is session.scope, which defines "which messages are folded into the same session." For official examples and field explanations, see: Session Configuration.

Typical Selection Logic:

  • One session per sender: Ideal for private chats and personal assistants.
  • One session per thread or topic: Suitable for channels that support threading or scenarios requiring strict isolation between different topics.
  • Linking multiple identities to one session identity: Perfect for a user using the same agent across Telegram and Discord.
You can intuitively manage active sessions and their Token consumption via the Sessions page on the Dashboard, as shown below:

Figure 6-1: Sessions Management and Usage

Before diving into the configuration examples, please refer to the following session terminology to avoid confusion:
| Concept | Configuration Field | Default | Aliases or Supplementary Notes |
| --- | --- | --- | --- |
| DM Merge Key | session.dmScope | main | Options: main (all DMs merged into one main session), per-peer (isolated by peer), per-channel-peer (isolated by channel + peer), per-account-channel-peer. |
| Reset Strategy | session.reset | N/A | Supports mode (e.g., daily, idle), atHour, idleMinutes; manual reset commands can be set via session.resetTriggers. |
| Identity Binding | session.identityLinks | {} | Used for cross-channel binding, e.g., merging User A on TG with User A on Discord. |

Configuration Example (Adapted from official docs to highlight key fields):

JavaScript

{
  session: {
    scope: "per-sender",

    // Fold DM sessions into agent:<agentId>:<mainKey> to merge multiple DMs into a "main session."
    dmScope: "main",
    mainKey: "main",

    // Link multiple channel identities as the same "person" to prevent fragmented cross-channel dialogue.
    identityLinks: {
      alice: ["telegram:123456789", "discord:987654321012345678"],
    },
  },
}

Check-off Point: You can explain which sessionKey a message will eventually land in and find the corresponding records in the logs and session storage.
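To make the check-off concrete, the folding rules above can be sketched in a few lines. This is a simplified model, not OpenClaw's source: the field names (`dmScope`, `mainKey`, `identityLinks`) mirror the configuration, but the resolution logic shown here is an assumption for illustration.

```javascript
// Simplified model of session-key folding (illustrative only; the real
// resolution logic lives in the Gateway).
const config = {
  session: {
    dmScope: "main",
    mainKey: "main",
    identityLinks: {
      alice: ["telegram:123456789", "discord:987654321012345678"],
    },
  },
};

// Resolve a raw channel identity (e.g., "telegram:123456789") to a linked
// identity name, falling back to the raw identity when no link matches.
function resolveIdentity(cfg, rawId) {
  for (const [name, ids] of Object.entries(cfg.session.identityLinks ?? {})) {
    if (ids.includes(rawId)) return name;
  }
  return rawId;
}

// Fold a DM into a session key. With dmScope "main", every linked identity's
// DMs land in the shared main session; with "per-peer" each peer gets its own.
function dmSessionKey(cfg, agentId, rawId) {
  const { dmScope, mainKey } = cfg.session;
  if (dmScope === "main") return `agent:${agentId}:${mainKey}`;
  return `agent:${agentId}:dm:${resolveIdentity(cfg, rawId)}`;
}

console.log(dmSessionKey(config, "main", "telegram:123456789")); // agent:main:main
```

With this mental model, "which sessionKey does a message land in" becomes a pure function of channel identity plus configuration, which is exactly what you should be able to trace in the logs.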


6.1.2 Reset Strategies: Rule-Based Resets, Not Manual Clears

After running for a long time, sessions accumulate history and context drift. OpenClaw provides configurations for resets based on time or idle duration, supporting different settings per session type (e.g., longer for DMs, shorter for group chats).

Configuration Example:

JavaScript

{
  session: {
    // Global reset strategy
    reset: {
      mode: "daily",        // Options: "daily", "idle", etc.
      atHour: 4,            // Hour to reset daily (local timezone)
      idleMinutes: 60,      // Reset after being idle for this duration
    },
    // Manual reset commands
    resetTriggers: ["/new", "/reset"],
  },
}

[!NOTE] Session reset strategies are configured globally via session.reset. For differentiated resets by session type (DM, group, thread), refer to the specific sub-field descriptions under session in the official documentation.
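The semantics of the two reset modes can be sketched as a small decision function. This is a hypothetical helper written to match the configuration fields above (`mode`, `atHour`, `idleMinutes`); OpenClaw's actual reset logic may differ in edge cases such as timezone handling.

```javascript
// Illustrative reset check (hypothetical helper, not OpenClaw source):
// decide whether a session should be reset before handling a new message.
function shouldReset(reset, lastMessageAt, now) {
  if (reset.mode === "idle") {
    // Reset if the gap since the last message exceeds idleMinutes.
    return now - lastMessageAt >= reset.idleMinutes * 60 * 1000;
  }
  if (reset.mode === "daily") {
    // Reset when the most recent atHour boundary falls between the
    // previous message and now (local time).
    const boundary = new Date(now);
    boundary.setHours(reset.atHour, 0, 0, 0);
    if (boundary.getTime() > now.getTime()) boundary.setDate(boundary.getDate() - 1);
    return lastMessageAt.getTime() < boundary.getTime();
  }
  return false;
}
```

Viewed this way, "failure to reset" bugs usually reduce to one question: did any configured boundary actually fall between the last two messages?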

6.1.3 Message Queue Modes: Official Modes and Defaults

Beyond reset strategies, how messages queue within a session affects the user experience. OpenClaw's official queue modes include collect, steer, followup, steer-backlog, and interrupt. The default value is collect.

[!WARNING] The queue mode commonly seen in earlier versions is merely a legacy alias for steer, not the recommended default mode for queuing. Avoid using queue as a mode value in new configurations.

Main Official Queue Modes:

  • collect (Default): While the agent is processing the current message, new messages are collected and queued, then processed sequentially. Pros: Simple logic, won't interrupt ongoing tasks.
  • steer: New messages are injected in real-time into the agent's current processing context, allowing the agent to adjust its direction immediately. Ideal for human-AI collaboration and continuous course correction.
  • followup / steer-backlog / interrupt: These correspond to finer concurrency control semantics like appending, backlog-guided steering, or hard interrupts (refer to official docs for specifics).

Example Queuing Configuration:

JavaScript

// ~/.openclaw/openclaw.json
{
  messages: {
    queue: {
      mode: "collect",      // Official default
      debounceMs: 1000,
      cap: 20,
      drop: "summarize",
      byChannel: {
        telegram: "collect",
        discord: "collect",
      },
    },
  },
}

[!TIP] If you frequently need to change directions while the agent is executing (e.g., "Stop searching that, try this keyword instead"), it is recommended to enable steer mode for that specific channel. For independent long tasks where interruptions are unwanted, keep the default collect mode.
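The interplay of `cap` and `drop: "summarize"` in collect mode can be sketched as a tiny queue. This is an illustrative model of the behavior described above, not the Gateway's implementation; the summary line format is invented for the example.

```javascript
// Minimal sketch of "collect" queueing: messages accumulate while the agent
// is busy; past the cap, the oldest overflow is folded into one summary line
// (echoing drop: "summarize" in the configuration above).
class CollectQueue {
  constructor({ cap = 20 } = {}) {
    this.cap = cap;
    this.pending = [];
  }

  push(message) {
    this.pending.push(message);
    if (this.pending.length > this.cap) {
      // Remove enough old messages to make room for the summary placeholder.
      const overflow = this.pending.splice(0, this.pending.length - this.cap + 1);
      this.pending.unshift(`[${overflow.length} earlier messages summarized]`);
    }
  }

  // Drain everything collected so far for the next agent turn.
  drain() {
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}
```

The design point: collect mode never interrupts the running turn, it only bounds how much backlog the next turn must absorb.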

6.1.4 DM Session Isolation and Multi-User Mode (Security Perspective)

When OpenClaw acts as a shared assistant bot (e.g., opening DMs to multiple users on Telegram or WhatsApp), understanding the security of DM isolation is vital:

  • Isolation applies to requests and memory: By default, each DM session (e.g., agent:main:telegram:dm:user_A and agent:main:telegram:dm:user_B) maintains its own independent Token window, history, and temporary context. User A cannot see the private history between User B and the bot.
  • However, host resources are shared: Since the same Gateway and Agent face the same underlying host environment (file system, Shell), if Tool Policies do not sandbox permissions, User A could potentially "read host files" to peek at global system info or temporary files written by User B.

Secure DM Mode Recommendation: If serving multiple untrusted users, you must prohibit the use of high-risk tools like group:runtime and restrict file I/O to strict sandboxes or independent volumes. See the Official Security Documentation.

6.1.5 Dual-Layer Session State Storage

By default, OpenClaw session data is stored per agent in the ~/.openclaw/agents/<agentId>/sessions/ directory. You can override this via session.store. The design separates state into two layers:

  • Session Store (sessions.json): Stores session metadata (e.g., sessionId, Token counts, compaction frequency). This data is lightweight; even if accidentally modified, the Gateway can safely rebuild it.
  • Transcript (<sessionId>.jsonl): An append-only dialogue history (JSONL format) containing raw messages, tool call records, and summarized content from compactions.

Configuration Example:

JavaScript

{
  session: {
    store: "~/.openclaw/agents/{agentId}/sessions/sessions.json",
  },
}

Recommendation: Include session storage directories in backups and audits, but do not sync sensitive content to untrusted locations. Enable masking or minimal logging where necessary.

[!TIP] Trap: Does the /new command cause memory loss? Many beginners assume /new makes the AI "forget" everything. In reality, this command only creates a new sessionId and clears the temporary dialogue context (Transcript). Persistent memory files on disk (like MEMORY.md) remain intact and will be automatically reloaded in the new session.
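The claim that the Session Store is "safely rebuildable" follows directly from the transcript being append-only JSONL: metadata is just a fold over the records. The sketch below assumes hypothetical record shapes (`role`, `tokens` fields) for illustration; OpenClaw's actual schema may differ.

```javascript
// Illustrative rebuild of session metadata from an append-only JSONL
// transcript (record shapes are assumptions, not OpenClaw's exact schema).
const transcript = [
  '{"role":"user","text":"deploy status?","tokens":5}',
  '{"role":"assistant","text":"All green.","tokens":4}',
  '{"role":"compaction","summary":"Earlier deploy Q&A","tokens":3}',
].join("\n");

function rebuildMetadata(jsonl) {
  const records = jsonl.split("\n").filter(Boolean).map(JSON.parse);
  return {
    messageCount: records.filter(r => r.role !== "compaction").length,
    compactions: records.filter(r => r.role === "compaction").length,
    totalTokens: records.reduce((sum, r) => sum + r.tokens, 0),
  };
}

console.log(rebuildMetadata(transcript)); // { messageCount: 2, compactions: 1, totalTokens: 12 }
```

This is why the transcript, not sessions.json, is the layer worth backing up and auditing: everything in the metadata layer is derivable from it.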

6.1.6 Troubleshooting Case: Cross-Channel Cross-Talk

The Scenario: User A asks a question on Telegram but receives User B's context from WhatsApp.

An operator receives a report: "I asked about deployment, but the bot replied with financial statement info." The troubleshooting steps:

  1. Locate the Session Key: Filter logs for User A's request; find the sessionKey as agent:main:telegram:dm:alice_tg.
  2. Check Identity Links: Discover that the identityLinks config incorrectly linked User A's Telegram ID and User B's WhatsApp ID to the same identity:
JavaScript

{
  session: {
    identityLinks: {
      // Error! alice and bob mistakenly bound to the same identity
      alice: ["telegram:alice_tg_id", "whatsapp:bob_wa_id"],
    },
  },
}

3.  **Root Cause Confirmation:** Due to the link error, User B's WhatsApp history was injected into User A's context.

4.  **Fix:** Correct the `identityLinks` to ensure only IDs belonging to the same person are under one identity. Restart and verify with `status --deep`.


**Lesson:** `identityLinks` is the "identity merge switch." A misconfiguration can lead to privacy-leaking cross-talk. It should be on the change-audit checklist.


---


## 6.1.7 Troubleshooting Commands: Locating Anomalies via Status and Logs


When issues like "cross-talk," "sudden context loss," or "failure to reset" occur, prioritize system self-checks and structured logs.


```bash
# View overall status (--deep performs a more thorough probe)
openclaw status --deep

# Trace logs, add --json for easy filtering with jq
openclaw logs --follow --json
```

Operation Example: Filter the event stream for a specific session key in the logs to see if a session is being written to by multiple identities. (Field names depend on actual logs).

Bash

cat runtime.log | jq -r 'select(.type=="log") | .log | select(.sessionKey=="agent:main:whatsapp:dm:+15555550123") | [.ts,.trace_id,.event,.from] | @tsv' | tail


6.2 Context Construction and Window Budgets

This section discusses context budgets based on OpenClaw's actual mechanisms: context is composed of workspaces, skills, session history, and tool receipts. When tool outputs accumulate, they must be pruned according to specific rules before being sent to the model, rather than modifying the history on disk. The focus lies on the behavior and tuning methods of agents.defaults.contextPruning, and how to use replays and metrics to verify that pruning hasn't compromised critical decision-making.

6.2.1 Context Composition: Workspace, Skills, Sessions, and Tool Receipts

In OpenClaw, the workspace is the primary source of context. The official memory mechanism defines the purposes of several key files: system prompts, skills, workspace instructions, long-term memory, and daily logs.

The engineering challenge of context is not "whether information exists," but "whether information is injected in a usable form." A typical anti-pattern is leaving massive amounts of raw tool output in the session indefinitely, leading to spiraling costs, latency, and attention dilution.


6.2.2 Tool Result Pruning: agents.defaults.contextPruning

The official agents.defaults.contextPruning feature is used to prune old tool results before a request is sent to the model. A key point: it only changes the "context sent to the model" and does not modify the session history on disk, facilitating easy replay and auditing.

[!IMPORTANT] Session Pruning is currently only effective when mode: "cache-ttl" is used with the Anthropic API (including Anthropic models via OpenRouter). Its core purpose is to prune old tool results to reduce re-caching costs after a session has been idle longer than the prompt cache TTL. If using OpenAI or other providers, this feature is currently not applicable.

Two Pruning Methods:

  • Soft-trim: Retains the head and tail of oversized tool results, inserting ... in the middle while noting the original size. Results containing image blocks are skipped.
  • Hard-clear: Replaces older tool results with a placeholder (hardClear.placeholder).

Configuration Example (Reference defaults when enabled; refer to official docs for specifics):

JavaScript

{
  agents: {
    defaults: {
      contextPruning: {
        mode: "cache-ttl",
        keepLastAssistants: 3,
        softTrimRatio: 0.3,
        hardClearRatio: 0.5,
        minPrunableToolChars: 50000,
        softTrim: { maxChars: 4000, headChars: 1500, tailChars: 1500 },
        hardClear: { enabled: true, placeholder: "[Old tool result content cleared]" },
        tools: { deny: ["browser", "canvas"] },
      },
    },
  },
}

Check-off Point: In long sessions, the model input volume is controlled and stable; when replaying the same trace, key decisions do not drift inexplicably due to pruning.
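The soft-trim and hard-clear stages can be sketched as two plain functions. Parameter names mirror the configuration above; the implementation itself is illustrative (the exact marker text OpenClaw inserts is not specified here beyond the placeholder).

```javascript
// Sketch of the two pruning stages (illustrative implementation of the
// documented behavior, not OpenClaw's source).
function softTrim(text, { maxChars = 4000, headChars = 1500, tailChars = 1500 } = {}) {
  if (text.length <= maxChars) return text; // small results are left intact
  const head = text.slice(0, headChars);
  const tail = text.slice(-tailChars);
  // Keep head and tail, note the original size in the middle.
  return `${head}\n... [soft-trimmed, original size: ${text.length} chars] ...\n${tail}`;
}

function hardClear(placeholder = "[Old tool result content cleared]") {
  return placeholder; // the entire tool result is replaced by the placeholder
}

const big = "x".repeat(10000);
console.log(softTrim(big).length < big.length); // true
```

Note that both functions are pure transforms of the in-memory request payload, which is the whole point: the on-disk transcript is never touched, so replays stay faithful.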


6.2.3 Tuning Methodology: Protect Decision Evidence, Then Prune Noise

When tuning, it is recommended to follow this sequence:

  1. Improve tool re-injection structure: Re-inject key fields and summaries into the session, while saving full outputs to disk as evidence references.
  2. Adjust contextPruning thresholds and exclude lists: Avoid pruning the "evidence segments" that are actually required.
  3. When a "tool was called but the model ignored it" issue occurs, check if the tool re-injection was hard-cleared before attempting to modify the prompt.

6.2.4 Verification Commands: Validating Pruning Effects via Status and Replay

Operation Example: Observe system status and model-side error distributions to confirm that pruning has not introduced abnormal retries or formatting errors.

Bash

# Check overall status

openclaw status --all

Operation Example: Count the frequency of pruning-related events or placeholders to determine if over-pruning is occurring. (Field names depend on actual implementation).

Bash

cat runtime.log | rg "Old tool result content cleared" | wc -l

6.3 Memory Mechanisms: Writing, Retrieval, and Expiry

This section explains the official memory system: "where memory resides, how it is retrieved, and how to prevent pollution." OpenClaw’s long-term memory centers on workspace files, supported by vector indices and built-in memory tools (e.g., memory_search, memory_get). Mastering these mechanisms is essential to transforming memory from "accumulated clutter" into a "maintainable asset."

6.3.1 Dual-Layer Memory Structure: MEMORY.md and Daily Logs

Based on official design, OpenClaw’s memory follows the "files are the source of truth" philosophy. Stored within the workspace, it consists of two primary layers:

  • Long-term Memory (MEMORY.md): Contains curated persistent preferences, configuration decisions, and crystallized experiences. It is located in the workspace root. Security and Privacy Boundaries: According to official documentation, MEMORY.md is loaded only in private main sessions (direct chat) and is never injected into group chats, thereby protecting user privacy.
  • Daily Logs (memory/YYYY-MM-DD.md): Contains milestone project progress and detailed daily discussions. Upon session startup, the system defaults to reading data from "today and yesterday" to maintain short-term continuity.

The engineering significance of this design is the separation of "reusable facts" from "procedural noise," ensuring a clean signal-to-noise ratio for subsequent retrieval and context injection.

6.3.2 Writing Rules and Timing: Record Only Valuable Facts

The official best practices for "when to write to memory" are as follows:

  • Write to MEMORY.md: Record high-value user preferences, major decisions, or stable state configurations.
  • Write to memory/YYYY-MM-DD.md: Record frequent but transitional development operations and daily progress logs.
  • Immediate Write: Regardless of the file, whenever a user explicitly says "remember this," persist that entry immediately.

Furthermore, the most common failure in memory writing is treating speculation as fact. We recommend converging writing rules into these hard constraints:

  • Stable: Reusable across sessions and not prone to short-term expiration.
  • Traceable: Must originate from explicit tool receipts or confirmed evidence, not LLM inference or guessing.
  • Correctable: Allow for revocation, updates, or replacement at any time; refuse to permanently solidify erroneous information.

Practical Anti-Patterns: Bad vs. Good Memory

  • ❌ Bad Memory (Subjective speculation or mood logs): "The user seems to be in a bad mood today and encountered a hard-to-trace Node.js OOM bug." This text expires tomorrow and wastes tokens.
  • ✅ Good Memory (Objective, traceable facts and parameters): "User prefers Python for workflow scripts. Current production Kubernetes cluster is prod-cluster-us, requiring a specific service account for O&M (Source: Feb 20 session)."

Operational Example: Use structured sections in MEMORY.md to record facts with sources and timestamps; use memory/YYYY-MM-DD.md for procedural logs.

Markdown

## Deployment Regions

- Conclusion: Production environment deployed in us-east-1

- Source: Change Order CHG-12345

- Updated: YYYY-MM-DD


6.3.3 Retrieval Mechanism: Hybrid Vector Search and Precise Reading

To retrieve data from the memory files (both MEMORY.md and memory/**/*.md), the official memory tools provide two main methods:

  • Memory Search: memory_search — Defaults to a hybrid search algorithm (BM25 + Vector Similarity). Data is chunked (400 tokens per chunk with small overlaps). It returns query snippets with detailed file paths and line numbers.
  • Precise Reading: memory_get — Reads content based on existing evidence, accurately hitting specific lines to prevent pollution.

Common Configuration Trap: Embedding API Key Dependency

The vector retrieval backend for memory_search requires an independent embedding API key (OpenAI, Gemini, or Voyage). Even if the main chat model is Claude, an additional embedding provider must be configured. If only the Anthropic key is set without an embedding provider, memory_search will silently fail: it won't throw an error, but it won't return results, making the memory feature appear non-existent. Ensure embedding API keys are correctly filled in openclaw.json and verify via logs that the vector index is building correctly.

Optimal Configuration: You can tune search weights via the agents.defaults.memorySearch JSON. Testing shows that assigning a slightly higher weight to vectors helps align with business logic:

JavaScript

{
  agents: {
    defaults: {
      memorySearch: {
        query: {
          hybrid: {
            enabled: true,
            vectorWeight: 0.7,
            textWeight: 0.3,
          },
        },
      },
    },
  },
}

Key Point: Retrieval results should always follow the logic of "quality over quantity." Flooding the context with massive candidates only causes the model to "blindly prioritize noise."
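To see what the weights actually do, here is a minimal ranking sketch. The weighted-sum formula is an assumption (hybrid retrievers combine scores in various ways); only the `vectorWeight` / `textWeight` names come from the configuration above, and both scores are assumed pre-normalized to [0, 1].

```javascript
// Minimal hybrid ranking sketch: combine a vector-similarity score and a
// BM25 keyword score with the configured weights, then sort descending.
function hybridRank(candidates, { vectorWeight = 0.7, textWeight = 0.3 } = {}) {
  return candidates
    .map(c => ({
      ...c,
      score: vectorWeight * c.vectorScore + textWeight * c.bm25Score,
    }))
    .sort((a, b) => b.score - a.score);
}

const ranked = hybridRank([
  { path: "MEMORY.md:12", vectorScore: 0.9, bm25Score: 0.2 },
  { path: "memory/2026-02-20.md:4", vectorScore: 0.4, bm25Score: 0.95 },
]);
console.log(ranked[0].path); // MEMORY.md:12
```

With vectorWeight at 0.7, a semantically close chunk outranks a keyword-heavy one, which is the "align with business logic" effect the tuning advice above describes.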


6.3.4 Index Construction Pipeline: From File Save to Searchable

Hybrid retrieval requires a ready index. OpenClaw’s indexing pipeline runs automatically after files hit the disk, but understanding the internal flow helps troubleshoot "file edited but not searchable" issues.

Listening and Debouncing

The system uses Chokidar to monitor memory files (MEMORY.md and memory/*.md) in the workspace in real-time. Whether written by the agent or edited by the user, saving the file triggers the indexing process. To avoid redundant builds from high-frequency writes, the listener uses a 1.5-second debounce delay—indexing only starts after 1.5 seconds of silence following the last save.

Chunking Strategy

Before indexing, the system chunks file content into units of approximately 400 tokens, with an 80-token overlap between adjacent blocks. The goal of overlap is to prevent critical semantics from being severed—for instance, a decision description spanning a boundary can be fully matched in either adjacent block.

Chunk 1: Lines 1-15  ──┐

                      ├─ 80-token overlap

Chunk 2: Lines 12-28 ──┘──┐

                         ├─ 80-token overlap

Chunk 3: Lines 25-40 ─────┘
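The chunking scheme diagrammed above can be sketched directly. For simplicity this sketch treats whitespace-separated words as tokens; the real pipeline counts model tokens, so only the size/overlap arithmetic carries over.

```javascript
// Chunking with overlap: each new chunk starts (size - overlap) tokens after
// the previous one, so adjacent chunks share `overlap` tokens.
function chunkWithOverlap(tokens, size = 400, overlap = 80) {
  const chunks = [];
  const step = size - overlap;
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // last chunk reached the end
  }
  return chunks;
}

const tokens = Array.from({ length: 1000 }, (_, i) => `t${i}`);
const chunks = chunkWithOverlap(tokens, 400, 80);
console.log(chunks.length); // 3
```

The shared 80-token tail/head is what lets a decision description spanning a chunk boundary be matched whole in at least one of the two neighbors.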

Vector Generation and Storage

Each text block is sent to the embedding model (default: text-embedding-3-small, generating 1536-dimensional vectors), and the results are stored in a local SQLite database containing four core tables:

| Table | Responsibility |
| --- | --- |
| chunks | Stores raw chunk text, file paths, and line ranges. |
| embeddings | Stores the 1536-dimensional vector for each chunk. |
| fts (Full-Text Search) | Stores inverted indices for BM25 keyword retrieval. |
| vector_cache | Maps text hashes to vectors to skip duplicate embedding calls. |

The vector_cache table is noteworthy: when file content is unchanged, the system uses hash comparisons to skip duplicate API calls, saving costs and accelerating index builds.

Check-off Point: After modifying a memory file, wait ~2 seconds and use memory_search for a newly written keyword. If it hits, the pipeline is working. If it consistently fails, check the embedding API key (see 6.3.3).


6.3.5 Expiry and Cleanup: Giving Memory a Lifecycle

In long-term systems, facts will inevitably expire. It is recommended to add "Source/Updated/Expiry" fields to every memory entry and perform periodic reviews:

  • Mark expired entries as invalid or migrate them to daily logs.
  • Maintain a chain of change when new facts override old ones.
  • Explicitly tag conflicting facts to prevent the model from choosing arbitrarily.

💡 Real-world Trap: The Mystery of "Silent Failure" in Memory Search

You set up your Anthropic API Key and think everything is fine, yet memory_search never returns results. After hours of debugging, you realize: vector retrieval for memory search requires an independent embedding API key (OpenAI, Gemini, or Voyage), which is distinct from the chat model key. The most frustrating part is that it doesn't error out; it just silently returns empty results. Use openclaw doctor after setup to verify the embedding provider status.

6.4 Compaction and Pruning: Folding and Discarding Strategies

This section breaks down "context explosion" into two configurable mechanisms: Tool Result Pruning and Session Compaction. The former is controlled by agents.defaults.contextPruning, aiming to discard old tool results to reduce model input volume. The latter is controlled by agents.defaults.compaction, aiming to generate summaries when a session nears its threshold and flush long-term memory if necessary. Together, they ensure both usability and reproducibility in long-running sessions.

6.4.1 Distinguishing Core Concepts: Compaction vs. Pruning

When dealing with context explosion, beginners often confuse these two OpenClaw schemes. Their fundamental differences lie in "persistence" and "lifecycle triggers":

  • Session Pruning (controlled by contextPruning): Occurs before the LLM call. The system "temporarily discards extremely old tool operation prints" to reduce per-request load, lower compute costs, or mitigate cache misses. It does not modify any disk history files and only acts on in-memory data structures.
  • Session Compaction (controlled by compaction): Occurs when the entire session nears its capacity limit. It breaks down and folds long, redundant dialogue into concise summary points (abstracts) and writes this back to your JSONL transcript, performing a physical fold so that new data can continue to roll smoothly.

[!IMPORTANT] Session Pruning is currently only effective when mode: "cache-ttl" is used with specific drivers like the Anthropic API. Compaction, however, is a long-term, independent safety valve that benefits every model. While they complement each other, they do not depend on one another.

6.4.2 Tool Result Pruning: Parameters and Internal Mechanisms

Understanding the full parameters of pruning helps with fine-tuning in production—such as adjusting how many recent assistant replies to protect or at what volume threshold to trigger pruning.

| Parameter | Default Value | Description |
| --- | --- | --- |
| mode | "cache-ttl" | Pruning mode. |
| ttlMs | 300,000 (5 mins) | Cache TTL. |
| keepLastAssistants | 3 | Protects the last N assistant replies from pruning. |
| softTrimRatio | 0.3 | Triggers soft-trim when context reaches 30%. |
| hardClearRatio | 0.5 | Triggers hard-clear when context reaches 50%. |
| softTrim.maxChars | 4,000 | Only prune tool results exceeding this length. |
| softTrim.headChars | 1,500 | Characters to keep at the head during soft-trim. |
| softTrim.tailChars | 1,500 | Characters to keep at the tail during soft-trim. |
| minPrunableToolChars | 50,000 | Minimum tool result size for hard-clear. |

Pruning is executed in two stages: the first stage (soft-trim) truncates long tool results into "Head 1500 + ... + Tail 1500"; the second stage (hard-clear) replaces the entire result with the placeholder [Old tool result content cleared].

Three Key Safety Constraints (Hard-coded rules in source):

  • Image Protection: Tool results containing image blocks are never pruned, as image content cannot be recovered via head/tail truncation.
  • Instruction Protection: All content before the first user message (identity reads like SOUL.md, USER.md) is excluded from pruning.
  • Tool Filtering: You can specify which tool results are excluded from pruning using glob patterns via tools.deny / tools.allow.

Token Estimation: The source uses a conservative estimate of CHARS_PER_TOKEN_ESTIMATE = 4. Images are budgeted at IMAGE_CHAR_ESTIMATE = 8000 characters (approx. 2000 tokens).
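Using the two constants quoted from the source, a budgeting helper falls out directly. How the constants combine into one estimate is an assumption here; only the constant values themselves come from the text above.

```javascript
// Conservative token budgeting sketch built from the quoted constants.
const CHARS_PER_TOKEN_ESTIMATE = 4;
const IMAGE_CHAR_ESTIMATE = 8000;

function estimateTokens(text, imageCount = 0) {
  // Images are budgeted as 8000 "virtual characters" (~2000 tokens) each.
  const chars = text.length + imageCount * IMAGE_CHAR_ESTIMATE;
  return Math.ceil(chars / CHARS_PER_TOKEN_ESTIMATE);
}

console.log(estimateTokens("x".repeat(400))); // 100
console.log(estimateTokens("", 1));           // 2000
```

A chars/4 estimate deliberately overshoots for most English text, which errs on the side of triggering pruning early rather than blowing the context window.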

6.4.3 Identifier Protection and Safety Filtering in Compaction

Compaction essentially asks the LLM to summarize history. However, LLMs have a dangerous tendency to silently alter opaque identifiers—shortening UUIDs, omitting parts of API Keys, or dropping URL parameters. This leads to "ghost errors" in subsequent tool calls.

The source code (compaction.ts) includes built-in Identifier Preservation Instructions, injected into every summary request:

"Preserve all opaque identifiers exactly as written (no shortening or reconstruction), including UUIDs, hashes, IDs, tokens, API keys, hostnames, IPs, ports, URLs, and file names."

Batch Summary Strategy: If a dialogue is too long for a single summary, the system splits messages by token share (default into 2 segments), summarizes each independently, and then merges them. The merge phase specifically requires preserving active task states, batch progress (e.g., "5/17 completed"), the last user request, decision rationale, and to-do items.

Safety Filtering: Before compaction, stripToolResultDetails() strips the details field from all tool results to prevent untrusted or verbose payloads (like stderr or HTTP headers) from entering the summary prompt, saving tokens and avoiding prompt injection.


6.4.4 Pre-Compaction Memory Flush Mechanism

Before executing compaction, OpenClaw attempts to let the agent proactively save critical info. When the system detects context approaching the soft threshold (softThresholdTokens), it inserts a silent agent turn to allow the model to archive important content to persistent storage.

The 6-Step Pre-Compaction Process:

  1. Monitoring: Real-time tracking of token consumption.
  2. Soft Alert: Starts the process once the softThresholdTokens line is crossed.
  3. Instruction Injection: Sends an internal command: "Context is about to be compacted; save key info."
  4. Archiving: The agent writes critical context to persistent files (e.g., memory/YYYY-MM-DD.md).
  5. Silent Confirmation: The agent ends with a NO_REPLY string, which is hidden from the user.
  6. Formal Compaction: Once memory is safely stored, the system performs regular compaction and folding.

Configuration Example:

JavaScript

{
  agents: {
    defaults: {
      compaction: {
        reserveTokensFloor: 20000,
        memoryFlush: {
          enabled: true,
          softThresholdTokens: 4000,
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store.",
        },
      },
    },
  },
}

6.4.5 Troubleshooting: Status, Logs, and Pruning Events

Operation Example: Use the status command to confirm system availability, then observe logs to see if pruning and compaction are triggered too frequently during peak periods.

Bash

openclaw status --deep

openclaw logs --follow --json

Operation Example: Count how often tool results are being trimmed (Log fields depend on actual implementation).

Bash

cat runtime.log | rg "Tool result trimmed" | wc -l


6.5 Chapter Summary

Chapter 6 brings the issue of "conversations becoming unstable as they grow" back from the realm of model capability into the realm of engineering control: Session Keys determine state ownership, Context Pruning regulates input volume, Long-term Memory files carry stable facts, and Session Compaction ensures long conversations continue to progress while remaining easy to replay and audit.

6.5.1 Key Conclusions

  • Segment Sessions Before Tuning: Scopes, identity links, and reset strategies determine whether conversations suffer from "cross-talk," "context gaps," or "recovery failures."
  • Keep Context Controllable: Use structured tool re-injection alongside context pruning to prevent costs and latency from spiraling linearly as the dialogue continues.
  • Ensure Memory is Maintainable: Record only stable facts with sources and timestamps; relegate procedural noise to daily logs or evidence files.
  • Make Compaction Verifiable: Compaction and pruning should only affect model input and must not corrupt on-disk history; troubleshooting should allow for tracing the replay chain.

6.5.2 Reader Self-Check

  • [ ] Can you explain which session key a specific message lands in and when a reset will occur?
  • [ ] Is the input volume in long sessions controllable, and can pruning/compaction events be reconciled in the logs?
  • [ ] Are memory files maintained according to rules (source, timestamp, revocability) and successfully hit during retrieval?
  • [ ] Can tasks continue to progress after compaction is triggered, and is the replay chain reproducible?

6.5.3 Community Practical Inspiration

Having completed this chapter, your agent now possesses "photographic memory." You might try the following practices:

  • "Second Brain" & Personal CRM: Use long-term state persistence to let your agent remember the details, preferences, and to-dos of everyone you speak with, acting as a digital companion.
  • Private Knowledge Base: Inject long-form text or professional domain documents into the system, utilizing memory mechanisms to build a smart Q&A assistant tailored to your personal knowledge boundaries.
  • Semantic History Retrieval: Use built-in semantic search to instantly locate an inspiration or code snippet discussed weeks ago, spanning hundreds of dialogue records.

Chapter 7: Multi-Channel Distribution & Multi-Agent Collaboration

When access points expand from a single channel to multi-channel, multi-group, and multi-endpoint environments, the most common issue is not a lack of model capability, but rather blurred boundaries: Who is responsible for processing? Which entry points are allowed to trigger? Which capabilities are high-risk? How do we playback and locate issues when they occur?

The core objective of this chapter is to transform multi-channel and multi-agent operations into a manageable system:

  • Channel Strategies converge the trigger surface.
  • Binding and Routing converge ownership.
  • Tool Policies and Sandbox Constraints converge the operation boundaries.

Chapter Roadmap

This chapter includes the following sections:

  • 7.1 Channel in Practice: Taking Over Telegram and WhatsApp Demonstrates the complete integration workflow—from credential configuration to message transceiving—using two mainstream channels as examples.
  • 7.2 Channel Binding and Multi-Account Isolation Explains how to bind channel entry points to specific agents and discusses isolation strategies for multi-account scenarios.
  • 7.3 Specialized Guide for Lark (Feishu) Integration Focuses on the integration details and common adaptation challenges specific to the Lark platform.
  • 7.4 Routing Fundamentals: From Single-Agent to Multi-Agent Introduces routing configurations and binding mechanisms to turn "who takes over" into a deterministic boundary.
  • 7.5 Collaboration Patterns: Sub-Agents and Broadcast Groups Covers parallel task decomposition by sub-agents and delivery mechanisms for broadcast groups, including handover protocol templates.
  • 7.6 Chapter Summary Key conclusions and a self-check list for readers.

7.1 Channel in Practice: Taking Over Telegram and WhatsApp

This section explains the integration models, configuration structures, and security boundaries for Telegram and WhatsApp based on official access methods. It provides a landing path from "functional" to "controllable": first using gating strategies to converge the trigger surface, then using probes and logs to verify connectivity, and finally using binding and tool policies to pin high-risk capabilities to controlled entry points.

7.1.1 Telegram: Bot Tokens and PM/Group Strategies

Telegram channels are typically integrated via Bot Tokens. The official documentation defines the channels.telegram configuration fields, as well as entry points for Private Message (DM) and Group Chat strategies. The dmPolicy defaults to pairing (consistent with WhatsApp) and supports four options: pairing, allowlist, open, and disabled. It is recommended to start securely using allowlists and mention gating.

Example Configuration:

JavaScript

{
  channels: {
    telegram: {
      botToken: '${TELEGRAM_BOT_TOKEN}',
      dmPolicy: 'allowlist',
      allowFrom: ['tg:987654321'],
      groupPolicy: 'allowlist',
      groupAllowFrom: ['tg:987654321'],
      groups: { '*': { requireMention: true } },
    },
  },
  messages: {
    groupChat: {
      mentionPatterns: ['@openclaw'],
    },
  },
}

After configuration, it is recommended to verify connectivity with a channel probe before moving to Routing and Tool layer governance:

Bash

openclaw channels status --probe

7.1.2 WhatsApp: Running on Your Own Account, Prioritizing Surface Convergence

The WhatsApp channel runs on an actual account, meaning the primary risk is a naturally larger trigger surface. The official documentation provides dmPolicy, groupPolicy, and pairing workflows. It is recommended to converge entry points through pairing, allowlists, and mention gating.

Official recommendations include two deployment modes:

  • Dedicated Number (Recommended): Use a separate WhatsApp identity for OpenClaw. DM strategies are clearer, and it avoids "self-chat" confusion.
  • Personal Number Fallback: Set selfChatMode to true and add your personal number to allowFrom to enable self-chat mode.

WhatsApp also supports multi-account access, allowing you to log in to different accounts via the --account parameter (here, an account named work):

Bash

openclaw channels login --channel whatsapp --account work
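For the personal-number fallback, a minimal sketch using the selfChatMode and allowFrom fields mentioned above might look like this (treat it as a starting point, not a complete policy):

```javascript
// Personal-number fallback sketch: run the bot on your own WhatsApp account.
{
  channels: {
    whatsapp: {
      selfChatMode: true,            // enable self-chat mode
      dmPolicy: 'allowlist',
      allowFrom: ['+15555550123'],   // your own number
    },
  },
}
```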

Example Configuration:

JavaScript

{
  channels: {
    whatsapp: {
      dmPolicy: 'pairing',
      allowFrom: ['+15555550123'],
      groupPolicy: 'allowlist',
      groupAllowFrom: ['+15555550123'],
      groups: { '*': { requireMention: true } },
    },
  },
  messages: {
    groupChat: {
      mentionPatterns: ['@openclaw'],
    },
  },
}

The pairing process should be included in the O&M (Operations) acceptance:

Bash

openclaw pairing list whatsapp
openclaw pairing approve whatsapp <CODE> --notify


7.1.3 From Channels to Routing: Fixing High-Certainty Sources via Binding

Channel strategies handle entry gating and default takeovers. However, for administrator accounts, critical groups, or fixed business numbers, it is recommended to use Binding to fix the takeover agent. This reduces uncertainty in model-based routing.

Verify whether a binding is effective using commands rather than just observing the conversation:

Bash

openclaw agents list --bindings

7.1.4 Acceptance and Troubleshooting: Probe First, Replay Later

The recommended troubleshooting sequence for channel issues is:

  1. Self-Check & Status: Use doctor and status --deep to confirm dependencies and configurations are loaded.
  2. Channel Probe: Use channels status --probe to confirm the channel is online.
  3. Link Replay: Use structured logs to replay a request chain by traceId.

Bash

openclaw doctor
openclaw status --deep
openclaw channels status --probe
openclaw logs --follow --json

Pro Tip: When "Group chat does not trigger," first check groups.*.requireMention and messages.groupChat.mentionPatterns. When "Unauthorized operation" occurs, first check if the Tool Policy is defaulting to "deny" for high-risk tool groups.
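To make the "deny by default" behavior explicit rather than implicit, a deny-by-default tool policy can be sketched as follows; the tools field shape and group names such as group:runtime follow the agents.list examples in section 7.4 of this book, so verify them against your installed version:

```javascript
// Deny-by-default sketch for high-risk tool groups; field names follow
// the agents.list examples later in this chapter.
{
  agents: {
    list: [
      {
        id: "assistant",
        // Allow read/browse capabilities; deny runtime execution by default.
        tools: { allow: ["group:fs", "group:web"], deny: ["group:runtime"] },
      },
    ],
  },
}
```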

7.2 Channel Binding and Multi-Account Isolation

As a system expands from a single entry point to multi-channel, multi-group, and multi-endpoint environments, the greatest risk often stems from boundary drift: low-trust entry points triggering high-privilege capabilities, or different entry points sharing the same policy, making accountability difficult to trace. Based on OpenClaw's official channel policies, multi-account configurations, and binding mechanisms, this section explains how to shift entry point governance to the configuration layer and provides a set of commands for acceptance and troubleshooting.

7.2.1 Entry Point Governance: Separating PM and Group Configurations

The first principle of entry point governance is to separate Private Messages (DMs) from Group Chats. Official channel documentation generally provides dmPolicy and groupPolicy entry points to define allowlists and mention gating respectively:

  • WhatsApp: Reference official docs for policy structures.
  • Telegram: Reference official docs for bot-specific policies.

A "cautious start" default is recommended: group chats should only respond when mentioned, while private messages should be restricted to allowlists or pairing approvals.
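Expressed as configuration, a minimal "cautious start" sketch looks like the following; the field names are the same dmPolicy, groupPolicy, and requireMention entries used throughout this chapter:

```javascript
// "Cautious start" defaults: groups respond only when mentioned,
// DMs require pairing approval.
{
  channels: {
    whatsapp: {
      dmPolicy: 'pairing',
      groupPolicy: 'allowlist',
      groups: { '*': { requireMention: true } },
    },
  },
}
```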

7.2.2 Multi-Account Isolation: Decoupling External and Internal Entry Points

The value of multi-account support lies not in "running multiple instances," but in isolation: decoupling external support entry points from internal O&M (Operations) entry points, ensuring different entry points possess distinct gating policies. For example, channels.whatsapp.accounts can be used to provide multi-account isolation.

The following example demonstrates a configuration skeleton for two accounts:

JavaScript

{
  channels: {
    whatsapp: {
      accounts: {
        support: {
          dmPolicy: 'pairing',
          allowFrom: ['+15555550123'],
          groupPolicy: 'allowlist',
          groupAllowFrom: ['+15555550123'],
          groups: { '*': { requireMention: true } },
        },
        ops: {
          dmPolicy: 'allowlist',
          allowFrom: ['+15555550999'],
          groupPolicy: 'allowlist',
          groupAllowFrom: ['+15555550999'],
          groups: { '*': { requireMention: true } },
        },
      },
    },
  },
  messages: {
    groupChat: {
      mentionPatterns: ['@openclaw'],
    },
  },
}

Once multi-account deployment is live, it is recommended to treat the account identifier as a first-class dimension in logs and alerts to facilitate auditing and attribution.


7.2.3 Binding: Fixing Takeovers for High-Certainty Sources

While multi-account setups handle entry point isolation, Binding handles precise routing. For administrator endpoints, critical groups, or fixed business numbers, it is recommended to prioritize using bindings to fix the takeover agent, thereby reducing uncertainty in model-based routing.

You can directly verify whether a binding is effective using the following command:

Bash

openclaw agents list --bindings

7.2.4 Acceptance and Troubleshooting: Converging Issues via Probes and Log Replay

Issues related to entry point governance should be resolved layer by layer:

  1. Self-Check: Use doctor to confirm dependencies and configuration structure.
  2. Status: Use channels status --probe to confirm channel online status and policy loading.
  3. Replay: Use structured logs and a traceId to determine whether a behavioral discrepancy is caused by gating, binding, or tool policies.

Bash

openclaw doctor
openclaw channels status --probe
openclaw status --deep
openclaw logs --follow --json

Quick Fix: If "Group chat triggers incorrectly" occurs, first check mention gating and allowlists; if "Unauthorized operation" occurs, first check if the tool policy is defaulting to "deny" for high-risk tool groups.

7.3 Lark (Feishu) Specialized Integration Guide: Chatting in Groups

Connecting Lark requires not only configuring OpenClaw but also completing a series of authorizations on the Lark Open Platform. Many beginners fail at the first step: "Long-Connection Subscription." This section outlines an error-proof, end-to-end integration flow for Lark.

7.3.1 Overall Sequence and Pitfall Warnings

Note: Lark integration introduces external platform variables. If you are still setting up your local baseline environment, it is recommended to complete the basic configuration in [Chapter 3] and verify its stability before starting this section.

Common Failure Points Quick Check (Review before starting):

  • Long-Connection Subscription Failure: Ensure that "Version Management & Release" has a version created and published before enabling Event Subscription (Long-Connection).
  • Messages Not Triggering: Check if Group Chat is enabled, if @mentions are required, and if the group is in the allowlist; then check logs to see if gating rules were hit.
  • PM Requires Pairing: If the pairing policy is enabled, the first private message will typically receive a pairing code. Normal dialogue only begins after approval.
Regardless of the platform, the pipeline is similar, but Lark has a specific step order that is extremely easy to get wrong:

  1. ✅ Lark Side: Create App -> Configure Permissions -> Publish App (Critical!)
  2. ✅ OpenClaw Side: Configure Lark Channel (openclaw channels add)
  3. ✅ OpenClaw Side: Start Gateway (openclaw gateway start)
  4. ✅ Lark Side: Return to the platform to enable "Event Subscription (Long-Connection)" and add events.

The sequence summary below illustrates the end-to-end interaction flow for Lark integration:

End-to-End Interaction Flow for Lark Integration (Sequence Summary):

  1. Lark Platform: Create self-built app, configure permissions/bot, and Publish Version.
  2. OpenClaw Local: Add channel (App ID/Secret) and start the gateway.
  3. Lark Platform: Enable Long-Connection subscription; system establishes the link.
  4. User: Sends a message via Lark Client.
  5. Interaction: Lark pushes events; OpenClaw checks pairing, returns a code, and the user approves via CLI.
  6. Success: Subsequent messages receive AI replies.

Lessons Learned: If you go straight to "Events and Callbacks" to enable the long connection (Step 4) without clicking "Create Version and Publish" (Step 1), the system will endlessly report "Long-connection subscription failed."


7.3.2 Configuring the App on Lark Open Platform

  1. Open the Lark Open Platform, click "Create Custom App," and obtain the App ID and App Secret.
  2. Permission Configuration (Batch Import Recommended): Go to "Permission Administration" on the left, select "Batch Import," and paste the required scopes (including im:message, im:message.p2p_msg:readonly, im:message:send_as_bot, etc.) to avoid omissions.
  3. Find the Bot card under "App Capabilities" and enable it.
  4. Publish App: Go to "Version Management & Release," create a version (e.g., 1.0.0), and submit for release. (Enterprise admins usually approve this instantly).

7.3.3 Completing the Binding on the OpenClaw Side

Lark is not a built-in channel; therefore, you must install and enable the corresponding plugin before configuring the channel:

Bash

# 1. Download and install the plugin (required, or 'enable' will fail)
openclaw plugins install @openclaw/feishu

# 2. Enable the plugin for the runtime
openclaw plugins enable feishu

# 3. Add and configure the channel
openclaw channels add

[!NOTE] Difference between install and enable: install downloads the code package from the registry to the local environment; enable registers the plugin into the current openclaw.json configuration to activate it. A common mistake is trying to enable directly; if that fails, run the install step first to bring the plugin into the local extensions directory.

In the channels add wizard:

  • Select Feishu/Lark.
  • Enter your App ID and App Secret.
  • For the Chinese version, ensure you select the feishu.cn domain.
  • Initial Setup Tip: Set Group Chat to disabled first, then change it once the connection is verified.

Verify connectivity after configuration:

Bash

openclaw channels list

7.3.4 Enabling Event Subscription and Your First Chat

  1. Launch the gateway in your terminal: openclaw gateway start.
  2. Return to the Lark Developer Console under "Events and Callbacks," Enable Long-Connection, and add the event im.message.receive_v1 (Receive Message).
  3. If no error occurs after saving, the subscription is connected.
  4. Search for your bot in the Lark desktop client and send a private message like "Hello."
  5. Since you configured a pairing policy in [Chapter 3], you will receive a Pairing Code. Copy this code and approve it in your terminal: openclaw pairing approve feishu <CODE>
  6. Finally, enjoy your enterprise-exclusive personal assistant!

7.4 Routing Fundamentals: From Single-Agent to Multi-Agent

Routing is not about "whether a model can do it," but rather "which agent should take over this message, and what is it authorized to do." This section focuses on OpenClaw's multi-agent routing to explain the decision chain, the priority of binding rules, and how to use observability to turn routing into an engineering capability that is replayable, auditable, and troubleshootable.

7.4.1 Core Problems Routing Solves: Ownership, Permissions, and State Isolation

In a multi-agent system, three things must be determined as soon as a message enters: who processes it (Ownership), what can be done (Tools & Permissions), and where state is written (Sessions & Memory). Failure to clarify these results in two common types of faults:

  1. Unclear Responsibility: Multiple agents respond simultaneously or overwrite each other's state, leading to inconsistent output.
  2. Unauthorized Operation: A low-trust entry point triggers high-risk tools, escalating a "wrong answer" into "corrupted data."

OpenClaw pushes "who takes over" down into the multi-agent routing and binding mechanism, while "what can be done" is handled by Tool Policies and Sandbox constraints. The primary goal of the routing layer is to stably converge processing onto a specific agent, ensuring subsequent operations occur within controlled boundaries.

7.4.2 The Decision Chain: Binding First, Then Routing

The typical OpenClaw decision chain can be summarized as: "Match bindings first, then enter the router." Bindings are used to stably hand over messages from specific sources to designated agents, reducing the uncertainty of model-based classification. If a binding hits, the system bypasses the router and delivers the message directly to the bound agent.

The "Binding-First" Branch in Multi-Agent Routing

(Flow Summary):

  1. Inbound Message -> Extract routing fields.
  2. Match Bindings: if hit, hand over to the bound agent; if miss, enter the Router and hand over to the routed agent.
  3. Policy Layer: Apply Tool Policies & Sandbox Constraints.
  4. Operation: Run task and re-inject results.

In engineering practice, it is recommended to prioritize high-risk or high-certainty entry points in bindings and leave intent-based entry points to the router.
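The binding-first branch above can be sketched in a few lines of JavaScript. This is illustrative pseudocode, not OpenClaw's implementation: the matches helper is a deliberately simplified stand-in for the real matcher, and only the agentId and match field names come from this chapter's binding examples.

```javascript
// Simplified matcher: a binding matches when every declared field agrees.
function matches(match, msg) {
  if (match.channel && match.channel !== msg.channel) return false;
  if (match.peer) {
    if (!msg.peer) return false;
    if (match.peer.kind !== msg.peer.kind || match.peer.id !== msg.peer.id) return false;
  }
  return true;
}

// Binding-first decision chain: a hit bypasses the router entirely.
function resolveAgent(msg, bindings, routeFallback) {
  const hit = bindings.find((b) => matches(b.match, msg));
  if (hit) return hit.agentId;   // binding hit: deterministic takeover
  return routeFallback(msg);     // miss: model/intent-based routing
}

const bindings = [
  { agentId: "work", match: { channel: "telegram", peer: { kind: "direct", id: "987654321" } } },
];

console.log(resolveAgent(
  { channel: "telegram", peer: { kind: "direct", id: "987654321" } },
  bindings,
  () => "assistant",
));
// → work
```

Note how the fallback router is only ever consulted when no binding matches, which is exactly why high-certainty sources belong in bindings.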


7.4.3 Implementing Bindings: Fixing High-Risk Entry Points to Controlled Agents

The advantage of binding is that it turns "routing correctness" from a probability problem into a rule-based one. In the configuration structure, bindings is a top-level array (at the same level as agents and channels). Each binding points to a target agent via agentId and describes matching conditions via a match object.

JavaScript

{
  // bindings is a top-level array, NOT nested inside agents
  bindings: [
    {
      agentId: "work",
      match: {
        channel: "whatsapp",
        peer: { kind: "direct", id: "+15551234567" },
      },
    },
    {
      agentId: "work",
      match: {
        channel: "telegram",
        peer: { kind: "direct", id: "987654321" },
      },
    },
  ],

  agents: {
    list: [
      {
        id: "work",
        name: "Work Assistant",
        workspace: "~/.openclaw/workspace-work",
        agentDir: "~/.openclaw/agents/work/agent",
      },
    ],
  },
}

[!WARNING] bindings must be placed at the top level of the configuration file. Nested definitions inside an agent object in agents.list will not be recognized, causing the binding to fail silently.

As the system grows to "multi-account, multi-group, multi-entry," we recommend a two-layer strategy:

  1. Entry-Layer Binding: Bind by channel, peer, or accountId to an entry agent responsible for triggering rules and context preprocessing.
  2. Task-Layer Routing: The entry agent then dispatches tasks to domain-specific agents or uses sub-agents for parallel processing.

Full Multi-Agent Routing Configuration Example

Here is a real-world scenario: A team uses Telegram and WhatsApp and needs to route tasks to different agents based on the source and risk level.

Scenario Description

  • assistant: Default agent, handles daily queries, no external tool permissions.
  • devops: O&M agent, bound to a specific Telegram group (DevOps-team), has exec and ssh tools.
  • writer: Writing agent, bound to a specific WhatsApp peer (Product Editor), has document tools, no operation rights.

Complete Configuration

JavaScript

{
  agents: {
    list: [
      {
        id: "assistant",
        default: true,
        name: "Default Assistant",
        workspace: "~/.openclaw/workspace-assistant",
        agentDir: "~/.openclaw/agents/assistant/agent",
        model: "anthropic/claude-sonnet-4-6",
        tools: { allow: ["group:fs", "group:web"], deny: ["group:runtime"] },
      },
      {
        id: "devops",
        name: "DevOps Agent",
        workspace: "~/.openclaw/workspace-devops",
        agentDir: "~/.openclaw/agents/devops/agent",
        model: "anthropic/claude-sonnet-4-6",
        tools: { allow: ["group:runtime", "group:fs", "group:web"] },
        sandbox: { mode: "all", scope: "agent" },
      },
      {
        id: "writer",
        name: "Writing Agent",
        workspace: "~/.openclaw/workspace-writer",
        agentDir: "~/.openclaw/agents/writer/agent",
        model: "anthropic/claude-sonnet-4-6",
        tools: { allow: ["group:fs", "group:web"], deny: ["group:runtime"] },
      },
    ],
  },

  bindings: [
    {
      agentId: "devops",
      match: {
        channel: "telegram",
        peer: { kind: "group", id: "-1001234567890" },
      },
    },
    {
      agentId: "writer",
      match: {
        channel: "whatsapp",
        peer: { kind: "direct", id: "+8615600000000" },
      },
    },
  ],

  channels: {
    telegram: { enabled: true },
    whatsapp: { enabled: true },
  },
}

Binding Priority and Matching Order When a message arrives, the system attempts to match in the following order (most specific first):

  1. Peer match: Highest priority. Exact peer: { kind, id } match.
  2. parentPeer match: Thread inheritance matching for nested routing.
  3. guildId + roles: Discord server + role matching.
  4. guildId: Discord server-only matching.
  5. teamId: Slack workspace matching.
  6. accountId match: Exact channel account instance matching.
  7. Channel-level match: Matches an entire channel (where accountId is "*" or omitted).
  8. Fallback: If no bindings hit, the agent marked default: true is used.

[!NOTE] When a binding contains multiple match fields, all fields must be satisfied (AND semantics). The first binding that matches takes effect; subsequent bindings are skipped.

Verify Loading and Priority

Bash

openclaw agents list --bindings
openclaw logs --follow --json --filter "routing"


7.4.4 Observability: Logging Routing Reasons

The difficulty in troubleshooting multi-agent systems is that "it looks like a model issue, but it's actually a routing or policy issue." We recommend logging:

  1. Routing Reason: Which binding hit, or why the router chose a specific agent.
  2. Operation Boundary: Which tool policies and sandbox constraints were applied to the request.

Engineering-wise, you must also explicitly prevent routing loops: if agents are allowed to hand over tasks, implement hop limits or link de-duplication to prevent infinite ping-ponging.
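A routing-decision log entry might carry fields like the following. This shape is purely illustrative: apart from traceId, the field names are assumptions for the sketch, not the actual OpenClaw log schema.

```json
{
  "ts": "2026-01-01T09:00:12Z",
  "traceId": "a1b2c3",
  "event": "routing.decision",
  "reason": "binding:telegram/group/-1001234567890",
  "agentId": "devops",
  "policy": { "toolsAllowed": ["group:runtime", "group:fs"], "sandbox": "all" }
}
```

Whatever the exact schema, the point is that reason and the applied policy are recorded at decision time, so a replay by traceId answers "who took over and why" without guesswork.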

7.4.5 Memory Isolation: Independent Files and Indices per Agent

While routing ensures messages go to the right agent, Memory Isolation ensures knowledge doesn't cross-contaminate. OpenClaw implements dual-layer isolation: Source Files are separated by workspace, and Vector Indices are separated by Agent ID.

Plaintext

~/.openclaw/memory/               # Index storage (state dir)
├── assistant.sqlite              # Vector index for Assistant
└── devops.sqlite                 # Vector index for DevOps

~/.openclaw/workspace-assistant/  # Assistant workspace (source files)
├── MEMORY.md
└── memory/

~/.openclaw/workspace-devops/     # DevOps workspace (source files)
├── MEMORY.md
└── memory/

Each agent in agents.list declares an independent workspace. Their MEMORY.md and memory/ directories are physically separate. Index files (SQLite) are stored centrally but distinguished by agentId—memory_search will only query the index file corresponding to the current agent.

Engineering Significance:

  • Default Zero-Trust: Even without sandbox mode, an agent's memory tools can only access files within its own workspace.
  • Auditability: Memory change history is tracked per agent, allowing precise pinpointing of contamination.
  • Independent Lifecycle: Deleting an agent only requires cleaning its workspace and specific SQLite file without affecting others.

⚠️ WARNING: Without strict sandbox mode (sandbox.mode: "all"), an agent could in principle use filesystem tools to read another agent's workspace. In production, explicitly configure the sandbox parameter for each agent so that filesystem access is constrained to its own path.

7.5 Collaboration Patterns: Sub-Agents and Broadcast Groups

Multi-agent systems are used not only for entry routing but also for decomposing complex tasks into parallelizable sub-tasks and delivering results to multiple target groups or endpoints. This section details the configuration frameworks and validation commands for OpenClaw's Sub-Agent and Broadcast Group capabilities. It also introduces an engineering-grade Handover Protocol to prevent collaboration from losing fidelity due to vague "verbal descriptions."

7.5.1 Sub-Agents: Parallel Decomposition and Summarization

Sub-agents are task execution units dynamically spawned by a master agent via built-in tools or slash commands. They allow a complex task to be split into multiple parallel branches, with the master agent serving as coordinator and summarizer.

The master agent can spawn sub-agents manually via slash commands or automatically through the spawn_subagent tool. Basic slash commands include:

Bash

/subagents spawn <agentId> <task>   # Spawn a sub-agent to execute a specific task
/subagents list                     # View all active sub-agents
/subagents log <id>                 # View sub-agent runtime logs
/subagents terminate <id|all>       # Terminate sub-agents

7.5.2 Broadcast Groups: Delivering Results to Multiple Targets

Broadcast groups distribute a single inbound message to multiple agents for parallel processing, which is ideal for scenarios like "multi-role review," "multi-language support," or "batch alerting."

Broadcasts are configured via the top-level broadcast field, mapping a Group ID or phone number to an array of Agent IDs:

JavaScript

{
  broadcast: {
    strategy: 'parallel',                             // Optional, defaults to parallel
    '120363000000000000@g.us': ['alfred', 'baerbel'], // WhatsApp Group → two agents respond
    '+15555550123': ['support', 'logger'],            // DM number → two agents respond
  },
}

Because broadcasting is a high-impact action, it should be paired with Tool Policies: only allow specific agents to trigger broadcasts, and log the group name, target summary, and trigger reason for auditing.


7.5.3 Announce Queue: The Delivery Protocol for Sub-Agent Results

When a sub-agent completes a task, it must "notify" the parent agent or target session. OpenClaw manages this through the Announce Queue, which handles concurrency conflicts, cross-channel routing, message loss, and retries.

Queue Lifecycle: Enqueue → Debounce → Drain → Deliver

Core Configuration Parameters:

| Parameter  | Default     | Description                                                         |
| ---------- | ----------- | ------------------------------------------------------------------- |
| debounceMs | 1000 (1 s)  | Window to aggregate more incoming messages.                         |
| cap        | 20          | Maximum queue depth.                                                |
| dropPolicy | "summarize" | Overflow policy: summarize, old (drop oldest), or new (reject new). |
| mode       |             | Queue behavior: steer, followup, collect, queue, etc.               |
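Assembled into configuration, the parameters above might be tuned as below. Only the four parameter names come from this section; nesting them under a top-level announce key is a hypothetical placement for illustration, so check your version's configuration reference for the actual location.

```javascript
// Announce-queue tuning sketch; the `announce` nesting is an assumption.
{
  announce: {
    debounceMs: 1000,        // 1 s aggregation window
    cap: 20,                 // maximum queue depth
    dropPolicy: "summarize", // summarize overflow instead of dropping messages
    mode: "collect",
  },
}
```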

Key Features:

  • Cross-Channel Synthesis: If messages in the queue come from different channels, the system uses deliveryContextKey to detect this. It merges messages in single-channel scenarios but forces individual delivery for cross-channel messages to avoid routing confusion.
  • Retries & Backoff: Failed deliveries use exponential backoff: Math.min(1000 × 2^consecutiveFailures, 60000) ms.
  • Idempotency: Each notification carries an announceId to prevent duplicate messages during retries.
  • Inbound Debounce: A separate mechanism that buffers rapid-fire user messages from the same sender, processing them as a batch once the timeout window closes.
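The retry backoff quoted above is easy to sanity-check directly:

```javascript
// Retry delay per the formula above: exponential backoff capped at 60 s.
const backoffMs = (consecutiveFailures) =>
  Math.min(1000 * 2 ** consecutiveFailures, 60000);

console.log([0, 1, 2, 3, 6].map(backoffMs));
// → [ 1000, 2000, 4000, 8000, 60000 ]
```

The cap means that from the sixth consecutive failure onward, every retry waits exactly one minute rather than growing unboundedly.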

7.5.4 Handover Protocol: Structured Results for Reliability

The most common failure in collaboration is incomplete handover information. Use a structured template for handovers:

  1. Conclusion: A one-sentence summary and scope.
  2. Evidence: Source links or log traces (including traceId).
  3. Actions: Executed commands or configuration snippets.
  4. Verification: Expected output and next steps for error branches.
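As a concrete serialization of the four-part template, a sub-agent might return a payload like the following. The JSON shape and the example URL are illustrative suggestions, not a fixed OpenClaw schema:

```json
{
  "conclusion": "Staging deploy of service-a succeeded; production untouched.",
  "evidence": ["https://ci.example.com/run/123", "traceId: a1b2c3"],
  "actions": ["openclaw cron trigger daily_standup"],
  "verification": {
    "expected": "HTTP 200 from /healthz",
    "onFailure": "re-run the job and escalate to the ops agent"
  }
}
```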

Sub-Agent Collaboration Configuration Example: Daily Report System

Scenario: A coordinator triggers daily at 09:00, spawning sub-agents to check Git commits and Jira tasks, then broadcasts a summary.

JavaScript

{
  agents: {
    list: [
      {
        id: "coordinator",
        displayName: "Report Coordinator",
        tools: ["spawn_subagent", "summarize_reports", "broadcast_message"],
        systemPrompt: "You are the daily summarizer. At 9 AM, spawn sub-agents to check code and tasks, then broadcast the result.",
      },
      {
        id: "code-reviewer",
        toolGroups: ["git_readonly"],
        systemPrompt: "Analyze Git commits from the last 24h. Return JSON.",
      },
      {
        id: "task-tracker",
        toolGroups: ["jira_readonly"],
        systemPrompt: "Search for tasks marked 'Done' today. Return JSON.",
      },
    ],
  },

  cron: [
    {
      id: "daily_standup",
      agentId: "coordinator",
      expression: "0 9 * * 1-5", // Mon-Fri 09:00
      task: "Generate today's report",
    },
  ],

  broadcast: {
    strategy: "sequential",
    "C1234567890": ["coordinator"],              // Slack channel
    "120363000000000000@g.us": ["coordinator"],  // WhatsApp group
  },
}

Verification & Monitoring Commands:

Bash

openclaw cron list              # Check scheduled tasks
openclaw cron trigger <id>      # Manually trigger for testing
openclaw logs --follow --json --filter "subagent"
openclaw logs --follow --json --filter "broadcast"

7.5.5 Chapter Summary

Sub-agents parallelize complex tasks while maintaining clear boundaries via the dynamic spawn mechanism. Broadcast groups ensure stable delivery to multiple agents via top-level configuration. Both should be paired with Tool Policies and structured logging to ensure the collaboration is auditable and replayable. By standardizing handover protocols, you can significantly reduce information loss in multi-agent workflows.


7.6 Chapter Summary

Chapter 7 establishes deterministic boundaries for multi-channel access and multi-agent collaboration: Channel Strategies converge the trigger surface, Binding and Routing converge ownership, and Tool Policies and Sandbox Constraints converge operation capabilities. All of these are supported by probes and structured logs to provide a replayable troubleshooting path.

7.6.1 Key Conclusions

  1. The engineering goal of Routing is to ensure a unique owner and an explainable rationale, which can be reproduced through log replays.
  2. Channel Governance should begin with Private Message (PM) and Group Chat strategies, using gating and allowlists by default to converge the trigger surface.
  3. Multi-Account and Binding are used to further isolate entry point responsibilities and fix high-certainty sources, reducing the risks of mis-triggering and unauthorized operation.
  4. Collaboration Patterns must be constrained by auditability and replayability to avoid introducing unexplainable branches in high-risk capabilities.

7.6.2 Reader Self-Check

  • Can you answer the following for any given message: Who took over? What was the basis? How can the failure be replayed?
  • Have you configured mention gating and allowlists for group chats and verified that they are effective?
  • Can you explain the results of entry point governance using openclaw channels status --probe and openclaw agents list --bindings?

7.6.3 Community Practice Inspirations

Once a proper routing network and sandbox policies are configured, multi-agent collaboration can significantly extend business boundaries:

  • Omni-channel Personal Assistant: Configure multi-channel routing to take over inboxes of any common chat software simultaneously. No matter where you send a message, you receive a consistent service experience.
  • Multi-role Pipeline Production: Configure a team of expert agents to collaborate—such as an "Outline Planner," "Content Expander," and "Layout Proofreader"—each performing their duties as material drafts flow automatically between upstream and downstream.
  • 24/7 Intelligent Triage: Automatically dispatch inbound traffic to different nodes based on intent. High-risk system operations are assigned to powerful reasoning models with allowlisted sandboxes, while casual chats are routed to low-cost models.

7.6.4 Preview of the Next Chapter

[Chapter 8] moves into Automation and O&M (Operations): covering self-checks, scheduled jobs (Cron), remote access, and security baselines. The goal is to advance the system from "functional" to "capable of long-term stable operation."

Chapter 8: Automation and Operations Security Practices

This chapter focuses on the continuous operation scenarios of OpenClaw: exploring how to evolve the system from an initial "test-ready" state into an industrial-grade architecture capable of "long-term reliable operation" in production environments. Through this chapter, you will master the core capabilities required to keep OpenClaw running securely, predictably, and auditably under unattended conditions.

Long-term operability is far more than simply adding a few cron scripts; it requires embedding the following five dimensions of governance capability into the system at the architectural level:

  • Pluggable Lifecycle Governance (Hooks): Decoupling and integrating custom filtering, auditing, and control logic at critical nodes of the core operation chain (such as input, operation, and output) to prevent infinite bloat of business code.
  • Schedulable Unattended Jobs (Cron Jobs): Designing a scheduled job model with idempotency, concurrency control, and takeover mechanisms to ensure that background non-interactive tasks execute safely and predictably.
  • Periodic Awareness and Proactive Notification (Heartbeat): Utilizing the built-in heartbeat mechanism to allow agents to inspect multiple data sources at a fixed rhythm—proactively pushing critical information only when necessary ("notify on event"), effectively replacing multiple independent polling tasks with a single cycle.
  • Auditable Security Baseline System: Moving beyond the dilemma of "post-event log searching" to establish a structured audit model based on the "Event, Subject, Action, Evidence" quadruple, ensuring every critical write operation is traceable and reviewable.
  • Maintainable Remote Access Control: Finding the balance between "reachability" and "non-exposure" by reshaping remote entry points through Zero Trust Architecture, establishing strong identity authentication, principle of least privilege, and rapid-revocation emergency response channels.

Chapter Guide

This chapter includes the following sections:

  • 8.1 Hooks: Lifecycle and Event Entry Points: Integrating custom logic at key nodes of the core operation chain to decouple business and system code.
  • 8.2 Scheduled Job Design and Dispatch Strategies: Designing idempotent and controlled background cron tasks for unattended system operation.
  • 8.3 Heartbeat Mechanism: Periodic Inspection and Proactive Notification: Deep dive into OpenClaw's built-in heartbeat scheduling primitives, from timers to the complete message delivery lifecycle.
  • 8.4 Remote Access: SSH, Intranet Penetration, and Zero Trust: Balancing accessibility and security by establishing secure remote access channels.
  • 8.5 Security Baseline and Auditing Processes: Establishing structured auditing mechanisms to ensure every major system operation is traceable and reviewable.
  • 8.6 Chapter Summary: Key conclusions and reader self-assessment.

Learning Objectives

After completing this chapter, you will be able to:

  1. Design Lifecycles: Inject custom logic at critical nodes to extend system capabilities.
  2. Implement Automation: Design secure and predictable scheduled jobs.
  3. Ensure Security: Manage remote access through Zero Trust principles.
  4. Establish Auditing: Make every significant system operation traceable.

8.1 Hooks: Lifecycle and Event Entry Points

This section discusses the engineering implementation of Hooks: decoupling governance logic from the main operation chain and ensuring that the Hook itself does not become a new source of failure.

[!NOTE] The Hook patterns discussed here represent general engineering best practices. Specific registration methods and event lists on the implementation side may evolve with versions: use the actual output of doctor, status --deep, and structured logs as your source of truth. This section focuses on the responsibility boundaries and stability constraints of Hooks.

8.1.1 Responsibility Boundaries of Hooks

Hooks are suited for carrying cross-cutting concerns rather than the primary business flow. Common scenarios include:

  • Input Governance: Rate limiting, black/white listing, format validation.
  • Operation Observation: Tool call auditing, risk tagging, policy logging.
  • Output Governance: Data masking, format convergence, audit field enrichment.

Boundary principle: Hooks are responsible for "governance and observation," while the main chain is responsible for "task progression."

8.1.2 Three-Stage Lifecycle: Input, Operation, and Output

It is recommended to fix Hooks within three specific stages to avoid role confusion. The following flowchart illustrates how Hooks intercept the three stages of the main chain:

{% @mermaid/diagram content="flowchart LR

subgraph input["Input Stage"]

I1["Rate Limit/Whitelisting"] --> I2["Format Validation"]

end

subgraph exec["Operation Stage"]

E1["Tool Call Audit"] --> E2["Risk Tagging"]

end

subgraph output["Output Stage"]

O1["Data Masking"] --> O2["Format Convergence"]

end

req["Inbound Request"] --> input

input -->|"Admission Passed"| main["Main Chain Reasoning"]

main --> exec

exec --> result["Generate Result"]

result --> output

output --> resp["Return Response"]

input -->|"Admission Rejected"| reject["Early Rejection"]" %}

Hook Entry Points in the Three-Stage Lifecycle of the Main Chain

| Stage | Objective | Prohibited Actions |
| --- | --- | --- |
| Input Stage | Early rejection, noise reduction, admission validation | Long-duration external calls, irreversible writes |
| Operation Stage | Recording key decisions, risk interception | Modifying core business state |
| Output Stage | Masking and format governance | Temporary privilege escalation, bypassing policy enforcement |

8.1.3 Stability Constraints: Timeout, Degradation, and Idempotency

Hooks must be constrained; otherwise, they risk dragging down the main chain.

  • Timeout Limits: Each Hook must have an independent timeout, with a defined degradation strategy upon timing out.
  • Failure Semantics: Be explicit about fail-open or fail-closed; do not rely on implicit default behaviors.
  • Idempotency Requirements: Hooks must not produce duplicate external side effects during retries.

For auditing Hooks, a "fail-open with alert" approach is recommended; for compliance/interception Hooks, "fail-closed" is advised.
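The timeout and failure-semantics constraints above can be sketched as a thin wrapper around each hook. This is an illustrative sketch under stated assumptions, not OpenClaw's actual hook runner; the names (`run_hook`, the `fail_open` flag) are hypothetical:

```python
# Illustrative sketch (not the OpenClaw API): each hook runs under its own
# timeout and degrades according to its declared failure semantics.
from concurrent.futures import ThreadPoolExecutor

def run_hook(hook_fn, payload, timeout_s=0.5, fail_open=True):
    """Run one hook under an independent timeout with explicit degradation."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(hook_fn, payload)
        try:
            return ("ok", future.result(timeout=timeout_s))
        except Exception:
            # Timeout or hook error: degrade instead of failing the main chain.
            # Audit hooks -> fail-open (pass payload through, alert separately);
            # compliance/interception hooks -> fail-closed (reject early).
            # A real runner would also cancel the stuck worker.
            return ("degraded-pass", payload) if fail_open else ("rejected", None)
```

The two return shapes make the degradation decision explicit at the call site instead of relying on implicit defaults, which is exactly the "failure semantics" requirement above.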

8.1.4 Structured Events: Ensuring Auditability and Replayability

It is recommended that all Hook events use a unified structure, containing at minimum: Timestamp, Trace ID, Stage, Action, and Result.

JSON

{
  "ts": "YYYY-MM-DDTHH:MM:SSZ",
  "trace_id": "t-YYYYMMDD-001",
  "stage": "input",
  "hook": "rate_limit_guard",
  "event": "rejected",
  "reason": "rate_limit_exceeded"
}

8.1.5 Acceptance and Troubleshooting

After deployment, two types of checks are recommended:

  1. Normal Traffic: Confirm Hooks do not introduce significant latency.
  2. Abnormal Traffic: Confirm that interception and degradation behaviors meet expectations.
Bash

openclaw status --deep
openclaw logs --follow --json
cat runtime.log | jq -r 'select(.type=="log") | .log | select(.component=="hook") | .event' | sort | uniq -c | sort -nr | head

8.2 Scheduled Job Design and Dispatch Strategies

This section focuses on the stability design of unattended operations. The goal is not merely to "trigger once at a set time," but to ensure that repeated execution never leads to a loss of control.

[!NOTE] The concepts of re-entrancy protection, idempotency keys, and failure shunting discussed here are general scheduling engineering practices applicable to external scheduled jobs orchestrated on a host. Specific switches and event names for built-in scheduling mechanisms may evolve with versions: use the actual output of --help, status --deep, and structured logs as your source of truth.

Concrete Example: Automated Daily Standup Summaries

Suppose a team wants an agent to automatically pull yesterday's progress from Slack and Jira every morning at 9:00 AM, generate a formatted standup summary, and post it to a Feishu/Lark group. The engineering constraints for this job are as follows:

  • Idempotency: If the 9:00 AM task fails due to network jitter and retries at 9:05 AM, it must not post two duplicate reports to the group.
  • Re-entrancy Protection: If a large volume of data from yesterday causes generation time to exceed the trigger interval, two concurrent tasks must not be allowed to overlap or overwrite each other.
  • Observability: The duration, success/failure status, and delivery target of each run must be logged.
  • Recoverability: Automatically back off and retry when Jira API is rate-limited; terminate and alert when Slack authentication fails, rather than entering an infinite loop.
The corresponding cron configuration and idempotency key design:

Bash

# crontab entry
0 9 * * 1-5 /opt/openclaw/jobs/daily_standup.sh >> /var/log/oc_jobs/standup.log 2>&1

# daily_standup.sh core logic
WINDOW_START=$(date -d "today 09:00" +%s)
IDEM_KEY="daily_standup:v1:${WINDOW_START}"

# Idempotency check: execute only once per window
if redis-cli SET "oc_idem:${IDEM_KEY}" 1 NX EX 86400; then
  openclaw agent --message "Generate today's standup summary and post to Feishu group ops_daily"
else
  echo "Already executed, skipping"
fi

8.2.1 Four Engineering Constraints for Scheduled Jobs

Scheduled tasks in a production environment must satisfy at least the following:

  • Idempotency: Repeated execution produces no duplicate side effects.
  • Re-entrancy Protection: No concurrent conflicts if the previous run hasn't finished.
  • Observability: Each run records its status, duration, and error classification.
  • Recoverability: Clear paths for retries and manual takeover after failure.

8.2.2 Re-entrancy Protection: Distributed Locks and Instance Ownership

The most common failure mode when a task's run time exceeds its trigger interval is a "re-entrancy storm." It is recommended to use a distributed lock with a TTL (Time-To-Live) and to record instance ownership.

Bash

# Execute task only if lock acquisition is successful

redis-cli SET oc_job_lock:daily_report "<instance_id>" NX EX 600

When releasing the lock, verify the lock holder to avoid accidentally deleting a lock belonging to another instance.
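The ownership rule can be illustrated with a minimal in-memory model of the lock. In production the check-and-delete must be atomic (e.g., a Redis Lua script); `acquire`/`release` here are illustrative names, not OpenClaw APIs:

```python
# In-memory sketch of "verify the holder, then release" lock semantics.
locks = {}  # lock name -> instance id currently holding it

def acquire(name, instance_id):
    """SET ... NX semantics: succeed only if the lock is currently free."""
    if name in locks:
        return False
    locks[name] = instance_id
    return True

def release(name, instance_id):
    """Delete the lock only if this instance still owns it."""
    if locks.get(name) == instance_id:
        del locks[name]
        return True
    return False  # the lock expired and another instance took over; leave it
```

The point of the holder check: if instance A's lock expires mid-run and instance B acquires it, A's late release must not delete B's lock, or two runs could overlap again.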

8.2.3 Idempotency Keys: Building Unique Keys by Scheduling Window

Before retrying a job, define idempotency key rules. A recommended format is "Job Name + Window Start Time."

idempotency_key = "daily_report:v2:" + window_start_ts

As long as the idempotency key remains consistent, downstream write operations must behave such that "repeated requests have no additional side effects."
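A sketch of deriving such a key, assuming a daily window: any retry firing inside the same window floors to the same window start and therefore produces the same key (`idempotency_key` is an illustrative helper, not an OpenClaw API):

```python
# Sketch: idempotency key = job name + version + scheduling-window start.
from datetime import datetime, timezone

def idempotency_key(job, version, fired_at, window_s=86400):
    """Floor the firing time to its window start; same window -> same key."""
    ts = int(fired_at.replace(tzinfo=timezone.utc).timestamp())
    window_start = ts - (ts % window_s)  # start of the enclosing window
    return f"{job}:{version}:{window_start}"
```

A 9:05 retry of a failed 9:00 run yields the identical key, so a downstream `SET ... NX` style write is skipped; the next day's run produces a new key.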


Focus on three items during acceptance:

  1. Whether re-entrancy occurs.
  2. Whether duplicate side effects appear.
  3. Whether the system can recover or escalate within the specified time after a failure.

💡 Troubleshooting Note: The "Ghost Run" of Scheduled Tasks

A standup summary task was scheduled for 9:00 AM daily, but occasionally executed twice, at 9:00 and 9:03. Investigation revealed that the cron scheduler reloaded during a Gateway restart; if the restart happened to fall within the task's trigger window, the task ran twice. Solution: add an idempotency check inside the task logic (e.g., check whether today's summary has already been sent), or use openclaw cron list to confirm task status before restarting.

8.3 Heartbeat Mechanism: Periodic Inspection and Proactive Notification

The previous section covered Cron jobs designed for precise points in time; this section introduces another built-in scheduling primitive in OpenClaw—Heartbeat. If Cron answers "what should be done at a specific time," Heartbeat answers "is there anything that needs attention?"

[!TIP] Not sure whether to choose Heartbeat or Cron? A quick rule of thumb: If you need precise timing → use Cron; if you need periodic awareness and "as-needed" notifications → use Heartbeat. See 8.3.8 Selection Decision Tree for a detailed comparison.

8.3.1 Core Concepts

A Heartbeat is a scheduled agent turn at the gateway level: at fixed intervals (default 30 minutes), the gateway injects a heartbeat prompt into the agent's main session, triggering a full agent reasoning cycle. The agent reads the HEARTBEAT.md checklist in the workspace, inspects statuses like inboxes, calendars, and todos, and then provides one of two responses:

  • Nothing happening: Replies with HEARTBEAT_OK. The gateway silently swallows this response to avoid disturbing the user.
  • Something needs attention: Returns alert text, which the gateway delivers to the designated channel (WhatsApp, Telegram, Slack, etc.).

This means you don't need to write a separate Cron job for every periodic check—one heartbeat cycle can batch-process multiple inspection items, and messages are sent only when there is actual "news," naturally preventing information overload.

8.3.2 Full Lifecycle: From Timer to Message Delivery

The following diagram illustrates the complete heartbeat path from timer trigger to message delivery. When troubleshooting, locating where it "stuck" using this map is the fastest approach.

{% @mermaid/diagram content="flowchart TD

A["① Timer Expires"] --> B["② Wake-up Consolidation


Merge multiple triggers within a 250ms window"]

B --> C{"③ Pre-check Gating"}

C -->|"Disabled / Quiet Hours / Queue Busy / Checklist Empty"| SKIP["Skip This Turn"]

C -->|"All Passed"| D["④ Assemble Prompt


Select template based on source, append current time"]

D --> E["⑤ Call LLM"]

E --> F{"⑥ Parse Response"}

F -->|"HEARTBEAT_OK or Empty"| G["Silent Handling


Roll back session timestamp"]

F -->|"Same as Last Time"| H["Deduplication Skip"]

F -->|"Substantive Content"| I["⑦ Parse Delivery Target & Visibility"]

I --> J["⑧ Channel Readiness Check & Send"]

J --> K["⑨ Emit Event, Trim Transcript, Advance Schedule"]" %}

Several key design decisions are worth noting:

Timers are per-agent. Each agent independently maintains its own heartbeat interval and last run time. In multi-agent scenarios, agents heartbeat at their own pace without blocking each other. When configurations are hot-reloaded, the system recalculates intervals for all agents.

Wake-up consolidation prevents storms. Multiple trigger sources (timer expiry, Cron events, exec completion) might request a heartbeat simultaneously. The system uses a 250ms window to merge them into a single run, preserving the highest-priority wake-up reason (ACTION > DEFAULT > INTERVAL > RETRY).

Heartbeats do not extend session life. If the response is HEARTBEAT_OK (no substantive content), the gateway rolls back the session's updatedAt timestamp to its pre-heartbeat value. This ensures that idle expiry works correctly—pure heartbeats should not indefinitely prolong a session's lifespan.
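The wake-up consolidation described above can be modeled as a fold over wake-up requests: requests landing within the window of a group's first request merge into it, keeping the highest-priority reason. A simplified model of the stated behavior, not the gateway's actual code:

```python
# Sketch: merge wake-up requests within a consolidation window, keeping the
# highest-priority reason (ACTION > DEFAULT > INTERVAL > RETRY).
PRIORITY = {"ACTION": 3, "DEFAULT": 2, "INTERVAL": 1, "RETRY": 0}

def consolidate(requests, window_ms=250):
    """Merge (timestamp_ms, reason) requests into runs; requests within
    window_ms of a group's first request collapse into that group."""
    runs = []
    for ts, reason in sorted(requests):
        if runs and ts - runs[-1][0] <= window_ms:
            # Same window: keep the group's start time, upgrade the reason.
            if PRIORITY[reason] > PRIORITY[runs[-1][1]]:
                runs[-1] = (runs[-1][0], reason)
        else:
            runs.append((ts, reason))
    return runs
```

So a timer tick, a Cron event, and a retry arriving within a quarter second produce one run, labeled with the most urgent of the three reasons.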

8.3.3 Messages Sent to LLM and the HEARTBEAT_OK Protocol

The key to understanding heartbeat behavior is knowing exactly what is sent to the LLM during each cycle and how the system interprets the reply.

The Heartbeat Section in the System Prompt

When promptMode is not "minimal", the gateway injects a heartbeat protocol instruction into the system prompt:

## Heartbeats

Heartbeat prompt: [Configured heartbeat prompt or default]

If you receive a heartbeat poll (a user message matching the heartbeat
prompt above), and there is nothing that needs attention, reply exactly:
HEARTBEAT_OK
OpenClaw treats a leading/trailing "HEARTBEAT_OK" as a heartbeat ack
(and may discard it).
If something needs attention, do NOT include "HEARTBEAT_OK"; reply with
the alert text instead.

If using Lightweight Context mode (lightContext: true), this paragraph is not injected—because in lightweight mode, only HEARTBEAT.md is loaded, not the full system prompt paragraphs.

Project Context

The # Project Context section of the system prompt injects workspace bootstrap files. Which files are injected depends on the lightContext configuration:

| Mode | Injected Files | Use Case |
| --- | --- | --- |
| Full Mode (default) | All bootstrap files (SOUL.md, README.md, TOOLS.md, MEMORY.md, HEARTBEAT.md, etc.) | Requires full context for decision-making |
| Lightweight Mode (lightContext: true) | HEARTBEAT.md only | Inspection checklist is clear; no other context needed; saves tokens |

User Message Body

The user message is the "trigger command" for the heartbeat, taking three forms depending on the source:

① Regular Heartbeat (Timer Expiry)

Read HEARTBEAT.md if it exists (workspace context). Follow it strictly.
Do not infer or repeat old tasks from prior chats.
If nothing needs attention, reply HEARTBEAT_OK.

Current time: 2026-03-09 14:30 (Asia/Shanghai) / 06:30 UTC

The default prompt can be completely replaced via heartbeat.prompt. The time line at the end is automatically appended in a fixed format: Current time: [Local Time] ([Timezone]) / [UTC Time] UTC.

② Cron Event Trigger

When a Cron job generates a system event, the next heartbeat will automatically pick it up:

A scheduled reminder has been triggered. The reminder content is:

[Reminder text from Cron event]

Handle this reminder internally. Do not relay it to the user unless
explicitly requested.

Current time: 2026-03-09 14:30 (Asia/Shanghai) / 06:30 UTC

③ Async Command (exec) Completion Trigger

An async command you ran earlier has completed. The result is shown in
the system messages above. Handle the result internally. Do not relay
it to the user unless explicitly requested.

Current time: 2026-03-09 14:30 (Asia/Shanghai) / 06:30 UTC

Impact of the isHeartbeat Flag

The heartbeat cycle passes an isHeartbeat: true flag through the entire call chain, producing the following effects:

| Segment | Effect |
| --- | --- |
| Model Selection | Can use heartbeat.model to override the default model |
| Bootstrap Context | Switches to lightweight filtering (HEARTBEAT.md only) when lightContext: true |
| Tool Error Alerts | Tool error alerts can be suppressed via config to avoid interfering with heartbeat judgment |
| Transcript Trimming | Trims the transcript after a HEARTBEAT_OK reply to avoid context pollution |

HEARTBEAT_OK Response Protocol

HEARTBEAT_OK is more than a string; it is a bidirectionally agreed-upon protocol token:

| Position | Behavior |
| --- | --- |
| Start or end of reply | Identified as an ACK token; if the remaining text is ≤ ackMaxChars (default 300), the entire reply is discarded |
| Middle of reply | No special handling; treated as regular text |
| In a non-heartbeat turn | Leading/trailing HEARTBEAT_OK is silently stripped and logged; messages containing only this token are discarded |

The system's standardization process also handles occasional formatting marks added by LLMs—<b>HEARTBEAT_OK</b> and HEARTBEAT_OK are both correctly recognized and stripped.

8.3.4 Configuration Details

Minimal configuration works with default values. The full configuration is as follows:

JavaScript

{
  agents: {
    defaults: {
      heartbeat: {
        every: "30m",                // Interval duration string; "0m" to disable
        model: "anthropic/claude-sonnet-4-6", // Optional: use a cheaper model for heartbeats
        prompt: "Read HEARTBEAT.md if it exists...", // Custom prompt (replaces, doesn't merge)
        target: "none",              // "none" | "last" | specific channel name
        to: "+15551234567",          // Specific recipient within the channel
        accountId: "ops-bot",        // Account ID for multi-account channels
        directPolicy: "allow",       // "allow" | "block" (block DM deliveries)
        lightContext: false,         // If true, only injects HEARTBEAT.md to save tokens
        includeReasoning: false,     // If true, sends the reasoning process as well
        ackMaxChars: 300,            // Max allowed characters alongside HEARTBEAT_OK
        activeHours: {               // Active period limits
          start: "09:00",            // HH:MM, inclusive
          end: "22:00",              // HH:MM, exclusive; "24:00" means midnight
          timezone: "Asia/Shanghai"  // "user" | "local" | IANA timezone
        }
      }
    }
  }
}

Scope and Priority

Heartbeat configuration cascades across two dimensions:

Agent Dimension: agents.defaults.heartbeat sets the global default; agents.list[].heartbeat overrides it for a specific agent. Once any agent declares a heartbeat block, only the agents that declared one will run heartbeats.

Channel Visibility Dimension: channels.defaults.heartbeat → channels.<channel>.heartbeat → channels.<channel>.accounts.<id>.heartbeat. Overrides apply from broad to specific, controlling the showOk, showAlerts, and useIndicator toggles.

Multi-Agent Configuration Example

JavaScript

{
  agents: {
    defaults: {
      heartbeat: { every: "30m", target: "last" }
    },
    list: [
      { id: "main", default: true },       // No heartbeat block -> heartbeat not running
      {
        id: "ops",
        heartbeat: {                       // Only the ops agent runs heartbeats
          every: "1h",
          target: "telegram",
          to: "12345678:topic:42",
          accountId: "ops-bot"
        }
      }
    ]
  }
}

8.3.5 HEARTBEAT.md Checklist and Active Hours

Heartbeat Checklist

HEARTBEAT.md is an optional file in the agent's workspace root directory that acts as a heartbeat inspection checklist. The default prompt instructs the agent to read and strictly execute it.

Markdown

# Heartbeat Inspection Checklist

- Scan inbox; notify with summary if there are urgent emails
- Check calendar events for the next 2 hours
- Report results if any background tasks have completed
- Send a brief greeting if idle for more than 8 hours

Design Points:

  • Keep it concise. This file is injected into the context every heartbeat; a bloated checklist translates directly into token costs.
  • Empty file equals off. If HEARTBEAT.md contains only empty lines or Markdown headers (e.g., # Heading), the gateway skips the heartbeat cycle to save API calls. If the file doesn't exist, the heartbeat still runs, but the agent must decide what to do on its own.
  • Self-modifying agents. You can ask the agent to update HEARTBEAT.md during normal conversation, or include "Update the checklist if it is outdated" in the heartbeat prompt.
  • No sensitive info. API Keys, phone numbers, etc., should not be in this file—it becomes part of the prompt.

Active Hours

activeHours performs timezone-aware filtering before a heartbeat triggers, supporting windows that cross midnight. Heartbeats outside the window are skipped, logged with the reason quiet-hours.

Common Patterns:

| Goal | Configuration |
| --- | --- |
| Heartbeat only during work hours | activeHours: { start: "09:00", end: "18:00" } |
| Run 24/7 | Omit activeHours (default behavior) |
| Avoid late-night disturbance | activeHours: { start: "08:00", end: "24:00" } |

[!WARNING] start and end cannot be equal (e.g., 08:00 to 08:00); this is treated as a zero-width window, and heartbeats will always be skipped.
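The window semantics above can be sketched as follows, treating start as inclusive, end as exclusive, and supporting both cross-midnight windows and "24:00" as midnight. An illustrative helper under those assumptions, not OpenClaw code:

```python
# Sketch of the activeHours check, including cross-midnight windows.
def in_active_hours(now_hm, start="09:00", end="22:00"):
    """start inclusive, end exclusive; start == end is a zero-width window."""
    def minutes(hm):
        h, m = hm.split(":")
        return (int(h) % 24) * 60 + int(m)   # "24:00" wraps to 00:00
    now, s, e = minutes(now_hm), minutes(start), minutes(end)
    if s == e:
        return False                          # zero-width: always skipped
    if s < e:
        return s <= now < e                   # same-day window
    return now >= s or now < e                # window crossing midnight
```

Note how "08:00" to "24:00" falls into the cross-midnight branch after the wrap, yielding the intuitive "active from 08:00 until midnight" behavior.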

8.3.6 Channel Visibility and the Event System

Visibility Control

Visibility controls determine whether heartbeat messages are actually sent to the channel:

YAML

channels:
  defaults:
    heartbeat:
      showOk: false         # Don't send HEARTBEAT_OK ACKs by default
      showAlerts: true      # Send alert content by default
      useIndicator: true    # Emit UI indicator events by default
  telegram:
    heartbeat:
      showOk: true          # Show OK confirmations on Telegram specifically
  whatsapp:
    accounts:
      work:
        heartbeat:
          showAlerts: false # The "work" account does not receive alerts

When all three toggles are false, the gateway skips the heartbeat cycle entirely (without calling the LLM), which is the most economical "Total Silence" mode.

Event System

Every heartbeat run emits an event for UI and monitoring consumption:

| Status | Meaning | Trigger Scenario |
| --- | --- | --- |
| sent | Message delivered | Alert content successfully sent to the channel |
| ok-empty | Empty reply | LLM had no output; optionally sends HEARTBEAT_OK |
| ok-token | Token ACK | Reply contained only HEARTBEAT_OK; stripped |
| skipped | Skipped | alerts-disabled / duplicate / quiet-hours / no-target, etc. |
| failed | Failed | Error during LLM call or delivery process |

Indicator Type Mapping: ok-empty / ok-token → "ok" (Green); sent → "alert" (Yellow); failed → "error" (Red). In the WebChat Debug page, you can see JSON snapshots of the latest heartbeat events.

8.3.7 Manual Triggers and System Events

You don't have to wait for the next heartbeat cycle; you can trigger one immediately:

Bash

# Wake up heartbeat immediately

openclaw system event --text "Check for urgent follow-ups" --mode now


# Wait until the next heartbeat cycle to process

openclaw system event --text "Check project status" --mode next-heartbeat

If multiple agents have heartbeats configured, --mode now will trigger heartbeats for all of them immediately.

System events are not limited to manual input—Cron jobs and exec completions also generate events, which the heartbeat automatically incorporates during its next trigger. The system checks the pending event queue and generates specific prompts based on the event type (e.g., Cron reminder text is embedded directly into the prompt body; see 8.3.3 User Message Body).

8.3.8 Heartbeat vs Cron: Selection Decision Tree

Does it need to run at a precise time?

 Yes -> Use Cron

 No -> Continue...


Does it need to be isolated from the main session?

 Yes -> Use Cron (isolated session)

 No -> Continue...


Can it be merged with other periodic checks?

 Yes -> Use Heartbeat (add to HEARTBEAT.md)

 No -> Use Cron


Is it a one-time reminder?

 Yes -> Use Cron + --at

 No -> Continue...


Does it require a different model or reasoning depth?

 Yes -> Use Cron (isolated) + --model/--thinking

 No -> Use Heartbeat

Best Practice: Use both in tandem. Use Heartbeat for routine periodic inspections (inbox, calendar, notifications), completing multiple checks in one batch. Use Cron for independent jobs requiring precise timing (daily reports, weekly reviews, fixed reminders). This reduces API calls while maintaining time precision for critical tasks.

| Dimension | Heartbeat | Cron (main session) | Cron (isolated session) |
| --- | --- | --- | --- |
| Session | Main session | Main session (via system event) | cron:<jobId> independent session |
| Context | Full history | Full history | Starts blank |
| Model | Overridable | Main session model | Overridable |
| Output | Delivers only if non-OK | Heartbeat prompt + event | Summary (announce) by default |
| Time Precision | Approximate (queue load) | Precise (sec-level cron) | Precise |
| Token Cost | Multiple checks per turn | Joins next heartbeat (no extra turn) | One full turn per job |

Cost Control Recommendations

Heartbeats run full agent cycles; without control, token consumption can be significant. Several ways to reduce costs:

  • Streamline HEARTBEAT.md: The shorter the list, the fewer tokens injected per cycle.
  • Use lightContext: true: The heartbeat cycle only injects HEARTBEAT.md and skips full workspace bootstrap files.
  • Use Cheaper Models: Use heartbeat.model to assign a more economical model (like Haiku) for heartbeats while keeping high-end models for core conversations.
  • Set target: "none": If you only need internal state updates without outgoing messages, skip the delivery process.
  • Extend Intervals: every: "1h" results in half the calls compared to the default 30m.
  • Limit Active Hours: Skipping heartbeats at night can save roughly 1/3 of the total calls.

8.4 Remote Access: SSH, Intranet Penetration, and Zero Trust

This section discusses the goals of OpenClaw remote operation and maintenance (O&M): ensuring reachability while minimizing the attack surface and providing rapid-revocation access mechanisms.

[!NOTE] The tunneling, Zero Trust, and credential management discussed here are general infrastructure practices that must be implemented on the host or network level. OpenClaw itself does not include built-in remote access components.

8.4.1 Entry Principles: Management Plane Non-Exposure by Default

Remote O&M starts with controlling the exposure of entry points.

  • Do not open management ports to the public internet for extended periods.
  • Separate the management plane from the business plane to prevent high-privilege operations and external business traffic from sharing the same entry point.
  • Access chains must be auditable, providing a way to trace "who did what and when."

8.4.2 Recommended Topology: Dedicated Channels First

Commonly viable solutions include:

  1. Official Recommendation: Tailscale Networking with Serve/Funnel. The OpenClaw Gateway CLI integrates with Tailscale (--tailscale), allowing devices to establish secure peer-to-peer access without a public IP, complete with built-in certificate validation. Pair this with Tailscale Serve to securely publish the Dashboard.
  2. SSH Tunneling (Port Forwarding). For scenarios requiring only CLI troubleshooting or occasional Web Console access, use a bastion host or jump box to perform ssh -L 18789:localhost:18789 user@host forwarding for secure, lightweight local access.
  3. Temporary Port Forwarding and allowedOrigins. If you must bind to a non-loopback address to enable the Dashboard, you must explicitly declare gateway.controlUi.allowedOrigins in the config. Otherwise, the gateway security guardrails will refuse to start and will force-block cross-origin requests.

Regardless of the choice, the core principle is the same: establish a trusted network first, then authorize management operations.

8.4.3 Identity and Permissions: Least Privilege and Short-Lived Credentials

For remote access, it is recommended to:

  • Disable password logins and use key-based or certificate-based authentication.
  • Assign minimum permissions based on roles; do not share administrator accounts.
  • Use short-lived credentials and rotate them regularly to reduce the blast radius if disclosed.

8.4.4 File Synchronization: Bringing Remote Results Back

When OpenClaw is deployed on a cloud server, the assets, reports, and code organized by the agent are stored on the remote disk. In engineering practice, it is recommended to establish a controlled two-way file synchronization mechanism to seamlessly map the "cloud workspace" to your local machine:

  1. rclone Remote Mount (Highly Recommended): Use rclone mount to mount the server's workspace directory directly as a local disk on Mac/Linux.
  • Pros: Free, real-time, native support; no need to install a sync daemon on the server (uses native SFTP).
  • Optimization Tips: Default mounts can be slow; we recommend adding --vfs-cache-mode full (local caching), --sftp-connections 8 (concurrency), and --sftp-idle-timeout 0 (to prevent timeouts).
  2. Enterprise Cloud Sync (e.g., Nutstore/Dropbox): Install clients on both the server and the local machine, designating the workspace directory for two-way sync.
  • Pros: Extremely low barrier to entry; works out of the box.
  • Cons: High-frequency read/writes or large temporary files generated by the agent can quickly exhaust sync quotas; conflict resolution is often opaque to developers.
  3. Lightweight Web-based File Managers (File Browser): If you only need occasional mobile access to view files without heavy editing, run a standalone web-based file browser (such as File Browser) on the server to view or download reports via a browser, following the security principles in 8.4.1.

8.4.5 Browser Remote Takeover: KasmVNC for Human-AI Collaboration

When an agent runs an L2 (headed browser) on the cloud for automation, it often encounters complex CAPTCHAs (sliders, behavioral verification) or requires manual confirmation. Relying solely on screenshots (L3) is often insufficient to bypass these security policies.

In practice, you can install a web-based remote desktop service (like KasmVNC) on the server.

  1. When the agent gets stuck at a CAPTCHA, it notifies the user.
  2. The user connects via a local browser to KasmVNC to take over the cloud mouse and keyboard.
  3. After helping the agent pass the CAPTCHA, the user closes the connection, and the agent continues its tool-based workflow.

This "unattended by default, manual takeover when stuck" model is a practical compromise between automation and security.

8.4.6 Emergency Mechanisms: Rapid Revocation and Isolation

In the event of credential leakage or suspicious access, prioritize the following:

  1. Revoke affected credentials or device trust.
  2. Tighten entry policies; temporarily block management channels if necessary.
  3. Replay audit logs to confirm the scope of impact and verify recovery.

Emergency actions should be rehearsed in advance to avoid making ad-hoc decisions during an actual incident.

8.4.7 Baseline Inspection Commands

The following commands can be used to quickly inspect the remote access baseline of a host:

Bash

# Check SSH authentication methods
sshd -T | grep -E 'passwordauthentication|pubkeyauthentication'

# Check for non-local listening ports
lsof -nP -iTCP -sTCP:LISTEN | grep -v '127.0.0.1' || echo "No public listeners"

It is recommended to perform these inspections consistently after deployments or changes, archiving them alongside the output of doctor/status.


8.5 Security Baseline and Auditing Processes

The goal of a security baseline is not to write a static "configuration checklist," but to confine high-risk capabilities within deterministic boundaries and ensure every critical decision is traceable and reviewable. Based on the official OpenClaw security and configuration documentation, this section provides a practical end-to-end flow for security and auditing: how to define layered boundaries, how to record audit events, how to inject secrets, and how to use self-check commands to transform the baseline into a verifiable process.

8.5.1 Defense in Depth: Layering Boundaries and Verifying Each Layer

OpenClaw's official security model is built on the fundamental assumption of a "trusted operator boundary" (the personal assistant model). It does not natively provide hard isolation against malicious multi-tenancy. If the gateway is exposed to the outside world without hardened security protections, the system's chain of trust becomes extremely fragile.

In an agent system, risk does not stem from a single entry point but from the combined pipeline of "Inbound Channel + Routing + Tools + Secrets + Memory." A minimum viable defense-in-depth setup can be split into four layers, each with verifiable control points:

  1. Entry Layer: Channel gating, group chat trigger rules, allowlists, and mention rules. Establish an extremely conservative default operation surface: it is recommended to force all non-primary sessions (e.g., group chats) into a sandbox (Docker Sandbox) by default, or even set workspaceAccess=none or read-only. This significantly reduces the risk of unauthorized credential harvesting caused by overly broad allowFrom or dmPolicy settings.
  2. Routing Layer: Use priority bindings to fix high-risk entry points to controlled agents, preventing routing correctness from becoming a matter of probability. See [7.4 Routing Basics].
  3. Tool Layer: Use Tool Profiles for allow, deny, and tiered strategies. Deny rules take precedence over allow rules to prevent privilege escalation caused by overlapping configurations. See [5.2 Tool Policy].
  4. Runtime & Data Layer: Switch secrets to environment injection (see [4.2 Provider Access]), enable log redaction, save diagnostic output to manageable paths, and strictly control retention periods and access permissions.
Security Layering and Evidence Chain Diagram:

{% @mermaid/diagram content="flowchart TB
subgraph L1["Entry Layer"]
A1["Channel allowlist / pairing"]
A2["Group mention gating"]
end
subgraph L2["Routing Layer"]
B1["bindings -> agentId"]
B2["dmScope / identityLinks"]
end
subgraph L3["Operation Layer"]
C1["tools.profile + allow/deny"]
C2["sandbox mode + workspaceAccess"]
C3["elevated exec controls"]
end
subgraph L4["Data & Secret Layer"]
D1["SecretRef / .env precedence"]
D2["logging redact"]
D3["config hot reload boundaries"]
D4["OS Permissions = Trust Boundary"]
end
L1 --> L2 --> L3 --> L4
V["Verification Loop: doctor/status/health + security audit"] --> L1
V --> L2
V --> L3
V --> L4" %}

To verify whether these layers are actually effective, prioritize system self-checks and state probing over "testing it out" on production entry points. The recommended minimal verification suite is:

Bash

openclaw doctor

openclaw status --deep

openclaw channels status --probe

openclaw models status --check

openclaw secrets audit

openclaw security audit

Note: The availability of certain CLI commands (e.g., secrets audit and security audit) depends on your version. If a command is missing, consult the official documentation for your current version or use openclaw --help.

8.5.2 The Audit Event Quadruple: Structured Causal Chains

The core of auditing is answering four questions: Who, when, via which entry point, did what—and why did the system allow or deny it? To prevent logs from becoming unsearchable text heaps, it is recommended to abstract key actions into structured audit events with four fixed dimensions:

  1. Subject: User, peer identifier, channel account, and the target Agent ID.
  2. Action: Tool name, parameter summary, and target resource identifier.
  3. Basis: Hit bindings, hit tool policy rules, and rejection reasons.
  4. Result: Success, Failure, Rejected, or Requires Manual Confirmation.
OpenClaw supports structured logging and log-following. When troubleshooting "why something was denied" or "why it routed to a specific agent," follow the log stream and filter by fields. Below is an example of extracting event summaries related to routing and tool calls from JSON logs:

Bash

openclaw logs --follow --json | jq -c 'select(.type=="log") | .log | select(.event=="routed" or .event=="tool_call" or .event=="tool_denied") | {ts, trace_id, event, agentId, channelId, peerId, tool, reason}'

Note: Actual field names should be verified against the raw logs --json output. It is recommended to print a raw log entry without filters first to confirm whether the format uses trace_id or traceId before applying jq filters.
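As a concrete illustration of the quadruple, a single structured audit event might look like the following. Every field name here is an illustrative example built from the four dimensions above, not a schema defined by OpenClaw:

```json
{
  "ts": "2026-01-15T09:30:00Z",
  "trace_id": "a1b2c3",
  "subject": { "peerId": "user-42", "channelId": "telegram-main", "agentId": "ops-agent" },
  "action": { "tool": "exec", "paramsSummary": "rm -rf /tmp/cache", "target": "host:web-01" },
  "basis": { "binding": "telegram-main -> ops-agent", "policyRule": "deny: exec", "reason": "deny rule matched" },
  "result": "rejected"
}
```

Keeping the four dimensions fixed means any incident question ("who triggered this, and why was it allowed?") maps to a field filter rather than a full-text search.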

If you find that "the same entry point triggers different capabilities at different times," prioritize checking for multi-account configurations, group chat policy discrepancies, or boundary drift caused by overlapping tool policies.

8.5.3 Secrets and Blast Radius: Environment Injection and Least Privilege

The goal of secret governance is to limit the usable scope and duration of a credential after a potential leak. OpenClaw configurations support retrieving secrets via ${VAR} or SecretRef objects, avoiding the storage of plaintext keys in configuration files or code repositories.

[!NOTE] OpenClaw reads variable references from process environment variables, the current directory's .env, and ~/.openclaw/.env, following the priority rule that ".env does not override existing environment variables."
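The stated precedence rule ("`.env` does not override existing environment variables") can be illustrated with a small, self-contained shell sketch. The `load_env` helper is hypothetical and only mimics the described behavior:

```shell
# Create a sample .env file with two keys
cat > /tmp/demo.env <<'EOF'
OPENAI_API_KEY=from-dotenv
NEW_VAR=from-dotenv
EOF

# One key is already present in the process environment
export OPENAI_API_KEY=from-process-env

# load_env (hypothetical helper): export KEY=VALUE pairs from a .env file,
# but only for keys that are not already set in the environment
load_env() {
  while IFS='=' read -r key val; do
    [ -n "$key" ] || continue
    eval "current=\${$key:-}"
    [ -n "$current" ] || export "$key=$val"
  done < "$1"
}

load_env /tmp/demo.env
echo "OPENAI_API_KEY=$OPENAI_API_KEY"   # the existing environment value wins
echo "NEW_VAR=$NEW_VAR"                 # missing keys are filled from .env
```

This is why injecting a secret via the process environment (e.g. from a systemd unit or CI pipeline) reliably takes precedence over any stale value left in a local `.env` file.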

The following example shows common secret injection patterns: simple interpolation or using SecretRef to integrate with unified credential pipelines. It is recommended to use different keyIds for different environments during deployment.

JavaScript

{

 models: {

   providers: {

     openai: {

       keys: {

         // Method 1: String interpolation
         prod: { apiKey: "${OPENAI_API_KEY}" },

         // Method 2: Secure SecretRef specification
         staging: { apiKey: { source: "env", id: "OPENAI_API_KEY_STAGING" } }

       }

     }

   }

 }

}

Controlling the blast radius typically requires three "hard rules":

  1. Entry Isolation: Use different policies and default agents for group chats versus DMs to prevent low-trust entry points from directly triggering write capabilities.
  2. Permission Isolation: High-risk tools should only be assigned to a small group of controlled agents, with deny rules as a backstop.
  3. Secret Isolation: Use different keys or credentials with different permission scopes for different capabilities to avoid a "one key to rule them all" scenario.
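In the same configuration style used earlier in this section, the three hard rules might translate into a fragment like the one below. The field names (`dmPolicy`, `workspaceAccess`, per-agent `tools.allow`/`tools.deny`) are a sketch to be checked against your version's schema, not a verified configuration:

```javascript
{
  // Entry isolation: conservative defaults for low-trust entry points
  channels: {
    groupChat: { dmPolicy: "deny", workspaceAccess: "read-only" }
  },
  // Permission isolation: high-risk tools only on a controlled agent,
  // with deny rules as a backstop (deny takes precedence over allow)
  agents: {
    "ops-agent":  { tools: { allow: ["exec", "fs.write"], deny: ["secrets.*"] } },
    "chat-agent": { tools: { deny: ["exec", "fs.write"] } }
  },
  // Secret isolation: separate keys per capability scope,
  // so one leaked key does not unlock everything
  models: {
    providers: {
      openai: {
        keys: {
          chat: { apiKey: { source: "env", id: "OPENAI_API_KEY_CHAT" } },
          ops:  { apiKey: { source: "env", id: "OPENAI_API_KEY_OPS" } }
        }
      }
    }
  }
}
```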

8.5.4 Anti-Pattern: The Agent Self-Modification Infinite Loop

Even with the isolation above, serious engineering disasters can occur if core configuration files (e.g., openclaw.json) are not protected against read/write access by the agent.

[!CAUTION] Beware of "AI Repairing Itself to Failure": In real-world cases, users have granted agents global filesystem read/write permissions. When a minor glitch occurred, the agent triggered a "self-healing" hallucination and began modifying its own openclaw.json. Because the agent did not fully understand the parameter schema, it corrupted the configuration. OpenClaw detected the change and auto-restarted, only to crash immediately due to the config error. If a process supervisor (such as systemd) repeatedly restarts it, the system enters an infinite error loop, potentially consuming massive log space or API quotas.

Protection Recommendations:

  1. In production, never disable user-level permission restrictions. The account running OpenClaw should not have write permissions to the directory where the configuration files reside (this should be managed by a deployment pipeline or a separate admin account).
  2. Control hot-reload boundaries and behaviors. Refer to official settings:
| Concept | Config Field | Default | Description |
| --- | --- | --- | --- |
| Hot Reload Mode | gateway.reload.mode | hybrid | Options: hybrid (auto-detect reload vs restart), hot, restart, off. debounceMs defaults to 300ms. |

In openclaw.json, you can set gateway.reload.mode: 'hot' or 'off' to completely disable file watching. This prevents "modification triggering frequent abnormal restarts."

  3. Explicitly ban the configuration path in the tools.deny list.
  4. If you truly need to modify configurations for troubleshooting, use traditional O&M configuration deployment methods rather than allowing OpenClaw to modify itself.
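The hot-reload and deny-list recommendations above might be expressed as a single configuration fragment like the following. The `gateway.reload.mode` field comes from the official settings referenced above, while the deny-pattern syntax is an assumption to verify against your version's documentation:

```javascript
{
  gateway: {
    // Disable file watching entirely so a corrupted self-modification
    // cannot trigger a restart loop
    reload: { mode: "off" }
  },
  tools: {
    // Ban the agent from writing to its own configuration
    // (path-pattern syntax is illustrative; check your version's docs)
    deny: ["fs.write:~/.openclaw/openclaw.json"]
  }
}
```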

8.5.5 Drills and Regression: Turning the Baseline into a Repeatable Cycle

A security baseline that isn't drilled will eventually fail during changes. It is recommended to split verification into two categories:

  1. Self-Check Regression: After every release or configuration change, run a fixed set of commands to ensure core pipelines are functional and boundaries haven't drifted.
  2. Fault Injection: In a controlled environment, intentionally introduce invalid keys, network disconnections, or channel unavailability to observe whether the system provides clear error messages and graceful degradation.
Start with a minimal regression suite:

Bash

openclaw doctor

openclaw health --json

openclaw status --deep

openclaw security audit

openclaw secrets reload

Version Tip: The name and parameters for openclaw secrets reload may vary by version. If the command fails, use openclaw --help to get the correct syntax.

When an anomaly is found, use doctor and status to determine whether it is a config, dependency, or runtime state issue, then use the traceId from the logs to replay and locate the problem. For logs involving sensitive information, ensure redaction is enabled as per official security advice.

8.6 Chapter Summary

This chapter has focused on automated operations and security baselines, providing key practices to advance OpenClaw from "functional" to a state that is "controlled, auditable, and replayable."

8.6.1 Key Conclusions

  1. Hooks for Decoupling Governance: The three stages (Input, Operation, Output) each have distinct boundaries. Hooks themselves must be constrained by timeouts, degradation strategies, and idempotency.
  2. Scheduled Jobs Grounded in Idempotency and Re-entrancy Protection: Repeated operations must not produce duplicate side effects, and failures should be shunted by error type.
  3. Heartbeat Based on "Notify on Event" Principle: A single heartbeat cycle batch-processes multiple inspection items. The HEARTBEAT_OK protocol prevents information bombardment, while active hours and visibility controls further reduce noise.
  4. Remote Access Based on Minimal Exposure: The management plane should remain private by default, with credentials that are rapidly revocable and an access chain that is fully auditable.
  5. Security Baselines Centered on Layered Boundaries and Verifiable Processes: Use doctor, status --deep, and logs --follow --json to transform troubleshooting from an experience-based task into a verifiable workflow.

8.6.2 Reader Self-Check

  • Have you defined timeout and degradation policies for critical Hooks and verified their performance under abnormal traffic?
  • Do your scheduled jobs possess idempotency keys and re-entrancy mechanisms? Is there a clear path for shunting and escalating failures?
  • Is your heartbeat checklist (HEARTBEAT.md) sufficiently streamlined? Have you configured activeHours to avoid late-night disturbances?
  • Does your remote management entry point feature strong authentication with rapid revocation capabilities? Is the access chain auditable?

8.6.3 Community Practice Inspirations

Only by adding automated operations can the system truly run "unattended." For example:

  • Self-healing Server Control Nodes: Use Hooks to listen for lifecycle events of underlying containers or hosts. Upon capturing anomalies (e.g., memory spikes, unresponsive services), immediately trigger automated degradation or restart scripts.
  • Prediction Market Automated Following: Utilize high-frequency scheduled job dispatching to poll specific market data APIs, automatically calling executors to complete logic when strategy indicators are met.
  • Low-code Platform Orchestration: Wrap OpenClaw capabilities as Webhooks and combine them with low-code workflow engines like n8n to drive entire enterprise-grade automation pipelines.

With the completion of Chapter 8, you have successfully transitioned from an observer to a proficient practitioner of the OpenClaw framework. Over these eight chapters, we have moved from the initial setup of the Five Core Objects to the sophisticated orchestration of Multi-Agent Collaboration and Automated O&M practices. You now possess the foundational knowledge and technical skills to build autonomous "digital employees" that can handle real-world workflows. This concludes the primary guide on the core application and advanced usage of OpenClaw. You are now fully equipped to deploy and optimize your own intelligent agents.


About the author

Sarah Jenkins

Sarah Jenkins is a seasoned OpenClaw developer with a strong focus on optimizing high-performance computing solutions. Her work primarily involves crafting efficient parallel algorithms and enhancing GPU acceleration for complex scientific simulations. Jenkins is renowned for her meticulous attention to detail and her ability to translate intricate theoretical concepts into practical, robust OpenClaw implementations.
