MetaClaw is a self-evolving AI agent framework that transforms every real-world conversation into a learning signal. Built on SkillRL, it enables continuous online reinforcement learning — with zero GPU clusters required.
A new paradigm for building AI agents that learn in the wild
MetaClaw is an open-source AI agent framework developed by AIMING Lab at the University of North Carolina at Chapel Hill. At its core, MetaClaw enables AI agents to achieve self-evolution and meta-learning by inserting a transparent proxy between users and large language models (LLMs). Every daily conversation becomes a learning signal, allowing agents to continuously improve during real-world deployment rather than relying solely on offline training.
Traditional reinforcement learning approaches require massive annotated datasets and expensive GPU clusters. MetaClaw eliminates these barriers. Through its innovative architecture, agents accumulate experience from real-time user interactions, automatically extract and inject new skills, and achieve continuous capability improvement — all without pre-built datasets or dedicated GPU infrastructure.
The MetaClaw framework combines online reinforcement learning, skill injection, and intelligent scheduling into a unified system, providing a fundamentally new paradigm for building adaptive, autonomous AI agents that grow smarter with every interaction.
What makes MetaClaw different from other AI agent frameworks
Get started instantly with `metaclaw setup` for interactive configuration and `metaclaw start` to launch. Choose your LLM provider (Kimi, Qwen, MiniMax, or custom), enter your API key, and optionally enable reinforcement learning. MetaClaw automatically configures OpenClaw and restarts the gateway.
During every user conversation, MetaClaw intercepts interactions and injects relevant skills from its hierarchical SkillBank to enhance agent responses. After each session, the system automatically analyzes dialogue content, extracts new reusable skills, and stores them. When RL is enabled, a dedicated evolver LLM further extracts skills from failed conversations.
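The retrieve-and-inject step above can be sketched in plain Python. This is an illustrative stand-in, not MetaClaw's implementation: the `Skill`, `SkillBank`, and `inject_skills` names are hypothetical, and the keyword-overlap scoring is a toy substitute for whatever retrieval the real SkillBank uses.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    description: str
    keywords: set = field(default_factory=set)

class SkillBank:
    """Minimal in-memory stand-in for MetaClaw's hierarchical SkillBank."""
    def __init__(self):
        self.skills = []

    def add(self, skill):
        self.skills.append(skill)

    def retrieve(self, query, top_k=2):
        # Toy relevance: keyword overlap with the query. The real system
        # presumably uses an LLM or embedding-based retrieval instead.
        words = set(query.lower().split())
        scored = [(len(s.keywords & words), s) for s in self.skills]
        hits = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
        return [s for _, s in hits[:top_k]]

def inject_skills(prompt, bank):
    """Prepend retrieved skills to the prompt before it reaches the LLM."""
    header = "\n".join(f"[skill] {s.name}: {s.description}"
                       for s in bank.retrieve(prompt))
    return f"{header}\n\n{prompt}" if header else prompt

bank = SkillBank()
bank.add(Skill("csv_cleanup", "Normalize headers before parsing CSV files.",
               {"csv", "parse", "headers"}))
```

A query that touches a stored skill's keywords gets that skill prepended; unrelated prompts pass through untouched, so injection stays transparent to the user.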
Service responses, reward modeling, and model training are fully decoupled. Even during complex optimization, the agent remains responsive. In `madmax` mode, the intelligent scheduler detects idle time windows (sleep hours, keyboard inactivity, calendar meetings) and performs model weight updates only during those periods.
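An idle-window check of the kind the `madmax` scheduler performs might look like the sketch below. The function name, the default sleep window, and the 15-minute inactivity threshold are all illustrative assumptions; calendar integration is omitted.

```python
from datetime import datetime, time

def in_idle_window(now, sleep_windows=((time(1, 0), time(6, 0)),),
                   seconds_since_input=None, inactivity_threshold=900):
    """Illustrative idle check: allow weight updates only inside a configured
    sleep window, or after sustained keyboard/mouse silence."""
    t = now.time()
    if any(start <= t <= end for start, end in sleep_windows):
        return True  # e.g. 3 a.m. falls in the 01:00-06:00 sleep window
    return (seconds_since_input is not None
            and seconds_since_input >= inactivity_threshold)
```

The scheduler would poll a check like this and defer any queued LoRA updates until it returns true, which is how training never competes with a live user session.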
MetaClaw supports online reinforcement learning from live conversations. Each dialogue is tagged and submitted as a training sample. A discriminative LLM (PRM) asynchronously scores responses, while Tinker or MinT backends run LoRA fine-tuning and hot-swap model weights. Optional Online Policy Distillation (OPD) transfers capabilities from a larger teacher model to a student model via KL penalty.
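The KL penalty at the heart of Online Policy Distillation can be written out concretely. The sketch below computes forward KL(teacher ∥ student) over a toy next-token distribution; the direction of the divergence and the example values are assumptions for illustration, since the source only says "via KL penalty".

```python
import math

def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student): the penalty term that pulls the student's
    token distribution toward the teacher's. Inputs assumed normalized."""
    return sum(p * math.log(p / q)
               for p, q in zip(teacher_probs, student_probs) if p > 0)

teacher = [0.7, 0.2, 0.1]   # toy next-token distribution from the teacher
student = [0.5, 0.3, 0.2]   # student's distribution over the same tokens
penalty = kl_divergence(teacher, student)
```

The penalty is zero exactly when the two distributions agree and grows as they diverge, so adding it to the student's training loss transfers the teacher's behavior without needing the teacher's weights at inference time.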
Unlike traditional reinforcement learning pipelines that demand expensive GPU clusters, MetaClaw operates without dedicated GPU infrastructure. This dramatically reduces deployment and operational costs, making self-evolving AI agents accessible to individual developers, small teams, and organizations of any scale.
MetaClaw operates as an OpenAI-compatible proxy server that transparently intercepts user-LLM interactions. This design means existing applications can integrate MetaClaw's self-evolution capabilities with minimal code changes, leveraging the standard API interface that most LLM-based applications already support.
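Because the proxy speaks the standard OpenAI chat-completions protocol, integrating usually means changing only the base URL. The sketch below builds such a request with the standard library; the port, model name, and API key are placeholders, not MetaClaw's actual defaults.

```python
import json
import urllib.request

# Hypothetical local MetaClaw proxy address; the real port comes from setup.
METACLAW_BASE = "http://localhost:8000/v1"

payload = {
    "model": "my-agent",  # placeholder model name
    "messages": [{"role": "user", "content": "Summarize today's standup."}],
}
req = urllib.request.Request(
    f"{METACLAW_BASE}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer YOUR_API_KEY"},
)
# urllib.request.urlopen(req) would route this through MetaClaw instead of
# the upstream provider; nothing else about the request changes.
```

Any client library that accepts a custom base URL (the official OpenAI SDKs do) can be repointed the same way, which is what keeps the required code change minimal.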
Modular design for flexible deployment across different resource profiles
MetaClaw is built on top of OpenClaw, a core agent framework, and operates through an OpenAI-compatible proxy server that intercepts and processes interaction data. The system employs a fully asynchronous architecture where service responses, reward modeling, and model training are completely decoupled. This ensures that scoring and optimization tasks run in parallel without impacting user experience.
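The decoupling of serving from scoring can be illustrated with a small `asyncio` sketch: the response is returned as soon as it exists, and the (reply, score) work is handed to a background worker via a queue. The function names and the dummy reward are illustrative, not MetaClaw's internals.

```python
import asyncio

async def handle_turn(user_msg, score_queue):
    """Answer immediately; reward modeling happens later, off the hot path."""
    reply = f"echo: {user_msg}"               # stand-in for the real LLM call
    await score_queue.put((user_msg, reply))  # hand the pair to the scorer
    return reply                              # user never waits on scoring

async def reward_worker(score_queue, scores):
    """Stand-in for the asynchronous PRM scorer."""
    while True:
        user_msg, reply = await score_queue.get()
        scores.append((user_msg, 1.0))        # dummy reward
        score_queue.task_done()

async def main():
    queue, scores = asyncio.Queue(), []
    worker = asyncio.create_task(reward_worker(queue, scores))
    reply = await handle_turn("hello", queue)  # returns before any scoring
    await queue.join()                          # drain only for this demo
    worker.cancel()
    return reply, scores

reply, scores = asyncio.run(main())
```

The same queue boundary is what lets training jobs pile up safely during busy periods and get consumed whenever the scorer (or, in `madmax` mode, the scheduler) decides to run.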
The reinforcement learning backend supports Tinker (a cloud-based LoRA training service) and MinT (a compatible open-source backend), giving users flexible options for model training infrastructure.
MetaClaw provides three operating modes to accommodate different use cases and resource constraints:
| Mode | Capabilities | Resource requirements |
|---|---|---|
| `skills_only` | Skill injection and automatic summarization only; no model training. | Lightweight; no GPU needed. |
| `rl` | Everything in `skills_only`, plus online reinforcement learning training. | Requires an RL backend (Tinker or MinT). |
| `madmax` (default) | Everything in `rl`, plus an intelligent scheduler that trains only during user idle time. | RL backend + optional calendar integration. |
Recursive skill-augmented reinforcement learning for evolving agents
At the heart of MetaClaw lies SkillRL (Recursive Skill-Augmented Reinforcement Learning), a novel framework that enables agents to evolve their capabilities through structured skill management. SkillRL was introduced in the research paper "SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning" by Xia et al. [3]
The key innovation of SkillRL is its ability to extract high-level, reusable behavioral patterns — called "skills" — from raw experience. These skills are stored in a hierarchical SkillBank. When the agent faces a new task, the system adaptively retrieves relevant skills and injects them into the prompt, significantly improving problem-solving capability and efficiency.
This approach mirrors how human professionals develop expertise: by accumulating patterns from experience, organizing them into transferable knowledge, and applying them to new situations. SkillRL provides the formal framework that makes this possible for AI agents, enabling them to move beyond static prompt engineering toward dynamic, experience-driven improvement.
The recursive nature of SkillRL means that skills themselves can generate higher-order skills, creating a compounding effect where the agent's capability grows increasingly sophisticated over time. Combined with MetaClaw's online learning architecture, this creates a virtuous cycle of real-world interaction, skill extraction, and capability enhancement.
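The recursive step can be made concrete with a toy composition sketch. Here skills are plain text-transform callables purely for illustration; in SkillRL, skills are prompt-level behavioral patterns, and `make_skill` is a hypothetical helper, not part of the framework.

```python
def make_skill(name, steps):
    """Compose existing skills (plain callables here) into a higher-order skill."""
    def skill(text):
        for step in steps:
            text = step(text)
        return text
    skill.__name__ = name
    return skill

def strip_noise(s):
    return " ".join(s.split())

def lowercase(s):
    return s.lower()

# A higher-order skill built from two base skills...
normalize = make_skill("normalize", [strip_noise, lowercase])

def dedupe_words(s):
    return " ".join(dict.fromkeys(s.split()))

# ...which can itself become a building block at the next level.
clean = make_skill("clean", [normalize, dedupe_words])
```

Because a composed skill has the same interface as a base skill, each new level of composition is available as raw material for the next, which is the compounding effect the recursion buys.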
How MetaClaw stands apart in the AI agent landscape
Unlike frameworks that rely on pre-built datasets or synthetic training environments, MetaClaw learns from real user conversations as they naturally occur. This "in-the-wild" approach means the agent continuously adapts to dynamic, real-world conditions and evolving user needs — a capability that static training pipelines cannot match.
By eliminating the need for GPU clusters, MetaClaw dramatically lowers the barrier to entry for self-evolving AI agents. Individual researchers, startups, and small teams can deploy continuously learning agents without the capital expenditure typically associated with reinforcement learning at scale.
The intelligent scheduler in `madmax` mode resolves the fundamental tension between online learning and user experience. By restricting model updates to idle periods, MetaClaw ensures that the agent's evolution never disrupts its primary function of serving users effectively.
The SkillRL framework provides a principled approach to knowledge accumulation and reuse. Rather than treating each interaction as isolated, MetaClaw systematically extracts, organizes, and leverages learned skills, creating compounding returns on every conversation.
Developed by leading researchers in adaptive AI systems
MetaClaw is developed and maintained by AIMING Lab at the University of North Carolina at Chapel Hill (UNC-Chapel Hill), led by Professor Huaxiu Yao.
Professor Huaxiu Yao serves as Assistant Professor in the Department of Computer Science and the School of Data Science and Society at UNC-Chapel Hill. Prior to joining UNC, he was a Postdoctoral Researcher at the Stanford AI Laboratory. His research focuses on building generalized and adaptive agent foundation models, spanning both theoretical foundations and practical applications.
The name "AIMING" reflects the lab's mission: achieving Adaptive Intelligence through Alignment, Interaction, and Learning. The lab is dedicated to advancing how AI agents learn and evolve in real-world environments, and MetaClaw represents a direct embodiment of this research agenda.
The team's deep expertise in reinforcement learning, large language models, and AI agent systems provides the academic rigor and technical depth that underpin MetaClaw's innovations. Their published research, including the SkillRL paper, establishes the theoretical foundations upon which the framework is built.
Deploy MetaClaw in minutes
Run `metaclaw setup` to start the interactive configuration wizard. Select your LLM provider, enter your API key, and choose whether to enable reinforcement learning training.
Run `metaclaw start` to boot the proxy. MetaClaw automatically configures OpenClaw and restarts the gateway. Your agent is now live and learning.
Use your agent as you normally would. MetaClaw transparently intercepts interactions, injects relevant skills, and — when RL is enabled — continuously trains your model in the background.