MetaClaw is a self-evolving AI agent framework that transforms every real-world conversation into a learning signal. Built on SkillRL, it enables continuous online reinforcement learning — with zero GPU clusters required.
A new paradigm for building AI agents that learn in the wild
MetaClaw is an open-source AI agent framework developed by AIMING Lab at the University of North Carolina at Chapel Hill. At its core, MetaClaw enables AI agents to achieve self-evolution and meta-learning by inserting a transparent proxy between users and large language models (LLMs). Every daily conversation becomes a learning signal, allowing agents to continuously improve during real-world deployment rather than relying solely on offline training.
Traditional reinforcement learning approaches require massive annotated datasets and expensive GPU clusters. MetaClaw eliminates these barriers. Through its innovative architecture, agents accumulate experience from real-time user interactions, automatically extract and inject new skills, and achieve continuous capability improvement — all without pre-built datasets or dedicated GPU infrastructure.
The MetaClaw framework combines online reinforcement learning, skill injection, and intelligent scheduling into a unified system, providing a fundamentally new paradigm for building adaptive, autonomous AI agents that grow smarter with every interaction.
What makes MetaClaw different from other AI agent frameworks
Get started instantly with `metaclaw setup` for interactive configuration and `metaclaw start` to launch. Choose your LLM provider (Kimi, Qwen, MiniMax, or custom), enter your API key, and optionally enable reinforcement learning. MetaClaw automatically configures OpenClaw and restarts the gateway.
During every user conversation, MetaClaw intercepts interactions and injects relevant skills from its hierarchical SkillBank to enhance agent responses. After each session, the system automatically analyzes dialogue content, extracts new reusable skills, and stores them. When RL is enabled, a dedicated evolver LLM further extracts skills from failed conversations.
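The retrieve-and-inject step above can be sketched in plain Python. This is an illustrative stand-in, not MetaClaw's implementation: the `Skill`, `SkillBank`, and `inject_skills` names are hypothetical, and the keyword-overlap scoring is a toy substitute for whatever retrieval the real SkillBank uses.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    description: str
    keywords: set = field(default_factory=set)

class SkillBank:
    """Minimal in-memory stand-in for MetaClaw's hierarchical SkillBank."""
    def __init__(self):
        self.skills = []

    def add(self, skill):
        self.skills.append(skill)

    def retrieve(self, query, top_k=2):
        # Toy relevance: keyword overlap with the query. The real system
        # presumably uses an LLM or embedding-based retrieval instead.
        words = set(query.lower().split())
        scored = [(len(s.keywords & words), s) for s in self.skills]
        hits = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
        return [s for _, s in hits[:top_k]]

def inject_skills(prompt, bank):
    """Prepend retrieved skills to the prompt before it reaches the LLM."""
    header = "\n".join(f"[skill] {s.name}: {s.description}"
                       for s in bank.retrieve(prompt))
    return f"{header}\n\n{prompt}" if header else prompt

bank = SkillBank()
bank.add(Skill("csv_cleanup", "Normalize headers before parsing CSV files.",
               {"csv", "parse", "headers"}))
```

A query that touches a stored skill's keywords gets that skill prepended; unrelated prompts pass through untouched, so injection stays transparent to the user.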
Service responses, reward modeling, and model training are fully decoupled. Even during complex optimization, the agent remains responsive. In `madmax` mode, the intelligent scheduler detects idle time windows (sleep hours, keyboard inactivity, calendar meetings) and performs model weight updates only during those periods.
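An idle-window check of the kind the `madmax` scheduler performs might look like the sketch below. The function name, the default sleep window, and the 15-minute inactivity threshold are all illustrative assumptions; calendar integration is omitted.

```python
from datetime import datetime, time

def in_idle_window(now, sleep_windows=((time(1, 0), time(6, 0)),),
                   seconds_since_input=None, inactivity_threshold=900):
    """Illustrative idle check: allow weight updates only inside a configured
    sleep window, or after sustained keyboard/mouse silence."""
    t = now.time()
    if any(start <= t <= end for start, end in sleep_windows):
        return True  # e.g. 3 a.m. falls in the 01:00-06:00 sleep window
    return (seconds_since_input is not None
            and seconds_since_input >= inactivity_threshold)
```

The scheduler would poll a check like this and defer any queued LoRA updates until it returns true, which is how training never competes with a live user session.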
MetaClaw supports online reinforcement learning from live conversations. Each dialogue is tagged and submitted as a training sample. A discriminative LLM (PRM) asynchronously scores responses, while Tinker or MinT backends run LoRA fine-tuning and hot-swap model weights. Optional Online Policy Distillation (OPD) transfers capabilities from a larger teacher model to a student model via KL penalty.
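The KL penalty at the heart of Online Policy Distillation can be written out concretely. The sketch below computes forward KL(teacher ∥ student) over a toy next-token distribution; the direction of the divergence and the example values are assumptions for illustration, since the source only says "via KL penalty".

```python
import math

def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student): the penalty term that pulls the student's
    token distribution toward the teacher's. Inputs assumed normalized."""
    return sum(p * math.log(p / q)
               for p, q in zip(teacher_probs, student_probs) if p > 0)

teacher = [0.7, 0.2, 0.1]   # toy next-token distribution from the teacher
student = [0.5, 0.3, 0.2]   # student's distribution over the same tokens
penalty = kl_divergence(teacher, student)
```

The penalty is zero exactly when the two distributions agree and grows as they diverge, so adding it to the student's training loss transfers the teacher's behavior without needing the teacher's weights at inference time.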
Unlike traditional reinforcement learning pipelines that demand expensive GPU clusters, MetaClaw operates without dedicated GPU infrastructure. This dramatically reduces deployment and operational costs, making self-evolving AI agents accessible to individual developers, small teams, and organizations of any scale.
MetaClaw operates as an OpenAI-compatible proxy server that transparently intercepts user-LLM interactions. This design means existing applications can integrate MetaClaw's self-evolution capabilities with minimal code changes, leveraging the standard API interface that most LLM-based applications already support.
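Because the proxy speaks the standard OpenAI chat-completions protocol, integrating usually means changing only the base URL. The sketch below builds such a request with the standard library; the port, model name, and API key are placeholders, not MetaClaw's actual defaults.

```python
import json
import urllib.request

# Hypothetical local MetaClaw proxy address; the real port comes from setup.
METACLAW_BASE = "http://localhost:8000/v1"

payload = {
    "model": "my-agent",  # placeholder model name
    "messages": [{"role": "user", "content": "Summarize today's standup."}],
}
req = urllib.request.Request(
    f"{METACLAW_BASE}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer YOUR_API_KEY"},
)
# urllib.request.urlopen(req) would route this through MetaClaw instead of
# the upstream provider; nothing else about the request changes.
```

Any client library that accepts a custom base URL (the official OpenAI SDKs do) can be repointed the same way, which is what keeps the required code change minimal.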
Modular design for flexible deployment across different resource profiles
MetaClaw is built on top of OpenClaw, a core agent framework, and operates through an OpenAI-compatible proxy server that intercepts and processes interaction data. The system employs a fully asynchronous architecture where service responses, reward modeling, and model training are completely decoupled. This ensures that scoring and optimization tasks run in parallel without impacting user experience.
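The decoupling of serving from scoring can be illustrated with a small `asyncio` sketch: the response is returned as soon as it exists, and the (reply, score) work is handed to a background worker via a queue. The function names and the dummy reward are illustrative, not MetaClaw's internals.

```python
import asyncio

async def handle_turn(user_msg, score_queue):
    """Answer immediately; reward modeling happens later, off the hot path."""
    reply = f"echo: {user_msg}"               # stand-in for the real LLM call
    await score_queue.put((user_msg, reply))  # hand the pair to the scorer
    return reply                              # user never waits on scoring

async def reward_worker(score_queue, scores):
    """Stand-in for the asynchronous PRM scorer."""
    while True:
        user_msg, reply = await score_queue.get()
        scores.append((user_msg, 1.0))        # dummy reward
        score_queue.task_done()

async def main():
    queue, scores = asyncio.Queue(), []
    worker = asyncio.create_task(reward_worker(queue, scores))
    reply = await handle_turn("hello", queue)  # returns before any scoring
    await queue.join()                          # drain only for this demo
    worker.cancel()
    return reply, scores

reply, scores = asyncio.run(main())
```

The same queue boundary is what lets training jobs pile up safely during busy periods and get consumed whenever the scorer (or, in `madmax` mode, the scheduler) decides to run.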
The reinforcement learning backend supports Tinker (a cloud-based LoRA training service) and MinT (a compatible open-source backend), giving users flexible options for model training infrastructure.
MetaClaw provides three operating modes to accommodate different use cases and resource constraints:
| Mode | Capabilities | Resource requirements |
|---|---|---|
| `skills_only` | Skill injection and automatic summarization only; no model training. | Lightweight; no GPU needed. |
| `rl` | Everything in `skills_only`, plus online reinforcement learning training. | Requires an RL backend (Tinker or MinT). |
| `madmax` (default) | Everything in `rl`, plus an intelligent scheduler that trains only during user idle time. | RL backend + optional calendar integration. |
Recursive skill-augmented reinforcement learning for evolving agents
At the heart of MetaClaw lies SkillRL (Recursive Skill-Augmented Reinforcement Learning), a novel framework that enables agents to evolve their capabilities through structured skill management. SkillRL was introduced in the research paper "SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning" by Xia et al. [3]
The key innovation of SkillRL is its ability to extract high-level, reusable behavioral patterns — called "skills" — from raw experience. These skills are stored in a hierarchical SkillBank. When the agent faces a new task, the system adaptively retrieves relevant skills and injects them into the prompt, significantly improving problem-solving capability and efficiency.
This approach mirrors how human professionals develop expertise: by accumulating patterns from experience, organizing them into transferable knowledge, and applying them to new situations. SkillRL provides the formal framework that makes this possible for AI agents, enabling them to move beyond static prompt engineering toward dynamic, experience-driven improvement.
The recursive nature of SkillRL means that skills themselves can generate higher-order skills, creating a compounding effect where the agent's capability grows increasingly sophisticated over time. Combined with MetaClaw's online learning architecture, this creates a virtuous cycle of real-world interaction, skill extraction, and capability enhancement.
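The recursive step can be made concrete with a toy composition sketch. Here skills are plain text-transform callables purely for illustration; in SkillRL, skills are prompt-level behavioral patterns, and `make_skill` is a hypothetical helper, not part of the framework.

```python
def make_skill(name, steps):
    """Compose existing skills (plain callables here) into a higher-order skill."""
    def skill(text):
        for step in steps:
            text = step(text)
        return text
    skill.__name__ = name
    return skill

def strip_noise(s):
    return " ".join(s.split())

def lowercase(s):
    return s.lower()

# A higher-order skill built from two base skills...
normalize = make_skill("normalize", [strip_noise, lowercase])

def dedupe_words(s):
    return " ".join(dict.fromkeys(s.split()))

# ...which can itself become a building block at the next level.
clean = make_skill("clean", [normalize, dedupe_words])
```

Because a composed skill has the same interface as a base skill, each new level of composition is available as raw material for the next, which is the compounding effect the recursion buys.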
How MetaClaw stands apart in the AI agent landscape
Unlike frameworks that rely on pre-built datasets or synthetic training environments, MetaClaw learns from real user conversations as they naturally occur. This "in-the-wild" approach means the agent continuously adapts to dynamic, real-world conditions and evolving user needs — a capability that static training pipelines cannot match.
By eliminating the need for GPU clusters, MetaClaw dramatically lowers the barrier to entry for self-evolving AI agents. Individual researchers, startups, and small teams can deploy continuously learning agents without the capital expenditure typically associated with reinforcement learning at scale.
The intelligent scheduler in `madmax` mode resolves the fundamental tension between online learning and user experience. By restricting model updates to idle periods, MetaClaw ensures that the agent's evolution never disrupts its primary function of serving users effectively.
The SkillRL framework provides a principled approach to knowledge accumulation and reuse. Rather than treating each interaction as isolated, MetaClaw systematically extracts, organizes, and leverages learned skills, creating compounding returns on every conversation.
Developed by leading researchers in adaptive AI systems
MetaClaw is developed and maintained by AIMING Lab at the University of North Carolina at Chapel Hill (UNC-Chapel Hill), led by Professor Huaxiu Yao.
Professor Huaxiu Yao serves as Assistant Professor in the Department of Computer Science and the School of Data Science and Society at UNC-Chapel Hill. Prior to joining UNC, he was a Postdoctoral Researcher at the Stanford AI Laboratory. His research focuses on building generalized and adaptive agent foundation models, spanning both theoretical foundations and practical applications.
The name "AIMING" reflects the lab's mission: achieving Adaptive Intelligence through Alignment, Interaction, and Learning. The lab is dedicated to advancing how AI agents learn and evolve in real-world environments, and MetaClaw represents a direct embodiment of this research agenda.
The team's deep expertise in reinforcement learning, large language models, and AI agent systems provides the academic rigor and technical depth that underpin MetaClaw's innovations. Their published research, including the SkillRL paper, establishes the theoretical foundations upon which the framework is built.
Deploy MetaClaw in minutes
Run `metaclaw setup` to start the interactive configuration wizard. Select your LLM provider, enter your API key, and choose whether to enable reinforcement learning training.
Run `metaclaw start` to boot the proxy. MetaClaw automatically configures OpenClaw and restarts the gateway. Your agent is now live and learning.
Use your agent as you normally would. MetaClaw transparently intercepts interactions, injects relevant skills, and — when RL is enabled — continuously trains your model in the background.