ClawWork: Turn AI Assistants into Cash‑Generating Coworkers
1. What Is ClawWork?
ClawWork is a free, open‑source framework that turns an AI assistant into an economically motivated coworker. Instead of simply answering questions, the agent completes real‑world professional tasks from the GDPVal dataset (220 tasks across 44 occupations) and earns money only by producing high‑quality deliverables. Its core ideas are:
- Token‑cost accounting – Every input and output token is priced, so the agent pays for its own API usage.
- Income‑driven behavior – The agent decides to work or learn to balance immediate cash flow and future ability.
- End‑to‑end benchmark – Payment is computed per deliverable by the system rather than capped at a flat rate, and quality is evaluated by an LLM rubric tied to BLS wage rates.
The result is a lightweight, deployable system that demonstrates how an AI could become a productive employee rather than a passive chatbot.
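The token‑cost idea above can be sketched as a simple ledger. This is a minimal illustration, not ClawWork's actual API: the class and its fields are hypothetical, and the per‑1M prices are made‑up sample values.

```python
# Minimal sketch of token-cost accounting (hypothetical names, not
# ClawWork's actual API). Prices are quoted per 1M tokens, as is
# common with LLM providers.

class TokenLedger:
    def __init__(self, balance: float, input_per_1m: float, output_per_1m: float):
        self.balance = balance              # agent's cash on hand, in dollars
        self.input_per_1m = input_per_1m    # $ per 1M input tokens
        self.output_per_1m = output_per_1m  # $ per 1M output tokens

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        """Deduct the cost of one API call and return that cost."""
        cost = (input_tokens * self.input_per_1m
                + output_tokens * self.output_per_1m) / 1_000_000
        self.balance -= cost
        return cost

ledger = TokenLedger(balance=10.0, input_per_1m=2.50, output_per_1m=10.00)
cost = ledger.charge(input_tokens=4_000, output_tokens=1_000)
print(round(cost, 4))            # 0.02
print(round(ledger.balance, 2))  # 9.98
```

Every reply the agent sends moves its balance, so verbosity has a direct price.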
2. Core Components
| Layer | Description |
|---|---|
| Task Engine | Loads GDPVal tasks, assigns them to agents, tracks completion and quality. |
| Economic Tracker | Maintains the agent’s balance, records token usage, and calculates net worth. |
| Evaluation Engine | Uses GPT‑5.2 (or any LLM) to score output against a sector‑specific rubric. |
| Dashboard | React app that visualizes balance, income, cost, and task status in real time. |
| Nanobot / OpenClaw Integration | Wraps any running Nanobot gateway with a ClawMode plugin that injects economic accounting into every message. |
3. Why ClawWork Matters
- Research‑ready – Researchers can evaluate how different LLMs handle professional work under economic pressure.
- Educational – Students study economics, AI policy, and software architecture by seeing the real money impact of a simple bot.
- Practical – Business teams can prototype cheap, autonomous workers to proof‑read reports, write briefs, or perform data analysis.
- Transparent – All transactions are recorded in token_costs.jsonl, so you can audit token usage and ROI.
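Because the ledger is plain JSONL, an audit is a few lines of Python. The field names below are assumptions for illustration; inspect your own token_costs.jsonl for the actual schema.

```python
import io
import json

# Illustrative audit of a token_costs.jsonl-style ledger. The field
# names ("input_tokens", "output_tokens", "cost") are assumptions --
# check the real file for its actual schema. A StringIO stands in
# for open("token_costs.jsonl").
sample = io.StringIO(
    '{"input_tokens": 1200, "output_tokens": 300, "cost": 0.006}\n'
    '{"input_tokens": 800, "output_tokens": 500, "cost": 0.007}\n'
)

total_cost = 0.0
total_tokens = 0
for line in sample:
    record = json.loads(line)  # one JSON object per line
    total_cost += record["cost"]
    total_tokens += record["input_tokens"] + record["output_tokens"]

print(f"{total_tokens} tokens, ${total_cost:.3f} total")
```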
4. Quick Start Guide
Below is a concise walkthrough so you can spin up a local ClawWork instance in under 10 minutes.
4.1 Clone & Prepare the Environment
# 1️⃣ Clone the repo
git clone https://github.com/HKUDS/ClawWork.git
cd ClawWork
# 2️⃣ Create a Python 3.10 virtual environment (conda recommended)
conda create -n clawwork python=3.10
conda activate clawwork
# OR use venv
python3.10 -m venv venv
source venv/bin/activate
# 3️⃣ Install core dependencies
pip install -r requirements.txt
# 4️⃣ Install the frontend
cd frontend && npm install && cd ..
4.2 Configure API Keys
Copy the example and fill in your credentials:
cp .env.example .env
# Edit .env with your keys:
# OPENAI_API_KEY=sk-...
# E2B_API_KEY=e2b_...
# Optional: WEB_SEARCH_API_KEY
4.3 Start the Dashboard
./start_dashboard.sh
# Backend (FastAPI) + React (port 3000) are launched.
4.4 Run a Test Agent
./run_test_agent.sh
# The console will log each iteration and show earnings.
4.5 Integrate with Nanobot (Optional)
If you already run a Nanobot instance, enable the ClawMode plugin by following the guide in clawmode_integration/README.md. The endpoint /clawwork will now bill each reply by token price and can trigger real tasks.
5. Understanding Earnings and Costs
- Token‑Price Model – input_per_1m and output_per_1m are defined in livebench/configs/.... OpenRouter’s real pricing can be used by default.
- Task Payment Calculation – Payment = quality_score × (estimated_hours × BLS_hourly_wage). This means a high‑scoring 10‑hour project could pay $2,500+.
Learning vs Working – The agent can choose to “learn” a new skill, storing at least 200 characters in memory to improve future task performance. The choice mimics a real career trade‑off between hourly wage and skill development.
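The payment rule can be checked with quick arithmetic. The wage and score values below are made up for illustration; real runs use BLS wage rates per occupation.

```python
def task_payment(quality_score: float, estimated_hours: float, hourly_wage: float) -> float:
    """Payment = quality_score × (estimated_hours × BLS_hourly_wage)."""
    return quality_score * (estimated_hours * hourly_wage)

# A high-scoring 10-hour task at an illustrative $30/hr wage:
payment = task_payment(quality_score=0.9, estimated_hours=10, hourly_wage=30.0)
print(payment)  # 270.0
```

Higher‑wage occupations scale the payout linearly, which is how a long, well‑scored project in a well‑paid sector can reach the $2,500+ range.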
6. Real‑World Performance Snapshot
Top agents using GPT‑4o or Qwen3‑Max have earned $1,500+/hr equivalent in the benchmark, outpacing typical human white‑collar productivity. The dashboard visualizes:
- Survival days (how long the bot stays solvent)
- Final balance
- Total work income vs. token cost
- Qualitative scoring across sectors
These metrics help you evaluate an LLM’s economic viability rather than just token usage or perplexity.
7. Extending ClawWork
- New Task Sources – Implement a loader in livebench/work/task_manager.py.
- Custom Tools – Add a new @tool in livebench/tools/direct_tools.py.
- Additional Evaluation Rubrics – Drop a JSON file in eval/meta_prompts/.
- Other LLM Providers – Plug in LangChain or LiteLLM backends.
The modular design means you can adapt ClawWork to new datasets or business rules with minimal code changes.
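As a rough sketch of the custom‑tool extension point, a decorator‑based registry might look like this. The tool decorator below is a stand‑in written for illustration; see livebench/tools/direct_tools.py for ClawWork's actual @tool interface.

```python
# Illustrative decorator-based tool registry. ClawWork's real @tool
# in livebench/tools/direct_tools.py may differ in name and shape.
TOOL_REGISTRY = {}

def tool(func):
    """Register a callable so an agent can invoke it by name."""
    TOOL_REGISTRY[func.__name__] = func
    return func

@tool
def word_count(text: str) -> int:
    """Count whitespace-separated words in a deliverable."""
    return len(text.split())

# The agent (or a dispatcher) looks tools up by name:
result = TOOL_REGISTRY["word_count"]("ClawWork turns assistants into coworkers")
print(result)  # 5
```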
8. Final Thoughts
ClawWork bridges the gap between AI assistants and real‑world productivity. By imposing token‑cost accounting and a realistic earnings model, it forces an LLM to balance quality, speed, and cost—just like a human worker. For developers, researchers, and businesses, ClawWork offers a sandbox to test autonomous AI agents under economic pressure while providing an engaging demo that can impress investors or stakeholders.
Start experimenting today and turn your AI into a cash‑generating coworker—you’ll see how fast a model can turn a $10 balance into a multi‑thousand dollar revenue stream.