ClawWork: Turn AI Assistants into Cash‑Generating Coworkers
1. What Is ClawWork?
ClawWork is a free, open‑source framework that turns an AI assistant into an economically motivated coworker. Instead of simply answering questions, the agent completes real‑world professional tasks from the GDPVal dataset (220 tasks across 44 occupations) and earns money only by producing high‑quality deliverables. Its core ideas are:
- Token‑cost accounting – Every input and output token is priced, so the agent pays for its own API usage.
- Income‑driven behavior – The agent decides to work or learn to balance immediate cash flow and future ability.
- End‑to‑end benchmark – Payment is computed per deliverable by the system rather than capped at a flat rate, and quality is evaluated by an LLM rubric tied to BLS wage rates.
The result is a lightweight, deployable system that demonstrates how an AI could become a productive employee rather than a passive chatbot.
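The token‑cost idea above can be sketched as a simple ledger. This is a minimal illustration, not ClawWork's actual API: the class and its fields are hypothetical, and the per‑1M prices are made‑up sample values.

```python
# Minimal sketch of token-cost accounting (hypothetical names, not
# ClawWork's actual API). Prices are quoted per 1M tokens, as is
# common with LLM providers.

class TokenLedger:
    def __init__(self, balance: float, input_per_1m: float, output_per_1m: float):
        self.balance = balance              # agent's cash on hand, in dollars
        self.input_per_1m = input_per_1m    # $ per 1M input tokens
        self.output_per_1m = output_per_1m  # $ per 1M output tokens

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        """Deduct the cost of one API call and return that cost."""
        cost = (input_tokens * self.input_per_1m
                + output_tokens * self.output_per_1m) / 1_000_000
        self.balance -= cost
        return cost

ledger = TokenLedger(balance=10.0, input_per_1m=2.50, output_per_1m=10.00)
cost = ledger.charge(input_tokens=4_000, output_tokens=1_000)
print(round(cost, 4))            # 0.02
print(round(ledger.balance, 2))  # 9.98
```

Every reply the agent sends moves its balance, so verbosity has a direct price.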
2. Core Components
| Layer | Description |
|---|---|
| Task Engine | Loads GDPVal tasks, assigns them to agents, tracks completion and quality. |
| Economic Tracker | Maintains the agent’s balance, records token usage, and calculates net worth. |
| Evaluation Engine | Uses GPT‑5.2 (or any LLM) to score output against a sector‑specific rubric. |
| Dashboard | React app that visualizes balance, income, cost, and task status in real time. |
| Nanobot / OpenClaw Integration | Wraps any running Nanobot gateway with a ClawMode plugin that injects economic accounting into every message. |
3. Why ClawWork Matters
- Research‑ready – Researchers can evaluate how different LLMs handle professional work under economic pressure.
- Educational – Students study economics, AI policy, and software architecture by seeing the real money impact of a simple bot.
- Practical – Business teams can prototype cheap, autonomous workers to proof‑read reports, write briefs, or perform data analysis.
- Transparent – All transactions are recorded in token_costs.jsonl, so you can audit token usage and ROI.
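Because the ledger is plain JSONL, an audit is a few lines of Python. The field names below are assumptions for illustration; inspect your own token_costs.jsonl for the actual schema.

```python
import io
import json

# Illustrative audit of a token_costs.jsonl-style ledger. The field
# names ("input_tokens", "output_tokens", "cost") are assumptions --
# check the real file for its actual schema. A StringIO stands in
# for open("token_costs.jsonl").
sample = io.StringIO(
    '{"input_tokens": 1200, "output_tokens": 300, "cost": 0.006}\n'
    '{"input_tokens": 800, "output_tokens": 500, "cost": 0.007}\n'
)

total_cost = 0.0
total_tokens = 0
for line in sample:
    record = json.loads(line)  # one JSON object per line
    total_cost += record["cost"]
    total_tokens += record["input_tokens"] + record["output_tokens"]

print(f"{total_tokens} tokens, ${total_cost:.3f} total")
```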
4. Quick Start Guide
Below is a concise walkthrough so you can spin up a local ClawWork instance in under 10 minutes.
4.1 Clone & Prepare the Environment
# 1️⃣ Clone the repo
git clone https://github.com/HKUDS/ClawWork.git
cd ClawWork
# 2️⃣ Create a Python 3.10 virtual environment (conda recommended)
conda create -n clawwork python=3.10
conda activate clawwork
# OR use venv
python3.10 -m venv venv
source venv/bin/activate
# 3️⃣ Install core dependencies
pip install -r requirements.txt
# 4️⃣ Install the frontend
cd frontend && npm install && cd ..
4.2 Configure API Keys
Copy the example and fill in your credentials:
cp .env.example .env
# Edit .env with your keys:
# OPENAI_API_KEY=sk-...
# E2B_API_KEY=e2b_...
# Optional: WEB_SEARCH_API_KEY
4.3 Start the Dashboard
./start_dashboard.sh
# Backend (FastAPI) + React (port 3000) are launched.
4.4 Run a Test Agent
./run_test_agent.sh
# The console will log each iteration and show earnings.
4.5 Integrate with Nanobot (Optional)
If you already run a Nanobot instance, enable the ClawMode plugin by following the guide in clawmode_integration/README.md. The endpoint /clawwork will now bill each reply by token price and can trigger real tasks.
5. Understanding Earnings and Costs
- Token‑Price Model – input_per_1m and output_per_1m are defined in livebench/configs/.... OpenRouter’s real pricing can be used by default.
- Task Payment Calculation – Payment = quality_score × (estimated_hours × BLS_hourly_wage). This means a high‑scoring 10‑hour project could pay $2,500+.
Learning vs Working – The agent can choose to “learn” a new skill, storing at least 200 characters in memory to improve future task performance. The choice mimics a real career trade‑off between hourly wage and skill development.
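The payment rule can be checked with quick arithmetic. The wage and score values below are made up for illustration; real runs use BLS wage rates per occupation.

```python
def task_payment(quality_score: float, estimated_hours: float, hourly_wage: float) -> float:
    """Payment = quality_score × (estimated_hours × BLS_hourly_wage)."""
    return quality_score * (estimated_hours * hourly_wage)

# A high-scoring 10-hour task at an illustrative $30/hr wage:
payment = task_payment(quality_score=0.9, estimated_hours=10, hourly_wage=30.0)
print(payment)  # 270.0
```

Higher‑wage occupations scale the payout linearly, which is how a long, well‑scored project in a well‑paid sector can reach the $2,500+ range.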
6. Real‑World Performance Snapshot
Top agents using GPT‑4o or Qwen3‑Max have earned $1,500+/hr equivalent in the benchmark, outpacing typical human white‑collar productivity. The dashboard visualizes:
- Survival days (how long the bot stays solvent)
- Final balance
- Total work income vs. token cost
- Qualitative scoring across sectors
These metrics help you evaluate an LLM’s economic viability rather than just token usage or perplexity.
7. Extending ClawWork
- New Task Sources – Implement a loader in livebench/work/task_manager.py.
- Custom Tools – Add a new @tool in livebench/tools/direct_tools.py.
- Additional Evaluation Rubrics – Drop a JSON file in eval/meta_prompts/.
- Other LLM Providers – Plug in LangChain or LiteLLM backends.
The modular design means you can adapt ClawWork to new datasets or business rules with minimal code changes.
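As a rough sketch of the custom‑tool extension point, a decorator‑based registry might look like this. The tool decorator below is a stand‑in written for illustration; see livebench/tools/direct_tools.py for ClawWork's actual @tool interface.

```python
# Illustrative decorator-based tool registry. ClawWork's real @tool
# in livebench/tools/direct_tools.py may differ in name and shape.
TOOL_REGISTRY = {}

def tool(func):
    """Register a callable so an agent can invoke it by name."""
    TOOL_REGISTRY[func.__name__] = func
    return func

@tool
def word_count(text: str) -> int:
    """Count whitespace-separated words in a deliverable."""
    return len(text.split())

# The agent (or a dispatcher) looks tools up by name:
result = TOOL_REGISTRY["word_count"]("ClawWork turns assistants into coworkers")
print(result)  # 5
```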
8. Final Thoughts
ClawWork bridges the gap between AI assistants and real‑world productivity. By imposing token‑cost accounting and a realistic earnings model, it forces an LLM to balance quality, speed, and cost—just like a human worker. For developers, researchers, and businesses, ClawWork offers a sandbox to test autonomous AI agents under economic pressure while providing an engaging demo that can impress investors or stakeholders.
Start experimenting today and turn your AI into a cash‑generating coworker—you’ll see how fast a model can turn a $10 balance into a multi‑thousand dollar revenue stream.