agent-browser: Fast Rust CLI for AI Browser Automation
Why a New Tool for Browser Automation?
AI agents increasingly need to interact with real-world web pages: testing UI flows, scraping dynamic content, or crawling e-commerce sites. Existing tools like Selenium and Playwright are excellent libraries, but they come with a learning curve, setup overhead, and often a heavy runtime. For an AI system (Claude, GPT-4, Gemini, etc.) to control a browser, it helps to expose that capability through a lightweight command-line interface that can be called from any language.
Enter agent-browser: a single command that ships as a fast Rust binary for macOS, Linux, and Windows, with automatic fallback to Node.js when the native binary is unavailable. The CLI talks to a Node.js daemon that owns a Playwright instance, giving you the full power of Playwright (Chromium, Firefox, WebKit) while keeping the CLI fast and deterministic.
Quick Start
# Install globally via npm
npm install -g agent-browser
# Grab the bundled Chromium
agent-browser install
# Open a site
agent-browser open https://example.com
# Take a reference‑based snapshot
agent-browser snapshot --json
# Click a button by ref
agent-browser click @e2
# Take a screenshot
agent-browser screenshot page.png
# Close it
agent-browser close
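The same sequence can also be driven from a script. The following is a minimal Node.js sketch, not part of the tool itself: `makeBrowser` and `runCli` are hypothetical names, and the executor is injectable so the flow can be dry-run without the binary installed. The default executor assumes `agent-browser` is on the PATH.

```javascript
const { execFileSync } = require('node:child_process');

// Default executor: runs the real CLI and returns its stdout as a string.
// Assumes `agent-browser` is installed and on the PATH.
function runCli(args) {
  return execFileSync('agent-browser', args, { encoding: 'utf8' });
}

// Hypothetical helper: binds an executor and exposes the Quick Start steps.
function makeBrowser(exec = runCli) {
  return {
    open: (url) => exec(['open', url]),
    snapshot: () => JSON.parse(exec(['snapshot', '--json'])),
    click: (ref) => exec(['click', ref]),
    screenshot: (path) => exec(['screenshot', path]),
    close: () => exec(['close']),
  };
}

// Dry run: record the argument vectors instead of launching a browser.
const calls = [];
const dryRun = makeBrowser((args) => {
  calls.push(args.join(' '));
  return '{}'; // stand-in for CLI output, not a real response
});
dryRun.open('https://example.com');
dryRun.snapshot();
dryRun.click('@e2');
dryRun.screenshot('page.png');
dryRun.close();
console.log(calls);
```

Calling `makeBrowser()` with no argument would use the real CLI. On Windows, npm-installed commands are `.cmd` shims, which `execFileSync` may not launch without the `shell: true` option.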
Installation Options
- Rust Native – `pnpm build:native` requires Rust 1.75+. Gives the best startup time.
- Node.js Fallback – Works everywhere but slightly slower.
- Linux Dependencies – `agent-browser install --with-deps` will install Playwright’s system-level dependencies automatically.
Features & Commands
| Category | Command | Description |
|---|---|---|
| Core | `open <url>` | Navigate to a URL (aliases: goto, navigate). |
| | `click <sel>` | Click an element. |
| | `fill <sel> <text>` | Clear and type into an input. |
| | `screenshot [path]` | Full-page or viewport screenshot. |
| | `snapshot` | Output an accessibility tree with deterministic refs (@e1, @e2). |
| | `wait <selector>` | Wait for visibility or a custom condition. |
| | `eval <js>` | Run arbitrary JavaScript. |
| | `close` | Terminate the browser instance. |
| Semantic Locators | `find role button click --name "Submit"` | Click by ARIA role and name. |
| | `find text "Sign In" click` | Click by displayed text. |
| Session Isolation | `agent-browser --session chat1 open https://foo.com` | Create isolated sessions for parallel agents. |
| Headless / Headed | `--headed` | Show the GUI for debugging. |
| Streaming | `AGENT_BROWSER_STREAM_PORT=9223 agent-browser open https://example.com` | Opens a WebSocket that streams viewport frames (useful for “pair browsing”). |
| Auth & Headers | `--headers '{"Authorization": "Bearer <token>"}'` | Set per-origin HTTP headers for API access, skipping login UIs. |
| Debug & Logging | `trace start`, `trace stop` | Record CDP traces for debugging. |
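When these options are set from a program rather than typed in a shell, quoting the JSON for `--headers` is easy to get wrong. The sketch below shows one way to assemble the argument list in Node.js. `buildArgs` is a hypothetical helper, only flags from the table are used, and placing `--headed` and `--headers` after the command is an assumption.

```javascript
// Hypothetical helper: turns an options object into CLI arguments.
function buildArgs(command, { session, headed, headers } = {}) {
  const args = [];
  // --session precedes the command, as in the table's example.
  if (session) args.push('--session', session);
  args.push(...command);
  // Assumption: these flags are accepted after the command.
  if (headed) args.push('--headed');
  // Headers travel as one JSON string argument.
  if (headers) args.push('--headers', JSON.stringify(headers));
  return args;
}

const args = buildArgs(['open', 'https://foo.com'], {
  session: 'chat1',
  headers: { Authorization: 'Bearer <token>' },
});
console.log(args);
```

Passing the result as an argument array (for example to `execFileSync('agent-browser', args)`) avoids shell quoting entirely, because no shell parses the JSON string.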
Architecture Snapshot
- Rust CLI – Parses commands, encodes them in a JSON DSL, and sends them to the daemon.
- Node.js Daemon – Launches and manages the Playwright browser; keeps the process alive between invocations.
- Client‑Daemon Design – One daemon per system, so repeated commands don’t pay the cost of launching a new Chromium instance.
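To make the CLI-to-daemon hand-off concrete, here is a rough sketch of how a parsed command could be encoded as a JSON message and decoded on the other side. The wire format is purely illustrative: the field names (`id`, `action`, `args`) and the newline-delimited framing are assumptions, not the tool's actual protocol.

```javascript
// Hypothetical wire format: one JSON object per line.
// Field names are illustrative, not agent-browser's real protocol.
function encodeCommand(id, argv) {
  const [action, ...args] = argv;
  return JSON.stringify({ id, action, args }) + '\n';
}

// Splits a buffer of newline-delimited JSON into parsed messages.
function decodeCommands(buffer) {
  return buffer
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Two CLI invocations, as a long-lived daemon might receive them.
const wire =
  encodeCommand(1, ['open', 'https://example.com']) +
  encodeCommand(2, ['click', '@e2']);
const messages = decodeCommands(wire);
console.log(messages);
```

The point of the design is that the client stays small and short-lived: it only serializes a request, while the expensive state (the browser) lives in the long-running process.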
How AI Agents Use It
- Open a page.
- Snapshot with `--json`. The result contains a structured tree (role, name, ref).
- The AI reasoner parses the tree to find the target ref (e.g., `@e3` for an email box).
- Act using CLI commands (`click @e3`, `fill @e4 "[email protected]"`).
- Re-snapshot if the page changed, then repeat.
The CLI’s JSON output is tiny (~2‑3 KB) and ideal for LLMs with limited prompt tokens.
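The "find the target ref" step can be sketched as a small tree search. This is an illustration under stated assumptions: each node is taken to look like `{ role, name, ref, children }`, `findRef` is a hypothetical helper, and the sample tree is hand-written rather than real CLI output.

```javascript
// Depth-first search for the first node matching a role and, optionally,
// a case-insensitive substring of its name. Returns the ref or null.
function findRef(node, role, name) {
  if (!node) return null;
  const nameMatches = name === undefined ||
    (node.name || '').toLowerCase().includes(name.toLowerCase());
  if (node.role === role && nameMatches && node.ref) return node.ref;
  for (const child of node.children || []) {
    const hit = findRef(child, role, name);
    if (hit) return hit;
  }
  return null;
}

// Hand-written sample in the assumed shape; not real CLI output.
const sampleTree = {
  role: 'document', name: 'Login', ref: '@e1',
  children: [
    { role: 'heading', name: 'Sign in', ref: '@e2' },
    { role: 'form', name: '', ref: '@e3', children: [
      { role: 'textbox', name: 'Email', ref: '@e4' },
      { role: 'button', name: 'Submit', ref: '@e5' },
    ] },
  ],
};

const emailRef = findRef(sampleTree, 'textbox', 'email');
const submitRef = findRef(sampleTree, 'button', 'submit');
console.log(emailRef, submitRef); // → @e4 @e5
```

An agent would then issue `fill` and `click` with the returned refs, and take a fresh snapshot before reusing any ref, since the page may have changed.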
Use Cases
| Scenario | How agent-browser Helps |
|---|---|
| CI/CD Testing | Automate UI tests in CI jobs with zero‑configuration scripts. |
| Serverless Functions | Deploy a minimal Lambda that runs a CLI‑based browser for scraping or integration tasks. |
| Agent‑In‑The‑Loop | Pair an AI with a human by streaming the viewport over WebSocket. |
| Headless Analytics | Collect form data or product prices without GUI overhead. |
| API‑Bypass Flow | Pass auth headers directly to bypass login pages and hit protected APIs. |
Example: Building a Mini‑Scraper in 10 Lines
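#!/usr/bin/env node
const { execSync } = require('child_process');
// Download the bundled Chromium.
execSync('agent-browser install');
execSync('agent-browser open https://books.toscrape.com');
// Take a JSON snapshot of the page; execSync returns a Buffer.
const data = execSync('agent-browser snapshot -i --json').toString();
// Assumes the payload nests the tree under data.snapshot with a children array.
const snapshot = JSON.parse(data).data.snapshot;
// Counts only links that are direct children of the root node.
console.log('Page has books:', snapshot.children.filter(c=>c.role==='link').length);
execSync('agent-browser close');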
Run it with node scrape.js and you’ll get a quick count of book links, all done via the CLI.
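The script only inspects the root node's direct children, so links nested deeper in the tree go uncounted. A recursive count is one possible refinement. The sketch below assumes the same `{ role, children }` node shape as the script, uses a hypothetical `countByRole` helper, and runs on hand-written sample data rather than the live site.

```javascript
// Counts every node with the given role, at any depth of the tree.
function countByRole(node, role) {
  if (!node) return 0;
  const self = node.role === role ? 1 : 0;
  return (node.children || []).reduce(
    (sum, child) => sum + countByRole(child, role),
    self
  );
}

// Hand-written sample: one top-level link, two nested inside a list.
const sample = {
  role: 'document',
  children: [
    { role: 'link', name: 'Home' },
    { role: 'list', children: [
      { role: 'listitem', children: [{ role: 'link', name: 'Book A' }] },
      { role: 'listitem', children: [{ role: 'link', name: 'Book B' }] },
    ] },
  ],
};

const topLevel = sample.children.filter((c) => c.role === 'link').length;
const total = countByRole(sample, 'link');
console.log(topLevel, total); // → 1 3
```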
Performance & Reliability
- Native Rust binary boots in < 50 ms on macOS – compared to Node‑only counterparts that take 200–300 ms.
- The daemon remains resident after the first run, so subsequent commands are virtually instantaneous.
- Works on all major platforms; the binary is distributed via GitHub releases.
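Startup figures like these vary by machine, so it is worth measuring locally. The sketch below times a command over several runs and reports the median. `medianStartupMs` is a hypothetical helper, and the run shown measures an empty Node.js process as a baseline rather than the CLI itself.

```javascript
const { spawnSync } = require('node:child_process');

// Runs a command `runs` times and returns the median wall-clock time in ms.
function medianStartupMs(command, args, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    const result = spawnSync(command, args, { stdio: 'ignore' });
    if (result.error) throw result.error; // e.g. command not found
    samples.push(Number(process.hrtime.bigint() - start) / 1e6);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Baseline: how long an empty Node.js process takes to start and exit.
const nodeMs = medianStartupMs(process.execPath, ['-e', '']);
console.log(`node -e "": ${nodeMs.toFixed(1)} ms`);

// To time the CLI itself (assumes it is installed and on the PATH):
// console.log(medianStartupMs('agent-browser', ['--help']));
```

The median is used instead of the mean so that a single slow cold-cache run does not skew the result.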
Getting Involved
- Contributions welcome on GitHub: create a fork, submit a PR, or open issues for bugs.
- Want more semantics? Add new `find` actions or support more browser engines.
- Documentation is living – check the `docs/` folder and the CLI `--help` output for the latest commands.
Final Thoughts
agent-browser puts web automation within reach of any AI system or developer through a single command. Its blend of Rust speed, Playwright versatility, and AI-friendly reference snapshots makes it a compelling tool for testing, scraping, and building agent-centric workflows. Whether you’re automating a CI pipeline or pairing a large language model with live browsing, agent-browser gives you a lightweight, deterministic, and easy-to-program interface that feels at home in a shell or a serverless function.
Happy automating!