agent-browser: Fast Rust CLI for AI Browser Automation

agent-browser: Fast Rust CLI for AI Browser Automation

Why a New Tool for Browser Automation?

AI agents are increasingly required to interact with real‑world web pages: testing UI flows, scraping dynamic content, or crawling e‑commerce sites. Existing tools like Selenium or Playwright are fantastic libraries, but they demand a learning curve, setup overhead, and often a heavy runtime. If an AI system (Claude, GPT‑4, Gemini, etc.) were to control a browser, it would be convenient to expose that capability via a lightweight, command‑line interface that can be called from any language.

Enter agent-browser – a single executable that ships a fast Rust binary on macOS, Linux, and Windows, with automatic fall‑back to Node.js when the native binary is unavailable. It talks to a Node.js daemon that owns a Playwright instance, giving you all the power of Playwright (Chromium, Firefox, WebKit) while keeping the CLI fast and deterministic.

Quick Start

# Install globally via npm
npm install -g agent-browser
# Grab the bundled Chromium
agent-browser install
# Open a site
agent-browser open https://example.com
# Take a reference‑based snapshot
agent-browser snapshot --json
# Click a button by ref
agent-browser click @e2
# Take a screenshot
agent-browser screenshot page.png
# Close it
agent-browser close

Installation Options

  • Rust Nativepnpm build:native requires Rust 1.75+. Gives the best startup time.
  • Node.js Fallback – Works everywhere but slightly slower.
  • Linux Dependenciesagent-browser install --with-deps will install Playwright’s system‑level dependencies automatically.

Features & Commands

Category Command Description
Core open <url> Navigate to a URL (aliases: goto, navigate).
click <sel> Click an element.
fill <sel> <text> Clear and type into an input.
screenshot [path] Full‑page or viewport screenshot.
snapshot Output an accessibility tree with deterministic refs (@e1, @e2).
wait <selector> Wait for visibility or a custom condition.
eval <js> Run arbitrary JavaScript.
close Terminate the browser instance.
Semantic Locators find role button click --name "Submit" Click by ARIA role and name.
find text "Sign In" click Click by displayed text.
Session Isolation agent-browser --session chat1 open https://foo.com Create isolated sessions for parallel agents.
Headless / Headed --headed Show the GUI for debugging.
Streaming AGENT_BROWSER_STREAM_PORT=9223 agent-browser open https://example.com Opens a WebSocket that streams viewport frames (useful for “pair browsing”).
Auth & Headers --headers '{"Authorization": "Bearer <token>"}' Set per‑origin HTTP headers for API access, skipping login UIs.
Debug & Logging trace start, trace stop Record CDP traces for debugging.

Architecture Snapshot

  • Rust CLI – Parses commands, builds a JSON DSL, and sends them to the daemon.
  • Node.js Daemon – Launches and manages the Playwright browser; keeps the process alive between invocations.
  • Client‑Daemon Design – One daemon per system, so repeated commands don’t pay the cost of launching a new Chromium instance.

How AI Agents Use It

  1. Open a page.
  2. Snapshot with --json. The result contains a structured tree (role, name, ref).
  3. The AI reasoner parses the tree to find the target ref (e.g., @e3 for an email box).
  4. Act using CLI commands (click @e3, fill @e4 "[email protected]").
  5. Re‑snapshot if the page changed, then repeat.

The CLI’s JSON output is tiny (~2‑3 KB) and ideal for LLMs with limited prompt tokens.

Use Cases

Scenario How agent-browser Helps
CI/CD Testing Automate UI tests in CI jobs with zero‑configuration scripts.
Serverless Functions Deploy a minimal Lambda that runs a CLI‑based browser for scraping or integration tasks.
Agent‑In‑The‑Loop Pair an AI with a human by streaming the viewport over WebSocket.
Headless Analytics Collect form data or product prices without GUI overhead.
API‑Bypass Flow Pass auth headers directly to bypass login pages and hit protected APIs.

Example: Building a Mini‑Scraper in 10 Lines

#!/usr/bin/env node
const { execSync } = require('child_process');

execSync('agent-browser install');
execSync('agent-browser open https://books.toscrape.com');
const data = execSync('agent-browser snapshot -i --json').toString();
const snapshot = JSON.parse(data).data.snapshot;
console.log('Page has books:', snapshot.children.filter(c=>c.role==='link').length);
execSync('agent-browser close');

Run it with node scrape.js and you’ll get a quick count of book links, all done via the CLI.

Performance & Reliability

  • Native Rust binary boots in < 50 ms on macOS – compared to Node‑only counterparts that take 200–300 ms.
  • The daemon remains resident after the first run, so subsequent commands are virtually instantaneous.
  • Works on all major platforms; the binary is distributed via GitHub releases.

Getting Involved

  • Contributions welcome on GitHub: create a fork, submit a PR, or open issues for bugs.
  • Want more semantics? Add new find actions or support more browser engines.
  • Documentation is living – check the docs/ folder and the CLI --help output for the latest commands.

Final Thoughts

agent‑browser brings web automation to the fingertips of any AI system or developer with a single command. Its blend of Rust speed, Playwright versatility, and AI‑friendly reference snapshots makes it a compelling tool for testing, scraping, and building agent‑centric workflows. Whether you’re automating a CI pipeline or pairing a large‑language‑model with live browsing, agent‑browser gives you a lightweight, deterministic, and easy‑to‑program interface that feels at home in a shell or a serverless function.

Happy automating!

Original Article: View Original

Share this article