Introduction
AI agents are increasingly used for tasks that require web interaction, from research and data collection to form filling. agent-browser is a fast CLI tool designed for this purpose. It gives AI agents a clean interface to websites, built on an accessibility tree with deterministic element references.
Installation
npm install -g agent-browser
agent-browser install # Install browser binaries
Core Concepts
The key innovation is the snapshot with refs: instead of relying on complex CSS selectors or XPath, agent-browser assigns a unique reference (e1, e2, etc.) to each interactive element, and commands address that element as @e1, @e2, and so on:
$ agent-browser open "https://news.ycombinator.com"
✓ Hacker News
https://news.ycombinator.com/
$ agent-browser snapshot -i -c
- link "Hacker News" [ref=e2]
- link "new" [ref=e3]
- link "past" [ref=e4]
- link "comments" [ref=e5]
- link "The URL shortener that makes your links look suspicious" [ref=e12]
- link "35 comments" [ref=e17]
Now clicking is trivial: agent-browser click @e12
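If a script needs to pick a ref rather than read it by eye, the compact snapshot lines can be parsed with a small regular expression. The following is a minimal sketch, assuming the line format matches the sample output above (it is inferred from that output, not from a published spec); parse_snapshot and find_ref are hypothetical helper names.

```python
import re

# Matches lines such as: - link "Hacker News" [ref=e2]
# Assumption: the format is inferred from the sample snapshot output only.
REF_LINE = re.compile(
    r'^\s*-\s+(?P<role>\w+)\s+"(?P<name>.*)"\s+\[ref=(?P<ref>e\d+)\]'
)


def parse_snapshot(text):
    """Return a list of {role, name, ref} dicts for lines that carry a ref."""
    elements = []
    for line in text.splitlines():
        match = REF_LINE.match(line)
        if match:
            elements.append(match.groupdict())
    return elements


def find_ref(elements, role, name_contains):
    """Return '@eN' for the first element of this role whose name contains the text."""
    needle = name_contains.lower()
    for element in elements:
        if element["role"] == role and needle in element["name"].lower():
            return "@" + element["ref"]
    return None


SAMPLE = '''\
- link "Hacker News" [ref=e2]
- link "new" [ref=e3]
- link "The URL shortener that makes your links look suspicious" [ref=e12]
- link "35 comments" [ref=e17]
'''

elements = parse_snapshot(SAMPLE)
print(find_ref(elements, "link", "URL shortener"))  # prints @e12
```

The returned string can be passed straight to a command such as click. Matching on role and visible name keeps the script readable, but the refs themselves should be taken from a fresh snapshot of the page being acted on.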
Real-World Examples
Example 1: Browsing Wikipedia
Let’s explore the IIT Gandhinagar Wikipedia page:
$ agent-browser open "https://en.wikipedia.org/wiki/IIT_Gandhinagar"
✓ IIT Gandhinagar - Wikipedia
https://en.wikipedia.org/wiki/IIT_Gandhinagar
$ agent-browser snapshot -i -c | head -30
- link "Jump to content" [ref=e1]
- button "Main menu" [ref=e2]
- searchbox "Search Wikipedia" [ref=e4]
- link "(Top)" [ref=e10]
- link "History" [ref=e11]
- link "Foundation" [ref=e13]
- link "Academics" [ref=e14]
- link "Departments" [ref=e16]
- link "Centres" [ref=e17]
- link "Rankings" [ref=e22]
- link "Campus" [ref=e23]
You can take a full-page screenshot:
$ agent-browser screenshot wiki.png --full
✓ Screenshot saved to wiki.png
Example 2: Searching arXiv for Research Papers
Search for papers on energy disaggregation (a research area in my lab):
$ agent-browser open "https://arxiv.org/search/?query=energy+disaggregation&searchtype=all"
✓ Search | arXiv e-print repository
https://arxiv.org/search/?query=energy+disaggregation&searchtype=all
$ agent-browser snapshot -c -d 3 | head -40
- textbox "Search term or terms" [ref=e6]
- combobox "Field to search" [ref=e9]
- button "Search" [ref=e25]
- main:
- 'heading "Showing 1–50 of 208 results for all: energy disaggregation" [level=1]'
- textbox "Search term or terms" [ref=e29]: energy disaggregation
- paragraph: Revisiting Disaggregated Large Language Model Serving...
- paragraph: Submitted 14 November, 2025
- paragraph: Real Time NILM Based Power Monitoring of Identical Motors...
- paragraph: Submitted 4 January, 2026
Example 3: Extracting Text Content
Get specific text from elements:
$ agent-browser open "https://news.ycombinator.com"
$ agent-browser get text @e12
The URL shortener that makes your links look as suspicious as possible
Example 4: Form Interaction
Fill and submit forms:
# Open a search page
$ agent-browser open "https://arxiv.org"
$ agent-browser snapshot -i -c | grep textbox
- textbox "Search term or terms" [ref=e6]
# Fill the search box
$ agent-browser fill @e6 "transformer attention mechanism"
# Submit
$ agent-browser press Enter
Python Integration
You can easily integrate agent-browser with Python scripts:
import subprocess
import json

def run_agent_browser(command):
    """Run an agent-browser command and return its stdout."""
    result = subprocess.run(
        f"agent-browser {command}",
        shell=True,  # fine for fixed commands; do not pass untrusted input
        capture_output=True,
        text=True
    )
    return result.stdout.strip()

# Example: extract Hacker News headlines
run_agent_browser('open "https://news.ycombinator.com"')
snapshot = run_agent_browser('snapshot -i -c --json')

# Parse the JSON output. Its exact shape is not shown here, so this filter
# simply matches on the string form of each element.
data = json.loads(snapshot)
headlines = [
    elem for elem in data
    if 'link' in str(elem) and 'comments' not in str(elem)
]
Use Cases for Research
1. Literature Review Automation
Automatically search and collect paper metadata from arXiv, Google Scholar, or Semantic Scholar.
2. Data Collection
Scrape public datasets, weather data, or government statistics for research projects.
3. Conference/Journal Monitoring
Set up automated checks for conference deadlines, paper acceptance notifications, or journal updates.
4. Teaching Material Collection
Gather examples, diagrams, or references from educational websites for course preparation.
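For the literature review use case, the arXiv snapshot from Example 2 could be turned into simple records. This is a rough sketch, assuming each result contributes a title paragraph followed by a "Submitted ..." paragraph, as in the truncated sample shown there. A real results page carries more paragraphs (authors, abstracts), so the pairing rule would need adjusting; extract_papers is a hypothetical helper name.

```python
import re

# Assumption: the snapshot contains lines of the form "- paragraph: <text>",
# as in the truncated sample from Example 2.
PARAGRAPH = re.compile(r'^\s*-\s+paragraph:\s*(?P<text>.+?)\s*$')
PREFIX = "Submitted "


def extract_papers(snapshot_text):
    """Pair each title paragraph with the 'Submitted ...' paragraph after it."""
    papers = []
    pending_title = None
    for line in snapshot_text.splitlines():
        match = PARAGRAPH.match(line)
        if not match:
            continue  # skip headings, textboxes, and other node types
        text = match.group("text")
        if text.startswith(PREFIX):
            if pending_title is not None:
                papers.append(
                    {"title": pending_title, "submitted": text[len(PREFIX):]}
                )
                pending_title = None
        else:
            # Keep the most recent non-date paragraph as the candidate title.
            pending_title = text
    return papers


SAMPLE = '''\
- main:
  - paragraph: Revisiting Disaggregated Large Language Model Serving...
  - paragraph: Submitted 14 November, 2025
  - paragraph: Real Time NILM Based Power Monitoring of Identical Motors...
  - paragraph: Submitted 4 January, 2026
'''

for paper in extract_papers(SAMPLE):
    print(paper["submitted"], "|", paper["title"])
```

Titles stay truncated exactly as they appear in the snapshot; fetching the full text of an element would be a separate get text call.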
Key Commands Reference
| Command | Description |
|---|---|
| `open <url>` | Navigate to URL |
| `snapshot -i -c` | Get interactive elements in compact format |
| `click @ref` | Click element by reference |
| `fill @ref "text"` | Fill input field |
| `get text @ref` | Extract text from element |
| `screenshot [path]` | Take screenshot |
| `close` | Close browser |
Tips for AI Agent Integration
- Use snapshot -i for interactive elements only; this reduces noise
- Use snapshot -c for compact output; this removes empty structural elements
- Use the --json flag for programmatic parsing
- Session isolation: use --session <name> for parallel browser instances
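The session and JSON tips can be combined in a small command builder, so an agent keeps one named session per site. This is only a sketch: the placement of --session before the subcommand and --json after it is an assumption and should be checked against the tool's own help output. build_argv and the session names "hn" and "arxiv" are hypothetical.

```python
import shlex


def build_argv(command, session=None, as_json=False):
    """Build an argument list for one agent-browser call.

    Assumption: --session is accepted before the subcommand and --json
    after it; verify the placement against the tool's help output.
    """
    argv = ["agent-browser"]
    if session is not None:
        argv += ["--session", session]
    argv += shlex.split(command)  # honours quotes, e.g. around URLs
    if as_json:
        argv.append("--json")
    return argv


# Two isolated sessions that could be driven side by side.
hn = build_argv('open "https://news.ycombinator.com"', session="hn")
arxiv = build_argv("snapshot -i -c", session="arxiv", as_json=True)
print(hn)
print(arxiv)
```

Each list can be handed to subprocess.run as is. Building the arguments separately from running them also makes the command sequence easy to log or inspect before anything touches a browser.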
Conclusion
agent-browser provides a clean, AI-friendly interface for web automation. The reference-based element selection (@e1, @e2) makes it ideal for LLM-powered agents that need to interact with websites programmatically.
Links:
- agent-browser website
- GitHub repository