Agents perform best when they treat Browser MCP as a set of high-level browser skills, not as raw browser automation. Use guidance like this in your MCP client or agent harness:
You are a browsing agent using StableBrowse MCP.

Start with create_session, then navigate.
For reading pages, prefer content.get_markdown or extract before snapshot.
Use snapshot when you need refs for visible controls.
Use click/fill/fill_form for ref-based actions.
Use interact.find when you know the target text/role but do not have a ref.
Use extract.cards for products, listings, releases, and search results.
Use extract.form_fields before filling forms.
Use extract.choice_groups and interact.choose for filters and configurators.
Use knowledge.lookup on known sites before broad exploration.
Use knowledge.amazon_products for Amazon product/deal/ranking tasks.
Use screenshots only for visual verification or content that is not represented as text.
Use evaluate only when typed tools cannot answer.
When the task is satisfied, answer and stop calling tools.

Decision tree

Do you need to read content?

Use:
  • content.get_markdown for docs, articles, blogs, and readable pages
  • extract.section for a known heading
  • extract.search_page for a phrase
  • extract.cards for listings/results/products/releases
  • extract.table for tables
  • content.read_pdf for PDFs
Avoid starting with snapshot unless you need clickable refs.
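
The reading branch above can be sketched as a small lookup table. This is only an illustration: the tool names come from the list above, but the `kind` labels, the `choose_read_tool` helper, and the fallback to content.get_markdown are hypothetical and not part of Browser MCP.

```python
# Hypothetical routing table: the keys are illustrative labels,
# the values are tool names taken from the list above.
READ_TOOLS = {
    "article": "content.get_markdown",
    "section": "extract.section",
    "phrase": "extract.search_page",
    "cards": "extract.cards",
    "table": "extract.table",
    "pdf": "content.read_pdf",
}


def choose_read_tool(kind: str) -> str:
    """Pick a reading tool for a content kind.

    Assumption: unknown kinds fall back to content.get_markdown
    rather than snapshot, since snapshot is meant for clickable refs.
    """
    return READ_TOOLS.get(kind, "content.get_markdown")
```

A harness could use a table like this to validate an agent's first tool choice on read-only tasks.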

Do you need to act on controls?

Use:
  • snapshot to discover refs
  • click for one ref
  • fill for one field
  • fill_form for several fields
  • interact.find when target text/role is known but refs are not
  • interact.choose for multiple visible choices
After actions that navigate or update the page, call history.wait_for.
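
As a rough sketch, the action branch above implies a tool order for a fill-and-submit task. The `plan_form_submit` helper below is hypothetical and returns tool names only; it does not call Browser MCP, and real arguments for each tool are left out because this page does not specify them.

```python
def plan_form_submit(num_fields: int, have_refs: bool) -> list[str]:
    """Return an ordered list of tool names for filling and submitting a form.

    Assumptions: a submit button is clicked by ref, and submission
    navigates or updates the page, so a wait follows the click.
    """
    steps: list[str] = []
    if not have_refs:
        # Refs are needed for click/fill/fill_form, so discover them first.
        steps.append("snapshot")
    # One field: fill. Several fields: fill_form in a single call.
    steps.append("fill" if num_fields == 1 else "fill_form")
    steps.append("click")
    # Wait after an action that navigates or updates the page.
    steps.append("history.wait_for")
    return steps
```

For example, `plan_form_submit(3, have_refs=False)` yields snapshot, fill_form, click, history.wait_for.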

Do you need to debug?

Use:
  • network.console_logs for JavaScript errors
  • network.list and network.get for API traffic
  • storage for cookies/localStorage/sessionStorage
  • screenshot for visual verification
  • session.stealth_status for fingerprint or detection checks

Common failure modes

| Failure mode | Better behavior |
| --- | --- |
| Agent loops over huge snapshots | Use extract.page_summary, extract.cards, extract.section, or content.get_markdown. |
| Agent fills one field per turn | Use fill_form or interact.fill_form. |
| Agent clicks stale refs | Take a fresh snapshot after DOM changes. |
| Agent guesses selectors | Use snapshot, extract.find, interact.find, or knowledge.lookup. |
| Agent uses screenshots for text | Use content or extract first. |
| Agent waits blindly | Use history.wait_for with text, selector, URL, or load state. |
| Agent writes JavaScript too early | Use typed extraction tools first; reserve evaluate for gaps. |
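
Some of these failure modes can be spotted from a run's tool-call log. The sketch below is a hypothetical checker, assuming the log is simply an ordered list of tool names. It covers two of the rows above, and its heuristics (two consecutive fill calls, evaluate before any extract.* or content.* call) are illustrative choices rather than documented rules.

```python
def flag_failure_modes(calls: list[str]) -> list[str]:
    """Return human-readable flags for two failure modes found in a call log."""
    flags: list[str] = []

    # Back-to-back single-field fills suggest fill_form was the better choice.
    if any(a == "fill" and b == "fill" for a, b in zip(calls, calls[1:])):
        flags.append("fills one field per turn: prefer fill_form")

    # evaluate appearing before any typed extraction or content tool.
    for name in calls:
        if name.startswith(("extract.", "content.")):
            break
        if name == "evaluate":
            flags.append("JavaScript too early: try typed extraction tools first")
            break

    return flags
```

A log such as create_session, navigate, snapshot, fill, fill, evaluate would raise both flags, while a run that calls content.get_markdown before evaluate raises none.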

Good prompts for testing

These are useful smoke tests for Browser MCP:
Open the Playwright npm package page and report the current version, weekly downloads, license, repository, and install command.
Open the Python asyncio task docs and explain TaskGroup with two behavior notes from the official page.
Open Hacker News and return the top 3 visible stories with titles, points, and comment counts.
Open the GitHub releases page for microsoft/playwright and report the latest release tag plus two highlighted changes.
Open the Stripe idempotent requests docs and summarize how idempotency keys work, including key length guidance and pruning.
Search arXiv for "browser automation agents" and return the title and authors of the first result.

Evaluating a run

Check more than the final answer. A good run should have:
  • few turns
  • low tool error count
  • no repeated broad screenshots
  • no unnecessary custom JavaScript
  • targeted extraction before broad snapshots
  • successful waits after navigation or submission
  • a final answer only after the requested data is collected
For benchmark logs, track:
| Metric | Why it matters |
| --- | --- |
| Turns | Measures agent planning efficiency. |
| Wall time | Measures user-visible latency. |
| Input tokens | Shows whether tools are returning too much data. |
| Tool errors | Reveals bad tool routing or unclear schemas. |
| Final answer correctness | Confirms the task was actually completed. |
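
One way to record these metrics is a small per-run record plus an aggregate over a benchmark. This is a minimal sketch: the `RunMetrics` class, its field names, and the `summarize` helper are hypothetical, and how you measure each value depends on your harness.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunMetrics:
    """One benchmark run, with one field per metric listed above."""
    turns: int
    wall_time_s: float
    input_tokens: int
    tool_errors: int
    correct: bool


def summarize(runs: list[RunMetrics]) -> dict[str, float]:
    """Average each metric across runs; raises on an empty list."""
    return {
        "mean_turns": mean(r.turns for r in runs),
        "mean_wall_time_s": mean(r.wall_time_s for r in runs),
        "mean_input_tokens": mean(r.input_tokens for r in runs),
        "mean_tool_errors": mean(r.tool_errors for r in runs),
        "accuracy": sum(r.correct for r in runs) / len(runs),
    }
```

Comparing these summaries before and after a prompt change shows whether the agent became cheaper or faster without losing correctness.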