Recommended system guidance
Use guidance like this in your MCP client or agent harness:Decision tree
Do you need to read content?
Use:content.get_markdownfor docs, articles, blogs, and readable pagesextract.sectionfor a known headingextract.search_pagefor a phraseextract.cardsfor listings/results/products/releasesextract.tablefor tablescontent.read_pdffor PDFs
snapshot unless you need clickable refs.
Do you need to act on controls?
Use:snapshotto discover refsclickfor one reffillfor one fieldfill_formfor several fieldsinteract.findwhen target text/role is known but refs are notinteract.choosefor multiple visible choices
history.wait_for.
Do you need to debug?
Use:network.console_logsfor JavaScript errorsnetwork.listandnetwork.getfor API trafficstoragefor cookies/localStorage/sessionStoragescreenshotfor visual verificationsession.stealth_statusfor fingerprint or detection checks
Common failure modes
| Failure mode | Better behavior |
|---|---|
| Agent loops over huge snapshots | Use extract.page_summary, extract.cards, extract.section, or content.get_markdown. |
| Agent fills one field per turn | Use fill_form or interact.fill_form. |
| Agent clicks stale refs | Take a fresh snapshot after DOM changes. |
| Agent guesses selectors | Use snapshot, extract.find, interact.find, or knowledge.lookup. |
| Agent uses screenshots for text | Use content or extract first. |
| Agent waits blindly | Use history.wait_for with text, selector, URL, or load state. |
| Agent writes JavaScript too early | Use typed extraction tools first; reserve evaluate for gaps. |
Good prompts for testing
These are useful smoke tests for Browser MCP:Evaluating a run
Check more than the final answer. A good run should have:- few turns
- low tool error count
- no repeated broad screenshots
- no unnecessary custom JavaScript
- targeted extraction before broad snapshots
- successful waits after navigation or submission
- a final answer only after the requested data is collected
| Metric | Why it matters |
|---|---|
| Turns | Measures agent planning efficiency. |
| Wall time | Measures user-visible latency. |
| Input tokens | Shows whether tools are returning too much data. |
| Tool errors | Reveals bad tool routing or unclear schemas. |
| Final answer correctness | Confirms the task was actually completed. |
