Domain skill
amazon
Markdown synced from browser-harness domain skills.
- Host
- amazon
- Files
- 1
Agent prompt
Use this skill
Copy this prompt into your coding agent to make it enable browser-harness domain skills and read this exact domain folder before automating.
Set up https://github.com/browser-use/browser-harness for me if it is not already installed. If setup is needed, read `install.md` first to install and connect it to my real browser. Then read `SKILL.md` for normal usage and always read `helpers.py` because that is where the browser-harness functions are. Enable domain skills if they are not already enabled by setting `BH_DOMAIN_SKILLS=1` for browser-harness. Use the `amazon` domain skill from `agent-workspace/domain-skills/amazon/`. Read every markdown file for this domain before inventing an approach: - agent-workspace/domain-skills/amazon/product-search.md Use those domain-skill notes to complete my task for `amazon` in my real browser. When you open a setup, verification, or task tab, activate it so I can see the active browser tab.
Skill contents
What the agent will read
Product Search & Data Extraction
product-search.md
- Field-tested against amazon.com on 2025-04-18 using a logged-in Chrome session. No CAPTCHA or bot detection was triggered during any test run.
- Always use newtab() when opening Amazon for the first time in a harness session. gotourl() can silently fail to navigate if the current tab resists the navigation (observed when the daemon attached to a different real...
- After that, gotourl() works fine within the same Amazon session.
- [data-component-type="s-search-result"] — confirmed working, yields 22 results per page.
Show full markdown
Field-tested against amazon.com on 2025-04-18 using a logged-in Chrome session. No CAPTCHA or bot detection was triggered during any test run.
Navigation
Direct search URL (fastest, always use this)
goto_url("https://www.amazon.com/s?k=mechanical+keyboard")
wait_for_load()
wait(2) # dynamic content needs ~2s after readyState=complete
Search box typing (use when you need category filtering)
goto_url("https://www.amazon.com")
wait_for_load()
wait(1)
js("document.querySelector('#twotabsearchtextbox').focus()")
js("document.querySelector('#twotabsearchtextbox').click()")
wait(0.3)
type_text("wireless mouse")
wait(0.3)
press_key("Enter")
wait_for_load()
wait(2)
Direct product page
# URL pattern: /dp/{ASIN} or /dp/{ASIN}?th=1 (Amazon may redirect to add ?th=1)
goto_url("https://www.amazon.com/dp/B08Z6X4NK3")
wait_for_load()
wait(2)
Session Gotcha
Always use new_tab() when opening Amazon for the first time in a harness session.
goto_url() can silently fail to navigate if the current tab resists the navigation
(observed when the daemon attached to a different real tab). The safe pattern:
tid = new_tab("https://www.amazon.com/s?k=mechanical+keyboard")
wait_for_load()
wait(2)
After that, goto_url() works fine within the same Amazon session.
Search Results Extraction
Container selector
[data-component-type="s-search-result"] — confirmed working, yields ~22 results per page.
Full extraction (field-tested)
results = js("""
Array.from(document.querySelectorAll('[data-component-type="s-search-result"]')).map(el => ({
asin: el.getAttribute('data-asin'),
title: el.querySelector('h2 span')?.innerText?.trim(),
price: el.querySelector('.a-price .a-offscreen')?.innerText,
list_price: el.querySelector('.a-text-price .a-offscreen')?.innerText,
rating: el.querySelector('[aria-label*="out of 5 stars"]')?.getAttribute('aria-label')?.split(' ')[0],
reviews: el.querySelector('[aria-label*="ratings"]')?.getAttribute('aria-label'),
is_sponsored: !!el.querySelector('.puis-sponsored-label-text'),
url: el.querySelector('h2 a')?.href
}))
""")
Field notes
asin:data-asinattribute on the container div — always present, matches the/dp/{ASIN}URL.title:h2 spanworks consistently.h2 a.a-link-normal spanalso works.price:.a-price .a-offscreenreturns the formatted string e.g."$69.99". Use this, not.a-price-whole.list_price:.a-text-price .a-offscreen— only present when item is on sale (was/now pricing).rating: Usearia-labelon[aria-label*="out of 5 stars"]— gives"4.5 out of 5 stars, rating details", split on space for the number.reviews: Use[aria-label*="ratings"]attribute — gives"1,514 ratings". Do NOT use.a-size-base.s-underline-text— that element exists on sponsored results and shows "Xbox" (a cross-sell widget text).is_sponsored:.puis-sponsored-label-textis present on sponsored listings; first 2-3 results are usually sponsored.url:h2 ahref — contains the full/dp/{ASIN}/...URL.
Product Detail Page Extraction
Confirmed selectors (field-tested on B08Z6X4NK3)
detail = js("""
({
title: document.querySelector('#productTitle')?.innerText?.trim(),
price: (function() {
var whole = document.querySelector('.a-price-whole')?.innerText?.replace(/[\\n.]/g,'');
var frac = document.querySelector('.a-price-fraction')?.innerText;
return (whole && frac) ? '$' + whole + '.' + frac
: document.querySelector('.a-price .a-offscreen')?.innerText || null;
})(),
list_price: document.querySelector('.basisPrice .a-offscreen')?.innerText,
rating: document.querySelector('#acrPopover')?.getAttribute('title'),
review_count: document.querySelector('#acrCustomerReviewText')?.innerText,
availability: document.querySelector('#availability span')?.innerText?.trim(),
brand: document.querySelector('#bylineInfo')?.innerText?.trim(),
asin: document.querySelector('input[name="ASIN"]')?.value,
bullet_points: Array.from(document.querySelectorAll('#feature-bullets li span.a-list-item'))
.map(e => e.innerText?.trim()).filter(t => t)
})
""")
Price field notes
#priceblock_ourpriceand#priceblock_dealpriceare legacy — they returnnullon modern product pages.- Construct price from
.a-price-whole+.a-price-fraction(both stripped of\nand.). - As a fallback: first
.a-price .a-offscreenon the page also works (confirmed$69.99). list_pricefrom.basisPrice .a-offscreenshows the crossed-out "was" price when a discount exists.
Best Sellers Page
URL: https://www.amazon.com/Best-Sellers-{Category}/zgbs/{slug}/
e.g. https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/
DOM structure (2025)
.zg-item-immersion does not exist — Amazon migrated to CSS modules. Use [data-asin] anchored on [id="gridItemRoot"]:
goto_url("https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/")
wait_for_load()
wait(2)
items = js("""
Array.from(document.querySelectorAll('[data-asin]')).map(el => {
var container = el.closest('[id="gridItemRoot"]') || el;
return {
asin: el.getAttribute('data-asin'),
rank: container.querySelector('[class*="zg-bdg-text"]')?.innerText,
title: container.querySelector('img[alt]')?.getAttribute('alt'),
price: container.querySelector('.p13n-sc-price, .a-size-base.a-color-price')?.innerText,
url: 'https://www.amazon.com/dp/' + el.getAttribute('data-asin')
}
}).filter(r => r.rank)
""")
Note: Title comes from the product image alt attribute — the text title elements use obfuscated CSS module class names that change between deployments.
Pagination
# Get next page URL directly
next_url = js("document.querySelector('.s-pagination-next')?.href")
if next_url:
goto_url(next_url)
wait_for_load()
wait(2)
# Or construct by page number
goto_url("https://www.amazon.com/s?k=wireless+mouse&page=2")
Result Count
count_text = js("document.querySelector('[data-component-type=\"s-result-info-bar\"] h1')?.innerText?.trim()")
# Returns e.g.: '1-16 of over 40,000 results for "wireless mouse"\nSort by:\n...'
# Extract just the count: count_text.split('\n')[0]
CAPTCHA Detection
No CAPTCHA was encountered during testing with a logged-in Chrome session. To detect defensively:
def check_captcha():
text = js("document.body.innerText.slice(0,500)") or ""
url = page_info()["url"]
return (
"captcha" in text.lower()
or "enter the characters" in text.lower()
or "sorry, we just need to make sure" in text.lower()
or "captcha" in url.lower()
or "validateCaptcha" in url
)
if check_captcha():
raise RuntimeError("Amazon CAPTCHA hit — stop and notify user")
Amazon may serve a CAPTCHA on fresh/anonymous sessions. Using the browser's existing logged-in session avoids this in practice.
Gotchas
goto_url()silent failure: On first visit, usenew_tab(url)instead. After the tab is on Amazon,goto_url()works..zg-item-immersionis gone: Best Sellers page uses CSS module classes (obfuscated). Use[data-asin]+img[alt]for title..a-size-base.s-underline-textis unreliable for review count: On sponsored results it shows unrelated text (e.g. "Xbox"). Use[aria-label*="ratings"]instead.#priceblock_ourpriceis legacy: Returnsnullon modern pages. Construct from.a-price-whole+.a-price-fraction.- Sponsored results appear first: First 2-3 results are almost always
is_sponsored: true. Filter them out with!el.querySelector('.puis-sponsored-label-text')when you need organic results. data-asincan be empty string on non-product rows: Filter with.filter(r => r.asin).- Price split DOM:
.a-price-wholeinnerText includes a trailing\n.— strip it:.replace(/[\n.]/g,''). - ASIN from URL: Use
/dp/([A-Z0-9]{10})/regex on the product URL.data-asinon search results is always the canonical ASIN. ?th=1redirect: Amazon appends?th=1(and sometimes?psc=1) to product URLs after redirect. This is normal —input[name="ASIN"]always has the clean ASIN.- Wait 2s after
wait_for_load(): Amazon search results load the listing cards asynchronously.readyState=completefires before cards render. A hard 2s wait is required.