Domain skill
xiaohongshu
Markdown synced from browser-harness domain skills.
- Host: xiaohongshu
- Files: 1
Agent prompt
Use this skill
Copy this prompt into your coding agent to make it enable browser-harness domain skills and read this exact domain folder before automating.
Set up https://github.com/browser-use/browser-harness for me if it is not already installed. If setup is needed, read `install.md` first to install and connect it to my real browser. Then read `SKILL.md` for normal usage and always read `helpers.py` because that is where the browser-harness functions are. Enable domain skills if they are not already enabled by setting `BH_DOMAIN_SKILLS=1` for browser-harness. Use the `xiaohongshu` domain skill from `agent-workspace/domain-skills/xiaohongshu/`. Read every markdown file for this domain before inventing an approach:
- agent-workspace/domain-skills/xiaohongshu/scraping.md

Use those domain-skill notes to complete my task for `xiaohongshu` in my real browser. When you open a setup, verification, or task tab, activate it so I can see the active browser tab.
Skill contents
What the agent will read
Search and Sort
scraping.md
URL patterns:
- Home / discovery: `https://www.xiaohongshu.com/explore`
- Search results: `https://www.xiaohongshu.com/search_result?keyword=...`
Search flow
- Prefer direct navigation to the desktop search results page over automating the home-page search box.
- Reliable primary path: `https://www.xiaohongshu.com/search_result?keyword=<url-encoded keyword>&source=web_explore_feed`
- This route loads the normal desktop results page and avoids home-page input flakiness.
- The search results page can also appear with variants such as `type=51` or other `source` values after in-app navigation; do not treat those as suspicious if the rendered results are correct.
- The top search box on `explore` can work, and searching from the home page has transitioned to `search_result` without a login wall in some sessions.
- The page exposes duplicate search inputs in the DOM with the same placeholder `搜索小红书` (Search Xiaohongshu).
- The home-page search input can behave like a tightly controlled app field: direct DOM value assignment may be cleared immediately, and harness `type_text()` may fail to populate it even when the input is focused.
- Treat the home-page input as best-effort only. Use it when a human-like interactive flow matters, but for automation default to constructing the `search_result` URL directly.
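
The direct-navigation path can be sketched as a small URL builder. This is a minimal sketch, not part of the harness: `buildSearchUrl` is a hypothetical helper name, and the only facts taken from the notes are the `search_result` route and the `keyword` / `source=web_explore_feed` parameters.

```javascript
// Hypothetical helper (not a browser-harness function): build the direct
// search_result URL described in the notes above.
const SEARCH_BASE = "https://www.xiaohongshu.com/search_result";

function buildSearchUrl(keyword) {
  // encodeURIComponent percent-encodes the keyword as UTF-8, so Chinese
  // keywords, spaces, and "&" cannot break the query string.
  const encoded = encodeURIComponent(keyword);
  return `${SEARCH_BASE}?keyword=${encoded}&source=web_explore_feed`;
}
```

Navigate to the returned URL with whatever navigation helper the harness provides instead of typing into the home-page search box.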
Sort behavior
- On the current desktop results layout, `最新` (Latest) is not a top-level tab beside `综合` (Comprehensive).
- Open the `筛选` (Filter) control in the upper-right of the results header to access sort options.
- Inside `筛选`, `排序依据` (Sort by) contains:
  - `综合` (Comprehensive)
  - `最新` (Latest)
  - `最多点赞` (Most liked)
  - `最多评论` (Most commented)
  - `最多收藏` (Most saved)
- The `排序依据` row can render duplicate DOM nodes for the same pill text, including non-interactive clones.
- Raw global text search for `最新` can hit the wrong node first. Scope to the `排序依据` section and then choose the visible interactive `.tags` node.
- Prefer semantic filtering such as `aria-hidden != "true"` or section-scoped visible `.tags` selection over style-specific checks.
- When `最新` is active, the `筛选` trigger changes to `已筛选` (Filtered).
- The rendered feed and the `已筛选` / active-pill UI are more reliable than `window.__INITIAL_STATE__.search.searchContext.sort` for confirming latest sort.
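
The section-scoped lookup can be sketched roughly as follows. `findSortPill` is a hypothetical name. The `.filters` and `.tags` class names and the `aria-hidden` check come from these notes. Matching the section by the start of its `textContent` is an assumption standing in for "first label", so adjust it if the markup differs.

```javascript
// Sketch: find the interactive sort pill inside the "排序依据" section.
// `root` is normally `document`; it only needs querySelectorAll().
function findSortPill(root, sectionLabel = "排序依据", pillText = "最新") {
  for (const section of root.querySelectorAll(".filters")) {
    // Assumption: the section's text begins with its label.
    if (!section.textContent.trim().startsWith(sectionLabel)) continue;
    for (const el of section.querySelectorAll(".tags")) {
      const isTarget = el.textContent.trim() === pillText;
      // Skip the non-interactive clone that is marked aria-hidden="true".
      const isHiddenClone = el.getAttribute("aria-hidden") === "true";
      if (isTarget && !isHiddenClone) return el;
    }
  }
  return null; // panel not open, or the layout changed
}
```

With the `筛选` panel open, something like `findSortPill(document)?.click()` would be the in-page call. Confirm the result from the `已筛选` trigger or the refreshed feed rather than from state.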
Stable cues
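- Search channel tabs near the top: `全部` (All), `图文` (Image and text), `视频` (Video), `用户` (Users)
- Sort panel labels: `筛选`, `排序依据`, `最新`
- Filter sections also visible in the panel: `笔记类型` (Note type), `发布时间` (Publish time), `搜索范围` (Search scope), `位置距离` (Location distance)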
Interaction notes
- DOM `.click()` opened the `筛选` panel reliably.
- DOM `.click()` on the visible `最新` pill inside the open `排序依据` section reliably activated latest sort.
- The reliable DOM pattern was:
  - find the `排序依据` section / `.filters` block
  - search within that block for `.tags`
  - choose the one whose text is `最新` and which is the visible interactive node
  - call `.click()` on that visible node
- Example selector strategy:
  - find `.filters` whose first label is `排序依据`
  - inside it, pick `.tags` where `textContent.trim() === "最新"` and `el.getAttribute("aria-hidden") !== "true"`
- `getClientRects().length > 0` alone may be insufficient to distinguish the working node from a duplicate.
- A broad `document.querySelectorAll("*")` text match for `最新` is not reliable on this page because it may click the hidden duplicate instead of the visible control.
- Coordinate click on the visible `最新` pill also worked and remains a valid fallback if DOM targeting gets confused by future UI changes.
- After selecting `最新`, the grid briefly showed skeleton placeholders before the refreshed results appeared.
- The search page stores the currently rendered note cards in `window.__INITIAL_STATE__.search.feeds._value` as an array of feed entries. For ordinary note cards, the useful fields were:
  - `id`
  - `xsecToken`
  - `noteCard.displayTitle`
  - `noteCard.user.nickname`
- The feed array can contain non-note inserts such as hot-query modules. Filter for entries with `noteCard` before treating an item as a note result.
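
A minimal sketch of that feed filtering, assuming the entry shape described above (`id`, `xsecToken`, `noteCard.displayTitle`, `noteCard.user.nickname`). `extractNotes` and the output keys `title` and `author` are hypothetical names chosen for illustration.

```javascript
// Sketch: keep only real note cards from the search feed array.
function extractNotes(feeds) {
  return (feeds || [])
    // Hot-query modules and other inserts have no noteCard; drop them.
    .filter((item) => item && item.noteCard)
    .map((item) => ({
      id: item.id,
      xsecToken: item.xsecToken,
      title: item.noteCard.displayTitle,
      // Guard against cards that lack a user object.
      author: item.noteCard.user ? item.noteCard.user.nickname : undefined,
    }));
}
```

In the page this would be called as `extractNotes(window.__INITIAL_STATE__.search.feeds._value)`. Because the array reflects the currently rendered cards, re-read it after the skeleton placeholders clear.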
Post opening
- Do not assume a raw results link like `https://www.xiaohongshu.com/explore/<id>` is directly openable.
- Opening that raw `/explore/<id>` URL in a fresh tab can redirect to the web `404` / app-only gate even when the same post is openable from search results.
- To open a post from search results, click the visible card image / card in-page first.
- That click navigation can land on a tokenized URL like `https://www.xiaohongshu.com/explore/<id>?xsec_token=...&xsec_source=pc_search`, which is a more reliable note URL than the raw `/explore/<id>` form.
- Once the tokenized URL is obtained from the click flow, it can be revisited in-session for extraction.
- If the search results state is already loaded, you can reconstruct the tokenized note URL directly from a feed item without re-clicking: `https://www.xiaohongshu.com/explore/<id>?xsec_token=<xsecToken>&xsec_source=pc_search`
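
Reconstructing the tokenized URL from a feed item might look like the sketch below. `buildNoteUrl` is a hypothetical helper. Percent-encoding the token is an assumption made for safety, since tokens may contain characters such as `=`; the notes do not say whether the site requires it.

```javascript
// Sketch: rebuild the tokenized note URL from a search feed item.
function buildNoteUrl(item) {
  // Without both id and xsecToken only the raw /explore/<id> form could be
  // built, which the notes say may hit the 404 / app-only gate.
  if (!item || !item.id || !item.xsecToken) return null;
  const id = encodeURIComponent(item.id);
  const token = encodeURIComponent(item.xsecToken); // assumption: encoded token is accepted
  return `https://www.xiaohongshu.com/explore/${id}?xsec_token=${token}&xsec_source=pc_search`;
}
```

Returning `null` instead of a tokenless URL keeps a caller from silently falling back to the unreliable raw form.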
Post extraction
- On tokenized post pages opened via `pc_search`, `document.body.innerText` can be a useful first-pass extraction source because it often includes the rendered note text, hashtags, timestamp, engagement counts, and visible comments.
- Verify that the note content actually rendered before trusting `document.body.innerText`, because the page can also include substantial navigation, footer, and comment noise.
- Prefer `document.body.innerText` as a fallback or initial probe before writing fragile per-element selectors for post content.
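
One possible render check, offered as an assumption and not an observed behavior, is to confirm that the title seen in search results appears in the page text before parsing it. `noteLooksRendered` is a hypothetical helper. Using `noteCard.displayTitle` as the expected title is a heuristic that will not help for notes with an empty or truncated title.

```javascript
// Sketch: check that innerText plausibly contains the note before parsing it.
function noteLooksRendered(bodyText, expectedTitle) {
  // Collapse whitespace so line breaks in innerText do not defeat the match.
  const normalize = (s) => s.replace(/\s+/g, " ").trim();
  const title = normalize(String(expectedTitle || ""));
  if (typeof bodyText !== "string" || title === "") return false;
  return normalize(bodyText).includes(title);
}
```

In the page this would be called with `document.body.innerText` and the `displayTitle` captured from the feed item; a `false` result suggests waiting and re-reading instead of extracting navigation or footer text.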
Gotchas
- Do not assume `Enter` alone finished the workflow until you verify the URL changed to `search_result` or the result grid appeared.
- Do not assume the visible `综合` tab controls all sorting; on this layout, time ordering is hidden inside `筛选`.
- Do not assume the first DOM node whose text is `最新` is the clickable one; this panel duplicates pills and the hidden clone can absorb naive text-based targeting without changing state.
- Do not assume a successfully opened post can be reproduced by stripping query params; preserve the `xsec_token` when reopening results-derived post URLs.
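
Two of these checks are easy to express as URL predicates. A minimal sketch with hypothetical helper names (`isSearchResultUrl`, `hasXsecToken`); the hostname comparison is an added assumption based on the URLs shown in these notes.

```javascript
// Sketch: confirm the browser actually reached the search results route.
function isSearchResultUrl(href) {
  try {
    const url = new URL(href);
    return (
      url.hostname === "www.xiaohongshu.com" &&
      url.pathname.startsWith("/search_result") &&
      url.searchParams.has("keyword")
    );
  } catch {
    return false; // not a parseable absolute URL
  }
}

// Sketch: confirm a post URL still carries its xsec_token before reopening it.
function hasXsecToken(href) {
  try {
    return Boolean(new URL(href).searchParams.get("xsec_token"));
  } catch {
    return false;
  }
}
```

Run `isSearchResultUrl(location.href)` after pressing `Enter` in the search box, and `hasXsecToken(url)` before revisiting any stored post URL.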