Domain skill
Markdown synced from browser-harness domain skills.
- Host
- Files
- 1
Agent prompt
Use this skill
Copy this prompt into your coding agent to make it enable browser-harness domain skills and read this exact domain folder before automating.
Set up https://github.com/browser-use/browser-harness for me if it is not already installed. If setup is needed, read `install.md` first to install and connect it to my real browser. Then read `SKILL.md` for normal usage and always read `helpers.py` because that is where the browser-harness functions are. Enable domain skills if they are not already enabled by setting `BH_DOMAIN_SKILLS=1` for browser-harness. Use the `reddit` domain skill from `agent-workspace/domain-skills/reddit/`. Read every markdown file for this domain before inventing an approach: - agent-workspace/domain-skills/reddit/scraping.md Use those domain-skill notes to complete my task for `reddit` in my real browser. When you open a setup, verification, or task tab, activate it so I can see the active browser tab.
Skill contents
What the agent will read
Scraping & Post Extraction
scraping.md
- Reddit's "new" web UI (reddit.com) is a Lit / web-components SPA built around custom elements (shreddit-post, shreddit-comment, faceplate-). This makes DOM extraction unusually reliable — the custom element tags are...
- Use the browser when you're logged in (private subreddits, NSFW gates, rate-limit avoidance). For fully public content, the JSON API path below is faster.
- Full post: https://www.reddit.com/r/<sub>/comments/<id>/<slug>/
- Share short-link: https://www.reddit.com/r/<sub>/s/<hash> — redirects to the full URL once the page loads. newtab(shorturl) + waitforload() is enough; by the time you read location.href it will be the canonical one.
Show full markdown
Reddit's "new" web UI (reddit.com) is a Lit / web-components SPA built around custom elements (shreddit-post, shreddit-comment, faceplate-*). This makes DOM extraction unusually reliable — the custom element tags are stable and exposed on the element itself (no hashed class names).
Use the browser when you're logged in (private subreddits, NSFW gates, rate-limit avoidance). For fully public content, the JSON API path below is faster.
URL patterns
- Full post:
https://www.reddit.com/r/<sub>/comments/<id>/<slug>/ - Share short-link:
https://www.reddit.com/r/<sub>/s/<hash>— redirects to the full URL once the page loads.new_tab(short_url)+wait_for_load()is enough; by the time you readlocation.hrefit will be the canonical one. - Old Reddit: append
/.jsonto any post URL for anonymous JSON:https://www.reddit.com/r/<sub>/comments/<id>/.json. - Old UI (simpler DOM, no web components):
https://old.reddit.com/r/<sub>/comments/<id>/— useful fallback whenshreddit-*selectors change.
Path 1: JSON API (fastest for public posts)
from helpers import http_get
import json
url = "https://www.reddit.com/r/cursor/comments/1l0u9y7/claude_code_prompt_to_autogenerate_full_cursor/.json"
data = json.loads(http_get(url, headers={"User-Agent": "Mozilla/5.0"}))
post = data[0]["data"]["children"][0]["data"]
# post fields: title, selftext, author, score, num_comments, created_utc, url, permalink
comments = data[1]["data"]["children"] # list of { kind: "t1", data: {...} } or { kind: "more" }
Fails on:
- Private / quarantined subreddits (401)
- NSFW posts without an authenticated session
- Anti-scraping 429s under load — back off or switch to the browser path
Path 2: Browser DOM extraction (logged-in)
Core selector: every post renders inside a single <shreddit-post> custom element. Top-level comments are <shreddit-comment depth="0">.
browser-harness <<'PY'
new_tab("https://www.reddit.com/r/vibecoding/comments/1kwuqpz/")
wait_for_load()
wait(3.0) # SPA still hydrating after readyState=complete
# Scroll to force comment tree lazy-load (twice, ~2000px each)
scroll(500, 500, dy=2000); wait(1.0)
scroll(500, 500, dy=2000); wait(1.0)
data = js(r"""
(()=>{
const postEl = document.querySelector('shreddit-post');
if(!postEl) return null;
const title = (postEl.querySelector('h1, [slot="title"]')||{}).innerText?.trim() || '';
const bodyEl = postEl.querySelector('[slot="text-body"] .md, [slot="text-body"]');
const body = bodyEl ? bodyEl.innerText.trim() : '';
const author = (postEl.querySelector('[slot="authorName"] a, a[data-testid="post_author_link"]')||{}).innerText?.trim() || '';
const subM = location.pathname.match(/^\/r\/([^\/]+)/);
const subreddit = subM ? subM[1] : '';
const scoreEl = postEl.querySelector('faceplate-number');
const score = scoreEl ? scoreEl.getAttribute('number') || scoreEl.innerText : '';
const comments = [];
for(const c of document.querySelectorAll('shreddit-comment[depth="0"]')){
const cBodyEl = c.querySelector('[slot="comment"] .md, [slot="comment"]');
const cBody = cBodyEl ? cBodyEl.innerText.trim() : '';
if(!cBody) continue;
comments.push({
author: c.getAttribute('author') || '',
score: c.getAttribute('score') || '',
body: cBody
});
if(comments.length >= 10) break;
}
return {subreddit, title, author, score, body, comments, url: location.href};
})()
""")
print(data["title"], "·", len(data["body"]), "chars ·", len(data["comments"]), "comments")
PY
Key selectors
| Target | Selector | Notes |
|---|---|---|
| Post container | shreddit-post | One per post page. Attributes include post-title, post-id, subreddit-name, author. |
| Post title | shreddit-post h1 or [slot="title"] | H1 is also the page title. |
| Post text body | shreddit-post [slot="text-body"] .md | .md is the rendered markdown container. For link posts this selector returns null (there is no body). |
| Post author name | [slot="authorName"] a | Plain text. |
| Vote score | shreddit-post faceplate-number | Read the number attribute (digit string) — innerText is abbreviated ("1.2k"). |
| Top-level comment | shreddit-comment[depth="0"] | Depth is an attribute — depth="1" is a reply, etc. |
| Comment body | shreddit-comment [slot="comment"] .md | Same pattern as post body. |
| Comment author / score | shreddit-comment attributes: author, score, created-timestamp | Use getAttribute, not DOM descendants. |
Share links
/s/<hash> URLs redirect before the SPA mounts. You don't need to resolve them manually — just new_tab(url) + wait_for_load() + wait(2), then read location.href for the canonical path.
Comment tree lazy-loading
New Reddit renders only the initial visible comments. To get more, scroll twice. ensureReplies / more placeholders exist but clicking them is brittle; scroll is the most reliable trigger. For a deep thread, loop scroll + wait until shreddit-comment count stabilizes between passes.
Login / gate detection
state = js("""
(()=>{
const loginWall = !!document.querySelector('a[href*="/login"], [data-testid="login-button"]');
const ageGate = !!document.querySelector('[data-testid="nsfw-gate"], shreddit-interstitial');
return {loginWall, ageGate};
})()
""")
If ageGate is true and you are logged in but haven't opted into NSFW content, the gate blocks extraction — toggle NSFW in account settings, not programmatically.
Gotchas
faceplate-number.innerTextis abbreviated ("1.2k", "16.6k"). Always prefergetAttribute('number')for the exact digit count.shreddit-commentis a custom element, not a<div>. CSS descendant selectors still work, but older jQuery-style parent traversals may not — stick to standard DOM.depth="0"is a string attribute.[depth="0"]in a CSS selector works;depth=0(no quotes) also works in the newer parser, but the quoted form is safest.- Collapsed comments render with body still in the DOM, but behind
expando-button. The.mdselector still grabs the text — you don't need to expand. - Post body can be empty. For link posts or image posts,
[slot="text-body"]doesn't exist; null-check before reading.innerText. wait_for_load()is not enough. Reddit sometimes paints the post skeleton before the content hydrates. Addwait(2.0)–wait(3.0)afterwait_for_load()or retry reads on nullshreddit-post.- Share URLs (
/s/<hash>) can't be deep-linked into a comment. They always land at the post top. If the original raindrop captured/s/..., the in-DOM permalink (read fromlocation.hrefafter load) is the canonical URL worth storing. - Old Reddit (
old.reddit.com) is a separate DOM — noshreddit-*elements, nofaceplate-*. If your login session was established on new Reddit,old.reddit.comwill still honor the cookie. - For NSFW or quarantined subs, the browser path requires your account to have opted in. The JSON API requires OAuth with appropriate scope.
[slot="text-body"] .md .md— Reddit occasionally double-wraps; the selector[slot="text-body"] .mdis the outer one and is what you want. Using[slot="text-body"]alone works too, but may include meta text.