Domain skill

facebook

Markdown synced from browser-harness domain skills.

Host: facebook
Files: 2

Agent prompt

Use this skill

Copy this prompt into your coding agent to make it enable browser-harness domain skills and read this exact domain folder before automating.

Set up https://github.com/browser-use/browser-harness for me if it is not already installed. If setup is needed, read `install.md` first to install and connect it to my real browser. Then read `SKILL.md` for normal usage and always read `helpers.py` because that is where the browser-harness functions are.

Enable domain skills if they are not already enabled by setting `BH_DOMAIN_SKILLS=1` for browser-harness. Use the `facebook` domain skill from `agent-workspace/domain-skills/facebook/`. Read every markdown file for this domain before inventing an approach:
- agent-workspace/domain-skills/facebook/groups.md
- agent-workspace/domain-skills/facebook/pages.md

Use those domain-skill notes to complete my task for `facebook` in my real browser. When you open a setup, verification, or task tab, activate it so I can see the active browser tab.

Skill contents

What the agent will read

mining feeds for posts + external URLs

groups.md

Source

Field-tested against a logged-in Jay account on 2026-04-18. Requires: Browser Harness driving a real Chrome that is (a) signed into Facebook and (b) already a member of the target group. Non-member or logged-out views...
Pull the N most recent posts from a named FB group
Harvest every external URL that members have shared
Hand that URL list to httpget or another downstream extractor for structured scraping at scale

Show full markdown

Field-tested against a logged-in Jay account on 2026-04-18. Requires: Browser Harness driving a real Chrome that is (a) signed into Facebook and (b) already a member of the target group. Non-member or logged-out views serve a stripped landing page with no post content.

What this skill is for

Pull the N most recent posts from a named FB group
Harvest every external URL that members have shared
Hand that URL list to http_get or another downstream extractor for structured scraping at scale
Cache post text + author + timestamp for downstream keyword matching

It is NOT for: replying in groups, DMing members, or any write action.

URL patterns

What	URL
Group main feed	`https://www.facebook.com/groups/{id_or_slug}`
Group "Discussion" tab (canonical feed)	`https://www.facebook.com/groups/{id_or_slug}/?sorting_setting=CHRONOLOGICAL`
Single post (permalink)	`https://www.facebook.com/groups/{id_or_slug}/posts/{post_id}/`
User's joined-groups feed	`https://www.facebook.com/groups/feed/`
List of YOUR groups	`https://www.facebook.com/groups/joins/`

The ?sorting_setting=CHRONOLOGICAL flag matters — without it, FB inserts an algorithmic ranking that hides older posts and shows the same handful of "popular" items every visit, which kills monitoring use cases.

DOM anchors (verified 2026-04-18)

FB rewrites class names every few weeks but ARIA roles and stable URL patterns hold up well. Anchor on those, not on hashed CSS classes.

Anchor	Selector	Notes
Each post container	`div[role="article"]`	Stable. One per visible post.
Post permalink	`a[href="/groups/"][href="/posts/"], a[href="/groups/"][href="/permalink/"]`	First match per article = the post link
Post body text	`div[data-ad-preview="message"], div[data-ad-comet-preview="message"]`	One of these is the visible body
Post author	`h3 a, h4 a` (first inside the article)	Falls back to `strong a`
Post timestamp	`a[href*="/posts/"] abbr, a[role="link"] > span > span` (relative time text)	Hover gets the absolute time but the relative string is fine for sorting
External link (FB redirector)	`a[href^="https://l.facebook.com/l.php?u="]`	Decode the `u=` param to get the real URL
"See more" button on long posts	`div[role="button"]:has(span:contains("See more"))` (use XPath fallback if `:has` is unsupported)	Click before reading body or posts get truncated

If selectors stop returning results, run the self-inspection block at the bottom of this file and update this table — that's the workflow, not a fallback.

Scrolling the feed (lazy load)

FB virtualizes the feed: scrolled-past posts get unmounted from the DOM. So "scroll then collect" misses old posts. Pattern that works: collect-as-you-go.

python

seen = {}  # post_url -> dict
TARGET = 50  # how many posts to collect
MAX_SCROLLS = 30

for i in range(MAX_SCROLLS):
    new_posts = js("""
      Array.from(document.querySelectorAll('div[role="article"]')).map(el => {
        const link = el.querySelector('a[href*="/groups/"][href*="/posts/"], a[href*="/groups/"][href*="/permalink/"]');
        const body = el.querySelector('div[data-ad-preview="message"], div[data-ad-comet-preview="message"]');
        const author = el.querySelector('h3 a, h4 a, strong a');
        const time = el.querySelector('abbr, a[role="link"] > span > span');
        const externals = Array.from(el.querySelectorAll('a[href^="https://l.facebook.com/l.php?u="]'))
          .map(a => a.href);
        return {
          url: link?.href || null,
          author: author?.innerText || null,
          time: time?.innerText || null,
          body: body?.innerText?.slice(0, 4000) || null,
          externals: externals,
        };
      }).filter(p => p.url)
    """) or []
    for p in new_posts:
        seen.setdefault(p["url"], p)
    if len(seen) >= TARGET:
        break
    scroll(640, 400, dy=900)  # scroll near middle of viewport
    wait(2.5)  # FB needs ~2s to render new batch + a little buffer

wait(2.5) is the floor. Faster than that and you'll see empty post containers because React hasn't hydrated them yet.

Decoding the external-URL redirector

Every external link gets wrapped in https://l.facebook.com/l.php?u={URL-encoded real URL}&h=.... You want the real URL, not the redirector.

python

from urllib.parse import urlparse, parse_qs, unquote
def decode_fb_link(href):
    if not href.startswith("https://l.facebook.com/l.php"):
        return href
    q = parse_qs(urlparse(href).query)
    return unquote(q["u"][0]) if "u" in q else href

Handoff for the public outbound URLs

Once you have the harvested external list, those URLs are outside FB's walled garden — public, scrapable by ordinary HTTP clients or downstream extractors. Typed extraction is useful here because the sources are heterogeneous.

python

# After the scroll loop:
external_urls = []
for p in seen.values():
    for raw in p["externals"]:
        external_urls.append(decode_fb_link(raw))
external_urls = sorted(set(external_urls))
print(f"harvested {len(external_urls)} unique external URLs")

# Hand off to a downstream extractor in the calling conversation with whatever
# schema matches the task, such as product/listing name, price, location, year,
# and key features.

For simple or static pages, http_get(url) from Harness itself is fine — it does a plain HTTP fetch without a browser and is the fastest option for bulk.

Rate-limit discipline

FB notices automation patterns at the account level, not the IP level. Driving a real logged-in session means Jay's account is the one getting rate-limited if you get greedy. Keep these floors:

≥2 seconds between scrolls in the collect loop (the wait(2.5) above)
≥3 seconds between groups if you're sweeping multiple
No more than ~6 groups per hour for sustained monitoring
Don't open the same group more than every 15 minutes — repeated visits within a short window is a heuristic that triggers checkpoints

Symptoms of over-pacing: article containers start rendering with empty bodies, /groups/{id}/ redirects to /checkpoint/, or the account briefly gets asked to re-verify a phone or confirm a login from a new device. If that happens, stop immediately and let Jay deal with the UI — don't try to auto-resolve.

Self-inspection block (run this when selectors stop working)

Paste this into a Harness stdin block to see what anchors currently exist in the visible feed. Run it on a group you're a member of.

python

print(js("""
  ({
    articles: document.querySelectorAll('div[role="article"]').length,
    body_preview_a: document.querySelectorAll('div[data-ad-preview="message"]').length,
    body_preview_b: document.querySelectorAll('div[data-ad-comet-preview="message"]').length,
    external_redirectors: document.querySelectorAll('a[href^="https://l.facebook.com/l.php?u="]').length,
    permalink_posts: document.querySelectorAll('a[href*="/groups/"][href*="/posts/"]').length,
    permalink_permalinks: document.querySelectorAll('a[href*="/groups/"][href*="/permalink/"]').length,
  })
"""))
# If any count is 0, the selector drifted. Open DevTools, right-click a visible
# post, inspect, find the new stable attribute (aria-*, data-*), and update the
# DOM anchors table above.

Full example — mine one group, emit JSON for downstream tools

bash

cd ~/Developer/browser-harness && uv run browser-harness <<'PY'
import json, sys
from urllib.parse import urlparse, parse_qs, unquote

GROUP = "riceLakeBoating"          # slug or numeric id
TARGET = 50                         # how many posts to collect
MAX_SCROLLS = 30

goto_url(f"https://www.facebook.com/groups/{GROUP}/?sorting_setting=CHRONOLOGICAL")
wait_for_load()
wait(2)

# Abort if FB bounced us
info = page_info()
if "/checkpoint/" in info["url"] or "/login" in info["url"]:
    sys.exit("AUTH_WALL — stop and have Jay re-verify the account.")

seen = {}
for _ in range(MAX_SCROLLS):
    batch = js("""
      Array.from(document.querySelectorAll('div[role="article"]')).map(el => {
        const link = el.querySelector('a[href*="/groups/"][href*="/posts/"], a[href*="/groups/"][href*="/permalink/"]');
        const body = el.querySelector('div[data-ad-preview="message"], div[data-ad-comet-preview="message"]');
        const author = el.querySelector('h3 a, h4 a, strong a');
        const time = el.querySelector('abbr, a[role="link"] > span > span');
        const externals = Array.from(el.querySelectorAll('a[href^="https://l.facebook.com/l.php?u="]')).map(a => a.href);
        return { url: link?.href, author: author?.innerText, time: time?.innerText,
                 body: body?.innerText?.slice(0, 4000), externals };
      }).filter(p => p.url)
    """) or []
    for p in batch:
        seen.setdefault(p["url"], p)
    if len(seen) >= TARGET:
        break
    scroll(640, 400, dy=900)
    wait(2.5)

def decode(u):
    if not u.startswith("https://l.facebook.com/l.php"): return u
    q = parse_qs(urlparse(u).query)
    return unquote(q["u"][0]) if "u" in q else u

posts = list(seen.values())
all_externals = sorted({decode(x) for p in posts for x in p["externals"]})
capture_screenshot(f"/tmp/fb-group-{GROUP}.png", full=True)
print(json.dumps({
    "group": GROUP,
    "post_count": len(posts),
    "posts": posts,
    "external_urls": all_externals,
}, ensure_ascii=False))
PY

The JSON on stdout is the handoff payload — parse it in the calling agent and route external_urls into the downstream extractor that matches the task (competitor inventory, pricing intel, boat listings, etc).

Gotchas log (append when you hit something new)

2026-04-18: Fresh install verified. People-search URL requires login; page search /search/pages/?q= works the same way. Groups feed defaults to algorithmic sort — always append ?sorting_setting=CHRONOLOGICAL.

mining a public Page's feed for posts + external URLs

pages.md

Source

Companion to groups.md. Most of the DOM surface is shared because FB renders post articles from the same React component in both contexts — the differences are the URL shapes, the sort options, and the rate-limit...
Requires: a real Chrome driven by Browser Harness. Logged-in is recommended but not strictly required — FB Pages are public. Logged-out sessions get more aggressive "see more" gating and an interstitial login prompt...
Pull the N most recent posts from a named FB Page (brand, publisher, local business)
Harvest every external URL the Page has linked out to

Show full markdown

Companion to groups.md. Most of the DOM surface is shared because FB renders post articles from the same React component in both contexts — the differences are the URL shapes, the sort options, and the rate-limit ceiling (Pages are public, so FB is a little more forgiving than in member-gated Groups).

Requires: a real Chrome driven by Browser Harness. Logged-in is recommended but not strictly required — FB Pages are public. Logged-out sessions get more aggressive "see more" gating and an interstitial login prompt that breaks the scroll loop after ~5 posts. Stay signed in.

What this skill is for

Pull the N most recent posts from a named FB Page (brand, publisher, local business)
Harvest every external URL the Page has linked out to
Grab Page metadata — follower count, category, website, verified status
Hand the outbound URL list to http_get or another downstream extractor

It is NOT for: leaving comments, reacting, messaging the Page, or any write action.

URL patterns

Pages can be addressed by either a vanity slug (/BoatingOntario.ca) or a numeric Page ID (/100064...). Vanity is more legible; numeric is more stable (vanities can be changed by the page owner).

What	URL
Page main feed (default tab)	`https://www.facebook.com/{vanity_or_id}`
Page Posts tab (canonical post feed)	`https://www.facebook.com/{vanity_or_id}/posts`
Page About	`https://www.facebook.com/{vanity_or_id}/about`
Page Reviews	`https://www.facebook.com/{vanity_or_id}/reviews`
Page Videos	`https://www.facebook.com/{vanity_or_id}/videos`
Page Events	`https://www.facebook.com/{vanity_or_id}/events`
Single post (vanity permalink)	`https://www.facebook.com/{vanity_or_id}/posts/pfbid{...}`
Single post (legacy permalink)	`https://www.facebook.com/permalink.php?story_fbid={story_id}&id={page_id}`
Single post (story permalink)	`https://www.facebook.com/story.php?story_fbid={story_id}&id={page_id}`
Page-search (find a Page by name)	`https://www.facebook.com/search/pages/?q={query}`

Unlike Groups, Pages do not support ?sorting_setting=CHRONOLOGICAL — the Posts tab is the closest thing to a chronological view, and it's reverse-chrono by default. Don't rely on perfect ordering: pinned posts always appear first, and FB occasionally reorders the top few based on engagement.

DOM anchors

Post-article anchors are the same as groups.md because the feed component is shared. Page-chrome anchors (header, about-rail, tabs) are specific to Pages.

Anchor	Selector	Notes
Page display name	`h1` (first on page)	Stable — FB has rendered Page name as the top-level `h1` for years
Verified badge	`h1 svg[aria-label*="Verified"]`	Present on verified Pages only
Follower/like count	`a[href$="/followers/"], a[href$="/friends_likes/"]`	Text node contains the count — parse with a regex
Category line	`div[role="main"] span:has(a[href*="/pages/category/"])`	Sits under the name in the header
Website link in header	`a[href^="https://l.facebook.com/l.php"][href*="u="]` inside the About rail	Same redirector wrapper as post links — decode before using
Each post container	`div[role="article"]`	Same as groups
Post permalink	`a[href="/posts/"][href="pfbid"], a[href="/permalink.php"], a[href="/story.php"]`	Page posts use `pfbid...` style or the legacy `permalink.php`/`story.php` shapes
Post body text	`div[data-ad-preview="message"], div[data-ad-comet-preview="message"]`	Same as groups
Post author	`h3 a, h4 a, strong a`	On a Page, this is always the Page itself — useful only for sanity checking you're still on the right Page
Post timestamp	`a[href*="/posts/"] abbr, a[role="link"] > span > span`	Hover returns absolute time; relative string is fine for sorting
External link (FB redirector)	`a[href^="https://l.facebook.com/l.php?u="]`	Decode the `u=` param
"See more" on long posts	`div[role="button"]:has(span:contains("See more"))`	Click before reading body or posts get truncated

If a selector stops returning results, run the self-inspection block at the bottom and update this table — that's the workflow, not a fallback.

Extracting Page metadata (header block)

Unlike a Group, a Page's header carries useful signal on its own — category, verified, follower count, website. Pull it in one JS call before you start scrolling the feed.

python

meta = js("""
  ({
    name: document.querySelector('h1')?.innerText || null,
    verified: !!document.querySelector('h1 svg[aria-label*="Verified"]'),
    followers: (Array.from(document.querySelectorAll('a'))
      .find(a => /followers$/.test(a.getAttribute('href')||''))?.innerText) || null,
    likes: (Array.from(document.querySelectorAll('a'))
      .find(a => /friends_likes$/.test(a.getAttribute('href')||''))?.innerText) || null,
    category: (Array.from(document.querySelectorAll('a[href*="/pages/category/"]'))[0]?.innerText) || null,
    website_redirector: (Array.from(document.querySelectorAll('a[href^="https://l.facebook.com/l.php"]'))
      .find(a => !a.closest('div[role="article"]'))?.href) || null,
  })
""")

Decode website_redirector with the same helper as post links (see below).

Scrolling the feed (lazy load)

Same collect-as-you-go pattern as groups. FB virtualizes the Page feed too — scrolled-past posts unmount, so scroll-then-collect loses them.

python

seen = {}  # permalink -> dict
TARGET = 50
MAX_SCROLLS = 30

for i in range(MAX_SCROLLS):
    batch = js("""
      Array.from(document.querySelectorAll('div[role="article"]')).map(el => {
        const link = el.querySelector('a[href*="/posts/"][href*="pfbid"], a[href*="/permalink.php"], a[href*="/story.php"]');
        const body = el.querySelector('div[data-ad-preview="message"], div[data-ad-comet-preview="message"]');
        const time = el.querySelector('abbr, a[role="link"] > span > span');
        const externals = Array.from(el.querySelectorAll('a[href^="https://l.facebook.com/l.php?u="]'))
          .map(a => a.href);
        return {
          url: link?.href || null,
          time: time?.innerText || null,
          body: body?.innerText?.slice(0, 4000) || null,
          externals: externals,
        };
      }).filter(p => p.url)
    """) or []
    for p in batch:
        seen.setdefault(p["url"], p)
    if len(seen) >= TARGET:
        break
    scroll(640, 400, dy=900)
    wait(2.5)

Notes:

Page feeds are usually less dense than active Group feeds — a slow Page may only render 8–15 posts total before you hit the footer. Use if len(batch) == 0 for two consecutive iterations as a stop condition.
Pinned posts re-appear at the top on every fresh load. The seen dict dedupes them naturally via permalink.

Decoding the external-URL redirector

Identical to groups.md — every outbound link is wrapped in https://l.facebook.com/l.php?u={URL-encoded real URL}&h=.... Strip the wrapper.

python

from urllib.parse import urlparse, parse_qs, unquote
def decode_fb_link(href):
    if not href.startswith("https://l.facebook.com/l.php"):
        return href
    q = parse_qs(urlparse(href).query)
    return unquote(q["u"][0]) if "u" in q else href

Handoff for outbound URLs

Same pattern as groups — Pages are the walled-garden surface that Harness is good at; the external URLs the Page has shared are public and better suited to ordinary HTTP clients or downstream extractors.

python

external_urls = sorted({decode_fb_link(x) for p in seen.values() for x in p["externals"]})
print(f"harvested {len(external_urls)} unique external URLs from Page")
# In the calling conversation:
#   send external_urls to the downstream extractor that matches the task schema

Rate-limit discipline

Pages are public, so the ceiling is higher than Groups — but the account-level detection still applies, because you're driving a real logged-in session.

≥2 seconds between scrolls inside the collect loop
≥2 seconds between Pages if you're sweeping multiple (down from 3s for Groups)
No more than ~12 Pages per hour for sustained monitoring (up from 6 Groups/hr)
Don't re-open the same Page within 10 minutes — repeated hits inside a short window is a heuristic that triggers soft-throttling even on public content

Symptoms of over-pacing: the "See more" links on long posts stop being clickable, the login interstitial appears even though you're signed in, or the URL silently redirects to /login/device-based/. If any of those fire, stop, let Jay look at the screen, and don't try to auto-resolve.

Self-inspection block (run when selectors stop working)

python

print(js("""
  ({
    articles: document.querySelectorAll('div[role="article"]').length,
    body_preview_a: document.querySelectorAll('div[data-ad-preview="message"]').length,
    body_preview_b: document.querySelectorAll('div[data-ad-comet-preview="message"]').length,
    external_redirectors: document.querySelectorAll('a[href^="https://l.facebook.com/l.php?u="]').length,
    pfbid_posts: document.querySelectorAll('a[href*="/posts/"][href*="pfbid"]').length,
    permalink_php: document.querySelectorAll('a[href*="/permalink.php"]').length,
    story_php: document.querySelectorAll('a[href*="/story.php"]').length,
    h1_present: !!document.querySelector('h1'),
  })
"""))
# If any count is 0 on a Page you know has posts, the selector drifted.
# Open DevTools, inspect a post, find the new stable attribute, update the
# DOM anchors table above.

Full example — mine one Page, emit JSON for downstream tools

bash

cd ~/Developer/browser-harness && uv run browser-harness <<'PY'
import json, sys
from urllib.parse import urlparse, parse_qs, unquote

PAGE = "BoatingOntario.ca"   # vanity slug OR numeric Page ID
TARGET = 30
MAX_SCROLLS = 25

goto_url(f"https://www.facebook.com/{PAGE}/posts")
wait_for_load()
wait(3)

info = page_info()
if "/checkpoint/" in info["url"] or "/login" in info["url"]:
    sys.exit("AUTH_WALL — stop and have the account re-verify.")

# Header metadata
meta = js("""
  ({
    name: document.querySelector('h1')?.innerText || null,
    verified: !!document.querySelector('h1 svg[aria-label*="Verified"]'),
    category: (Array.from(document.querySelectorAll('a[href*="/pages/category/"]'))[0]?.innerText) || null,
    followers: (Array.from(document.querySelectorAll('a'))
      .find(a => /followers$/.test(a.getAttribute('href')||''))?.innerText) || null,
    website_redirector: (Array.from(document.querySelectorAll('a[href^="https://l.facebook.com/l.php"]'))
      .find(a => !a.closest('div[role="article"]'))?.href) || null,
  })
""")

# Feed sweep
seen = {}
empty_streak = 0
for _ in range(MAX_SCROLLS):
    batch = js("""
      Array.from(document.querySelectorAll('div[role="article"]')).map(el => {
        const link = el.querySelector('a[href*="/posts/"][href*="pfbid"], a[href*="/permalink.php"], a[href*="/story.php"]');
        const body = el.querySelector('div[data-ad-preview="message"], div[data-ad-comet-preview="message"]');
        const time = el.querySelector('abbr, a[role="link"] > span > span');
        const externals = Array.from(el.querySelectorAll('a[href^="https://l.facebook.com/l.php?u="]')).map(a => a.href);
        return { url: link?.href, time: time?.innerText,
                 body: body?.innerText?.slice(0, 4000), externals };
      }).filter(p => p.url)
    """) or []
    before = len(seen)
    for p in batch:
        seen.setdefault(p["url"], p)
    empty_streak = empty_streak + 1 if len(seen) == before else 0
    if len(seen) >= TARGET or empty_streak >= 2:
        break
    scroll(640, 400, dy=900)
    wait(2.5)

def decode(u):
    if not u.startswith("https://l.facebook.com/l.php"): return u
    q = parse_qs(urlparse(u).query)
    return unquote(q["u"][0]) if "u" in q else u

posts = list(seen.values())
if meta.get("website_redirector"):
    meta["website"] = decode(meta["website_redirector"])
all_externals = sorted({decode(x) for p in posts for x in p["externals"]})
capture_screenshot(f"/tmp/fb-page-{PAGE}.png", full=True)
print(json.dumps({
    "page": PAGE,
    "meta": meta,
    "post_count": len(posts),
    "posts": posts,
    "external_urls": all_externals,
}, ensure_ascii=False))
PY

The stdout JSON is the handoff payload — parse it in the calling agent and route external_urls into a downstream extractor, route meta into a competitor-intel table, or feed posts into keyword matching.

When to reach for pages.md vs groups.md

If the URL is...	Use
`facebook.com/groups/{id_or_slug}`	`groups.md`
`facebook.com/{vanity}` or `facebook.com/{numeric_id}`	`pages.md`
`facebook.com/profile.php?id={id}`	neither — that's a personal profile, different DOM and much stricter rate limits
`facebook.com/marketplace/...`	neither — dedicated Marketplace skill needed

A quick way to tell Pages from personal profiles when the URL shape is ambiguous: Pages have an h1 with a verified-badge slot and a category link underneath; personal profiles have a cover photo component and a "Friends" tab.

Gotchas log (append when you hit something new)

Initial version: Post-article selectors inherited from groups.md because FB renders the feed article component identically across Group and Page contexts. Run the self-inspection block on first live use to confirm no drift since the groups.md verification date, and append a note here with what you found.