Domain skill

soundcloud

Markdown synced from browser-harness domain skills.

Host
soundcloud
Files
1

Agent prompt

Use this skill

Copy this prompt into your coding agent to make it enable browser-harness domain skills and read this exact domain folder before automating.

Set up https://github.com/browser-use/browser-harness for me if it is not already installed. If setup is needed, read `install.md` first to install and connect it to my real browser. Then read `SKILL.md` for normal usage and always read `helpers.py` because that is where the browser-harness functions are.

Enable domain skills if they are not already enabled by setting `BH_DOMAIN_SKILLS=1` for browser-harness. Use the `soundcloud` domain skill from `agent-workspace/domain-skills/soundcloud/`. Read every markdown file for this domain before inventing an approach:
- agent-workspace/domain-skills/soundcloud/scraping.md

Use those domain-skill notes to complete my task for `soundcloud` in my real browser. When you open a setup, verification, or task tab, activate it so I can see the active browser tab.

Skill contents

What the agent will read

Data Extraction

scraping.md


Field-tested against soundcloud.com on 2026-04-18. No authentication required for any approach documented here. All code uses http_get (pure HTTP, no browser).


Approach 1 (Fastest): oEmbed API — No Auth, No Client ID

https://soundcloud.com/oembed?url=<resource_url>&format=json

Returns JSON in ~0.3s. Works for tracks, playlists/sets, and user profiles. No key required.

```python
from helpers import http_get
from urllib.parse import quote
import json

def soundcloud_oembed(resource_url):
    """Fetch oEmbed metadata for any public SoundCloud URL.

    Works for:
      - https://soundcloud.com/{user}/{track-slug}
      - https://soundcloud.com/{user}/sets/{playlist-slug}
      - https://soundcloud.com/{user}
    """
    # Percent-encode the resource URL so it is passed as a single query value
    encoded = quote(resource_url, safe="")
    url = f"https://soundcloud.com/oembed?url={encoded}&format=json"
    return json.loads(http_get(url))

# Track
track = soundcloud_oembed("https://soundcloud.com/forss/flickermood")
# {
#   "version": 1.0,
#   "type": "rich",
#   "provider_name": "SoundCloud",
#   "provider_url": "https://soundcloud.com",
#   "height": 400,
#   "width": "100%",
#   "title": "Flickermood by Forss",
#   "description": "From the Soulhack album...",
#   "thumbnail_url": "https://i1.sndcdn.com/artworks-000067273316-smsiqx-t500x500.jpg",
#   "html": "<iframe width=\"100%\" height=\"400\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?visual=true&url=...\">",
#   "author_name": "Forss",
#   "author_url": "https://soundcloud.com/forss"
# }

# Playlist/set
pl = soundcloud_oembed("https://soundcloud.com/forss/sets/soulhack")
# title="Soulhack by Forss", description="My 2003 debut album...", height=450

# User profile
user = soundcloud_oembed("https://soundcloud.com/forss")
# title="Forss", description="Artist & Founder SoundCloud", height=450
```

oEmbed fields

| Field | Type | Notes |
|---|---|---|
| `title` | str | "{Track Title} by {Artist}" for tracks and playlists, "{Name}" for users |
| `author_name` | str | Artist/user display name |
| `author_url` | str | Profile URL |
| `thumbnail_url` | str | Artwork at 500×500px (`t500x500`) |
| `description` | str | Track/profile description (may contain HTML entities) |
| `html` | str | Embed iframe for the SoundCloud player widget |
| `height` | int | 400 for tracks, 450 for playlists and users |
| `width` | str | Always "100%" |
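Because `title` fuses the track name and the artist, it can help to split them apart again. The sketch below assumes the "{Track Title} by {Artist}" shape listed in the table and uses `author_name` to find the suffix; `split_oembed_title` is a hypothetical helper, not part of `helpers.py`.

```python
def split_oembed_title(oembed):
    """Return (name, author) from an oEmbed response dict.

    Assumes titles end with " by {author_name}"; when that suffix is
    absent (for example on user profiles) the full title is returned.
    """
    title = oembed.get("title", "")
    author = oembed.get("author_name", "")
    suffix = f" by {author}"
    if author and title.endswith(suffix):
        return title[: -len(suffix)], author
    return title, author

sample = {"title": "Flickermood by Forss", "author_name": "Forss"}
print(split_oembed_title(sample))
```

Matching on the known author name rather than splitting on the first " by " keeps titles that themselves contain " by " intact.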

Approach 2: Page Hydration (__sc_hydration) — Rich Metadata, No Client ID

Every SoundCloud page embeds a JSON array in a <script> tag as window.__sc_hydration. This contains full API-grade metadata with no key required.

```python
from helpers import http_get
import json, re

def extract_hydration(page_url):
    """Extract __sc_hydration JSON from any SoundCloud page."""
    html = http_get(page_url)
    match = re.search(r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', html, re.DOTALL)
    if not match:
        return []
    return json.loads(match.group(1))

def get_hydration_by_type(page_url, hydratable):
    """Get the 'data' dict for a specific hydratable type."""
    for obj in extract_hydration(page_url):
        if obj.get('hydratable') == hydratable:
            return obj.get('data')
    return None

# Track page: hydration key is 'sound'
track = get_hydration_by_type("https://soundcloud.com/forss/flickermood", "sound")
# track['id']             = 293
# track['title']          = "Flickermood"
# track['playback_count'] = 962685
# track['likes_count']    = 2592
# track['duration']       = 213886  (milliseconds)
# track['genre']          = "Electronic"
# track['created_at']     = "2007-09-22T14:45:46Z"
# track['artwork_url']    = "https://i1.sndcdn.com/artworks-000067273316-smsiqx-large.jpg"
# track['waveform_url']   = "https://wave.sndcdn.com/cWHNerOLlkUq_m.json"
# track['streamable']     = True
# track['downloadable']   = True
# track['license']        = "all-rights-reserved"
# track['tag_list']       = "downtempo"
# track['urn']            = "soundcloud:tracks:293"
# track['media']          = {'transcodings': [...]}  (HLS/progressive stream URLs; need auth)
# track['user']           = {full user object nested}

# User page: hydration key is 'user'
user = get_hydration_by_type("https://soundcloud.com/forss", "user")
# user['id']               = 183
# user['username']         = "Forss"
# user['full_name']        = "Eric Quidenus-Wahlforss"
# user['followers_count']  = 132203
# user['track_count']      = 26
# user['verified']         = True
# user['city']             = "Berlin"
# user['country_code']     = "DE"
# user['description']      = "Artist & Founder SoundCloud"
# user['creator_subscription'] = {'product': {'id': 'creator-pro-unlimited'}}
# user['badges']           = {'pro_unlimited': True, 'verified': True}

# Playlist/set page: hydration key is 'playlist'
playlist = get_hydration_by_type("https://soundcloud.com/forss/sets/soulhack", "playlist")
# playlist['id']           = 18
# playlist['title']        = "Soulhack"
# playlist['track_count']  = 11
# playlist['tracks']       = [full track objects list]
# playlist['is_album']     = True/False
# playlist['genre']        = "Electronic"
```

All hydration keys on a typical page

| hydratable | Content |
|---|---|
| `sound` | Full track object (on track pages) |
| `playlist` | Full playlist + all tracks (on set pages) |
| `user` | Full user object (on any page with a profile) |
| `apiClient` | `{'id': '<client_id>', 'isExpiring': False}` (the `id` value is the client_id) |
| `geoip` | Viewer country/city/coordinates |
| `features` | Feature flags dict |
| `anonymousId` | Session tracking ID (not useful) |
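When several of these keys are needed from one page, it may be cheaper to parse the blob once and index it by `hydratable` instead of scanning the list per lookup. The following is a minimal sketch that takes already-fetched HTML; `index_hydration` and the synthetic `sample_html` are illustrative assumptions, and real pages carry many more fields.

```python
import json
import re

HYDRATION_RE = re.compile(
    r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', re.DOTALL
)

def index_hydration(html):
    """Map each hydratable name to its data payload; {} if the blob is missing."""
    match = HYDRATION_RE.search(html)
    if not match:
        return {}
    entries = json.loads(match.group(1))
    return {entry.get("hydratable"): entry.get("data") for entry in entries}

# Synthetic page for illustration only.
sample_html = (
    '<script>window.__sc_hydration = ['
    '{"hydratable": "apiClient", "data": {"id": "EXAMPLE", "isExpiring": false}},'
    '{"hydratable": "sound", "data": {"id": 293, "title": "Flickermood"}}'
    '];</script>'
)
by_type = index_hydration(sample_html)
print(by_type["sound"]["title"])
```

With a real page, the HTML would come from `http_get(page_url)` as in the functions above.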

Approach 3: API v2 — Full Query Power (Requires Client ID)

The client_id lives in every page's __sc_hydration under the apiClient key. The same value is returned from every page and does not appear to rotate on short timescales, so extract it once per run and reuse it.

```python
from helpers import http_get
from urllib.parse import urlencode
import json, re

def get_client_id(page_url="https://soundcloud.com"):
    """Extract client_id from any SoundCloud page's __sc_hydration."""
    html = http_get(page_url)
    match = re.search(r'window\.__sc_hydration\s*=\s*(\[.*?\]);\s*<', html, re.DOTALL)
    if not match:
        raise ValueError("No hydration found")
    for obj in json.loads(match.group(1)):
        if obj.get('hydratable') == 'apiClient':
            return obj['data']['id']
    raise ValueError("apiClient not found in hydration")

CLIENT_ID = get_client_id()  # "efg2kjLJnAJpInbN6P3hsHzispI1SKQH" (example; extract fresh)

def sc_api(path, **params):
    """Call api-v2.soundcloud.com. Returns parsed JSON."""
    params['client_id'] = CLIENT_ID
    # urlencode percent-encodes values such as permalink URLs and multi-word queries
    qs = urlencode(params)
    return json.loads(http_get(f"https://api-v2.soundcloud.com/{path}?{qs}"))
```

Resolve any URL to a resource

```python
# Resolve a permalink URL to get its resource with full metadata
track = sc_api("resolve", url="https://soundcloud.com/forss/flickermood")
# Returns: {'kind': 'track', 'id': 293, 'title': 'Flickermood', ...}

user = sc_api("resolve", url="https://soundcloud.com/forss")
# Returns: {'kind': 'user', 'id': 183, 'username': 'Forss', ...}
```
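Since resolve returns different resource shapes, callers usually branch on `kind` before reading fields. The sketch below handles only the two kinds shown above and reports anything else as unhandled; `describe_resource` is a hypothetical helper and the sample dicts are trimmed copies of the responses above.

```python
def describe_resource(resource):
    """Summarise a resolve response by its 'kind' field."""
    kind = resource.get("kind")
    if kind == "track":
        return f"track {resource['id']}: {resource['title']}"
    if kind == "user":
        return f"user {resource['id']}: {resource['username']}"
    # Other kinds are not covered by the examples above, so do not guess their fields.
    return f"unhandled kind: {kind}"

print(describe_resource({"kind": "track", "id": 293, "title": "Flickermood"}))
print(describe_resource({"kind": "user", "id": 183, "username": "Forss"}))
```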

Track lookup

```python
# Single track by numeric ID
track = sc_api("tracks/293")

# Bulk track lookup (comma-separated IDs; returns a list)
tracks = sc_api("tracks", ids="293,290,48031525")
# Returns a JSON array directly (not wrapped in 'collection')
for t in tracks:
    print(t['id'], t['title'], t['playback_count'])
```
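For long ID lists it is probably safer to send several smaller bulk requests than one very long query string. This sketch batches IDs and merges the returned lists. The batch size of 50 is an assumption (this document does not state a maximum), and `chunk_ids`, `bulk_tracks`, and the injected `fetch` callable are hypothetical names.

```python
def chunk_ids(ids, size=50):
    """Yield comma-separated ID strings with at most `size` IDs each.

    size=50 is an assumed batch size, not a documented API limit.
    """
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ",".join(str(i) for i in ids[start:start + size])

def bulk_tracks(ids, fetch, size=50):
    """Collect tracks across batches; `fetch` takes an ids string and returns a list."""
    tracks = []
    for batch in chunk_ids(ids, size):
        tracks.extend(fetch(batch))
    return tracks

# Stand-in fetcher so the sketch runs without network access.
fake_fetch = lambda batch: [{"id": int(x)} for x in batch.split(",")]
print(bulk_tracks([293, 290, 48031525], fake_fetch, size=2))
```

With the helpers above, `fetch` could be `lambda batch: sc_api("tracks", ids=batch)`.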

Search

```python
# Tracks
results = sc_api("search/tracks", q="jazz", limit=20)
# results['collection']    = list of track objects
# results['total_results'] = 5293248
# results['next_href']     = pagination URL (see below)

# Users
results = sc_api("search/users", q="jazz", limit=10)

# Playlists/sets
results = sc_api("search/playlists", q="jazz", limit=10)

# Paginate with next_href
def paginate(first_response):
    """Yield all pages of a collection response."""
    yield from first_response.get('collection', [])
    next_href = first_response.get('next_href')
    while next_href:
        page = json.loads(http_get(f"{next_href}&client_id={CLIENT_ID}"))
        yield from page.get('collection', [])
        next_href = page.get('next_href')
```
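Broad queries can report millions of results, so an uncapped generator may keep fetching far longer than intended. A capped variant could look like the sketch below; `paginate_capped`, `max_items`, and the injected `fetch_page` callable are hypothetical, and the in-memory pages stand in for real responses.

```python
def paginate_capped(first_response, fetch_page, max_items=100):
    """Yield at most max_items items, following next_href via fetch_page(href)."""
    yielded = 0
    page = first_response
    while True:
        for item in page.get("collection", []):
            if yielded >= max_items:
                return
            yield item
            yielded += 1
        next_href = page.get("next_href")
        # Stop before fetching another page once the cap is reached.
        if not next_href or yielded >= max_items:
            return
        page = fetch_page(next_href)

# In-memory stand-in for paged responses.
pages = {
    "p2": {"collection": [3, 4], "next_href": "p3"},
    "p3": {"collection": [5], "next_href": None},
}
first = {"collection": [1, 2], "next_href": "p2"}
print(list(paginate_capped(first, pages.__getitem__, max_items=3)))
```

With the helpers above, `fetch_page` could be `lambda href: json.loads(http_get(f"{href}&client_id={CLIENT_ID}"))`.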

Trending charts

```python
# Trending tracks across all genres
trending = sc_api("charts", kind="trending",
                  genre="soundcloud:genres:all-music", limit=20)
for item in trending['collection']:
    t = item['track']
    print(f"{t['title']} - score={item['score']:.4f}")

# Genre options: soundcloud:genres:all-music, soundcloud:genres:electronic,
#                soundcloud:genres:hiphoprap, soundcloud:genres:ambient, etc.
```

User resources

```python
user_id = 183  # numeric ID from resolve or hydration

# User's tracks
tracks = sc_api(f"users/{user_id}/tracks", limit=20)
# tracks['collection'] = list of track objects

# User's playlists
playlists = sc_api(f"users/{user_id}/playlists", limit=10)

# User's likes
likes = sc_api(f"users/{user_id}/likes", limit=10)

# Related tracks for a track
related = sc_api("tracks/293/related", limit=10)
# related['collection'] = list of track objects
```

Waveform data

```python
# Waveform URL comes from track['waveform_url']
waveform_url = "https://wave.sndcdn.com/cWHNerOLlkUq_m.json"
waveform = json.loads(http_get(waveform_url))
# {
#   'width': 1800,   # number of sample points
#   'height': 140,   # max amplitude value
#   'samples': [11, 86, 91, 80, ...]  # 1800 amplitude values
# }
```
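The raw samples are integers scaled to `height`, so a common next step is to normalise them to the 0..1 range and reduce them to a handful of bins for display. This is a minimal sketch of one way to do that; `normalize_waveform` and the `buckets` parameter are hypothetical, and the tiny waveform is made up for the example.

```python
def normalize_waveform(waveform, buckets=10):
    """Scale samples to 0..1 by 'height' and average them into roughly `buckets` bins.

    When the sample count is not a multiple of `buckets`, a few extra
    bins are produced rather than dropping trailing samples.
    """
    samples = waveform["samples"]
    height = waveform["height"] or 1  # guard against a zero height
    step = max(1, len(samples) // buckets)
    bins = []
    for start in range(0, len(samples), step):
        chunk = samples[start:start + step]
        bins.append(sum(chunk) / len(chunk) / height)
    return bins

tiny = {"width": 4, "height": 100, "samples": [0, 50, 100, 50]}
print(normalize_waveform(tiny, buckets=2))
```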

Full track fields from __sc_hydration / API v2

```
id                  int     Numeric track ID (e.g. 293)
urn                 str     "soundcloud:tracks:293"
title               str     Track title
description         str     May contain HTML entities/tags
genre               str     Genre string
tag_list            str     Space-separated tags
created_at          str     ISO 8601 UTC
last_modified       str     ISO 8601 UTC
release_date        str     ISO 8601 UTC (original release)
display_date        str     ISO 8601 UTC (shown to users)
duration            int     Milliseconds
full_duration       int     Milliseconds (untruncated)
playback_count      int
likes_count         int
reposts_count       int
comment_count       int
download_count      int
artwork_url         str     e.g. .../artworks-...-large.jpg (replace 'large' with 't500x500' for 500px)
waveform_url        str     https://wave.sndcdn.com/....json
permalink           str     Slug (e.g. "flickermood")
permalink_url       str     Full canonical URL
streamable          bool
downloadable        bool
license             str     e.g. "all-rights-reserved", "cc-by"
sharing             str     "public" or "private"
state               str     "finished" | "processing" | "failed"
monetization_model  str     "AD_SUPPORTED" | "SUB_HIGH_TIER" | "NOT_APPLICABLE"
embeddable_by       str     "all" | "me" | "none"
user                dict    Nested user object (id, username, avatar_url, verified, ...)
user_id             int     Owner numeric ID
publisher_metadata  dict    {artist, publisher, isrc, contains_music, ...}
media               dict    {'transcodings': [...]}: stream URLs (require OAuth, not usable without login)
label_name          str     Record label
purchase_url        str     External buy link
station_urn         str     "soundcloud:system-playlists:track-stations:{id}"
```
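Two of these fields usually need conversion before display: `duration` is in milliseconds and `created_at` is an ISO 8601 UTC string. The sketch below shows one way to convert them using only the standard library; `format_duration` and `parse_created_at` are hypothetical helpers, and the timestamp format is assumed from the "2007-09-22T14:45:46Z" example shown earlier.

```python
from datetime import datetime, timezone

def format_duration(ms):
    """Render a millisecond duration as M:SS, dropping fractional seconds."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"

def parse_created_at(value):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

track = {"duration": 213886, "created_at": "2007-09-22T14:45:46Z"}
print(format_duration(track["duration"]))
print(parse_created_at(track["created_at"]).isoformat())
```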

Gotchas

client_id is required for api-v2.soundcloud.com: requests without it return HTTP 401. Always extract it from the apiClient entry in __sc_hydration (its data['id'] value).

client_id source: hydration, not JS bundles — the JS bundles on a-v2.sndcdn.com do NOT contain the client_id pattern. The only reliable source is the apiClient object in the page hydration. It is stable across all pages (same value from homepage, track pages, user pages) and does not appear to rotate on short timescales.

Artwork URL sizes — hydration/API returns ...-large.jpg (100×100). Replace the size suffix to get larger images:

  • -large.jpg → 100×100
  • -t300x300.jpg → 300×300
  • -t500x500.jpg → 500×500 (oEmbed returns this size)
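The suffix swap described above can be wrapped in a small helper. This sketch assumes the size token always sits directly before the file extension, as in the example URLs in this document; `artwork_at_size` is a hypothetical name.

```python
import re

SIZE_TOKEN = re.compile(r"-(large|t\d+x\d+)(\.\w+)$")

def artwork_at_size(artwork_url, size="t500x500"):
    """Swap the size token just before the extension; pass empty values through."""
    if not artwork_url:
        return artwork_url  # defensive: nothing to rewrite
    return SIZE_TOKEN.sub(lambda m: f"-{size}{m.group(2)}", artwork_url)

url = "https://i1.sndcdn.com/artworks-000067273316-smsiqx-large.jpg"
print(artwork_at_size(url))
print(artwork_at_size(url, size="t300x300"))
```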

Regex must use re.DOTALL — the __sc_hydration JSON spans multiple lines. Without re.DOTALL, the . in the regex won't match newlines.

Stream URLs (media.transcodings) are gated — the HLS/progressive audio stream URLs in track['media']['transcodings'] require an OAuth token even to fetch a stream manifest. They cannot be played without a logged-in session.

Bulk track lookup returns a list, not a collection: GET /tracks?ids=... returns a JSON array directly. Do NOT look for .get('collection').

Search total_results can be huge — results like 5M+ are normal for broad queries. Use next_href for pagination; do not calculate offsets manually.

oEmbed description contains HTML: SoundCloud descriptions may include entities such as &nbsp; as well as anchor tags. html.unescape() decodes the entities only; strip the tags separately if you need plain text.
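A plain-text conversion needs both steps: remove tags, then decode entities. The sketch below uses a deliberately crude regex for tag removal, which is adequate for simple anchor tags but is not a general HTML parser; `description_to_text` and the sample string are illustrative assumptions.

```python
import html
import re

def description_to_text(description):
    """Strip simple HTML tags, decode entities, and normalise non-breaking spaces."""
    no_tags = re.sub(r"<[^>]+>", "", description or "")
    text = html.unescape(no_tags)
    # &nbsp; decodes to U+00A0; replace it with a regular space.
    return text.replace("\xa0", " ").strip()

raw = 'From the&nbsp;album <a href="https://example.com">Soulhack</a> &amp; more'
print(description_to_text(raw))
```

Tags are removed before decoding so that entity-encoded angle brackets in the text are kept as literal characters.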

HTTP 400 on some endpoints: /tracks/{id}/comments returns 400 without OAuth headers. Timed comments are not accessible without login.

No browser required — all documented approaches work with plain http_get. SoundCloud does not require JavaScript rendering for metadata extraction.

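Rate limits: 20 rapid sequential API v2 requests completed without errors in testing. SoundCloud does not publish official rate limits, and a 20-request test does not establish a safe sustained rate; treat ~50 req/s as a rough upper bound rather than a measured limit, and slow down if requests start failing. oEmbed is more lenient than api-v2.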
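Because the limits are unpublished, a retry wrapper with exponential backoff is a cautious default for sustained scraping. The sketch below assumes the fetch function raises an exception on failure, which this document does not confirm for `http_get`; `fetch_with_backoff` and its parameters are hypothetical.

```python
import time

def fetch_with_backoff(fetch, url, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url); on an exception wait base_delay * 2**attempt and retry.

    Assumes `fetch` raises on failure. Re-raises after the final attempt.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)

# Stand-in fetcher that fails twice, then succeeds.
state = {"calls": 0}
def flaky(url):
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

delays = []
print(fetch_with_backoff(flaky, "https://example.invalid", sleep=delays.append))
```

Injecting `sleep` keeps the example fast to run; in real use the default `time.sleep` applies the delays.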


Quick Reference

| Goal | Approach | Auth |
|---|---|---|
| Track title/author/thumbnail from URL | oEmbed | None |
| Full track metadata + play counts | `__sc_hydration` `sound` key | None |
| Full user profile + stats | `__sc_hydration` `user` key | None |
| Full playlist with all tracks | `__sc_hydration` `playlist` key | None |
| Search tracks/users/playlists | API v2 `/search/*` | client_id |
| Trending charts | API v2 `/charts` | client_id |
| Bulk track lookup by IDs | API v2 `/tracks?ids=` | client_id |
| User's track list | API v2 `/users/{id}/tracks` | client_id |
| Resolve permalink to resource | API v2 `/resolve?url=` | client_id |
| Waveform amplitude data | Direct fetch of `waveform_url` | None |
| Audio stream playback | OAuth login required | Login |