Domain skill
musicbrainz
Markdown synced from browser-harness domain skills.
- Host: musicbrainz
- Files: 1
Agent prompt
Use this skill
Copy this prompt into your coding agent to make it enable browser-harness domain skills and read this exact domain folder before automating.
Set up https://github.com/browser-use/browser-harness for me if it is not already installed. If setup is needed, read `install.md` first to install and connect it to my real browser. Then read `SKILL.md` for normal usage and always read `helpers.py` because that is where the browser-harness functions are. Enable domain skills if they are not already enabled by setting `BH_DOMAIN_SKILLS=1` for browser-harness. Use the `musicbrainz` domain skill from `agent-workspace/domain-skills/musicbrainz/`. Read every markdown file for this domain before inventing an approach: - agent-workspace/domain-skills/musicbrainz/scraping.md Use those domain-skill notes to complete my task for `musicbrainz` in my real browser. When you open a setup, verification, or task tab, activate it so I can see the active browser tab.
Skill contents
What the agent will read
Data Extraction
scraping.md
https://musicbrainz.org — open music encyclopedia with a fully free JSON API.
No auth required for reads. No browser needed for any documented workflow.
Field-tested against musicbrainz.org on 2026-04-18.
Do this first
The MusicBrainz Web Service API (ws/2) returns clean JSON for all entity types — no browser needed.
```python
from helpers import http_get
import json

# REQUIRED: every request must include this header or you get HTTP 403
UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}

data = json.loads(http_get("https://musicbrainz.org/ws/2/artist/?query=queen&fmt=json&limit=5", headers=UA))
for a in data['artists']:
    print(a['id'], a['name'], a.get('type'), a.get('country'), a['score'])
# 0383dadf-2a4e-4d10-a46a-e9e041da8eb3 Queen Group GB 100
# 79239441-bfd5-4981-a70c-55c3f15c1287 Madonna Person US 73
```
User-Agent is mandatory — omitting it returns HTTP 403 immediately. Format: AppName/Version (contact@email.com).
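The following is a minimal sketch for building that header and catching a malformed one before it reaches the API. `make_user_agent` and `UA_PATTERN` are hypothetical names. The regex only checks the `AppName/Version (contact)` shape described above; it does not verify that the contact address is real.

```python
import re

# Hypothetical helper: only checks the "AppName/Version (contact)" shape,
# not whether the contact address actually works
UA_PATTERN = re.compile(r"^[^\s/]+/[^\s/]+ \([^()]+\)$")


def make_user_agent(app: str, version: str, contact: str) -> dict:
    """Return a headers dict carrying a MusicBrainz-style User-Agent."""
    ua = f"{app}/{version} ({contact})"
    if not UA_PATTERN.match(ua):
        raise ValueError(f"malformed User-Agent: {ua!r}")
    return {"User-Agent": ua}


UA = make_user_agent("browser-harness", "1.0", "your@email.com")
# Pass it on every call, e.g. http_get(url, headers=UA)
```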
Entity types
| Entity | Endpoint | Key fields |
|---|---|---|
| artist | /ws/2/artist/ | name, sort-name, type (Group/Person/Orchestra/Choir), country, life-span, tags, rating |
| release-group | /ws/2/release-group/ | title, primary-type (Album/Single/EP/Other), first-release-date |
| release | /ws/2/release/ | title, date, country, status (Official/Bootleg/Promotion), barcode, label-info, media |
| recording | /ws/2/recording/ | title, length (milliseconds), artist-credit, releases |
| label | /ws/2/label/ | name, type, country, area |
| work | /ws/2/work/ | title, type (Song/Aria/Soundtrack/etc.), relations |
All entities share the same MBID (MusicBrainz ID) format: UUID v4, e.g. 0383dadf-2a4e-4d10-a46a-e9e041da8eb3.
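Every entity uses the same ID format, so one check covers artist, release, recording and other MBIDs before they are put into a URL. This is a minimal sketch using only the standard-library `uuid` module, and `is_mbid` is a hypothetical helper. It deliberately checks only the canonical hyphenated form, not the UUID version.

```python
import uuid


def is_mbid(value) -> bool:
    """True if value is a UUID written in the canonical 8-4-4-4-12 hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, TypeError, AttributeError):
        return False
    # uuid.UUID() also accepts braces, a "urn:uuid:" prefix and bare hex,
    # so compare with the canonical string to reject those spellings
    return str(parsed) == value.lower()


print(is_mbid("0383dadf-2a4e-4d10-a46a-e9e041da8eb3"))  # True
print(is_mbid("queen"))                                 # False
```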
Common workflows
Artist search
```python
from helpers import http_get
import json

UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/artist/?query=queen&fmt=json&limit=5",
    headers=UA
))
# resp keys: count (total matches), offset, artists (list)
for a in resp['artists']:
    print(a['id'])                      # MBID: 0383dadf-2a4e-4d10-a46a-e9e041da8eb3
    print(a['name'])                    # Queen
    print(a['sort-name'])               # Queen (differs for persons: "Bowie, David")
    print(a.get('type'))                # Group / Person / Orchestra / Choir
    print(a.get('country'))             # GB
    print(a.get('life-span'))           # {'begin': '1970-06-27', 'end': None, 'ended': True}
    print(a.get('disambiguation', ''))  # e.g. "English singer-songwriter"
    print(a['score'])                   # relevance 0-100
```
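Several results can share a name or a score, so a small filter over the parsed `artists` list makes the choice explicit. This is a minimal sketch that assumes the field names shown above. `pick_artist` is hypothetical, and the sample list is trimmed from the example output rather than taken from a live response. Printing `disambiguation` next to the ID makes ambiguous matches easy to spot.

```python
def pick_artist(artists, name, type_=None, country=None):
    """Hypothetical helper: choose one artist from a /ws/2/artist/ search result.

    Keeps exact (case-insensitive) name matches, optionally filtered by
    type and country, and returns the highest-scoring one, or None.
    """
    wanted = name.casefold()
    candidates = [
        a for a in artists
        if a.get("name", "").casefold() == wanted
        and (type_ is None or a.get("type") == type_)
        and (country is None or a.get("country") == country)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a.get("score", 0))


# Trimmed sample shaped like the search output above (not a live response)
artists = [
    {"id": "0383dadf-2a4e-4d10-a46a-e9e041da8eb3", "name": "Queen",
     "type": "Group", "country": "GB", "score": 100},
    {"id": "79239441-bfd5-4981-a70c-55c3f15c1287", "name": "Madonna",
     "type": "Person", "country": "US", "score": 73},
]
best = pick_artist(artists, "queen", type_="Group")
print(best["id"], repr(best.get("disambiguation", "")))
```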
Artist by MBID (with related data via inc=)
```python
# inc= parameters stack with + between them
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/artist/0383dadf-2a4e-4d10-a46a-e9e041da8eb3"
    "?inc=releases+tags+ratings+release-groups&fmt=json",
    headers=UA
))
print(resp['name'])       # Queen
print(resp['type'])       # Group
print(resp['country'])    # GB
print(resp['life-span'])  # {'begin': '1970-06-27', 'end': None, 'ended': True}

# Tags (community-voted genre labels, sorted by count)
tags = sorted(resp.get('tags', []), key=lambda x: x['count'], reverse=True)
print([t['name'] for t in tags[:5]])
# ['rock', 'glam rock', 'hard rock', 'art rock', 'british']

# Rating (community score, 0-5)
print(resp.get('rating'))  # {'votes-count': 43, 'value': 4.7}

# Direct releases (up to 25 per request; use browse for the full list)
for r in resp.get('releases', []):
    print(r['id'], r['title'], r.get('date'))

# Release groups (albums, singles, EPs; deduplicated across editions)
for rg in resp.get('release-groups', []):
    print(rg['id'], rg['title'], rg.get('primary-type'), rg.get('first-release-date'))
# 6b47c9a0 A Night at the Opera Album 1975-11-21
# 002ed683 Sheer Heart Attack Album 1974-11-01
```
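The `release-groups` list mixes albums, singles, EPs, live albums and compilations. This is a minimal sketch for pulling out studio albums in release order. It assumes each release group carries a `secondary-types` list that is empty for plain studio albums. `studio_albums` is hypothetical, and the last two sample entries are placeholders. The same filter applies to the browse response from `/ws/2/release-group/?artist=...`.

```python
def studio_albums(release_groups):
    """Hypothetical filter: primary-type 'Album' with no secondary types
    (secondary types flag live albums, compilations, etc.), oldest first."""
    albums = [
        rg for rg in release_groups
        if rg.get("primary-type") == "Album" and not rg.get("secondary-types")
    ]
    # Undated groups have a missing/empty first-release-date; sort them last
    return sorted(albums, key=lambda rg: rg.get("first-release-date") or "9999")


# First two entries come from the output above; the other two are placeholders
release_groups = [
    {"title": "A Night at the Opera", "primary-type": "Album",
     "secondary-types": [], "first-release-date": "1975-11-21"},
    {"title": "Sheer Heart Attack", "primary-type": "Album",
     "secondary-types": [], "first-release-date": "1974-11-01"},
    {"title": "Placeholder Live Album", "primary-type": "Album",
     "secondary-types": ["Live"], "first-release-date": "1979"},
    {"title": "Placeholder Single", "primary-type": "Single",
     "secondary-types": [], "first-release-date": "1976"},
]
for rg in studio_albums(release_groups):
    print(rg["first-release-date"], rg["title"])
```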
Browse releases by artist (full list)
```python
# Browse API: uses the 'artist' param (not 'query'); the response key is 'release-count', not 'count'
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release/"
    "?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&fmt=json&limit=25&offset=0",
    headers=UA
))
print(resp['release-count'])  # 1635 (total releases for this artist)
for r in resp['releases']:
    print(r['id'], r['title'], r.get('date'), r.get('country'), r.get('status'))
    # Also has: cover-art-archive.artwork (bool), cover-art-archive.front (bool)
    caa = r.get('cover-art-archive', {})
    print(caa.get('artwork'), caa.get('front'), caa.get('count'))
# Paginate: increment offset by limit
```
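The `cover-art-archive` flags let you skip Cover Art Archive requests that would return 404. This is a minimal sketch. `releases_with_front_art` is hypothetical, and the sample IDs are placeholders rather than real MBIDs.

```python
def releases_with_front_art(releases):
    """Keep releases whose cover-art-archive block reports a front image."""
    return [r for r in releases if r.get("cover-art-archive", {}).get("front")]


# Placeholder entries shaped like the browse output above
releases = [
    {"id": "release-a", "title": "With art",
     "cover-art-archive": {"artwork": True, "front": True, "count": 3}},
    {"id": "release-b", "title": "Back cover only",
     "cover-art-archive": {"artwork": True, "front": False, "count": 1}},
    {"id": "release-c", "title": "No art",
     "cover-art-archive": {"artwork": False, "front": False, "count": 0}},
]
print([r["id"] for r in releases_with_front_art(releases)])  # ['release-a']
```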
Release search and lookup
```python
# Search by title
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release/?query=dark+side+of+the+moon&fmt=json&limit=5",
    headers=UA
))
# resp keys: count, offset, releases

# Full release with track list, artists, and labels
release = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release/b84ee12a-09ef-421b-82de-0441a926375b"
    "?inc=artists+recordings+labels+release-groups&fmt=json",
    headers=UA
))
print(release['title'])    # The Dark Side of the Moon
print(release['date'])     # 1973-03-24
print(release['status'])   # Official
print(release['country'])  # GB

# Release group (the "album concept", deduplicates editions)
rg = release.get('release-group', {})
print(rg['title'], rg.get('primary-type'), rg['id'])
# The Dark Side of the Moon Album f5093c06-23e3-404f-aeaa-40f72885ee3a

# Artist credit
for ac in release.get('artist-credit', []):
    if isinstance(ac, dict) and 'artist' in ac:
        print(ac['artist']['name'], ac['artist']['id'])
# Pink Floyd 83d91898-7763-47d7-b03b-b92132375c47

# Labels ('label' can be null when only a catalog number is known)
for li in release.get('label-info', []):
    label = li.get('label') or {}
    print(label.get('name'), li.get('catalog-number'))
# Harvest SHVL 804

# Track list (from media[].tracks[])
for disc in release.get('media', []):
    for track in disc.get('tracks', []):
        dur_s = track['length'] // 1000 if track.get('length') else None
        rec = track.get('recording', {})
        print(track['number'], track['title'], dur_s, rec.get('id'))
# A1 Speak to Me 68s bef3fddb-5aca-49f5-b2fd-d56a23268d63
# A2 Breathe 168s ecbc7c9b-e79d-4ec8-ac77-44e4a7f7f1b8
```
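For display, track lengths usually read better as minutes and seconds. A release's running time is the sum over `media[].tracks[]`. This is a minimal sketch in which `fmt_duration` and `release_runtime_ms` are hypothetical helpers that take the parsed release dict from the lookup above. Tracks with no `length` count as zero, so the total is a lower bound when lengths are missing.

```python
def fmt_duration(ms):
    """Format a MusicBrainz length in milliseconds as m:ss (None if unknown)."""
    if ms is None:
        return None
    total_s = round(ms / 1000)  # nearest second; the `// 1000` above truncates
    return f"{total_s // 60}:{total_s % 60:02d}"


def release_runtime_ms(release):
    """Sum track lengths over media[].tracks[]; tracks without a length count as 0."""
    return sum(
        track.get("length") or 0
        for disc in release.get("media", [])
        for track in disc.get("tracks", [])
    )


# Minimal release-shaped dict: two lengths from the sample output, one placeholder
sample = {"media": [{"tracks": [
    {"number": "A1", "title": "Speak to Me", "length": 68000},
    {"number": "A2", "title": "Breathe", "length": 168000},
    {"number": "A3", "title": "Placeholder (no length)", "length": None},
]}]}
for disc in sample["media"]:
    for track in disc["tracks"]:
        print(track["number"], track["title"], fmt_duration(track["length"]))
print("total", fmt_duration(release_runtime_ms(sample)))  # total 3:56
```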
Recording (track) search
```python
# Use Lucene field syntax to filter by artist
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/recording/"
    "?query=bohemian+rhapsody+AND+artist:queen&fmt=json&limit=5",
    headers=UA
))
print(resp['count'])  # 419
for r in resp['recordings']:
    dur_s = r['length'] // 1000 if r.get('length') else None
    artists = [ac['artist']['name'] for ac in r.get('artist-credit', []) if isinstance(ac, dict)]
    releases = r.get('releases', [])
    print(r['id'], r['title'], dur_s, artists, releases[0]['title'] if releases else None)
# a4803b45 Bohemian Rhapsody 130s ['Queen'] Rhapsody in Red
# 40212eb6 Bohemian Rhapsody 338s ['Queen'] 1986-07: Wembley Stadium
```
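A name-based `artist:queen` clause can still admit artists whose names merely contain the word. Once you know the artist's MBID, filtering on `artist-credit[].artist.id` is stricter. This is a minimal sketch. `recordings_by_artist` is hypothetical, and the sample entries use placeholder recording IDs.

```python
QUEEN = "0383dadf-2a4e-4d10-a46a-e9e041da8eb3"


def recordings_by_artist(recordings, artist_mbid):
    """Keep recordings whose artist-credit includes the given artist MBID."""
    return [
        r for r in recordings
        if any(
            isinstance(ac, dict) and (ac.get("artist") or {}).get("id") == artist_mbid
            for ac in r.get("artist-credit", [])
        )
    ]


# Placeholder entries shaped like the recording search output above
recordings = [
    {"id": "rec-1", "title": "Bohemian Rhapsody",
     "artist-credit": [{"name": "Queen", "artist": {"id": QUEEN, "name": "Queen"}}]},
    {"id": "rec-2", "title": "Bohemian Rhapsody",
     "artist-credit": [{"name": "Some Tribute Act",
                        "artist": {"id": "placeholder-mbid", "name": "Some Tribute Act"}}]},
]
print([r["id"] for r in recordings_by_artist(recordings, QUEEN)])  # ['rec-1']
```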
Release-group search (deduplicated albums)
```python
# Use the release-group endpoint to avoid getting every regional edition
# (the release-group title field in search is 'releasegroup', without a hyphen)
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release-group/"
    '?query=releasegroup:"A+Night+at+the+Opera"+AND+artist:queen&fmt=json&limit=5',
    headers=UA
))
# resp keys: count, offset, release-groups
for rg in resp.get('release-groups', []):
    print(rg['id'], rg['title'], rg.get('primary-type'), rg.get('first-release-date'), rg['score'])
# 6b47c9a0 A Night at the Opera Album 1975-11-21 100

# Browse release-groups for an artist
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release-group/"
    "?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&fmt=json&limit=25",
    headers=UA
))
print(resp['release-group-count'])  # 412
for rg in resp.get('release-groups', []):
    print(rg['title'], rg.get('primary-type'), rg.get('first-release-date'))
```
Label and work lookups
```python
# Label search
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/label/?query=EMI&fmt=json&limit=3",
    headers=UA
))
for lbl in resp['labels']:
    print(lbl['id'], lbl['name'], lbl.get('type'), lbl.get('country'), lbl['score'])
# c029628b EMI Original Production GB 100

# Work (song composition: author-level, not performance-level)
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/work/?query=bohemian+rhapsody&fmt=json&limit=3",
    headers=UA
))
for w in resp['works']:
    print(w['id'], w['title'], w.get('type'), w['score'])
# 41c94a08 Bohemian Rhapsody Song 100
```
Cover Art Archive
```python
# Get cover art for a release MBID
# The CAA returns 404 if no artwork has been uploaded for that release
def get_cover_art(release_mbid, size="500"):
    """
    size: '250', '500', '1200', or 'full' (original file)
    Returns the front cover URL, or None if no artwork exists.
    """
    try:
        resp = json.loads(http_get(
            f"https://coverartarchive.org/release/{release_mbid}",
            headers=UA
        ))
    except Exception:
        return None  # any fetch error is treated as "no art" (typically a 404)
    images = resp.get('images', [])
    # Prefer an image flagged front=True; otherwise fall back to the first image
    front = next((img for img in images if img.get('front')), None)
    img = front or (images[0] if images else None)
    if not img:
        return None
    if size == 'full':
        return img['image']
    return img['thumbnails'].get(size) or img['thumbnails'].get('large')

# Thumbnail sizes confirmed: '250', '500', '1200', 'small' (=250), 'large' (=500)
url = get_cover_art("b84ee12a-09ef-421b-82de-0441a926375b")
# http://coverartarchive.org/release/b84ee12a.../1611507818-500.jpg

# Full images response structure
resp = json.loads(http_get(
    "https://coverartarchive.org/release/b84ee12a-09ef-421b-82de-0441a926375b",
    headers=UA
))
for img in resp['images']:
    print(img.get('types'))     # ['Front'], ['Back'], ['Liner'], ['Poster'], ['Medium'], ['Sticker'], ['Other']
    print(img.get('front'))     # True only for images flagged front=True (not every 'Front'-typed image)
    print(img.get('approved'))  # True/False
    print(img['image'])         # full-resolution URL
    print(img['thumbnails'])    # {'small': '...-250.jpg', 'large': '...-500.jpg', '250': ..., '500': ..., '1200': ...}
```
Lucene query syntax for search
All search endpoints support Lucene field queries:
```python
# Field search: artist:, type:, country:, tag:, release:, date:
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/artist/"
    "?query=artist:queen+AND+type:group+AND+country:GB&fmt=json&limit=5",
    headers=UA
))
# count: 23 (exact matches only)

# Phrase search with quotes
resp = json.loads(http_get(
    "https://musicbrainz.org/ws/2/release/"
    '?query=release:"A+Night+at+the+Opera"+AND+artist:queen&fmt=json&limit=5',
    headers=UA
))
```
Common Lucene field names per entity:
- artist: `artist:`, `type:`, `country:`, `tag:`, `begin:`, `end:`
- release: `release:`, `artist:`, `date:`, `country:`, `status:`, `label:`, `barcode:`
- recording: `recording:`, `artist:`, `release:`, `dur:` (milliseconds), `tnum:` (track number)
- release-group: `releasegroup:`, `artist:`, `primarytype:`, `secondarytype:`
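Hand-writing `+` and quotes into URLs becomes error-prone once values contain spaces or quotes. This is a minimal sketch that builds the query string first and lets `urllib.parse.urlencode` do the encoding. `phrase` and `all_of` are hypothetical helpers. The escaping covers only what matters inside a quoted phrase: backslashes and double quotes.

```python
from urllib.parse import urlencode


def phrase(field, value):
    """Build field:"value", escaping backslashes and double quotes for Lucene."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def all_of(*clauses):
    """Join clauses with AND so that every clause must match."""
    return " AND ".join(clauses)


query = all_of(phrase("release", "A Night at the Opera"), "artist:queen")
qs = urlencode({"query": query, "fmt": "json", "limit": 5}, safe=":")
url = "https://musicbrainz.org/ws/2/release/?" + qs
print(query)  # release:"A Night at the Opera" AND artist:queen
print(url)
```

With this encoding, the double quotes are sent as `%22` and spaces are sent as `+`. This is equivalent to the hand-written URL in the phrase-search example above.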
Parallel fetching
```python
from concurrent.futures import ThreadPoolExecutor
import json
from helpers import http_get

UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}

def fetch_artist(mbid):
    resp = json.loads(http_get(
        f"https://musicbrainz.org/ws/2/artist/{mbid}?inc=tags&fmt=json",
        headers=UA
    ))
    # Top 3 community tags by vote count
    tags = [t['name'] for t in sorted(resp.get('tags', []), key=lambda x: x['count'], reverse=True)[:3]]
    return {"name": resp['name'], "type": resp.get('type'), "tags": tags}

mbids = [
    "0383dadf-2a4e-4d10-a46a-e9e041da8eb3",  # Queen
    "83d91898-7763-47d7-b03b-b92132375c47",  # Pink Floyd
    "678d88b2-87b0-403b-b63d-5da7465aecc3",  # Led Zeppelin
]
with ThreadPoolExecutor(max_workers=3) as ex:
    results = list(ex.map(fetch_artist, mbids))
# 3 artists fetched in ~0.79s total
```
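Tested: 5-6 rapid sequential requests all succeed, and parallel requests at 3x concurrency also succeed. Throttling only kicks in at much higher burst rates. MusicBrainz's rate-limiting documentation describes HTTP 503 as the rejection response, so treat a 503 (as well as a 429) as a rate-limit signal. If you are throttled, add time.sleep(1) between requests.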
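A shared limiter can enforce the 1 req/s guideline even when several threads are running, instead of relying on bursts happening to succeed. This is a minimal sketch. `MinIntervalLimiter` is hypothetical, and its injectable `clock`/`sleep` parameters exist only so it can be tested without real waiting. Call `limiter.wait()` just before each `http_get`.

```python
import threading
import time


class MinIntervalLimiter:
    """Hypothetical helper: space calls at least `interval` seconds apart,
    even when they come from several ThreadPoolExecutor workers."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = float("-inf")

    def wait(self):
        # Reserve the next free slot under the lock, then sleep outside it
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)


limiter = MinIntervalLimiter(1.0)
# In fetch_artist above, call limiter.wait() immediately before http_get(...)
```

With a 1-second interval, the thread pool no longer speeds things up beyond 1 req/s. It still helps by overlapping the network latency of the requests.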
Pagination
```python
import json
import time
from helpers import http_get

UA = {"User-Agent": "browser-harness/1.0 (your@email.com)"}

def browse_all_releases(artist_mbid, page_size=25):
    """Fetch all releases for an artist across multiple pages."""
    # limit accepts up to 100 per page, which cuts the number of 1-second waits
    offset = 0
    total = None
    releases = []
    while total is None or offset < total:
        resp = json.loads(http_get(
            f"https://musicbrainz.org/ws/2/release/"
            f"?artist={artist_mbid}&fmt=json&limit={page_size}&offset={offset}",
            headers=UA
        ))
        total = resp['release-count']
        batch = resp['releases']
        if not batch:  # guard against looping forever if a page comes back empty
            break
        releases.extend(batch)
        offset += len(batch)
        if offset < total:
            time.sleep(1)  # stay within 1 req/s for sequential pagination
    return releases

# Queen has 1635 releases; use release-groups (412) to get deduplicated albums
```
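The same loop works for any browse endpoint once the entity-specific key names become parameters. This is a minimal sketch. `paginate` is hypothetical, and the page fetcher is injected so the loop can be exercised offline.

```python
def paginate(fetch_page, count_key, items_key, page_size=25):
    """Hypothetical generic browse loop.

    fetch_page(limit, offset) returns one parsed JSON page; count_key and
    items_key name the entity-specific fields, e.g. 'release-group-count'
    and 'release-groups'.
    """
    items, offset, total = [], 0, None
    while total is None or offset < total:
        page = fetch_page(page_size, offset)
        total = page[count_key]
        batch = page[items_key]
        if not batch:  # stop rather than loop forever on an empty page
            break
        items.extend(batch)
        offset += len(batch)
    return items


# Offline stand-in for http_get: 60 fake release groups served 25 at a time
fake_data = [{"title": f"rg {i}"} for i in range(60)]

def fake_fetch(limit, offset):
    return {"release-group-count": len(fake_data),
            "release-groups": fake_data[offset:offset + limit]}

all_rgs = paginate(fake_fetch, "release-group-count", "release-groups")
print(len(all_rgs))  # 60
```

In real use, `fetch_page` would call `http_get` on a browse URL such as `/ws/2/release-group/?artist={mbid}&limit={limit}&offset={offset}&fmt=json` with the `UA` header. It would also sleep (or call a limiter) before each request, as the pagination function above does.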
inc= parameter reference
Stack multiple inc= values with + between them.
Artist lookup (`/ws/2/artist/{mbid}`):
- `releases`: list of releases (max ~25)
- `release-groups`: list of release groups (max ~25)
- `recordings`: list of recordings (max ~25)
- `works`: list of works
- `tags`: community genre tags (name + vote count)
- `ratings`: community rating (value 0-5, votes-count)
- `aliases`: alternative names and transliterations
- `annotation`: free-text editorial note
- `artist-rels`, `release-rels`, `recording-rels`, `work-rels`: relationship data

Release lookup (`/ws/2/release/{mbid}`):
- `artists`: full artist-credit objects
- `recordings`: track list with recording links (populates `media[].tracks[].recording`)
- `labels`: label-info with catalog numbers
- `release-groups`: the release group this release belongs to
- `artist-credits`: expanded artist credit with joinphrase
- `media`: disc/format info (always included in the lookup, so it is not needed in `inc=`)
Response shapes cheat sheet
```python
# MBID format: standard UUID v4
"0383dadf-2a4e-4d10-a46a-e9e041da8eb3"

# Search response (artist/recording/release/release-group/label/work)
{
    "count": 1612,            # total matches
    "offset": 0,
    "<entity-plural>": [...]  # e.g. "artists", "releases", "recordings", "release-groups"
}

# Browse response (using ?artist=MBID or ?label=MBID style)
{
    "release-count": 1635,    # note: key name changes per entity,
    "release-offset": 0,      # e.g. "release-group-count", "recording-count"
    "releases": [...]
}

# Recording length is always milliseconds
recording['length'] // 1000  # => seconds

# Artist life-span
life_span = artist['life-span']
# {'begin': '1970-06-27', 'end': None, 'ended': True}
# 'ended': True with 'end': None means the end date is unknown but the band is inactive

# Artist credit joinphrase (for multi-artist tracks)
# [{"name": "Simon", "artist": {...}, "joinphrase": " & "}, {"name": "Garfunkel", ...}]
```
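Concatenating each credit's `name` with its `joinphrase` reproduces the artist credit as displayed for a multi-artist track. This is a minimal sketch. `credit_string` is hypothetical. It prefers the credited `name`, which can differ from `artist['name']`, and falls back to the artist's own name.

```python
def credit_string(artist_credit):
    """Render an artist-credit list as display text: each credited name
    followed by its joinphrase (falls back to the artist's own name)."""
    parts = []
    for ac in artist_credit:
        if not isinstance(ac, dict):
            continue
        name = ac.get("name") or (ac.get("artist") or {}).get("name", "")
        parts.append(name + ac.get("joinphrase", ""))
    return "".join(parts)


credit = [
    {"name": "Simon", "artist": {"name": "Paul Simon"}, "joinphrase": " & "},
    {"name": "Garfunkel", "artist": {"name": "Art Garfunkel"}, "joinphrase": ""},
]
print(credit_string(credit))  # Simon & Garfunkel
```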
URL patterns
| Resource | URL |
|---|---|
| Artist search | https://musicbrainz.org/ws/2/artist/?query={q}&fmt=json&limit=5 |
| Artist by MBID | https://musicbrainz.org/ws/2/artist/{mbid}?inc=tags+ratings&fmt=json |
| Browse releases by artist | https://musicbrainz.org/ws/2/release/?artist={mbid}&fmt=json&limit=25&offset=0 |
| Release search | https://musicbrainz.org/ws/2/release/?query={q}&fmt=json&limit=5 |
| Release by MBID | https://musicbrainz.org/ws/2/release/{mbid}?inc=artists+recordings+labels&fmt=json |
| Release-group browse | https://musicbrainz.org/ws/2/release-group/?artist={mbid}&fmt=json&limit=25 |
| Recording search | https://musicbrainz.org/ws/2/recording/?query={q}&fmt=json&limit=5 |
| Label search | https://musicbrainz.org/ws/2/label/?query={q}&fmt=json&limit=5 |
| Work search | https://musicbrainz.org/ws/2/work/?query={q}&fmt=json&limit=5 |
| Cover art | https://coverartarchive.org/release/{release-mbid} |
MusicBrainz entity browser URL (human-readable): https://musicbrainz.org/artist/{mbid} (replace artist with release, recording, etc.)
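The patterns in the table differ only in entity, optional MBID and query parameters, so one builder can cover them. This is a minimal sketch, and `ws2_url` and `web_url` are hypothetical helpers. The builder keeps `:` and `+` unescaped to match the URLs in the table. As a result, a literal `+` inside a search term would be read as a space.

```python
from urllib.parse import urlencode

WS2 = "https://musicbrainz.org/ws/2"


def ws2_url(entity, mbid=None, inc=(), **params):
    """Build a ws/2 lookup (mbid given) or search/browse URL with fmt=json appended."""
    base = f"{WS2}/{entity}/{mbid}" if mbid else f"{WS2}/{entity}/"
    if inc:
        params["inc"] = "+".join(inc)
    params["fmt"] = "json"
    # keep ':' (Lucene fields) and '+' (inc separator) readable; spaces become '+'
    return base + "?" + urlencode(params, safe=":+")


def web_url(entity, mbid):
    """Human-readable MusicBrainz page for the same entity."""
    return f"https://musicbrainz.org/{entity}/{mbid}"


print(ws2_url("artist", query="queen", limit=5))
print(ws2_url("artist", "0383dadf-2a4e-4d10-a46a-e9e041da8eb3", inc=["tags", "ratings"]))
```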
Gotchas
- `User-Agent` is mandatory: without it you get HTTP 403 instantly. The header must include contact info, e.g. `browser-harness/1.0 (you@example.com)`. The default `http_get` UA (`Mozilla/5.0`) also gets 403.
- Browse vs search response keys differ: search responses use `count` and `offset`, while browse responses (with `?artist=MBID`) use `release-count`/`release-offset` (or `release-group-count`, etc.). Accessing `data['count']` on a browse response raises `KeyError`.
- The `releases` include in artist lookup caps at ~25: use the browse endpoint (`?artist=MBID`) with pagination for complete lists. Queen has 1,635 releases in total, but `inc=releases` on the artist endpoint only returns ~25.
- Use release-groups to avoid edition explosion: a popular album can have hundreds of release entries (every country's pressing, every remaster, every format). Use `/ws/2/release-group/` to get one entry per "album concept". Queen's "A Night at the Opera" has 75+ release entries but only 1 release-group.
- Recording length is in milliseconds: `recording['length']` is in milliseconds, not seconds. Divide by 1000.
- Sort-name differs from display name for persons: artists have both `name` (display: "David Bowie") and `sort-name` (alphabetical: "Bowie, David"). Groups usually have identical values.
- Disambiguation in parentheses: when multiple entities share a name, MusicBrainz adds a `disambiguation` field to tell them apart (e.g. `"English singer-songwriter"` vs a different David Bowie). Always check `a.get('disambiguation', '')` when resolving artist identity.
- Score 100 does not mean unique: search returns `score: 100` for several results when they match the query equally well. "dark side of the moon" returns 6 results all scored 100, and they are different regional pressings. Filter by `date`, `country`, or `status` to narrow them down.
- Recording search: a plain query matches titles and artists broadly. `?query=bohemian+rhapsody+queen` returns cover versions first because "queen" appears in the artist or title of other recordings. Use `AND artist:queen` Lucene syntax to restrict results to Queen performances.
- Cover Art Archive returns 404 for releases with no uploaded art: check `release['cover-art-archive']['artwork']` (boolean) in any release browse/search response before hitting the CAA endpoint. This saves an extra HTTP round-trip.
- Cover art `front=True` flag vs `types=['Front']`: a release can have several images typed as 'Front', but only one (or none) is flagged `front: true`. Filter on `img.get('front') == True` for the canonical cover, not on `img.get('types') == ['Front']`.
- CAA thumbnail key names: the string keys `'small'` (250px) and `'large'` (500px) exist as aliases alongside the numeric string keys `'250'`, `'500'`, `'1200'`. `img['thumbnails']['500']` and `img['thumbnails']['large']` both work.
- Rate limit: 1 req/s unauthenticated. In practice, bursts of 5-6 sequential requests succeed without throttling, and rejections only appear at higher rates. MusicBrainz's rate-limiting documentation describes these rejections as HTTP 503, so do not watch only for 429. For sequential pagination loops, add `time.sleep(1)` between pages. For parallel fetching, limit concurrency to 3-5 workers.
- `fmt=json` is required: omitting it returns XML instead of JSON. Append `&fmt=json` to every request.
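For code that handles both response kinds, one accessor avoids the `KeyError` from the browse-vs-search difference. This is a minimal sketch. `total_count` is hypothetical. It assumes the only top-level key ending in `-count` is the browse total, which holds for the response shapes in this document.

```python
def total_count(resp):
    """Total matches from a search ('count') or browse ('<entity>-count') response."""
    if "count" in resp:
        return resp["count"]
    for key, value in resp.items():
        if key.endswith("-count"):
            return value
    raise KeyError("response has neither 'count' nor an '<entity>-count' key")


print(total_count({"count": 1612, "offset": 0, "artists": []}))                  # 1612
print(total_count({"release-count": 1635, "release-offset": 0, "releases": []}))  # 1635
```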