Guide · Published 2026-05-22
How to Extend AI Music Clips with an API
Chain extend_music calls to take a 60-second AI clip past the 240s per-task ceiling. starts_at, length, instruction, and the chaining pattern explained with working code.
What "extend" means in an AI music context
AI music models have a per-task duration ceiling. On Google Lyria 3 Pro (the model behind MusicAPI's Producer API), a single create call can produce up to 240 seconds of audio. Beyond that, you need to extend.
The extend_music operation takes an existing clip and continues it forward from a specified timestamp. The result is a new audio segment that picks up where you tell it and runs for the length you request. Chain multiple extends to build full-length tracks from shorter clips.
Developers typically reach for extend in three situations:
- Hook-to-full-song development: generate a great 60-second hook, then extend with instructions like "build to a chorus" or "add a bridge and outro."
- Per-section iteration: generate verse 1, extend to chorus 1, extend to verse 2, etc. Each section is a separate extend with its own instruction.
- Reaching longer durations: chain 3-5 extends to produce 4+ minute tracks past the 240s single-task ceiling.
The extend_music endpoint at a glance
On MusicAPI's Producer API (Google Lyria 3 Pro), the endpoint is POST /api/v1/producer/create with task_type: "extend_music". The four parameters that matter:
- clip_id (required): the source clip to extend. From a previous create_music, extend_music, or other Producer response.
- starts_at (required): seconds from the start of the source where extension begins. For a 60s source, use 55-58 for a clean handoff.
- length (optional, default 60): duration of new audio in seconds. Range 1-240.
- instruction (recommended): musical direction for the extension. Clearer than relying on sound/lyrics fallback.
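As a minimal sketch of how those four parameters fit together, the helper below builds the JSON body for an extend_music task. The function name build_extend_payload and the client-side checks are illustrative assumptions, not part of the API; only the field names, the default of 60, and the 1-240 range come from the list above.

```python
import json


def build_extend_payload(clip_id, starts_at, length=60, instruction=None):
    """Build the request body for an extend_music task (hypothetical helper)."""
    if not clip_id:
        raise ValueError("clip_id is required")
    if starts_at is None or starts_at < 0:
        # Assumption: a negative offset is never meaningful, so reject it early.
        raise ValueError("starts_at is required and must be >= 0 seconds")
    if not 1 <= length <= 240:
        raise ValueError("length must be between 1 and 240 seconds")
    payload = {
        "task_type": "extend_music",
        "clip_id": clip_id,
        "starts_at": starts_at,
        "length": length,
    }
    # instruction is recommended rather than required; omit the key if absent.
    if instruction:
        payload["instruction"] = instruction
    return payload


body = json.dumps(build_extend_payload("<hook-clip-id>", 55, instruction="Build into a chorus"))
```

The resulting body string can be passed as the -d argument of the curl calls shown in this guide.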
Single extend: hook to full song
Start with the 60-second hook, then extend forward with explicit instruction:
Step 1: generate the hook
curl -X POST https://api.musicapi.ai/api/v1/producer/create \
-H "Authorization: Bearer $MUSICAPI_KEY" \
-H "Content-Type: application/json" \
-d '{
"task_type": "create_music",
"sound": "energetic synth-pop with driving bass and catchy chorus melody",
"title": "Hook",
"length": 60
}'
# Save the returned task_id, poll for completion, capture data[0].clip_id.
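The poll-then-capture step in that comment could be sketched as below. This is a rough outline under stated assumptions: fetch_task is a hypothetical callable you supply that returns the task's JSON as a dict, the state field is assumed to sit at the top level, and "failed" is an assumed failure value. Only state=succeeded and data[0].clip_id are taken from this guide; check the Producer API docs for the real status endpoint and response shape.

```python
import time


def wait_for_clip_id(fetch_task, task_id, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll a task until it finishes, then return data[0].clip_id.

    fetch_task(task_id) -> dict is a hypothetical function wrapping whatever
    status lookup the API provides.
    """
    waited = 0.0
    while True:
        task = fetch_task(task_id)
        state = task.get("state")
        if state == "succeeded":
            return task["data"][0]["clip_id"]
        if state == "failed":  # assumed failure value
            raise RuntimeError(f"task {task_id} failed")
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still {state!r} after {timeout}s")
        sleep(interval)
        waited += interval
```

Injecting sleep keeps the loop testable without real delays.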
Step 2: extend to a chorus
curl -X POST https://api.musicapi.ai/api/v1/producer/create \
-H "Authorization: Bearer $MUSICAPI_KEY" \
-H "Content-Type: application/json" \
-d '{
"task_type": "extend_music",
"clip_id": "<hook-clip-id>",
"starts_at": 55,
"length": 60,
"instruction": "Build into a soaring chorus with layered vocal harmonies, doubled bass, and a wider arrangement"
}'
# Returns a new clip_id covering t=0 (start of extend) through t=60.
# The new clip is the continuation, not the original + continuation.
Important: the response's clip_id is for the new segment only. To play back the full song (hook + chorus continuation), you concatenate the audio_url of the original hook with the audio_url of the extend on your side.
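One way to plan that client-side concatenation is to compute which span of each file to play. The sketch below assumes the extension picks up exactly at starts_at, so the source is cut there and the new segment plays in full. The names stitch_plan and total_seconds, and the dict keys, are hypothetical; the actual audio joining would be done with whatever audio tool you already use.

```python
def stitch_plan(segments):
    """Return (audio_url, play_from, play_to) spans in playback order.

    segments: list of dicts with "audio_url", "duration" (seconds), and
    "cut_at", the starts_at used when extending that segment (None for the
    last segment, which plays in full).
    """
    plan = []
    for seg in segments:
        end = seg["duration"] if seg["cut_at"] is None else seg["cut_at"]
        plan.append((seg["audio_url"], 0, end))
    return plan


def total_seconds(plan):
    """Total playback length of a stitch plan, in seconds."""
    return sum(end - start for _, start, end in plan)


hook_and_chorus = stitch_plan([
    {"audio_url": "<hook-audio-url>", "duration": 60, "cut_at": 55},
    {"audio_url": "<chorus-audio-url>", "duration": 60, "cut_at": None},
])
print(total_seconds(hook_and_chorus))  # 115
```

Under that assumption, a 60-second hook extended at starts_at: 55 with a 60-second segment plays back as 115 seconds.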
Chaining extends past the 240-second ceiling
To produce a 3-4 minute track, chain 3-4 calls: one create plus two or three extends. Each extend takes the previous clip's clip_id as the source. Here's a roughly 4-minute song built in 4 calls (1 create + 3 extends):
# Call 1: create the hook (60s, original)
POST /api/v1/producer/create
{ "task_type": "create_music", "sound": "...", "length": 60 }
# → clip_id_1
# Call 2: extend to verse (60s)
POST /api/v1/producer/create
{ "task_type": "extend_music", "clip_id": "clip_id_1", "starts_at": 55, "length": 60,
"instruction": "Build into the first verse with intimate vocals" }
# → clip_id_2
# Call 3: extend to chorus (60s)
POST /api/v1/producer/create
{ "task_type": "extend_music", "clip_id": "clip_id_2", "starts_at": 55, "length": 60,
"instruction": "Burst into a powerful chorus, full arrangement, layered vocals" }
# → clip_id_3
# Call 4: extend to outro (60s)
POST /api/v1/producer/create
{ "task_type": "extend_music", "clip_id": "clip_id_3", "starts_at": 55, "length": 60,
"instruction": "Wind down to a gentle outro, fade out the energy gradually" }
# → clip_id_4
Total cost: 12 credits × 4 calls = 48 credits = $0.24-0.72 depending on plan. Total length: just under 4 minutes (four ~60-second segments, minus the 5-second overlap that each starts_at: 55 creates and that you trim on the client).
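The four-call chain above can be driven from a small loop. This is a sketch, not a reference client: create_and_wait is a hypothetical callable you provide that submits a payload to POST /api/v1/producer/create, waits for completion, and returns the new clip_id. The fixed starts_at of 55 assumes every segment is about 60 seconds long.

```python
def run_chain(create_and_wait, hook_request, instructions,
              starts_at=55, length=60, max_depth=4):
    """Create a hook, then chain one extend per instruction.

    create_and_wait(payload) -> clip_id is a hypothetical helper.
    Returns every clip_id in order, hook first.
    """
    if len(instructions) > max_depth:
        raise ValueError(f"refusing to chain more than {max_depth} extends")
    clip_ids = [create_and_wait(hook_request)]
    for instruction in instructions:
        payload = {
            "task_type": "extend_music",
            "clip_id": clip_ids[-1],  # always chain on the most recent clip
            "starts_at": starts_at,   # relative to that most recent clip
            "length": length,
            "instruction": instruction,
        }
        clip_ids.append(create_and_wait(payload))
    return clip_ids
```

Calling it with the three instructions from the example (verse, chorus, outro) would issue the same four requests in order, stopping at the first call that raises.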
The starts_at pattern
The single most-asked question about extend is what starts_at should be. The answer depends on what you want:
| starts_at value | Effect | When to use |
|---|---|---|
| Source duration minus 2-5s | Smooth overlap, model uses the tail of the source as context. | Default. Use this 90% of the time. |
| Source duration exactly | No overlap, less stylistic context for the model. | If you want a hard scene change (verse to bridge). |
| Source duration minus 10-20s | Larger overlap, more model context, but you'll re-render some content. | If the previous segment's tail wasn't great and you want to revise it. |
| Source duration minus 30s+ | Effectively re-rolling the second half of the source. | Rarely the right tool: use replace_music instead. |
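The first three rows of the table reduce to "source duration minus an overlap". A small helper makes that explicit; the preset names in OVERLAP_SECONDS are hypothetical labels for the rows, and the 5 and 15 second values are just picks from the 2-5s and 10-20s ranges.

```python
# Hypothetical presets mirroring the table rows (overlap in seconds).
OVERLAP_SECONDS = {
    "smooth": 5,        # default: source duration minus 2-5s
    "hard_cut": 0,      # source duration exactly
    "revise_tail": 15,  # source duration minus 10-20s
}


def starts_at_for(source_duration, overlap=OVERLAP_SECONDS["smooth"]):
    """Return starts_at as the source duration minus the desired overlap."""
    if overlap < 0 or overlap > source_duration:
        raise ValueError("overlap must be between 0 and the source duration")
    return source_duration - overlap


print(starts_at_for(60))                               # 55
print(starts_at_for(60, OVERLAP_SECONDS["hard_cut"]))  # 60
```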
Instruction patterns that work
- Name the song structure target. "Build into a chorus" or "Drop into a bridge" gives the model section-level context. "More music" doesn't.
- State the dynamic direction. "Energy increases" or "Wind down to outro" tells the model where the segment lands. Songs have shape; tell the model what shape.
- Reference instrumentation explicitly. "Add layered vocal harmonies and a doubled bass" gives more direction than "fuller arrangement."
- Match style to the source. If the source was indie folk, your extend instruction should reinforce that genre. Letting the model drift to a new genre mid-track produces muddled output.
- Reserve dramatic shifts for cover_music, not extend. Extend continues the song. If you want a genre transformation, use cover_music instead.
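If you generate instructions programmatically, the patterns above can be composed from parts. The helper below is purely illustrative (build_instruction and its argument names are assumptions); it just joins a section target, a dynamic direction, explicit instrumentation, and a style anchor into one instruction string.

```python
def build_instruction(section, dynamics, instrumentation=(), style=None):
    """Compose an extend instruction from the four patterns (hypothetical helper)."""
    parts = [section, dynamics]
    if instrumentation:
        parts.append("Add " + ", ".join(instrumentation))
    if style:
        parts.append(f"Keep the {style} style of the source")
    # Normalize trailing periods so each part ends with exactly one.
    return ". ".join(p.strip().rstrip(".") for p in parts if p) + "."


print(build_instruction(
    "Build into a chorus",
    "Energy increases",
    ["layered vocal harmonies", "a doubled bass"],
    style="indie folk",
))
# Build into a chorus. Energy increases. Add layered vocal harmonies, a doubled bass. Keep the indie folk style of the source.
```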
When chains drift
At 3+ chained extends, stylistic drift becomes visible. The model only sees the most recent clip as context, so each extend rebuilds its understanding of the song from a smaller window. Two mitigations:
- Anchor every extend's instruction to the original style. Use phrasing like "Continue the indie folk song's style" rather than just "Build into a chorus."
- Limit chain depth. 3-4 extends is the practical sweet spot. Past 5 extends, drift accumulates faster than instructions can correct it. For longer-form tracks, Suno v5 on the Sonic API has a higher per-task duration ceiling (~8 min via fewer, longer extends): see Lyria 3 Pro vs Suno v5.
Production best practices
- Cache by (clip_id + starts_at + length + instruction + seed). Same input + same seed = same output. Don't pay twice.
- Use webhooks for chained extends. If you run 4 chained calls and poll each one, the waits add up to ~2 minutes of total latency. With webhooks, each completion can trigger the next call automatically.
- Set a max-chain-depth in your client logic. Don't let a user trigger 20 chained extends: bound it at 4 or 5.
- Concatenate on your side. The API returns each segment as a separate audio_url. Your client stitches them. Use the starts_at overlap as a crossfade point.
- Validate each segment before chaining. Check that the previous extend succeeded (state=succeeded) before submitting the next. A failure mid-chain costs time even though the auto-refund returns the credits.
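For the caching practice in the list above, one possible approach is to hash the five inputs into a stable key. This is a sketch under assumptions: extend_cache_key is a hypothetical name, and how a seed is passed to the API is not shown in this guide, so treat seed here as whatever value you recorded for the request.

```python
import hashlib
import json


def extend_cache_key(clip_id, starts_at, length, instruction, seed):
    """Derive a deterministic cache key from the inputs of an extend request."""
    fields = {
        "clip_id": clip_id,
        "starts_at": starts_at,
        "length": length,
        "instruction": instruction,
        "seed": seed,
    }
    # sort_keys and fixed separators give one canonical JSON form per input set.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Look the key up before submitting an extend, and store the returned clip_id and audio_url under it once the task succeeds.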
Pricing
12 credits per extend task on MusicAPI's Producer API. Same flat cost as create, replace, and cover. Effective per-extend cost:
- $0.18 on the $5 entry pack
- $0.13 on Starter $19/mo
- $0.09 on Growth $99/mo
- $0.06 on Pro $999/mo
A 4-minute song built via 1 create + 3 extends = 48 credits = $0.36 on Growth or $0.24 on Pro. See Lyria 3 Pro pricing for the full plan economics.
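The chain arithmetic can be checked with a few lines. The sketch below uses the 12-credit flat cost and the per-task prices listed above, stored in cents to avoid floating-point noise; the plan keys and the chain_cost name are hypothetical.

```python
CREDITS_PER_TASK = 12

# Effective per-task price in US cents, from the list above (keys are hypothetical).
CENTS_PER_TASK = {"entry": 18, "starter": 13, "growth": 9, "pro": 6}


def chain_cost(num_extends, plan, include_create=True):
    """Return (credits, dollars) for a chain of extends, optionally plus the create."""
    tasks = num_extends + (1 if include_create else 0)
    return tasks * CREDITS_PER_TASK, tasks * CENTS_PER_TASK[plan] / 100


print(chain_cost(3, "pro"))    # (48, 0.24)
print(chain_cost(3, "entry"))  # (48, 0.72)
```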
Common questions
What does the extend_music endpoint do?
extend_music takes an existing AI-generated clip and continues it from a specified timestamp. You pass clip_id (the source), starts_at (the timestamp to continue from, in seconds), length (how many seconds of new audio to produce), and instruction (musical direction for the continuation). The API returns a fresh audio segment that picks up where you specified and runs forward.
Can I extend an AI song past the 240-second per-task limit?
Yes, by chaining. Each extend_music call produces a new clip with its own clip_id. Feed that new clip_id into another extend_music call to continue further. Three 60-second extends add 3 minutes to your original 60-second source: 4 minutes total. Five chained extends get you past 5 minutes. The practical ceiling is more about model coherence at long horizons than a hard limit.
Which clip_id do I chain on after an extend?
The clip_id returned in the extend's response, not the original. Each extend_music call produces a new clip with a new clip_id. Always chain on the most recent clip_id. The starts_at parameter on the next call is relative to that new clip, not the original.
What's the right starts_at value?
Typically the duration of the source clip minus a few seconds. If your source is 60 seconds, use starts_at: 55 or starts_at: 58. This gives the model a brief overlap to maintain stylistic continuity. Setting starts_at much earlier (e.g., starts_at: 30 on a 60s clip) gives the model permission to re-render the back half, which is usually not what you want.
How much does extend_music cost?
12 credits per extend task on MusicAPI's Producer API (Google Lyria 3 Pro). Same flat cost as create_music, replace_music, and cover_music. Failed upstream extends are auto-refunded. The cost scales linearly with the number of chains: 3 extends = 36 credits = ~$0.18-0.54 depending on plan.
Should I extend or just create a longer clip from scratch?
It depends on what you need. If you want a single coherent 3-minute song with consistent vocals, use one create call with length: 180 (or 240, the per-task ceiling). If you have a great 60-second hook and want to develop it further (adding a chorus, bridge, or outro), extend is the right tool. Extends preserve the source's structural skeleton; create starts fresh.
Do extend chains drift stylistically?
Some drift is normal at 3+ chained extends. The model uses only the most recent clip as context, so style migrates over many hops. Two mitigations: (1) give every extend a clear instruction that matches the source style ('continue in the same indie folk style'), and (2) limit chain depth to 3-4 in production. For longer-form tracks, consider Suno v5, which has a higher per-task duration ceiling.
Try it
- 75 free credits on signup cover a hook + 5 extends.
- Try in the playground: no signup.
- Producer API docs for the full reference.
- Cover music guide: the sibling operation. Cover changes style; extend continues the song.
Last updated 2026-05-22.