Guide · Published 2026-05-22

How to Extend AI Music Clips with an API

Chain extend_music calls to take a 60-second AI clip past the 240s per-task ceiling. starts_at, length, instruction, and the chaining pattern explained with working code.


What "extend" means in an AI music context

AI music models have a per-task duration ceiling. On Google Lyria 3 Pro (the model behind MusicAPI's Producer API), a single create call can produce up to 240 seconds of audio. Beyond that, you need to extend.

The extend_music operation takes an existing clip and continues it forward from a specified timestamp. The result is a new audio segment that picks up where you tell it and runs for the length you request. Chain multiple extends to build full-length tracks from shorter clips.

Developers typically reach for extend for three reasons:

  1. Hook-to-full-song development: generate a great 60-second hook, then extend with instructions like "build to a chorus" or "add a bridge and outro."
  2. Per-section iteration: generate verse 1, extend to chorus 1, extend to verse 2, etc. Each section is a separate extend with its own instruction.
  3. Reaching longer durations: chain 3-5 extends to produce 4+ minute tracks past the 240s single-task ceiling.

The extend_music endpoint at a glance

On MusicAPI's Producer API (Google Lyria 3 Pro), the endpoint is POST /api/v1/producer/create with task_type: "extend_music". The four parameters that matter:

  • clip_id (required): the source clip to extend. From a previous create_music, extend_music, or other Producer response.
  • starts_at (required): seconds from the start of the source where extension begins. For a 60s source, use 55-58 for a clean handoff.
  • length (optional, default 60): duration of new audio in seconds. Range 1-240.
  • instruction (recommended): musical direction for the extension. Clearer than relying on sound/lyrics fallback.
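The four parameters above can be assembled and sanity-checked on the client before a request is sent. The following is a minimal sketch, not part of the official API: the helper name build_extend_payload and the optional source_duration argument are assumptions made for illustration, while the field names and the 1-240 range come from the list above.

```python
from typing import Optional

MAX_LENGTH_S = 240  # per-task ceiling stated in this guide


def build_extend_payload(clip_id: str, starts_at: float, length: int = 60,
                         instruction: Optional[str] = None,
                         source_duration: Optional[float] = None) -> dict:
    """Assemble the JSON body for an extend_music task with client-side checks.

    source_duration is a hypothetical convenience argument: pass it if you
    know the source clip's length and want starts_at checked against it.
    """
    if not clip_id:
        raise ValueError("clip_id is required")
    if starts_at < 0:
        raise ValueError("starts_at must be non-negative")
    if source_duration is not None and starts_at > source_duration:
        raise ValueError("starts_at lies beyond the end of the source clip")
    if not 1 <= length <= MAX_LENGTH_S:
        raise ValueError("length must be between 1 and 240 seconds")

    payload = {
        "task_type": "extend_music",
        "clip_id": clip_id,
        "starts_at": starts_at,
        "length": length,
    }
    if instruction:  # recommended, but the request is valid without it
        payload["instruction"] = instruction
    return payload
```

The returned dict can be serialized with json.dumps and sent as the body of the POST request to /api/v1/producer/create.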

Single extend: hook to full song

Start with the 60-second hook, then extend forward with explicit instruction:

Step 1: generate the hook

curl -X POST https://api.musicapi.ai/api/v1/producer/create \
  -H "Authorization: Bearer $MUSICAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "create_music",
    "sound": "energetic synth-pop with driving bass and catchy chorus melody",
    "title": "Hook",
    "length": 60
  }'

# Save the returned task_id, poll for completion, capture data[0].clip_id.

Step 2: extend to a chorus

curl -X POST https://api.musicapi.ai/api/v1/producer/create \
  -H "Authorization: Bearer $MUSICAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "extend_music",
    "clip_id": "<hook-clip-id>",
    "starts_at": 55,
    "length": 60,
    "instruction": "Build into a soaring chorus with layered vocal harmonies, doubled bass, and a wider arrangement"
  }'

# Returns a new clip_id covering t=0 (start of extend) through t=60.
# The new clip is the continuation, not the original + continuation.

Important: the response's clip_id refers to the new segment only. To play back the full song (hook + chorus continuation), concatenate the audio from the original hook's audio_url with the audio from the extend's audio_url on your side, trimming the hook where the extension takes over.
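The client-side stitching can be planned before any audio is touched. This sketch assumes the new segment takes over from the source at starts_at, so the source is kept only up to that point; that reading of the overlap is an assumption, and the Segment type and function names are hypothetical. It only computes which time range of each clip to keep, and leaves the actual audio joining to your tool of choice.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    audio_url: str
    duration: float                        # seconds of audio in this clip
    next_starts_at: Optional[float] = None  # starts_at used by the extend that follows


def playback_plan(segments: List[Segment]) -> List[Tuple[str, float, float]]:
    """Return (audio_url, keep_from, keep_to) for each clip, in play order."""
    plan = []
    for seg in segments:
        if seg.next_starts_at is None:
            keep_to = seg.duration  # last clip: keep it all
        else:
            # Assumption: the following extend takes over at next_starts_at.
            keep_to = min(seg.next_starts_at, seg.duration)
        plan.append((seg.audio_url, 0.0, keep_to))
    return plan


def total_length(plan: List[Tuple[str, float, float]]) -> float:
    """Total playback time of the stitched track, in seconds."""
    return sum(end - start for _, start, end in plan)
```

For a 60-second hook extended at starts_at: 55 with a 60-second segment, the plan keeps the first 55 seconds of the hook and all of the extension.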

Chaining extends past the 240-second ceiling

To produce a 3-4 minute track from a 60-second hook, chain 2-3 extends. Each call takes the previous clip's clip_id as the source. Here's a 4-minute song built in 4 calls (1 create + 3 extends):

# Call 1: create the hook (60s, original)
POST /api/v1/producer/create
{ "task_type": "create_music", "sound": "...", "length": 60 }
# → clip_id_1

# Call 2: extend to verse (60s)
POST /api/v1/producer/create
{ "task_type": "extend_music", "clip_id": "clip_id_1", "starts_at": 55, "length": 60,
  "instruction": "Build into the first verse with intimate vocals" }
# → clip_id_2

# Call 3: extend to chorus (60s)
POST /api/v1/producer/create
{ "task_type": "extend_music", "clip_id": "clip_id_2", "starts_at": 55, "length": 60,
  "instruction": "Burst into a powerful chorus, full arrangement, layered vocals" }
# → clip_id_3

# Call 4: extend to outro (60s)
POST /api/v1/producer/create
{ "task_type": "extend_music", "clip_id": "clip_id_3", "starts_at": 55, "length": 60,
  "instruction": "Wind down to a gentle outro, fade out the energy gradually" }
# → clip_id_4

Total cost: 12 credits × 4 calls = 48 credits, or $0.24-0.72 depending on plan. Total length: ~4 minutes (each segment is ~60 seconds, and starts_at: 55 produces a 5-second overlap at each join that you trim on the client).
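The call sequence above can be driven by a small loop that always extends the most recent clip. This is a sketch under assumptions: submit is a hypothetical callable you supply that posts the payload, waits for the task to finish, and returns a dict with state and clip_id keys (a simplification of the real response shape). The fixed starts_at of 55 assumes every segment is about 60 seconds long.

```python
from typing import Callable, Dict, List


def run_chain(first_clip_id: str, instructions: List[str],
              submit: Callable[[Dict], Dict],
              starts_at: float = 55, length: int = 60) -> List[str]:
    """Chain extend_music tasks, one per instruction, and return all clip ids."""
    clip_ids = [first_clip_id]
    for instruction in instructions:
        payload = {
            "task_type": "extend_music",
            "clip_id": clip_ids[-1],  # most recent clip, never the original
            "starts_at": starts_at,   # relative to that most recent clip
            "length": length,
            "instruction": instruction,
        }
        result = submit(payload)      # hypothetical: POST, wait, return final task data
        if result.get("state") != "succeeded":
            done = len(clip_ids) - 1
            raise RuntimeError(f"extend failed after {done} successful extends")
        clip_ids.append(result["clip_id"])
    return clip_ids
```

Injecting submit keeps the loop independent of how you wait for completion, so the same function works with polling in a script or with a test double.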

The starts_at pattern

The single most-asked question about extend is what starts_at should be. The answer depends on what you want:

| starts_at value | Effect | When to use |
| --- | --- | --- |
| Source duration minus 2-5s | Smooth overlap; the model uses the tail of the source as context. | Default. Use this 90% of the time. |
| Source duration exactly | No overlap, less stylistic context for the model. | If you want a hard scene change (verse to bridge). |
| Source duration minus 10-20s | Larger overlap, more model context, but you'll re-render some content. | If the previous segment's tail wasn't great and you want to revise it. |
| Source duration minus 30s+ | Effectively re-rolling the second half of the source. | Rarely the right tool: use replace_music instead. |
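The first three rows of the table translate into a small helper. This is an illustrative sketch: the mode names and the specific offsets (5 seconds for a smooth handoff, 15 seconds for revising the tail) are assumptions picked from within the ranges above, not API values.

```python
# Seconds subtracted from the source duration for each strategy.
# The exact offsets are illustrative picks from the ranges in the table.
OFFSETS = {
    "smooth": 5.0,        # default: short overlap for stylistic context
    "hard_cut": 0.0,      # no overlap: hard scene change
    "revise_tail": 15.0,  # re-render a weak ending of the source
}


def suggest_starts_at(source_duration: float, mode: str = "smooth") -> float:
    """Suggest a starts_at value (seconds) for a source clip of the given length."""
    if mode not in OFFSETS:
        raise ValueError(f"unknown mode: {mode!r}")
    # Clamp at zero so very short sources never yield a negative timestamp.
    return max(0.0, source_duration - OFFSETS[mode])
```

For a 60-second source, the default mode gives 55.0, matching the value used in the examples in this guide.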

Instruction patterns that work

  • Name the song structure target. "Build into a chorus" or "Drop into a bridge" gives the model section-level context. "More music" doesn't.
  • State the dynamic direction. "Energy increases" or "Wind down to outro" tells the model where the segment lands. Songs have shape; tell the model what shape.
  • Reference instrumentation explicitly. "Add layered vocal harmonies and a doubled bass" gives more direction than "fuller arrangement."
  • Match style to the source. If the source was indie folk, your extend instruction should reinforce that genre. Letting the model drift to a new genre mid-track produces muddled output.
  • Reserve dramatic shifts for cover_music, not extend. Extend continues the song. If you want a genre transformation, use cover_music instead.
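If instructions are generated programmatically, the first four patterns can be combined into one template. This is only a sketch: compose_instruction and its argument names are hypothetical, and the sentence layout is one reasonable choice rather than a format the model requires.

```python
from typing import Optional, Sequence


def compose_instruction(section: str, dynamics: str,
                        instrumentation: Sequence[str] = (),
                        style: Optional[str] = None) -> str:
    """Build an extend instruction from structure, dynamics, instruments, and style."""
    sentences = []
    if style:
        # Reinforce the source genre so the extension does not wander.
        sentences.append(f"Continue the {style} song's style.")
    # Section target plus dynamic direction.
    sentences.append(f"{section}, {dynamics}.")
    if instrumentation:
        # Name concrete instruments instead of asking for a "fuller arrangement".
        sentences.append("Add " + " and ".join(instrumentation) + ".")
    return " ".join(sentences)
```

Calling it with section "Build into a chorus", dynamics "energy increases", two instrumentation entries, and style "indie folk" yields a three-sentence instruction that names the structure target, the dynamic direction, the instruments, and the genre.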

When chains drift

At 3+ chained extends, stylistic drift becomes audible. The model only sees the most recent clip as context, so each extend rebuilds its understanding of the song from a smaller window. Two mitigations:

  1. Anchor every extend's instruction to the original style. Use phrasing like "Continue the indie folk song's style" rather than just "Build into a chorus."
  2. Limit chain depth. 3-4 extends is the practical sweet spot. Past 5 extends, drift accumulates faster than instructions can correct it. For longer-form tracks, Suno v5 on the Sonic API has a higher per-task duration ceiling (~8 min via fewer, longer extends): see Lyria 3 Pro vs Suno v5.

Production best practices

  1. Cache by (clip_id + starts_at + length + instruction + seed). Same input + same seed = same output. Don't pay twice.
  2. Use webhooks for chained extends. If you're running 4 chained calls and polling each one, you're looking at ~2 minutes of total latency. With webhooks, each completion triggers the next call automatically.
  3. Set a max-chain-depth in your client logic. Don't let a user trigger 20 chained extends: bound it at 4 or 5.
  4. Concatenate on your side. The API returns each segment as a separate audio_url. Your client stitches them. Use the starts_at overlap as a crossfade point.
  5. Validate each segment before chaining. Check that the previous extend succeeded (state=succeeded) before submitting the next. A failure mid-chain stalls the whole chain: auto-refund returns the credits, but not the time you lost.
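Practices 1 and 3 can be sketched in a few lines. The function names and the MAX_CHAIN_DEPTH value are hypothetical, and seed is treated as an optional value you track yourself, since this guide does not describe how it is passed to the API.

```python
import hashlib
import json
from typing import Optional

MAX_CHAIN_DEPTH = 4  # illustrative bound; the guide suggests 4 or 5


def cache_key(clip_id: str, starts_at: float, length: int,
              instruction: str, seed: Optional[int] = None) -> str:
    """Stable key for one extend request, usable in any key-value cache."""
    # Normalize numeric types so 55 and 55.0 map to the same key.
    fields = [clip_id, float(starts_at), int(length), instruction, seed]
    raw = json.dumps(fields, separators=(",", ":"))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def check_chain_depth(extends_so_far: int) -> None:
    """Refuse to start another extend once the chain is at the bound."""
    if extends_so_far >= MAX_CHAIN_DEPTH:
        raise ValueError(f"chain depth limit of {MAX_CHAIN_DEPTH} reached")
```

Look the key up before submitting a task and store the finished clip's data under it afterwards, so a repeated request never triggers a second paid generation.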

Pricing

12 credits per extend task on MusicAPI's Producer API. Same flat cost as create, replace, and cover. Effective per-extend cost:

  • $0.18 on the $5 entry pack
  • $0.13 on Starter $19/mo
  • $0.09 on Growth $99/mo
  • $0.06 on Pro $999/mo

A 4-minute song built via 1 create + 3 extends = 48 credits = $0.72 on the $5 entry pack, $0.36 on Growth, or $0.24 on Pro. See Lyria 3 Pro pricing for the full plan economics.
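The arithmetic behind these figures is simple enough to script. This sketch uses the rounded per-task prices listed above, stored in cents to avoid floating-point noise; the function names are hypothetical, and actual billing may differ slightly because the listed prices are themselves rounded.

```python
CREDITS_PER_TASK = 12

# Effective price per 12-credit task, in US cents, from the list above.
PRICE_PER_TASK_CENTS = {
    "$5 entry pack": 18,
    "Starter": 13,
    "Growth": 9,
    "Pro": 6,
}


def chain_credits(num_tasks: int) -> int:
    """Credits consumed by a chain (create and extend tasks cost the same)."""
    return CREDITS_PER_TASK * num_tasks


def chain_cost_usd(num_tasks: int) -> dict:
    """Approximate dollar cost of a chain on each plan."""
    return {plan: cents * num_tasks / 100
            for plan, cents in PRICE_PER_TASK_CENTS.items()}
```

For 1 create + 3 extends, chain_credits(4) is 48 and chain_cost_usd(4) ranges from 0.24 on Pro to 0.72 on the entry pack.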

Common questions

What does the extend_music endpoint do?

extend_music takes an existing AI-generated clip and continues it from a specified timestamp. You pass clip_id (the source), starts_at (the timestamp to continue from, in seconds), length (how many seconds of new audio to produce), and instruction (musical direction for the continuation). The API returns a fresh audio segment that picks up where you specified and runs forward.

Can I extend an AI song past the 240-second per-task limit?

Yes, by chaining. Each extend_music call produces a new clip with its own clip_id. Feed that new clip_id into another extend_music call to continue further. Three 60-second extends get you a 3-minute extension on top of your original 60-second source: 4 minutes total. Five chained extends get you past 5 minutes. The practical ceiling is more about model coherence at long horizons than a hard limit.

Which clip_id do I chain on after an extend?

The clip_id returned in the extend's response, not the original. Each extend_music call produces a new clip with a new clip_id. Always chain on the most recent clip_id. The starts_at parameter on the next call is relative to that new clip, not the original.

What's the right starts_at value?

Typically the duration of the source clip minus a few seconds. If your source is 60 seconds, use starts_at: 55 or starts_at: 58. This gives the model a brief overlap to maintain stylistic continuity. Setting starts_at much earlier (e.g., starts_at: 30 on a 60s clip) gives the model permission to re-render the back half, which is usually not what you want.

How much does extend_music cost?

12 credits per extend task on MusicAPI's Producer API (Google Lyria 3 Pro). Same flat cost as create_music, replace_music, and cover_music. Failed upstream extends are auto-refunded. The cost scales linearly with the number of chains: 3 extends = 36 credits = ~$0.18-0.54 depending on plan.

Should I extend or just create a longer clip from scratch?

It depends on what you need. If you want a single coherent 3-minute song with consistent vocals, use one create call with length: 180 (or 240, the per-task ceiling). If you have a great 60-second hook and want to develop it further by adding a chorus, bridge, or outro, extend is the right tool. Extends preserve the source's structural skeleton; create starts fresh.

Do extend chains drift stylistically?

Some drift is normal at 3+ chained extends. The model uses the most recent clip as context, so style migrates over many hops. Two mitigations: (1) give every extend a clear instruction that matches the source style ("continue in the same indie folk style"), and (2) limit chain depth to 3-4 in production. For longer-form tracks, consider Suno v5, which has a higher per-task duration ceiling.

Last updated 2026-05-22.