Guide · Published 2026-05-22

How to Replace a Section of an AI Song

Swap a chorus. Fix a verse. Change the vocals in one window. The replace_music endpoint, with starts_at, ends_at, and instruction, explained with working code.

What "replace" means in an AI music context

Sometimes an AI-generated song is 90% right and 10% wrong. The chorus lacks the energy you wanted, or the bridge feels weak, or the vocals in the second verse drift off-style. Without the right tool, your only options are to regenerate the whole thing (losing the parts that worked) or live with the imperfection.

replace_music is the right tool. It targets a specific time window in your existing clip and re-renders only that segment with new content matching your instruction. The audio outside the window is preserved; only the [starts_at, ends_at] region changes.

Developers typically reach for replace for three reasons:

Section-level fixes: "the chorus needs more energy" or "the bridge feels weak" without rebuilding the whole track.
Vocal swaps: change vocal style, swap lyrics, or replace a voice on a specific verse without re-rendering the instrumental backing.
Instrumental swaps: keep the vocals, change the arrangement underneath them for a specific section.

The replace_music endpoint at a glance

On MusicAPI's Producer API (Google Lyria 3 Pro), the endpoint is POST /api/v1/producer/create with task_type: "replace_music". The four parameters that matter:

clip_id (required): the source clip containing the segment you want to replace.
starts_at (required): start of the replacement window, in seconds from the beginning of the source.
ends_at (required): end of the replacement window. Must be greater than starts_at.
instruction (recommended): what to put in that window. Be specific.
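It is worth checking the window locally before you submit it. The sketch below is a hypothetical bash helper, validate_window, and is not part of the API. It assumes you already know the source clip's duration in seconds and uses awk so that fractional seconds compare correctly. It rejects non-numeric or negative values, an ends_at that is not greater than starts_at, and an ends_at past the end of the clip.

```shell
# Hypothetical helper (not part of the API): check a replace window locally.
# Usage: validate_window STARTS_AT ENDS_AT CLIP_DURATION   (seconds, decimals allowed)
# Prints "ok" and returns 0 if valid; prints the reason to stderr and returns 1 otherwise.
validate_window() {
  local starts_at=$1 ends_at=$2 duration=$3 v
  local re='^[0-9]+([.][0-9]+)?$'   # non-negative integer or decimal
  for v in "$starts_at" "$ends_at" "$duration"; do
    if [[ ! $v =~ $re ]]; then
      echo "not a non-negative number: $v" >&2
      return 1
    fi
  done
  # awk does the numeric comparison; "+0" forces numeric rather than string compare.
  if ! awk -v s="$starts_at" -v e="$ends_at" 'BEGIN { exit !(e+0 > s+0) }'; then
    echo "ends_at must be greater than starts_at" >&2
    return 1
  fi
  if ! awk -v e="$ends_at" -v d="$duration" 'BEGIN { exit !(e+0 <= d+0) }'; then
    echo "ends_at must not be past the end of the clip" >&2
    return 1
  fi
  echo "ok"
}
```

For example, `validate_window 30 50 120 && curl ...` only sends the request when the check passes.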

Use case: fix a weak chorus

Your hook is great. The verse works. The chorus underwhelms. Target just the chorus window and re-render:

curl -X POST https://api.musicapi.ai/api/v1/producer/create \
  -H "Authorization: Bearer $MUSICAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "replace_music",
    "clip_id": "<source-clip-id>",
    "starts_at": 30,
    "ends_at": 50,
    "instruction": "Replace this chorus with a more energetic, layered version. Add backing vocals, doubled bass, a wider arrangement. Keep the melodic hook intact but make it bigger."
  }'

The result is a new clip with the chorus replaced. The intro, verse, and outro outside [30, 50] are preserved.

Use case: swap vocals on a verse (replaces swap_music_vocals)

The legacy swap_music_vocals operation was retired in the 2026-04 model platform migration. The migration path is replace_music targeting the vocal window with a vocal-focused instruction:

curl -X POST https://api.musicapi.ai/api/v1/producer/create \
  -H "Authorization: Bearer $MUSICAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "replace_music",
    "clip_id": "<source-clip-id>",
    "starts_at": 60,
    "ends_at": 90,
    "instruction": "Replace the vocal performance in this window with a smooth male R&B vocal with subtle autotune. Keep the instrumental groove and chord progression exactly the same. Lyrics: When I see you smile, time stands still, every moment with you is a perfect thrill.",
    "lyrics": "[Verse 2]\nWhen I see you smile\nTime stands still\nEvery moment with you\nIs a perfect thrill"
  }'

The model uses the instruction together with the optional lyrics field to drive the vocal re-render. It keeps the instrumental underneath as close to the original as it can, but don't expect it to be identical.
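Escaping lyrics by hand is easy to get wrong, because every line break has to become \n inside the JSON string. As a sketch, the hypothetical bash helpers below, json_escape and build_replace_payload, build the same kind of request body from plain text. json_escape handles only backslashes, double quotes, newlines, carriage returns, and tabs; if jq is installed, building the body with `jq -n --arg` is the more robust option.

```shell
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return and tab only;
# other control characters are NOT escaped.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\r'/\\r}    # carriage return
  s=${s//$'\t'/\\t}    # tab
  printf '%s' "$s"
}

# Hypothetical helper: print a replace_music request body.
# Usage: build_replace_payload CLIP_ID STARTS_AT ENDS_AT INSTRUCTION [LYRICS]
build_replace_payload() {
  local clip_id=$1 starts_at=$2 ends_at=$3 instruction=$4 lyrics=${5-}
  printf '{"task_type":"replace_music","clip_id":"%s","starts_at":%s,"ends_at":%s,"instruction":"%s"' \
    "$(json_escape "$clip_id")" "$starts_at" "$ends_at" "$(json_escape "$instruction")"
  # Only include the lyrics field when a value was passed.
  if [[ -n $lyrics ]]; then
    printf ',"lyrics":"%s"' "$(json_escape "$lyrics")"
  fi
  printf '}\n'
}
```

You would then pass the result to curl as `-d "$(build_replace_payload "<source-clip-id>" 60 90 "$instruction" "$lyrics")"`, where `$lyrics` holds the lyric sheet with real line breaks.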

Use case: swap instrumental backing (replaces swap_music_sound)

Same pattern, inverted: keep the vocals, which the model uses as a reference, and change the instrumental backing.

curl -X POST https://api.musicapi.ai/api/v1/producer/create \
  -H "Authorization: Bearer $MUSICAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "replace_music",
    "clip_id": "<source-clip-id>",
    "starts_at": 0,
    "ends_at": 60,
    "instruction": "Re-arrange this section with ambient atmospheric pads, soft piano melodies, gentle reverb, and minimalist production. Keep the vocals identical in character and lyric content."
  }'

Picking the right time window

starts_at and ends_at define the segment to replace. Use these rules of thumb to size it:

Window size	What it captures	When to use
5-10 seconds	A specific phrase, fill, or moment.	Fixing a single bad moment. Be precise about what to change in the instruction.
15-30 seconds	A song section (verse, chorus, bridge).	The sweet spot for most production-level fixes.
30-60 seconds	A larger arrangement chunk or a full verse-plus-chorus.	When you want to re-think a meaningful portion of the song.
Whole song length	Equivalent to cover_music with strength 0.6+.	Don't. Use cover_music if you want the whole song reinterpreted.
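The table can be turned into a pre-flight check. The hypothetical classify_window function below takes whole-second starts_at and ends_at values plus the clip duration, and prints the row a window falls into. The 5-second floor comes from the FAQ at the end of this guide. Folding the table's 10-15 second gap into the section bucket, and treating a window that spans the entire clip as a whole-song request, are this sketch's own assumptions.

```shell
# Hypothetical helper: map a window to the rows of the window-size table.
# Usage: classify_window STARTS_AT ENDS_AT CLIP_DURATION   (whole seconds only)
classify_window() {
  local len=$(( $2 - $1 )) duration=$3
  if (( $1 == 0 && $2 >= duration )); then
    echo "whole-song: use cover_music instead"
  elif (( len < 5 )); then
    echo "too-short: expect muddled output"
  elif (( len <= 10 )); then
    echo "phrase"        # a specific phrase, fill, or moment
  elif (( len <= 30 )); then
    echo "section"       # verse, chorus, bridge: the usual sweet spot
  elif (( len <= 60 )); then
    echo "arrangement"   # a larger chunk, e.g. verse plus chorus
  else
    echo "over-60s: larger than the table covers"
  fi
}
```

For example, the chorus window from the first use case (`classify_window 30 50 180`) prints `section`.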

Instruction patterns that work

State what to keep, not just what to change. "Replace the chorus arrangement but keep the vocal melody and chord progression" gives the model an anchor.
Be specific about new content. "Add backing vocals, doubled bass, and a wider arrangement" outperforms "make it bigger."
For vocal swaps, name the vocal style. "Smooth male R&B with subtle autotune" tells the model exactly what register to aim for.
For instrumental swaps, name the instruments explicitly. "Ambient pads, soft piano, gentle reverb" beats "more chill."
Pass new lyrics when replacing a vocal segment. The lyrics field can carry the new lyric content alongside the instruction.
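The first two patterns (name what to keep, and be specific about what changes) can be enforced with a small template. The hypothetical compose_instruction helper below refuses to build an instruction unless both parts are present; the API itself only ever sees the final string.

```shell
# Hypothetical helper: build an instruction that names both the change and the anchor.
# Usage: compose_instruction "WHAT TO CHANGE" "WHAT TO KEEP"
compose_instruction() {
  local change=$1 keep=$2
  if [[ -z $change || -z $keep ]]; then
    echo "both a change clause and a keep clause are required" >&2
    return 1
  fi
  # Drop one trailing period from each clause so the template controls punctuation.
  change=${change%.}
  keep=${keep%.}
  printf '%s. Keep %s.\n' "$change" "$keep"
}
```

For example, `compose_instruction "Replace the chorus arrangement" "the vocal melody and chord progression"` prints `Replace the chorus arrangement. Keep the vocal melody and chord progression.`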

Production best practices

Validate that the time window is inside the source clip. ends_at must not exceed the source clip's duration; submitting an ends_at past the end produces undefined behavior.
Don't chain replaces on the same window repeatedly. Each replace re-renders the window, so drift accumulates with every iteration. If your first replace doesn't land, change the instruction substantially before trying again rather than making small tweaks.
Use seed for A/B tests. The same input with the same seed produces the same output, which is useful when you're tuning instruction wording and want to rule out generation randomness as a confound.
Combine with extend_music for full restructuring. Replace fixes a section; extend continues forward. Together they can rebuild meaningful portions of a song without rebuilding from scratch.
Cache results keyed on clip_id, starts_at, ends_at, instruction, and seed (plus lyrics, when you send them). Don't pay twice for the same render.
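A minimal sketch of such a cache key in bash: hash the fields that determine the render, separated by an ASCII unit separator so that ("ab", "c") and ("a", "bc") don't collide. replace_cache_key is a hypothetical name, including lyrics is an assumption for vocal swaps, and sha256sum comes from GNU coreutils (on macOS, `shasum -a 256` is the usual substitute).

```shell
# Hypothetical helper: derive a cache key for a replace request.
# Usage: replace_cache_key CLIP_ID STARTS_AT ENDS_AT INSTRUCTION SEED [LYRICS]
replace_cache_key() {
  local sep=$'\x1f'   # ASCII unit separator between fields
  printf '%s' "$1$sep$2$sep$3$sep$4$sep$5$sep${6-}" | sha256sum | cut -d' ' -f1
}
```

Look for an existing entry under that key (for example, a file named after it) before calling the endpoint, and store the result under the same key afterwards.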

Pricing

12 credits per replace task on MusicAPI's Producer API, the same flat cost as create_music, extend_music, and cover_music. Effective per-replace cost:

$0.18 on the $5 entry pack
$0.13 on Starter $19/mo
$0.09 on Growth $99/mo
$0.06 on Pro $999/mo

See Lyria 3 Pro pricing for the full plan economics.
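For budgeting, the arithmetic is easy to script. The sketch below assumes only the flat 12-credit cost and the per-replace prices listed above; the function names are hypothetical. Bash integer division rounds down, which is why 75 credits buy six full replaces rather than 6.25.

```shell
CREDITS_PER_REPLACE=12   # flat cost per replace task, from the pricing above

# Hypothetical helper: how many full replaces a credit balance covers (rounds down).
replaces_for_credits() {
  echo $(( $1 / CREDITS_PER_REPLACE ))
}

# Hypothetical helper: dollar cost of N replaces at an effective per-replace price.
# Usage: replace_budget N PRICE   e.g. replace_budget 100 0.09
replace_budget() {
  awk -v n="$1" -v price="$2" 'BEGIN { printf "%.2f\n", n * price }'
}
```

For example, `replace_budget 100 0.09` prints 9.00: a hundred replaces at the Growth rate cost about $9.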

Common questions

What does the replace_music endpoint do?

replace_music swaps a specific time window of an existing audio clip with new content matching your instruction. You pass clip_id (the source), starts_at (window start in seconds), ends_at (window end), and instruction (what to put in that window). The model preserves the audio outside the [starts_at, ends_at] window and re-renders only the targeted segment.

What's the difference between extend_music and replace_music?

Extend continues a song forward from a timestamp, producing new audio after the source. Replace targets a specific time window inside the source and swaps that segment with new content. Use extend to make a song longer; use replace to fix or change a part of a song you already have.

Can I use replace_music to swap vocals?

Yes. Target the time window where the vocals are present (starts_at and ends_at marking the vocal section), and instruct the model to re-render with different vocal style or new lyrics. This is the recommended migration path from the legacy swap_music_vocals operation that was retired in the 2026-04 model platform migration.

Can I use replace_music to swap instrumental backing?

Yes. Same pattern: target the window and instruct the model to re-render with a different instrumental arrangement while keeping the vocals, which the model uses as a reference. This replaces the legacy swap_music_sound operation.

How much does replace_music cost?

12 credits per replace task on MusicAPI's Producer API, the same flat cost as create_music, extend_music, and cover_music. Failed upstream replaces are auto-refunded. Effective cost ranges from $0.06 to $0.18 per replace, depending on plan.

What's the smallest time window I can target?

Practically, around 5 seconds; below that, the model has too little context to produce coherent output. The endpoint accepts any starts_at/ends_at pair where ends_at > starts_at, but output from windows under 5 seconds tends to be muddled. The sweet spot is 10-30 seconds for chorus- or verse-level edits.

Will the audio outside my window stay exactly the same?

The model preserves the audio content outside the window, but boundary handling means you may hear a very brief crossfade at starts_at and ends_at. For most production use this is inaudible. If you need bit-perfect preservation of the unmodified regions, splice client-side: load the original audio and the replace output into your audio toolchain and cut precisely at your chosen timestamps.
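One way to do that splice is with ffmpeg's asplit, atrim, asetpts, and concat audio filters. The hypothetical splice_graph helper below only builds the -filter_complex string. It assumes the replace output is on the same timeline as the original (the window sits at the same timestamps) and that both files share sample rate, channel layout, and sample format. Each boundary is a hard cut, so a very short fade may be needed to avoid clicks, and sample-exact preservation also requires lossless input such as WAV.

```shell
# Hypothetical helper: print an ffmpeg -filter_complex graph that takes input 0
# (the original) outside [STARTS_AT, ENDS_AT] and input 1 (the replace output) inside it.
# Usage: splice_graph STARTS_AT ENDS_AT
splice_graph() {
  local s=$1 e=$2
  printf '%s' \
    "[0:a]asplit=2[pre][post];" \
    "[pre]atrim=end=$s,asetpts=PTS-STARTPTS[a];" \
    "[1:a]atrim=start=$s:end=$e,asetpts=PTS-STARTPTS[b];" \
    "[post]atrim=start=$e,asetpts=PTS-STARTPTS[c];" \
    "[a][b][c]concat=n=3:v=0:a=1[out]"
  printf '\n'
}

# Example (requires ffmpeg and both files):
# ffmpeg -i original.wav -i replaced.wav \
#   -filter_complex "$(splice_graph 30 50)" -map "[out]" spliced.wav
```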

Try it

75 free credits on signup cover 6 replaces.
Try in the playground.
Producer API docs.
Extend music guide: for going forward instead of replacing.
Cover music guide: for whole-song reinterpretation.

Last updated 2026-05-22.