Guide · Published 2026-05-22
How to Replace a Section of an AI Song
Swap a chorus. Fix a verse. Change vocals on a window. The replace_music endpoint with starts_at, ends_at, and instruction, explained with working code.
What "replace" means in an AI music context
Sometimes an AI-generated song is 90% right and 10% wrong. The chorus lacks the energy you wanted, or the bridge feels weak, or the vocals in the second verse drift off-style. Without the right tool, your only options are to regenerate the whole thing (losing the parts that worked) or live with the imperfection.
replace_music is the right tool. It targets a specific time window in your existing clip and re-renders only that segment with new content matching your instruction. The audio outside the window is preserved; only the [starts_at, ends_at] region changes.
Developers typically reach for replace in three situations:
- Section-level fixes: "the chorus needs more energy" or "the bridge feels weak" without rebuilding the whole track.
- Vocal swaps: change vocal style, swap lyrics, or replace a voice on a specific verse without re-rendering the instrumental backing.
- Instrumental swaps: keep the vocals, change the arrangement underneath them for a specific section.
The replace_music endpoint at a glance
On MusicAPI's Producer API (Google Lyria 3 Pro), the endpoint is POST /api/v1/producer/create with task_type: "replace_music". The four parameters that matter:
- clip_id (required): the source clip containing the segment you want to replace.
- starts_at (required): start of the replacement window, in seconds from the beginning of the source.
- ends_at (required): end of the replacement window. Must be greater than starts_at.
- instruction (recommended): what to put in that window. Be specific.
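As a minimal sketch of how these parameters fit together, the helper below assembles the request body as a plain dict and rejects a window whose end does not come after its start. The function name build_replace_payload is hypothetical, the starts_at >= 0 check is an assumption based on the window being measured from the beginning of the source, and lyrics is the optional field used in the vocal-swap example later in this guide.

```python
def build_replace_payload(clip_id, starts_at, ends_at, instruction=None, lyrics=None):
    """Build the JSON body for a replace_music task as a plain dict.

    Hypothetical helper: only the field names come from this guide.
    """
    if not clip_id:
        raise ValueError("clip_id is required")
    if starts_at < 0:
        # Assumption: the window is measured from the start of the clip.
        raise ValueError("starts_at must be >= 0")
    if ends_at <= starts_at:
        raise ValueError("ends_at must be greater than starts_at")

    payload = {
        "task_type": "replace_music",
        "clip_id": clip_id,
        "starts_at": starts_at,
        "ends_at": ends_at,
    }
    # instruction is recommended rather than required, so include it only if given.
    if instruction:
        payload["instruction"] = instruction
    if lyrics:
        payload["lyrics"] = lyrics
    return payload
```

Validating on the client side keeps an obviously invalid window from costing a round trip to the API.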
Use case: fix a weak chorus
Your hook is great. The verse works. The chorus underwhelms. Target just the chorus window and re-render:
curl -X POST https://api.musicapi.ai/api/v1/producer/create \
  -H "Authorization: Bearer $MUSICAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "replace_music",
    "clip_id": "<source-clip-id>",
    "starts_at": 30,
    "ends_at": 50,
    "instruction": "Replace this chorus with a more energetic, layered version. Add backing vocals, doubled bass, a wider arrangement. Keep the melodic hook intact but make it bigger."
  }'
The result is a new clip with the chorus replaced. The intro, verse, and outro outside [30, 50] are preserved.
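For readers who are not calling the API from a shell, here is a rough Python equivalent of the curl call, using only the standard library. It builds the request object without sending it. The URL, headers, and body fields mirror the curl example; the helper name build_replace_request and the placeholder key are illustrative, and the response format is not covered in this guide, so no response parsing is shown.

```python
import json
import urllib.request

API_URL = "https://api.musicapi.ai/api/v1/producer/create"


def build_replace_request(api_key, payload):
    """Return a POST request for a replace_music task (not yet sent)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_replace_request(
    "YOUR_API_KEY",  # placeholder; load the real key from your environment
    {
        "task_type": "replace_music",
        "clip_id": "<source-clip-id>",
        "starts_at": 30,
        "ends_at": 50,
        "instruction": "Replace this chorus with a more energetic, layered version.",
    },
)
# To actually submit the task, pass `request` to urllib.request.urlopen().
```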
Use case: swap vocals on a verse (replaces swap_music_vocals)
The legacy swap_music_vocals operation was retired in the 2026-04 model platform migration. The migration path is replace_music targeting the vocal window with a vocal-focused instruction:
curl -X POST https://api.musicapi.ai/api/v1/producer/create \
  -H "Authorization: Bearer $MUSICAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "replace_music",
    "clip_id": "<source-clip-id>",
    "starts_at": 60,
    "ends_at": 90,
    "instruction": "Replace the vocal performance in this window with a smooth male R&B vocal with subtle autotune. Keep the instrumental groove and chord progression exactly the same. Lyrics: When I see you smile, time stands still, every moment with you is a perfect thrill.",
    "lyrics": "[Verse 2]\nWhen I see you smile\nTime stands still\nEvery moment with you\nIs a perfect thrill"
  }'
The model uses the instruction together with the optional lyrics field to drive the vocal re-render. The instrumental underneath stays as close to the original as the model can keep it.
Use case: swap instrumental backing (replaces swap_music_sound)
Same pattern, inverted: keep the vocals (the model has them as reference), change the instrumental backing.
curl -X POST https://api.musicapi.ai/api/v1/producer/create \
  -H "Authorization: Bearer $MUSICAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "replace_music",
    "clip_id": "<source-clip-id>",
    "starts_at": 0,
    "ends_at": 60,
    "instruction": "Re-arrange this section with ambient atmospheric pads, soft piano melodies, gentle reverb, and minimalist production. Keep the vocals identical in character and lyric content."
  }'
Picking the right time window
starts_at and ends_at define the segment to replace. Use the window size as a guide:
| Window size | What it captures | When to use |
|---|---|---|
| 5-10 seconds | A specific phrase, fill, or moment. | Fixing a single bad moment. Be precise about what to change in the instruction. |
| 15-30 seconds | A song section (verse, chorus, bridge). | The sweet spot for most production-level fixes. |
| 30-60 seconds | A larger arrangement chunk or a full verse-plus-chorus. | When you want to re-think a meaningful portion of the song. |
| Whole song length | Equivalent to cover_music with strength 0.6+. | Don't. Use cover_music if you want the whole song reinterpreted. |
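The table can be turned into a quick pre-flight check. The sketch below is one possible reading of it, not an official rule set: the function name classify_window is hypothetical, windows between the table's bands (for example 10 to 15 seconds) are assigned to the nearest larger band, and the 0.9 ratio used to decide that a window is effectively the whole song is an assumption you should tune.

```python
def classify_window(starts_at, ends_at, clip_duration, whole_song_ratio=0.9):
    """Map a replacement window onto the size bands from the table above."""
    length = ends_at - starts_at
    if length <= 0:
        raise ValueError("ends_at must be greater than starts_at")
    # Assumption: a window covering ~90% of the clip counts as "whole song".
    if length >= whole_song_ratio * clip_duration:
        return "whole song: use cover_music instead"
    if length < 5:
        return "under 5 s: likely too little context"
    if length <= 10:
        return "phrase-level fix"
    if length <= 30:
        return "section-level fix"
    return "large arrangement chunk"


print(classify_window(30, 50, clip_duration=180))  # section-level fix
```

Running a check like this before submitting makes it harder to spend credits on a window that is too short to render coherently or so long that cover_music is the better fit.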
Instruction patterns that work
- State what to keep, not just what to change. "Replace the chorus arrangement but keep the vocal melody and chord progression" gives the model an anchor.
- Be specific about new content. "Add backing vocals, doubled bass, and a wider arrangement" outperforms "make it bigger."
- For vocal swaps, name the vocal style. "Smooth male R&B with subtle autotune" tells the model exactly what register to aim for.
- For instrumental swaps, name the instruments explicitly. "Ambient pads, soft piano, gentle reverb" beats "more chill."
- Pass new lyrics when replacing a vocal segment. The lyrics field can carry the new lyric content alongside the instruction.
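One way to make the "state what to keep" habit hard to forget is to build instructions from two required parts. This is only an illustrative sketch; compose_instruction is a hypothetical helper, and the wording it produces is a convention of this example rather than anything the API requires.

```python
def compose_instruction(change, keep):
    """Join a 'what to change' clause and a 'what to keep' clause.

    Both parts are required, so an instruction without an anchor fails early.
    """
    change = change.strip().rstrip(".")
    keep = keep.strip().rstrip(".")
    if not change or not keep:
        raise ValueError("describe both what to change and what to keep")
    return f"{change}. Keep {keep}."


instruction = compose_instruction(
    change="Replace this chorus with backing vocals, doubled bass, and a wider arrangement",
    keep="the vocal melody and chord progression intact",
)
```

Here instruction ends with "Keep the vocal melody and chord progression intact.", which gives the model the anchor described in the first bullet.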
Production best practices
- Validate the time window is inside the source clip. ends_at must be less than the source clip's duration. Submitting ends_at past the end produces undefined behavior.
- Don't chain replaces on the same window repeatedly. Each replace re-renders the window, so drift accumulates across iterations. If your first replace doesn't land, change the instruction substantially before trying again rather than making small tweaks.
- Use seed for A/B tests. Same input + same seed produces the same output. Useful when you're tuning instruction wording without confounding from generation randomness.
- Combine with extend_music for full restructuring. Replace fixes a section; extend continues forward. Together they can rebuild meaningful portions of a song without rebuilding from scratch.
- Cache by (clip_id + starts_at + ends_at + instruction + seed). Don't pay twice for the same render.
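Two of the practices above, bounds checking and caching, are easy to sketch in a few lines. The helper names check_window and cache_key are hypothetical, and the SHA-256 over a canonical JSON list is just one reasonable way to derive a key from the five fields named in the last bullet. You need to know the source clip's duration yourself; how to obtain it is not covered here.

```python
import hashlib
import json


def check_window(starts_at, ends_at, clip_duration):
    """Raise if the window is not strictly inside the source clip."""
    if not (0 <= starts_at < ends_at < clip_duration):
        raise ValueError(
            f"window [{starts_at}, {ends_at}] is not inside a {clip_duration}s clip"
        )


def cache_key(clip_id, starts_at, ends_at, instruction, seed=None):
    """Derive a stable key from the inputs that determine a render."""
    # Normalise numbers so 30 and 30.0 map to the same key.
    fields = [clip_id, float(starts_at), float(ends_at), instruction, seed]
    canonical = json.dumps(fields, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A typical flow would call check_window first, then look up cache_key(...) in your own store and only submit the task on a miss.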
Pricing
12 credits per replace task on MusicAPI's Producer API. Same flat cost as create, extend, and cover. Effective per-replace cost:
- $0.18 on the $5 entry pack
- $0.13 on Starter $19/mo
- $0.09 on Growth $99/mo
- $0.06 on Pro $999/mo
See Lyria 3 Pro pricing for the full plan economics.
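For budgeting, the arithmetic is simple enough to script. The sketch below uses only the figures quoted above (12 credits per replace and the per-replace prices for each plan); the short plan keys and function names are labels made up for this example, and prices are stored in cents to avoid floating-point rounding.

```python
CREDITS_PER_REPLACE = 12

# Effective per-replace prices from the list above, in US cents.
CENTS_PER_REPLACE = {"entry": 18, "starter": 13, "growth": 9, "pro": 6}


def replaces_for_credits(credits):
    """How many full replace tasks a credit balance covers."""
    return credits // CREDITS_PER_REPLACE


def cost_in_dollars(plan, n_replaces):
    """Estimated spend for n replaces on a given plan."""
    return CENTS_PER_REPLACE[plan] * n_replaces / 100


print(replaces_for_credits(75))       # 6
print(cost_in_dollars("growth", 100))  # 9.0
```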
Common questions
What does the replace_music endpoint do?
replace_music swaps a specific time window of an existing audio clip with new content matching your instruction. You pass clip_id (the source), starts_at (window start in seconds), ends_at (window end), and instruction (what to put in that window). The model preserves the audio outside the [starts_at, ends_at] window and re-renders only the targeted segment.
What's the difference between extend_music and replace_music?
Extend continues a song forward from a timestamp, producing new audio after the source. Replace targets a specific time window inside the source and swaps that segment with new content. Use extend to make a song longer; use replace to fix or change a part of a song you already have.
Can I use replace_music to swap vocals?
Yes. Target the time window where the vocals are present (starts_at and ends_at marking the vocal section), and instruct the model to re-render with different vocal style or new lyrics. This is the recommended migration path from the legacy swap_music_vocals operation that was retired in the 2026-04 model platform migration.
Can I use replace_music to swap instrumental backing?
Yes. Same pattern: target the window and instruct the model to re-render with a different instrumental arrangement while keeping the vocal content the model has reference to. This replaces the legacy swap_music_sound operation.
How much does replace_music cost?
12 credits per replace task on MusicAPI's Producer API. Same flat cost as create_music, extend_music, and cover_music. Failed upstream replaces are auto-refunded. Effective cost ranges from $0.06-0.18 per replace depending on plan.
What's the smallest time window I can target?
Practically, around 5 seconds. Below that the model has too little context to produce coherent output. The endpoint accepts any starts_at/ends_at pair where ends_at > starts_at, but you'll get muddled output below 5 seconds. The sweet spot is 15-30 seconds for chorus/verse-level edits.
Will the audio outside my window stay exactly the same?
The model preserves the audio content outside the window, but the boundary handling means you may hear a very brief crossfade at starts_at and ends_at. For most production use this is inaudible. If you need bit-perfect preservation of the unmodified regions, do client-side splicing: load the original audio and the replace output into your audio toolchain and splice precisely at your chosen timestamps.
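The client-side splice can be sketched on decoded samples. This example assumes both clips have already been decoded to sample sequences with the same sample rate, channel layout, and length, and that the replace output is time-aligned with the original; none of that is guaranteed by this guide, so verify it in your own pipeline. The function name splice_window is hypothetical.

```python
def splice_window(original, replaced, starts_at, ends_at, sample_rate):
    """Take [starts_at, ends_at) from `replaced` and everything else from `original`.

    Both inputs are sequences of samples (or frames) at `sample_rate` Hz.
    """
    if len(original) != len(replaced):
        raise ValueError("clips must have the same number of samples")
    start = round(starts_at * sample_rate)
    end = round(ends_at * sample_rate)
    if not (0 <= start < end <= len(original)):
        raise ValueError("window is outside the clip")
    # Samples outside the window are copied untouched from the original.
    return list(original[:start]) + list(replaced[start:end]) + list(original[end:])


# Toy example: 5 "seconds" of audio at 2 samples per second.
original = [0] * 10
replaced = [1] * 10
print(splice_window(original, replaced, starts_at=1, ends_at=3, sample_rate=2))
# [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
```

A hard cut like this keeps the untouched regions bit-identical, though it can click at the boundaries if the waveforms do not line up, so listen to the joins before shipping.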
Try it
- 75 free credits on signup cover 6 replaces.
- Try in the playground.
- Producer API docs.
- Extend music guide: for going forward instead of replacing.
- Cover music guide: for whole-song reinterpretation.
Last updated 2026-05-22.