Use case

Lyric Alignment API

For developers building karaoke, lyric-video, or caption features who need lyrics timed to the audio, not just the lyric text.

The problem

A track plus its lyrics text is not enough to highlight words in time or render a synced lyric video. Aligning each word to the audio yourself means running a forced-alignment model, tuning it per track, and hosting the inference.

Why MusicAPI fits this

One call, then free: POST a clip_id and get back a timestamped alignment array. The first alignment of a clip costs 1 credit; repeat requests for the same clip are served from cache at no charge.

Ready-to-render timeline: the response maps lyric segments to start and stop times, so you can drive karaoke-style word highlighting, lyric videos, or caption tracks directly.

No model to host: forced alignment runs on our side and you receive plain JSON — no GPU, model weights, or alignment pipeline to operate.
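To make the "ready-to-render" point concrete, here is a minimal sketch of karaoke-style highlighting: given the current playback time, find the segment that should be lit. The field names text, start, and end are assumptions for illustration, not the documented response schema, and active_index is a hypothetical helper. Adapt the keys to whatever the alignment array actually returns.

```python
from bisect import bisect_right

# Hypothetical segment shape: "text", "start", and "end" (seconds) are
# assumed names for illustration, not the documented response schema.
segments = [
    {"text": "Hello", "start": 0.50, "end": 0.90},
    {"text": "world", "start": 1.00, "end": 1.60},
    {"text": "again", "start": 2.10, "end": 2.70},
]


def active_index(segments, t):
    """Return the index of the segment being sung at time t, or None.

    Assumes segments are sorted by start time and do not overlap.
    """
    starts = [s["start"] for s in segments]
    i = bisect_right(starts, t) - 1  # last segment that starts at or before t
    if i >= 0 and t < segments[i]["end"]:
        return i
    return None  # t falls in a gap, before the first word, or after the last
```

A player would call active_index on every animation frame or timeupdate event and highlight the returned segment. For long tracks you may want to build the starts list once instead of on every call.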

Code sample

A real request against the live API: POST the clip_id of a finished clip and read the alignment array from the response.

curl
# Get the word/line alignment timeline for a finished clip
curl -X POST https://api.musicapi.ai/api/v1/sonic/aligned-lyrics \
  -H "Authorization: Bearer $MUSICAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"clip_id": "YOUR_CLIP_ID"}'

# -> { "code": 200, "message": "success",
#      "data": { "alignment": [ { /* timed lyric segments */ } ] } }
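The same request can be sketched in Python using only the standard library. The URL, headers, and body mirror the curl sample above, and the response shape follows its comment. The function names are hypothetical, and treating any body code other than 200 as a failure is an assumption, not documented behavior.

```python
import json
import urllib.request

API_URL = "https://api.musicapi.ai/api/v1/sonic/aligned-lyrics"


def build_request(clip_id, api_key):
    """Build the POST request shown in the curl sample (hypothetical helper)."""
    body = json.dumps({"clip_id": clip_id}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def extract_alignment(payload):
    """Return the alignment list from a decoded response body.

    Assumption: a body "code" other than 200 means the call failed.
    """
    if payload.get("code") != 200:
        raise RuntimeError(f"alignment failed: {payload.get('message')}")
    return payload["data"]["alignment"]


def fetch_alignment(clip_id, api_key):
    """Send the request and return the alignment list.

    urlopen raises urllib.error.HTTPError on 4xx/5xx HTTP statuses.
    """
    request = build_request(clip_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_alignment(json.load(response))


# Usage (performs a live call and may spend a credit):
#   import os
#   alignment = fetch_alignment("YOUR_CLIP_ID", os.environ["MUSICAPI_KEY"])
```

Splitting request construction from response parsing keeps both parts testable without touching the network.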

Pricing

MusicAPI is pay-as-you-go with credit packs, plus predictable monthly subscriptions. The per-credit rate is the same across packs and subscriptions. See the pricing page for current rates, free credits, and volume options.

Related: Suno API · Producer AI API

FAQ

What does the alignment contain?

A timeline array that maps segments of the lyrics to their start and stop times in the track. It is structured for word- or line-level synchronization, so you can highlight lyrics as the song plays or build a lyric video.
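As an illustration of the caption use case, the sketch below turns timed segments into a WebVTT caption track. The segment keys text, start, and end (in seconds) are assumed names, not the documented schema, and vtt_time and to_webvtt are hypothetical helpers.

```python
def vtt_time(seconds):
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def to_webvtt(segments):
    """Render one cue per segment.

    Assumes each segment is a dict with "text", "start", and "end" keys;
    these names are illustrative, not the documented response fields.
    """
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{vtt_time(seg['start'])} --> {vtt_time(seg['end'])}")
        lines.append(seg["text"])
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

If the alignment is word-level, you would likely group consecutive words into lines before rendering, since one cue per word tends to flicker on screen.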

How much does it cost?

One credit the first time you align a given clip. The result is cached, so every later request for the same clip_id is returned for free.
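The billing rule reduces to counting distinct clips. A rough estimator, assuming cached results do not expire (the page does not state a cache lifetime) and using a hypothetical helper name:

```python
def credits_needed(requested_clip_ids, already_aligned=()):
    """Estimate credits for a batch: 1 per clip not aligned before.

    Assumption: a clip aligned once stays cached indefinitely.
    """
    return len(set(requested_clip_ids) - set(already_aligned))
```

For example, requesting clips a, b, a, and c with nothing cached would cost an estimated 3 credits, because the repeated request for a is served from cache.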

Do I need to supply the lyrics?

No. The alignment is derived from a generated clip that already has lyrics — pass its clip_id. A clip with no lyrics returns a not-found response.

Build it in 5 minutes

Get free credits on signup and run real generations before any payment. No credit card required to start.

API details verified 2026-06-07. The API surface evolves; the pricing page always has current rates.