How it works

How a clip gets made.

BitterClip turns a recording into clips you can trust, in three nouns: Recording → Moment → Clip. Your AI selects the words; BitterClip derives the cut from the audio. No timeline, no guessed timestamps.

01

Recording

You bring a video or audio file. BitterClip transcribes it into time-aligned words, and that transcript is the canonical timing signal for everything downstream.

02

Moment

Your AI reads the transcript and proposes moments worth clipping by selecting words, not timestamps. BitterClip derives the start and end from the first and last words the AI picked, adds a little breathing room, and snaps both edges to the audio. Because the AI never invents a time code, a clip never lands mid-syllable.
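The edge derivation described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BitterClip's implementation: the `Word` type, the hypothetical `derive_cut` helper, the 0.25 s default padding, and the rule that stands in for "snapping" (padding may never cross into a neighboring word's spoken time) are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One time-aligned transcript word (times in seconds)."""
    text: str
    start: float
    end: float


def derive_cut(words: list[Word], first: int, last: int,
               duration: float, padding: float = 0.25) -> tuple[float, float]:
    """Derive clip edges from a word selection (hypothetical helper).

    `first` and `last` are indices into `words`; the AI supplies only
    these indices, never a timestamp.
    """
    if not (0 <= first <= last < len(words)):
        raise ValueError("selection must be a non-empty, in-range word span")

    # Pad outward from the selected words' own boundaries.
    start = words[first].start - padding
    end = words[last].end + padding

    # Assumed stand-in for snapping: padding stops at the neighboring
    # words, so each edge falls in the gap between spoken words.
    if first > 0:
        start = max(start, words[first - 1].end)
    if last + 1 < len(words):
        end = min(end, words[last + 1].start)

    # Keep the cut inside the recording.
    return max(0.0, start), min(duration, end)
```

For example, selecting words 1 through 3 of a transcript whose word 0 ends at 0.3 s yields a start of 0.3 s rather than 0.25 s, because the padding is not allowed to eat into the previous word. Since every edge is computed from the transcript's own timings, no selection can produce a time code that falls inside a word.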

03

Clip

You confirm or nudge the edges, and BitterClip renders a captioned MP4 along with its transcript and an SRT caption file. Every clip stays linked to its recording, speaker, exact words, and timestamp, so you can always trace it back to the source.
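An SRT file is plain text: numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line followed by the caption text and a blank line. The sketch below shows one way to produce it from a clip's words, re-timed so the clip's own start is zero. The `Word` type, the hypothetical `clip_srt` helper, and its naive fixed-size grouping (`words_per_cue`) are assumptions for illustration; a real captioner would likely also break cues on punctuation and duration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds, relative to the recording
    end: float


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def clip_srt(words: list[Word], clip_start: float, words_per_cue: int = 7) -> str:
    """Build SRT captions for a clip (hypothetical helper).

    Cue times are shifted by `clip_start` so they line up with the
    rendered MP4, which begins at zero.
    """
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        # Guard against a cue starting before the clip itself.
        begin = max(0.0, chunk[0].start - clip_start)
        end = chunk[-1].end - clip_start
        text = " ".join(w.text for w in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(begin)} --> {srt_time(end)}\n{text}\n")
    # Joining with "\n" leaves the blank line SRT expects between cues.
    return "\n".join(cues)
```

Because the captions come from the same time-aligned words that defined the cut, the subtitle timing and the clip edges share one source of truth.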

Media custody

Your recordings and rendered clips live in your BitterClip workspace. Each clip carries a receipt — recording, speaker, words, timestamp — so the chain from source to share is never broken.

For developers

BitterClip is a remote MCP server. The tools your AI calls, and the bridge the editor runs on, are documented on the MCP page.
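MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool `name` and an `arguments` object. The sketch below only builds such a request, to show the shape of what an AI client sends. The `propose_moment` tool and its `recording_id`, `first_word`, and `last_word` arguments are invented for illustration; consult the MCP page for BitterClip's actual tools and parameters.

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids for this client session.
_ids = count(1)


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical tool and argument names, not BitterClip's documented API.
# Note the arguments are word indices, not timestamps.
request = tools_call("propose_moment", {
    "recording_id": "rec_123",
    "first_word": 42,
    "last_word": 97,
})
```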