Transcript Viewer

Synchronized transcript with audio playback. Highlights each word as it's spoken, supports seeking via scrub bar, and exposes a compound-component API with provider-agnostic character-alignment data.

Installation

npx shadcn-svelte@latest add https://sv11.ui.twango.dev/r/transcript-viewer.json

Usage

<script lang="ts">
	import { TranscriptViewer } from "$lib/registry/ui/transcript-viewer";
</script>
 
<TranscriptViewer />

<script lang="ts">
	import * as TranscriptViewer from "$lib/registry/ui/transcript-viewer";
	import type { CharacterAlignment } from "$lib/registry/ui/transcript-viewer";
 
	let {
		audioSrc,
		alignment,
	}: {
		audioSrc: string;
		alignment: CharacterAlignment;
	} = $props();
</script>
 
<TranscriptViewer.Root {audioSrc} {alignment}>
	<TranscriptViewer.Audio />
	<TranscriptViewer.Words />
	<div class="flex items-center gap-3">
		<TranscriptViewer.PlayPauseButton />
		<TranscriptViewer.ScrubBar />
	</div>
</TranscriptViewer.Root>

Examples

Custom Audio Type

Pass audioType when the source is not MP3 so the browser picks the right decoder.

<TranscriptViewer.Root {audioSrc} {alignment} audioType="audio/wav">
	<TranscriptViewer.Audio />
	<TranscriptViewer.Words />
	<TranscriptViewer.ScrubBar />
</TranscriptViewer.Root>

Custom Word and Gap Rendering

TranscriptViewerWords accepts renderWord and renderGap snippets for per-segment overrides. Each receives the segment and its status — "spoken", "current", or "unspoken".

<script lang="ts">
	import * as TranscriptViewer from "$lib/registry/ui/transcript-viewer";
	import type { CharacterAlignment } from "$lib/registry/ui/transcript-viewer";
 
	let {
		audioSrc,
		alignment,
	}: {
		audioSrc: string;
		alignment: CharacterAlignment;
	} = $props();
</script>
 
<TranscriptViewer.Root {audioSrc} {alignment}>
	<TranscriptViewer.Audio />
	<TranscriptViewer.Words>
		{#snippet renderWord({ word, status })}
			<span
				class:font-semibold={status === "current"}
				class:text-primary={status === "spoken"}
				class:text-muted-foreground={status === "unspoken"}
			>
				{word.text}
			</span>
		{/snippet}
	</TranscriptViewer.Words>
	<TranscriptViewer.ScrubBar />
</TranscriptViewer.Root>

Playback Callbacks

The root forwards the underlying <audio> lifecycle via onPlay, onPause, onTimeUpdate, onEnded, and onDurationChange — useful for analytics or syncing external state.

<script lang="ts">
	import * as TranscriptViewer from "$lib/registry/ui/transcript-viewer";
	import type { CharacterAlignment } from "$lib/registry/ui/transcript-viewer";
 
	let {
		audioSrc,
		alignment,
	}: {
		audioSrc: string;
		alignment: CharacterAlignment;
	} = $props();
 
	let currentTime = $state(0);
</script>
 
<TranscriptViewer.Root
	{audioSrc}
	{alignment}
	onPlay={() => console.log("Playing")}
	onPause={() => console.log("Paused")}
	onTimeUpdate={(t) => (currentTime = t)}
	onEnded={() => console.log("Ended")}
>
	<TranscriptViewer.Audio />
	<TranscriptViewer.Words />
	<TranscriptViewer.ScrubBar />
</TranscriptViewer.Root>

Custom Play/Pause Button

TranscriptViewerPlayPauseButton accepts a children snippet that receives { isPlaying }, so you can render your own label and icons while keeping the shared click behavior.

<script lang="ts">
	import PauseIcon from "@lucide/svelte/icons/pause";
	import PlayIcon from "@lucide/svelte/icons/play";
	import * as TranscriptViewer from "$lib/registry/ui/transcript-viewer";
	import type { CharacterAlignment } from "$lib/registry/ui/transcript-viewer";
 
	let {
		audioSrc,
		alignment,
	}: {
		audioSrc: string;
		alignment: CharacterAlignment;
	} = $props();
</script>
 
<TranscriptViewer.Root {audioSrc} {alignment}>
	<TranscriptViewer.Audio />
	<TranscriptViewer.Words />
	<TranscriptViewer.PlayPauseButton>
		{#snippet children({ isPlaying })}
			{#if isPlaying}
				<PauseIcon class="size-4" /> Pause
			{:else}
				<PlayIcon class="size-4" /> Play
			{/if}
		{/snippet}
	</TranscriptViewer.PlayPauseButton>
</TranscriptViewer.Root>

Accessing Viewer State

useTranscriptViewer() returns the shared reactive state inside any descendant — useful for custom transport UI that needs to observe currentWord, currentTime, isPlaying, or jump to a specific word via seekToWord.

<script lang="ts">
	import { useTranscriptViewer } from "$lib/registry/ui/transcript-viewer";
 
	const state = useTranscriptViewer();
</script>
 
<div>
	Current word: {state.currentWord?.text ?? "—"}
	<button onclick={() => state.seekToWord(0)}>Restart transcript</button>
</div>

API Reference

`TranscriptViewer` (root)

Prop	Type	Default	Description
`audioSrc`	`string`	—	URL of the audio file that backs the transcript.
`audioType?`	`AudioType`	`"audio/mpeg"`	MIME type emitted on the inner `<source>` element. Set this when serving non-MP3 audio so the browser picks the right decoder.
`alignment`	`CharacterAlignment`	—	Character-level alignment used to compute word boundaries and drive highlighting. Shape matches ElevenLabs' `CharacterAlignmentResponseModel`; reshape other providers' output to the same structure.
`segmentComposer?`	`SegmentComposer`	—	Override the default word/gap segmentation. Receives the raw alignment and returns the composed `segments` and `words` arrays.
`hideAudioTags?`	`boolean`	`true`	When `true`, ElevenLabs-style tags like `[excited]` are stripped from the rendered transcript.
`onPlay?`	`() => void`	—	Called when the audio starts playing.
`onPause?`	`() => void`	—	Called when the audio is paused.
`onTimeUpdate?`	`(time: number) => void`	—	Called with the current playback time (in seconds) on every audio `timeupdate`.
`onEnded?`	`() => void`	—	Called when playback reaches the end of the track.
`onDurationChange?`	`(duration: number) => void`	—	Called with the total duration (in seconds) once metadata is available.
`children?`	`Snippet`	—	Sub-components composed inside the root (e.g. `<TranscriptViewerAudio />`, `<TranscriptViewerWords />`, `<TranscriptViewerScrubBar />`).
`ref?`	`HTMLDivElement \| null`	`$bindable(null)`	Bind to the underlying wrapper `<div>` element.

Each row below also accepts the standard HTMLAttributes for its host element (e.g. class, style, data-*, event handlers) unless otherwise noted.

`TranscriptViewerAudio`

Renders the underlying <audio> element driven by the root's state. Extends Omit<HTMLAudioAttributes, "src" | "children"> — src comes from the root's audioSrc.

Prop	Type	Default	Description
`ref?`	`HTMLAudioElement \| null`	`null`	Bindable reference to the underlying `<audio>` element.

`TranscriptViewerWords`

Renders the word/gap segments. Extends HTMLAttributes<HTMLDivElement>.

Prop	Type	Default	Description
`renderWord?`	`Snippet<[{ word: TranscriptWord; status: TranscriptViewerWordStatus }]>`	—	Custom snippet for a single word. Receives the word and its current status.
`renderGap?`	`Snippet<[{ segment: GapSegment; status: TranscriptViewerWordStatus }]>`	—	Custom snippet for inter-word gaps.
`wordClassNames?`	`string`	—	Extra classes applied to each word span.
`gapClassNames?`	`string`	—	Extra classes applied to each gap span.

`TranscriptViewerWord`

The span rendered per word when no custom renderWord is supplied. Extends Omit<HTMLAttributes<HTMLSpanElement>, "children">.

Prop	Type	Default	Description
`word`	`TranscriptWord`	—	The word segment to render.
`status`	`TranscriptViewerWordStatus`	—	One of `"spoken"`, `"current"`, `"unspoken"`.
`children?`	`Snippet`	—	Override the default text with custom markup.

`TranscriptViewerPlayPauseButton`

Play/pause toggle wired to the root's isPlaying state. Extends Omit<ButtonProps, "children">, so it inherits variant, size, etc. from the button primitive.

Prop	Type	Default	Description
`children?`	`Snippet<[{ isPlaying: boolean }]>`	—	Override the default play/pause icon. Receives the live `isPlaying` flag.

`TranscriptViewerScrubBar`

Time-aware scrub bar composed over the standalone ScrubBar primitive. Extends Omit<HTMLAttributes<HTMLDivElement>, "children">.

Prop	Type	Default	Description
`showTimeLabels?`	`boolean`	`true`	Render the leading/trailing time labels.
`labelsClassName?`	`string`	—	Classes forwarded to the time-label row.
`trackClassName?`	`string`	—	Classes forwarded to the inner `ScrubBarTrack`.
`progressClassName?`	`string`	—	Classes forwarded to the `ScrubBarProgress` fill.
`thumbClassName?`	`string`	—	Classes forwarded to the `ScrubBarThumb`.

Notes

alignment shape mirrors ElevenLabs' CharacterAlignmentResponseModel — three parallel arrays (characters, characterStartTimesSeconds, characterEndTimesSeconds) indexed per character. Reshape data from other providers (OpenAI, Deepgram, custom) to the same structure.
The root composes character alignment into word and gap segments internally via composeSegments, or a custom segmentComposer when provided. Recomposition runs whenever alignment changes.
When hideAudioTags is true (default), anything inside [...] brackets — e.g. ElevenLabs' [excited] style tags — is stripped from the rendered transcript.
The audio element is owned by TranscriptViewerAudio. The root wires up play, pause, timeupdate, seeked, durationchange, and loadedmetadata listeners plus a requestAnimationFrame loop to drive currentTime and the active word index.
The active-word walk is incremental: normal playback advances via a forward scan, and seeks fall back to a binary search over the word list.
TranscriptViewerScrubBar composes the standalone ScrubBar primitive with time labels and suspends timeupdate-driven UI updates while the user is scrubbing so the thumb doesn't fight the pointer.
Words take one of three statuses — "spoken", "current", "unspoken" — which you can target via the built-in Tailwind classes on TranscriptViewerWord or render yourself through renderWord / renderGap snippets.

Speech Input Voice Button