Transcript Viewer

Synchronized transcript with audio playback. Highlights each word as it's spoken, supports seeking via a scrub bar, and exposes a compound-component API with provider-agnostic character-alignment data.

Installation

npx shadcn-svelte@latest add https://sv11.ui.twango.dev/r/transcript-viewer.json

Usage

<script lang="ts">
	import { TranscriptViewer } from "$lib/registry/ui/transcript-viewer";
</script>
 
<TranscriptViewer />

<script lang="ts">
	import * as TranscriptViewer from "$lib/registry/ui/transcript-viewer";
	import type { CharacterAlignment } from "$lib/registry/ui/transcript-viewer";
 
	let {
		audioSrc,
		alignment,
	}: {
		audioSrc: string;
		alignment: CharacterAlignment;
	} = $props();
</script>
 
<TranscriptViewer.Root {audioSrc} {alignment}>
	<TranscriptViewer.Audio />
	<TranscriptViewer.Words />
	<div class="flex items-center gap-3">
		<TranscriptViewer.PlayPauseButton />
		<TranscriptViewer.ScrubBar />
	</div>
</TranscriptViewer.Root>
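The alignment prop expects three parallel per-character arrays (characters, characterStartTimesSeconds, characterEndTimesSeconds). When a provider only returns word-level timestamps, one possible reshaping is sketched below. The TimedWord input type and the wordsToCharacterAlignment helper are hypothetical and not part of the component, and spreading each word's duration evenly over its characters is an approximation.

```typescript
// Field names follow the alignment shape described in the Notes section.
type CharacterAlignment = {
	characters: string[];
	characterStartTimesSeconds: number[];
	characterEndTimesSeconds: number[];
};

// Hypothetical word-level input; real providers name these fields differently.
type TimedWord = { text: string; start: number; end: number };

function wordsToCharacterAlignment(words: TimedWord[]): CharacterAlignment {
	const alignment: CharacterAlignment = {
		characters: [],
		characterStartTimesSeconds: [],
		characterEndTimesSeconds: [],
	};

	words.forEach((word, index) => {
		// Spread the word's duration evenly over its characters (an approximation).
		const chars = Array.from(word.text);
		const step = (word.end - word.start) / Math.max(chars.length, 1);
		chars.forEach((char, i) => {
			alignment.characters.push(char);
			alignment.characterStartTimesSeconds.push(word.start + i * step);
			alignment.characterEndTimesSeconds.push(word.start + (i + 1) * step);
		});

		// Add one space that covers the silence before the next word.
		const next = words[index + 1];
		if (next) {
			alignment.characters.push(" ");
			alignment.characterStartTimesSeconds.push(word.end);
			alignment.characterEndTimesSeconds.push(next.start);
		}
	});

	return alignment;
}
```

The returned object can then be passed as the alignment prop. Highlighting still works at word granularity, since only the word boundaries carry real timing information.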

Examples

Custom Audio Type

Pass audioType when the source is not MP3 so the browser picks the right decoder.

<TranscriptViewer.Root {audioSrc} {alignment} audioType="audio/wav">
	<TranscriptViewer.Audio />
	<TranscriptViewer.Words />
	<TranscriptViewer.ScrubBar />
</TranscriptViewer.Root>
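If the audio URL is only known at runtime, the MIME type can be derived from the file extension. The sketch below is one way to do that; audioTypeFromUrl is a hypothetical helper, and it assumes audioType accepts standard audio MIME strings such as these (the exact members of the AudioType type are not listed here, so a cast may be needed).

```typescript
// Assumption: these standard MIME strings are valid audioType values.
const MIME_BY_EXTENSION = new Map<string, string>([
	["mp3", "audio/mpeg"],
	["wav", "audio/wav"],
	["ogg", "audio/ogg"],
	["m4a", "audio/mp4"],
	["webm", "audio/webm"],
]);

function audioTypeFromUrl(url: string): string {
	// Drop the query string and hash before reading the extension.
	const path = url.split(/[?#]/)[0] ?? "";
	const extension = path.slice(path.lastIndexOf(".") + 1).toLowerCase();
	// Fall back to the documented default when the extension is unknown.
	return MIME_BY_EXTENSION.get(extension) ?? "audio/mpeg";
}
```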

Custom Word and Gap Rendering

TranscriptViewerWords accepts renderWord and renderGap snippets for per-segment overrides. renderWord receives the word and renderGap receives the gap segment; each also receives a status of "spoken", "current", or "unspoken".

<script lang="ts">
	import * as TranscriptViewer from "$lib/registry/ui/transcript-viewer";
	import type { CharacterAlignment } from "$lib/registry/ui/transcript-viewer";
 
	let {
		audioSrc,
		alignment,
	}: {
		audioSrc: string;
		alignment: CharacterAlignment;
	} = $props();
</script>
 
<TranscriptViewer.Root {audioSrc} {alignment}>
	<TranscriptViewer.Audio />
	<TranscriptViewer.Words>
		{#snippet renderWord({ word, status })}
			<span
				class:font-semibold={status === "current"}
				class:text-primary={status === "spoken"}
				class:text-muted-foreground={status === "unspoken"}
			>
				{word.text}
			</span>
		{/snippet}
	</TranscriptViewer.Words>
	<TranscriptViewer.ScrubBar />
</TranscriptViewer.Root>

Playback Callbacks

The root forwards the underlying <audio> lifecycle via onPlay, onPause, onTimeUpdate, onEnded, and onDurationChange — useful for analytics or syncing external state.

<script lang="ts">
	import * as TranscriptViewer from "$lib/registry/ui/transcript-viewer";
	import type { CharacterAlignment } from "$lib/registry/ui/transcript-viewer";
 
	let {
		audioSrc,
		alignment,
	}: {
		audioSrc: string;
		alignment: CharacterAlignment;
	} = $props();
 
	let currentTime = $state(0);
</script>
 
<TranscriptViewer.Root
	{audioSrc}
	{alignment}
	onPlay={() => console.log("Playing")}
	onPause={() => console.log("Paused")}
	onTimeUpdate={(t) => (currentTime = t)}
	onEnded={() => console.log("Ended")}
>
	<TranscriptViewer.Audio />
	<TranscriptViewer.Words />
	<TranscriptViewer.ScrubBar />
</TranscriptViewer.Root>
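For the analytics case, the callbacks can feed a small tracker that estimates how many seconds were actually listened to. This is a sketch, not part of the component: createListenTracker is a hypothetical helper, only the callback signatures come from the API reference, and the 1-second threshold used to tell a seek from normal playback is an assumption.

```typescript
// Hypothetical tracker; only the callback signatures match the documented props.
function createListenTracker() {
	let lastTime: number | null = null;
	let listenedSeconds = 0;
	let completed = false;

	return {
		onPlay() {
			// Start a fresh measurement window.
			lastTime = null;
		},
		onTimeUpdate(time: number) {
			const delta = lastTime === null ? 0 : time - lastTime;
			// Ignore backward jumps and large forward jumps, which indicate a seek.
			if (delta > 0 && delta < 1) listenedSeconds += delta;
			lastTime = time;
		},
		onPause() {
			lastTime = null;
		},
		onEnded() {
			completed = true;
			lastTime = null;
		},
		get listenedSeconds() {
			return listenedSeconds;
		},
		get completed() {
			return completed;
		},
	};
}
```

Because the handlers close over local variables rather than `this`, they can be passed to the root directly, for example onPlay={tracker.onPlay} and onTimeUpdate={tracker.onTimeUpdate}.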

Custom Play/Pause Button

TranscriptViewerPlayPauseButton accepts a children snippet that receives { isPlaying }, so you can render your own label and icons while keeping the shared click behavior.

<script lang="ts">
	import PauseIcon from "@lucide/svelte/icons/pause";
	import PlayIcon from "@lucide/svelte/icons/play";
	import * as TranscriptViewer from "$lib/registry/ui/transcript-viewer";
	import type { CharacterAlignment } from "$lib/registry/ui/transcript-viewer";
 
	let {
		audioSrc,
		alignment,
	}: {
		audioSrc: string;
		alignment: CharacterAlignment;
	} = $props();
</script>
 
<TranscriptViewer.Root {audioSrc} {alignment}>
	<TranscriptViewer.Audio />
	<TranscriptViewer.Words />
	<TranscriptViewer.PlayPauseButton>
		{#snippet children({ isPlaying })}
			{#if isPlaying}
				<PauseIcon class="size-4" /> Pause
			{:else}
				<PlayIcon class="size-4" /> Play
			{/if}
		{/snippet}
	</TranscriptViewer.PlayPauseButton>
</TranscriptViewer.Root>

Accessing Viewer State

useTranscriptViewer() returns the shared reactive state inside any descendant — useful for custom transport UI that needs to observe currentWord, currentTime, isPlaying, or jump to a specific word via seekToWord.

<script lang="ts">
	import { useTranscriptViewer } from "$lib/registry/ui/transcript-viewer";
 
	const state = useTranscriptViewer();
</script>
 
<div>
	Current word: {state.currentWord?.text ?? "—"}
	<button onclick={() => state.seekToWord(0)}>Restart transcript</button>
</div>

API Reference

TranscriptViewer (root)

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| audioSrc | string | | URL of the audio file that backs the transcript. |
| audioType? | AudioType | "audio/mpeg" | MIME type emitted on the inner <source> element. Set this when serving non-MP3 audio so the browser picks the right decoder. |
| alignment | CharacterAlignment | | Character-level alignment used to compute word boundaries and drive highlighting. Shape matches ElevenLabs' CharacterAlignmentResponseModel; reshape other providers' output to the same structure. |
| segmentComposer? | SegmentComposer | | Override the default word/gap segmentation. Receives the raw alignment and returns the composed segments and words arrays. |
| hideAudioTags? | boolean | true | When true, ElevenLabs-style tags like [excited] are stripped from the rendered transcript. |
| onPlay? | () => void | | Called when the audio starts playing. |
| onPause? | () => void | | Called when the audio is paused. |
| onTimeUpdate? | (time: number) => void | | Called with the current playback time (in seconds) on every audio timeupdate. |
| onEnded? | () => void | | Called when playback reaches the end of the track. |
| onDurationChange? | (duration: number) => void | | Called with the total duration (in seconds) once metadata is available. |
| children? | Snippet | | Sub-components composed inside the root (e.g. <TranscriptViewerAudio />, <TranscriptViewerWords />, <TranscriptViewerScrubBar />). |
| ref? | HTMLDivElement \| null | $bindable(null) | Bind to the underlying wrapper <div> element. |
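To make the segmentComposer contract more concrete, here is a minimal sketch of a composer that splits the alignment on whitespace. The segment shapes below (kind, startTime, endTime) are assumptions for illustration; the component's real TranscriptWord, GapSegment, and SegmentComposer types may differ, and this sketch does not strip [tag]-style audio tags.

```typescript
type CharacterAlignment = {
	characters: string[];
	characterStartTimesSeconds: number[];
	characterEndTimesSeconds: number[];
};

// Hypothetical segment shapes, not the component's actual types.
type WordSegment = { kind: "word"; text: string; startTime: number; endTime: number };
type GapSegment = { kind: "gap"; text: string };
type TranscriptSegment = WordSegment | GapSegment;

function composeByWhitespace(alignment: CharacterAlignment): {
	segments: TranscriptSegment[];
	words: WordSegment[];
} {
	const segments: TranscriptSegment[] = [];
	const words: WordSegment[] = [];
	let openWord: WordSegment | null = null;
	let openGap: GapSegment | null = null;

	for (let i = 0; i < alignment.characters.length; i++) {
		const char = alignment.characters[i] ?? "";
		const start = alignment.characterStartTimesSeconds[i] ?? 0;
		const end = alignment.characterEndTimesSeconds[i] ?? start;

		if (/\s/.test(char)) {
			// Whitespace closes any open word and extends (or opens) a gap.
			openWord = null;
			if (openGap === null) {
				openGap = { kind: "gap", text: "" };
				segments.push(openGap);
			}
			openGap.text += char;
		} else {
			// Any other character closes the gap and extends (or opens) a word.
			openGap = null;
			if (openWord === null) {
				openWord = { kind: "word", text: "", startTime: start, endTime: end };
				segments.push(openWord);
				words.push(openWord);
			}
			openWord.text += char;
			openWord.endTime = end;
		}
	}

	return { segments, words };
}
```

A word's start time is taken from its first character and its end time from its last, so the words array can drive highlighting while the segments array preserves the original spacing.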

Each component below also accepts the standard HTMLAttributes for its host element (e.g. class, style, data-*, event handlers) unless otherwise noted.

TranscriptViewerAudio

Renders the underlying <audio> element driven by the root's state. Extends Omit<HTMLAudioAttributes, "src" | "children">; src comes from the root's audioSrc.

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| ref? | HTMLAudioElement \| null | null | Bindable reference to the underlying <audio> element. |

TranscriptViewerWords

Renders the word/gap segments. Extends HTMLAttributes<HTMLDivElement>.

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| renderWord? | Snippet<[{ word: TranscriptWord; status: TranscriptViewerWordStatus }]> | | Custom snippet for a single word. Receives the word and its current status. |
| renderGap? | Snippet<[{ segment: GapSegment; status: TranscriptViewerWordStatus }]> | | Custom snippet for inter-word gaps. |
| wordClassNames? | string | | Extra classes applied to each word span. |
| gapClassNames? | string | | Extra classes applied to each gap span. |

TranscriptViewerWord

The span rendered per word when no custom renderWord is supplied. Extends Omit<HTMLAttributes<HTMLSpanElement>, "children">.

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| word | TranscriptWord | | The word segment to render. |
| status | TranscriptViewerWordStatus | | One of "spoken", "current", "unspoken". |
| children? | Snippet | | Override the default text with custom markup. |
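The three status values can be read as a comparison between a word's position and the active word's position. The sketch below shows that reading; statusForWord is a hypothetical helper, and treating an index of -1 as "nothing spoken yet" is an assumed convention rather than documented behavior.

```typescript
type TranscriptViewerWordStatus = "spoken" | "current" | "unspoken";

// Assumption: currentWordIndex is -1 until playback reaches the first word.
function statusForWord(
	wordIndex: number,
	currentWordIndex: number,
): TranscriptViewerWordStatus {
	if (wordIndex < currentWordIndex) return "spoken";
	if (wordIndex === currentWordIndex) return "current";
	return "unspoken";
}
```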

TranscriptViewerPlayPauseButton

Play/pause toggle wired to the root's isPlaying state. Extends Omit<ButtonProps, "children">, so it inherits variant, size, etc. from the button primitive.

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| children? | Snippet<[{ isPlaying: boolean }]> | | Override the default play/pause icon. Receives the live isPlaying flag. |

TranscriptViewerScrubBar

Time-aware scrub bar composed over the standalone ScrubBar primitive. Extends Omit<HTMLAttributes<HTMLDivElement>, "children">.

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| showTimeLabels? | boolean | true | Render the leading/trailing time labels. |
| labelsClassName? | string | | Classes forwarded to the time-label row. |
| trackClassName? | string | | Classes forwarded to the inner ScrubBarTrack. |
| progressClassName? | string | | Classes forwarded to the ScrubBarProgress fill. |
| thumbClassName? | string | | Classes forwarded to the ScrubBarThumb. |
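When showTimeLabels is false and you render your own labels, a formatter along these lines may be useful. The built-in label format is not documented here, so the m:ss output and the formatTimeLabel name are assumptions.

```typescript
function formatTimeLabel(seconds: number): string {
	// Treat NaN, Infinity, and negative values as zero; an audio element's
	// duration is NaN until its metadata has loaded.
	const total = Number.isFinite(seconds) && seconds > 0 ? Math.floor(seconds) : 0;
	const minutes = Math.floor(total / 60);
	const remainder = total % 60;
	// Zero-pad the seconds so 65 renders as 1:05 rather than 1:5.
	return `${minutes}:${remainder < 10 ? "0" : ""}${remainder}`;
}
```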

Notes

  • alignment shape mirrors ElevenLabs' CharacterAlignmentResponseModel — three parallel arrays (characters, characterStartTimesSeconds, characterEndTimesSeconds) indexed per character. Reshape data from other providers (OpenAI, Deepgram, custom) to the same structure.
  • The root composes character alignment into word and gap segments internally via composeSegments, or a custom segmentComposer when provided. Recomposition runs whenever alignment changes.
  • When hideAudioTags is true (default), anything inside [...] brackets — e.g. ElevenLabs' [excited] style tags — is stripped from the rendered transcript.
  • The audio element is owned by TranscriptViewerAudio. The root wires up play, pause, timeupdate, seeked, durationchange, and loadedmetadata listeners plus a requestAnimationFrame loop to drive currentTime and the active word index.
  • The active-word walk is incremental: normal playback advances via a forward scan, and seeks fall back to a binary search over the word list.
  • TranscriptViewerScrubBar composes the standalone ScrubBar primitive with time labels and suspends timeupdate-driven UI updates while the user is scrubbing so the thumb doesn't fight the pointer.
  • Words take one of three statuses — "spoken", "current", "unspoken" — which you can target via the built-in Tailwind classes on TranscriptViewerWord or render yourself through renderWord / renderGap snippets.
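The incremental active-word walk mentioned above can be sketched as follows. This is an illustration of the forward-scan-plus-binary-search idea, not the component's source: the TimedWord shape, both function names, the step limit of 4, and the convention that a word stays active until the next word starts are all assumptions.

```typescript
// Hypothetical minimal word shape; only the timing fields matter here.
type TimedWord = { startTime: number; endTime: number };

// Binary search for the last word whose startTime is <= time; -1 if none has started.
function findWordIndex(words: TimedWord[], time: number): number {
	let low = 0;
	let high = words.length - 1;
	let found = -1;
	while (low <= high) {
		const mid = (low + high) >> 1;
		const word = words[mid];
		if (word !== undefined && word.startTime <= time) {
			found = mid;
			low = mid + 1;
		} else {
			high = mid - 1;
		}
	}
	return found;
}

// Incremental update: step forward during normal playback, search after a seek.
function nextWordIndex(words: TimedWord[], previous: number, time: number): number {
	const current = words[previous];
	const movedBackward = current !== undefined && time < current.startTime;
	if (previous < 0 || previous >= words.length || movedBackward) {
		return findWordIndex(words, time);
	}

	let index = previous;
	let steps = 0;
	while (index + 1 < words.length && (words[index + 1]?.startTime ?? Infinity) <= time) {
		index++;
		// A long forward jump is cheaper to resolve with the binary search.
		if (++steps > 4) return findWordIndex(words, time);
	}
	return index;
}
```

During steady playback the loop advances by at most one word per update, so the common case is constant time; the logarithmic search only runs after backward seeks or long forward jumps.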