The browser-based subtitle-tool market is crowded. Veed, Kapwing, Clideo, Submagic, HappyScribe — and a long tail of smaller services. None of them are bad. But all of them share the same compromise: you upload your video to their server, you wait, you may get a watermark, and at some point a paywall appears.

BurnSub is built on the opposite trade-off. The video never leaves your device. The captions are burned in locally, using your machine’s hardware. There is no upload, no watermark, no signup, and the entire tool is free because the architecture removes the costs that would justify a subscription.

This post is the case for that approach — what it costs, what it gains, and why the modern browser makes it possible in a way that was not realistic a few years ago.

The problem with “upload to server” tools

The dominant pattern in browser-based video tools looks like this:

1. User selects a video file in the browser.
2. The browser uploads the file to the tool’s server.
3. The server processes it (encodes, captions, adds watermark).
4. The server returns the result to the browser for download.

Every step in that pipeline has a cost.

The upload step scales linearly with file size. A 30-second 1080p clip might be 50–100 MB. A 5-minute 4K clip can be 500 MB or more. Even on a fast connection, a 500 MB upload is a 1–5 minute wait. On a typical home connection it is much longer. For a quick task, the wait is the dominant part of the experience.
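The wait is simple arithmetic: file size in bits divided by uplink speed. The sketch below works it through for the 500 MB case. The uplink speeds are illustrative assumptions rather than measurements, and the calculation ignores protocol overhead and server-side queueing, so real waits run longer.

```typescript
// Estimate how long an upload takes at a given uplink speed.
// File size is in megabytes (MB); uplink speed is in megabits per second (Mbps).
function uploadSeconds(fileSizeMB: number, uplinkMbps: number): number {
  if (fileSizeMB < 0 || uplinkMbps <= 0) {
    throw new RangeError("size must be >= 0 and speed must be > 0");
  }
  const megabits = fileSizeMB * 8; // 1 byte = 8 bits
  return megabits / uplinkMbps;
}

// Assumed uplink speeds, chosen only to illustrate the range.
const fastUplink = uploadSeconds(500, 50);    // 80 seconds
const midUplink = uploadSeconds(500, 15);     // about 267 seconds, roughly 4.5 minutes
const slowUplink = uploadSeconds(500, 5);     // 800 seconds, over 13 minutes
```

The cost is linear in both directions: doubling the file size or halving the uplink speed doubles the wait.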

The server step costs the tool operator real money. Encoding video on a server farm consumes bandwidth, storage, and compute. A free tier that processed everyone’s full-resolution video would bankrupt the operator. That is why almost every “free” tier in this category has caps: file size limits (often 250 MB), duration limits (often 5 minutes), watermarks on the output, signup walls.

The watermark step is the operator’s leverage. A watermark on free output is the conversion mechanism — it pushes users to the paid tier where the watermark goes away. Most tools in this category make their money exactly here.

None of this is sinister. It is the predictable result of running a server-based video tool. The economics force the design.

What changes when you skip the server

If the video processing happens in the browser itself, the constraints above fall away one after another:

No upload means no upload wait. The file is already on the user’s device. Processing starts the instant the user drops the file.
No server processing means no per-user compute cost to the operator. The operator’s marginal cost per user is the static bandwidth for the HTML and JavaScript — close to zero on a CDN.
No marginal cost means no need for a paywall, watermark, file-size cap, or signup wall. The tool can be genuinely free because giving it away does not cost the operator anything beyond the fixed hosting cost.

This was not technically feasible for video tools until recently. Browser JavaScript could not encode H.264. There was no fast in-browser speech-to-text. WebGPU did not exist. Without those, “process video locally” meant a heavy plugin or a downloaded app, not a web tool.

In 2026, the situation is different. The web platform now exposes hardware-accelerated video encoding, general-purpose GPU compute, and runtimes that can run ML models in the browser. The tools to do the job locally now exist as standard browser APIs.

The three building blocks BurnSub depends on

Three browser APIs make the local-only architecture possible. Each is documented, each is in production browsers, and each replaces a piece that previously required a server.
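Because the browser is part of the system requirement, a tool like this has to check for all three APIs before doing anything else. Here is a minimal sketch of such a check. The function name and the returned shape are hypothetical, not BurnSub’s actual code, and the scope object is passed in as a parameter so the check can also be exercised outside a browser.

```typescript
// Which of the three building blocks the current environment exposes.
interface ApiSupport {
  webCodecs: boolean;
  webGpu: boolean;
  offscreenCanvas: boolean;
}

// `scope` is normally globalThis. Hypothetical helper for illustration.
function detectSupport(scope: Record<string, unknown>): ApiSupport {
  const nav = scope["navigator"];
  return {
    // WebCodecs exposes VideoEncoder and VideoDecoder as global constructors.
    webCodecs:
      typeof scope["VideoEncoder"] === "function" &&
      typeof scope["VideoDecoder"] === "function",
    // WebGPU hangs off navigator.gpu. Its presence does not guarantee a usable
    // adapter; requestAdapter() can still resolve to null.
    webGpu: typeof nav === "object" && nav !== null && "gpu" in nav,
    offscreenCanvas: typeof scope["OffscreenCanvas"] === "function",
  };
}
```

In a page this would be called as `detectSupport(globalThis as unknown as Record<string, unknown>)`, and any `false` in the result would route the user to an “unsupported browser” message instead of the dropzone.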

WebCodecs

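WebCodecs is the W3C API that exposes the browser’s built-in video encoders and decoders to JavaScript, including the hardware-accelerated ones. It is not a codec reimplemented in JavaScript or WebAssembly. When you call encoder.encode(frame) and a hardware encoder is available, the work runs on the same media hardware that handles YouTube playback (Intel Quick Sync, Apple VideoToolbox, AMD VCE, or the platform’s equivalent). When none is available, the browser falls back to its own software encoder.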
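Whether a given machine can actually encode in hardware is something the page can ask before it starts. The sketch below tries a hardware-preferring configuration first and a software one second. The codec string, bitrate, and function name are assumptions made for illustration. The config fields mirror the WebCodecs `VideoEncoderConfig` dictionary, redeclared locally so the sketch type-checks without DOM typings.

```typescript
type Acceleration = "prefer-hardware" | "prefer-software" | "no-preference";

// Local copy of the WebCodecs config fields used in this sketch.
interface EncoderConfig {
  codec: string;
  width: number;
  height: number;
  bitrate: number; // bits per second
  framerate: number;
  hardwareAcceleration: Acceleration;
}

// Same shape as VideoEncoder.isConfigSupported(): resolves to { supported, ... }.
type SupportProbe = (config: EncoderConfig) => Promise<{ supported?: boolean }>;

// Hypothetical helper: returns the first supported config, or null if none is.
async function pickEncoderConfig(
  width: number,
  height: number,
  framerate: number,
  probe: SupportProbe,
): Promise<EncoderConfig | null> {
  const base = {
    codec: "avc1.640028", // H.264 High profile, level 4.0 (assumed target)
    width,
    height,
    framerate,
    bitrate: 8_000_000, // assumed 8 Mbps; a real tool would scale this
  };
  for (const hardwareAcceleration of ["prefer-hardware", "prefer-software"] as const) {
    const candidate: EncoderConfig = { ...base, hardwareAcceleration };
    const result = await probe(candidate);
    if (result.supported) return candidate;
  }
  return null;
}
```

In a browser the probe would be `(c) => VideoEncoder.isConfigSupported(c)`. Injecting it keeps the selection logic separate from the browser API, which makes the fallback order easy to reason about.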

Browser support as of 2026:

Chrome 94+ (September 2021)
Edge 94+ (September 2021)
Safari 16.4+ (March 2023, video codecs only at first)
Firefox 130+ (September 2024, desktop; the last of the four to ship it)

For older browsers, WebCodecs is not available — that is the cost of building on this API.

WebGPU + transformers.js + Whisper-Turbo

OpenAI’s Whisper speech-recognition model exists in several sizes. Whisper-Turbo, a faster variant derived from the large model, is compact enough to ship to a browser, load via the transformers.js runtime, and run on the user’s GPU through WebGPU.

This is the part that unlocks auto-captioning without a server. Before Whisper-Turbo could run in the browser, every tool that offered “auto-generate captions from speech” had to send the audio to a server (OpenAI’s API, a self-hosted Whisper instance, or a competing service). Once the model became browser-runnable, that requirement evaporated.

The implication is structural. A tool that auto-captions and burns subtitles can now exist entirely client-side. There is no architectural reason for the upload step anymore.
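Once transcription runs locally, the remaining step is turning timed text into subtitle cues. The sketch below converts a list of timed chunks into SRT. The chunk shape (a `[start, end]` pair in seconds plus text, with a possibly missing end on the last chunk) is an assumption modeled on what a transformers.js speech-recognition pipeline returns when timestamps are requested. The function names and the SRT target are illustrative, not BurnSub’s internal format.

```typescript
// Assumed chunk shape; the end time may be null on the final chunk.
interface TimedChunk {
  timestamp: [number, number | null]; // seconds
  text: string;
}

// Format seconds as an SRT timecode: HH:MM:SS,mmm
function srtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60_000) % 60;
  const h = Math.floor(totalMs / 3_600_000);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build an SRT document. A chunk with no end time gets `fallbackDuration` seconds.
function chunksToSrt(chunks: TimedChunk[], fallbackDuration = 2): string {
  return chunks
    .map((chunk, i) => {
      const [start, end] = chunk.timestamp;
      const stop = end ?? start + fallbackDuration;
      return `${i + 1}\n${srtTime(start)} --> ${srtTime(stop)}\n${chunk.text.trim()}\n`;
    })
    .join("\n");
}
```

Everything here is string manipulation on data that already sits in the page’s memory, which is the point: after the model runs, nothing in the captioning path needs a network round trip.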

Canvas + OffscreenCanvas for caption rendering

The caption-drawing step, turning a styled subtitle cue into pixels overlaid on a video frame, happens on a <canvas> element. This part is not new. The Canvas 2D API has been widely available in browsers since around 2010. What changed is throughput: with OffscreenCanvas and WebCodecs feeding frames directly to GPU memory, drawing captions on top of video frames is fast enough to run in real time.
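The drawing itself is ordinary Canvas 2D work: wrap the cue text to a maximum width, then stroke and fill each line. The sketch below shows one way to do that. The font, colors, and function names are placeholder assumptions, and `CaptionContext` is a hand-written subset of `CanvasRenderingContext2D` so the sketch stays self-contained. A real 2D context from a canvas or OffscreenCanvas satisfies it.

```typescript
// Greedy word wrap: pack words into lines no wider than maxWidth.
// `measure` returns the rendered width of a string in pixels.
function wrapCaption(text: string, maxWidth: number, measure: (s: string) => number): string[] {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const lines: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : `${current} ${word}`;
    if (current !== "" && measure(candidate) > maxWidth) {
      lines.push(current); // the line is full; start a new one
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// The subset of the Canvas 2D context this sketch touches.
interface CaptionContext {
  font: string;
  textAlign: string;
  lineWidth: number;
  strokeStyle: unknown;
  fillStyle: unknown;
  measureText(text: string): { width: number };
  strokeText(text: string, x: number, y: number): void;
  fillText(text: string, x: number, y: number): void;
}

// Draw a centered, outlined caption whose last line sits on baselineY.
function drawCaption(
  ctx: CaptionContext, text: string,
  centerX: number, baselineY: number, maxWidth: number, lineHeight: number,
): void {
  ctx.font = "bold 48px sans-serif"; // placeholder style
  ctx.textAlign = "center";
  ctx.lineWidth = 6;
  ctx.strokeStyle = "black";
  ctx.fillStyle = "white";
  const lines = wrapCaption(text, maxWidth, (s) => ctx.measureText(s).width);
  lines.forEach((line, i) => {
    const y = baselineY - (lines.length - 1 - i) * lineHeight; // stack upward
    ctx.strokeText(line, centerX, y); // outline first
    ctx.fillText(line, centerX, y); // fill on top of the outline
  });
}
```

Passing the measuring function into `wrapCaption` keeps the wrapping logic independent of the canvas, so the same code can serve a live preview and the final encode.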

What BurnSub gives up to take this path

The local-only architecture is not free. There are real trade-offs.

1. Older browsers cannot use the tool. WebCodecs in particular cuts off Safari releases from before 2023 and Firefox releases from before late 2024. Android devices stuck on older Chrome builds are excluded as well. The user’s browser is part of the system requirement. A server-based tool works on any browser that can upload a file, which is a much wider audience.

2. The user’s machine matters. Encoding video on a 6-year-old laptop is slower than encoding it on a 1-year-old desktop. A server tool can give every user the same processing speed because the server is the same. A local tool inherits the user’s hardware.

3. No filters, no advanced video work. WebCodecs does encoding and decoding. It does not do color correction, frame interpolation, or any of FFmpeg’s 100+ filter operations. A tool with broader scope (color grading, B-roll insertion, voice cloning) cannot be built on WebCodecs alone. BurnSub is intentionally narrow — burn captions and output an MP4 — because the narrow scope fits the API.

4. Audio is handled separately. WebCodecs treats audio and video as separate streams. Muxing them back into an MP4 with synchronized audio is the developer’s job, not the API’s. That is solvable but it is work.
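On the fourth point, much of the synchronization work comes down to deriving both streams’ timestamps from the same clock. WebCodecs expresses timestamps in microseconds, so a minimal sketch looks like the following. It assumes a constant frame rate and a fixed sample rate, the function names are hypothetical, and it does not cover the container-level muxing itself.

```typescript
const MICROSECONDS_PER_SECOND = 1_000_000;

// Presentation timestamp of video frame `index` at a constant frame rate.
function videoTimestampUs(index: number, fps: number): number {
  return Math.round((index * MICROSECONDS_PER_SECOND) / fps);
}

// Presentation timestamp of the audio sample at position `sampleIndex`.
function audioTimestampUs(sampleIndex: number, sampleRate: number): number {
  return Math.round((sampleIndex * MICROSECONDS_PER_SECOND) / sampleRate);
}

// How far the end of the video runs past the end of the audio, in microseconds.
// A value far from zero suggests the muxed file will drift out of sync.
function endDriftUs(
  frameCount: number, fps: number,
  sampleCount: number, sampleRate: number,
): number {
  return videoTimestampUs(frameCount, fps) - audioTimestampUs(sampleCount, sampleRate);
}
```

Computing each timestamp from the index, rather than adding a rounded per-frame duration repeatedly, avoids accumulating rounding error over a long clip.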

These trade-offs are not bugs in the approach. They are the cost of refusing to upload. For a subtitle-burning workload, the cost is acceptable. For a more ambitious video-editor workload, it might not be.

What “local” actually means for the user

The promise is: the video never leaves your device. This is a claim that can be tested, not a claim that needs trust.

To verify it on BurnSub:

1. Open burnsub.com in Chrome or Edge.
2. Open DevTools → Network tab. Filter to “All” requests.
3. Drop a video file into the dropzone.
4. Watch the requests as the tool processes the video.

What you should see: the initial page load (HTML, CSS, JavaScript), then the Whisper-Turbo model file (about 200 MB, cached after first load), then nothing. No outgoing request that contains video bytes. No POST to an API endpoint with the video as a payload.

What you should not see: any large POST request to a domain controlled by BurnSub or anyone else.

This is the test that distinguishes a true local-only tool from a tool that markets itself as local but uploads anyway. A page cannot hide its HTTP requests from the DevTools Network tab. Either the bytes leave the device or they do not.
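The same check can be scripted. The Network tab can export a session as a HAR file, and a few lines of code can scan it for outgoing requests with large bodies. The sketch below reads only the `method`, `url`, and `bodySize` fields of each HAR request entry. The function name and the 1 MB threshold are assumptions.

```typescript
// The subset of the HAR format this sketch reads.
interface HarRequest {
  method: string;
  url: string;
  bodySize: number; // request body size in bytes; -1 when unknown
}

interface HarFile {
  log: { entries: Array<{ request: HarRequest }> };
}

// Hypothetical helper: list requests whose body is at least thresholdBytes.
// Video bytes leaving the device would show up here as a large request body.
function findLargeUploads(har: HarFile, thresholdBytes = 1_000_000): HarRequest[] {
  return har.log.entries
    .map((entry) => entry.request)
    .filter((request) => request.bodySize >= thresholdBytes);
}
```

An empty result after processing a video is what a local-only tool should produce. The scan only covers HTTP request bodies, so WebSocket traffic still has to be checked by hand in the same Network tab.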

The styling problem

If the video pipeline were the hard part of building a subtitle tool, BurnSub would not have 30 named caption presets. The hard part of a subtitle tool is the styling. A subtitle burner is roughly 10% video pipeline and 90% caption design.

The reason is that captions live in a tightly constrained design space — they have to be readable at thumb-scroll speed on a 5-inch phone, fit a 9:16 safe area on TikTok and Reels, work against any background color, animate without distracting from the video, and match the visual vocabulary of the platform. None of this is automatic. Every preset is a design decision that has to be made deliberately.

BurnSub’s 30 presets are the response to that problem. Each is a JSON configuration: font, weight, color, stroke, background, animation, position, max line length. The configurations are visible in the open style library, and each one is tied to a specific intended use case (TikTok talking-head, MrBeast-style Shorts, Reels minimal aesthetic, anime fan-sub, podcast clip, gaming hype, and so on — see the caption style field guide for the full mapping).

The configurations serialize into URLs. Picking a style and customizing it produces a permalink that opens BurnSub with the exact look pre-loaded. Sharing a style is one click. The whole library is meant to be remixable rather than monolithic.
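To make the idea concrete, here is a sketch of what a preset and its permalink could look like. The field names follow the list above, but the exact schema, the `#style=` fragment format, and the function names are assumptions made for illustration. The post does not document BurnSub’s real URL format.

```typescript
// Hypothetical preset shape, built from the fields the post lists.
interface CaptionPreset {
  font: string;
  weight: number;
  color: string;
  stroke: { color: string; width: number };
  background: string | null;
  animation: string;
  position: "top" | "middle" | "bottom";
  maxLineLength: number; // characters per line
}

// Encode a preset into the URL fragment. Browsers do not send the fragment
// to the server, so the style stays on the client as well.
function presetToUrl(preset: CaptionPreset, baseUrl: string): string {
  return `${baseUrl}#style=${encodeURIComponent(JSON.stringify(preset))}`;
}

// Decode a preset from a permalink; returns null for missing or malformed data.
// A real tool should also validate the parsed fields before using them.
function presetFromUrl(url: string): CaptionPreset | null {
  const marker = "#style=";
  const at = url.indexOf(marker);
  if (at === -1) return null;
  try {
    return JSON.parse(decodeURIComponent(url.slice(at + marker.length))) as CaptionPreset;
  } catch {
    return null;
  }
}
```

Putting the whole configuration in the link is what makes a style shareable without accounts or a database: the permalink is the saved style.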

What is not yet good

Honesty about the gaps:

Right-to-left scripts and complex shaping. Arabic, Hebrew, Devanagari (used for Hindi), Thai, and similar scripts have specific rendering requirements (bidirectional text, glyph shaping, combining marks). BurnSub’s current renderer handles Latin scripts well and the others adequately at best. Improving this is on the roadmap.
Audio mixing. The current pipeline keeps the original audio track unchanged. A future version should support audio gain, ducking under captions, or replacing the audio entirely.
Long-form captions. The pipeline is tuned for short-form (TikTok, Reels, Shorts). Multi-line captions for 30-minute podcasts work but the UX is not optimized for them yet.
Browser extension. A right-click “burn captions into this video” extension would let the tool work on any video file from the browser context menu. Started, not finished.

These are real limitations. They are listed because pretending they do not exist makes the tool’s claims less credible, not more.

Why this matters beyond one subtitle tool

The argument for BurnSub is the argument for a broader category of tools that the modern browser now supports: privacy-preserving, locally-running utilities that previously required either a desktop app or a server. Image editors, audio editors, file converters, basic video tools, ML inference — all of these are increasingly feasible without a server tier.

The constraint that justified server-based architectures — “the browser cannot do this” — is no longer true for many tasks. When the constraint goes away, the design choices that grew up around it should be re-examined.

The user should not have to upload their file to caption it. The operator should not have to charge for caption work that the user’s own GPU can do. The fact that both sides accept the upload step today is mostly historical inertia.

BurnSub is one small example of refusing the inertia. There will be others.

If you want to test the tool itself, open BurnSub and drop in a video. The browser does the rest. If the architecture works for you, the WebCodecs deep-dive explains the technical details of how the pipeline is built.