🔥 BurnSub
Deep dive · 10 min read

WebCodecs vs ffmpeg.wasm: how browser-local video encoding actually works

A technical breakdown of the two ways to process video in the browser — what each does, where each fails, and how a narrow-scope tool like BurnSub uses WebCodecs to avoid the upload step entirely.

Atilla Kürük
Solo dev building browser tools

If you want to process a video in a browser without uploading it to a server, you have exactly two real options in 2026: ffmpeg.wasm or WebCodecs. They look superficially similar: both run in the browser, and both can sit at the core of an MP4-in, MP4-out pipeline. They are fundamentally different technologies with different performance profiles, different scope, and different use cases.

This post is a technical breakdown of how each works, where each is the right tool, and why a narrow-scope tool like BurnSub fits the WebCodecs model.

What ffmpeg.wasm actually is

ffmpeg.wasm is the FFmpeg C codebase compiled to WebAssembly. When you call it from JavaScript, you are running the full FFmpeg binary inside a WebAssembly virtual machine. The video frames, the encoder, the decoder, the muxer — all of it lives inside the JS engine’s WebAssembly sandbox.

A typical usage looks like this:

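import { FFmpeg } from '@ffmpeg/ffmpeg';

const ffmpeg = new FFmpeg();
await ffmpeg.load(); // fetches and instantiates the WASM core
// Every file the command references must first be written to ffmpeg.wasm's in-memory file system.
await ffmpeg.writeFile('input.mp4', videoBytes); // source video as a Uint8Array
await ffmpeg.writeFile('in.srt', srtBytes);      // subtitle file as a Uint8Array or string
await ffmpeg.exec(['-i', 'input.mp4', '-vf', 'subtitles=in.srt', 'out.mp4']);
const output = await ffmpeg.readFile('out.mp4'); // encoded result as a Uint8Array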

This API is great for one thing: it does everything FFmpeg does. Every filter, every codec, every container, every quirk of 25 years of FFmpeg development. If you can write an ffmpeg command line for your job, ffmpeg.wasm can run it in the browser.

The cost is the execution model. WebAssembly is not direct hardware access. Every operation (decoding a frame, applying a filter, encoding a frame) happens in software inside the WASM runtime. Modern CPUs and GPUs have dedicated silicon for H.264 and H.265 encoding, reached through Intel Quick Sync, AMD VCE, or Apple’s VideoToolbox framework. ffmpeg.wasm cannot use any of it, because WebAssembly is sandboxed and cannot reach the underlying hardware encoders.

The result is that ffmpeg.wasm is substantially slower than running the same FFmpeg binary natively. The ffmpeg.wasm project’s own README acknowledges the constraint, and benchmarks across the community consistently show software encoding running roughly an order of magnitude slower than hardware-accelerated paths on the same machine. The exact gap depends on the codec, the resolution, and the host CPU.

For short clips that gap is bearable. For a longer recording at 4K, it becomes the dominant cost of the operation.

What WebCodecs actually is

WebCodecs is a W3C API that exposes the browser’s built-in video encoders and decoders, hardware-accelerated where the platform allows, to JavaScript. It is not a virtual machine. It is a direct binding to the codecs the browser already ships: the same decoder Chrome uses to play YouTube, the same decoder Safari uses to display a Twitter video, and the encoders those browsers use for things like video calls.

The minimal API looks like this:

const encoder = new VideoEncoder({
  output: (chunk, metadata) => { /* receive encoded H.264 chunk */ },
  error: (e) => console.error(e)
});

encoder.configure({
  codec: 'avc1.42E028', // H.264 Constrained Baseline, level 4.0 (level 3.1 only covers up to 720p)
  width: 1920,
  height: 1080,
  bitrate: 5_000_000,
  framerate: 30,
  hardwareAcceleration: 'prefer-hardware'
});

// For each frame:
encoder.encode(videoFrame, { keyFrame: false });
videoFrame.close(); // release the frame once it has been queued

// After the last frame:
await encoder.flush();

hardwareAcceleration: 'prefer-hardware' is the line that matters. It is a hint, not a guarantee: the browser checks your GPU, your OS, and your driver, then routes the work to a hardware path if one exists for that codec and resolution. On a laptop with Intel Quick Sync, the encoder runs on dedicated silicon. On an M-series Mac, it goes through VideoToolbox. On Android, it uses MediaCodec. The JavaScript API surface is identical across platforms; only the underlying acceleration changes.
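Because the hardware path is not available on every machine, it can help to probe before configuring. The following is a minimal sketch, assuming the standard VideoEncoder.isConfigSupported() static method; the function name pickEncoderConfig and the injectable isSupported parameter are illustrative, not part of any library.

```javascript
// Sketch: try the hardware path first, then progressively relax the preference.
// `isSupported` defaults to VideoEncoder.isConfigSupported and is injectable so
// the fallback logic can be exercised outside a browser (hypothetical helper).
async function pickEncoderConfig(
  baseConfig,
  isSupported = (config) => VideoEncoder.isConfigSupported(config)
) {
  const preferences = ['prefer-hardware', 'no-preference', 'prefer-software'];
  for (const hardwareAcceleration of preferences) {
    const candidate = { ...baseConfig, hardwareAcceleration };
    const { supported, config } = await isSupported(candidate);
    // Use the config echoed back by the browser when it provides one.
    if (supported) return config ?? candidate;
  }
  return null; // no usable encoder for this codec and resolution
}
```

In a browser, the result can be passed straight to encoder.configure(); a null result is the signal to pick another codec or another tool.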

The trade-off: you get only what the API exposes. No filters. No subtitle muxing. No format conversion magic. You get encoding, decoding, and the raw bytes between them. Everything else — drawing subtitles on a frame, muxing into MP4, syncing audio — you build yourself in JavaScript.

This is exactly the trade-off a subtitle burner needs. The job is narrow. The narrow API fits.

The performance gap, hedged honestly

I will not invent benchmark numbers. Here is what is widely accepted in the public record:

  • Native FFmpeg with hardware encoding (Quick Sync / VCE / VideoToolbox) typically encodes 1080p H.264 at hundreds of frames per second on a desktop GPU, per the FFmpeg documentation and broad community benchmarks.
  • WebCodecs uses the same underlying hardware paths. Throughput on the same machine is in the same order of magnitude as native FFmpeg, with overhead in the binding layer rather than the encoding itself.
  • ffmpeg.wasm has been benchmarked by the community to run roughly an order of magnitude slower than the same FFmpeg binary natively, because of the WebAssembly software-encoding constraint.

The headline claim — for the burn-subtitles workload, WebCodecs is fast enough that the encode step is no longer the bottleneck — follows from those three facts. On a modern machine, the slow step is no longer “encode the frames”; it is “draw the captions” and “wait for I/O on the input.”

Anyone running these tools today can measure the difference themselves. The community benchmarks are public, and individual users can validate them with a short script and a test clip. The hedged statement above is conservative; specific milliseconds depend on your machine and your codec settings.
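As one way to run that kind of measurement, here is a small timing sketch. The names measureThroughput and realTimeFactor are hypothetical, and processFrame stands in for whatever per-frame step you want to time; for a WebCodecs encoder the timed work should include the final encoder.flush(), since encode() only queues work.

```javascript
// Times an async per-frame step and reports frames per second.
// `now` is injectable so the arithmetic can be checked with a fake clock.
async function measureThroughput(processFrame, frameCount, now = () => performance.now()) {
  const start = now();
  for (let i = 0; i < frameCount; i++) {
    await processFrame(i);
  }
  const elapsedMs = now() - start;
  return {
    elapsedMs,
    framesPerSecond: elapsedMs > 0 ? (frameCount * 1000) / elapsedMs : Infinity,
  };
}

// Real-time factor: above 1 means faster than playback at the clip's frame rate.
function realTimeFactor(framesPerSecond, clipFrameRate) {
  return framesPerSecond / clipFrameRate;
}
```

Running the same clip through both tools on the same machine gives two real-time factors that can be compared directly.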

The pipeline, step by step

This is how a WebCodecs-based subtitle burner like BurnSub processes a video, end to end, in the browser. Every step uses standard APIs documented at MDN.

Step 1 — Demux the input

A JavaScript demuxer parses the MP4 container and extracts the encoded video and audio tracks; the browser does not do this for you. There are several libraries for the job, and MP4Box.js is the most established. The output is a stream of encoded samples, which you wrap in EncodedVideoChunk objects (the compressed H.264 data for each frame), plus an audio track.
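The glue between the demuxer and the decoder is mostly a unit conversion. The sketch below assumes the sample fields MP4Box.js reports (is_sync, cts, duration, timescale, data); the helper name sampleToChunkInit is made up for illustration.

```javascript
// Converts one MP4Box.js sample into the init object for `new EncodedVideoChunk(...)`.
// MP4 timestamps are in track timescale ticks; WebCodecs expects microseconds.
function sampleToChunkInit(sample) {
  const toMicroseconds = (ticks) => Math.round((ticks * 1_000_000) / sample.timescale);
  return {
    type: sample.is_sync ? 'key' : 'delta', // sync samples are key frames
    timestamp: toMicroseconds(sample.cts),  // composition (presentation) time
    duration: toMicroseconds(sample.duration),
    data: sample.data,
  };
}

// In a browser:
//   decoder.decode(new EncodedVideoChunk(sampleToChunkInit(sample)));
```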

Step 2 — Decode video to frames

A VideoDecoder consumes EncodedVideoChunk objects and emits VideoFrame objects:

const decoder = new VideoDecoder({
  output: (frame) => { /* got a decoded frame */ },
  error: (e) => console.error(e)
});
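decoder.configure({
  codec: 'avc1.42E01F',   // in practice, use the codec string the demuxer reports for the track
  description: avcCBytes  // the track's avcC box payload; required for H.264 taken from an MP4
});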

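Each VideoFrame is a raw, uncompressed picture. With hardware decoding it typically lives in GPU memory. You can draw it to a canvas or copy it back to CPU memory; drawing to a canvas is usually much faster because the data does not have to leave the GPU. Call frame.close() as soon as you are done with a frame, because a decoder can stall when too many of its frames are still held open.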

Step 3 — Render the caption overlay

For each frame, the tool:

  1. Draws the decoded VideoFrame to an OffscreenCanvas via drawImage(frame, 0, 0).
  2. Looks up which subtitle cue is active at this frame’s timestamp.
  3. Renders the cue using the chosen style (font, color, stroke, animation, position) via canvas text drawing.
  4. Wraps the composited canvas in a new VideoFrame, carrying over the original timestamp, as the input for the next step.
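The cue lookup in the list above can be sketched as a binary search. This assumes cues are sorted by start time and do not overlap, and that times are held in microseconds to match VideoFrame.timestamp; the names srtTimeToMicroseconds and findActiveCue are illustrative.

```javascript
// Parses an SRT timestamp such as "00:01:02,500" into microseconds.
function srtTimeToMicroseconds(text) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(text.trim());
  if (!match) throw new Error(`Bad SRT timestamp: ${text}`);
  const [, h, m, s, ms] = match.map(Number);
  return ((h * 60 + m) * 60 + s) * 1_000_000 + ms * 1000;
}

// Returns the cue whose [startUs, endUs) interval contains the timestamp, or null.
// Assumes `cues` is sorted by startUs and cues do not overlap.
function findActiveCue(cues, timestampUs) {
  let lo = 0;
  let hi = cues.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const cue = cues[mid];
    if (timestampUs < cue.startUs) hi = mid - 1;
    else if (timestampUs >= cue.endUs) lo = mid + 1;
    else return cue;
  }
  return null;
}
```

Since frames arrive in order, remembering the last cue index and scanning forward would also work; the binary search simply avoids keeping state between frames.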

The canvas drawing step is where caption styling lives. In BurnSub’s case, every preset is a JSON configuration that maps to canvas drawing instructions. The rendering itself is plain ctx.fillText(), ctx.strokeText(), and similar calls — well-documented Canvas 2D API, no exotic technology.
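To make the preset idea concrete, here is a minimal sketch of a JSON-style preset mapped to Canvas 2D calls. The preset fields and the drawCaption helper are invented for illustration and are not BurnSub’s actual schema; only the ctx properties and methods are standard Canvas 2D API.

```javascript
// Hypothetical preset: sizes are ratios so the style scales with resolution.
const examplePreset = {
  fontFamily: 'sans-serif',
  fontSizeRatio: 0.05,     // font size as a fraction of frame height
  fillStyle: '#ffffff',
  strokeStyle: '#000000',
  strokeWidthRatio: 0.15,  // outline width as a fraction of font size
  bottomMarginRatio: 0.08, // distance from the bottom edge, as a fraction of height
};

// Draws one caption line, centered near the bottom of the frame.
function drawCaption(ctx, text, preset, frameWidth, frameHeight) {
  const fontSize = Math.round(frameHeight * preset.fontSizeRatio);
  ctx.font = `bold ${fontSize}px ${preset.fontFamily}`;
  ctx.textAlign = 'center';
  ctx.textBaseline = 'bottom';
  ctx.lineJoin = 'round'; // avoids spiky corners on thick outlines
  ctx.lineWidth = fontSize * preset.strokeWidthRatio;
  ctx.strokeStyle = preset.strokeStyle;
  ctx.fillStyle = preset.fillStyle;
  const x = frameWidth / 2;
  const y = frameHeight * (1 - preset.bottomMarginRatio);
  ctx.strokeText(text, x, y); // outline first so the fill sits on top
  ctx.fillText(text, x, y);
  return { x, y, fontSize };
}
```

The same function works against an OffscreenCanvas context or a regular canvas context, which makes a live preview and the final render share one code path.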

Step 4 — Encode the composited frame

A VideoEncoder consumes the composited frame and emits a new EncodedVideoChunk:

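const encoder = new VideoEncoder({ output: handleChunk, error: console.error });
encoder.configure({ codec: 'avc1.42E028', width, height, bitrate, framerate }); // level 4.0 covers 1080p
encoder.encode(compositeFrame);
compositeFrame.close(); // the encoder keeps its own reference
await encoder.flush();  // once, after the final frame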

This is the step where WebCodecs earns its keep. The encode runs on hardware where it is available. For 1080p H.264 it is typically real-time or faster on a modern machine: encoding a 60-second clip takes somewhere in the region of 30 to 60 seconds, depending on hardware and bitrate.
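Two practical details sit around that encode call: deciding which frames are key frames, and not queueing frames faster than the encoder drains them. The sketch below relies on the standard encodeQueueSize property and dequeue event of VideoEncoder; the helper names, the two-second key frame interval, and the queue limit of 8 are arbitrary assumptions.

```javascript
// Requests a key frame at a fixed cadence (every 2 seconds by default).
function isKeyFrame(frameIndex, framerate, keyFrameIntervalSeconds = 2) {
  const interval = Math.max(1, Math.round(framerate * keyFrameIntervalSeconds));
  return frameIndex % interval === 0;
}

// Waits until the encoder's queue has room, then queues the frame and releases it.
async function encodeWithBackpressure(encoder, frame, frameIndex, framerate, maxQueue = 8) {
  while (encoder.encodeQueueSize > maxQueue) {
    // 'dequeue' fires whenever encodeQueueSize decreases.
    await new Promise((resolve) =>
      encoder.addEventListener('dequeue', resolve, { once: true })
    );
  }
  encoder.encode(frame, { keyFrame: isKeyFrame(frameIndex, framerate) });
  frame.close(); // safe after encode(): the encoder holds its own reference
}
```

Without the wait, a fast decode loop can pile up uncompressed frames in memory, which matters most at 4K.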

Step 5 — Mux into MP4

The encoded chunks need to be wrapped back into an MP4 container with the audio track. Open-source JavaScript muxers exist for this; the mp4-muxer project is one widely used option, and its author now points new projects to its successor, Mediabunny. The output is a Blob containing the final MP4 file.
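As a rough illustration of this step, the sketch below follows the documented usage of the mp4-muxer package (Muxer, ArrayBufferTarget, addVideoChunk, finalize). The library exports are passed in as a parameter, which is a choice made here so the function has no hard dependency; the function name and the shape of encodedChunks are assumptions, and audio is left out to keep it short.

```javascript
// Usage in a browser build (assumed):
//   import * as Mp4Muxer from 'mp4-muxer';
//   const blob = muxVideoToMp4(Mp4Muxer, collected, { width, height });
//
// `encodedChunks` is an array of { chunk, meta } pairs collected from the
// VideoEncoder output callback, in the order they were emitted.
function muxVideoToMp4({ Muxer, ArrayBufferTarget }, encodedChunks, { width, height }) {
  const target = new ArrayBufferTarget();
  const muxer = new Muxer({
    target,
    video: { codec: 'avc', width, height }, // 'avc' is the muxer's name for H.264
    fastStart: 'in-memory',                 // put the index at the front of the file
  });
  for (const { chunk, meta } of encodedChunks) {
    muxer.addVideoChunk(chunk, meta);
  }
  muxer.finalize(); // fills target.buffer with the finished file
  return new Blob([target.buffer], { type: 'video/mp4' });
}
```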

Step 6 — Download

A simple URL.createObjectURL(blob) plus an anchor click. The user gets the MP4. The file never left the browser, which is verifiable in the DevTools network tab.
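A minimal sketch of that download step follows. The helper name downloadBlob is illustrative, and the doc and urlApi parameters exist only so the function can be exercised outside a browser; in a page you would call it with just the Blob and a filename.

```javascript
// Triggers a browser download of `blob` without any network request.
function downloadBlob(blob, filename, doc = document, urlApi = URL) {
  const url = urlApi.createObjectURL(blob);
  const anchor = doc.createElement('a');
  anchor.href = url;
  anchor.download = filename; // suggests the saved file name
  doc.body.appendChild(anchor);
  anchor.click();
  anchor.remove();
  // Revoke on a later tick so the browser has started the download first.
  setTimeout(() => urlApi.revokeObjectURL(url), 0);
  return url;
}
```

Revoking the object URL matters for long sessions: until it is revoked, the browser keeps the whole video Blob alive.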

That is the entire pipeline. Six steps, all using documented browser APIs, all hardware-accelerated where possible, all running on the user’s machine.

Where WebCodecs falls short

Pretending WebCodecs is universally better than ffmpeg.wasm would be dishonest. The trade-offs are real.

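1. Codec support is narrower. The WebCodecs specification does not mandate any codec. In practice H.264 is the most widely available and VP9 is common as well, but the only reliable check is VideoEncoder.isConfigSupported() on the user’s machine. AV1 encoding support is patchy. HEVC encoding depends on platform hardware and is far from universal. ffmpeg.wasm avoids the problem because it carries its own codec implementations as part of the WASM bundle.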

2. No filters. No -vf scale=, no -vf crop=, no built-in subtitle filter. If you want to scale a video, you do it yourself by drawing to a smaller canvas. If you want to apply a color filter, you write the shader yourself or use canvas operations.

3. Browser support is recent. Per caniuse.com:

  • Chrome 94+ (September 2021)
  • Edge 94+ (September 2021)
  • Safari 16.4+ (March 2023, video only)
  • Firefox 130+ on desktop (September 2024)

Older browsers cannot encode. ffmpeg.wasm works on any browser with WebAssembly, which is essentially every browser shipped since 2017.

4. Audio is separate. WebCodecs has AudioEncoder and AudioDecoder, but the API has no concept of a track or a shared timeline: audio and video are independent streams. You manage the audio separately and mux it manually. ffmpeg.wasm handles audio transparently within an ffmpeg command line.

5. Container handling is not in scope. Both demuxing (parsing the input MP4) and muxing (writing the output MP4) are outside the WebCodecs API. You bring your own parser and muxer. ffmpeg.wasm includes muxers for every container FFmpeg supports.
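Point 2 above mentions scaling by drawing to a smaller canvas. The only arithmetic involved is picking the target size, sketched below. The helper name fitWithin is hypothetical, and rounding to even dimensions is a precaution, since H.264 encoders commonly reject odd widths or heights.

```javascript
// Computes output dimensions that fit inside a bounding box, preserve the
// aspect ratio, never upscale, and are rounded to even numbers.
function fitWithin(srcWidth, srcHeight, maxWidth, maxHeight) {
  const scale = Math.min(1, maxWidth / srcWidth, maxHeight / srcHeight);
  const toEven = (n) => Math.max(2, Math.round(n / 2) * 2);
  return { width: toEven(srcWidth * scale), height: toEven(srcHeight * scale) };
}

// In a browser, the scaling itself is one call on a canvas of that size:
//   const { width, height } = fitWithin(frame.displayWidth, frame.displayHeight, 1920, 1080);
//   const canvas = new OffscreenCanvas(width, height);
//   canvas.getContext('2d').drawImage(frame, 0, 0, width, height);
```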

The honest summary: WebCodecs is a narrow, fast, hardware-accelerated codec API. ffmpeg.wasm is a wide, slow, software-only video toolkit. If your job fits the narrow API, WebCodecs wins on speed and architecture. If you need filters, exotic codecs, or older-browser support, ffmpeg.wasm wins on capability.

Why a subtitle burner fits the narrow path

A subtitle burner is the canonical case for WebCodecs. The job has exactly two pipeline stages that are codec-heavy: decode the input, encode the output. Everything between — drawing the caption — is plain canvas work, which is fast in any browser.

There is no need for color grading, no need for de-noising, no need for resampling beyond what canvas already does. The “filter graph” is a single text overlay.

The instinct for someone new to browser video processing is to reach for ffmpeg.wasm first, because FFmpeg is the universal video toolkit. For a narrow-scope tool, that instinct produces a slower tool than necessary. The correct question is not “how do I make ffmpeg.wasm fast enough?” but “do I actually need everything FFmpeg provides?”

For a subtitle burner, the answer is no. For a full video editor with color grading and B-roll, the answer is probably yes.

How to choose between the two

If your tool has any of these properties, use ffmpeg.wasm:

  • It needs to do color filtering, sharpening, scaling, frame-rate conversion, or any of FFmpeg’s filter operations.
  • It needs to support browsers that predate WebCodecs (see the version list above).
  • It needs to work with exotic codecs (HEVC, AV1 in older Chrome, ProRes, DNxHD).
  • The pipeline involves multiple stages that FFmpeg already chains together natively.

If your tool is closer to a narrow, single-purpose profile, use WebCodecs:

  • One codec-heavy operation per frame (decode, encode, or both), with non-codec work in between.
  • Modern browsers as the audience.
  • Sub-second latency targets where the WASM overhead would hurt.
  • A clear willingness to bring your own demuxer and muxer, whether a library or hand-written.
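The browser-support part of that decision can also be made at runtime. This is a minimal feature-detection sketch, assuming a tool that ships both code paths; the function name chooseBackend and the returned labels are illustrative.

```javascript
// Picks a processing back end based on what the current environment exposes.
// `scope` defaults to the global object and is injectable for testing.
function chooseBackend(scope = globalThis) {
  const hasWebCodecs =
    typeof scope.VideoEncoder === 'function' &&
    typeof scope.VideoDecoder === 'function';
  if (hasWebCodecs) return 'webcodecs';
  if (typeof scope.WebAssembly === 'object') return 'ffmpeg.wasm';
  return 'unsupported';
}
```

The presence of VideoEncoder only shows that the API exists; whether a particular codec and resolution will work still needs an isConfigSupported() check.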

Both are real options. Both are production-ready in 2026. They are not enemies — they are different solutions to different sub-problems of the same broader problem of browser-local video processing.

What comes next

The thing that will make this trade-off less stark is WebGPU. WebGPU gives direct shader access for video processing on the GPU, with much better performance than canvas drawing. It already runs the Whisper-Turbo speech-recognition model that tools like BurnSub use for auto-captioning, via transformers.js.

A future version of the BurnSub pipeline will likely move the caption rendering step into a WebGPU shader. The encode and decode stay on WebCodecs. The shader handles the per-frame compositing work. That is the path to truly real-time processing, including for 4K content.

That is a future post.

Quick reference

For anyone building or evaluating browser-local video processing: the architecture choice between WebCodecs and ffmpeg.wasm is one of the more consequential decisions in browser-local video work. The right answer depends on what your tool needs to do, not on which API is more popular at the moment. For narrow tools, WebCodecs is the better fit. For everything else, ffmpeg.wasm remains the default.