Multi-Window, HDR Toggle, Encode Queue

The output modes post from a year ago gave Lux four ways to display its output: preview, split, background, separate window. That covered the “one projector or one monitor” case. It did not cover the real installation case, which is four projectors and a video wall and a stream encoder, all running simultaneously.

This post is three new systems for the actual installation workflow. Each one solves a specific failure mode I kept hitting when trying to use Lux for a real show.

Multi-window fleet

MultiWindowFleet in lux-live/src/multi_window.rs manages up to 16 output windows plus the implicit editor window. Each output window is a WindowShell — a lightweight container owning a winit window handle, a wgpu surface, a render target, and some per-window config (resolution, refresh rate, HDR mode, colour space).

The fleet is the thing the editor talks to when the user says “add another output window.” It enforces two hard caps at add time:

  • 16-shell cap: the 17th add_output() call returns VramExceeded::ShellCapReached. Not because 16 is a mathematical limit — wgpu is fine with more — but because 17 windows in a creative coding tool is almost certainly a mistake, and the app should catch it rather than let it happen.

  • 50% device-VRAM cap: the sum of per-shell swapchain + post-target VRAM estimates can’t exceed 50% of device VRAM. Four 4K Rgba16F windows with post-FX come to about 1.2 GB of GPU memory (~320 MB each). Under the cap, a 2 GB GPU can host three of them. A 16 GB GPU has room for about two dozen, so it hits the 16-shell cap first. The cap is a soft circuit breaker that prevents the user from accidentally pushing the GPU into heavy paging.

Both caps publish a FleetChip::*Rejected status-bar chip with actionable detail — requested / projected / device byte totals for VRAM, current / max shell count for the shell cap. The user sees why the add failed, not just that it failed.
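A minimal sketch of the two add-time checks, assuming a fleet that stores one VRAM estimate per shell. Only add_output() and VramExceeded::ShellCapReached come from the post. The BudgetExceeded variant, the field names, and this Fleet struct are hypothetical.

```rust
/// Hypothetical rejection type. `ShellCapReached` is named in the post;
/// `BudgetExceeded` and all fields are illustrative.
#[derive(Debug, PartialEq)]
pub enum VramExceeded {
    ShellCapReached { current: usize, max: usize },
    BudgetExceeded { requested: u64, projected: u64, device: u64 },
}

pub const MAX_SHELLS: usize = 16;

/// Minimal stand-in for the fleet: just the per-shell VRAM estimates.
pub struct Fleet {
    device_vram_bytes: u64,
    shell_estimates: Vec<u64>,
}

impl Fleet {
    pub fn new(device_vram_bytes: u64) -> Self {
        Fleet { device_vram_bytes, shell_estimates: Vec::new() }
    }

    /// Runs both caps before committing; returns the new shell's index.
    pub fn add_output(&mut self, estimate_bytes: u64) -> Result<usize, VramExceeded> {
        let current = self.shell_estimates.len();
        if current >= MAX_SHELLS {
            return Err(VramExceeded::ShellCapReached { current, max: MAX_SHELLS });
        }
        let projected = self.shell_estimates.iter().sum::<u64>() + estimate_bytes;
        // Soft circuit breaker: total estimate must stay within half of device VRAM.
        if projected > self.device_vram_bytes / 2 {
            return Err(VramExceeded::BudgetExceeded {
                requested: estimate_bytes,
                projected,
                device: self.device_vram_bytes,
            });
        }
        self.shell_estimates.push(estimate_bytes);
        Ok(current)
    }
}
```

Checking the shell count before the budget means a full fleet always reports the shell cap. The error variants carry the same current/max and requested/projected/device numbers a status-bar chip would need.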

The VRAM arithmetic is worth getting right. For a 4K Rgba16F window with the full post-FX stack, the estimate includes:

  • 1 swapchain texture: 3840 × 2160 × 8 bytes = 66 MB
  • 1 HDR intermediate: 3840 × 2160 × 8 bytes = 66 MB
  • 1 TAA history buffer: 66 MB
  • 1 Bloom mip chain (floor(log2(1920)) - 2 = 8 levels): ~88 MB
  • 1 SSAO work buffer (if enabled): ~33 MB

That’s ~320 MB per 4K window with everything on. The documented “~1.2 GB for 4K Rgba16F” figure is four of these (about 1.28 billion bytes, or 1.2 GiB). The math in OutputWindowConfig::vram_estimate_bytes is byte-for-byte traceable to that number. That matters because the user-visible “you can’t add this window” error cites specific byte counts, and those counts have to be defensible.
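The bullet list can be turned into a small estimator. This is an illustrative sketch, not Lux’s OutputWindowConfig::vram_estimate_bytes. WindowConfig and its fields are hypothetical. Two assumptions make the terms line up with the bullets. First, the bloom chain is assumed to start at full resolution with each level halving both dimensions, which is what reproduces ~88 MB. Second, the SSAO buffer is assumed to be a single 32-bit channel at full resolution, which is what reproduces ~33 MB.

```rust
/// Hypothetical window config; fields chosen to mirror the bullet list.
pub struct WindowConfig {
    pub width: u32,
    pub height: u32,
    pub bytes_per_pixel: u64, // 8 for Rgba16Float
    pub taa: bool,
    pub bloom: bool,
    pub ssao: bool,
}

/// floor(log2(x)) for x > 0.
fn floor_log2(x: u64) -> u32 {
    63 - x.leading_zeros()
}

impl WindowConfig {
    pub fn vram_estimate_bytes(&self) -> u64 {
        let (w, h) = (self.width as u64, self.height as u64);
        let full = w * h * self.bytes_per_pixel;
        let mut total = 2 * full; // swapchain texture + HDR intermediate
        if self.taa {
            total += full; // TAA history buffer
        }
        if self.bloom {
            // Level count from the post: floor(log2(width / 2)) - 2, i.e. 8 at 4K.
            // Assumption: level 0 is full resolution, each level halves w and h.
            let levels = floor_log2((w / 2).max(1)).saturating_sub(2);
            for i in 0..levels {
                total += (w >> i) * (h >> i) * self.bytes_per_pixel;
            }
        }
        if self.ssao {
            total += w * h * 4; // assumption: one 32-bit channel, full resolution
        }
        total
    }
}
```

For a 3840 × 2160 window with everything enabled, this comes to 320,714,400 bytes. Because each term maps to one bullet, a rejection message can quote the same numbers.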

Monitor hot-plug

Monitor hot-plug is one of those things that looks trivial in the demo and bites you in production. A user plugs in a projector while the app is running. Or unplugs one. Or the OS decides to reconfigure the display layout. All three events need to be handled without crashing.

The fleet handles them through on_monitor_plug / on_monitor_unplug methods that walk the shell list, mark affected shells dirty, and hand the affected handles to the HDR live-toggle path (next section) so the reconfigure rides the same blanking-interval machinery as user-initiated HDR toggles.

This is mostly plumbing. Winit fires a MonitorEvent. The fleet catches it. Each shell that was targeting the affected monitor gets a new surface on the new monitor. Windows that can’t be reassigned (the monitor is gone gone) get closed.

The non-trivial case is when a monitor disconnects mid-frame. The frame being submitted references a surface that no longer exists, and wgpu won’t let you present to it. Acquiring the surface texture fails with SurfaceError::Lost or SurfaceError::Outdated, and an app that unwraps that result crashes. The fleet detects the disconnect during the submit loop, drops the in-flight frame for that shell, and resumes on the next frame. One lost frame per hot-plug event; not a crash.
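The hot-plug path can be sketched without a GPU by modelling surface acquisition as a fallible call. Everything here is hypothetical and is not Lux’s types: Shell, MonitorId, the fallback parameter, and AcquireError. AcquireError loosely mirrors wgpu’s SurfaceError::Lost and SurfaceError::Outdated.

```rust
/// Hypothetical stand-ins, not Lux's types.
pub type MonitorId = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AcquireError {
    Lost,
    Outdated,
}

#[derive(Debug)]
pub struct Shell {
    pub monitor: MonitorId,
    pub dirty: bool, // needs a fresh surface before its next frame
    pub closed: bool,
    pub dropped_frames: u32,
}

impl Shell {
    pub fn on(monitor: MonitorId) -> Self {
        Shell { monitor, dirty: false, closed: false, dropped_frames: 0 }
    }
}

pub struct Fleet {
    pub shells: Vec<Shell>,
}

impl Fleet {
    /// Re-targets shells on the vanished monitor, or closes them if there is
    /// nowhere to go. Returns the affected indices for the reconfigure path.
    pub fn on_monitor_unplug(&mut self, gone: MonitorId, fallback: Option<MonitorId>) -> Vec<usize> {
        let mut affected = Vec::new();
        for (i, shell) in self.shells.iter_mut().enumerate() {
            if shell.closed || shell.monitor != gone {
                continue;
            }
            match fallback {
                Some(m) => {
                    shell.monitor = m;
                    shell.dirty = true;
                }
                None => shell.closed = true,
            }
            affected.push(i);
        }
        affected
    }

    /// One submit pass. A failed acquire drops that shell's frame and marks it
    /// dirty instead of panicking; returns how many shells presented.
    pub fn submit_frame<F>(&mut self, mut acquire: F) -> usize
    where
        F: FnMut(&Shell) -> Result<(), AcquireError>,
    {
        let mut presented = 0;
        for shell in self.shells.iter_mut().filter(|s| !s.closed) {
            match acquire(&*shell) {
                Ok(()) => presented += 1,
                Err(AcquireError::Lost | AcquireError::Outdated) => {
                    shell.dropped_frames += 1;
                    shell.dirty = true;
                }
            }
        }
        presented
    }
}
```

The important property is that the error arm only touches that one shell. Every other shell still presents in the same pass.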

HDR live toggle

One of the things the Lux pipeline can now do that most creative coding tools can’t is drive an HDR output surface. HDR output support on Windows and Linux has become genuinely usable in the last year. Windows 11 provides it through HDR10 swapchains, and Linux provides it via Wayland’s color management protocol and the appropriate KMS extensions. Creative work that exceeds SDR brightness or the sRGB gamut looks different on an HDR display than on an SDR one. Examples include a sunset with real solar intensity, or a neon-drenched scene with saturated reds outside Rec.709.

HdrLiveToggle is the subsystem that lets a user flip a running output window from SDR to HDR without restarting the app or losing the scene state. Two things have to happen:

  1. Reconfigure the surface. wgpu reconfigures a surface via Surface::configure with a new SurfaceConfiguration. That struct has no separate colour-space field, so the switch goes through the surface format (an HDR-capable one such as Rgba16Float). The reconfigure is a driver-level operation that has to happen on a quiet frame (no in-flight commands on that surface). The fleet knows when the surface is quiet and schedules the reconfigure during the next blanking interval.

  2. Update the tonemap operator. The HDR-first pipeline already runs every scene through a dedicated tonemap pass. For SDR output, the tonemap squeezes the HDR buffer down to 8-bit sRGB. For HDR output, it switches to a different curve (PQ or HLG encoding instead of ACES-fitted sRGB) that keeps highlights up to roughly 1,000 nits instead of compressing them into SDR range. Switching operators is a single uniform update; the scene’s internal state is unchanged.
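For reference, the PQ curve (SMPTE ST 2084) maps absolute luminance in nits to a 0 to 1 signal. A minimal sketch of that encoding step follows. It is not Lux’s tonemap shader. PQ alone does not tonemap: scene values still have to be mapped into the display’s range before encoding. The constants are the exact rationals from the standard.

```rust
/// SMPTE ST 2084 (PQ) inverse EOTF: absolute luminance in nits -> [0, 1] signal.
pub fn pq_encode(nits: f64) -> f64 {
    const M1: f64 = 2610.0 / 16384.0;
    const M2: f64 = 2523.0 / 4096.0 * 128.0;
    const C1: f64 = 3424.0 / 4096.0;
    const C2: f64 = 2413.0 / 4096.0 * 32.0;
    const C3: f64 = 2392.0 / 4096.0 * 32.0;
    // PQ is defined over 0..=10,000 nits; normalise and clamp into that range.
    let y = (nits / 10_000.0).clamp(0.0, 1.0);
    let ym1 = y.powf(M1);
    ((C1 + C2 * ym1) / (1.0 + C3 * ym1)).powf(M2)
}
```

With this curve, 100 nits lands near 0.51 and 1,000 nits near 0.75. For a 1,000-nit target, the top quarter of the signal range is reserved for luminance above that level.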

The toggle is live. You can be running a scene, press the HDR-enable shortcut, and the output transitions within two frames. No pause. No black. The scene keeps rendering.
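The two-step toggle can be modelled as a small state machine. A request is parked until the surface reports no in-flight commands, and then the reconfigure and the tonemap-operator switch happen together. This is a hypothetical sketch. HdrToggle, OutputMode, Tonemap, and the in_flight_cmds count are illustrative, and the counter stands in for the actual Surface::configure call.

```rust
/// Hypothetical live-toggle state machine; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OutputMode {
    Sdr,
    Hdr,
}

/// Operator id written into the tonemap uniform.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tonemap {
    AcesSrgb,
    Pq,
}

pub struct HdrToggle {
    pub active: OutputMode,
    pub tonemap: Tonemap,
    pending: Option<OutputMode>,
    pub reconfigures: u32, // stands in for calls to Surface::configure
}

impl HdrToggle {
    pub fn new() -> Self {
        HdrToggle { active: OutputMode::Sdr, tonemap: Tonemap::AcesSrgb, pending: None, reconfigures: 0 }
    }

    /// User pressed the shortcut. Requesting the current mode cancels a pending switch.
    pub fn request(&mut self, mode: OutputMode) {
        self.pending = if mode != self.active { Some(mode) } else { None };
    }

    /// Called once per frame with the surface's in-flight command count.
    /// Applies the switch only when the surface is quiet; scene state is untouched.
    pub fn on_frame(&mut self, in_flight_cmds: usize) -> bool {
        match self.pending {
            Some(mode) if in_flight_cmds == 0 => {
                self.reconfigures += 1;
                self.active = mode;
                self.tonemap = match mode {
                    OutputMode::Sdr => Tonemap::AcesSrgb,
                    OutputMode::Hdr => Tonemap::Pq,
                };
                self.pending = None;
                true
            }
            _ => false,
        }
    }
}
```

Keeping the surface format and the tonemap operator in one transition means no frame is ever rendered with an SDR curve into an HDR surface, or the reverse.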

This matters for live performance. A VJ running a show on an SDR monitor but projecting to an HDR-capable display can enable HDR for the projector and keep the monitor SDR. Two outputs, different colour pipelines, switched live. Before this system, you’d have had to pre-configure the output mode at app launch, and changing it meant a restart that killed whatever state your patch was in.

I’m fairly sure the HDR colour science is right — I’ve eyeballed the output on one HDR projector I had access to, and the tone mapping looked plausible — but I haven’t done a full instrumented calibration against a reference display. If a real colourist tells me this is wrong, I’ll update it.

Encode queue

The third system is the recording-and-streaming pipeline. Live visual work frequently needs to be recorded: for documentation, for remote audience streaming, for posting to social media afterwards. Doing this the naïve way is a disaster. The render thread writes each video frame to an H.264 encoder inline, which blocks for tens of milliseconds per frame and blows through the 16.67 ms frame budget on any encoder worth using.

EncoderQueue is a dedicated-thread subsystem. Each output sink (RecordVideo, RtmpOut, HlsOut, SrtOut, WebRtcOut — the five flavours currently wired up) owns an EncoderSession with:

  • A bounded crossbeam channel (128 frames deep; about 2 seconds of 60fps backpressure tolerance).
  • An OS thread that drains the channel and calls into an EncoderBackendOps vtable (implemented by the render layer with the actual video encoding — ffmpeg-next under the hood).
  • A ring of 3 Rgba16Float frame-copy textures pre-allocated so the render thread can submit a frame copy without blocking on allocation.

The render thread’s contract: one atomic push into the channel per frame. That’s it. The push is ~100 nanoseconds. The encoder thread handles everything downstream — colour conversion, encoding, network egress.

If the channel fills (encoder can’t keep up), new frames get dropped, not queued. The alternative is unbounded queueing, which appears to work until the process runs out of memory and dies. Drop-on-backpressure is the right answer for live streams, where a dropped frame is invisible to the viewer and a stalled render destroys the show.
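Drop-on-backpressure comes down to a non-blocking send on a bounded channel. The sketch below uses std::sync::mpsc::sync_channel as a dependency-free stand-in for the crossbeam channel the post describes. Crossbeam’s bounded channels have the same try_send / Full behaviour. FrameRef, SinkProducer, and encoder_channel are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical frame handle (e.g. an index into the pre-allocated copy ring).
#[derive(Debug)]
pub struct FrameRef {
    pub index: u64,
}

/// Render-thread side of one sink.
pub struct SinkProducer {
    tx: SyncSender<FrameRef>,
    pub dropped: u64,
}

impl SinkProducer {
    /// Never blocks: if the queue is full, the new frame is dropped and counted.
    pub fn push(&mut self, frame: FrameRef) -> bool {
        match self.tx.try_send(frame) {
            Ok(()) => true,
            // Encoder is behind (Full) or gone (Disconnected): drop, don't wait.
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.dropped += 1;
                false
            }
        }
    }
}

/// Bounded queue between render thread and encoder thread (128 deep in the post).
pub fn encoder_channel(depth: usize) -> (SinkProducer, Receiver<FrameRef>) {
    let (tx, rx) = sync_channel(depth);
    (SinkProducer { tx, dropped: 0 }, rx)
}
```

This drops the newest frame and keeps the queued ones. Dropping the oldest frame instead would need a ring buffer rather than a plain channel.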

The 3-frame ring is deliberately small. Each Rgba16F 1080p frame is ~16 MB; three of them is ~48 MB per sink. Across five sinks running simultaneously, that’s 240 MB of encoder buffers — real memory but tractable.

The EncoderBackendOps vtable is how the render layer injects concrete codecs into the otherwise-codec-free lux-live crate. lux-live doesn’t depend on ffmpeg; the render layer provides an implementation that does. Same pattern as the framegraph’s TransientAllocator — the subsystem is abstracted from GPU specifics so it can be tested in isolation, and the app plugs in the real implementation at startup.
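The same injection pattern works in miniature. The encoder thread only knows about a trait object, so a test can hand it a recording double instead of an ffmpeg-backed implementation. EncoderBackend is a hypothetical trait standing in for the EncoderBackendOps vtable, and its method names are illustrative.

```rust
use std::sync::mpsc::Receiver;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical trait standing in for the `EncoderBackendOps` vtable.
/// `Send` lets the boxed backend move onto the encoder thread.
pub trait EncoderBackend: Send {
    fn encode(&mut self, frame_index: u64);
    fn finish(&mut self);
}

/// Test double: logs calls instead of encoding.
pub struct RecordingBackend {
    pub log: Arc<Mutex<Vec<String>>>,
}

impl EncoderBackend for RecordingBackend {
    fn encode(&mut self, frame_index: u64) {
        self.log.lock().unwrap().push(format!("frame {frame_index}"));
    }
    fn finish(&mut self) {
        self.log.lock().unwrap().push("finish".to_string());
    }
}

/// The encoder thread sees only the trait object, never a concrete codec.
/// It drains until every sender is dropped, then flushes.
pub fn spawn_encoder(rx: Receiver<u64>, mut backend: Box<dyn EncoderBackend>) -> JoinHandle<()> {
    thread::spawn(move || {
        for frame in rx {
            backend.encode(frame);
        }
        backend.finish();
    })
}
```

Dropping the last sender ends the `for frame in rx` loop. That gives the thread a natural point to flush the encoder before it exits.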

Each sink has a hardware-session probe that runs once at sink creation, checking whether the current GPU has hardware-accelerated encoding for the requested codec. NVENC on NVIDIA, Quick Sync on Intel, VideoToolbox on macOS, AMF on AMD. The probe queries the codec capability, attempts a 1-frame encode, and records the result. If hardware encoding works, the session uses it; if not, it falls back to libx264 software encoding. Either way, the encoder thread keeps the render thread unblocked.
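The probe-then-fallback logic reduces to taking the first candidate whose probe succeeds, and falling back to software otherwise. In this hypothetical sketch, the probe closure stands in for the capability query plus the 1-frame test encode. EncoderKind and select_encoder are illustrative names.

```rust
/// Hypothetical encoder kinds matching the ones named in the post.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EncoderKind {
    Nvenc,
    QuickSync,
    VideoToolbox,
    Amf,
    Libx264,
}

/// Returns the first hardware candidate whose probe succeeds, else libx264.
/// Runs once at sink creation, so probe cost stays off the render thread.
pub fn select_encoder<P>(candidates: &[EncoderKind], mut probe: P) -> EncoderKind
where
    P: FnMut(EncoderKind) -> bool,
{
    candidates
        .iter()
        .copied()
        .find(|&kind| probe(kind))
        .unwrap_or(EncoderKind::Libx264)
}
```

Probing stops at the first success, so a machine with working NVENC never pays for the remaining probes.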

What these enable together

The three systems compose. A live-performance patch can now:

  1. Drive up to 16 output windows, each at a different resolution and colour space (via MultiWindowFleet).
  2. Toggle any subset of those windows into HDR mode without restarting (via HdrLiveToggle).
  3. Simultaneously record or stream from any output (via EncoderQueue).

All three systems are independent of the core render pipeline. Nothing in the fleet or encoder changes the frame budget on the render thread. Nothing in the HDR toggle interferes with in-flight renders. Failure in any of the three (VRAM exceeded, monitor disconnected, encoder crashed) produces a status-bar chip and a logged error, not a process death.

I haven’t run this in front of a real 16-projector rig yet. The 16-shell cap was arbitrary; the VRAM math is theoretical; the encoder thread has been tested on streams up to 4K60 but not for 24-hour-installation reliability. Those are the bits I’d like to verify with a real show before claiming it’s production-ready.

But the architecture holds together, the tests pass, and for the workflows I’ve tried — a laptop with two projectors and a recorded stream — it does exactly what I want it to do.

The next post is the other half of the live-performance work. When things do go wrong mid-performance — a panicking node, an oversized frame, a runaway heap — how do you keep the show running? That’s the PerfGuard post.


I have no idea what I’m doing or if any of this is right, but it’s fun. Follow along.
