Built, Not Adopted

Here’s a thing that happens when you ship a lot of code in a short time. You build a library. You write a test suite. You document it. You commit it. You write a blog post about it. Then you go build the next library. And six weeks later you sit down with the F8 profiler running on a real patch, and you realise the library is sitting on a shelf and the rest of the app has been politely walking around it.

BindGroupCache was getting used by 1 of the 34 sites that wanted it. BarrierPlan had a beautiful test suite and never got called from production. GpuProfiler worked, in the sense that it produced correct numbers when you ran the offline lux-profile binary. The live editor never ran the offline binary. MasterClock was constructed zero times. Beat-sync was a feature the way an unopened envelope is mail.

The pattern was everywhere. The welcome modal had eleven sample-patch cards. The first one worked. The other ten silently failed and burned the has_opened_before flag, which means the first-time user got exactly one shot at a working tutorial and then the welcome system politely declared the matter closed. The first-wire celebration animation re-fired every launch because nobody was checking the persistence flag. The global panic hook was set by precisely nothing.

The worst one was right in the middle of the 3D rendering pipeline, which I’ve written about a lot. The GPU compute skinning shader was a complete, tested pipeline that nobody on the live render path talked to. Every skinned mesh in production was running through a CPU fallback loop. You could open Lux, load a character, watch the bones deform a mesh smoothly at 60 fps, and not realise the codebase contained a state-of-the-art skinning compute shader and a CPU for i in 0..vertex_count driving every frame.

This post is about closing all of that.

How this happened

I’d been building libraries one after another for months. Each one passed its tests. Each one got documented and committed. And every time, instead of going back to thread the new library through the hot path, I’d start the next library. Wiring is integration work. It means reading every call site, plumbing the new thing through, updating tests, occasionally fixing stuff that broke. Building a fresh library in isolation is more fun. So the libraries kept stacking up and the hot path kept doing it the old way, because the wiring was always next week’s problem and next week kept not arriving.

The fix was twenty-two specific items across nine subsystems. Same shape every time: the library is correct, the test exists, the call site is one or two lines away. None of this was research. All of it was “find the thing, plug it in, run the tests, commit.”

I’m not going to walk through all twenty-two. Here are the dozen that mattered most, grouped by what they wire up.

The GPU side

Five post-process passes (scene_bloom, scene_taa, scene_cas, scene_grain, scene_tonemap) were building their bind groups and uniform buffers from scratch every frame. Ten device.create_bind_group calls and four device.create_buffer calls in the hot path that should have been hitting BindGroupCache::get_or_create and UniformBufferPool::get_owned from the day those caches landed. The migration was mechanical. The cache was already designed to drop in. After the swap, the steady-state hit rate on every post-FX pass went from 0% (it wasn’t being called) to 100% on the bloom chain, with the rest of the post chain close behind. That’s the gate I want to enforce on every future hot-path commit, and now there’s a number to enforce against.
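The get-or-create shape, and the hit-rate number the gate is built on, can be sketched in a few lines. This is a stand-in under stated assumptions, not the real BindGroupCache: CountingCache, its string keys, and reset_stats are hypothetical names, and the cached value is a plain String where the real thing would hold a bind group.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical stand-in for a bind-group cache that counts hits and misses.
pub struct CountingCache<K, V> {
    entries: HashMap<K, V>,
    hits: u64,
    misses: u64,
}

impl<K: Eq + Hash, V> CountingCache<K, V> {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Returns the cached value for `key`; `create` runs only on a miss.
    pub fn get_or_create(&mut self, key: K, create: impl FnOnce() -> V) -> &V {
        let value: &mut V = match self.entries.entry(key) {
            Entry::Occupied(slot) => {
                self.hits += 1;
                slot.into_mut()
            }
            Entry::Vacant(slot) => {
                self.misses += 1;
                slot.insert(create())
            }
        };
        value
    }

    /// Fraction of lookups served from the cache since the last reset.
    pub fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 { 0.0 } else { self.hits as f64 / total as f64 }
    }

    /// Clears the counters (not the entries), e.g. after a warm-up frame.
    pub fn reset_stats(&mut self) {
        self.hits = 0;
        self.misses = 0;
    }
}
```

Resetting the counters after the first frame is what makes "100% steady-state" a checkable claim: the warm-up misses are excluded, and any miss after that is a regression.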

BarrierPlan was a slightly bigger lift, because it actually had to be wired into the framegraph executor. The plan code statically computes a barrier sequence from the pass dependency graph; the executor’s job is to invoke it at the right point in the encoder lifecycle. The first version replaces wgpu’s auto-tracking on the post chain only, leaving the rest of the graph on auto-tracking until I’m sure the hand-rolled plan agrees with the auto-tracker on every path. There’s a parity test that asserts cached bind groups match fresh builds, and another that checks the post-chain produces byte-identical output with and without the new path. Both green.

GpuProfiler going live is the one I should have done six months ago. The profiler had been measuring CPU time per subsystem since the F8 post. The GPU side was a placeholder slot in the HUD that always read --. The fix: build the profiler on RenderState when the adapter exposes TIMESTAMP_QUERY, bracket every scene pass via open_pass("name", encoder) / close(encoder), call resolve_previous_frame and arm_readback at submit, and feed the resolved spans into FrameProfiler::record_passes. The HUD’s gpu row shows real numbers now. They’re different from the CPU numbers in informative ways.

The texop_ms bucket (the one that used to lump every texture-engine operation into a single pile) got split into five: upload, framegraph, scene, post, upload_mesh. The lump version told you “GPU work was 3.2 ms.” The split version tells you which subsystem ate the budget. Useful when you’re trying to figure out why a frame spiked.

The clock that never ticked

lux-live::MasterClock is the thing that synchronises every beat-driven node in a patch: Metro, LFO, Pulse, Timeline, Sequencer, all of them. It supports LTC, MIDI Timecode, Ableton Link, tap tempo, and a system-clock fallback. It has an IIR phase filter for jitter rejection. It has a priority cascade that handles source loss gracefully. It had also, until this round, never been instantiated.

The wiring is four lines. Build it on LuxApp::new. Call step(now) every frame before the FrameContext gets built. Assign ctx.beat = clock.current_beat(). Beat-sync now actually beats. Rising-edge detection on a pulse node fires on the rising edge of the beat. BPM-unit LFOs run at the BPM the clock reports.
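Here is roughly what that per-frame wiring looks like, as a sketch under assumptions. Clock is a system-clock-only stand-in for MasterClock (no LTC, MTC, Link, or phase filter), and BeatEdge and this FrameContext are hypothetical types invented for the example. Only the step(now) / current_beat() / ctx.beat shape comes from the real code.

```rust
use std::time::Duration;

/// Hypothetical system-clock-only stand-in for MasterClock.
pub struct Clock {
    bpm: f64,
    last: Option<Duration>,
    beat: f64,
}

impl Clock {
    pub fn new(bpm: f64) -> Self {
        Self { bpm, last: None, beat: 0.0 }
    }

    /// Advances the beat position to `now` (time since app start).
    pub fn step(&mut self, now: Duration) {
        if let Some(last) = self.last {
            let dt = now.saturating_sub(last).as_secs_f64();
            self.beat += dt * self.bpm / 60.0;
        }
        self.last = Some(now);
    }

    pub fn current_beat(&self) -> f64 {
        self.beat
    }
}

/// Hypothetical per-frame context; the real one carries much more.
pub struct FrameContext {
    pub beat: f64,
}

/// Fires once each time the whole-beat index advances.
pub struct BeatEdge {
    last_index: Option<i64>,
}

impl BeatEdge {
    pub fn new() -> Self {
        Self { last_index: None }
    }

    pub fn rising(&mut self, beat: f64) -> bool {
        let index = beat.floor() as i64;
        let fired = matches!(self.last_index, Some(prev) if index > prev);
        self.last_index = Some(index);
        fired
    }
}
```

The ordering is the part that matters: step runs before the context is built, so every node evaluated in that frame reads the same beat value.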

Cue lists, presets, and transitions stack on top. ProjectDocument got a LiveState field that holds the cue list, preset bank, current transition, and the clock. The cue panel is mounted in the editor chrome. Space goes to the next cue; J and K go to the previous cue and reload the current one, respectively. The .lux file format gained presets, cues, and tempo_bpm fields with #[serde(default)] so existing projects load unchanged and quietly pick up the new defaults.

This is the work that turns Lux from “a creative coding environment” into “a creative coding environment you can run a live show from.” None of it is novel. All of it was structurally there months ago. It just wasn’t plugged in.

The welcome modal

There were eleven sample patches in the welcome modal. The first one, hello_mouse_circle, worked because it was the demo I built the welcome modal for. The other ten were defined, registered, and dispatched through a code path that quietly returned None for every id except the one I had tested.

The fix is one line: swap bundled_template_bytes(id) for lux_ui::samples::find(id).map(|s| s.bytes). Plus eleven integration tests, one per sample id, that load the patch, evaluate one frame, and assert the graph isn’t empty. Cards 2 through 11 now do what they were always supposed to do. Card 1 still works, which is at least consistent with prior behaviour.
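To make the failure mode concrete, here is a sketch of the two lookups side by side. Everything in it is an assumption made for illustration: the Sample struct layout, the table contents, every id other than hello_mouse_circle, and the match-based body of the old function (I'm only reproducing its observable behaviour, which was None for anything but the tested id).

```rust
/// Hypothetical sample record; the real lux_ui::samples type may differ.
pub struct Sample {
    pub id: &'static str,
    pub bytes: &'static [u8],
}

/// Hypothetical table. Only hello_mouse_circle is a real id.
pub static SAMPLES: &[Sample] = &[
    Sample { id: "hello_mouse_circle", bytes: b"patch: hello_mouse_circle" },
    Sample { id: "sample_two", bytes: b"patch: sample_two" },
    Sample { id: "sample_three", bytes: b"patch: sample_three" },
];

/// Table-driven lookup: every registered id resolves.
pub fn find(id: &str) -> Option<&'static Sample> {
    SAMPLES.iter().find(|s| s.id == id)
}

/// The old shape, reduced to its behaviour: one id works, the rest
/// silently return None.
pub fn bundled_template_bytes(id: &str) -> Option<&'static [u8]> {
    match id {
        "hello_mouse_circle" => Some(b"patch: hello_mouse_circle"),
        _ => None,
    }
}
```

A test that iterates the table, rather than naming ids by hand, is what would have caught this on day one: a sample that is registered but unresolvable fails the loop immediately.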

The recent_files list was being loaded into UserPrefs correctly and was never displayed. Same shape: WelcomeState::new had a with_recent_projects builder method that nothing called. Five recent files now show up at the bottom of the welcome modal.

The first-wire celebration animation was a different bug. The animation worked. It also re-fired every time the app started, because the gate condition was reading a session-only flag instead of the persistent prefs.has_completed_first_wire. New users get a celebration the first time they connect a wire. Returning users get peace.

Skinning, on the GPU, finally

The GPU skinning compute shader was correct end to end. The frame-ring ping-pong worked. The previous-frame SSBO retention worked. SkinnedMeshNode::process was already dispatching it. But the live consumer (RenderSceneNode::process) was still reading the CPU-fallback out: Mesh MeshHandle, because the schema flip from “MeshHandle” to “deformed-position SSBO” hadn’t landed yet.

The CPU fallback existed under a skin_cpu_fallback Cargo feature, default-on. Production was running it. The compute shader was running too. It just didn’t have a reader.

This round doesn’t finish the migration. It lands the parity test that proves the GPU output matches the CPU output to within 1e-4 per vertex on a 128-vertex synthetic mesh. It lands the SKINNED branch in the vertex-buffer-write shader that can read a deformed-position SSBO when wired. It lands the SkinningPipeline::dispatch call that replaces the skin_cpu invocation in SkinnedMeshNode. The schema flip itself (making DrawItem an enum with a Skinned variant) and the deletion of the CPU fallback live in a later post, because deleting skin_cpu_fallback has to land in the same commit as the consumer flip, and that consumer is part of a wider rewrite I’d rather land coherently.
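The comparison at the heart of a parity gate like this is small. A sketch, assuming the tolerance is applied per component (it could just as well be per-vertex distance) and assuming both paths have already been read back into plain position arrays. check_parity and synthetic_positions are hypothetical helpers, not names from the codebase.

```rust
pub type Vec3 = [f32; 3];

/// Hypothetical synthetic mesh: `n` vertices with small, distinct coordinates.
pub fn synthetic_positions(n: usize) -> Vec<Vec3> {
    (0..n)
        .map(|i| [i as f32 * 0.01, (i % 7) as f32 * 0.1, 1.0])
        .collect()
}

/// Checks that every component of every vertex agrees to within `tol`.
/// Reports the first offending vertex so a failure is debuggable.
pub fn check_parity(cpu: &[Vec3], gpu: &[Vec3], tol: f32) -> Result<(), String> {
    if cpu.len() != gpu.len() {
        return Err(format!("vertex count mismatch: {} vs {}", cpu.len(), gpu.len()));
    }
    for (i, (a, b)) in cpu.iter().zip(gpu).enumerate() {
        for c in 0..3 {
            let err = (a[c] - b[c]).abs();
            // Written as !(err <= tol) so a NaN on either side also fails.
            if !(err <= tol) {
                return Err(format!(
                    "vertex {i} component {c}: |{} - {}| = {err}",
                    a[c], b[c]
                ));
            }
        }
    }
    Ok(())
}
```

An absolute tolerance is reasonable for a small synthetic mesh near the origin. On large-coordinate meshes a relative tolerance would probably be the safer choice.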

For this round, the parity test is the gate. The GPU path is reachable. The deletion comes later.

The crash survivor

Panicking nodes have been logged-and-swallowed since zero-allocation eval. A panicking host was a different story: it would just go away, taking your unsaved patch with it. The fix is the standard one: chain panic::set_hook to the previous hook, snapshot the current ProjectDocument to ~/.cache/lux/crash-<iso>.lux, and return through the original hook so the user still sees the stack trace and the process still exits cleanly.
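The hook chaining itself uses only std::panic::take_hook and std::panic::set_hook. What follows is a minimal sketch, not the code in Lux: install_crash_hook is a hypothetical name, the document is stood in for by a shared byte buffer, and the filename uses Unix seconds where the real one uses an ISO timestamp.

```rust
use std::fs;
use std::panic;
use std::path::PathBuf;
use std::sync::{Arc, Mutex, TryLockError};
use std::time::{SystemTime, UNIX_EPOCH};

/// Installs a panic hook that writes the latest serialized document into
/// `dir`, then defers to whatever hook was installed before it.
pub fn install_crash_hook(dir: PathBuf, snapshot: Arc<Mutex<Vec<u8>>>) {
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        // try_lock, not lock: if the panicking thread holds the mutex,
        // blocking here would hang the process instead of exiting.
        let bytes = match snapshot.try_lock() {
            Ok(guard) => Some(guard.to_vec()),
            Err(TryLockError::Poisoned(p)) => Some(p.into_inner().to_vec()),
            Err(TryLockError::WouldBlock) => None,
        };
        // Best effort only: errors are ignored so the hook itself never panics.
        if let Some(bytes) = bytes {
            if fs::create_dir_all(&dir).is_ok() {
                let stamp = SystemTime::now()
                    .duration_since(UNIX_EPOCH)
                    .map(|d| d.as_secs())
                    .unwrap_or(0);
                let _ = fs::write(dir.join(format!("crash-{stamp}.lux")), &bytes);
            }
        }
        // Chain to the original hook so the usual panic message still prints.
        previous(info);
    }));
}
```

One design consequence of this sketch: the hook can only save what is already in the shared buffer, so something has to keep that buffer reasonably fresh. Serializing inside the hook avoids that, at the cost of running more code in a process that is already in a bad state.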

On the next launch, EditorChrome::new scans the crash directory, and if anything’s there, shows a recovery banner offering to open the most recent snapshot. The banner has a “delete all” button that I added because development sessions on the panic-recovery system specifically produce a lot of crash files, and I got tired of cleaning them up by hand. (This turned out to be useful for normal users too.)

Autosave isn’t part of the crash snapshot. That’s a different feature and it’s been there since the trust pass. Crash recovery is the second line: even if autosave hasn’t run yet, the panic snapshot catches the state at the moment of the crash. Between the two of them I’ve stopped Ctrl-S’ing every twenty seconds out of habit.

A few small ones

Browser search. The browser.rs panel got a single-line text edit at the top that delegates to the existing search::score_entry and trigram_index infrastructure. The infrastructure was there. It didn’t have a UI surface. It does now.

Zoom-to-fit. H computes the AABB over the canvas’s node positions and sets pan and zoom. The zoom_to_fit calculation was already implemented for the SplitView mode; binding it to a key was four lines.
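For reference, the calculation amounts to something like the sketch below. The View struct, the screen = world * zoom + pan convention, the margin parameter, and the zoom clamp range are all assumptions made for this example; the real zoom_to_fit may differ on any of them.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct View {
    pub pan: [f32; 2],
    pub zoom: f32,
}

// Hypothetical clamp range so one node does not zoom to infinity.
const MIN_ZOOM: f32 = 0.1;
const MAX_ZOOM: f32 = 4.0;

/// Fits the AABB of `positions` into `viewport` (pixels) with `margin`
/// pixels of padding per side. Assumes screen = world * zoom + pan.
pub fn zoom_to_fit(positions: &[[f32; 2]], viewport: [f32; 2], margin: f32) -> Option<View> {
    let first = *positions.first()?;
    let (mut min, mut max) = (first, first);
    for p in positions {
        for axis in 0..2 {
            min[axis] = min[axis].min(p[axis]);
            max[axis] = max[axis].max(p[axis]);
        }
    }

    let size = [max[0] - min[0], max[1] - min[1]];
    let avail = [
        (viewport[0] - 2.0 * margin).max(1.0),
        (viewport[1] - 2.0 * margin).max(1.0),
    ];
    // A zero extent (one node, or nodes in a line) puts no limit on that axis.
    let zoom_x = if size[0] > 0.0 { avail[0] / size[0] } else { f32::INFINITY };
    let zoom_y = if size[1] > 0.0 { avail[1] / size[1] } else { f32::INFINITY };
    let zoom = zoom_x.min(zoom_y).clamp(MIN_ZOOM, MAX_ZOOM);

    // Map the centre of the box to the centre of the viewport.
    let center = [(min[0] + max[0]) * 0.5, (min[1] + max[1]) * 0.5];
    let pan = [
        viewport[0] * 0.5 - center[0] * zoom,
        viewport[1] * 0.5 - center[1] * zoom,
    ];
    Some(View { pan, zoom })
}
```

Returning None for an empty canvas leaves the caller to decide what H does when there is nothing to fit. Leaving the view untouched seems like the least surprising option.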

Panel shadows. The SHADOW_MED epaint::Shadow token has been in theme.rs since the editor polish round and had zero consumers. It now sits behind the inspector, palette, browser, and welcome frames, which is the entire population of floating panels. The visual difference is subtle. You don’t notice it. You’d notice if it was missing.

Inter shipped. The font binary lives at app/assets/fonts/Inter-Variable.ttf and registers at FontFamily::Proportional position 0. The len() * 6.0 heuristic that was guessing text widths in node_widget.rs is now text_metrics::mono_width, which uses the actual font metrics. Node titles no longer overflow their cards on long names.

The registry. The registry-building code in lux-app/src/lib.rs was 200 lines of hand-rolled registry.register::<NodeT>() calls, one per node type, in alphabetical order. Adding a new plugin meant editing this file. Retired via inventory. Every #[lux_node] now emits a factory spec at compile time, and build_registry is a four-line for spec in inventory::iter::<NodeSpec>() { registry.register(spec); }. Plugins migrate one at a time; the hand-rolled fallback sticks around until the last one moves.

Real IES vendor file. A photometric file from Erco shipped under lux-scene-light/tests/fixtures/. The IES parser had been tested against synthetic inputs only. It had never seen a real vendor file. It does now, and it parses correctly, which is to say I got lucky.

Spread proptest. A proptest suite for spread operations went in. Take, Cons, Zip, Cross, Distinct over arbitrary SpreadValue::F64 / Vec3 / Color inputs, asserting structural invariants. Nothing failed, which is either evidence of correctness or evidence that my generators aren’t mean enough. We’ll see.

What this unblocks

Most of these aren’t the kind of thing you’d notice individually. The ones that are visible (welcome modal actually working, F8 HUD showing real GPU numbers, beat-sync being a feature you can use) were always supposed to be visible. The library was correct; the wire was missing. There are no new capabilities here. There’s a lot of “the capability you thought we had, we now actually have.”

The reason this had to land before the next several posts is that every one of them depends on infrastructure from this round. The next post is about generalising the framegraph’s resource set from one variant to seven, and that depends on the framegraph executor actually being instrumented by GpuProfiler so I can prove the new variants don’t regress. The post after that is about unifying every shade path onto a single PBR module, which depends on the live render path having BindGroupCache adoption so the unification doesn’t add per-frame allocations. The shadow post depends on LightStore::set_shadow_desc actually being routed before Render3D dispatches.

Everything from here forward is wiring against a hot path that finally lives up to the library it has access to.

I have no idea what I’m doing or if any of this is right, but it’s fun. Follow along.