Carving the Core

A demo and an instrument are different things. A demo has to work once, on my machine, while I hold it still. An instrument has to hold up in your hands, in a live set, at 120fps, while you push it in ways I never tried. The distance between those two is mostly cleanup, and this post is a chunk of that distance.

Deleting the legacy renderer closed the meshlet contract: there is one mesh renderer now. But a rewrite that size leaves a wake: a year of render_3d::execute deleted, a bindless arm grown in its place, and shadows, skinning, and instancing all re-landed. Files that two workstreams both piled work into. Caches that got built and then half-adopted. A framegraph that quietly grew a second way to run. None of it shows up in a screenshot. All of it shows up the first time you lean on the tool.

One renderer, one framegraph, actual shadows

With the legacy renderer gone, Render3D is a single thing. Its framegraph should be too, and it wasn’t.

Shadow cascade passes, the cluster-light bin pass, and the main mesh raster were being assembled and run as separate concerns. And the shadow side had a real problem: on the unified path, it was not producing pixels at all. The shadow orchestrator post described ShadowOrchestrator as the thing that owns shadow dispatch, and it did own the bookkeeping, the cascade-snap history and per-frame matrices, all correct. But the passes it handed the framegraph were stubs: built, compiled, dropped, zero pixels out. The cascade raster that post described was, at the framegraph level, an empty pass with a confident name. Put a shadow-casting light in a scene on that path and the shadow simply would not be there.

This commit folds the real work into one graph. Shadow cull and depth-only raster, cluster-light binning, and the mesh raster are all registered into a single Render3D framegraph, compiled once, and executed once through run_with_dispatch, the async-compute path. One compile(), one executor call. You get two things from that. The first is shadows that are actually there. The second is that every QueueAffinity::Compute tag in the graph now does real overlap on the live path instead of only in a test, and that overlap is the headroom that keeps your frame budget intact when the scene gets heavy.
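
To make "one graph, one compile, one execute" concrete, here is a toy sketch. It is not the Lux framegraph: FrameGraph, Pass, add_pass, the string resource names, and the Graphics variant are all hypothetical, and only the QueueAffinity::Compute tag and the compile-once, execute-once shape come from the description above. The real path runs GPU work through run_with_dispatch; this version only orders passes by what they read and write and reports which queue each one is tagged for.

```rust
use std::collections::{HashMap, HashSet};

/// Queue preference for a pass. The name mirrors the QueueAffinity tag in
/// the post; the exact variants here are an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum QueueAffinity {
    Graphics,
    Compute,
}

/// Hypothetical pass description: a name, a queue tag, and the resources
/// the pass reads and writes (identified by plain strings in this sketch).
pub struct Pass {
    pub name: &'static str,
    pub affinity: QueueAffinity,
    pub reads: Vec<&'static str>,
    pub writes: Vec<&'static str>,
}

#[derive(Default)]
pub struct FrameGraph {
    passes: Vec<Pass>,
}

impl FrameGraph {
    /// Register a pass. Registration order does not have to be run order.
    pub fn add_pass(&mut self, pass: Pass) {
        self.passes.push(pass);
    }

    /// Compile once: order passes so each runs after the pass that writes
    /// what it reads. Returns None if the dependencies form a cycle.
    pub fn compile(&self) -> Option<Vec<usize>> {
        // Map each resource to the pass that writes it (last writer wins).
        let mut writer: HashMap<&str, usize> = HashMap::new();
        for (i, p) in self.passes.iter().enumerate() {
            for w in &p.writes {
                writer.insert(*w, i);
            }
        }
        let mut order = Vec::with_capacity(self.passes.len());
        let mut done: HashSet<usize> = HashSet::new();
        // Sweep until every pass is scheduled or no progress is possible.
        while order.len() < self.passes.len() {
            let before = order.len();
            for (i, p) in self.passes.iter().enumerate() {
                if done.contains(&i) {
                    continue;
                }
                let ready = p.reads.iter().all(|r| match writer.get(r) {
                    Some(w) => done.contains(w) || *w == i,
                    None => true, // no writer in this graph: external input
                });
                if ready {
                    done.insert(i);
                    order.push(i);
                }
            }
            if order.len() == before {
                return None; // nothing became ready: dependency cycle
            }
        }
        Some(order)
    }

    /// Execute once: walk the compiled order and report (pass, queue).
    pub fn execute(&self, order: &[usize]) -> Vec<(&'static str, QueueAffinity)> {
        order
            .iter()
            .map(|&i| (self.passes[i].name, self.passes[i].affinity))
            .collect()
    }
}
```

Registering a shadow cull, a shadow raster, a cluster-light bin, and a mesh raster into one FrameGraph and calling compile() then execute() yields a single ordering in which the mesh raster runs after everything it depends on. A stubbed pass in this model would be one that is registered but whose writes never reach a reader, which is roughly the failure the orchestrator had.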

This is the kind of fix that has no screenshot and changes the picture completely. Before it, your shadows were a promise the framegraph was quietly not keeping.

The caches finally get hit

Built, Not Adopted got scene_bloom to a 100% steady-state bind-group cache hit rate and noted the rest of the post chain was “close behind.” Close behind is not done. There were still thirty-plus places on the hot path calling device.create_bind_group every single frame, each rebuilding a bind group that had not changed since last frame, or the frame before, or ever.

Here is why that one reaches you. A bind group rebuilt every frame is an allocation every frame, and an allocation every frame is the exact shape of a hitch: fine in a benchmark, fine for the first minute, and then a stutter right when the set gets busy and you least want one. All thirty-plus sites route through BindGroupCache::get_or_create now, and per-frame create_bind_group on the hot path is a review-block. The zero-GPU-allocations post claimed this was handled. It was handled in the places I had looked. It is handled in the rest of them now, which is the whole difference between a steady framerate in a demo and a steady framerate in your hands.
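
For a feel of what routing a site through the cache means, here is a minimal sketch. Only the names BindGroupCache and get_or_create come from the engine; the generic value type, the BindGroupKey fields, and the hit and miss counters are assumptions made so the example runs without a GPU. In the real thing the closure would be the device.create_bind_group call.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical cache key: whatever uniquely identifies the bind group's
/// contents. These three fields are illustrative, not the engine's key.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct BindGroupKey {
    pub layout_id: u32,
    pub texture_id: u32,
    pub sampler_id: u32,
}

/// Generic stand-in for the bind-group cache. V would be a GPU bind group
/// in the engine; here it can be any type.
pub struct BindGroupCache<K, V> {
    entries: HashMap<K, V>,
    hits: u64,
    misses: u64,
}

impl<K: Eq + Hash, V> BindGroupCache<K, V> {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Return the cached value for `key`. `create` runs only on a miss,
    /// so an unchanged bind group is built once, not once per frame.
    pub fn get_or_create(&mut self, key: K, create: impl FnOnce() -> V) -> &V {
        match self.entries.entry(key) {
            Entry::Occupied(e) => {
                self.hits += 1;
                e.into_mut()
            }
            Entry::Vacant(e) => {
                self.misses += 1;
                e.insert(create())
            }
        }
    }

    pub fn hits(&self) -> u64 {
        self.hits
    }

    pub fn misses(&self) -> u64 {
        self.misses
    }
}
```

Calling get_or_create with the same key on every frame runs the closure on the first frame only; every later frame is a lookup. A sketch like this never evicts, so a real cache also needs a policy for keys whose underlying textures or buffers get destroyed.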

Carving the god-files

The other half is structural, and it comes with a number: forty-six.

That is how many files in the workspace had grown past 800 lines, the budget the texture engine carve set when it broke a 2,688-line god-object into fourteen focused submodules. The budget was real for one directory. It was never a workspace rule, and while nobody was enforcing it, forty-six files drifted over the line.

You never open these files, so why is it your problem? Because the size of a file is the speed of a fix. A bug in a 200-line module gets found and fixed in an afternoon and shipped to you that week. The same bug in a 3,000-line god-object hides for a month and ships as a regression. The carve is the difference between the feature you asked for arriving soon and arriving eventually.

It also finds bugs on its own. The first file carved was framegraph_bridge.rs: 3,466 lines, the adapter that turns an abstract framegraph resource into a real wgpu allocation, split into eight modules. The split immediately turned up a latent bug, the way splitting always does. Two framegraph passes had been handed the same ordinal, an ID collision nothing had caught because nothing could see it in the wall of code. Filed and fixed. That is a glitch you will now never hit, found only because the file got small enough to read.
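
The collision itself was found by reading, not by a tool, but it is the kind of thing a small check can keep from coming back. A minimal sketch, assuming each pass can be listed as a (name, ordinal) pair; the function name and the pass names used with it are hypothetical, and this is not necessarily how the fix was made.

```rust
use std::collections::HashMap;

/// Return every ordinal claimed by more than one pass, with the names of
/// the passes that share it, sorted by ordinal.
pub fn find_ordinal_collisions<'a>(passes: &[(&'a str, u32)]) -> Vec<(u32, Vec<&'a str>)> {
    // Group pass names by the ordinal they were assigned.
    let mut by_ordinal: HashMap<u32, Vec<&'a str>> = HashMap::new();
    for &(name, ordinal) in passes {
        by_ordinal.entry(ordinal).or_default().push(name);
    }
    // Keep only ordinals with more than one owner.
    let mut collisions: Vec<(u32, Vec<&'a str>)> = by_ordinal
        .into_iter()
        .filter(|(_, names)| names.len() > 1)
        .collect();
    collisions.sort_by_key(|(ordinal, _)| *ordinal);
    collisions
}
```

Run at graph-build time or in a unit test, an empty result means every pass has its own ordinal.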

The sweep then moved into lux-core, and this is the part still in flight as I write. The graph engine rewrite (SlotMap, PinId, Arc-backed spreads) had piled work into graph.rs, eval.rs, pin.rs, context.rs, spread.rs; the bindless and scene work into scene.rs, draw.rs, mesh_builders.rs, texture.rs. Twelve are carved so far, each broken into focused modules behind an unchanged public surface, so nothing a plugin author leans on moves an inch. The rest are queued, and the sweep ends the only way a cleanup should: with a CI gate that fails the build if any file crosses 800 lines, so the budget stops being a thing I have to remember and becomes a thing the build remembers for me. I am reliably better at writing a script than at remembering a rule.
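
The gate does not need to be clever. Here is a minimal sketch of the check, assuming it only looks at .rs files and counts raw lines; the function name is hypothetical, and a real gate would probably also skip build output directories and exit non-zero when the list is non-empty.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// The 800-line budget from the post.
pub const MAX_LINES: usize = 800;

/// Recursively collect every `.rs` file under `root` with more than
/// `max_lines` lines, as (path, line count), sorted by path.
pub fn files_over_budget(root: &Path, max_lines: usize) -> io::Result<Vec<(PathBuf, usize)>> {
    let mut offenders = Vec::new();
    // Explicit stack instead of recursion, so deep trees are not a concern.
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else if path.extension().map_or(false, |ext| ext == "rs") {
                let lines = fs::read_to_string(&path)?.lines().count();
                if lines > max_lines {
                    offenders.push((path, lines));
                }
            }
        }
    }
    offenders.sort();
    Ok(offenders)
}
```

A file with exactly 800 lines passes; 801 fails. The CI step is then just: call files_over_budget on the workspace root, print whatever comes back, and fail the build if it is not empty.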

Why carve after, not during

A quick note on craft, because it is the reason the boundaries are worth anything. You do not carve a file while you are still rewriting what is in it. Breaking Up LuxApp made the forward version of this point: that decomposition was the prerequisite for the graph engine rewrite, done first, so the rewrite had somewhere to land. This post is the mirror. The files a rewrite grows get carved after, once the design has stopped moving, so the seams you cut are the real ones and not a guess at where the code is about to settle.

graph.rs is a different shape after the SlotMap arena than before it. scene.rs is a different shape after DrawItem became an enum. Carving them mid-rewrite would have drawn module lines through code that was about to move, and you would feel that later as the dumb friction of a fix that has to touch six files because the boundaries landed in the wrong place. Carving now draws the lines where the seams actually are, which keeps the next change cheap, which keeps the next feature fast.

What this leaves you with

The meshlet and bindless arc is closed. The legacy renderer is gone. The framegraph runs as one graph and the shadows in it are real. The hot path stopped leaking allocations. lux-core is most of the way through a carve that ends with a gate to keep it carved.

There is still a construction site here. The audit lists work these posts have not reached: a shade-path stub that does not yet import the unified PBR module, a GPU-particle path that is the one compute workstream still un-wired, a real-hardware bench so the async-compute payoff stops being a claim and becomes a number. The site moved. It did not close.

But the renderer is one renderer, the shadows land where you put them, the framerate holds when you push it, and the engine underneath is getting tidy enough that the next thing you ask for arrives faster than the last. That is what a cleanup buys, and it is a good place to stop a post.


Still building this in the open. Follow along.
