Deleting the Legacy Renderer

When you light a scene in Lux you are trusting two things you never stop to think about: that what you see is physically right, and that it stays right and stays fast once you fill the scene with skinned characters, ten thousand instances, and a forest of shadows. For most of the last year, both of those depended on which of two renderers happened to draw your frame. This post gets that down to one renderer, and it is the fast one.

Lux Goes 3D, a good many posts ago, introduced render_3d: “the new render pass.” It was small then. Clear the targets, loop the draw items, one indexed draw call each. Then it grew. Materials, lights, shadows, normal maps, IBL, instancing. Every 3D post added a little. By the time the bindless arm went live next to it, render_3d::execute was a 1,466-line function with a per-mesh bind-group loop beating away at its core, and it was the slow one. The fast bindless path could only draw static meshes. Anything interesting in your scene fell back to the year-old function.

This post deletes that function. But you cannot delete the renderer everything else is measured against until the fast path can stand on its own, and “stand on its own” means doing everything the slow one did: cascaded shadows, skinned characters, instanced grids. So most of this post is the parity sweep that gets the fast lane to feature-complete, and the deletion is the short, satisfying paragraph at the end that the sweep earns.

The parity sweep, or: getting everything into the fast lane

Two renderers means a fork in the road on every frame, and the old branch was the slow, per-mesh one. The sweep’s job was to make the fork unnecessary by teaching the bindless arm every trick the legacy path knew, one shader permutation at a time.
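To make "one shader permutation at a time" concrete, here is a minimal sketch of a permutation key. It assumes variants are selected with a bitset and handed to a shader preprocessor as defines. The flag names DEPTH_ONLY and IS_SKINNED come from this post; PermutationKey, defines, and all_permutations are hypothetical. WGSL itself has no #define, so the define mechanism is an assumption (override constants are the standard WGSL alternative).

```rust
/// Shader-permutation key as a bitset. DEPTH_ONLY and IS_SKINNED are the
/// permutation names from the post; this type is a hypothetical sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PermutationKey(u32);

impl PermutationKey {
    pub const NONE: PermutationKey = PermutationKey(0);
    pub const DEPTH_ONLY: PermutationKey = PermutationKey(1 << 0);
    pub const IS_SKINNED: PermutationKey = PermutationKey(1 << 1);

    /// Flag/name pairs used to emit one define per set bit.
    const NAMES: [(PermutationKey, &'static str); 2] = [
        (Self::DEPTH_ONLY, "DEPTH_ONLY"),
        (Self::IS_SKINNED, "IS_SKINNED"),
    ];

    /// Combine two keys (e.g. DEPTH_ONLY with IS_SKINNED for skinned shadow casters).
    pub fn with(self, other: PermutationKey) -> PermutationKey {
        PermutationKey(self.0 | other.0)
    }

    /// True if every bit of `other` is set in `self`.
    pub fn contains(self, other: PermutationKey) -> bool {
        self.0 & other.0 == other.0
    }

    /// Hypothetical defines to prepend to the shader source for this variant.
    pub fn defines(self) -> Vec<&'static str> {
        Self::NAMES
            .iter()
            .filter(|(flag, _)| self.contains(*flag))
            .map(|(_, name)| *name)
            .collect()
    }
}

/// Every combination of flags; each would be compiled into its own pipeline.
pub fn all_permutations() -> Vec<PermutationKey> {
    let n = PermutationKey::NAMES.len() as u32;
    (0..(1u32 << n)).map(PermutationKey).collect()
}
```

Two flags give four variants (plain, depth-only, skinned, skinned depth-only). Compiling each one separately is the usual reason permutations are preferred over runtime branches inside the shader.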

Shadows. A BindlessShadowPipeline with a DEPTH_ONLY shader permutation and multi-cascade indirect output slots, so each CSM cascade gets its own meshlet cull and depth-only raster. The shadow orchestrator drives the cascade passes; the bindless arm rasterises them. Your shadows now ride the fast path instead of dragging the frame onto the slow one.
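A rough model of "multi-cascade indirect output slots" follows. It assumes a single indirect-args buffer carved into a fixed-size region per cascade, with each cascade's cull appending survivors to its own region. Everything here is a hypothetical CPU stand-in. The real cull works per meshlet on the GPU and tests against each cascade's light-space frustum. To keep the sketch short, it instead culls whole meshes against view-depth split intervals. The record layout follows WebGPU's drawIndexedIndirect.

```rust
use std::mem::size_of;

/// Same field order as a WebGPU drawIndexedIndirect record (five 32-bit words).
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct DrawIndexedIndirectArgs {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub first_instance: u32,
}

/// Hypothetical cull input: a mesh's bounding sphere, reduced to view depth.
#[derive(Clone, Copy)]
pub struct MeshBounds {
    pub depth: f32,
    pub radius: f32,
    pub index_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
}

/// One indirect-args buffer split into fixed-size regions, one per cascade.
pub struct CascadeSlots {
    max_draws: usize,
    pub args: Vec<DrawIndexedIndirectArgs>,
    pub counts: Vec<u32>,
}

impl CascadeSlots {
    pub fn new(cascades: usize, max_draws: usize) -> Self {
        Self {
            max_draws,
            args: vec![DrawIndexedIndirectArgs::default(); cascades * max_draws],
            counts: vec![0; cascades],
        }
    }

    /// Byte offset of a cascade's region, where its indirect draw starts reading.
    pub fn byte_offset(&self, cascade: usize) -> usize {
        cascade * self.max_draws * size_of::<DrawIndexedIndirectArgs>()
    }

    /// Append a surviving draw (on the GPU this would be an atomic add on counts).
    /// Overflow is silently dropped here; a real pass must size for it or report it.
    fn emit(&mut self, cascade: usize, cmd: DrawIndexedIndirectArgs) {
        let n = self.counts[cascade] as usize;
        if n < self.max_draws {
            self.args[cascade * self.max_draws + n] = cmd;
            self.counts[cascade] += 1;
        }
    }
}

/// Cascade c covers view depths [splits[c], splits[c + 1]).
pub fn cull_into_cascades(meshes: &[MeshBounds], splits: &[f32], slots: &mut CascadeSlots) {
    for c in 0..splits.len().saturating_sub(1) {
        let (near, far) = (splits[c], splits[c + 1]);
        for m in meshes {
            // A sphere straddling a split lands in both cascades.
            if m.depth - m.radius < far && m.depth + m.radius > near {
                slots.emit(c, DrawIndexedIndirectArgs {
                    index_count: m.index_count,
                    instance_count: 1,
                    first_index: m.first_index,
                    base_vertex: m.base_vertex,
                    first_instance: 0,
                });
            }
        }
    }
}
```

Each cascade's depth-only pass then reads only its own region, starting at byte_offset(c) and covering counts[c] records.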

Skinned characters. A BindlessSkinnedMeshPipeline with an IS_SKINNED permutation, fed by a UnifiedDeformedBuffer: one buffer that every skinned draw scatters its compute-skinned vertices into, so a stage full of dancing characters is still a single indirect multi-draw rather than one draw call each. Because skinned things cast shadows, there is a DEPTH_ONLY times IS_SKINNED permutation for skinned shadow casters too. This is where the skinning closure and the bindless path finally meet: a DrawItem::Skinned flows all the way to a bindless indirect draw.
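Here is a minimal sketch of the unified-buffer idea. It assumes every skinned draw gets a contiguous range in one shared vertex buffer, and that its indirect record points at that range through base_vertex. UnifiedDeformedBuffer is the post's name, but this struct, SkinnedDraw, DrawArgs, and the scatter method are hypothetical. The scatter method stands in for the compute-skinning dispatch.

```rust
/// Hypothetical deformed-vertex layout written by the skinning compute pass.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DeformedVertex {
    pub position: [f32; 3],
    pub normal: [f32; 3],
}

/// One skinned draw before allocation.
pub struct SkinnedDraw {
    pub vertex_count: u32,
    pub index_count: u32,
    pub first_index: u32,
}

/// drawIndexedIndirect-style record; base_vertex points into the shared buffer.
#[derive(Debug, PartialEq)]
pub struct DrawArgs {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub first_instance: u32,
}

pub struct UnifiedDeformedBuffer {
    pub vertices: Vec<DeformedVertex>,
    ranges: Vec<(u32, u32)>, // (start, count) per draw
}

impl UnifiedDeformedBuffer {
    /// Pack every skinned draw back to back and build one indirect record per draw.
    pub fn build(draws: &[SkinnedDraw]) -> (Self, Vec<DrawArgs>) {
        let mut offset = 0u32;
        let mut ranges = Vec::with_capacity(draws.len());
        let mut args = Vec::with_capacity(draws.len());
        for d in draws {
            ranges.push((offset, d.vertex_count));
            args.push(DrawArgs {
                index_count: d.index_count,
                instance_count: 1,
                first_index: d.first_index,
                base_vertex: offset as i32, // this draw's slice of the shared buffer
                first_instance: 0,
            });
            offset += d.vertex_count;
        }
        let empty = DeformedVertex { position: [0.0; 3], normal: [0.0; 3] };
        (Self { vertices: vec![empty; offset as usize], ranges }, args)
    }

    /// Stand-in for compute skinning: draw `draw` writes only into its own range.
    pub fn scatter(&mut self, draw: usize, skinned: &[DeformedVertex]) {
        let (start, count) = self.ranges[draw];
        let (start, count) = (start as usize, count as usize);
        assert_eq!(skinned.len(), count, "skinned output must match the allocated range");
        self.vertices[start..start + count].copy_from_slice(skinned);
    }
}
```

Because every draw's vertices live in one buffer behind one binding, all of the records can be issued as a single indirect multi-draw. That is the property the skinned path relies on.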

Instancing. GPU instancing in the bindless arm, so the ten-thousand-cube trick survives the move and you can still throw a stadium of geometry at the screen for the cost of one draw.
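For the ten-thousand-cube case, the CPU side of instancing looks roughly like this. The sketch assumes per-instance transforms live in a storage buffer that the vertex shader indexes with @builtin(instance_index). A 100 × 100 grid produces 10,000 instance records but only one indirect command. InstanceData and cube_grid are hypothetical names.

```rust
/// Hypothetical per-instance record; translation.w is padding to 16 bytes.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct InstanceData {
    pub translation: [f32; 4],
}

/// Same field order as a WebGPU drawIndexedIndirect record.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DrawIndexedIndirectArgs {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub first_instance: u32,
}

/// A side x side grid of cubes: one instance record per cube, and a single
/// indirect command whose instance_count covers all of them.
pub fn cube_grid(side: u32, spacing: f32) -> (Vec<InstanceData>, DrawIndexedIndirectArgs) {
    const CUBE_INDEX_COUNT: u32 = 36; // 6 faces x 2 triangles x 3 indices
    let mut instances = Vec::with_capacity((side * side) as usize);
    for z in 0..side {
        for x in 0..side {
            instances.push(InstanceData {
                translation: [x as f32 * spacing, 0.0, z as f32 * spacing, 0.0],
            });
        }
    }
    let cmd = DrawIndexedIndirectArgs {
        index_count: CUBE_INDEX_COUNT,
        instance_count: instances.len() as u32,
        first_index: 0,
        base_vertex: 0,
        first_instance: 0,
    };
    (instances, cmd)
}
```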

The leftovers. Frame zero has no Hi-Z pyramid yet (there is no previous frame to build one from), so the first frame dispatches through a depth stub instead of occlusion-culling against garbage. The legacy Blinn-Phong material got mapped onto the unified PBR module instead of carrying its own private shading maths, which is one fewer way for a material to look different depending on how it was drawn. And a CPU/GPU std430 layout drift in the material struct got fixed, a sentence that hides about an hour of staring at hex.
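The std430 drift is the kind of bug a compile-time layout check catches. Below is a sketch with a hypothetical material struct; the field list is illustrative, not Lux's actual material. It relies on WGSL's storage layout rules: vec3<f32> is 16-byte aligned but only 12 bytes long, while a Rust [f32; 3] is 4-byte aligned. Put emissive directly after a lone f32 and the CPU writes it at offset 4 while the GPU reads it at offset 16.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical CPU mirror of a WGSL storage struct:
///   struct Material {
///     base_color: vec4<f32>, emissive: vec3<f32>, metallic: f32,
///     roughness: f32, base_color_tex: u32, normal_tex: u32,
///   }
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
pub struct GpuMaterial {
    pub base_color: [f32; 4], // vec4<f32>, offset 0
    pub emissive: [f32; 3],   // vec3<f32>, offset 16 (align 16, size 12)
    pub metallic: f32,        // packs into the vec3's trailing 4 bytes, offset 28
    pub roughness: f32,       // offset 32
    pub base_color_tex: u32,  // bindless texture index, offset 36
    pub normal_tex: u32,      // offset 40
    pub _pad: u32,            // WGSL rounds the struct size up to 48 (align 16)
}

// Compile-time layout gate: any CPU-side reorder or retype fails the build
// instead of letting the GPU read shifted bytes.
const _: () = {
    assert!(offset_of!(GpuMaterial, base_color) == 0);
    assert!(offset_of!(GpuMaterial, emissive) == 16);
    assert!(offset_of!(GpuMaterial, metallic) == 28);
    assert!(offset_of!(GpuMaterial, roughness) == 32);
    assert!(offset_of!(GpuMaterial, base_color_tex) == 36);
    assert!(offset_of!(GpuMaterial, normal_tex) == 40);
    assert!(size_of::<GpuMaterial>() == 48);
};
```

The offsets still have to be kept in sync with the WGSL declaration by hand. The gate only guarantees that the Rust side cannot silently move away from the numbers written down.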

There was one genuine blocker. The bindless arm’s vertex pulling (how the vertex shader fetches position, normal, UV, and tangent out of the pooled buffers) was wrong in a way the old placeholder shader had been generously hiding. It got a full redesign and an end-to-end functional gate, because “the indirect draw executes” and “the indirect draw executes and the vertices are where they are supposed to be” are very different claims, and only the second one puts the right picture on your screen.
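"The vertices are where they are supposed to be" can be turned into a test: a CPU mirror of the vertex-pulling arithmetic, plus a gate that checks every index pulls back exactly the vertex that was uploaded. This is a hypothetical sketch. It assumes an interleaved pool of f32 words (position 3, normal 3, UV 2, tangent 4) addressed as (base_vertex + index) × stride, which matches how an indexed draw's vertex_index already includes base_vertex. Lux's real pool layout is not described in this post.

```rust
/// Hypothetical interleaved layout, in f32 words per vertex.
pub const STRIDE: usize = 12;
const POS: usize = 0;
const NRM: usize = 3;
const UV: usize = 6;
const TAN: usize = 8;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vertex {
    pub position: [f32; 3],
    pub normal: [f32; 3],
    pub uv: [f32; 2],
    pub tangent: [f32; 4],
}

/// Append a mesh to the shared pool and return its base_vertex.
pub fn upload(pool: &mut Vec<f32>, mesh: &[Vertex]) -> u32 {
    let base = (pool.len() / STRIDE) as u32;
    for v in mesh {
        pool.extend_from_slice(&v.position);
        pool.extend_from_slice(&v.normal);
        pool.extend_from_slice(&v.uv);
        pool.extend_from_slice(&v.tangent);
    }
    base
}

/// CPU mirror of the shader's fetch; None for an out-of-range read.
pub fn pull(pool: &[f32], vertex_index: u32) -> Option<Vertex> {
    let base = vertex_index as usize * STRIDE;
    let w = pool.get(base..base + STRIDE)?;
    Some(Vertex {
        position: [w[POS], w[POS + 1], w[POS + 2]],
        normal: [w[NRM], w[NRM + 1], w[NRM + 2]],
        uv: [w[UV], w[UV + 1]],
        tangent: [w[TAN], w[TAN + 1], w[TAN + 2], w[TAN + 3]],
    })
}

/// End-to-end gate: every index must pull back exactly the uploaded vertex.
pub fn gate(pool: &[f32], base_vertex: u32, mesh: &[Vertex], indices: &[u32]) -> bool {
    indices.iter().all(|&i| match mesh.get(i as usize) {
        Some(v) => pull(pool, base_vertex + i) == Some(*v),
        None => false, // an index past the mesh is itself a failure
    })
}
```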

By the end of the sweep the bindless arm does everything the 1,466-line function did. The shader that was 778 lines back in the meshlet post is past a thousand now. Shadows and skinning cost real WGSL. They also now cost it once, on the fast path, for everyone.

The deletion

render_3d::execute is deleted. The per-mesh make_bg3 dedup loop goes with it. render_3d.rs is 184 lines now, holding execute_scene_buffer, the non-bindless indirect path that low-tier hardware falls through to, and nothing else. CI grep gates keep the dead names (make_bg3, MaterialTexHandles, the legacy execute) pinned at zero hits, so the slow path cannot quietly grow back.
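A grep gate can be a few lines of Rust run in CI. This sketch assumes the gate walks the source tree for .rs and .wgsl files and counts whole-identifier matches, so that banning render_3d::execute does not also flag execute_scene_buffer. The function names are hypothetical. A real gate would also need to skip its own source file, since that file contains the banned names.

```rust
use std::fs;
use std::io;
use std::path::Path;

fn is_ident(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Count matches of `name` not embedded in a longer identifier.
pub fn count_word(text: &str, name: &str) -> usize {
    text.match_indices(name)
        .filter(|(i, _)| {
            let before = text[..*i].chars().next_back();
            let after = text[*i + name.len()..].chars().next();
            !before.map_or(false, is_ident) && !after.map_or(false, is_ident)
        })
        .count()
}

/// Recursively scan `dir`, recording (file, banned name, hit count) for every hit.
pub fn scan(dir: &Path, banned: &[&str], hits: &mut Vec<(String, String, usize)>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            scan(&path, banned, hits)?;
        } else if matches!(path.extension().and_then(|e| e.to_str()), Some("rs" | "wgsl")) {
            let text = fs::read_to_string(&path)?;
            for name in banned {
                let n = count_word(&text, name);
                if n > 0 {
                    hits.push((path.display().to_string(), name.to_string(), n));
                }
            }
        }
    }
    Ok(())
}
```

In CI, the gate would assert that the hit list is empty and print it otherwise.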

There is one mesh renderer in Lux now. On tier-3 hardware it is the bindless arm. On everything below tier 3 it is execute_scene_buffer: still indirect, still one draw store, just without the bindless texture array. For you that means every rendering improvement from here lands in the one path your machine actually runs, instead of being split across two that drift apart. A function I had been adding to for the better part of a year, gone in a single commit. It felt great. It always does.

How do you know it is right, once the thing you measured against is gone?

This is the real subject of the post, and it is the part that reaches your screen most directly.

Bit-Identical Goldens built a verification regime around one idea: render every test patch on the new code, compare it pixel-for-pixel against a reference baked by the old code, and fail on any drift past a single least-significant bit. The bindless plan leaned on the same idea, gating the bindless arm at perfect similarity against the legacy renderer.
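The "single least-significant bit" rule from that regime fits in one comparison function. This is a sketch that assumes tightly packed RGBA8 frames; compare_golden and GoldenResult are hypothetical names. Setting max_lsb to 0 gives strict bit-identity.

```rust
/// Outcome of comparing a render against its golden reference.
#[derive(Debug, PartialEq, Eq)]
pub enum GoldenResult {
    Pass,
    SizeMismatch,
    Drift { pixel: usize, channel: usize, got: u8, want: u8 },
}

/// Compare two RGBA8 buffers, allowing at most `max_lsb` difference per channel.
/// Reports the first channel that drifts further than that.
pub fn compare_golden(got: &[u8], want: &[u8], max_lsb: u8) -> GoldenResult {
    if got.len() != want.len() || got.len() % 4 != 0 {
        return GoldenResult::SizeMismatch;
    }
    for (i, (&g, &w)) in got.iter().zip(want).enumerate() {
        if g.abs_diff(w) > max_lsb {
            return GoldenResult::Drift { pixel: i / 4, channel: i % 4, got: g, want: w };
        }
    }
    GoldenResult::Pass
}
```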

Delete the legacy renderer and that gate evaporates. You cannot compare against a function that no longer exists. The reference renderer was never a permanent fixture; it was scaffolding, and this post takes it down.

Which is a good thing, because “matches the renderer from three months ago” was always a circular question. It certified that nothing changed, not that anything was right. If the old renderer had a quiet energy leak, a little too much light vanishing at grazing angles, every golden baked from it certified that leak, faithfully, forever, and your metals would have been a shade too dark in every scene while the test suite swore everything was correct. A test like that cannot tell you the picture is right. It can only tell you it is the same.

So verification stopped asking “does this match the old renderer?” and started asking “does this match the physics?” That question has renderer-independent answers, and they are the ones that actually defend what you see:

A white-furnace test: a chrome sphere lit by a uniform white environment, shaded by an energy-conserving BRDF, must reflect uniform white back at every roughness and every view angle. No reference image required. The correct answer is “white,” and “white” comes from the maths, not from last quarter’s build.
An analytic PBR suite: for a known light, a known material, and a known view, compute the radiance Cook-Torrance should produce and assert the rendered pixel matches it. This is the test that keeps your gold the right gold.
A sphere-grid property suite: render a grid of spheres sweeping metallic and roughness and assert the properties that must hold (rougher is less glossy, more metallic has less diffuse) without ever naming an expected pixel.
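Here is a compact sketch of the first two kinds of assertion, plus the roughness property from the third, for a single colour channel. It assumes a common real-time formulation:

- a GGX distribution with alpha = roughness²,
- Schlick-GGX masking with k = alpha / 2,
- Schlick Fresnel with a 0.04 dielectric F0,
- Lambert diffuse weighted by (1 - F)(1 - metallic).

Lux's PBR module may differ in any of these choices, and every name here is hypothetical. The furnace function checks the integral a white-furnace render displays. Under radiance 1 from every direction, the reflected radiance equals the hemisphere integral of the BRDF times cos θ, so a lossless, energy-conserving BRDF must integrate to 1.

```rust
use std::f64::consts::PI;

/// Material inputs for one colour channel.
#[derive(Clone, Copy)]
pub struct Material {
    pub base_color: f64,
    pub metallic: f64,
    pub roughness: f64,
}

/// Cosines between normal n, light l, view v and half-vector h.
#[derive(Clone, Copy)]
pub struct Angles {
    pub n_dot_l: f64,
    pub n_dot_v: f64,
    pub n_dot_h: f64,
    pub v_dot_h: f64,
}

/// GGX / Trowbridge-Reitz normal distribution.
pub fn d_ggx(n_dot_h: f64, alpha: f64) -> f64 {
    let a2 = alpha * alpha;
    let t = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0;
    a2 / (PI * t * t)
}

/// Schlick-GGX masking term for one direction.
fn g1(n_dot_x: f64, k: f64) -> f64 {
    n_dot_x / (n_dot_x * (1.0 - k) + k)
}

/// Reflected radiance from one directional light: Lambert diffuse + Cook-Torrance specular.
pub fn radiance(m: Material, a: Angles, light: f64) -> f64 {
    let alpha = m.roughness * m.roughness;
    let f0 = 0.04 * (1.0 - m.metallic) + m.base_color * m.metallic;
    let f = f0 + (1.0 - f0) * (1.0 - a.v_dot_h).powi(5); // Schlick Fresnel
    let k = alpha / 2.0;
    let g = g1(a.n_dot_l, k) * g1(a.n_dot_v, k);
    let specular = d_ggx(a.n_dot_h, alpha) * g * f / (4.0 * a.n_dot_l * a.n_dot_v);
    let kd = (1.0 - f) * (1.0 - m.metallic);
    (kd * m.base_color / PI + specular) * light * a.n_dot_l
}

/// Furnace test in integral form: midpoint-rule integral over the hemisphere of
/// brdf(theta, phi) * cos(theta), with sin(theta) as the solid-angle Jacobian.
pub fn furnace(brdf: impl Fn(f64, f64) -> f64, steps: usize) -> f64 {
    let (dt, dp) = (0.5 * PI / steps as f64, 2.0 * PI / steps as f64);
    let mut sum = 0.0;
    for i in 0..steps {
        let theta = (i as f64 + 0.5) * dt;
        for j in 0..steps {
            let phi = (j as f64 + 0.5) * dp;
            sum += brdf(theta, phi) * theta.cos() * theta.sin() * dt * dp;
        }
    }
    sum
}
```

The head-on cases can be checked by hand. With n, l, v and h aligned, a fully metallic surface with F0 = 1 and roughness 0.5 gives D = 16/π, G = 1 and F = 1, so the specular radiance is 4/π.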

The snapshot tests that remain got demoted on purpose. They are no longer “the truth.” They are a gross-corruption tripwire: a baked PNG that catches a structural collapse of the render, with thresholds loose enough that a real visual improvement does not trip them. The PNG is never auto-updated. Re-baking one is a deliberate, human-reviewed act, because a regression that can re-bake its own reference is a regression that has learned to lie. Correctness lives in the physics assertions; the snapshot just catches the catastrophes.
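A tripwire differs from a golden in two knobs. It has a per-channel tolerance wide enough to absorb legitimate shading changes, and it only fires once a given fraction of pixels has gone bad. This is a sketch with hypothetical names and thresholds. Note that it only ever reads the reference, which matches the rule that re-baking is a human act.

```rust
/// Tripwire verdict for one snapshot.
#[derive(Debug, PartialEq)]
pub enum Tripwire {
    Ok,
    SizeMismatch,
    Tripped { bad_fraction: f64 },
}

/// Loose comparison of two RGBA8 frames. A pixel counts as bad if any channel
/// differs by more than `channel_tol`; the wire trips when the fraction of bad
/// pixels exceeds `max_bad_fraction`. There is deliberately no code path that
/// writes the reference back to disk.
pub fn check_tripwire(got: &[u8], reference: &[u8], channel_tol: u8, max_bad_fraction: f64) -> Tripwire {
    if got.len() != reference.len() || got.is_empty() || got.len() % 4 != 0 {
        return Tripwire::SizeMismatch;
    }
    let pixels = got.len() / 4;
    let bad = got
        .chunks_exact(4)
        .zip(reference.chunks_exact(4))
        .filter(|(g, r)| g.iter().zip(r.iter()).any(|(&a, &b)| a.abs_diff(b) > channel_tol))
        .count();
    let bad_fraction = bad as f64 / pixels as f64;
    if bad_fraction > max_bad_fraction {
        Tripwire::Tripped { bad_fraction }
    } else {
        Tripwire::Ok
    }
}
```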

This is a better deal for you than the old one. “Matches the analytic radiance” cannot be satisfied by one wrong renderer agreeing with another wrong renderer, which means a genuine improvement to how light behaves can finally land instead of being blocked by a test whose only opinion was “but the old version looked different.” It was deleting the legacy path that forced the upgrade, which is a strange road to a better test suite, but I will take it.

A render-determinism flake in a 200-light shadowed scene got chased down and fixed in the same window, the kind of bug that only surfaces once the snapshot is a tripwire and the tripwire starts flickering at you. One fewer way for a heavy scene to render differently twice in a row.
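A flake like that is easiest to pin down with a harness that renders the same frame repeatedly and reports the first disagreement. In this sketch, the render closure is a hypothetical stand-in for drawing the scene and reading back the frame. Floating-point addition is not associative, so any accumulation whose order varies between runs (across many lights, for example) can change low bits. The post does not say that was the cause here.

```rust
/// Render the same frame `runs` times and return (run, byte) of the first
/// disagreement with run 0, or None if every run matched.
pub fn first_nondeterminism(runs: usize, mut render: impl FnMut() -> Vec<u8>) -> Option<(usize, usize)> {
    let baseline = render();
    for run in 1..runs {
        let frame = render();
        if let Some(byte) = frame.iter().zip(&baseline).position(|(a, b)| a != b) {
            return Some((run, byte));
        }
        // Same prefix but a different length still counts as a mismatch.
        if frame.len() != baseline.len() {
            return Some((run, frame.len().min(baseline.len())));
        }
    }
    None
}
```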

Pass 6, closed

The meshlet post laid out six passes and called itself “the sign on the construction site explaining what’s behind the fence.” Pass 1 was the cull gates. Passes 4 and 5 were the skinning closure. Passes 2 and 3 were the bindless arm going live. Pass 6 was the closeout, and this is it. The placeholder shader is gone, the legacy renderer is gone, the bindless arm is the renderer, and it is checked against physics instead of against its own past.

The sign can come down. The fence comes down with it. What you are left holding is one renderer, the fast one, drawing shadows and skinned characters and ten thousand instances down a single lane, and proving itself correct against reality rather than against the way it happened to look last quarter.

The cleanup a rewrite this size always leaves behind is the next post.

Still building this in the open. Follow along.