The Meshlet Path, So Far
This is a status post. There’s a sizeable in-progress rewrite of the live mesh path. I’d rather post a “here’s the contract, here’s what landed, here’s what hasn’t” update than wait for the whole thing to close.
The plan is six passes. Pass 1 landed. Passes 2 through 6 are queued. Each pass has an acceptance criterion, and the order matters: Pass 5 can’t land before Pass 4, Pass 4 can’t land before Pass 3, and so on.
Below is the contract. Subsequent posts will close it as the passes land.
Pass 1: meshlet cull-ratio gates
The cull pass shader (meshlet_task.wgsl) was already implemented: fused frustum + cone + Hi-Z occlusion per meshlet. The Hi-Z build pass was real WGSL too. What was missing was a CPU-equivalent test that mirrors the shader byte-for-byte and verifies the rejection rates.
The test is 462 lines. It re-implements sphere_in_frustum, cone_visible, and hiz_occluded in Rust, with the same reverse-Z convention and the same s8-packed cone-axis decoding. Then it runs two scenes through it.
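For readers who haven't seen a CPU mirror of a cull shader, here is a minimal sketch of what the three predicates can look like. The function names come from the test described above; the bodies, the plane layout, the divide-by-127 s8 decoding, and the scalar Hi-Z comparison are all assumptions made for illustration, not the contents of meshlet_task.wgsl or the real test.

```rust
// CPU mirrors of the three cull predicates. Names follow the post; the
// bodies are illustrative assumptions, not the shipped shader logic.

type Vec3 = [f32; 3];

fn dot(a: Vec3, b: Vec3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn sub(a: Vec3, b: Vec3) -> Vec3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

/// Plane as (nx, ny, nz, d) with an inward-facing unit normal: a point p is
/// inside when dot(n, p) + d >= 0.
type Plane = [f32; 4];

/// True when the bounding sphere touches or lies inside every plane.
fn sphere_in_frustum(planes: &[Plane], center: Vec3, radius: f32) -> bool {
    planes
        .iter()
        .all(|p| dot([p[0], p[1], p[2]], center) + p[3] >= -radius)
}

/// Decode an s8-packed cone axis. Assumed encoding: each component is a
/// signed byte scaled by 127.
fn decode_cone_axis(packed: [i8; 3]) -> Vec3 {
    [
        packed[0] as f32 / 127.0,
        packed[1] as f32 / 127.0,
        packed[2] as f32 / 127.0,
    ]
}

/// Backface-cone test in bounding-sphere form: the meshlet is rejected only
/// when every triangle in it must face away from the camera.
fn cone_visible(center: Vec3, radius: f32, axis: Vec3, cutoff: f32, camera: Vec3) -> bool {
    let to_center = sub(center, camera);
    let dist = dot(to_center, to_center).sqrt();
    dot(to_center, axis) < cutoff * dist + radius
}

/// Reverse-Z: 1.0 is the near plane, 0.0 is far. `hiz_farthest` is the
/// farthest occluder depth (the minimum value) over the meshlet's footprint.
/// The meshlet is occluded when even its nearest point lies behind that.
fn hiz_occluded(meshlet_nearest_depth: f32, hiz_farthest: f32) -> bool {
    meshlet_nearest_depth < hiz_farthest
}
```

The reverse-Z detail is the one that is easy to get backwards: "farther" means a smaller depth value, so the occlusion comparison flips relative to a conventional depth buffer.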
Stadium fixture. 1000 meshlets across a front stand, back stand, side stands, and back-of-camera stand. A synthesised reverse-Z Hi-Z models a depth pre-pass of the front stand. Combined cull rejection: 85 percent (450 frustum + 400 Hi-Z), against a 60 percent gate. Comfortable.
Fully-visible fixture. 400 meshlets with no occluder. This is the risk-mitigation gate: the cone-cull math could be too aggressive and reject visible meshlets. Measured: 0 false rejections.
Both gates green. The cull is correct. (The cull was correct before. The test is the thing that proves it.)
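The two gates reduce to a ratio check and a zero check. A small sketch of that arithmetic, using the stadium numbers above (450 + 400 rejected out of 1000, against a 0.60 minimum); the function names are hypothetical and not taken from the test file.

```rust
/// Fraction of meshlets rejected by any stage of the fused cull.
fn rejection_ratio(rejected: u32, total: u32) -> f32 {
    rejected as f32 / total as f32
}

/// Stadium gate: the combined rejection ratio must meet the minimum.
fn passes_cull_gate(rejected: u32, total: u32, min_ratio: f32) -> bool {
    total > 0 && rejection_ratio(rejected, total) >= min_ratio
}

/// Fully-visible gate: nothing may be rejected when nothing is hidden.
fn passes_visibility_gate(false_rejections: u32) -> bool {
    false_rejections == 0
}
```

With the measured values, passes_cull_gate(450 + 400, 1000, 0.60) holds with 25 points of headroom, and passes_visibility_gate(0) holds exactly.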
Pass 2: bindless WGSL adoption (deferred)
BindlessMeshPipeline exists. It binds 4096 textures via binding_array<texture_2d<f32>, 4096>. It has a 778-line production WGSL shader at shaders/scene/mesh_bindless.wgsl covering vertex transform, full PBR shading via the pbr module, cluster-bin lighting, CSM sampling, and IBL.
The pipeline currently loads a 3-line PLACEHOLDER_WGSL that draws degenerate triangles.
The reason: the bindless shader needs 7 bind groups. The pipeline currently builds 3. Extending the layout to 7 (textures+samplers, materials, draws, frame uniforms, lights+cluster, shadows, IBL), plus updating the test/bench fixtures to populate all 7, plus verifying SSIM = 1.0 against the legacy path on real reference hardware, is the bulk of Pass 2’s work. The shader is ready. The pipeline isn’t.
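One way to keep the seven groups straight while the fixtures are being updated is to name the slots in a single place. The sketch below assumes the groups are numbered in the order listed above; the actual group indices are whatever mesh_bindless.wgsl declares, and BindGroupSlot, fixture_is_complete, and fits_device are hypothetical names. Worth noting alongside it: wgpu's default limits cap max_bind_groups at 4, so a 7-group layout would presumably need the device to request a higher limit.

```rust
/// Hypothetical slot assignment for the seven bind groups, in the order the
/// post lists them. The real indices are defined by the shader.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
enum BindGroupSlot {
    TexturesAndSamplers = 0,
    Materials = 1,
    Draws = 2,
    FrameUniforms = 3,
    LightsAndClusters = 4,
    Shadows = 5,
    Ibl = 6,
}

impl BindGroupSlot {
    const ALL: [BindGroupSlot; 7] = [
        BindGroupSlot::TexturesAndSamplers,
        BindGroupSlot::Materials,
        BindGroupSlot::Draws,
        BindGroupSlot::FrameUniforms,
        BindGroupSlot::LightsAndClusters,
        BindGroupSlot::Shadows,
        BindGroupSlot::Ibl,
    ];

    /// Group index as it would appear in @group(n).
    fn index(self) -> u32 {
        self as u32
    }
}

/// True when a fixture bound every slot exactly once, in slot order.
fn fixture_is_complete(bound: &[BindGroupSlot]) -> bool {
    bound == &BindGroupSlot::ALL[..]
}

/// True when the device's bind-group limit can hold the whole layout.
fn fits_device(max_bind_groups: u32) -> bool {
    BindGroupSlot::ALL.len() as u32 <= max_bind_groups
}
```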
When this lands, the PLACEHOLDER_WGSL constant gets deleted in the same commit as the include_str! swap. There won’t be a transition period where both are valid.
Pass 3: live indirect dispatch (deferred)
dispatch_render3d is the live entry point for Render3D. It currently calls render_3d::execute, which is the legacy per-mesh-bind-group path. The replacement target is execute_scene_buffer (which uses multi_draw_indexed_indirect_count against a draw-store SSBO) on T3 adapters, falling back to the legacy path on non-T3.
The fallback isn’t optional. Plenty of devices within wgpu’s minimum spec still don’t support indirect-count, and a creative coding tool that fails to render on an integrated Intel chip is a creative coding tool that doesn’t ship.
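The selection itself can be a very small function. A sketch, assuming "T3" boils down to indirect-count support (the real tier definition may require more); AdapterCaps, MeshPath, and select_mesh_path are hypothetical names, and a real check would query the adapter's feature set rather than a hand-filled struct.

```rust
/// Hypothetical capability summary, filled in from the adapter at startup.
#[derive(Clone, Copy, Debug)]
struct AdapterCaps {
    /// Assumed to be the deciding feature for the T3 tier.
    supports_indirect_count: bool,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MeshPath {
    /// execute_scene_buffer: one indirect-count call over the draw store.
    SceneBuffer,
    /// The fallback path for adapters without indirect-count.
    Legacy,
}

/// Pick the mesh path once per adapter; the choice never changes mid-run.
fn select_mesh_path(caps: AdapterCaps) -> MeshPath {
    if caps.supports_indirect_count {
        MeshPath::SceneBuffer
    } else {
        MeshPath::Legacy
    }
}
```

Returning an enum rather than a bool means the dispatch site is a match, which keeps room for a third path later without reinterpreting a flag.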
When this lands, the per-mesh make_bg3 HashMap-dedup loop in render_3d::execute (lines 691 to 741) goes away. render_3d::execute itself is deleted in the same commit. CI gate: zero references to MaterialTexHandles in non-test code.
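A gate like this is a filtered text search. The sketch below shows the shape of it over in-memory (path, contents) pairs; gate_violations is a hypothetical helper, and the rule for what counts as test code (anything under a tests/ directory, or a file ending in _test.rs) is an assumption. Inline #[cfg(test)] modules are not handled.

```rust
/// Returns (path, 1-based line number) for each occurrence of `needle` in
/// non-test sources. An empty result means the gate is green.
fn gate_violations<'a>(files: &[(&'a str, &str)], needle: &str) -> Vec<(&'a str, usize)> {
    let mut hits = Vec::new();
    for (path, contents) in files {
        // Assumed convention for test code; adjust to the repo's layout.
        let is_test = path.starts_with("tests/")
            || path.contains("/tests/")
            || path.ends_with("_test.rs");
        if is_test {
            continue;
        }
        for (i, line) in contents.lines().enumerate() {
            if line.contains(needle) {
                hits.push((*path, i + 1));
            }
        }
    }
    hits
}
```

In CI the same check is often a one-line grep; the point of writing it as a function is that the exclusion rule becomes explicit and testable.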
Pass 4: DrawItem becomes an enum (deferred)
lux_core::scene::DrawItem is currently a struct with a mesh: MeshHandle field. When you skin a character, the skinning compute shader produces a deformed vertex buffer. The consumer (RenderSceneNode) currently reads the MeshHandle from the CPU-fallback out: Mesh output and ignores the GPU-side deformed buffer entirely. (This is the situation the previous round’s parity test prepared the ground for.)
The schema flip:
pub enum DrawItem {
    Static(StaticDrawItem),
    Skinned(SkinnedDrawItem),
}
SkinnedDrawItem carries the deformed-position SSBO handle plus the previous frame’s deformed positions (for TAA motion vectors). The SKINNED branch of the vertex-buffer-write shader (already authored, unwired) reads the SSBO at vertex fetch.
Every consumer of scene.draws becomes an exhaustive match. With the wildcard_enum_match_arm clippy lint set to deny, no catch-all arm is allowed, so adding a future variant (Volumetric, Procedural, whatever) is a compile error at every consume site, not a silent fallthrough.
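To make the consume-site shape concrete, here is a minimal sketch of an exhaustive match over the new enum. Only DrawItem, its two variants, MeshHandle, and deformed_pos come from the plan; the field layout, BufferHandle, prev_deformed_pos, VertexSource, and vertex_source are placeholders invented for the example. The lint is allow-by-default (it lives in clippy's restriction group), so it has to be opted into, here with an attribute on the function.

```rust
// Placeholder handle types; the real ones live in lux_core.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MeshHandle(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BufferHandle(u32);

#[derive(Debug)]
struct StaticDrawItem {
    mesh: MeshHandle,
}

#[derive(Debug)]
struct SkinnedDrawItem {
    /// Deformed-position SSBO for this frame.
    deformed_pos: BufferHandle,
    /// Previous frame's deformed positions, for TAA motion vectors.
    prev_deformed_pos: BufferHandle,
}

#[derive(Debug)]
enum DrawItem {
    Static(StaticDrawItem),
    Skinned(SkinnedDrawItem),
}

/// Hypothetical result type: where the vertex stage reads positions from.
#[derive(Debug, PartialEq, Eq)]
enum VertexSource {
    MeshBuffer(MeshHandle),
    DeformedSsbo { current: BufferHandle, previous: BufferHandle },
}

// rustc already rejects a match that misses a variant; the clippy lint
// additionally forbids a `_ =>` arm that would hide a new one.
#[deny(clippy::wildcard_enum_match_arm)]
fn vertex_source(item: &DrawItem) -> VertexSource {
    match item {
        DrawItem::Static(s) => VertexSource::MeshBuffer(s.mesh),
        DrawItem::Skinned(s) => VertexSource::DeformedSsbo {
            current: s.deformed_pos,
            previous: s.prev_deformed_pos,
        },
    }
}
```

Adding a third variant to DrawItem makes vertex_source fail to compile until it gets its own arm, which is the behavior the plan is after.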
Pass 5: skin_cpu deletion (deferred, same commit as Pass 4)
The CPU skinning function lives at app/plugins/scene/lux-scene-character/src/skinning.rs. Production runs it under the skin_cpu_fallback Cargo feature (default-on). When Pass 4 lands and DrawItem::Skinned consumes the GPU-side deformed buffer, the CPU fallback has no readers.
In the same commit:
- git rm the skinning.rs file (the function, not the node).
- Delete the skin_cpu_fallback = [] feature.
- Delete the default = ["skin_cpu_fallback"] line.
- Delete the out: Mesh output from SkinnedMeshNode::info() (the GPU consumer reads deformed_pos instead).
- Delete the CPU-fallback branch in SkinnedMeshNode::process.
- Rewrite the existing parity test against in-test analytic ground truth (no dependency on skin_cpu).
- Update the skinning bench to drop the CPU arm.
- CI grep gate: zero references to skin_cpu in non-test, non-comment code.
The reason this is one commit and not five: the moment one of these lands without the others, every skinned-mesh patch silently breaks. Either it all works or it all doesn’t, and the only way to verify “it all works” is to ship them together.
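The "in-test analytic ground truth" item deserves a concrete shape, since it is what lets the parity test survive the deletion of skin_cpu. One possible case, sketched below: if every bone applies the same translation t, then any set of weights that sums to one must move each vertex by exactly t, so the expected buffer has a closed form and no reference skinning code is needed. The helper names are hypothetical, and `actual` stands in for positions read back from the GPU-side deformed buffer.

```rust
type Vec3 = [f32; 3];

/// Largest per-component absolute difference between two position buffers.
fn max_abs_error(actual: &[Vec3], expected: &[Vec3]) -> f32 {
    assert_eq!(actual.len(), expected.len());
    actual
        .iter()
        .zip(expected)
        .flat_map(|(a, e)| (0..3).map(move |i| (a[i] - e[i]).abs()))
        .fold(0.0_f32, f32::max)
}

/// Analytic case: all bones share one translation `t`, so linear blend
/// skinning with weights summing to one reduces to rest + t.
fn expected_uniform_translation(rest: &[Vec3], t: Vec3) -> Vec<Vec3> {
    rest.iter()
        .map(|p| [p[0] + t[0], p[1] + t[1], p[2] + t[2]])
        .collect()
}
```

A real suite would want a few more closed-form cases (a single-bone rotation, a two-bone blend with known weights), but each follows the same pattern: compute the expected positions by hand, then bound max_abs_error against the readback.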
Pass 6: closeout (deferred)
The doc closeout. The invariants section locks in the new contract: production bindless WGSL is canonical, no skin_cpu_fallback feature, DrawItem is an exhaustive enum. CHANGELOG.md records the breaking changes. The bench baseline JSONs land for the stadium 10K-draw and skinned-character workloads on the reference machine.
The Pass 6 post is the one that closes the arc. When it lands, the live mesh path’s last legacy fallback goes away.
The reason I’m posting this now
Status posts feel weird. Most of the blog has been “here’s a feature, here’s how it works, here’s the test.” This one is “here’s a contract, here’s the first piece, here’s what hasn’t shipped yet.”
The reason: the meshlet rewrite is large enough that landing it as a single mega-commit is asking for the kind of regression that a six-month forensic investigation has trouble unwinding. Splitting into six passes with explicit gates between them is the rational way to land it. But that means there’s a window where Pass 1 has shipped and the rest hasn’t, and the codebase reads as “this feature is half there.” Posting the contract makes the intermediate state legible: yes, the placeholder shader is still there, yes, I know, here’s the order it gets removed.
When Pass 6 closes, this post becomes the prelude to the closeout post. Until then, it’s the sign on the construction site explaining what’s behind the fence.
I have no idea what I’m doing or if any of this is right, but it’s fun. Follow along.