Boxes, Spheres, and Something to Point at Them

Last post ended with a dark triangle on a slightly-darker background and a promise that Phase 9 would stop being a single hardcoded primitive. This post is that promise.

It is also the post where I go back and fix every shortcut I took on the way to that triangle, because there were a lot of them.

The primitives

Three new plugin crates landed this session. The first is lux-scene-primitives, which adds the meshes you’d expect a 3D tool to start with:

  • Box. Width, height, depth, all Numbers. Emits 24 vertices (so each face can have its own normals) and 36 indices. The canonical six-sided thing.
  • Sphere. Radius, subdivisions (horizontal and vertical). UV-sphere topology: a fan of triangles at each pole and a grid of quads everywhere else. Higher subdivisions make it rounder, and the vertex count grows quadratically, as expected.
  • Plane. Width, height. A flat rectangle in the XY plane facing +Z. Two triangles, four vertices. The background quad.
  • Grid. Width, depth, rows, columns. A subdivided plane in the XZ plane with +Y up, triangulated into rows × cols × 2 triangles. The tessellation is invisible under flat shading but shows up immediately once lighting lands in the next post, which is exactly the point. The ground.
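To make the rows × cols × 2 count concrete, here is a minimal sketch of a Grid builder. It is not the code in lux-core::mesh_builders: the GridMesh struct, the build_grid name, and the choice to centre the grid on the origin are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for the mesh data a Grid node would upload.
pub struct GridMesh {
    pub positions: Vec<[f32; 3]>,
    pub indices: Vec<u32>,
}

/// Builds a grid in the XZ plane (+Y up), centred on the origin (an assumption).
/// `rows` subdivides the depth (Z), `cols` subdivides the width (X).
pub fn build_grid(width: f32, depth: f32, rows: u32, cols: u32) -> GridMesh {
    // At least one cell in each direction, so the divisions below are safe.
    let rows = rows.max(1);
    let cols = cols.max(1);

    let vertex_count = (rows as usize + 1) * (cols as usize + 1);
    let mut positions = Vec::with_capacity(vertex_count);
    for r in 0..=rows {
        for c in 0..=cols {
            let x = (c as f32 / cols as f32 - 0.5) * width;
            let z = (r as f32 / rows as f32 - 0.5) * depth;
            positions.push([x, 0.0, z]);
        }
    }

    let mut indices = Vec::with_capacity(rows as usize * cols as usize * 6);
    let stride = cols + 1; // vertices per row
    for r in 0..rows {
        for c in 0..cols {
            let i0 = r * stride + c; // low X, low Z
            let i1 = i0 + 1;         // high X, low Z
            let i2 = i0 + stride;    // low X, high Z
            let i3 = i2 + 1;         // high X, high Z
            // Two triangles per cell, both with normals pointing towards +Y.
            indices.extend_from_slice(&[i0, i2, i1, i1, i2, i3]);
        }
    }

    GridMesh { positions, indices }
}
```

A 3 × 5 grid built this way has 4 × 6 = 24 vertices and 3 × 5 × 2 = 30 triangles (90 indices).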

All four are stateful GPU nodes that follow the lifecycle contract from the last post: check is_invalid, re-upload if so, mark_mesh_in_use every frame. The mesh-builder logic lives in a shared lux-core::mesh_builders module so that the plugin and any future authoring tool (export, procedural helpers, code-gen) generate identical geometry from the same helper functions.

Every primitive runs against the mesh pool’s content-addressed cache. A Box node with an unchanged size uploads once on frame one, and every frame after that is a blake3 hash of the same bytes resolving to the same handle, zero work on the GPU side.
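The idea behind a content-addressed cache can be sketched in a few lines. This is an illustration only, not the Lux mesh pool: MeshPool, MeshHandle, and get_or_upload here are hypothetical, and std's DefaultHasher stands in for blake3 so the example needs no external crates (a 64-bit hash is far more collision-prone than blake3 and would not be a safe key in a real pool).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical handle to an uploaded mesh.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MeshHandle(pub u32);

/// Hypothetical pool keyed by a hash of the mesh bytes.
#[derive(Default)]
pub struct MeshPool {
    by_hash: HashMap<u64, MeshHandle>,
    /// Counts simulated GPU uploads so the caching is observable.
    pub uploads: u32,
}

impl MeshPool {
    pub fn get_or_upload(&mut self, vertex_bytes: &[u8]) -> MeshHandle {
        // Stand-in for a blake3 digest of the mesh contents.
        let mut hasher = DefaultHasher::new();
        vertex_bytes.hash(&mut hasher);
        let key = hasher.finish();

        // Cache hit: same bytes, same handle, no upload.
        if let Some(handle) = self.by_hash.get(&key) {
            return *handle;
        }

        // Cache miss: this is where a real pool would create GPU buffers.
        let handle = MeshHandle(self.uploads);
        self.uploads += 1;
        self.by_hash.insert(key, handle);
        handle
    }
}
```

Calling get_or_upload twice with identical bytes returns the same handle and leaves uploads at 1; changing a single byte produces a second upload.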

The transform stack

lux-scene-transform is the second new plugin, and it’s the one that makes the scene feel spatial. Five nodes, all of them outputting Matrix4:

  • Translate3D. X, Y, Z. Outputs a pure translation matrix.
  • Rotate3D. X, Y, Z in degrees. Builds an Euler-angle rotation matrix. Degrees because radians in the inspector are a hostile interface and I’m not doing that to anyone.
  • Scale3D. X, Y, Z. Pure scale.
  • Transform3D. Takes two Matrix4 inputs and multiplies them. The composition node.
  • LookAt. Eye, target, up. Builds a view matrix. Useful for cameras, also useful for pointing a spotlight.
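For readers who have not built these by hand, here is a rough sketch of what the simpler nodes compute. It uses a plain column-major [f32; 16] as a stand-in for glam::Mat4, and every function name is hypothetical. Only a single-axis rotation is shown, because the Euler order Rotate3D uses is not stated here.

```rust
/// Column-major 4x4 matrix, a stand-in for glam::Mat4.
pub type Mat4 = [f32; 16];

pub const IDENTITY: Mat4 = [
    1.0, 0.0, 0.0, 0.0,
    0.0, 1.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0,
    0.0, 0.0, 0.0, 1.0,
];

/// Pure translation: the offset lives in the last column.
pub fn translate(x: f32, y: f32, z: f32) -> Mat4 {
    let mut m = IDENTITY;
    m[12] = x;
    m[13] = y;
    m[14] = z;
    m
}

/// Pure scale: the factors live on the diagonal.
pub fn scale(x: f32, y: f32, z: f32) -> Mat4 {
    let mut m = IDENTITY;
    m[0] = x;
    m[5] = y;
    m[10] = z;
    m
}

/// Right-handed rotation about +Y, taking degrees like the inspector does.
pub fn rotate_y_degrees(degrees: f32) -> Mat4 {
    let (s, c) = degrees.to_radians().sin_cos();
    let mut m = IDENTITY;
    m[0] = c;
    m[2] = -s;
    m[8] = s;
    m[10] = c;
    m
}

/// Matrix product a * b (b is applied to a point first, then a).
pub fn mul(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut out = [0.0f32; 16];
    for col in 0..4 {
        for row in 0..4 {
            let mut sum = 0.0;
            for k in 0..4 {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            out[col * 4 + row] = sum;
        }
    }
    out
}

/// Applies the matrix to a point (w = 1).
pub fn transform_point(m: &Mat4, p: [f32; 3]) -> [f32; 3] {
    [
        m[0] * p[0] + m[4] * p[1] + m[8] * p[2] + m[12],
        m[1] * p[0] + m[5] * p[1] + m[9] * p[2] + m[13],
        m[2] * p[0] + m[6] * p[1] + m[10] * p[2] + m[14],
    ]
}
```

Composition order matters: mul(&translate(10.0, 0.0, 0.0), &rotate_y_degrees(90.0)) rotates the point (1, 0, 0) to roughly (0, 0, -1) first and then moves it to roughly (10, 0, -1).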

Each of these runs often enough that it needs to allocate exactly nothing per frame on the hot path. More on that in a minute.

The camera

lux-scene-camera is the third new plugin. Two nodes:

  • PerspectiveCamera. fov, aspect, near, far, position, target, up. Builds a view matrix (from position + target + up) and a projection matrix (from fov + aspect + near + far), packs them into a Camera PinValue, and outputs it. Things at the origin are visible; things behind the camera are not; things closer than near get clipped. Real 3D, finally.
  • OrbitControls. Radius, theta, phi, plus mouse drag input. Converts spherical coordinates into a position, then hands that to an internal LookAt. Drag to orbit. Scroll to zoom. If the mouse input is missing (headless render, test harness, CI), it falls back to the static angles from its pins so that reference PNGs stay deterministic.
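The spherical-to-Cartesian step inside an orbit camera looks roughly like this. The angle convention is an assumption, not something taken from the OrbitControls source: theta is the azimuth around +Y measured from +Z, phi is the polar angle measured down from +Y, and both are in radians. The function name and the pole clamp are likewise illustrative.

```rust
/// Converts orbit parameters into a camera position around `target`.
/// Assumed convention: theta = azimuth around +Y from +Z, phi = polar angle from +Y,
/// both in radians.
pub fn orbit_position(target: [f32; 3], radius: f32, theta: f32, phi: f32) -> [f32; 3] {
    // Keep phi away from the poles so a LookAt with up = +Y never degenerates.
    let eps = 1e-3_f32;
    let phi = phi.clamp(eps, std::f32::consts::PI - eps);

    let (sin_phi, cos_phi) = phi.sin_cos();
    let (sin_theta, cos_theta) = theta.sin_cos();

    [
        target[0] + radius * sin_phi * sin_theta,
        target[1] + radius * cos_phi,
        target[2] + radius * sin_phi * cos_theta,
    ]
}
```

With theta = 0 and phi = π/2 the camera sits on the +Z axis at distance radius, looking back at the target. A drag would add a delta to theta and phi, and a scroll would scale radius, before the result is handed to LookAt.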

The demo graph now spins. I will not pretend this was not satisfying.

The catch

Here’s the thing: if I’d stopped there and shipped this post, the title would read well but half the infrastructure underneath would have been quietly broken. I went back through every decision from the last post before closing the session, and the findings filled a document. Five of them were P0 (bugs the user would hit the first time they tried the new nodes), and I want to walk through them, because they’re the reason this post is twice as long as it needed to be.

What the last post got wrong: the silent drop

The first bug is the worst, because it’s the kind that doesn’t throw an error, it just lies.

Connect a Box to a RenderScene. Fine, box shows up. Now connect a Spread of boxes to RenderScene, say a GridTransforms3D feeding three positions. You’d expect three boxes.

You get one.

What was happening: RenderScene was a normal (non-spread-native) node. When the evaluator saw a Spread<Mesh> arriving at its mesh input, it kicked into auto-spread mode and called process() once per element. That part is correct. What wasn’t correct: every single one of those process() calls pushed a Render3D op to the texture engine, and every single one of those ops cleared the color target with LoadOp::Clear before drawing. Mesh zero would render, get wiped by mesh one’s clear, get wiped by mesh two’s clear, and the final frame would contain only the last element of the spread.

The output was plausible. Debugging it meant noticing that a spread of three identical boxes at three different positions rendered as one box at the last position, and thinking “wait, it’s not that only one was drawn; all three were drawn, and only the last one survived the clears.”

The fix: mark RenderScene as .spread_native(), which tells the evaluator “don’t auto-iterate me, I handle spreads myself.” Inside process(), pull the mesh, material, and transform inputs via a collect_spread helper that returns a Vec<T> regardless of whether the input was a single value or a spread. Then walk 0..max(mesh_len, material_len, transform_len) and build one DrawItem per iteration, wrapping the shorter spreads around the longer one (the standard Lux spread wrapping rule). One Render3D op per frame, N draw calls inside it, one clear at the start.
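The wrapping loop is small enough to sketch. The types below are simplified stand-ins (plain integers for mesh and material handles, a position for the transform), and the Input enum, the collect_spread signature, and the empty-spread behaviour are assumptions rather than the real Lux API.

```rust
/// Hypothetical input that is either a single value or a spread of values.
#[derive(Clone, Debug, PartialEq)]
pub enum Input<T> {
    Single(T),
    Spread(Vec<T>),
}

/// Normalises both cases to a Vec so the caller has one code path.
pub fn collect_spread<T: Clone>(input: &Input<T>) -> Vec<T> {
    match input {
        Input::Single(value) => vec![value.clone()],
        Input::Spread(values) => values.clone(),
    }
}

/// Simplified draw item: handles are plain integers, the transform is a position.
#[derive(Clone, Debug, PartialEq)]
pub struct DrawItem {
    pub mesh: u32,
    pub material: u32,
    pub transform: [f32; 3],
}

/// Builds one DrawItem per index of the longest spread, wrapping the shorter ones.
pub fn build_draw_items(
    meshes: &[u32],
    materials: &[u32],
    transforms: &[[f32; 3]],
) -> Vec<DrawItem> {
    // Assumption: an empty spread on any input means "draw nothing".
    if meshes.is_empty() || materials.is_empty() || transforms.is_empty() {
        return Vec::new();
    }

    let count = meshes.len().max(materials.len()).max(transforms.len());
    (0..count)
        .map(|i| DrawItem {
            mesh: meshes[i % meshes.len()],
            material: materials[i % materials.len()],
            transform: transforms[i % transforms.len()],
        })
        .collect()
}
```

One Box, one material, and a spread of three positions yields three draw items that share the mesh and material. All three then go into a single render op, so the clear happens once.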

This is the first time in Phase 9 that spread semantics and the 3D pipeline had to agree on something, and they didn’t, and the symptom looked like “only my last draw call is working.” The lesson: any node that takes a spread input and has a side effect that happens before the draw (a clear, a resolve, a barrier) has to be spread-native, because auto-spread mode will run the side effect once per element.

[Figure: Three meshes on a spread, all rendering]

What the last post got wrong: sRGB twice

The last post’s RenderScene was allocating an Rgba8Unorm color target and the unlit.wgsl fragment shader was applying manual pow(color, 1.0/2.2) gamma correction on output. This works in the sense that pixels reach the screen, but it’s wrong in two ways I didn’t notice until I tried to composite the 3D render target with a layer that was already using the hardware sRGB path:

  • Lighting math runs in sRGB space, not linear space, because the values in the target are gamma-encoded. Any future bloom, DOF, or tone mapping sampling this texture will get the wrong brightness relationships.
  • The manual gamma correction gets applied twice the moment someone samples the texture as an sRGB input, because the sampler will decode it again.

The fix is to stop doing gamma by hand. A new TextureFormat::Rgba8UnormSrgb variant, RenderScene allocates its target in that format, and wgpu hardware-encodes linear to sRGB on store. The fragment shader outputs linear values and stops pretending it’s 1998. Any downstream post-FX that samples the texture now gets a proper sRGB to linear decode via the sampler for free, and the lighting math stays in linear space where it belongs.
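To see why the encoding matters, here is a CPU-side sketch of the standard sRGB transfer functions next to the pow(1/2.2) approximation. On the GPU the hardware does this conversion; the Rust functions below exist only to make the arithmetic checkable, and their names are illustrative.

```rust
/// Standard sRGB encode (linear -> sRGB), per channel, input in [0, 1].
pub fn linear_to_srgb(c: f32) -> f32 {
    if c <= 0.003_130_8 {
        12.92 * c
    } else {
        1.055 * c.powf(1.0 / 2.4) - 0.055
    }
}

/// Standard sRGB decode (sRGB -> linear), per channel, input in [0, 1].
pub fn srgb_to_linear(c: f32) -> f32 {
    if c <= 0.040_45 {
        c / 12.92
    } else {
        ((c + 0.055) / 1.055).powf(2.4)
    }
}

/// The hand-rolled approximation the old shader applied on output.
pub fn manual_gamma(c: f32) -> f32 {
    c.powf(1.0 / 2.2)
}

/// Averages two values that are already gamma-encoded, then decodes the result.
/// This models a post-effect that blends encoded texels as if they were linear.
pub fn blend_encoded_then_decode(a_encoded: f32, b_encoded: f32) -> f32 {
    srgb_to_linear(0.5 * (a_encoded + b_encoded))
}
```

Encode followed by decode is an identity, which is what the hardware path gives a downstream sampler. Blending encoded values is not: averaging encoded black (0.0) and encoded white (1.0) and decoding gives about 0.21 in linear terms, not the 0.5 that a linear-space blend would produce.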

This is the kind of fix that produces zero visual difference in the current demo graph and will save me a week of debugging the first time I plug in a bloom node.

What the last post got wrong: depth z-fighting

wgpu::CompareFunction::Less was a default I didn’t think about. It means “a fragment passes the depth test if its depth is strictly less than what’s already there.” Which is correct right up until you have two coplanar surfaces, say a box face and a plane at exactly the same Y, and suddenly which one draws on top becomes a function of floating-point rounding and the draw order, and you get shimmering noise along the seam.

Switched to LessEqual. Coplanar surfaces resolve to “last one drawn wins,” which is deterministic, which is what I want. Zero visual change in the demo, one future bug prevented.
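The difference between the two compare functions can be modelled with a one-pixel software depth buffer. This is a toy, not wgpu: Compare, depth_test, and resolve_pixel are made-up names, and it covers only the case where two fragments land on exactly the same depth value.

```rust
/// Toy version of the two depth compare functions discussed above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Compare {
    Less,
    LessEqual,
}

/// Returns true if the incoming fragment passes against the stored depth.
pub fn depth_test(compare: Compare, incoming: f32, stored: f32) -> bool {
    match compare {
        Compare::Less => incoming < stored,
        Compare::LessEqual => incoming <= stored,
    }
}

/// Draws `(id, depth)` fragments in order into one pixel whose depth was
/// cleared to 1.0, and returns the id that ends up visible.
pub fn resolve_pixel(compare: Compare, fragments: &[(u32, f32)]) -> Option<u32> {
    let mut stored_depth = 1.0_f32;
    let mut visible = None;
    for &(id, depth) in fragments {
        if depth_test(compare, depth, stored_depth) {
            stored_depth = depth;
            visible = Some(id);
        }
    }
    visible
}
```

For two fragments at the same depth, [(1, 0.5), (2, 0.5)], Less keeps fragment 1 (the second fails the strict test) and LessEqual keeps fragment 2, the last one drawn.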

What the last post got wrong: per-frame heap allocations

Here’s where it gets hot-path-y.

The transform nodes (Translate3D, Rotate3D, Scale3D, Transform3D, LookAt) were each building a new glam::Mat4 per frame, boxing it, wrapping it in a PinValue::Matrix4, and handing it to ProcessContext::output. The box is a heap allocation. In a scene with 50 transforms, that’s 50 Box::new + 50 Box::drop per frame, at 60fps, forever.

The fix is a new trait method: IntoPinValue::write_into_slot(&self, slot: &mut PinValue). Default impl just calls self.into() and overwrites the slot, which matches the old behaviour. But the glam::Mat4 override does something smarter: it checks whether the slot is already a PinValue::Matrix4, and if so, writes the 64 bytes of matrix data into the existing box in place. No allocation, no deallocation, no refcount work. The slot lives in the output map, which is already pre-populated from the cleaning-house pass, so after frame one nothing in the transform hot path touches the allocator at all.
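In outline, the in-place write looks something like the following. This is a reduced sketch under several assumptions: PinValue has only two variants here, Mat4 is a plain [f32; 16] standing in for glam::Mat4, and the conversion method is called to_pin_value (a hypothetical name) because the sketch takes &self rather than relying on Into.

```rust
/// Stand-in for glam::Mat4: 16 floats, 64 bytes.
pub type Mat4 = [f32; 16];

/// Reduced, hypothetical version of the pin value enum.
pub enum PinValue {
    Number(f64),
    Matrix4(Box<Mat4>),
}

pub trait IntoPinValue {
    /// Builds a fresh PinValue (may allocate).
    fn to_pin_value(&self) -> PinValue;

    /// Default behaviour: build a new value and overwrite the slot.
    fn write_into_slot(&self, slot: &mut PinValue) {
        *slot = self.to_pin_value();
    }
}

impl IntoPinValue for f64 {
    fn to_pin_value(&self) -> PinValue {
        PinValue::Number(*self)
    }
}

impl IntoPinValue for Mat4 {
    fn to_pin_value(&self) -> PinValue {
        PinValue::Matrix4(Box::new(*self))
    }

    /// Override: reuse the existing box when the slot already holds a matrix.
    fn write_into_slot(&self, slot: &mut PinValue) {
        match slot {
            // Copy the 16 floats into the existing heap allocation.
            PinValue::Matrix4(existing) => **existing = *self,
            // First write, or the slot held another type: allocate once.
            _ => *slot = self.to_pin_value(),
        }
    }
}

/// Test helper: address of the boxed matrix, if the slot holds one.
pub fn matrix_addr(slot: &PinValue) -> Option<*const Mat4> {
    match slot {
        PinValue::Matrix4(boxed) => Some(&**boxed as *const Mat4),
        _ => None,
    }
}
```

Writing a second matrix into the same slot leaves matrix_addr unchanged, which is the observable sign that no new box was allocated.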

RenderScene got the same treatment for its cached SceneDesc. Instead of Arc::new(SceneDesc::new()) every frame, it holds a long-lived Arc<SceneDesc> and calls Arc::make_mut each frame. After texture_engine.execute() has consumed the previous frame’s op, the refcount is 1, make_mut returns the existing allocation unchanged, and the node clears the draw list in place and re-fills it. Same pattern as the Arc-wrapped layers from two months ago, applied to scenes.
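The Arc::make_mut pattern can be shown with std alone. SceneDesc below is a stripped-down stand-in with a single draw list, and RenderSceneState and build_frame are hypothetical names. The part that matters is that make_mut reuses the allocation only when no other strong reference is alive.

```rust
use std::sync::Arc;

/// Stripped-down stand-in for the real scene description.
#[derive(Clone, Default)]
pub struct SceneDesc {
    pub draws: Vec<u32>,
}

/// Hypothetical per-node state that owns the long-lived Arc.
pub struct RenderSceneState {
    scene: Arc<SceneDesc>,
}

impl RenderSceneState {
    pub fn new() -> Self {
        Self { scene: Arc::new(SceneDesc::default()) }
    }

    /// Rebuilds the draw list for this frame and returns a handle for the consumer.
    pub fn build_frame(&mut self, draws: &[u32]) -> Arc<SceneDesc> {
        // If this is the only reference, make_mut returns the existing allocation.
        // If a consumer still holds a clone, it copies the SceneDesc first.
        let scene = Arc::make_mut(&mut self.scene);
        scene.draws.clear(); // keeps the Vec's capacity
        scene.draws.extend_from_slice(draws);
        Arc::clone(&self.scene)
    }
}
```

If the consumer drops its handle before the next build_frame call, Arc::as_ptr reports the same address frame after frame. If it holds on to the handle, make_mut quietly falls back to a clone, so the old frame stays intact at the cost of one allocation.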

ProcessContext::texture_ops got preallocated with capacity 8, because profiling showed the first push was taking a growth path on every frame a primitive was created. And the mesh pipeline cache now hoists its shared bind-group layouts out of compile() so a pipeline miss doesn’t re-create descriptors that never change.

Net result: the transform-heavy hot path on a 50-node scene went from “a few dozen KB of allocator churn per frame” to zero.

A few more, quickly

A handful of smaller findings that don’t deserve their own section but are worth listing so future-me knows they happened:

  • Sphere and Grid clamp negative subdivisions in the signed integer domain before casting to u32. Before this fix, a negative value would wrap around to u32::MAX and the allocator would go looking for four billion vertices. This is the second time this year I’ve hit “wrapping cast turns a trivial bug into a crash” and I’m going to start grepping for as u32 preemptively.
  • MeshPool content hash is now length-prefixed. Before, a mesh with positions [a, b, c] and a mesh with positions [a, b] whose next array starts with [c] could hash identically if the concatenated bytes happened to line up. It never actually triggered in practice, but forward-compatibility means “assume the adversary is the next version of you.”
  • Pipeline cache compile failures stop retrying. If a shader fails to compile once, cache the failure and return the error immediately next time. Before this fix, a broken shader would re-spend 50ms on every frame trying to compile again. Lesson: “cache success” is half the contract; “cache failure” is the other half.
  • readback_texture_sync uses recv_timeout instead of recv, so a device.poll that silently times out no longer deadlocks the caller forever. Found this by running a stress test that had nothing to do with 3D and noticing that a canceled export was leaving a thread stuck.
  • Grid got “Y is up” pinned in its summary, because the exact axis convention for a ground plane is the kind of thing people forget between sessions and blame the node for.
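The first two items in that list are easy to demonstrate in isolation. The sketch below is illustrative only: the function names and the clamp bounds are made up, and std's DefaultHasher again stands in for blake3.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Clamps while the value is still signed, then casts. Casting first would turn
/// -1 into u32::MAX. Panics if `min > max` (that is how Ord::clamp behaves).
pub fn clamp_subdivisions(requested: i64, min: u32, max: u32) -> u32 {
    requested.clamp(min as i64, max as i64) as u32
}

fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new(); // stand-in for blake3
    hasher.write(bytes);
    hasher.finish()
}

/// Old scheme: arrays concatenated back to back, so array boundaries are lost.
pub fn naive_key(arrays: &[&[u8]]) -> u64 {
    let flat: Vec<u8> = arrays.iter().flat_map(|a| a.iter().copied()).collect();
    hash_bytes(&flat)
}

/// New scheme: each array is preceded by its length, so boundaries are part of the key.
pub fn length_prefixed_key(arrays: &[&[u8]]) -> u64 {
    let mut flat = Vec::new();
    for array in arrays {
        flat.extend_from_slice(&(array.len() as u64).to_le_bytes());
        flat.extend_from_slice(array);
    }
    hash_bytes(&flat)
}
```

With the naive key, the arrays [1, 2, 3] + [] and [1, 2] + [3] flatten to the same three bytes and collide by construction. With the length prefix the two byte streams differ, so their keys differ as well.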

None of these fix a visible bug in the current demo. All of them fix a visible bug in a demo that hasn’t been built yet.

What it feels like

The demo graph now has a Box → Rotate3D → Transform3D → RenderScene(UnlitMaterial, OrbitControls) chain. I can click and drag the preview and the box rotates. I can scroll and zoom. I can add a second box on a GridTransforms3D spread and see three of them at once without the silent-drop bug eating the first two.

[Figure: Orbit camera on a rotated box]

It still looks like 2013, because everything is unlit flat colour on a flat background. But it’s a 2013 that you can fly a camera around.

Next post is the one that makes it look like something. Materials, lights, a Phong highlight on a sphere, and a bug where the sphere was invisible because I’d wound every triangle backwards and nobody noticed for six hours.
