Devlog

The Orphan a Deletion Leaves

Deleting the dead deferred renderer left a ghost behind: a GPU classifier whose every caller had just been deleted. This is the short one about chasing a deletion all the way to zero, and about a cost you pay without seeing it. Every line of 'keep it, might be useful later' is load time, memory, and a place for bugs to hide. The engine you run should be exactly the engine that is alive.

The Shading Path That Never Shaded

Lux had two ways to light a surface, and only one of them was real. The other shaded everything magenta and never drew a frame. Two lighting paths means two ways for your materials to look wrong, so this post deletes the dead one. Now there is exactly one place where light meets a surface, which means every tonemap tweak, every roughness fix, every colour improvement lands in every scene at once. Your gold looks like gold everywhere.

The 800-Line Ceiling

You will never see this commit, and that is the point. The god-file carve is finished, and a CI gate now stops any engine file from crossing 800 lines again. Why it matters to you: a codebase that does not rot is the difference between a tool that gets better every week and one that slows to a crawl, and small, focused files are what keep the per-frame hot path allocation-free when your patch gets big.
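
A minimal sketch of what a line-ceiling gate could check, assuming the gate simply counts lines per file. The names `MAX_LINES` and `files_over_limit` are hypothetical and not taken from the Lux repository; a real gate would walk the source tree and fail the build when the returned list is non-empty.

```rust
/// Ceiling from the post: no engine file may cross 800 lines.
/// (The constant name is illustrative.)
pub const MAX_LINES: usize = 800;

/// Returns the names of files whose line count exceeds `max`.
/// `files` is a list of (name, contents) pairs so the check itself
/// stays independent of the filesystem.
pub fn files_over_limit<'a>(files: &[(&'a str, &str)], max: usize) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(_, text)| text.lines().count() > max)
        .map(|(name, _)| *name)
        .collect()
}
```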

Carving the Core

A rewrite the size of the bindless mesh path leaves the codebase a little god-shaped, and a god-shaped codebase is how a tool that demos beautifully starts hitching in your hands. This is the cleanup that keeps that from happening: the Render3D framegraph collapsed into one graph (so your shadows actually render and the async-compute overlap actually fires), thirty-plus per-frame allocations routed through the bind-group cache, and a 3,466-line bridge file carved into eight, with a workspace-wide sweep behind it.

Deleting the Legacy Renderer

Lux had two mesh renderers: a fast bindless one that drew static geometry, and a 1,466-line legacy function that did everything else (shadows, skinned characters, instances). Two renderers is a slow fork in every frame and two ways for your scene to drift. This post teaches the fast path everything the slow one knew, deletes the slow one, and then solves the harder problem the deletion forces: with no old renderer to compare against, correctness stops meaning 'matches last quarter' and starts meaning 'matches physics', which is the ruler that actually protects what you see.

The Bindless Arm Goes Live

For most of Lux's life the renderer announced every object to the GPU one at a time, which is fine until a scene has a few thousand of them and the CPU falls over. The bindless arm draws the entire scene in a single GPU-driven indirect call and lets the GPU decide what is visible, which is the road to dense scenes at framerate. This post loads the real 778-line shader, stands up the orchestrator that feeds it the whole scene, caches its seven bind groups, and flips the live arm on capable hardware, keeping the old renderer beside it on purpose so the new one can be proven correct before anything is deleted.

The Skinning Fraud, Closed

Built, Not Adopted admitted the GPU skinning compute shader was complete, tested, and dispatching, and that the live renderer ignored all of it and ran a CPU loop instead. This post closes that. DrawItem becomes an enum, the renderer consumes the GPU-deformed buffer, and skin_cpu (function, Cargo feature, and all) is deleted.

The Train Rides the Rails

The framegraph post shipped queue-affinity routing and then admitted the live orchestrator still ran everything on one encoder. Rails, not the train. This post flips it: a graphics and compute encoder pair per frame, run_with_dispatch on the live path, and the two 1x1 sentinel textures that were impersonating GPU buffers finally deleted.

The Debt Sweep

Built, Not Adopted was the wiring half of the cleanup. This is the deletion half. A #[deprecated] Material shim, an 845-line version-migration module, a legacy Cargo feature, and the last hand-rolled fallback in the node registry, all gone. Plus the #[lux_node] macro migration finishing across 233 nodes in 42 plugin crates.

The Meshlet Path, So Far

The bindless mesh shader is 778 carefully-authored lines of WGSL that the pipeline doesn't load yet, because the pipeline still loads a 3-line placeholder that draws degenerate triangles. Effective. Pass 1 of the meshlet rewrite landed (cull-ratio gates: 85 percent on a stadium, 0 false rejections on a fully-visible scene). Passes 2 through 6 are queued. This post is the status dispatch.

Shadow Orchestrator + 1024-Slot Lights

The live shadow path was a hardcoded ±10 ortho, a 0.005 magic-number bias, an atlas the size of someone's first attempt, and an 8-slot light uniform that silently dropped lights past slot 7. Worse: every area light and IES light was packing as LIGHT_TYPE_NONE = 0 and getting dropped by the shader's guard before it ever reached the loop. Now there's an orchestrator with cascade-snap stability, a 4-cascade PSSM, a 1024-slot LightStore, and a cluster-bin compute pass that puts the lights in tiles instead of a fixed-size uniform.

Every Shade Path Imports lux::pbr

I wrote a careful PBR module months ago and the live mesh shader was bypassing it the whole time. A chrome sphere at roughness 0.5 was losing 12 to 18 percent of its incident energy. Plus: the BRDF LUT going from 2 channels to 4, the legacy ToneMap plugin getting deleted, every plugin shader flipping from Rec.601 to Rec.709, and ColorGrade3DLUT going from `return src` to actually grading.
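
To make the Rec.601 to Rec.709 flip concrete, here is a small sketch that assumes the change concerns the luma weights used when a shader reduces RGB to a single brightness value. The function names are illustrative; the coefficients are the standard published ones for each recommendation.

```rust
/// Luma using the Rec.601 weights (the older standard-definition set).
pub fn luma_rec601(rgb: [f32; 3]) -> f32 {
    0.299 * rgb[0] + 0.587 * rgb[1] + 0.114 * rgb[2]
}

/// Luma using the Rec.709 weights (the HD / sRGB-primaries set).
pub fn luma_rec709(rgb: [f32; 3]) -> f32 {
    0.2126 * rgb[0] + 0.7152 * rgb[1] + 0.0722 * rgb[2]
}
```

The two agree on neutral greys and differ on saturated colours: pure green weighs 0.587 under Rec.601 and 0.7152 under Rec.709.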

ResourceKind Becomes a Closed Set

The framegraph shipped a year ago with one resource variant and a comment that said the others were coming. They were not coming. They were waiting. This post is them arriving (Texture2D, Depth, Cube, Texture3D, TextureArray, Msaa, Buffer), plus the queue-affinity tags that make compute dispatch routable, the cross-queue barrier that goes on the destination encoder, and the 2,688-line god-file that finally got carved up.

Built, Not Adopted

A pile of well-tested library code that the app wasn't actually calling. BindGroupCache was used by 1 of 34 sites. The MasterClock was constructed zero times. The welcome modal had eleven cards and one of them worked. Twenty-two items of cleanup that close the gap between 'shipped' and 'wired'.

ReSTIR + Denoise

Real-time global illumination finally lands. ReSTIR direct + indirect for sampling a scene lit by many lights. Three denoisers (SVGF, ReBLUR, Hybrid) for cleaning up the noisy result. A new texture-screen-space plugin crate that wires all of it together as nodes.

PerfGuard, Hitch Capture, Crash Sandbox

When something goes wrong mid-performance, the show cannot stop. An eight-step fallback ladder that degrades render quality to stay on budget. A 32-slot hitch-capture ring for post-mortem analysis. A crash sandbox that catches panicking nodes, dead devices, and malformed shaders without taking the app down.
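
A 32-slot capture ring can be sketched as a fixed array that overwrites its oldest entry. This is an illustration only: the `Hitch` fields and the `HitchRing` API are assumptions, and only the slot count comes from the post.

```rust
/// Slot count from the post.
pub const SLOTS: usize = 32;

/// Hypothetical record of one slow frame.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Hitch {
    pub frame: u64,
    pub frame_ms: f32,
}

/// Fixed-size ring: recording never allocates, and the oldest entry is
/// overwritten once all slots are in use.
pub struct HitchRing {
    slots: [Option<Hitch>; SLOTS],
    next: usize,
}

impl HitchRing {
    pub fn new() -> Self {
        Self { slots: [None; SLOTS], next: 0 }
    }

    pub fn record(&mut self, hitch: Hitch) {
        self.slots[self.next] = Some(hitch);
        self.next = (self.next + 1) % SLOTS;
    }

    /// Copies the captured hitches out, oldest first, for post-mortem use.
    pub fn snapshot(&self) -> Vec<Hitch> {
        (0..SLOTS)
            .filter_map(|i| self.slots[(self.next + i) % SLOTS])
            .collect()
    }
}
```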

Multi-Window, HDR Toggle, Encode Queue

Three new systems for the live-performance side of Lux. A window fleet that manages up to 16 output surfaces with a 50% VRAM cap. A live HDR toggle that survives monitor hot-plug. A dedicated encoder thread that records while you perform without stalling the render.

Kornia-Class Image Analysis

Sixteen new analysis nodes plus two migrated ones: Sobel, Laplacian, Canny, morphological ops, histograms, SSIM/PSNR/MSE/MAE, FFT and IFFT, Lucas-Kanade optical flow. The computer-vision toolbox Lux didn't have, now as GPU-accelerated node-graph citizens.

Projection Mapping

Three new nodes that live between your output and the wall you're projecting on. Corner-pin warp with a DLT homography solver. A mesh warp driven by an editable bezier grid. Edge blending for projector stacks. Plus the canvas gizmos that make all three clickable.

The Last Few Allocations

Three final perf passes on the graph engine. Wire transfer that skips the gen-bump when the value didn't change. A per-target connection index so wire lookups are O(indegree) instead of O(edges). A processed-set that short-circuits predecessors already visited this frame. Together: 886 µs → under 10 µs on the unchanged-subtree bench.
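
The per-target connection index is the easiest of the three to sketch. The version below assumes wires live in one flat list and that a side table maps each target node to the positions of its incoming wires; `NodeId`, `Wire`, and `Wires` are hypothetical names, not the engine's types.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeId(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Wire {
    pub from: NodeId,
    pub to: NodeId,
}

#[derive(Default)]
pub struct Wires {
    edges: Vec<Wire>,
    /// target node -> indices into `edges` of the wires that end there
    by_target: HashMap<NodeId, Vec<usize>>,
}

impl Wires {
    pub fn connect(&mut self, from: NodeId, to: NodeId) {
        let idx = self.edges.len();
        self.edges.push(Wire { from, to });
        self.by_target.entry(to).or_default().push(idx);
    }

    /// Visits only the wires that end at `to`: O(indegree), not O(edges).
    pub fn incoming(&self, to: NodeId) -> impl Iterator<Item = &Wire> + '_ {
        self.by_target
            .get(&to)
            .into_iter()
            .flatten()
            .map(move |&i| &self.edges[i])
    }
}
```

This sketch never removes wires; a real index would also have to stay in sync on disconnect.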

Bit-Identical Goldens + iai-callgrind

The rewrite is allowed to be fast. It is not allowed to change a single pixel. A stricter-than-family golden test suite locks down every pre-rewrite test patch to SSIM ≥ 0.9999 and max 1-LSB diff. Plus iai-callgrind companion benches to put CPU-instruction-count regression gates in CI.
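
The 1-LSB half of that gate is simple enough to sketch; SSIM is left out here. This assumes both images are 8-bit buffers of equal length, and the function names are illustrative rather than the suite's real API.

```rust
/// Largest per-channel difference between two 8-bit images,
/// or `None` if the buffers differ in length.
pub fn max_channel_diff(a: &[u8], b: &[u8]) -> Option<u8> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| x.abs_diff(*y)).max().unwrap_or(0))
}

/// True when no channel differs by more than one least-significant bit.
pub fn within_one_lsb(a: &[u8], b: &[u8]) -> bool {
    matches!(max_channel_diff(a, b), Some(d) if d <= 1)
}
```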

Spread Becomes Arc<[T]>

A 1000-element spread used to deep-clone every time it crossed a wire. 50 hops through a graph meant 50,000 clones per frame. Replacing Spread(Vec<PinValue>) with Spread(Arc<[PinValue]>) collapses the wire-hop cost to one atomic refcount bump — a 420× speedup on the benchmark that matters most.
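
A minimal sketch of why the change is cheap, using a cut-down `PinValue` with only two variants (the real enum presumably has more). Cloning an `Arc<[T]>` copies a pointer and bumps a reference count; the elements themselves are never touched.

```rust
use std::sync::Arc;

/// Reduced stand-in for the engine's value type.
#[derive(Clone, Debug, PartialEq)]
pub enum PinValue {
    Float(f32),
    Spread(Arc<[PinValue]>),
}

/// Builds a spread of `n` floats. The Vec is converted into a shared
/// slice once; every later clone reuses that allocation.
pub fn make_spread(n: usize) -> PinValue {
    let items: Vec<PinValue> = (0..n).map(|i| PinValue::Float(i as f32)).collect();
    PinValue::Spread(Arc::from(items))
}
```

Cloning the result of `make_spread(1000)` yields a second handle to the same 1000 elements, which is what a wire hop needs when the downstream node only reads.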

Parallel Evaluation with Rayon

The evaluator was running on one core. The graph is a DAG, which means huge chunks of it can run in parallel. With a level partition and a per-level rayon par_iter, the 1000-node fan-out benchmark drops to half its previous cost.
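
The level partition can be sketched with the standard library alone. The function below assumes nodes are numbered 0..node_count and edges are (from, to) pairs; it is an illustration of the idea, not the evaluator's code. Nodes within one level have no dependencies on each other, so each inner Vec is the kind of batch that could be handed to rayon's `par_iter`.

```rust
/// Groups the nodes of a DAG into levels so that every node's
/// predecessors sit in an earlier level. Nodes that are part of a
/// cycle never reach indegree zero and are left out.
pub fn level_partition(node_count: usize, edges: &[(usize, usize)]) -> Vec<Vec<usize>> {
    let mut indegree = vec![0usize; node_count];
    let mut successors = vec![Vec::new(); node_count];
    for &(from, to) in edges {
        successors[from].push(to);
        indegree[to] += 1;
    }

    let mut levels = Vec::new();
    let mut current: Vec<usize> = (0..node_count).filter(|&n| indegree[n] == 0).collect();
    while !current.is_empty() {
        let mut next = Vec::new();
        for &n in &current {
            for &s in &successors[n] {
                indegree[s] -= 1;
                if indegree[s] == 0 {
                    next.push(s);
                }
            }
        }
        levels.push(current);
        current = next;
    }
    levels
}
```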

Pearce-Kelly Dynamic Topological Sort

Every wire edit used to re-sort the entire graph from scratch. 180 µs per connect at 1000 nodes. Now the common case is O(1) and the worst case is O(|δ|) — the size of the affected region, not the size of the graph.
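
For readers who have not met the algorithm, here is a compact sketch of Pearce-Kelly edge insertion under simplifying assumptions: a fixed node count, no edge removal, and hypothetical names (`DynTopo`, `add_edge`, `position`). If the new edge already agrees with the stored order nothing is touched; otherwise only the nodes between the two endpoints in the order are searched and reshuffled.

```rust
use std::collections::HashSet;

pub struct DynTopo {
    succ: Vec<Vec<usize>>,
    pred: Vec<Vec<usize>>,
    ord: Vec<usize>, // ord[node] = position in the topological order
}

impl DynTopo {
    pub fn new(n: usize) -> Self {
        Self { succ: vec![Vec::new(); n], pred: vec![Vec::new(); n], ord: (0..n).collect() }
    }

    pub fn position(&self, node: usize) -> usize {
        self.ord[node]
    }

    /// Adds `from -> to`. Returns false and changes nothing if the edge
    /// would close a cycle.
    pub fn add_edge(&mut self, from: usize, to: usize) -> bool {
        if from == to {
            return false;
        }
        let (lb, ub) = (self.ord[to], self.ord[from]);
        if lb < ub {
            // Order is violated: search only the region between lb and ub.
            let mut fwd = Vec::new();
            if !self.reach(to, &mut fwd, true, ub) {
                return false; // reached `from` going forward: cycle
            }
            let mut bwd = Vec::new();
            self.reach(from, &mut bwd, false, lb);
            // Nodes that must come first (bwd) take the lowest freed slots.
            fwd.sort_by_key(|&n| self.ord[n]);
            bwd.sort_by_key(|&n| self.ord[n]);
            let nodes: Vec<usize> = bwd.into_iter().chain(fwd).collect();
            let mut slots: Vec<usize> = nodes.iter().map(|&n| self.ord[n]).collect();
            slots.sort_unstable();
            for (n, s) in nodes.into_iter().zip(slots) {
                self.ord[n] = s;
            }
        }
        self.succ[from].push(to);
        self.pred[to].push(from);
        true
    }

    /// Bounded DFS. Forward: follows successors with ord < bound and
    /// reports a cycle (false) on hitting ord == bound. Backward:
    /// follows predecessors with ord > bound.
    fn reach(&self, start: usize, out: &mut Vec<usize>, forward: bool, bound: usize) -> bool {
        let mut seen = HashSet::new();
        seen.insert(start);
        let mut stack = vec![start];
        while let Some(n) = stack.pop() {
            out.push(n);
            let next = if forward { &self.succ[n] } else { &self.pred[n] };
            for &w in next {
                if forward && self.ord[w] == bound {
                    return false;
                }
                let inside = if forward { self.ord[w] < bound } else { self.ord[w] > bound };
                if inside && seen.insert(w) {
                    stack.push(w);
                }
            }
        }
        true
    }
}
```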

Breaking Up LuxApp

LuxApp was 2,200 lines of everything talking to everything. Six refactor commits later it's seven modules with single responsibilities, a unified EditorAction queue, and an egui_dock scaffold where floating panels used to be.

PinId: The Death of HashMap<String, _>

Every pin lookup used to rehash its name. After this post, every pin lookup is an array index. A 16-bit PinId, a proc-macro that generates constants per node, a ProcessContext overload, and 0 string allocations per frame at 1000 nodes.
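
The shape of the idea, as a sketch: a pin is addressed by a small integer that indexes straight into a per-node array. The constants below stand in for what a proc-macro might generate; their names, and the `Pins` container, are hypothetical.

```rust
/// 16-bit pin handle; its value is the pin's index within its node.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PinId(pub u16);

// Illustrative per-node constants of the kind a macro could emit.
pub const CIRCLE_RADIUS: PinId = PinId(0);
pub const CIRCLE_X: PinId = PinId(1);

/// Pin storage for one node: a flat array, no hashing, no strings.
pub struct Pins {
    values: Vec<f32>,
}

impl Pins {
    pub fn new(count: u16) -> Self {
        Self { values: vec![0.0; count as usize] }
    }

    pub fn get(&self, id: PinId) -> f32 {
        self.values[id.0 as usize]
    }

    pub fn set(&mut self, id: PinId, value: f32) {
        self.values[id.0 as usize] = value;
    }
}
```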

SlotMap + Generation Counters

The first landing of the graph engine rewrite. Seven parallel HashMaps collapse into one packed NodeSlot per node, keyed by a DenseSlotMap. Dirty checking moves from deep-compare to per-pin generation counters.
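
Generation-counter dirty checking can be sketched in a few lines. The types here (`PinSlot`, `Downstream`) are hypothetical; the point is that a consumer compares one integer instead of deep-comparing the value, and that a write which does not change the value does not bump the counter.

```rust
/// One pin's value plus a counter that changes whenever the value does.
pub struct PinSlot<T> {
    value: T,
    generation: u64,
}

impl<T: PartialEq> PinSlot<T> {
    pub fn new(value: T) -> Self {
        Self { value, generation: 0 }
    }

    /// Stores `value`; the generation is bumped only on a real change.
    pub fn write(&mut self, value: T) {
        if self.value != value {
            self.value = value;
            self.generation += 1;
        }
    }

    pub fn value(&self) -> &T {
        &self.value
    }
}

/// A consumer remembers the last generation it evaluated against.
#[derive(Default)]
pub struct Downstream {
    last_seen: Option<u64>,
}

impl Downstream {
    /// True on first use and whenever the pin changed since the last call.
    pub fn needs_update<T>(&mut self, pin: &PinSlot<T>) -> bool {
        let dirty = self.last_seen != Some(pin.generation);
        self.last_seen = Some(pin.generation);
        dirty
    }
}
```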

The Graph Engine Rewrite: Why

The uncomfortable part of shipping a fast renderer is discovering that none of your benchmarks measured the actual frame. A baseline audit, a list of structural problems, and the set of P-gates the rewrite has to clear.

Tooltips, Error Dots, and the F8 Profiler

Pin tooltips that show defaults and ranges. A red error dot on any node that panicked during evaluation. An F8 profiler HUD that makes frame time a number instead of a vibe. Plus search that indexes summaries and tags.

Wires That Help You

Magnet-snap to the nearest compatible pin. Drop a wire into empty canvas and get a filtered, pin-type-aware search. Backward wires that route around their source node instead of through it. Plus one central place where every editor animation lives.

The First 60 Seconds

A welcome splash, a Mouse-to-Circle sample patch, a menu bar, a help cheatsheet, an empty-canvas CTA, and a preferences file. The moves that close the gap between 'installs' and 'starts using it'.

Async Readback and the StagingBelt

Two render-layer perf fixes that each closed a sync stall on the hot path. The GPU readback loop stops blocking on device.poll; uniform writes stop going through individual queue.write_buffer calls. Net: 3 ms of frame time I didn't know I was leaking.

More 3D, More Filters

The cleanup post. Three new 3D primitives (Cylinder, Cone, Torus), a Twist3D mesh deformer, an SRT gizmo, plus five small texture nodes that shipped alongside the big render rewrite.

Image-Based Lighting

A sphere lit only by an environment map. Four compute passes — environment cubemap, irradiance, prefiltered mip chain, BRDF LUT — plus the split-sum sampling in the PBR shader that makes it all converge.

Cook-Torrance PBR + Normal Maps

A real physically-based shading model lands. Metallic/roughness workflow, GGX microfacets, Schlick Fresnel, normal maps with per-vertex tangents, and one SceneObject node that bundles geometry and material.
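
Two of the named terms written out on the CPU, as a sketch of the textbook formulas rather than the shader's actual code. The `alpha = roughness * roughness` remap is a common convention and is an assumption here.

```rust
use std::f32::consts::PI;

/// Schlick's approximation to Fresnel reflectance.
/// `f0` is reflectance at normal incidence, `cos_theta` is dot(v, h).
pub fn fresnel_schlick(cos_theta: f32, f0: f32) -> f32 {
    f0 + (1.0 - f0) * (1.0 - cos_theta).clamp(0.0, 1.0).powi(5)
}

/// GGX (Trowbridge-Reitz) normal distribution term.
/// Assumes the perceptual remap alpha = roughness^2.
pub fn ggx_ndf(n_dot_h: f32, roughness: f32) -> f32 {
    let alpha = roughness * roughness;
    let a2 = alpha * alpha;
    let d = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0;
    a2 / (PI * d * d)
}
```

At normal incidence Schlick returns `f0` unchanged, and at grazing angles it rises to 1, which is the behaviour that makes edges of rough dielectrics brighten.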

Post-FX: Bloom, TAA, CAS, and Film Grain

Four post-process passes, each opt-in per scene, each written against the new HDR buffer and the new framegraph. Jimenez bloom, depth-only reprojection TAA, AMD FidelityFX CAS, and a compute-hash grain pass.

The Framegraph

A new crate that owns the compile-time phase of the Lux render pipeline. Passes declare what they read and write, the compiler aliases transient textures, and bloom goes from three independent dispatches to one dependency-ordered chain.

Everything HDR

A new default texture format, a dedicated tonemap pass at the edge of the scene, and a pipeline that finally stops crushing every value above 1.0.

Particles, Finally

600 particles under gravity, an Emitter2D monolith with fourteen input pins, and a new --warmup CLI flag because headless snapshots of stateful nodes were broken in a way I hadn't noticed.

SDFs Without Writing Shaders

A graph of SDF nodes stitched into a single WGSL fragment shader at evaluation time. Drop a circle, drop a box, drop a smooth union, see the result raymarched to the screen. No text editor involved.
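
A toy version of the stitching step, to show the shape of the idea: walk a tree of SDF nodes and emit one WGSL expression. Everything named here is an assumption, including the `Sdf` enum and the WGSL helper names `sd_circle`, `sd_box`, and `op_smooth_union`, which the generated shader would have to define. The real system works on a graph and emits a full fragment shader.

```rust
/// Hypothetical SDF node tree.
pub enum Sdf {
    Circle { radius: f32 },
    Rect { half: [f32; 2] },
    SmoothUnion { a: Box<Sdf>, b: Box<Sdf>, k: f32 },
}

impl Sdf {
    /// Emits a WGSL expression that evaluates this tree at point `p`.
    /// Floats use Debug formatting so whole numbers keep their ".0".
    pub fn to_wgsl(&self) -> String {
        match self {
            Sdf::Circle { radius } => format!("sd_circle(p, {:?})", radius),
            Sdf::Rect { half } => {
                format!("sd_box(p, vec2<f32>({:?}, {:?}))", half[0], half[1])
            }
            Sdf::SmoothUnion { a, b, k } => {
                format!("op_smooth_union({}, {}, {:?})", a.to_wgsl(), b.to_wgsl(), k)
            }
        }
    }
}
```

A circle smooth-unioned with a box becomes a single nested call, which a generator could then wrap in a fragment entry point.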

Prism

The idea at the heart of Lux, and the reason I started building it in the first place. Shaders are already graphs of smaller functions. So why are we still writing them as text?

Compute Shaders Get Real Buffers

A new compute pipeline cache that can read and write N storage buffers at arbitrary bindings. No visible output, but every GPU-side node from here on depends on it.

10,000 Cubes in One Draw Call

GPU instancing, a real BufferPool behind it, content-hashed transform uploads, and the stale-handle bug that took the whole thing down until I figured out which node had to own the keep-alive.
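
Content-hashed uploads reduce to "hash the bytes, skip the copy if the hash matches last time". The sketch below assumes a single buffer and uses the standard library's `DefaultHasher`; the `TransformUploader` type is hypothetical, and the `uploads` counter stands in for the actual GPU write.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub struct TransformUploader {
    last_hash: Option<u64>,
    /// Counts how many times data would really have been sent to the GPU.
    pub uploads: usize,
}

impl TransformUploader {
    pub fn new() -> Self {
        Self { last_hash: None, uploads: 0 }
    }

    /// Returns true if `bytes` differed from the previous submission and
    /// was "uploaded"; false if the upload was skipped.
    pub fn submit(&mut self, bytes: &[u8]) -> bool {
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        let hash = hasher.finish();
        if self.last_hash == Some(hash) {
            return false;
        }
        self.last_hash = Some(hash);
        self.uploads += 1; // a real implementation would write the buffer here
        true
    }
}
```

A 64-bit hash can collide in principle, so a production version might also compare lengths or keep the previous bytes for verification.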

Boxes, Spheres, and Something to Point at Them

The scene you can actually build. Box, Sphere, Plane, Grid, a transform stack, a perspective camera, and orbit controls. Then the cleanup pass that rolled back half the decisions from the last post.

Lux Goes 3D

Phase 9 starts here. A new render_3d pass, a mesh pool with content-addressed upload, three new pin types, and a plugin that does exactly one thing: draw a triangle.

The Trust Pass: Autosave, Async, and a Progress Bar

Lux used to freeze when you loaded an image, freeze when you exported a sequence, and never tell you whether you had unsaved changes. This session fixed all of that, plus moved SolidColor and NoiseTexture fully onto the GPU.

Zero GPU Allocations Per Frame

The texture engine was allocating GPU resources every frame. Now it isn't. Plus a stack-allocated path for wire curves, a single command encoder per frame, and 16 tests that were overdue.

Merge, Multi-Wire, and the Death of Group Chains

One node that takes any number of layers. A graph engine that accepts multiple wires into a single Spread input. Drag-to-reorder layer ordering in the inspector. The ugly Group-chain pattern is finally gone.