Stop Cloning Everything

I spent a session reading the hot paths in Lux and asking each one the same question: “do you really need to own that data, or are you just being greedy?”

The answer was usually greedy.

The Layer Problem

Every shape node in Lux produces a PinValue::Layer — a struct containing all the draw commands that describe the visual output. When that value crosses a wire, it gets cloned. When it goes through auto-spread, it gets cloned. When coerce() touches it, it gets cloned. Layers contain vectors of draw commands. Each clone means allocating a new vector, copying every draw command, and eventually dropping the old one.

For a patch with a Circle connected to a Translate connected to a Scale connected to a Group (a completely normal chain), that’s four deep copies of the same draw commands per frame. Sixty times per second. For every shape in the patch.

The geometry doesn’t change when it crosses a wire. The data is immutable. We’re copying it because the type system says Clone means “give me my own copy” and we never stopped to ask if we actually needed one.

The fix: wrap LayerData in Arc. Cloning an Arc is an atomic refcount increment (a single locked instruction on x86), so it is O(1) regardless of how many draw commands are in the layer.

// Before: the variant owns the data, so clone() is a full deep copy
Layer(LayerData),

// After: the variant holds a shared pointer, so clone() is an atomic refcount bump
Layer(Arc<LayerData>),

Ten files changed. Every shape transform node now calls Arc::unwrap_or_clone() when it actually needs to modify the layer: that takes the data without copying if the node holds the only reference, and clones it only if someone else still holds one. But the common case, passing a layer through a wire without modifying it, is now essentially free.
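To make the two paths concrete, here is a minimal sketch of a pass-through and a transform. DrawCommand, the fields of LayerData, and the pass_through and translate functions are hypothetical stand-ins, not Lux's actual definitions; only Arc and Arc::unwrap_or_clone (Rust 1.76+) are real.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for Lux's types; the real definitions are not shown in this post.
#[derive(Clone, Debug, PartialEq)]
struct DrawCommand {
    x: f32,
    y: f32,
    radius: f32,
}

#[derive(Clone, Debug, PartialEq)]
struct LayerData {
    commands: Vec<DrawCommand>,
}

#[derive(Clone, Debug)]
enum PinValue {
    Layer(Arc<LayerData>),
}

/// Pass-through: cloning the PinValue only bumps the Arc's reference count.
/// No draw command is copied.
fn pass_through(value: &PinValue) -> PinValue {
    value.clone()
}

/// Transform: this node has to modify the layer, so it needs owned data.
/// Arc::unwrap_or_clone moves the data out when this is the only reference
/// and deep-copies it only when another owner still exists.
fn translate(value: PinValue, dx: f32, dy: f32) -> PinValue {
    let PinValue::Layer(shared) = value;
    let mut layer = Arc::unwrap_or_clone(shared);
    for cmd in &mut layer.commands {
        cmd.x += dx;
        cmd.y += dy;
    }
    PinValue::Layer(Arc::new(layer))
}
```

The design choice is that the copy moves from "every wire" to "only the nodes that write", and even those skip it when nobody else is holding the layer.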

The HashMap Tax

Every frame, the evaluator creates a fresh HashMap<String, PinValue> for each node’s outputs. Fresh HashMap. Fresh String keys. For every node. Every frame. Sixty times a second.

If you have 100 nodes with an average of 2 outputs each, that’s 200 String allocations per frame just for the keys. Not the values — the keys. The strings that say “out” and “layer” and “result” — the same strings, every frame, forever.

The fix: pre-populate the output map once using pin names from NodeInfo, then reuse it. ProcessContext::output() now uses get_mut() to update existing entries in-place instead of inserting new ones. After the first frame, the output path does zero String allocations.

The auto-spread path was trickier — it needs to reset output values between spread slices. But instead of dropping and re-inserting keys, it now overwrites the values while keeping the keys alive.

Zero-alloc after warmup. The HashMap was always going to be there (we need the lookup), but the keys don’t need to be born and die sixty times a second.
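Roughly, the pattern looks like the sketch below. The PinValue variants, the shape of ProcessContext, and the reset method are assumptions made for illustration; the post only confirms that the map is pre-populated from pin names and that output() goes through get_mut().

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; Lux's real PinValue, NodeInfo and ProcessContext are not shown here.
#[derive(Clone, Debug, PartialEq)]
enum PinValue {
    Empty,
    Float(f64),
}

struct ProcessContext {
    outputs: HashMap<String, PinValue>,
}

impl ProcessContext {
    /// Built once per node. This is the only place the String keys are allocated.
    fn new(output_pin_names: &[&str]) -> Self {
        let outputs = output_pin_names
            .iter()
            .map(|name| (name.to_string(), PinValue::Empty))
            .collect();
        Self { outputs }
    }

    /// Per-frame write: looks the slot up by &str and overwrites it in place.
    /// Returns false for an unknown pin instead of inserting a new key.
    fn output(&mut self, name: &str, value: PinValue) -> bool {
        match self.outputs.get_mut(name) {
            Some(slot) => {
                *slot = value;
                true
            }
            None => false,
        }
    }

    /// Reset between spread slices: values are overwritten, keys stay alive.
    fn reset(&mut self) {
        for slot in self.outputs.values_mut() {
            *slot = PinValue::Empty;
        }
    }
}
```

Because get_mut() accepts a &str, the caller never has to build a String just to address an output pin.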

The Spread Tax

This one was hiding in plain sight. Every subset operation on a spread (Take, Skip, Tail, Filter, Distinct) was cloning the entire input spread before operating on it.

// Before: clone 10,000 elements, keep 5
let items = match ctx.input_raw("spread") {
    Some(PinValue::Spread(items)) => items.clone(),  // <- this
    _ => vec![],
};
let taken: Vec<PinValue> = items.into_iter().take(5).collect();

If you have a particle system with 10,000 particles and you Filter it down to 100, you were cloning 10,000 elements to keep 100. Every frame.

The fix: borrow the input as a &[PinValue] slice, then only clone the elements that survive.

// After: borrow 10,000, clone 5
let items: &[PinValue] = match ctx.input_raw("spread") {
    Some(PinValue::Spread(items)) => items.as_slice(),
    _ => &[],  // nothing connected or not a spread: borrow an empty slice
};
let taken: Vec<PinValue> = items.iter().take(5).cloned().collect();

Five nodes got this treatment: Take, Skip, Tail, Filter, and Distinct. Nodes that need to mutate the data (Sort, Reverse, Shuffle) still clone — they have to. But subset operations that just pick elements from a larger collection? Borrow and clone only what you keep.

For the 10K→100 case, that’s a 99% reduction in clones per frame.
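The same borrow-then-clone shape applies to Filter. A self-contained sketch, where the PinValue variants, as_spread, filter_above, and take are all hypothetical names rather than Lux's node code:

```rust
// Hypothetical stand-in for Lux's PinValue; only what the sketch needs.
#[derive(Clone, Debug, PartialEq)]
enum PinValue {
    Float(f64),
    Spread(Vec<PinValue>),
}

/// Borrow the spread as a slice. Anything else is treated as an empty spread.
fn as_spread(input: Option<&PinValue>) -> &[PinValue] {
    match input {
        Some(PinValue::Spread(items)) => items.as_slice(),
        _ => &[],
    }
}

/// Filter: inspect every element by reference, clone only the survivors.
fn filter_above(input: Option<&PinValue>, threshold: f64) -> Vec<PinValue> {
    as_spread(input)
        .iter()
        .filter(|item| matches!(**item, PinValue::Float(v) if v > threshold))
        .cloned()
        .collect()
}

/// Take: stop iterating after `count` elements, so at most `count` clones happen.
fn take(input: Option<&PinValue>, count: usize) -> Vec<PinValue> {
    as_spread(input).iter().take(count).cloned().collect()
}
```

The predicate runs on references, so the cost of a rejected element is one comparison and zero allocations.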

The Wire Drag Surprise

When you drag a wire from an output pin, compatible input pins on other nodes pulse with a subtle animation to show where you can connect. Pretty. Helpful. And absolutely devastating to performance.

The pulse state was stored in the node render cache. Which meant that during a wire drag, every node with a compatible pin had its cache invalidated. Every frame. For the entire duration of the drag.

Dragging a wire across a patch with 50 nodes? If 30 or more of them have a compatible pin, that’s 30+ node render cache rebuilds per frame. The nodes themselves haven’t changed: their geometry, colours, and pin layout are identical. But the cache system didn’t know that, because “is a pin pulsing?” was part of the cache key.

The fix: render the pulse as a post-cache overlay. The node render cache handles the node’s actual appearance. The pulse circles get drawn on top, separately, only during wire drags. Cache stays valid. Wire dragging stopped being the most expensive operation in the editor.
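In sketch form, the idea is that the cache key describes only what the node looks like at rest, and the pulse is appended after the cache lookup. Everything here (NodeAppearance, NodeRenderCache, render_node, and draw commands as plain strings) is a hypothetical simplification, not Lux's renderer:

```rust
use std::collections::HashMap;

// Hypothetical types; Lux's real render cache is not shown in this post.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct NodeAppearance {
    node_id: u32,
    width: u32,
    height: u32,
    pin_count: u32,
    // Deliberately no "is a pin pulsing?" field in the key.
}

#[derive(Default)]
struct NodeRenderCache {
    entries: HashMap<NodeAppearance, Vec<String>>,
    rebuilds: usize,
}

impl NodeRenderCache {
    /// Returns the cached draw commands, building them only on a miss.
    fn get_or_build(&mut self, key: &NodeAppearance) -> &[String] {
        if !self.entries.contains_key(key) {
            self.rebuilds += 1;
            let commands = vec![format!("rect {}x{}", key.width, key.height)];
            self.entries.insert(key.clone(), commands);
        }
        &self.entries[key]
    }
}

/// Cached node body first, then the pulse overlay on top, only during a wire drag.
fn render_node(
    cache: &mut NodeRenderCache,
    key: &NodeAppearance,
    pulse_phase: Option<f32>,
) -> Vec<String> {
    let mut frame = cache.get_or_build(key).to_vec();
    if let Some(phase) = pulse_phase {
        frame.push(format!("pulse circle phase={phase:.2}"));
    }
    frame
}
```

The pulse phase changes every frame, but because it never enters the key, the cached entry is built once and reused for the whole drag.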

The Lesson

All four of these problems share the same root cause: repeating work that doesn’t need to be repeated, usually by copying data that doesn’t need to be copied. Arc wraps shared immutable data. Pre-populated maps reuse keys. Slice borrows avoid cloning inputs. Overlays avoid invalidating caches.

None of this was wrong in the functional sense — the output was correct. But at 60fps with hundreds of nodes, “correct but wasteful” and “too slow” converge pretty quickly.

The best part? All the existing tests still pass unchanged. Same inputs, same outputs. Just fewer copies getting born and dying in between.
