Kornia-Class Image Analysis

Creative coding environments are usually great at making images and not very good at analysing them. You can generate fractals for days; you cannot ask “what are the edges of this image” and expect a useful answer. For most patches that’s fine — the output is the point, and you don’t need to analyse what you just generated.

For real work it’s limiting. Interactive installations often want to respond to video input: detect edges, find contours, track motion, measure similarity to a reference. Computer-vision tools (OpenCV, Kornia) have entire libraries for this; visual programming tools mostly don’t. The gap is annoying.

This post closes the gap. Sixteen new nodes plus two migrated legacy ones, all in a new lux-texture-analyze crate. Every operator is a GPU shader that consumes TextureHandle and produces TextureHandle — native citizens in the texture pipeline.

The categories

Seven families, each answering a different kind of question about an image.

Inspect (2 nodes, both migrated)

TextureSize returns an image’s dimensions. Sample reads the pixel value at a specific UV coordinate. Both were already in the codebase from the texture analysis post; this session migrated them onto the new #[lux_node] macro and consolidated their documentation alongside the new nodes.

Edge detection (3 nodes)

Sobel computes per-pixel gradients with the 3×3 Sobel kernel and outputs gradient magnitude. The classic edge detector, unchanged since 1968, still the thing you reach for first.
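
To make the kernel concrete, here is a minimal CPU sketch of the same computation in Rust. It is not the shader from the crate: the `Gray` struct, the clamp-to-edge border policy, and the function names are all assumptions made for illustration.

```rust
/// Row-major single-channel image; a CPU stand-in for a GPU texture (hypothetical type).
pub struct Gray {
    pub w: usize,
    pub h: usize,
    pub px: Vec<f32>,
}

impl Gray {
    /// Clamp-to-edge fetch. The border policy is an assumption, not taken from the post.
    fn at(&self, x: isize, y: isize) -> f32 {
        let xc = x.clamp(0, self.w as isize - 1) as usize;
        let yc = y.clamp(0, self.h as isize - 1) as usize;
        self.px[yc * self.w + xc]
    }
}

/// Gradient magnitude from the two 3x3 Sobel kernels.
pub fn sobel_magnitude(img: &Gray) -> Gray {
    let mut out = vec![0.0f32; img.w * img.h];
    for y in 0..img.h as isize {
        for x in 0..img.w as isize {
            // Horizontal gradient: right column minus left column, weights 1-2-1.
            let gx = (img.at(x + 1, y - 1) + 2.0 * img.at(x + 1, y) + img.at(x + 1, y + 1))
                - (img.at(x - 1, y - 1) + 2.0 * img.at(x - 1, y) + img.at(x - 1, y + 1));
            // Vertical gradient: bottom row minus top row, weights 1-2-1.
            let gy = (img.at(x - 1, y + 1) + 2.0 * img.at(x, y + 1) + img.at(x + 1, y + 1))
                - (img.at(x - 1, y - 1) + 2.0 * img.at(x, y - 1) + img.at(x + 1, y - 1));
            out[y as usize * img.w + x as usize] = (gx * gx + gy * gy).sqrt();
        }
    }
    Gray { w: img.w, h: img.h, px: out }
}

/// Vertical unit step: 0.0 on the left half, 1.0 on the right half.
pub fn step_edge(w: usize, h: usize) -> Gray {
    let px: Vec<f32> = (0..w * h)
        .map(|i| if i % w < w / 2 { 0.0 } else { 1.0 })
        .collect();
    Gray { w, h, px }
}
```

On a unit step the 1-2-1 weights on the bright side sum to 4 and the dark side contributes nothing, so the two columns next to the step come out at magnitude 4 and everything else at 0.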

Laplacian is the second-derivative counterpart. Instead of “where is the image changing” it’s “where is the image changing how it’s changing.” Sharp edges on flat backgrounds get the same strong response as Sobel, but Laplacian is more sensitive to texture and noise. Useful when you want to detect small-scale structure rather than silhouettes.

Canny is the one everyone actually wants. Four passes:

Gaussian blur (noise reduction; Canny without it is unusable on real photography).
Sobel gradient.
Non-maximum suppression (thins edges to one-pixel width).
Hysteresis thresholding (high/low dual threshold; weak edges survive only if connected to strong ones).

Canny’s four-pass pipeline is exactly the kind of thing the framegraph from a few posts back was built for. The three intermediate textures alias into a single slot thanks to the aliasing pass, so a Canny node only holds one transient texture’s worth of extra memory on top of the output.

Sobel produces |G| = 4 exactly on the canonical step-edge input image; that’s the sanity test I use to confirm the shader is doing what it claims. Canny’s hysteresis is tested with three seeded cases: strong edge, weak-connected-to-strong, isolated weak (which should be dropped). All three behave correctly.
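
The three hysteresis cases can be sketched on the CPU as a flood fill from strong pixels through weak ones. This is only an illustration of the rule: a GPU version would more likely iterate propagation passes, and the function name, the 8-connectivity, and the `>=` threshold comparisons are assumptions.

```rust
/// Dual-threshold hysteresis on a row-major gradient-magnitude image.
/// Returns a mask where `true` marks a pixel kept as an edge.
pub fn hysteresis(mag: &[f32], w: usize, h: usize, low: f32, high: f32) -> Vec<bool> {
    assert_eq!(mag.len(), w * h);
    let mut keep = vec![false; w * h];
    let mut stack: Vec<usize> = Vec::new();

    // Seed the fill with every strong pixel.
    for (i, &m) in mag.iter().enumerate() {
        if m >= high {
            keep[i] = true;
            stack.push(i);
        }
    }

    // Grow through weak pixels (>= low) that touch an already-kept pixel.
    while let Some(i) = stack.pop() {
        let (x, y) = ((i % w) as isize, (i / w) as isize);
        for dy in -1isize..=1 {
            for dx in -1isize..=1 {
                let (nx, ny) = (x + dx, y + dy);
                if nx < 0 || ny < 0 || nx >= w as isize || ny >= h as isize {
                    continue;
                }
                let n = ny as usize * w + nx as usize;
                if !keep[n] && mag[n] >= low {
                    keep[n] = true;
                    stack.push(n);
                }
            }
        }
    }
    keep
}
```

A strong pixel is always kept, a weak pixel adjacent to a kept pixel is kept, and a weak pixel with no path to a strong one never enters the stack, so it is dropped.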

Morphology (4 nodes)

Erode, Dilate, Open, Close. The four classical morphological operators, all from one parametric kernel shader that takes a kernel_size (diameter) and an operator enum.

Erode shrinks bright regions by taking the minimum over a kernel neighbourhood. Dilate grows them by taking the maximum. Open is erode-then-dilate (removes small speckle noise without changing feature sizes). Close is dilate-then-erode (fills small holes in otherwise-solid regions).

These get used in computer vision for cleaning up masks. Threshold an image to produce a binary mask; morphological open removes noise pixels; close fills in gaps in the foreground. It’s the boring-but-essential pipeline step that bridges raw thresholding and contour analysis.

Tests: dilate of a single bright pixel grows it to a kernel-sized square. Erode of a bright square shrinks it. Open of a bright square with one-pixel speckle noise removes the noise. Close of a bright square with a one-pixel hole fills the hole. All four behaviours confirmed against hand-rolled reference images.
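
A CPU sketch of the one-kernel, four-operator idea, assuming a square neighbourhood, clamp-to-edge sampling, and an odd `kernel_size`. `MorphOp`, `morph`, `open`, `close`, and the `square` helper are hypothetical names, not the crate's API.

```rust
#[derive(Clone, Copy)]
pub enum MorphOp {
    Erode,
    Dilate,
}

/// Min (erode) or max (dilate) over a kernel_size x kernel_size neighbourhood.
pub fn morph(src: &[f32], w: usize, h: usize, kernel_size: usize, op: MorphOp) -> Vec<f32> {
    assert_eq!(src.len(), w * h);
    let r = (kernel_size / 2) as isize; // kernel_size is a diameter
    let mut out = vec![0.0f32; w * h];
    for y in 0..h as isize {
        for x in 0..w as isize {
            let mut acc = match op {
                MorphOp::Erode => f32::INFINITY,
                MorphOp::Dilate => f32::NEG_INFINITY,
            };
            for dy in -r..=r {
                for dx in -r..=r {
                    // Clamp-to-edge border handling (an assumption).
                    let sx = (x + dx).clamp(0, w as isize - 1) as usize;
                    let sy = (y + dy).clamp(0, h as isize - 1) as usize;
                    let v = src[sy * w + sx];
                    acc = match op {
                        MorphOp::Erode => acc.min(v),
                        MorphOp::Dilate => acc.max(v),
                    };
                }
            }
            out[y as usize * w + x as usize] = acc;
        }
    }
    out
}

/// Open = erode, then dilate.
pub fn open(src: &[f32], w: usize, h: usize, k: usize) -> Vec<f32> {
    morph(&morph(src, w, h, k, MorphOp::Erode), w, h, k, MorphOp::Dilate)
}

/// Close = dilate, then erode.
pub fn close(src: &[f32], w: usize, h: usize, k: usize) -> Vec<f32> {
    morph(&morph(src, w, h, k, MorphOp::Dilate), w, h, k, MorphOp::Erode)
}

/// Test helper: a dark image with one bright axis-aligned square.
pub fn square(w: usize, h: usize, x0: usize, y0: usize, side: usize) -> Vec<f32> {
    let mut img = vec![0.0f32; w * h];
    for y in y0..y0 + side {
        for x in x0..x0 + side {
            img[y * w + x] = 1.0;
        }
    }
    img
}
```

With `kernel_size = 3`, a lone bright pixel dilates to a 3×3 square, a 5×5 square erodes to 3×3, and open and close return the clean 5×5 square from the speckled and holed versions respectively.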

Histogram (2 nodes)

Histogram computes a 256-bin histogram of an image’s luminance. Output: an RGBA texture where R = count, G = density, B = cumulative, A = CDF. One node, four different views of the same distribution.

Implementation-wise, histograms on the GPU are annoying because histogram is fundamentally a scatter operation, and scatter-safe atomics on Rgba16Float storage textures aren’t universally supported. I took a shortcut: a 128×128 sampling probe computed in a fragment pass, then a 256-bin gather pass that accumulates counts in a compute shader with buffer-backed atomics (using the compute-buffers cache infrastructure). 16K samples per call, which is plenty for the histogram shapes that matter.
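
The probe-then-bin shape can be sketched on the CPU like this. It is not the fragment and compute shader pair: there are no atomics here, and several details are guesses, including nearest-texel sampling at probe-cell centres, the `round(v * 255)` binning rule, and reading "cumulative" as a running count with "CDF" as its normalised form.

```rust
pub const BINS: usize = 256;
pub const PROBE: usize = 128; // 128 x 128 = 16,384 samples

/// Returns 256 RGBA-like rows: [count, density, cumulative count, CDF].
pub fn histogram_rgba(luma: &[f32], w: usize, h: usize) -> Vec<[f32; 4]> {
    assert!(w > 0 && h > 0 && luma.len() == w * h);
    let mut counts = [0u32; BINS];

    // Sample the image on a fixed PROBE x PROBE grid, whatever its resolution.
    for py in 0..PROBE {
        for px in 0..PROBE {
            let x = ((((px as f32) + 0.5) / PROBE as f32) * w as f32) as usize;
            let y = ((((py as f32) + 0.5) / PROBE as f32) * h as f32) as usize;
            let v = luma[y.min(h - 1) * w + x.min(w - 1)].clamp(0.0, 1.0);
            let bin = ((v * 255.0).round() as usize).min(BINS - 1);
            counts[bin] += 1;
        }
    }

    // Derive the other three views from the raw counts.
    let total = (PROBE * PROBE) as f32;
    let mut running = 0u32;
    counts
        .iter()
        .map(|&c| {
            running += c;
            [c as f32, c as f32 / total, running as f32, running as f32 / total]
        })
        .collect()
}
```

Because the probe size is fixed, the counts always sum to 16,384 regardless of the input resolution, and the last CDF entry is 1.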

HistogramEqualize applies the CDF from Histogram to remap an input image’s luminance so the histogram becomes approximately uniform. Photographic auto-level. Useful as a preprocessing step for any analysis that wants consistent input contrast.
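
A minimal sketch of the CDF remap, assuming luminance in [0, 1] and 256 bins. For brevity it builds the histogram from every pixel of the input rather than taking it from a separate Histogram node, and `equalize` is a hypothetical name.

```rust
/// Remap each luminance value to the CDF value of its bin.
pub fn equalize(luma: &[f32]) -> Vec<f32> {
    // Binning rule (round(v * 255)) is an assumption.
    let bin = |v: f32| ((v.clamp(0.0, 1.0) * 255.0).round() as usize).min(255);

    let mut counts = [0u32; 256];
    for &v in luma {
        counts[bin(v)] += 1;
    }

    // Normalised cumulative sum: cdf[b] = fraction of pixels with bin <= b.
    let total = luma.len() as f32;
    let mut running = 0u32;
    let cdf: Vec<f32> = counts
        .iter()
        .map(|&c| {
            running += c;
            running as f32 / total
        })
        .collect();

    luma.iter().map(|&v| cdf[bin(v)]).collect()
}
```

For example, four equally common values squeezed into 0.40 to 0.55 get spread to 0.25, 0.5, 0.75, and 1.0, which is the contrast stretch the node is for.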

Bin counts sum to the total sample count (16,384 for the 128×128 probe); CDF is monotonic non-decreasing. Both verified.

Metrics (4 nodes)

SSIM (Structural Similarity), PSNR (Peak Signal-to-Noise Ratio), MSE (Mean Squared Error), MAE (Mean Absolute Error). Image-similarity metrics. Given two input textures, each of these produces an error map (per-pixel comparison) and a scalar score.
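
For reference, the three simpler metrics reduce to a few lines on the CPU. This sketch assumes pixel values in [0, 1] (so the PSNR peak is 1.0) and uses squared error as the map; `ErrorReport` and `compare` are hypothetical names.

```rust
pub struct ErrorReport {
    /// Per-pixel squared error: the "error map".
    pub sq_err_map: Vec<f32>,
    pub mse: f32,
    pub mae: f32,
    /// PSNR in decibels; infinite when the images are identical.
    pub psnr_db: f32,
}

pub fn compare(a: &[f32], b: &[f32]) -> ErrorReport {
    assert_eq!(a.len(), b.len());
    assert!(!a.is_empty());
    let n = a.len() as f32;

    let sq_err_map: Vec<f32> = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).collect();
    let mse = sq_err_map.iter().sum::<f32>() / n;
    let mae = a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum::<f32>() / n;

    // PSNR = 10 * log10(MAX^2 / MSE), with MAX = 1.0 for normalised pixels.
    let psnr_db = if mse == 0.0 {
        f32::INFINITY
    } else {
        10.0 * (1.0 / mse).log10()
    };

    ErrorReport { sq_err_map, mse, mae, psnr_db }
}
```

One pixel out of four differing by 0.5 gives MSE 0.0625, MAE 0.125, and a PSNR of about 12.04 dB.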

The scalar score is the tricky part. Computing it means reducing a per-pixel result to a single number and then reading that number back from the GPU. A synchronous GPU→CPU readback would violate the async readback contract, so the scalar output is currently a placeholder that will be filled in once async readback lands for compute-shader outputs.

The error maps work now. They’re the useful part anyway — “which region of my output differs most from my reference” is more actionable than “my output has SSIM 0.87 against my reference,” and you can look at the error map in the preview and immediately see where the discrepancy is.

SSIM identity (same image, same image) = 1.00 exactly. Slight perturbation (one pixel shifted by 10% of the range) > 0.9. Both verified.
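
Both checks can be reproduced with a simplified, single-window SSIM computed over the whole image. That is an assumption made to keep the sketch short: a per-pixel SSIM map evaluates the same formula over local windows. The constants use the conventional K1 = 0.01 and K2 = 0.03 with a dynamic range of 1.

```rust
/// Global (single-window) SSIM for two images with values in [0, 1].
pub fn ssim_global(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len());
    assert!(!x.is_empty());
    let n = x.len() as f64;
    let c1 = 0.01f64.powi(2); // (K1 * L)^2 with L = 1
    let c2 = 0.03f64.powi(2); // (K2 * L)^2 with L = 1

    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let vx = x.iter().map(|a| (a - mx) * (a - mx)).sum::<f64>() / n;
    let vy = y.iter().map(|b| (b - my) * (b - my)).sum::<f64>() / n;
    let cov = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum::<f64>() / n;

    // Luminance term times contrast/structure term.
    ((2.0 * mx * my + c1) * (2.0 * cov + c2))
        / ((mx * mx + my * my + c1) * (vx + vy + c2))
}
```

When both inputs are the same image, the numerator and denominator are the same expression, so the score is 1. Nudging one pixel of a 64-value gradient by 0.1 barely moves the means and variances, so the score stays well above 0.9.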

Frequency (2 nodes)

FFT2D and IFFT2D. Two-dimensional Fourier transforms.

Frequency-domain operations are weirdly useful in creative coding. You can do bandpass filtering (keep only certain spatial frequencies) to produce interesting painterly effects. You can multiply two frequency-domain images for convolution-by-FFT, which is much faster than spatial convolution at large kernel sizes. You can visualise the frequency content of an image directly as a way to see its texture character.

For N ≤ 256 (the typical case for creative work), I’m using a direct DFT in a single compute-shader pass. Yes, DFT is O(N²) per row/column; a full FFT would be O(N log N). For N=256 the difference is 256 vs 8 operations per bin (log₂ 256 = 8). It adds up, but the constant factor of a direct implementation is much lower (no butterfly graph bookkeeping), and in practice the DFT is competitive up to about N=512.

For larger images, this will need a proper FFT. Probably in the form of a Stockham-variant compute-shader pipeline. Not in this post.

The round-trip gate is the important test: FFT followed by IFFT should recover the input exactly (up to floating-point error). Impulse, sinusoid, and ramp inputs all round-trip with MSE < 1e-6 at N=32. If you want to use these to do frequency-domain filtering, the round-trip fidelity has to be there or the filtered output has ghost artifacts that aren’t part of your filter.
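
Here is a one-dimensional CPU sketch of the direct DFT and its round trip, assuming the forward transform is unnormalised and the inverse carries the 1/N factor. The 2D case applies the same transform along rows and then columns. `Complex`, `dft`, `idft`, and `round_trip_mse` are hypothetical names.

```rust
use std::f64::consts::PI;

/// (real, imaginary) pair.
pub type Complex = (f64, f64);

/// Direct O(N^2) transform: each output bin is a sum over all N inputs.
fn transform(input: &[Complex], sign: f64, scale: f64) -> Vec<Complex> {
    let n = input.len();
    (0..n)
        .map(|k| {
            let mut acc = (0.0f64, 0.0f64);
            for (t, &(re, im)) in input.iter().enumerate() {
                // Reduce k*t mod n first to keep the angle small and accurate.
                let angle = sign * 2.0 * PI * ((k * t) % n) as f64 / n as f64;
                let (s, c) = angle.sin_cos();
                acc.0 += re * c - im * s;
                acc.1 += re * s + im * c;
            }
            (acc.0 * scale, acc.1 * scale)
        })
        .collect()
}

/// Forward DFT (unnormalised).
pub fn dft(input: &[Complex]) -> Vec<Complex> {
    transform(input, -1.0, 1.0)
}

/// Inverse DFT with the 1/N normalisation.
pub fn idft(input: &[Complex]) -> Vec<Complex> {
    transform(input, 1.0, 1.0 / input.len() as f64)
}

/// MSE between a real signal and the real part of idft(dft(signal)).
pub fn round_trip_mse(signal: &[f64]) -> f64 {
    let complex: Vec<Complex> = signal.iter().map(|&r| (r, 0.0)).collect();
    let back = idft(&dft(&complex));
    signal
        .iter()
        .zip(&back)
        .map(|(a, b)| (a - b.0) * (a - b.0))
        .sum::<f64>()
        / signal.len() as f64
}
```

The inner loop makes the cost visible: every bin touches all N samples, which is the N operations per bin a butterfly-based FFT cuts down to log₂ N.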

Motion (1 node)

OpticalFlowLK. Single-level Lucas-Kanade optical flow, which means: given two frames, estimate per-pixel velocity vectors.

Lucas-Kanade is the classical approach — assume the flow is locally constant within a small window (5×5 here), solve a 2×2 linear system per pixel for the velocity vector that minimises the brightness-constancy error. The shader does this for every output pixel in one pass.

Single-level means it only resolves motion of up to about 2 pixels per frame. Real motion (a hand moving in front of a webcam) is often faster than that. A proper pyramidal LK (coarse-to-fine across image pyramids) handles much larger motions at the cost of extra passes. That’s a follow-up.

Verification: a synthetic 1-pixel-per-frame pan recovers u≈1, v≈0 within ~5% on the single-level version. For faster motion, the estimate saturates, which is correct behaviour for the algorithm’s assumptions. For a webcam installation tracking hand motion, single-level is often enough; for sports footage or fast camera pans, the pyramid version will need to land.
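
The per-pixel solve is small enough to write out. This CPU sketch assumes central-difference gradients taken from the first frame, a plain frame difference for the temporal term, and Cramer's rule for the 2×2 system. `lucas_kanade_at` and `quadratic_frame` are hypothetical names, and the shader may compute its derivatives differently.

```rust
/// Single-level Lucas-Kanade at one pixel with a 5x5 window.
/// Frames are row-major. Returns None when the 2x2 system is near-singular
/// (flat patch or aperture problem).
pub fn lucas_kanade_at(
    prev: &[f64],
    next: &[f64],
    w: usize,
    h: usize,
    cx: usize,
    cy: usize,
) -> Option<(f64, f64)> {
    assert!(prev.len() == w * h && next.len() == w * h);
    // The window plus the gradient stencil needs a 3-pixel margin.
    assert!(cx >= 3 && cy >= 3 && cx + 3 < w && cy + 3 < h);
    let at = |img: &[f64], x: usize, y: usize| img[y * w + x];

    let (mut sxx, mut sxy, mut syy, mut sxt, mut syt) = (0.0f64, 0.0f64, 0.0f64, 0.0f64, 0.0f64);
    for y in cy - 2..=cy + 2 {
        for x in cx - 2..=cx + 2 {
            let ix = (at(prev, x + 1, y) - at(prev, x - 1, y)) / 2.0;
            let iy = (at(prev, x, y + 1) - at(prev, x, y - 1)) / 2.0;
            let it = at(next, x, y) - at(prev, x, y);
            sxx += ix * ix;
            sxy += ix * iy;
            syy += iy * iy;
            sxt += ix * it;
            syt += iy * it;
        }
    }

    // Solve [sxx sxy; sxy syy] [u v]^T = -[sxt syt]^T.
    let det = sxx * syy - sxy * sxy;
    if det.abs() < 1e-9 {
        return None;
    }
    let u = (-sxt * syy + sxy * syt) / det;
    let v = (-sxx * syt + sxy * sxt) / det;
    Some((u, v))
}

/// Synthetic frame I(x, y) = (x - shift)^2 + y^2, i.e. a pattern panned by `shift` in x.
pub fn quadratic_frame(w: usize, h: usize, shift: f64) -> Vec<f64> {
    (0..w * h)
        .map(|i| {
            let (x, y) = ((i % w) as f64, (i / w) as f64);
            (x - shift) * (x - shift) + y * y
        })
        .collect()
}
```

On two 16×16 quadratic frames panned by one pixel, the estimate at (8, 8) comes out at roughly (0.97, -0.03). The small bias comes from the linearisation that brightness constancy relies on, which is also why larger motions stop being recoverable at a single level.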

Everything through `#[lux_node]`

This was the first plugin crate written entirely through the new #[lux_node] macro. Every node declares its pins in the macro arguments; the macro emits the PinId constants, the info() body, and the registration boilerplate. The PinId post introduced the macro; this crate is where I confirmed that writing a new 18-node plugin with the new tooling feels good.

It does. The per-node boilerplate dropped from about 40 lines (with hand-written info() and string-indexed pin access) to about 15 lines (with the macro handling pin declarations and the ctx.input_pin(Self::PIN_FOO) calls). Across 18 nodes, that’s ~450 fewer lines of mechanical boilerplate and 0 new instances of the “typo in a pin name” bug.

What this enables

A handful of patches I wanted to build but couldn’t:

Edge-driven generative art: webcam → Canny → feed the edge map into an SDF graph that extrudes each edge as a 3D ridge.
Automatic contrast matching: reference image → Histogram → apply the CDF to live video so your projection matches the room’s lighting.
Similarity-gated triggering: two cameras → SSIM → when similarity drops below a threshold, fire an event. Installation-grade motion detection.
Frequency-filtered distortion: FFT → bandpass mask → IFFT → get artistic textures that isolate specific spatial scales.
Flow-driven particles: OpticalFlowLK on webcam → feed the flow field into a particle emitter as initial velocity → particles that follow real movement.

I’ve only prototyped the last one so far. It works. A webcam pointed at a hand waving produces a flow field that drives a particle fountain; wave your hand faster and the particles accelerate; hold still and they fall under gravity. This is the kind of patch that’s genuinely hard to build without these nodes and straightforward once they’re in the toolbox.

There’s a fundamental research question I’m not qualified to answer about what a visual-programming-native computer-vision library should look like. Kornia is fantastic for tensor-style (PyTorch) pipelines; OpenCV is fantastic for imperative scripts; neither is great as a node graph. I’m pretty sure Lux’s version is closer to right than either, but I’m also aware that I’m shipping the operators people complain about missing and hoping the set converges to something that feels complete.

I have no idea what I’m doing or if any of this is right, but it’s fun. Follow along.