ReSTIR + Denoise

If the PBR post was the moment Lux’s renders started looking like 2015, this post is the moment they start looking like 2022.

Global illumination — light bouncing off surfaces and illuminating other surfaces indirectly — is the thing that separates “computer render” from “looks real.” Direct lighting (a light hits a surface, the surface reflects a bit) is easy and cheap. Indirect lighting is where all the subtle realism lives: the warm bounce from a red wall onto a white ceiling, the cool fill from a blue sky into a shadowed corner, the glow of a neon sign tinting everything around it.

Real-time GI was an unsolved problem for two decades. Static methods (lightmaps, baked irradiance probes) handle the easy case; dynamic methods were too expensive for 60 FPS. The last few years finally changed that. ReSTIR (Reservoir-based Spatio-Temporal Importance Resampling) plus modern denoising make it tractable on consumer hardware.

This post is Lux’s implementation.

ReSTIR, briefly

The hard problem in GI sampling: for every pixel, you need to evaluate the contribution of every light in the scene (hundreds or thousands of them), weighted by visibility, BRDF, and distance. With 1000 lights, doing this exhaustively means 1000 light evaluations per pixel per frame, which at 1080p (about 2.07 million pixels) is ~2 billion light evaluations per frame. Not possible in real time on any consumer GPU.

The naïve approximation is Monte Carlo: for each pixel, randomly sample a handful of lights (say, 4) and weight the result by probability. This works and produces unbiased output over many frames, but the per-frame result is extremely noisy — you’re estimating a sum of 1000 terms from 4 samples, and the variance is enormous.
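
To make the noise concrete, here is a minimal sketch of that naïve estimator in Rust. It is not Lux's code: XorShift64 and estimate_direct are names made up for this illustration, and the per-light contributions are passed in as precomputed floats instead of being evaluated from visibility, BRDF, and distance.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform index in 0..n (slight modulo bias, acceptable for a sketch).
    fn next_index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Estimate the sum of all light contributions at one pixel from `k`
/// uniformly chosen lights. Each pick has probability 1/N, so dividing by
/// that probability means multiplying the sampled contribution by N.
fn estimate_direct(contributions: &[f32], k: usize, rng: &mut XorShift64) -> f32 {
    let n = contributions.len();
    if n == 0 || k == 0 {
        return 0.0;
    }
    let mut sum = 0.0f32;
    for _ in 0..k {
        let i = rng.next_index(n);
        sum += contributions[i] * n as f32;
    }
    sum / k as f32
}
```

With one bright light among 1000, four uniform picks miss it about 99.6% of the time, so most pixels come out black and the occasional lucky pixel comes out 250× too bright. That is the speckle.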

ReSTIR’s insight: reuse samples across pixels and across time. If pixel (x, y) picked a good light sample this frame, pixel (x+1, y) probably wants the same sample. If the same pixel picked a good light sample last frame, and the scene barely moved, this frame probably wants the same sample too. The “reservoir” data structure is a compact way to store a probabilistic-sampling state that can be shared between neighbours (spatial reuse) and between frames (temporal reuse).

The per-pixel state is ~8 bytes: one chosen sample index, one weight, and a sample count. At 1080p that’s ~16.6 MB per ping-pong reservoir; two of them for read/write during the reuse passes. The shader runs:

  1. RIS candidate sampling — pick a small number of candidates from the light list, choose one of them with probability proportional to its contribution weight, store it in the reservoir.
  2. Temporal reuse — sample the previous frame’s reservoir at the reprojected pixel position, merge with the current reservoir.
  3. Spatial reuse — sample the current frame’s reservoir at a handful of neighbouring pixels, merge into the current pixel’s reservoir.
  4. Shade — evaluate the final chosen sample, produce the pixel’s direct-lighting contribution.
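
The four steps above all lean on one small data structure. Here is a minimal sketch of a weighted reservoir in Rust, following the standard streaming-RIS formulation, not Lux's actual shader. The field names, the finalize helper, and the unpacked 16-byte layout are assumptions made for readability (the real per-pixel state is packed tighter), and the random number u in [0, 1) is supplied by the caller.

```rust
/// Unpacked reservoir for illustration only.
#[derive(Clone, Copy, Debug, Default)]
struct Reservoir {
    sample: u32, // index of the currently chosen light
    w_sum: f32,  // running sum of candidate weights
    m: u32,      // number of candidates seen so far
    weight: f32, // contribution weight W, filled in by `finalize`
}

impl Reservoir {
    /// Stream one candidate in. It replaces the current sample with
    /// probability w / w_sum. Returns true if it was chosen.
    fn update(&mut self, sample: u32, w: f32, u: f32) -> bool {
        self.w_sum += w;
        self.m += 1;
        let chosen = u * self.w_sum < w;
        if chosen {
            self.sample = sample;
        }
        chosen
    }

    /// Compute W = w_sum / (m * p_hat), where `p_hat` is the target
    /// function evaluated for the chosen sample at this pixel.
    fn finalize(&mut self, p_hat: f32) {
        self.weight = if p_hat > 0.0 && self.m > 0 {
            self.w_sum / (self.m as f32 * p_hat)
        } else {
            0.0
        };
    }

    /// Merge another (finalized) reservoir into this one. `p_hat_other` is
    /// the target function of the other reservoir's sample, evaluated at
    /// THIS pixel. Call `finalize` again afterwards.
    fn merge(&mut self, other: &Reservoir, p_hat_other: f32, u: f32) {
        let m_before = self.m;
        self.update(other.sample, p_hat_other * other.weight * other.m as f32, u);
        // The merged reservoir represents every candidate both have seen.
        self.m = m_before + other.m;
    }
}
```

In this sketch, step 1 would call update once per candidate with w = p̂(x) / p(x), steps 2 and 3 would call merge with the reprojected or neighbouring reservoir, and step 4 would shade the chosen sample scaled by weight after a last finalize.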

The variance-reduction claim is spectacular. At equal candidate budget, ReSTIR produces roughly 10× lower variance than naïve Monte Carlo, which means the same quality at 1/10th the samples per pixel. At 2 candidates per pixel per frame, the temporal + spatial reuse gives you an effective sample count of ~40-50 across a few frames.

Two passes: direct and indirect

Lux ships two ReSTIR variants:

ReSTIR Direct — for direct lighting from discrete lights (point, directional, area). The reservoir samples from the scene’s light list. This is the version that produces correct shadows and highlights from arbitrary numbers of lights, in real time, without a shadow map per light.

ReSTIR Indirect — for indirect lighting: light bouncing off one surface onto another. The reservoir samples from a probe radiance cache (the surface-probe system) that captures radiance at voxelised scene points. Instead of sampling a light, you’re sampling an incoming-radiance direction.

Each is orchestrated by a Rust module (restir_direct.rs, restir_indirect.rs) that wraps a preserved WGSL compute shader. The shaders are ~270 LOC each; most of the implementation complexity lives on the Rust side: uniform packing, ping-pong reservoir management, resize handling.

Discipline around hot-path allocation: device.create_* only happens in ::new and ::resize. Every frame is just queue.write_buffer for the uniform and encoder.begin_compute_pass for the dispatches. No per-frame GPU allocation. Same contract as every other GPU pipeline in Lux.
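
The shape of that contract can be shown without a GPU. A sketch, assuming plain Vec<u8> buffers as stand-ins for GPU storage buffers and the ~8 bytes per pixel quoted earlier; PingPong, views, and swap are hypothetical names, not Lux's types. Allocation happens only in new and resize, and the per-frame path just hands out views and flips an index.

```rust
/// Bytes of reservoir state per pixel (the approximate figure from above).
const RESERVOIR_BYTES: usize = 8;

fn reservoir_buffer_bytes(width: usize, height: usize) -> usize {
    width * height * RESERVOIR_BYTES
}

/// CPU stand-in for a pair of ping-pong storage buffers.
struct PingPong {
    buffers: [Vec<u8>; 2],
    read: usize, // index of the buffer that is read this frame
}

impl PingPong {
    /// Allocation site 1: construction.
    fn new(width: usize, height: usize) -> Self {
        let bytes = reservoir_buffer_bytes(width, height);
        Self {
            buffers: [vec![0; bytes], vec![0; bytes]],
            read: 0,
        }
    }

    /// Allocation site 2: the output size changed.
    fn resize(&mut self, width: usize, height: usize) {
        *self = Self::new(width, height);
    }

    /// Per-frame path: borrow (read, write) views. No allocation here.
    fn views(&mut self) -> (&[u8], &mut [u8]) {
        let (a, b) = self.buffers.split_at_mut(1);
        if self.read == 0 {
            (a[0].as_slice(), b[0].as_mut_slice())
        } else {
            (b[0].as_slice(), a[0].as_mut_slice())
        }
    }

    /// Per-frame path: what was written this frame is read next frame.
    fn swap(&mut self) {
        self.read ^= 1;
    }
}
```

At 1920×1080 the helper gives 16,588,800 bytes per buffer, which is where the ~16.6 MB figure comes from.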

The direct variant uses a discrete 64-light scene for its test harness. The indirect variant samples from a 32³-voxel probe grid. Both validate the ≥ 4× variance reduction claim against uniform Monte Carlo at equal candidate budget — not as dramatic as the paper’s 10×, but the paper’s 10× was under best-case conditions and the Lux gate is conservatively at 4×.

Three denoisers

ReSTIR produces a less-noisy signal than naïve MC, but it’s still noisy. Single-sample direct lighting with 10× variance reduction is still visibly speckly. You need a denoiser on top: a filter, spatial or temporal, that smooths the noisy signal while preserving edges.

Lux ships three algorithms:

SVGF (Spatiotemporal Variance-Guided Filtering)

The HPG 2017 classic. The published algorithm also accumulates samples over time; the variant here is pure spatial filtering with no temporal history. It runs an A-trous wavelet filter (iterative 5-tap kernel with increasing stride) across the noisy input, weighted by the variance estimate of each pixel’s neighbourhood. Pixels with low variance (already smooth) get less filtering; pixels with high variance (noisy) get more.

The A-trous design lets you reach a large effective filter radius (16-64 pixels) with only 5 taps per iteration. Assuming each tap sits at most two strides from the centre, 4 iterations at strides 1, 2, 4, 8 reach 2 × (1 + 2 + 4 + 8) = 30 pixels at 20 samples total; a box blur over the same 61×61 footprint would need ~3,700 samples. The iterative stride (1, 2, 4, 8…) also preserves edges because each iteration re-evaluates the edge-preservation weights.
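
A one-dimensional sketch of a single A-trous iteration, to show the mechanics. This is a simplification and not Lux's kernel: the function names are hypothetical, the B3-spline tap weights are an assumed choice, and the only edge-stopping term is the value difference scaled by sigma (the real filter works in 2-D and is guided by variance, normals, and depth).

```rust
/// Assumed B3-spline weights for taps at offsets -2..=2, in units of `stride`.
const KERNEL: [f32; 5] = [1.0 / 16.0, 4.0 / 16.0, 6.0 / 16.0, 4.0 / 16.0, 1.0 / 16.0];

/// One A-trous iteration over a 1-D signal. `sigma` must be > 0: small values
/// preserve edges strictly, f32::INFINITY disables edge stopping entirely.
fn atrous_step_1d(input: &[f32], stride: usize, sigma: f32) -> Vec<f32> {
    let n = input.len() as isize;
    (0..n)
        .map(|i| {
            let centre = input[i as usize];
            let (mut sum, mut wsum) = (0.0f32, 0.0f32);
            for (t, k) in KERNEL.iter().enumerate() {
                let j = i + (t as isize - 2) * stride as isize;
                if j < 0 || j >= n {
                    continue; // skip taps that fall outside the signal
                }
                let v = input[j as usize];
                // Edge-stopping weight: taps that differ a lot from the
                // centre value contribute almost nothing.
                let edge = (-(v - centre).abs() / sigma).exp();
                let w = *k * edge;
                sum += w * v;
                wsum += w;
            }
            sum / wsum // the centre tap always has weight 6/16, so wsum > 0
        })
        .collect()
}

/// Run `iterations` passes with strides 1, 2, 4, 8, ...
fn atrous_denoise_1d(input: &[f32], iterations: u32, sigma: f32) -> Vec<f32> {
    let mut signal = input.to_vec();
    for i in 0..iterations {
        signal = atrous_step_1d(&signal, 1usize << i, sigma);
    }
    signal
}

/// Population variance, used to check how much noise the filter removed.
fn variance(x: &[f32]) -> f32 {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    x.iter().map(|v| (*v - mean).powi(2)).sum::<f32>() / n
}
```

Because the weights are recomputed from the current signal on every pass, a hard step in the input survives all iterations when sigma is small, while flat noisy regions keep getting smoother.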

ReBLUR (Recurrent Blurring)

Part of NVIDIA’s Real-Time Denoisers (NRD) library for ray-traced signals. It uses temporal history aggressively: the previous frame’s denoised output is sampled at the reprojected pixel position and blended with the current frame’s noisy input. A ±3σ clamp against the local neighbourhood variance rejects stale history when the scene changes.

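Temporal denoising is much more effective than spatial in practice. Averaging N independent samples cuts variance by a factor of N (and standard deviation by √N) whether the samples come from neighbouring pixels or from previous frames. The difference is what gets averaged: temporal samples are repeated measurements of the same surface point, so every frame of valid history counts in full, while spatial neighbours cover different surface points and the edge-stopping weights throw many of them away. A few frames of temporal history can therefore stand in for a much wider spatial filter at matched edge preservation.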

The downside: temporal denoising produces ghosts on moving objects. The ±3σ clamp rejects samples that fall outside the plausible range for the current pixel, which catches most ghosts; the rest show up as brief trails behind fast-moving objects. Real-time is a compromise.
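
The clamp-then-blend step is small enough to sketch on a single scalar channel. This is an illustration of the idea described above, not ReBLUR's or Lux's actual code: neighbourhood_stats, temporal_blend, and the alpha parameter (the share given to the current frame) are hypothetical names and choices.

```rust
/// Mean and standard deviation of the current frame's local neighbourhood.
/// Expects a non-empty slice.
fn neighbourhood_stats(samples: &[f32]) -> (f32, f32) {
    let n = samples.len() as f32;
    let mean = samples.iter().sum::<f32>() / n;
    let var = samples.iter().map(|s| (*s - mean).powi(2)).sum::<f32>() / n;
    (mean, var.sqrt())
}

/// Clamp the reprojected history to mean ± 3σ of the current neighbourhood,
/// then blend it with the current noisy sample. `alpha` is the weight of the
/// current frame: 0.1 keeps 90% history, 1.0 ignores history entirely.
fn temporal_blend(history: f32, current: f32, mean: f32, sigma: f32, alpha: f32) -> f32 {
    let sigma = sigma.max(0.0); // guard so that lo <= hi always holds
    let lo = mean - 3.0 * sigma;
    let hi = mean + 3.0 * sigma;
    let clamped = history.clamp(lo, hi);
    clamped + (current - clamped) * alpha
}
```

A stale history value of 10 against a neighbourhood with mean 1 and σ 0.5 gets pulled down to 2.5 before blending, so the ghost fades over a few frames instead of lingering at full strength.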

Hybrid

Run the variance-estimate pre-pass, then select between SVGF and ReBLUR based on the local variance and motion magnitude. Low variance + low motion = ReBLUR (temporal denoising dominates). High variance + high motion = SVGF (spatial denoising handles the frame-to-frame inconsistency better).

The selection is a per-pixel decision in the shader, so different regions of the same frame can use different algorithms. Static background? ReBLUR. Moving foreground? SVGF. Works particularly well for mixed scenes where a camera move reveals previously-occluded geometry — the revealed region has no temporal history so ReBLUR would produce garbage; SVGF handles it correctly for a frame or two until history builds up.

auto_select(variance, motion_magnitude) is the CPU-side heuristic for the algorithm picker. There’s a CPU reference implementation of the A-trous kernel (cpu_svgf_step) for deterministic variance-reduction verification in tests; the CI gate asserts the filter reduces variance by ≥ 2× on a known noisy input.
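
A minimal sketch of what a picker with that signature could look like. Only the signature and the two clear-cut cases come from the description above. The Denoiser enum, both threshold values, and the decision to send the mixed cases (low variance with high motion, or the reverse) to the spatial filter are assumptions for illustration, not Lux's actual values.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Denoiser {
    Spatial,  // SVGF-style filtering
    Temporal, // ReBLUR-style history blending
}

// Hypothetical thresholds, chosen only to make the sketch concrete.
const VARIANCE_THRESHOLD: f32 = 0.05;
const MOTION_THRESHOLD: f32 = 1.0; // pixels per frame

/// Pick temporal denoising only when the pixel is both quiet and nearly
/// static; anything else falls back to spatial filtering (an assumption).
fn auto_select(variance: f32, motion_magnitude: f32) -> Denoiser {
    if variance < VARIANCE_THRESHOLD && motion_magnitude < MOTION_THRESHOLD {
        Denoiser::Temporal
    } else {
        Denoiser::Spatial
    }
}
```

Falling back to spatial in the mixed cases is the conservative choice: a wrong spatial pick costs some blur, while a wrong temporal pick costs a visible ghost.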

The plugin crate

All of this wires up through a new plugin crate: lux-texture-screen-space. It exposes nodes in the texture.* family:

  • GiHybrid — wire diffuse GBuffer + normals + depth; get denoised indirect lighting out.
  • Denoise — wire a noisy texture + variance estimate; get denoised texture out. algorithm pin picks Spatial / Temporal / Hybrid.
  • RestirDirect — wire scene lights + GBuffer; get per-pixel direct lighting out.
  • RestirIndirect — wire surface probes + GBuffer; get per-pixel indirect lighting out.
  • Ibl — glue from IBL to ReSTIR’s secondary-bounce sampling.

Plus a handful of sub-nodes for individual pipeline stages (probe caching, sphere tracing, cascaded SDFs) that users don’t usually have to touch directly but that show up as building blocks if you want to assemble a custom GI pipeline.

The permutation budget

One subtle thing that matters in a production rendering pipeline: shader permutation count. Every different combination of features produces a different compiled pipeline. Too many permutations and the app’s shader cache balloons, startup takes forever, and memory usage gets wild.

I gave ReSTIR + Denoise a combined budget of ≤ 15 permutations. Current count:

  • 2 ReSTIR pipelines (direct, indirect)
  • 4 Denoise pipelines (variance estimate + spatial + temporal + hybrid)
  • Total: 6 of the 15-permutation budget used

Plenty of headroom for the follow-up work. Cascaded SDF variants will probably add 3-4 more. Adaptive sample-count variants another 2. We’ll hit the budget eventually, at which point the budget gets revisited rather than blown through — permutation budgets exist because nobody ever reduces a shader cache voluntarily.
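
One way to keep a budget like this from being blown through quietly is to let the compiler enforce it. A sketch, assuming the counts are maintained by hand as constants; the constant names and the remaining_headroom helper are hypothetical, not Lux's code.

```rust
const PERMUTATION_BUDGET: usize = 15;

const RESTIR_PIPELINES: usize = 2; // direct, indirect
const DENOISE_PIPELINES: usize = 4; // variance estimate, spatial, temporal, hybrid

const PERMUTATIONS_USED: usize = RESTIR_PIPELINES + DENOISE_PIPELINES;

// Evaluated at compile time: the build fails if the budget is exceeded.
const _: () = assert!(PERMUTATIONS_USED <= PERMUTATION_BUDGET);

fn remaining_headroom() -> usize {
    PERMUTATION_BUDGET - PERMUTATIONS_USED
}
```

With the current counts that leaves 9 slots, enough for the 3-4 cascaded SDF variants and the 2 adaptive sample-count variants with a few to spare.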

What it feels like

Turn ReSTIR GI on in a scene. The first frame it takes a moment to build reservoirs. The second and third frames, bounced light starts appearing — a red wall tints the ceiling pink, a blue skylight softens the shadows, a neon sign casts coloured light on the floor ten feet away. None of it was there before. All of it was implied by the physics of the scene; the rendering just wasn’t computing it.

Then you toggle the denoiser off. It’s noisy. Very noisy. Thousands of little sparkles across every surface. The denoised version hides them so well that when you turn the denoiser off it’s genuinely startling how much raw noise was there. Turn the denoiser back on and it looks clean again.

This is the moment in any rendering pipeline where you cross from “fast but looks wrong” to “fast and looks right.” Real-time GI is the first thing that makes a scene read as physically lit rather than artistically lit. I’m pretty sure I’ve got the implementation right — it agrees with the reference images in the papers, the variance-reduction gates pass, the hybrid auto-selector picks sensibly — but this is the area where I’m most aware of how much research there is that I haven’t read, and where a real rendering engineer would probably point out 10 things I’d have done differently.

The close of the series

This is post 24 — the last in the “changes since Particles, Finally” arc. Twenty-four posts covering:

  • The HDR pipeline and the framegraph that composes it.
  • Post-FX: bloom, TAA, CAS, film grain.
  • PBR materials, normal maps, image-based lighting.
  • More 3D primitives and more filters.
  • Async readback and the StagingBelt.
  • The onboarding triptych: welcome + wires + tooltips.
  • The graph engine rewrite: nine posts, seven P-gates green, regression gates locked down.
  • Projection mapping.
  • 18-node computer-vision analysis library.
  • Multi-window fleet, HDR toggle, encode queue.
  • PerfGuard, hitch capture, crash sandbox.
  • ReSTIR + denoise (this post).

If you’ve followed along, thank you. Lux has gone from “a 2D vector tool with particles” to “a real-time HDR creative-coding environment with GI and projection mapping” in three months. Writing these posts as the work shipped is what keeps me honest — there’s no faster way to realise you don’t understand something than to try to explain it to a reader.

What’s next? I don’t fully know. There’s a plugin-API stability pass coming that’ll let third parties ship plugins for the first time. The dock scaffolding from the app decomposition post wants its actual docked panels. The mesh-shader path for the 10k-cubes bench hits 100k cubes. There’s a collaboration plan that keeps resurfacing.

All of that’s speculation. The one thing I’m confident about is: the next post will go in the same place this one did, and I’ll try to be as honest about what I’m doing (and what I’m not sure about) as I’ve been trying to be through these twenty-four.


I have no idea what I’m doing or if any of this is right, but it’s fun. Follow along.
