The Framegraph

Every render pipeline that gets past a certain complexity converges on the same idea. You stop describing “draw this, then draw that” and start describing a graph of passes, each declaring what it reads and what it writes, and you let a compiler figure out the rest. Or at least that’s the folklore — every framegraph paper I’ve read says a version of this, and every engine I’ve poked at ships a version of it, so I’m going to assume the community is correct and build one.

Lux got to that point this session.

The problem

After the HDR pipeline landed, the shape of the render loop was: mesh pass, tonemap pass, done. Two passes, one dependency between them, easy. Nobody needs a framegraph for that.

But the next three posts add bloom (extract, blur, combine), TAA (reproject, accumulate, resolve), CAS (sharpen), and film grain (noise composite). That’s eight more passes, and each one has specific opinions about which textures it reads and writes, which formats it accepts, whether it wants the pre-tonemap HDR buffer or the post-tonemap sRGB buffer, and which passes it has to run after.

Wiring that by hand means writing code that says “allocate texture A, run pass 1 reading the mesh output, run pass 2 reading A, free A, allocate texture B, run pass 3 reading B…” for every permutation of post-FX the user might turn on. Every time you add a pass, you touch the wiring. Every time you change a format, you touch the wiring. Every time you add a conditional (“only run TAA if temporal jitter is on”), the wiring sprouts branches.

That’s the path to a render loop where every change takes a month. A framegraph avoids it.

A new crate

lux-framegraph is a new workspace crate. About 1,700 lines. It depends only on wgpu and log. It does not depend on lux-core or lux-render; instead, lux-render depends on it, and lux-render provides the glue that plugs the framegraph’s transient allocator into the existing TexturePool.

The crate is intentionally API-narrow. Four public types do most of the work:

FrameGraphBuilder      // declare passes
PassBuilder            // .read() / .write() / .create() / .export() / .execute()
CompiledGraph          // the output of builder.compile()
GraphExecutor          // runs a CompiledGraph against a wgpu device

You declare passes in arbitrary order. Each pass names the resources it reads, the resources it writes, and a closure that will actually encode the GPU work. Call compile() and the builder returns a CompiledGraph — topologically sorted, with a resource aliasing plan baked in. Call run() on it and the executor walks the sorted passes, acquires and releases transient textures from the allocator, and calls each pass’s closure with the right bindings.
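To make the "declare in any order, let the compiler sort it" step concrete, here is a minimal sketch of turning read/write declarations into an execution order with Kahn's algorithm. PassDecl and sort_passes are hypothetical stand-ins, not the lux-framegraph API. Resources are plain string names, and the sketch assumes each resource has a single writer.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical pass declaration: a name plus the resources it reads and writes.
pub struct PassDecl {
    pub name: &'static str,
    pub reads: Vec<&'static str>,
    pub writes: Vec<&'static str>,
}

/// Kahn's algorithm: returns pass indices in an order where every writer
/// runs before its readers, or None if the declarations contain a cycle.
pub fn sort_passes(passes: &[PassDecl]) -> Option<Vec<usize>> {
    // Map each resource to the pass that writes it (single writer assumed).
    let mut writer: HashMap<&str, usize> = HashMap::new();
    for (i, p) in passes.iter().enumerate() {
        for w in &p.writes {
            writer.insert(*w, i);
        }
    }

    // One edge writer -> reader for every read of a graph-produced resource.
    // Reads of resources nobody writes (external inputs) add no edge.
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); passes.len()];
    let mut pending: Vec<usize> = vec![0; passes.len()];
    for (i, p) in passes.iter().enumerate() {
        for r in &p.reads {
            if let Some(&w) = writer.get(r) {
                if w != i {
                    dependents[w].push(i);
                    pending[i] += 1;
                }
            }
        }
    }

    // Repeatedly emit passes whose inputs have all been produced.
    let mut ready: VecDeque<usize> =
        (0..passes.len()).filter(|&i| pending[i] == 0).collect();
    let mut order = Vec::with_capacity(passes.len());
    while let Some(i) = ready.pop_front() {
        order.push(i);
        for &d in &dependents[i] {
            pending[d] -= 1;
            if pending[d] == 0 {
                ready.push_back(d);
            }
        }
    }

    // If some pass never became ready, the declarations form a cycle.
    (order.len() == passes.len()).then_some(order)
}
```

Declaring combine, extract, blur in that order and sorting should yield extract, blur, combine. A real compiler would also cull passes whose outputs nobody consumes; this sketch skips that.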

Half-pass time

The interesting piece is the lifetime analysis that determines when a transient texture can alias another.

Naive answer: two textures can share a slot if their live ranges don’t overlap. Pass P writes resource X; pass Q is the last one to read X. X is live from P through Q. If some other resource Y is live from R through S, and the [P, Q] and [R, S] intervals don’t overlap, X and Y can share a slot.

Sounds right. Doesn’t quite work. Consider two passes that touch two different resources on the same “tick” — one reads X for the last time, another writes Y for the first time. Integer tick indexing says they happen at the same time, which means X and Y are both live at that moment, which means the aliaser keeps them apart. But X is about to die, and Y is just being born; they don’t actually need simultaneous storage.

The fix is half-pass time. Each pass owns two ticks: pass index P gets tick 2P (the read phase) and tick 2P+1 (the write phase). A resource’s live range starts at its first-write’s write tick and ends at its last-read’s read tick. Now the “X dies as Y is born” case becomes X: [..., 2P] and Y: [2P+1, ...], which are disjoint, and the aliaser is free to share the slot.

This is a standard technique from the game-engine world (Frostbite’s framegraph paper introduced it in the public literature; every modern engine does some version of it, or at least their GDC talks say they do). Worth writing down for the record because the first version of the Lux framegraph used integer ticks and wasted slots in exactly the cases above.
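The tick arithmetic above can be sketched in a few lines. LiveRange, read_tick, write_tick, and overlaps are names I made up for illustration; the crate's internals may look different. The sketch assumes a resource's last read happens in a later pass than its first write.

```rust
/// Live range in half-pass ticks, inclusive on both ends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LiveRange {
    pub start: u32,
    pub end: u32,
}

/// Read phase of pass `p`.
pub fn read_tick(p: u32) -> u32 {
    2 * p
}

/// Write phase of pass `p`.
pub fn write_tick(p: u32) -> u32 {
    2 * p + 1
}

/// A resource is live from the write phase of its first writer
/// through the read phase of its last reader.
pub fn live_range(first_write_pass: u32, last_read_pass: u32) -> LiveRange {
    LiveRange {
        start: write_tick(first_write_pass),
        end: read_tick(last_read_pass),
    }
}

/// Two inclusive ranges overlap if each starts no later than the other ends.
pub fn overlaps(a: LiveRange, b: LiveRange) -> bool {
    a.start <= b.end && b.start <= a.end
}
```

Take X written in pass 0 and last read in pass 1, and Y first written in pass 1 and read in pass 2. With integer ticks the ranges are [0, 1] and [1, 2], which collide at 1. With half-pass ticks they are [1, 2] and [3, 4], which don't.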

Greedy first-fit aliasing

Once you have half-pass lifetimes, the aliasing algorithm is surprisingly simple. Sort all transient resources by first-write ascending. For each resource, find the lowest-indexed slot whose current occupant is dead by the new resource’s first-write tick. If none exists, allocate a new slot. Record the mapping.

It’s the compiler version of the “assign rooms to guests” interval-scheduling problem, and a greedy algorithm is optimal when all resources have the same “cost”. They don’t: a 4K RGBA16Float texture is not the same as a 512×512 R8. But the framegraph keys aliasing on the (width, height, format) triple, so resources only compete for slots within matching buckets, and inside a bucket every resource costs the same.
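Here is a minimal sketch of the greedy first-fit pass as described above. Bucket, Transient, and assign_slots are hypothetical names, and the format field is a plain u32 standing in for a real texture format so the example has no dependencies. Live ranges are inclusive half-pass ticks.

```rust
/// Hypothetical bucket key; `format` is a stand-in for a texture format enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Bucket {
    pub width: u32,
    pub height: u32,
    pub format: u32,
}

/// Hypothetical transient resource with an inclusive half-pass live range.
pub struct Transient {
    pub bucket: Bucket,
    pub first_write: u32,
    pub last_read: u32,
}

struct Slot {
    bucket: Bucket,
    busy_until: u32, // last-read tick of the current occupant
}

/// Returns the slot index for each resource (in input order) and the slot count.
pub fn assign_slots(resources: &[Transient]) -> (Vec<usize>, usize) {
    // Visit resources by first-write tick, ascending.
    let mut order: Vec<usize> = (0..resources.len()).collect();
    order.sort_by_key(|&i| resources[i].first_write);

    let mut slots: Vec<Slot> = Vec::new();
    let mut assignment = vec![0usize; resources.len()];
    for i in order {
        let r = &resources[i];
        // Lowest-indexed slot in the same bucket whose occupant is already dead.
        let found = slots
            .iter()
            .position(|s| s.bucket == r.bucket && s.busy_until < r.first_write);
        let idx = match found {
            Some(idx) => {
                slots[idx].busy_until = r.last_read;
                idx
            }
            None => {
                slots.push(Slot { bucket: r.bucket, busy_until: r.last_read });
                slots.len() - 1
            }
        };
        assignment[i] = idx;
    }
    (assignment, slots.len())
}
```

A chain of four same-sized resources with ranges [1, 2], [3, 4], [5, 6], [7, 8] collapses to one slot. Two resources with ranges [1, 4] and [3, 6] need two.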

For a bloom-style chain of four transient resources with disjoint lifetimes, the compiler emits exactly one slot. The allocator’s acquire count goes from four to one. The first integration test in the crate verifies this:

#[test]
fn framegraph_aliasing_reduces_allocations() {
    // 4-resource chain, each reads the previous, each writes a new one
    // Expected: 1 slot, 1 acquire call
}

The test passes. The bloom migration (covered below) hits the same path.

Exports participate in aliasing

One subtle design choice. When a pass writes a resource that the caller wants to keep (the final color output, the HDR buffer, the depth buffer), that resource gets .export()’d. Exports are the outputs of the graph; the executor holds onto their slots past end-of-graph so the caller can retrieve them.

An export could have had its own dedicated slot, carved out of the pool and never aliased. That would be simple and slightly wasteful: an export that’s first-written in the last pass of the graph would have no other lifetime to conflict with, and could trivially reuse a slot that a transient just released. Making exports participate in aliasing saves those slots.

It does mean the executor has to be careful. The slot that the alias plan assigned to an export has to be held, not released, at end-of-graph. The compiler records an export_binding(export_handle) -> slot_index mapping, and the executor consults it to pull the final textures back out.

This caught a bug that had been latent in the executor for about a week: exports were being double-allocated, because the executor was still using the old “exports always get their own slot” code path while the compiler had moved to the aliased model. The integration test that caught it (framegraph_pass_actually_writes_expected_pixels) runs a real fragment shader through the graph, reads back the output, and pixel-checks (R=255, G≈63, B=0) for a shader that writes vec4(1.0, 0.25, 0.0, 1.0). If the executor gives you the wrong slot, the pixel check fails. It did; I fixed it.
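The end-of-graph handling described above might look something like this. ExportHandle and split_at_end_of_graph are hypothetical; the real executor works on wgpu textures, while this sketch is generic over the slot contents so it runs without a GPU. It assumes each export is bound to a distinct slot.

```rust
use std::collections::HashMap;

/// Hypothetical handle type; the crate's real export handle isn't shown in this post.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ExportHandle(pub u32);

/// Splits the slot contents at end-of-graph into exports to hand back to the
/// caller and leftovers that can go back to the allocator.
pub fn split_at_end_of_graph<T>(
    slots: Vec<T>,
    export_binding: &HashMap<ExportHandle, usize>,
) -> (HashMap<ExportHandle, T>, Vec<T>) {
    // Invert the binding so each slot index can be looked up directly.
    let by_slot: HashMap<usize, ExportHandle> =
        export_binding.iter().map(|(&h, &s)| (s, h)).collect();

    let mut exports = HashMap::new();
    let mut to_release = Vec::new();
    for (index, texture) in slots.into_iter().enumerate() {
        match by_slot.get(&index) {
            // Bound to an export: hold it for the caller.
            Some(&handle) => {
                exports.insert(handle, texture);
            }
            // Plain transient: safe to release.
            None => to_release.push(texture),
        }
    }
    (exports, to_release)
}
```

The point of routing everything through the one binding table is that there is no second "exports get their own slot" path for the executor and compiler to disagree about.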

The TexturePool bridge

The framegraph doesn’t own texture allocation. It delegates to a TransientAllocator trait:

trait TransientAllocator {
    fn acquire(&mut self, desc: &ResourceDesc) -> wgpu::Texture;
    fn release(&mut self, desc: &ResourceDesc, tex: wgpu::Texture);
}

lux-render provides the implementation: FrameGraphTextureAllocator wraps the existing TexturePool. acquire goes to the pool’s free list; release returns the texture to the same free list. The framegraph gets cross-frame pooling for free, and the pool gets a new client without knowing the framegraph exists.

Two small helpers were added to TexturePool for this: take_free_texture and put_free_texture. Both are public but intended for the bridge only — the regular allocation path still goes through alloc_texture / free_texture. The helpers let the bridge talk to the free list directly without duplicating its bookkeeping.
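A rough sketch of the bridge shape, using stand-in types so it compiles without wgpu. ResourceDesc, Texture, ToyPool, and PoolBridge here are all toy versions I wrote for illustration; the real trait takes wgpu::Texture, and the signatures I give take_free_texture and put_free_texture are guesses based on the description above, not the actual TexturePool API.

```rust
use std::collections::HashMap;

// Stand-ins so the sketch builds without wgpu.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ResourceDesc {
    pub width: u32,
    pub height: u32,
    pub format: u32,
}

#[derive(Debug)]
pub struct Texture {
    pub id: u32,
}

pub trait TransientAllocator {
    fn acquire(&mut self, desc: &ResourceDesc) -> Texture;
    fn release(&mut self, desc: &ResourceDesc, tex: Texture);
}

/// Toy pool: one free list per descriptor, plus a count of fresh allocations.
#[derive(Default)]
pub struct ToyPool {
    free: HashMap<ResourceDesc, Vec<Texture>>,
    pub created: u32,
}

impl ToyPool {
    /// Pops a matching texture off the free list, if there is one.
    pub fn take_free_texture(&mut self, desc: &ResourceDesc) -> Option<Texture> {
        self.free.get_mut(desc).and_then(|list| list.pop())
    }

    /// Returns a texture to the free list for its descriptor.
    pub fn put_free_texture(&mut self, desc: &ResourceDesc, tex: Texture) {
        self.free.entry(desc.clone()).or_default().push(tex);
    }
}

/// The bridge borrows the pool and forwards to its free list.
pub struct PoolBridge<'a> {
    pub pool: &'a mut ToyPool,
}

impl TransientAllocator for PoolBridge<'_> {
    fn acquire(&mut self, desc: &ResourceDesc) -> Texture {
        if let Some(tex) = self.pool.take_free_texture(desc) {
            return tex; // reuse across frames
        }
        // Nothing free: the toy pool "allocates" by bumping a counter.
        self.pool.created += 1;
        Texture { id: self.pool.created }
    }

    fn release(&mut self, desc: &ResourceDesc, tex: Texture) {
        self.pool.put_free_texture(desc, tex);
    }
}
```

Acquire, release, then acquire again with the same descriptor and you get the same texture back with only one allocation recorded, which is the cross-frame pooling behaviour in miniature.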

Bloom is the first consumer

The real test of a framegraph is whether migrating a real pass to it produces equivalent output. Bloom was the obvious candidate: three passes (extract → blur → combine), three transient resources, a clear dependency chain, and an existing pixel-level reference image from the extended texture processing post.

The migration replaces three independent TextureOp::RunShader dispatches with a single TextureOp::RunFrameGraphShaderChain that carries the three passes, their bindings, and their execute closures. The compiler aliases the extract output and the blur intermediate into a single slot (they’re never live at the same time). The executor runs the three passes in one command encoder with the right barriers between them.

And the equivalence test checks that the pixels match the old bloom’s reference image bit-for-bit. They do.

Three dispatches became one. One slot replaced two. No visual change. That’s exactly what you want from a framegraph migration — invisible in the output, dramatic in the structure.

What this unblocks

Every post-FX pass from here on will be a framegraph pass. TAA’s history buffer, the temporal accumulation target, and the resolve output are three lifetimes the compiler can alias. CAS is a simple read-write pass that slots between tonemap and display. Film grain is another. And the real payoff comes later, when the scene-GI work needs to pipe six or seven transient buffers through a denoiser, and the framegraph aliases them down to two.

The HDR pipeline gave me the range I needed. The framegraph gives me the composability. Next post turns both loose on the post-FX stack.


I have no idea what I’m doing or if any of this is right, but it’s fun. Follow along.
