// 60fps at 4K — the GPU computes the visible world frame by frame
The GPU render pipeline transforms 3D geometry into 2D pixels through a sequence of programmable and fixed-function stages. Every frame: vertices → triangles → fragments → pixels. The programmer controls the vertex shader and fragment shader; the hardware handles rasterization, depth test, and blending.
The vertex shader transforms each vertex position from object space through world → view → clip space. Runs once per vertex on the GPU. Outputs gl_Position and varyings.
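The object → world → view → clip chain can be sketched in a few lines of Python. This is only an illustration: the helper names (`mat_vec`, `mat_mul`, `translate`, `perspective`), the OpenGL-style projection convention, and the sample values are assumptions, and a real vertex shader would do the equivalent in shader code and write the result to gl_Position.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def mat_mul(a, b):
    """Multiply two 4x4 row-major matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def translate(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection (camera looks down -Z)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0, 0, -1, 0],
    ]

# Hypothetical scene: object pushed 5 units in front of a camera at the origin.
model = translate(0, 0, -5)      # object space -> world space
view = translate(0, 0, 0)        # world space -> view space (identity here)
proj = perspective(90.0, 1.0, 0.1, 100.0)  # view space -> clip space
mvp = mat_mul(proj, mat_mul(view, model))

def vertex_shader(position):
    """Per-vertex work: returns the clip-space position (the gl_Position analogue)."""
    return mat_vec(mvp, [position[0], position[1], position[2], 1.0])

clip = vertex_shader((1.0, 1.0, 0.0))
# The perspective divide happens after the vertex shader, in fixed-function hardware.
ndc = [c / clip[3] for c in clip[:3]]
```

The matrices are combined once on the CPU side here, so the per-vertex cost is a single matrix-vector multiply.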
Fixed-function hardware converts triangles into fragments (potential pixels). Interpolates vertex attributes (UVs, normals) across the triangle surface.
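A minimal software sketch of what the rasterizer does, assuming screen-space 2D vertices and one UV pair per vertex. The function names are hypothetical, and the interpolation is plain barycentric (real hardware also applies perspective correction and a fill rule for shared edges, both omitted here).

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(v0, v1, v2, uv0, uv1, uv2, width, height):
    """Return (x, y, (u, v)) for every pixel whose center lies inside the triangle."""
    area = edge(v0, v1, v2)
    fragments = []
    if area == 0:
        return fragments  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            # Barycentric weights; dividing by area makes this winding-independent.
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                u = w0 * uv0[0] + w1 * uv1[0] + w2 * uv2[0]
                v = w0 * uv0[1] + w1 * uv1[1] + w2 * uv2[1]
                fragments.append((x, y, (u, v)))
    return fragments

frags = rasterize((0, 0), (4, 0), (0, 4), (0, 0), (1, 0), (0, 1), 4, 4)
```

Each returned fragment carries interpolated UVs, which is what the fragment shader would receive as its inputs.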
The fragment shader computes the final pixel color: texture sampling, lighting calculations, procedural effects. Runs once per fragment, so it is often the bottleneck for complex scenes.
Depth test rejects occluded fragments. Alpha blending composites transparency. Stencil operations mask regions. Final pixel written to framebuffer.
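The depth test and alpha blend can be sketched as follows. The function `output_merge` and its parameters are hypothetical, the depth comparison is assumed to be "less than", and the blend is the common source-over equation (src × alpha + dst × (1 − alpha)); stencil operations are left out.

```python
def output_merge(framebuffer, depthbuffer, x, y, color, alpha, depth, depth_write=True):
    """Depth-test one fragment, then blend it over the existing pixel.

    Returns True if the fragment was written, False if it was occluded.
    """
    if depth >= depthbuffer[y][x]:
        return False  # something nearer is already stored: reject the fragment
    dst = framebuffer[y][x]
    framebuffer[y][x] = tuple(alpha * s + (1.0 - alpha) * d
                              for s, d in zip(color, dst))
    if depth_write:
        depthbuffer[y][x] = depth
    return True

# 1x1 render target cleared to black with depth at the far plane.
fb = [[(0.0, 0.0, 0.0)]]
zb = [[1.0]]

wrote_opaque = output_merge(fb, zb, 0, 0, (1.0, 0.0, 0.0), 1.0, 0.5)
# Transparent surfaces are commonly drawn with depth writes disabled.
wrote_glass = output_merge(fb, zb, 0, 0, (0.0, 0.0, 1.0), 0.5, 0.3, depth_write=False)
wrote_hidden = output_merge(fb, zb, 0, 0, (0.0, 1.0, 0.0), 1.0, 0.9)
```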
Modern GPUs are massively parallel processors with thousands of shader cores executing the same program on different data (SIMT, a SIMD-like model). NVIDIA Ampere/Ada, AMD RDNA3, and Apple M-series GPUs differ in microarchitecture but share broadly the same shader execution model.
Each SM (NVIDIA) or CU (AMD) contains 64–128 shader processors. RTX 4090: 16,384 CUDA cores across 128 SMs. Peak: 82.6 TFLOPS FP32.
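The peak figure follows from simple arithmetic. The sketch below assumes a boost clock of 2.52 GHz (NVIDIA's published figure, not stated above) and counts a fused multiply-add as 2 FLOPs per core per clock; the function name is hypothetical.

```python
def peak_fp32_tflops(cuda_cores, boost_clock_ghz, flops_per_core_per_clock=2):
    """Theoretical peak: cores x FLOPs per clock x clock (GHz), converted to TFLOPS."""
    return cuda_cores * flops_per_core_per_clock * boost_clock_ghz / 1000.0

cores_per_sm = 16384 // 128                       # 128 cores in each of 128 SMs
rtx_4090_tflops = peak_fp32_tflops(16384, 2.52)   # about 82.6
```

This is an upper bound; real shaders rarely sustain it because of memory stalls and instruction mix.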
GPU memory bandwidth is the primary bottleneck for texture-heavy scenes. RTX 4090: 1008 GB/s. Texture caches (L1/L2) reduce bandwidth pressure.
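The 1008 GB/s figure can be reproduced from bus width and per-pin data rate. The 384-bit bus and 21 Gbps GDDR6X rate are assumptions taken from the published RTX 4090 specification, not from the text above.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Peak memory bandwidth in GB/s: bits per transfer x transfers per second / 8."""
    return bus_width_bits * data_rate_gbps_per_pin / 8.0

rtx_4090_bw = bandwidth_gb_s(384, 21)   # 1008.0 GB/s
per_frame_gb = rtx_4090_bw / 60         # 16.8 GB of traffic available per frame at 60fps
```

Dividing by the frame rate gives a quick sanity check on how many bytes a frame's texture and render-target traffic can touch before bandwidth becomes the limit.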
32 threads (NVIDIA warp) or 32/64 threads (AMD wavefront; RDNA supports both widths) execute in lockstep. Divergent branching serializes execution, so minimize divergent branches in shaders.
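A toy cost model shows why divergence hurts. It assumes a warp executes every branch path that at least one of its threads takes, with inactive threads masked off; the function name and the cycle counts are hypothetical.

```python
def warp_branch_cost(conditions, cost_if, cost_else):
    """Cycles a lockstep warp spends on an if/else, given each thread's condition."""
    takes_if = any(conditions)        # at least one thread needs the 'if' path
    takes_else = not all(conditions)  # at least one thread needs the 'else' path
    return (cost_if if takes_if else 0) + (cost_else if takes_else else 0)

uniform = warp_branch_cost([True] * 32, cost_if=10, cost_else=40)                  # 10
divergent = warp_branch_cost([True] * 31 + [False], cost_if=10, cost_else=40)      # 50
```

A single thread taking the expensive path makes the whole warp pay for both paths.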
Dedicated BVH traversal hardware (RTX). Handles triangle-ray intersection at hardware speed. Enables real-time reflections, shadows, and GI.
Real-time graphics demands consistent frame delivery. At 60fps, each frame has a 16.67ms budget. CPU submits draw calls; GPU executes them in parallel. GPU timing tools (NVIDIA Nsight, RenderDoc) reveal which passes consume the most time.
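The budget arithmetic, plus a check of per-pass timings against it, can be sketched as below. The pass names and millisecond values are made-up placeholders of the kind a GPU profiler would report.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps

# Hypothetical GPU pass timings in milliseconds.
gpu_passes = {
    "shadow maps": 2.0,
    "g-buffer": 3.5,
    "lighting": 4.0,
    "post-processing": 2.5,
}

budget = frame_budget_ms(60)              # about 16.67 ms
gpu_total = sum(gpu_passes.values())      # 12.0 ms
headroom = budget - gpu_total
over_budget = gpu_total > budget
slowest_pass = max(gpu_passes, key=gpu_passes.get)
```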
Game logic, animation, physics, draw call submission. CPU sends command buffer to GPU. Goal: roughly ≤8ms of CPU time when the 16.67ms frame budget at 60fps is split between CPU and GPU.
Shadow map rendering, G-buffer pass, lighting pass, post-processing. GPU stalls occur when VRAM bandwidth or shader ALUs are saturated.
V-Sync locks frame presentation to display refresh, preventing tearing at the cost of added latency. NVIDIA G-Sync / AMD FreeSync enable variable refresh rate.
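A simplified model of the V-Sync trade-off: with a fixed refresh rate, a frame that misses one refresh waits for the next. The function name is hypothetical, and the model assumes simple double buffering with no frame queueing.

```python
import math

def presented_frame_time_ms(render_ms, refresh_hz=60.0):
    """Frame time as seen on screen when presentation waits for the next refresh."""
    refresh_ms = 1000.0 / refresh_hz
    intervals = max(1, math.ceil(render_ms / refresh_ms))
    return intervals * refresh_ms

fast = presented_frame_time_ms(10.0)   # fits in one refresh: about 16.67 ms
slow = presented_frame_time_ms(17.0)   # just misses: about 33.33 ms (30fps)
```

Variable refresh rate avoids this step by letting the display refresh when the frame is ready.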
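Multi-Draw Indirect (MDI) lets the GPU read draw parameters from a buffer in GPU memory, so many draws are issued by a single CPU call with less CPU-GPU synchronization. Can enable roughly 10–100× more draw calls, depending on workload and driver.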
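To make the "draw calls in a buffer" idea concrete, the sketch below packs draw parameters into the five-field, 20-byte layout used for indexed indirect draws (the layout of `VkDrawIndexedIndirectCommand` in Vulkan and `DrawElementsIndirectCommand` in OpenGL). The dictionary keys and the sample draws are hypothetical, and uploading the bytes to a GPU buffer is left out.

```python
import struct

# index_count, instance_count, first_index (uint32), vertex_offset (int32), first_instance (uint32)
DRAW_CMD = struct.Struct("<IIIiI")

def build_indirect_buffer(draws):
    """Pack a list of draw descriptions into one contiguous byte buffer."""
    return b"".join(
        DRAW_CMD.pack(d["index_count"], d["instance_count"], d["first_index"],
                      d["vertex_offset"], d["first_instance"])
        for d in draws
    )

draws = [
    {"index_count": 36, "instance_count": 100, "first_index": 0,
     "vertex_offset": 0, "first_instance": 0},
    {"index_count": 6, "instance_count": 1, "first_index": 36,
     "vertex_offset": 24, "first_instance": 100},
]
indirect_buffer = build_indirect_buffer(draws)
```

A single multi-draw call then consumes the whole buffer, and a compute pass can rewrite it (for example, zeroing `instance_count` for culled objects) without a CPU round trip.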
Real-time lighting has evolved from per-vertex Gouraud shading to physically-based deferred rendering with ray-traced global illumination. Each technique trades accuracy for performance within the frame budget.
Geometry pass writes material data to G-buffer (albedo, normal, roughness, depth). Lighting pass reads G-buffer — decouples light count from geometry complexity.
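The two passes can be sketched on the CPU to show the decoupling. Everything here is illustrative: the fragment tuples, the dictionary-based G-buffer, and the Lambert-only directional lighting are assumptions, and roughness is omitted for brevity.

```python
def geometry_pass(fragments, width, height):
    """Store material data of the nearest fragment per pixel; no lighting happens here."""
    gbuffer = [[None] * width for _ in range(height)]
    for x, y, depth, albedo, normal in fragments:
        texel = gbuffer[y][x]
        if texel is None or depth < texel["depth"]:
            gbuffer[y][x] = {"albedo": albedo, "normal": normal, "depth": depth}
    return gbuffer

def lighting_pass(gbuffer, lights):
    """Shade each covered pixel once per light; cost is pixels x lights, not geometry x lights."""
    image = []
    for row in gbuffer:
        out_row = []
        for texel in row:
            rgb = [0.0, 0.0, 0.0]
            if texel is not None:
                for to_light, color in lights:
                    n_dot_l = max(0.0, sum(n * l for n, l in zip(texel["normal"], to_light)))
                    for i in range(3):
                        rgb[i] += texel["albedo"][i] * color[i] * n_dot_l
            out_row.append(tuple(rgb))
        image.append(out_row)
    return image

# Two overlapping surfaces at one pixel; the nearer (green) one wins the G-buffer.
fragments = [
    (0, 0, 0.8, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    (0, 0, 0.2, (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
]
lights = [((0.0, 0.0, 1.0), (1.0, 1.0, 1.0))]  # (unit direction toward light, color)
image = lighting_pass(geometry_pass(fragments, 1, 1), lights)
```

The hidden red surface is never lit, which is the saving deferred shading is after.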
Physically-based BRDF. Microfacet model: GGX distribution, Smith geometry function, Fresnel-Schlick approximation. Energy-conserving, artistically predictable.
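The three microfacet terms combine into a Cook-Torrance specular lobe, D·G·F / (4 (n·l)(n·v)). The sketch below is one common formulation, not a definitive one: it assumes the alpha = roughness² remapping and the Schlick-GGX approximation of the Smith term with k = (roughness + 1)² / 8, as used for direct lighting in UE4's published shading model. The function names are hypothetical.

```python
import math

def d_ggx(n_dot_h, roughness):
    """GGX / Trowbridge-Reitz normal distribution function."""
    a2 = (roughness * roughness) ** 2          # alpha = roughness^2, then squared
    denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * denom * denom)

def g_schlick_ggx(n_dot_x, k):
    return n_dot_x / (n_dot_x * (1.0 - k) + k)

def g_smith(n_dot_v, n_dot_l, roughness):
    """Smith geometry term as a product of view and light shadowing factors."""
    k = (roughness + 1.0) ** 2 / 8.0           # assumed direct-lighting remap
    return g_schlick_ggx(n_dot_v, k) * g_schlick_ggx(n_dot_l, k)

def f_schlick(v_dot_h, f0):
    """Fresnel-Schlick approximation; f0 is reflectance at normal incidence."""
    return f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5

def specular_brdf(n_dot_l, n_dot_v, n_dot_h, v_dot_h, roughness, f0):
    d = d_ggx(n_dot_h, roughness)
    g = g_smith(n_dot_v, n_dot_l, roughness)
    f = f_schlick(v_dot_h, f0)
    return d * g * f / max(4.0 * n_dot_l * n_dot_v, 1e-6)  # guard against divide by zero

# Head-on view and light on a mid-roughness dielectric (f0 = 0.04).
spec = specular_brdf(1.0, 1.0, 1.0, 1.0, roughness=0.5, f0=0.04)
```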
Software and hardware ray-traced global illumination. Radiance cache for diffuse GI. Screen space for short-range reflections. Lumen proxy meshes for SDF tracing.
Screen Space Ambient Occlusion darkens corners and crevices. Ground Truth AO (UE5) uses temporal accumulation for higher quality at similar cost.
Real-time render engines embed decades of rendering research into accessible pipelines. The gap between real-time and offline rendering has narrowed sharply: UE5 Nanite + Lumen at 60fps can produce imagery that many viewers find hard to tell apart from path-traced renders.
Nanite virtualized geometry, Lumen GI, Chaos physics. Industry standard for virtual production, games, and architectural visualization.
Unity's High Definition Render Pipeline (HDRP). Deferred rendering, volumetric fog, TAA. HDRP targets high-end PC and consoles; Unity as a whole has strong multi-platform targeting. C# scripting with Burst compiler.
Open source. Vulkan backend, Global Illumination via SDFGI and Voxel GI. GDScript + C# + C++. Lightweight, no licensing fees.
Real-time node-based. GLSL shaders, TOP/SOP pipelines, Vulkan backend (TD2022+). The live performance and installation standard.
Procedural techniques generate content algorithmically rather than storing it explicitly. Noise-based terrain, signed distance fields for geometry, procedural textures, and L-system vegetation reduce storage requirements while enabling infinite variation.
Signed Distance Fields represent geometry as distance functions. Ray marching through an SDF enables complex implicit surfaces with zero triangles. Used in Unreal Lumen's software ray tracing, font rendering, and distance field AO.
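A minimal sphere-tracing sketch shows how ray marching uses the distance function: each step advances the ray by the distance to the nearest surface, which can never overshoot. The scene (one sphere), function names, and step limits are assumptions chosen for illustration.

```python
import math

def sd_sphere(p, center, radius):
    """Signed distance from point p to a sphere: negative inside, positive outside."""
    return math.dist(p, center) - radius

def scene_sdf(p):
    # Hypothetical scene: a unit sphere 5 units down the +Z axis.
    return sd_sphere(p, (0.0, 0.0, 5.0), 1.0)

def ray_march(origin, direction, max_steps=64, max_dist=100.0, eps=1e-4):
    """Return the hit distance along a unit-length ray, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = scene_sdf(p)
        if d < eps:
            return t          # close enough to the surface: report a hit
        t += d                # safe step: nothing is nearer than d
        if t > max_dist:
            break
    return None

hit = ray_march((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))    # hits the sphere's front at t = 4
miss = ray_march((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))   # travels away from the sphere
```

On the GPU the same loop runs once per pixel in a fragment or compute shader.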
General-purpose GPU computation. Particle systems, fluid simulation, physics, and procedural mesh generation all happen on GPU via compute passes.
Perlin, Simplex, Worley, FBM. Basis for terrain, clouds, organic textures. GPU-optimized implementations run millions of noise samples per frame.
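Fractal Brownian motion (FBM) sums several octaves of a base noise at rising frequency and falling amplitude. The sketch below uses simple hash-based value noise as the base rather than Perlin or Simplex noise, to stay short; the hash constants and function names are arbitrary assumptions.

```python
import math

def hash01(ix, iy):
    """Hypothetical integer hash mapping a lattice point to [0, 1)."""
    h = (ix * 374761393 + iy * 668265263) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    h ^= h >> 16
    return h / 2 ** 32

def smoothstep(t):
    return t * t * (3.0 - 2.0 * t)

def value_noise(x, y):
    """Smoothly interpolate random values stored at integer lattice points."""
    ix, iy = math.floor(x), math.floor(y)
    sx, sy = smoothstep(x - ix), smoothstep(y - iy)
    top = hash01(ix, iy) + sx * (hash01(ix + 1, iy) - hash01(ix, iy))
    bottom = hash01(ix, iy + 1) + sx * (hash01(ix + 1, iy + 1) - hash01(ix, iy + 1))
    return top + sy * (bottom - top)

def fbm(x, y, octaves=5, lacunarity=2.0, gain=0.5):
    """Sum octaves of noise; the result is normalized back into [0, 1]."""
    total, norm, amplitude, frequency = 0.0, 0.0, 1.0, 1.0
    for _ in range(octaves):
        total += amplitude * value_noise(x * frequency, y * frequency)
        norm += amplitude
        amplitude *= gain          # each octave contributes less
        frequency *= lacunarity    # and varies faster
    return total / norm

# Sample a small hypothetical terrain heightmap.
heights = [[fbm(x * 0.25, y * 0.25) for x in range(8)] for y in range(8)]
```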
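DX12 Ultimate / Vulkan feature replacing the vertex, tessellation, and geometry shader stages. Amplification shader (task shader in Vulkan) culls meshlets; mesh shader generates geometry. Nanite can use mesh shaders for its hardware-rasterized clusters, alongside a compute-based software rasterizer.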
These engineers, researchers, and artists define the field — advancing the science of real-time rendering while making it accessible to creative practitioners.
BSP trees, portal rendering, lightmapping, MegaTexture. Pioneered or popularized in games much of what modern engines treat as given. Quake's renderer remains a study in ingenuity.
Designed Unreal Engine's architecture. Pushed virtual production. Nanite and Lumen represent his vision of closing the gap with offline rendering.
Authored UE4's PBR shading model (Real Shading in Unreal Engine 4, SIGGRAPH 2013). Nanite virtualized geometry architecture. Temporal anti-aliasing (TAA) systems.
SDF ray marching, smooth minimum operators, procedural noise. Co-founded Shadertoy. His GLSL work defined creative coding aesthetics for a generation.
OpenGL SuperBible author. GPU-driven rendering advocacy. Persistent buffers, bindless textures — modern OpenGL techniques for reducing CPU overhead.
Physically based rendering theory. Co-wrote "Physically Based Rendering" with Matt Pharr and Greg Humphreys. Consultant to major game studios.
SDF-based rendering in Dreams (PS4). Traced a path from GPU voxel research to a consumer game with fully sculpted real-time geometry. SIGGRAPH 2015.
3D Tiles specification, glTF format. Real-time geospatial rendering at planetary scale. Making GPU-accelerated earth visualization accessible via WebGL/WebGPU.