App 06 · Video Universe · WAVGEN

AI VIDEO WORLD

The Frontier of the Moving Image

The image that never existed. The frame that cannot exist again.

THE DIFFUSION ENGINE
[Interactive panel: tabs for Diffusion Steps, Latent Space, Guidance Scale, and Seed; step control ranging from 0 to 50]

The diffusion process starts from pure Gaussian noise and removes it iteratively, guided by a denoising model that has learned the distribution of its training data. Each denoising step refines the latent representation toward a coherent image. The latent space shown in the panel is the compressed, abstract representation in which generation occurs.
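
The denoising loop described above can be sketched in a few lines. This is a toy illustration, not a production sampler: `oracle_noise` is a hypothetical stand-in for the trained network (it cheats by knowing the target), the linear beta schedule values are assumptions, and the deterministic DDIM-style update is only one of several common sampler choices.

```python
import numpy as np

def alpha_bars(num_steps: int) -> np.ndarray:
    """Cumulative signal fraction per step; index 0 means 'no noise left'."""
    betas = np.linspace(1e-4, 0.2, num_steps)  # assumed linear schedule
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])

def oracle_noise(x_t, t, target, ab):
    """Hypothetical stand-in for the trained denoiser: the noise contained in x_t."""
    return (x_t - np.sqrt(ab[t]) * target) / np.sqrt(1.0 - ab[t])

def denoise(target: np.ndarray, num_steps: int = 50, seed: int = 0):
    """Start from Gaussian noise and remove it step by step (DDIM-style update)."""
    ab = alpha_bars(num_steps)
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)  # pure Gaussian noise
    errors = []
    for t in range(num_steps, 0, -1):
        eps = oracle_noise(x, t, target, ab)
        # Estimate the clean sample, then re-noise it to the next, lower level.
        x0_hat = (x - np.sqrt(1.0 - ab[t]) * eps) / np.sqrt(ab[t])
        x = np.sqrt(ab[t - 1]) * x0_hat + np.sqrt(1.0 - ab[t - 1]) * eps
        errors.append(float(np.linalg.norm(x - target)))
    return x, errors

target = np.linspace(-1.0, 1.0, 16)  # stands in for a tiny "latent image"
final, errors = denoise(target, num_steps=50, seed=0)
print(f"distance to target after first step: {errors[0]:.3f}")
print(f"distance to target after last step:  {errors[-1]:.2e}")
```

In a real model the oracle is replaced by a neural network that only sees the noisy latent, the step index, and the prompt conditioning, so the final sample is a plausible image rather than a known target.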


Section 02

Prompt Lab

Professional AI video prompts are structured taxonomies — not sentences. Build one below.

Subject
a lone figure · a burning city · a chrome robot
Action
walking slowly · dissolving into light · rotating
Style / Reference
cinematic, 35mm film grain · Tarkovsky slow cinema · Studio Ghibli watercolor
Camera / Lighting
dolly shot · golden hour · extreme close-up
[Interactive output: the combined prompt, with a vague prompt and a detailed prompt shown for comparison]

AI video generation rewards specificity. "A man walking" produces generic results. "35mm, slow dolly, golden hour, lone silhouette, Terrence Malick" conveys intent. Every field in your prompt is a constraint that narrows the generation space toward your vision.
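
One way to treat a prompt as structured fields rather than a sentence is sketched below. The `VideoPrompt` class and its comma-joined output format are illustrative assumptions; actual tools differ in how they weight and order prompt terms.

```python
from dataclasses import dataclass

@dataclass
class VideoPrompt:
    """Hypothetical container mirroring the four Prompt Lab fields."""
    subject: str
    action: str = ""
    style: str = ""
    camera: str = ""

    def build(self) -> str:
        """Join the non-empty fields into one comma-separated prompt."""
        fields = [self.subject, self.action, self.style, self.camera]
        return ", ".join(f.strip() for f in fields if f.strip())

vague = VideoPrompt(subject="a man walking")
detailed = VideoPrompt(
    subject="a lone figure",
    action="walking slowly",
    style="cinematic, 35mm film grain",
    camera="dolly shot, golden hour",
)
print(vague.build())
print(detailed.build())
```

Each filled field adds a constraint; leaving a field empty leaves that aspect of the shot to the model.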


Section 03

The Coherence Problem

At a typical 24 frames per second, AI video must keep every frame consistent. Each frame must agree with every other frame about who, what, and where.

[Interactive demo: 6-frame sequence with a coherence score, shown at 100]

Approaches to the Coherence Problem

① Video-to-Video: Inherit Coherence from Source
Start from a real video clip and use AI to restyle it — adding artistic filters, changing the visual style, or altering surface details. Since the source video already handles temporal consistency (real physics, real camera), the AI only needs to maintain style coherence, not structural coherence. Used in tools like Runway ML's Video-to-Video and EbSynth.
② Conditioning Frames: Keyframe Interpolation
Provide explicit keyframes (start frame, middle frame, end frame) and let the AI interpolate the motion between them. The model is constrained to begin and end at your specified images, dramatically reducing drift. Used by Pika Labs and Stable Video Diffusion's keyframe interpolation modes.
③ ControlNet / Pose Guidance: Structural Constraints
Extract depth maps, pose skeletons, or edge maps from each frame and use these as structural conditioning during generation. The AI is steered to produce an image that matches the structural skeleton, even as it varies surface details. Pose-guided generation helps keep a character's body proportions consistent across frames even when appearance changes.
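
As a rough illustration of the structural signals mentioned in approach ③, the sketch below derives an edge map from a single frame. It is a simplified stand-in: production pipelines typically use dedicated edge detectors, pose estimators, or depth models, and the `edge_map` function and its threshold are assumptions made for this example.

```python
import numpy as np

def edge_map(frame: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Return a boolean mask marking pixels with a strong intensity gradient."""
    grad_y, grad_x = np.gradient(frame.astype(float))
    magnitude = np.hypot(grad_x, grad_y)
    return magnitude > threshold

# Synthetic grayscale frame: dark left half, bright right half.
frame = np.zeros((8, 8))
frame[:, 4:] = 1.0
edges = edge_map(frame)
print(edges.astype(int))  # 1s mark the vertical boundary between the halves
```

A map like this, computed per frame, is the kind of input a structure-conditioned generator would be asked to respect while it changes textures and colors.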

The coherence problem is the central unsolved challenge of AI video. An image generator produces one frame. A video generator must produce 24 per second — all consistent with each other about who the character is, where the light sources are, and what just happened in the previous frame.
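
The scale of the problem, and one crude way to quantify it, can be sketched as follows. The `coherence_score` function is a hypothetical 0-100 metric invented for this example (it is not the metric behind any particular tool), and it ignores legitimate motion, which a real measure would have to compensate for.

```python
import numpy as np

FPS = 24  # frame rate quoted in the text

def frames_needed(seconds: float, fps: int = FPS) -> int:
    """Number of mutually consistent frames a clip of this length requires."""
    return int(round(seconds * fps))

def coherence_score(frames: np.ndarray) -> float:
    """Hypothetical 0-100 score; 100 means consecutive frames are identical.

    frames has shape (num_frames, height, width) with values in [0, 1].
    """
    diffs = np.abs(np.diff(frames, axis=0))
    return float(100.0 * (1.0 - diffs.mean()))

rng = np.random.default_rng(0)
steady = np.repeat(rng.random((1, 16, 16)), 6, axis=0)  # same frame six times
flicker = rng.random((6, 16, 16))                       # unrelated frames

print(frames_needed(5))            # frames in a 5-second clip
print(coherence_score(steady))
print(coherence_score(flicker))
```

Six unrelated frames score far below six repeated ones, which is the flicker that incoherent generation produces.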


Section 04

Deepfake Detection Clinic

Knowing how deepfakes work is essential for media literacy. This section teaches detection, not creation.

Detection Checklist

  • 01 Blinking: AI faces often don't blink naturally; timing is off or absent
  • 02 Hair edges: strands blur or merge with background at boundaries
  • 03 Ears: frequently inconsistent, warped, or wrong shape
  • 04 Teeth: weird geometry, extra teeth, unnatural gum color
  • 05 Neck lighting: mismatch between face and neck illumination
  • 06 Background bleed: warping or color bleeding at face boundaries
  • 07 Eye highlights: reflections don't match actual light source direction
  • 08 Temporal drift: face subtly changes between video frames

Deepfake Timeline

2017
Reddit user "deepfakes" releases faceswap tool — term becomes generic
2018–19
GAN-based face synthesis reaches photorealism; detection race begins
2020–21
Commercial tools democratize creation; first legislation in US states
2022–23
Diffusion models surpass GANs; real-time deepfake on consumer hardware
2024
C2PA provenance standards, AI watermarking, EU AI Act enters into force

Spot the Fake — Quiz

[Interactive quiz: three rounds; in each round, click whichever of two images (Option A or Option B) you think is the deepfake]

Section 05

Motion Brush Simulator

Paint motion vectors onto regions of an image — then preview the directed animation.

Red=Right   Blue=Left   Green=Up   Yellow=Down
MOTION BRUSH CANVAS — CLICK & DRAG TO PAINT
WITHOUT Motion Brush

Entire scene moves as one with camera pan — no independent element control.

WITH Motion Brush

Painted regions move independently — person steps forward while trees sway separately.

Motion brush is how tools like Runway ML (Motion Brush, introduced with Gen-2) and Kling let creators direct AI video generation. Instead of hoping the AI guesses your intent, you explicitly paint where and how motion should occur, so that motion is directed rather than left to chance.
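
The color legend above can be read as a lookup from brush color to motion vector. The sketch below builds a per-pixel motion field from painted regions; the rectangular strokes, the `motion_field` helper, and the convention that y grows downward are all assumptions for illustration, not the internals of any specific tool.

```python
import numpy as np

# Legend from the canvas; y grows downward in image coordinates (assumption).
BRUSH_VECTORS = {
    "red": (1, 0),      # right
    "blue": (-1, 0),    # left
    "green": (0, -1),   # up
    "yellow": (0, 1),   # down
}

def motion_field(height, width, strokes):
    """Build a per-pixel (dx, dy) field from painted rectangular regions.

    strokes is a list of (color, top, left, bottom, right); later strokes
    overwrite earlier ones, and unpainted pixels keep zero motion.
    """
    field = np.zeros((height, width, 2), dtype=int)
    for color, top, left, bottom, right in strokes:
        field[top:bottom, left:right] = BRUSH_VECTORS[color]
    return field

field = motion_field(4, 6, [("red", 0, 0, 2, 3), ("green", 2, 3, 4, 6)])
print(field[0, 0], field[3, 5], field[0, 5])
```

Unpainted pixels carry a zero vector, which corresponds to regions the creator has left undirected.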


Section 06

NeRF Explorer

Neural Radiance Fields reconstruct 3D scenes from 2D photographs — enabling novel view synthesis from any angle.

NERF SCENE
CAMERA ANGLE

8 Training Views

NeRF is trained on a fixed set of photographs. The camera ring shows the 8 capture positions. Novel views (between cameras) are synthesized — quality degrades at extreme novel angles.

NeRF → Gaussian Splatting (2023)
3D Gaussian Splatting (Kerbl et al., 2023) has displaced NeRF in many real-time applications by using explicit 3D Gaussian primitives instead of an implicit neural field. Each Gaussian has a 3D position, a covariance (scale and rotation), an opacity, and a view-dependent color. Rendering projects ("splats") the Gaussians onto the image and alpha-blends them, which can be on the order of 100× faster than the original NeRF. Reconstruction quality is comparable, with real-time playback on a consumer GPU. NeRF remains relevant for research; Gaussian Splatting is increasingly the choice for production.

NeRF captured research imagination in 2020 by synthesizing photorealistic novel views from as few as 20 photographs. It represented a paradigm shift: a neural network is the 3D scene — not a mesh or point cloud, but a function that maps 3D position + view direction to color and density.
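
The idea of a scene as a function can be sketched with a hand-written field in place of the trained network. Here `toy_field` is a hypothetical analytic scene (a solid red sphere) standing in for the neural network, and `render_ray` applies the standard volume-rendering quadrature used by NeRF; sample counts and bounds are arbitrary choices for this example.

```python
import numpy as np

def toy_field(points, view_dir):
    """Hypothetical radiance field: a solid red sphere of radius 1 at the origin.

    Maps 3D positions (and a view direction, ignored by this toy) to
    per-point RGB color and density.
    """
    inside = np.linalg.norm(points, axis=-1) < 1.0
    density = np.where(inside, 50.0, 0.0)
    color = np.tile(np.array([1.0, 0.0, 0.0]), (len(points), 1))
    return color, density

def render_ray(origin, direction, near=0.0, far=6.0, num_samples=128):
    """Composite color along one camera ray; returns (rgb, accumulated opacity)."""
    direction = direction / np.linalg.norm(direction)
    t = np.linspace(near, far, num_samples)
    points = origin + t[:, None] * direction
    color, density = toy_field(points, direction)
    delta = (far - near) / (num_samples - 1)      # spacing between samples
    alpha = 1.0 - np.exp(-density * delta)        # opacity of each segment
    # Transmittance: fraction of light that survives to reach each sample.
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = transmittance * alpha
    return (weights[:, None] * color).sum(axis=0), float(weights.sum())

forward = np.array([0.0, 0.0, 1.0])
hit_color, hit_opacity = render_ray(np.array([0.0, 0.0, -3.0]), forward)
miss_color, miss_opacity = render_ray(np.array([0.0, 3.0, -3.0]), forward)
print(hit_color, hit_opacity)
print(miss_color, miss_opacity)
```

Rendering a novel view amounts to casting one such ray per pixel from the new camera position; training adjusts the network so that rays cast from the captured positions reproduce the photographs.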


Section 07

The Ethics Board

Every synthetic media decision is a moral choice. Four realistic scenarios — no single correct answer provided. Make your choice, see the consequences.

Card 01 · The Historical Speech
You have high-quality audio of a politician from 20 years ago saying things that now seem out of context. You could use AI video to create a plausible-looking speech of them saying those words in a modern setting to "restore context" for a documentary.
Card 02 · The Memorial
A family lost a loved one. They have home video footage and want an AI to recreate the person's voice and appearance to say goodbye at a private memorial service. The person left no instructions about posthumous AI use.
Card 03 · The Visual Effects
A studio wants to use an actor's likeness from 10 years ago to de-age them digitally in a new film. The actor has personally agreed and is paid. However, their performers' union has not yet approved AI likeness use in contracts.
Card 04 · The Satire
You want to create clearly-labeled satirical AI video of a public figure saying absurd things for a comedy show. The content would be labeled "PARODY — AI GENERATED" on screen throughout. The figure is a living elected official.

"Every synthetic media decision is a moral choice. The tools are neutral. The intention and the disclosure are everything."


Section 08

Artists & Studios

The people and organizations defining the frontier of AI video.

Runway ML
AI Video Pioneer Studio
Creators of Gen-1, Gen-2, and Gen-3 Alpha. Among the first to offer commercial text-to-video and video-to-video tools at production quality. Acts as the bridge between AI research and working filmmakers.
"We're building the next generation of storytelling tools."
Pika Labs
Creative AI Video Startup
Rapid iteration on consumer-accessible AI video. Known for Pika 1.0's motion brush and lip sync features. Made high-quality text-to-video available to non-technical creators via Discord and web app.
Democratization as design principle.
Sora / OpenAI
Text-to-Video Research
Sora (2024) demonstrated video generation that OpenAI framed as a step toward world simulation: coherent physics, camera motion, and long-form temporal consistency. Trained on video represented as spacetime patches rather than as sequences of separate frames.
Video generation as physics simulation.
Google DeepMind
Lumiere · Veo Research
Lumiere introduced space-time diffusion for full-clip generation. Veo (2024) extended to high-resolution, multi-shot cinematic sequences with director-level prompt adherence and reference-image conditioning.
Research at cinema resolution.
Alexander Mordvintsev
DeepDream Creator · Google
Created DeepDream (2015), among the first widely seen works of neural network visual art. Showed that the features a CNN has learned can be amplified into hallucination-like imagery, planting the seed for the generative AI art movement that followed.
The network dreams what it has learned to see.
Holly Herndon & Mat Dryhurst
AI Ethics in Art · Spawn AI
Created Spawn (2019) — an AI trained only on collaborative community data with explicit consent. Pioneered "data dignity" — the principle that artists must have rights over how their work trains AI systems.
"The dataset is the politics."
Refik Anadol
AI Data Sculpture · MoMA
Creates large-scale architectural AI installations — "Machine Hallucinations" trained on city datasets. Brought AI generative art to major institutions including MoMA. Frames AI as a medium for collective memory.
"Data is the new pigment."
Grimes
AI Voice Licensing · Aurora AI
Publicly licensed her voice model for fan use (elf.tech), splitting royalties 50/50. Pioneer of voluntary AI licensing as an artist business model. Aurora AI is her synthetic voice released as creative commons for non-commercial use.
Consent + compensation as the framework.

Glossary

22 essential terms in AI video and generative media.

Conditioning
Providing reference signals (images, poses, depth) to constrain AI generation toward a specific output.
ControlNet
Neural network architecture that adds structural constraints (edges, pose, depth) to diffusion model outputs.
Deepfake
AI-synthesized video of a real person saying or doing something they did not say or do.
Diffusion Model
Generative model that learns to reverse a noise-addition process, producing images by iterative denoising.
GAN
Generative Adversarial Network — two competing networks: a generator creating images and a discriminator judging them.
Gaussian Splatting
3D scene representation using explicit Gaussian primitives; enables real-time novel view synthesis.
Guidance Scale
Parameter controlling how strongly the model follows the text prompt vs. producing varied outputs.
Image-to-Video
Generating a video clip from a single still image, animating it with plausible motion.
Inpainting
Filling a masked region of an image or video with AI-generated content that matches the surrounding context.
Latent Space
Compressed abstract representation space where diffusion models generate before decoding to pixel space.
LoRA
Low-Rank Adaptation: efficient fine-tuning method that trains small low-rank weight updates, allowing a model to learn a new style or subject from a small dataset.
Motion Brush
Tool for painting directional motion vectors onto image regions to direct AI video generation.
NeRF
Neural Radiance Field — implicit neural representation of a 3D scene enabling novel view synthesis.
Noise Schedule
The function that sets the noise level at each diffusion timestep: how much noise is added during training, and which noise levels the sampler steps through as it denoises during inference.
Outpainting
Extending an image beyond its original borders using AI generation that matches the existing content.
Prompt Engineering
The practice of crafting text inputs to AI models to produce desired outputs reliably.
Seed
Random number that initializes the noise used in generation. With the same model and settings, the same seed and the same prompt typically reproduce the same output.
Temporal Consistency
The property of a video where all frames agree on the scene's contents, lighting, and character appearance.
Text-to-Video
Generating video directly from a text description, without a source image or video.
Training Data
The dataset of images, videos, or text used to train a generative model.
VAE
Variational Autoencoder — encoder/decoder architecture used to compress images into latent space and back.
Video-to-Video
Using AI to restyle or transform an existing video clip, inheriting its temporal coherence.
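
The Guidance Scale and Seed entries can be made concrete with a short sketch. It assumes the guidance scale refers to classifier-free guidance, the formulation commonly used by diffusion tools; the two noise predictions are placeholder arrays rather than outputs of a real model, and the latent shape in `initial_noise` is arbitrary.

```python
import numpy as np

def apply_guidance(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: push the prediction toward the prompt-conditioned one.

    A scale of 1.0 returns the conditioned prediction unchanged; larger values
    exaggerate the difference the prompt makes.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def initial_noise(seed, shape=(4, 8, 8)):
    """Seeded starting noise; the latent shape here is an arbitrary assumption."""
    return np.random.default_rng(seed).standard_normal(shape)

eps_uncond = np.zeros(3)                 # placeholder: prediction without the prompt
eps_cond = np.array([1.0, 2.0, 3.0])     # placeholder: prediction with the prompt

print(apply_guidance(eps_uncond, eps_cond, 1.0))
print(apply_guidance(eps_uncond, eps_cond, 7.5))
print(np.array_equal(initial_noise(42), initial_noise(42)))
print(np.array_equal(initial_noise(42), initial_noise(43)))
```

Reusing a seed fixes only the starting noise; changing the model, the sampler settings, or the prompt still changes the result.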