RQ-VAE

Residual-Quantized Variational Autoencoder — an autoencoder whose discrete bottleneck is a stack of vector-quantization codebooks applied to successive residuals. It turns a continuous content embedding into a short sequence of discrete codes, the standard recipe for building Semantic IDs.


Residual Quantization (RQ)

Plain vector quantization snaps a vector to its single nearest codeword — coarse, and expressive only up to the codebook size. Residual quantization refines iteratively across stages, quantizing the error left over by the previous stage:

x̂₁ = Q₁(x)              # first approximation
r₁ = x − x̂₁             # residual
r̂₁ = Q₂(r₁)             # quantize the residual
x̂  = x̂₁ + r̂₁ + …        # sum of per-stage codewords

Each stage contributes one codebook index. The codes are hierarchical: early stages capture coarse structure, later stages add fine detail.

Expressiveness. Two stages of 256 codewords each represent 256 × 256 = 65,536 distinct vectors, versus 256 for single-stage VQ — combinatorial capacity from small codebooks. (Classic codebooks are built with k-means / the Linde–Buzo–Gray algorithm.)

From VQ-VAE to RQ-VAE

A VQ-VAE has three parts: an encoder mapping input → continuous latent, a VQ layer snapping that latent to the nearest codebook entry, and a decoder reconstructing from the code. The argmin lookup is non-differentiable, so training uses the straight-through estimator (copy gradients past the lookup) and a loss with three terms: reconstruction ‖x − x̂‖², codebook loss, and commitment loss.

RQ-VAE replaces the single VQ layer with the multi-stage residual quantizer above, yielding a coarse-to-fine sequence of codes instead of one. That sequence is exactly what gets used as a semantic ID.

RQ-KMeans

A memory-efficient alternative to a learned RQ-VAE: represent large cluster centroids via residual-quantization principles rather than storing them explicitly — a lighter way to obtain residual codebooks without full autoencoder training.

Relation to Other Quantization

RQ-VAE sits in the Vector Quantization family alongside Scalar Quantization, Binary Quantization, and Product Quantization (PQ). The key distinction is purpose: SQ/PQ/binary exist to compress vectors for fast ANN search; RQ-VAE exists to produce generatable, hierarchical tokens that a sequence model can predict in Generative Retrieval. RQ (residual) and PQ (sub-vector split) are complementary decompositions — RQ splits along magnitude/residual, PQ splits along dimensions.

  • Semantic IDs — the primary application: codes become item/document identifiers
  • Vector Quantization — parent family (SQ, PQ, binary, RQ)
  • Generative Retrieval — consumes RQ-VAE codes as generation targets
  • TIGER — uses RQ-VAE semantic IDs for generative recommendation
  • Embeddings — the continuous input that RQ-VAE discretizes

Articles