Exploring Hierarchical Navigable Small World

A hands-on deep dive into the HNSW algorithm written by two Vespa summer interns who built visualization and analysis tools during their exploration. Covers how HNSW is structured, built, and searched, along with graph quality metrics and filtering behavior.

Key Concepts Covered

HNSW Structure and Search

Multi-layer proximity graph: top layers sparse (coarse navigation), bottom layer contains all nodes (fine search)
Greedy descent from entry point; bottom layer switches to top-k beam search
Local minima problem: search can get stuck, returning approximate (not exact) neighbors
Recall, response time, and memory footprint as the three standard ANN quality metrics

Graph Quality Metrics Beyond Recall

The authors developed tools to measure:

Disconnected components — nodes unreachable from the main cluster are invisible to ANN search; must be reinserted or reconnected
Edge density distribution — histogram of neighbor counts per node; reveals sparsity or overcrowding; helped find a Vespa bug (layer-0 neighbor cap used wrong constant: max-links-per-node instead of 2×max-links-per-node)
Neighbor proximity — how close a node’s graph neighbors are relative to the global distance distribution (e.g., what % are in the top 15%, 2%, 0.3%)

Distance Distribution Analysis

With ~600 visited nodes, HNSW finds ~40% of them within the closest 5% of all points in the dataset — confirming efficient navigation. ACORN-1 evaluates ~14k nodes for similar recall under aggressive filtering, trading cost for connectivity.

Filtering in HNSW

Four strategies with different trade-offs:

Fallback to exact search — used when filter selectivity drops below a threshold
Post-filter — fast but can return 0 hits when filtering is aggressive
Pre-filter (standard) — only filter-passing nodes considered; works until filtering is very selective
Pre-filter + check first (ACORN-1 style) — multi-hop neighborhoods avoid getting stuck; higher cost

Dimensionality Reduction

PCA as an option when some embedding dimensions carry near-zero variance (< 0.01%). Offers small speed-ups with minimal data loss. Effectiveness depends on embedding model, distance metric, and data. Alternative: Matryoshka Embeddings.

Bug Found

Edge density analysis revealed a production Vespa bug: the deletion code in layer 0 capped neighbors at max-links-per-node instead of 2×max-links-per-node, producing sparser graphs than intended.

Authors

Marianne Haugvaldstad — Developer Intern, Vespa
Brage Vik — Developer Intern, Vespa

Company

Vespa

HNSW · ACORN-1 · Vector Filtering · Dense Vector Retrieval · Matryoshka Embeddings

Awesome Search KG

Explorer

Exploring Hierarchical Navigable Small World

Exploring Hierarchical Navigable Small World

Key Concepts Covered

HNSW Structure and Search

Graph Quality Metrics Beyond Recall

Distance Distribution Analysis

Filtering in HNSW

Dimensionality Reduction

Bug Found

Authors

Company

Graph View

Table of Contents

Backlinks

Awesome Search KG

Explorer

Exploring Hierarchical Navigable Small World

Exploring Hierarchical Navigable Small World

Key Concepts Covered

HNSW Structure and Search

Graph Quality Metrics Beyond Recall

Distance Distribution Analysis

Filtering in HNSW

Dimensionality Reduction

Bug Found

Authors

Company

Related Concepts

Graph View

Table of Contents

Backlinks