Word2Box

Word2Box is an unsupervised method (Dasgupta et al., ACL 2022, "Word2Box: Capturing Set-Theoretic Semantics of Words using Box Embeddings") that learns a Box Embedding for every word directly from a text corpus, with no hierarchy supervision required. It is, in effect, Word2Vec with boxes: the CBOW training scheme is reused, but each word is represented as a box, and similarity is scored by the volume of the intersection of two boxes (computed with the Gumbel-box machinery) instead of by a vector dot product.

How It Trains

  1. Pick a center word from the corpus.
  2. Collect context words within a window size around it (CBOW-style sampling, following Mikolov et al., 2013).
  3. Look up the boxes for the center and context words, then update them to increase the overlap between the center box and the context boxes.
  4. To avoid the degenerate “make every box huge” solution, draw negative-sampled words and decrease their overlap with the context boxes.
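
The sampling part of the steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name cbow_examples is hypothetical, and negatives are drawn uniformly from the vocabulary purely for simplicity (Word2Vec-style trainers typically use a frequency-based distribution instead).

```python
import random

def cbow_examples(tokens, window=2, num_negatives=2, seed=0):
    """Yield (center, context, negatives) triples from a token list.

    Hypothetical helper for illustration only. Negatives are sampled
    uniformly from the vocabulary, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    for i, center in enumerate(tokens):
        # Context = up to `window` tokens on each side of the center word.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        candidates = [w for w in vocab if w != center]
        if not context or not candidates:
            continue
        negatives = [rng.choice(candidates) for _ in range(num_negatives)]
        yield center, context, negatives

for center, context, negatives in cbow_examples("the cat sat on the mat".split(), window=1):
    print(center, context, negatives)
```

Each triple would then drive one update: raise the overlap of the center box with the context boxes, and lower the overlap of the negative boxes with them.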

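Overlap is measured as the intersection volume of Gumbel boxes. With hard boxes, the intersection volume is exactly zero once two boxes are disjoint, so it provides no gradient to pull them together; the Gumbel formulation smooths the box edges, which keeps gradients well-behaved.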
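
To make the smoothing concrete, here is a small pure-Python sketch of a Gumbel-style intersection and soft volume, following the formulas commonly given in the Gumbel-box literature (log-sum-exp for the intersection corners, a softplus with a 2γ shift for the side lengths). The function names and the use of a single temperature beta for both steps are assumptions of this sketch, not details taken from the Word2Box paper.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def _logaddexp(a, b):
    # Numerically stable log(exp(a) + exp(b)).
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def gumbel_intersection(box_a, box_b, beta=0.1):
    """Smoothed intersection of two boxes, each given as (lower, upper) lists."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    # Smooth max of the lower corners, smooth min of the upper corners.
    lo = [beta * _logaddexp(a / beta, b / beta) for a, b in zip(lo_a, lo_b)]
    hi = [-beta * _logaddexp(-a / beta, -b / beta) for a, b in zip(hi_a, hi_b)]
    return lo, hi

def log_soft_volume(box, beta=0.1):
    """Log of the soft volume: every side length stays strictly positive."""
    lo, hi = box
    total = 0.0
    for l, h in zip(lo, hi):
        side = beta * _softplus((h - l) / beta - 2.0 * EULER_GAMMA)
        total += math.log(max(side, 1e-300))  # guard against underflow to 0
    return total

a = ([0.0, 0.0], [1.0, 1.0])
b = ([0.5, 0.5], [1.5, 1.5])  # overlaps a
c = ([2.0, 2.0], [3.0, 3.0])  # disjoint from a
print(log_soft_volume(gumbel_intersection(a, b)))
print(log_soft_volume(gumbel_intersection(a, c)))
```

The disjoint pair gets a very small but nonzero volume, so its log volume is finite and still carries a gradient signal, unlike the hard-box case.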

Evaluation

  • Setup: 64-dim Word2Box vs. 128-dim Word2Vec. This is a fair comparison on parameter count: a box stores two corner points (start and end), i.e. two numbers per dimension, so a 64-dim box uses 2 × 64 = 128 parameters per word. Training data: ~900M words of preprocessed English, 10 epochs.
  • Word similarity: measured by Spearman rank correlation against human-annotated pairs on benchmarks such as SimLex-999. Word2Box generally surpasses Word2Vec, with the largest gains where rare words are involved.
  • Set-theoretic / collective operations: quantitative and qualitative tests show boxes handle polysemy and “strict meaning” better than point methods like Word2Vec.
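
For readers unfamiliar with the word-similarity metric, the sketch below computes a Spearman rank correlation from scratch (rank both score lists, then take the Pearson correlation of the ranks). The function names and the four score pairs are made up for illustration; they are not SimLex-999 data or results from the paper.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical scores for four word pairs (illustrative values only).
human_scores = [9.2, 7.5, 3.1, 1.0]
model_scores = [0.8, 0.9, 0.3, 0.1]
print(spearman(human_scores, model_scores))
```

Because only ranks matter, the model's similarity scores do not need to be on the same scale as the human ratings; intersection volumes and dot products can be compared against the same benchmark.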

Why Boxes Beat Points Here

A point vector collapses all senses of a polysemous word into one location. A box can stretch to cover multiple senses and supports set operations — intersecting two word boxes approximates the conjunction of their meanings.
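
A toy illustration of this idea, using hard (non-Gumbel) boxes in two dimensions for readability. The words and coordinates are invented for the example and are not learned embeddings: one wide "bank" box spans two regions, while "river" and "finance" each sit in only one.

```python
def intersect(box_a, box_b):
    """Hard intersection: max of lower corners, min of upper corners."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    lo = [max(a, b) for a, b in zip(lo_a, lo_b)]
    hi = [min(a, b) for a, b in zip(hi_a, hi_b)]
    return lo, hi

def volume(box):
    """Product of side lengths; an empty intersection has volume 0."""
    lo, hi = box
    vol = 1.0
    for l, h in zip(lo, hi):
        vol *= max(h - l, 0.0)
    return vol

# Invented 2-D boxes: (lower corner, upper corner).
bank = ([0.0, 0.0], [4.0, 2.0])      # wide enough to touch both senses
river = ([3.0, 0.0], [6.0, 2.0])
finance = ([-2.0, 0.0], [1.0, 2.0])

print(volume(intersect(bank, river)))     # bank in its river sense
print(volume(intersect(bank, finance)))   # bank in its money sense
print(volume(intersect(river, finance)))  # the two senses do not overlap
```

A single point for "bank" would have to sit somewhere between the two regions; the box can overlap both while the intersection with either neighbor isolates one sense.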
