Word Embeddings
Introduction
Word embeddings are dense vector representations of words that capture semantic meaning. Unlike one-hot encoding, embeddings place similar words close together in vector space, enabling mathematical operations on meaning.
How Word Embeddings Work
Key Concepts
- Distributional Hypothesis: Words that appear in similar contexts tend to have similar meanings
- Dense Representations: Typically 50-300 dimensions, versus a one-hot vector with one dimension per vocabulary word (see the sketch after this list)
- Semantic Arithmetic: king - man + woman ≈ queen
- Transfer Learning: Pre-trained embeddings can be fine-tuned
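To make the one-hot vs. dense contrast concrete, here is a minimal numpy sketch with made-up vectors: one-hot vectors of distinct words are always orthogonal, while dense embeddings can express graded similarity.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# One-hot vectors: every pair of distinct words is orthogonal (similarity 0).
vocab = ["cat", "dog", "car"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}
print(cosine(one_hot["cat"], one_hot["dog"]))   # 0.0 -- no notion of relatedness

# Dense embeddings (toy, hand-picked values): related words end up close together.
dense = {
    "cat": np.array([0.80, 0.10, 0.30]),
    "dog": np.array([0.75, 0.20, 0.25]),
    "car": np.array([0.10, 0.90, 0.70]),
}
print(cosine(dense["cat"], dense["dog"]))   # high (~0.99)
print(cosine(dense["cat"], dense["car"]))   # much lower (~0.39)
```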
Training Methods
- Word2Vec: Skip-gram and CBOW models (a training sketch follows this list)
- GloVe: Global Vectors using co-occurrence statistics
- FastText: Subword information for morphology
- Contextual: ELMo, BERT (position-dependent)
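As a rough illustration of the static (non-contextual) methods above, the sketch below trains a tiny Word2Vec model with gensim and loads pretrained GloVe vectors through gensim's downloader. The corpus and hyperparameters are placeholders, and the downloaded model name is one of gensim's standard GloVe sets.

```python
from gensim.models import Word2Vec
import gensim.downloader as api

# Train a toy skip-gram Word2Vec model on a few tokenized sentences.
# The corpus is far too small for useful embeddings; it only shows the API shape.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "cat"],
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("king", topn=3))

# Load pretrained 50-dimensional GloVe vectors (downloads on first use).
glove = api.load("glove-wiki-gigaword-50")
print(glove.most_similar("king", topn=3))
```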
Interactive Word Embedding Visualizer
The interactive visualizer shows the embedding space (click words to select them and see their connections) and the architecture of the current training method, along with corpus and vocabulary statistics. It uses pretrained GloVe embeddings.
Training Word2Vec
Skip-gram Model
Predicts context words given center word
- Input: One-hot encoded center word
- Hidden: Word embedding (no activation)
- Output: Softmax over vocabulary
- Objective: Maximize P(context|center)
P(w_c|w_t) = exp(v_c · v_t) / Σ exp(v_i · v_t)
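A minimal numpy sketch of that softmax with randomly initialized (untrained) matrices, just to show the shapes: `W_in` holds the center-word vectors v_t and `W_out` the context-word vectors v_c, the usual two-matrix setup behind the formula.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "king", "queen", "rules", "kingdom"]
V, D = len(vocab), 8                         # vocabulary size, embedding dimension
W_in = rng.normal(scale=0.1, size=(V, D))    # center-word embeddings (v_t)
W_out = rng.normal(scale=0.1, size=(V, D))   # context-word embeddings (v_c)

def skipgram_probs(center_idx):
    """P(w_c | w_t) over the whole vocabulary via the full softmax."""
    scores = W_out @ W_in[center_idx]        # v_i . v_t for every word i
    exp = np.exp(scores - scores.max())      # subtract max for numerical stability
    return exp / exp.sum()

for word, p in zip(vocab, skipgram_probs(vocab.index("king"))):
    print(f"P({word} | king) = {p:.3f}")
```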
CBOW Model
Predicts center word given context
- Input: Sum/average of context word vectors
- Hidden: Combined embedding representation
- Output: Softmax over vocabulary
- Objective: Maximize P(center|context)
P(w_t|context) = exp(v_t · h) / Σ exp(v_i · h)
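A matching CBOW sketch with the same kind of randomly initialized matrices: the context vectors are averaged into h before the softmax.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "king", "queen", "rules", "kingdom"]
V, D = len(vocab), 8
W_in = rng.normal(scale=0.1, size=(V, D))    # input (context) embeddings
W_out = rng.normal(scale=0.1, size=(V, D))   # output (center) embeddings

def cbow_probs(context_idxs):
    """P(w_t | context) over the vocabulary, given context word indices."""
    h = W_in[context_idxs].mean(axis=0)      # average the context embeddings
    scores = W_out @ h                       # v_i . h for every word i
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

context = [vocab.index(w) for w in ["the", "rules", "the", "kingdom"]]
probs = cbow_probs(context)
print(vocab[int(np.argmax(probs))])          # most likely center word (arbitrary here, since untrained)
```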
Optimization Techniques
- Negative Sampling: Instead of computing the full softmax, sample k negative examples per positive pair (a sketch follows this list)
- Hierarchical Softmax: A binary tree over the vocabulary reduces the per-word computation from O(V) to O(log V)
- Subsampling Frequent Words: Down-sample very frequent words like "the" and "a"
- Dynamic Context Window: Randomly vary the context window size during training
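A rough numpy sketch of the negative-sampling objective for a single (center, context) pair; k, the noise distribution, and the initialization are simplified for illustration (Word2Vec actually draws negatives from the unigram distribution raised to the 3/4 power).

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, k = 1000, 50, 5                        # vocab size, dimension, negatives per pair
W_in = rng.normal(scale=0.1, size=(V, D))
W_out = rng.normal(scale=0.1, size=(V, D))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(center, context):
    """Negative-sampling loss for one (center, context) pair."""
    v_t = W_in[center]
    pos = np.log(sigmoid(W_out[context] @ v_t))          # pull the true pair together
    negs = rng.integers(0, V, size=k)                    # k "noise" words, sampled uniformly here
    neg = np.log(sigmoid(-(W_out[negs] @ v_t))).sum()    # push noise words away
    return -(pos + neg)

print(neg_sampling_loss(center=3, context=17))
```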
Where Word Embeddings Are Used
Traditional NLP
- Text classification
- Named entity recognition
- Sentiment analysis
- Machine translation
Modern Applications
- Input to transformer models
- RAG system embeddings
- Semantic search (toy sketch after this list)
- Recommendation systems
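As a toy illustration of embedding-based semantic search, the sketch below averages made-up word vectors into document embeddings and ranks documents by cosine similarity to a query; production systems typically use sentence or transformer embeddings instead.

```python
import numpy as np

# Toy word vectors standing in for pretrained embeddings (values are made up).
embeddings = {
    "cheap":  np.array([0.90, 0.10, 0.00]),
    "hotel":  np.array([0.10, 0.90, 0.10]),
    "budget": np.array([0.85, 0.15, 0.05]),
    "stay":   np.array([0.20, 0.80, 0.20]),
    "fast":   np.array([0.10, 0.10, 0.90]),
    "car":    np.array([0.00, 0.20, 0.85]),
}

def embed(text):
    """Average the word vectors of the known words in a text."""
    vecs = [embeddings[w] for w in text.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = ["budget stay", "fast car"]
query = "cheap hotel"
ranked = sorted(docs, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
print(ranked)   # "budget stay" should rank above "fast car"
```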
Related Concepts
- Sentence embeddings
- Document embeddings
- Cross-lingual embeddings
- Multimodal embeddings
Word Algebra
Word embeddings enable fascinating arithmetic operations. The classic example "king - man + woman = queen" demonstrates how semantic relationships are encoded in vector space.
Classic Examples
- king - man + woman → queen
- paris - france + italy → rome
- good - better + bad → worse
Why It Works
Word embeddings encode semantic relationships as consistent vector offsets. The offset from "man" to "king" (king - man) roughly captures the concept of royalty; adding it to "woman" lands near "queen".
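This vector-offset trick is easy to reproduce with pretrained vectors; here is a sketch using gensim's downloader (the model name is one of gensim's standard GloVe sets, and results vary with the embedding set).

```python
import gensim.downloader as api

# Pretrained 50-dimensional GloVe vectors; downloads on first use.
glove = api.load("glove-wiki-gigaword-50")

# king - man + woman: positives are added, negatives subtracted.
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# The same offset idea for capitals: paris - france + italy ≈ rome.
print(glove.most_similar(positive=["paris", "italy"], negative=["france"], topn=3))
```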
Add Custom Words
Expand the vocabulary by adding custom words with example contexts. The system will learn embeddings for your words based on their usage.
💡 Tips for adding words:
- Provide multiple example sentences for better embeddings
- Use the word in different contexts
- Include related words in the context
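To reproduce this idea offline, gensim's Word2Vec can extend an existing model's vocabulary with new sentences and continue training; a rough sketch, with placeholder sentences and parameters:

```python
from gensim.models import Word2Vec

# Start from a small model trained on an initial corpus.
base_sentences = [["the", "cat", "sat", "on", "the", "mat"],
                  ["the", "dog", "sat", "on", "the", "rug"]]
model = Word2Vec(base_sentences, vector_size=50, window=2, min_count=1, epochs=50)

# Add a custom word by giving it several example contexts, then keep training.
new_sentences = [["the", "axolotl", "sat", "on", "the", "mat"],
                 ["my", "axolotl", "likes", "the", "rug"],
                 ["the", "axolotl", "is", "a", "pet"]]
model.build_vocab(new_sentences, update=True)     # extend the vocabulary in place
model.train(new_sentences, total_examples=len(new_sentences), epochs=50)

print(model.wv.most_similar("axolotl", topn=3))
```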
Key Takeaways
- Word embeddings capture semantic relationships in dense vectors
- Training uses the distributional hypothesis: similar contexts mean similar words
- Word2Vec offers two architectures: Skip-gram (better for rare words) and CBOW (faster)
- Embeddings enable semantic arithmetic and analogy tasks
- Pre-trained embeddings are foundational for modern NLP and transformers
- Quality depends on corpus size, dimension choice, and training parameters