Perplexity released pplx-embed, a pair of open-source text embedding models built for production-scale search. The family includes pplx-embed-v1 for standard retrieval and pplx-embed-context-v1 for document-aware embeddings, both available at 0.6B and 4B parameter sizes. The company published a research blog alongside a technical paper detailing the approach.
The interesting architectural move: Perplexity took Qwen3 and converted it from a decoder-only LLM into a bidirectional encoder using diffusion-based pretraining. That lets the model process tokens in both directions simultaneously, which is better suited for embedding tasks than the standard left-to-right approach most LLMs use. Three stages of contrastive learning follow, with the final model produced by merging checkpoints via spherical linear interpolation.
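The final merging step uses spherical linear interpolation (slerp), a standard technique for blending model checkpoints along the arc between them rather than the straight line. A generic sketch over flattened weight vectors (the paper's exact merging recipe and interpolation weights are not specified here):

```python
import numpy as np

def slerp(w_a: np.ndarray, w_b: np.ndarray, t: float) -> np.ndarray:
    """Spherically interpolate between two flattened checkpoint weight vectors.

    t=0 returns w_a, t=1 returns w_b; intermediate t follows the great-circle
    arc defined by the angle between the (normalized) vectors.
    """
    a = w_a / np.linalg.norm(w_a)
    b = w_b / np.linalg.norm(w_b)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))  # angle between checkpoints
    if np.isclose(omega, 0.0):
        # Nearly parallel weights: slerp degenerates, fall back to linear interpolation
        return (1.0 - t) * w_a + t * w_b
    return (np.sin((1.0 - t) * omega) * w_a + np.sin(t * omega) * w_b) / np.sin(omega)
```

In practice, merging tools apply this per-tensor across two checkpoints; the hypothetical `slerp` helper above shows only the core math on a single flattened vector.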
On the ConTEB contextual retrieval benchmark, the 4B model scores 81.96% nDCG@10, ahead of Voyage's voyage-context-3 at 79.45% and Anthropic Contextual at 72.4%. Those are Perplexity's own reported numbers. On the broader MTEB multilingual retrieval benchmark, the 4B variant hits 69.66%, roughly matching Qwen3-Embedding-4B and beating Google's gemini-embedding-001. Independent verification is pending.
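For readers unfamiliar with the metric: nDCG@10 rewards ranking relevant documents near the top of the first ten results, normalized against the ideal ordering. A minimal implementation of the standard formula:

```python
import math

def dcg_at_k(relevances: list[float], k: int = 10) -> float:
    # Discounted cumulative gain: relevance at rank i is discounted by log2(i + 2)
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    # Normalize by the DCG of the ideal (descending-relevance) ranking
    ideal = sorted(relevances, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevances, k) / idcg if idcg > 0 else 0.0
```

A perfect ranking scores 1.0; putting the only relevant document at rank 2 instead of rank 1 drops the score to about 0.63, which is why a ~2.5-point gap on this metric reflects consistently better top-of-list ordering.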
Storage is where things get practical. The models natively output INT8-quantized embeddings (trained that way, not post-hoc compressed), cutting storage 4x versus FP32. Binary quantization pushes that to 32x with under 1.6 percentage points of accuracy loss on the larger model. No instruction prefix required, either.
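The storage arithmetic is easy to verify. A post-hoc sketch (pplx-embed is trained to emit INT8 natively, so this illustrates only the byte counts, not the models' actual quantization scheme):

```python
import numpy as np

def int8_quantize(emb: np.ndarray) -> tuple[np.ndarray, float]:
    # Symmetric per-vector scaling into [-127, 127]: 1 byte per dim vs 4 for FP32
    scale = np.abs(emb).max() / 127.0
    return np.round(emb / scale).astype(np.int8), scale

def binary_quantize(emb: np.ndarray) -> np.ndarray:
    # Sign bit per dimension, packed 8 dims per byte: 32x smaller than FP32
    return np.packbits(emb > 0)

emb = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
q8, scale = int8_quantize(emb)
b = binary_quantize(emb)
# emb: 4096 bytes (FP32), q8: 1024 bytes (4x smaller), b: 128 bytes (32x smaller)
```

At corpus scale the difference compounds: a billion 1024-dim vectors drop from ~4 TB in FP32 to ~128 GB in binary form.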
All four models are on Hugging Face under MIT license and accessible through the Perplexity API.
Bottom Line
Perplexity's 4B contextual embedding model scores 81.96% on ConTEB, topping Voyage's and Anthropic's offerings by roughly 2.5 and 9.6 points respectively, and ships under MIT license.
Quick Facts
- Models: pplx-embed-v1 and pplx-embed-context-v1 (0.6B and 4B sizes)
- ConTEB score: 81.96% nDCG@10 (4B, company-reported)
- MTEB multilingual retrieval: 69.66% nDCG@10 (4B, company-reported)
- Storage reduction: 4x (INT8) to 32x (binary) vs. FP32
- License: MIT, available on Hugging Face and Perplexity API