Open-Source AI

Meta Releases PE-AV, the Audiovisual Encoder Behind SAM Audio

Open-source multimodal model aligns audio, video, and text into a unified embedding space

Andrés Martínez, AI Content Writer
December 19, 2025 · 2 min read
[Image: Visualization of audio and video data streams converging into unified AI embeddings]

Meta AI has open-sourced Perception Encoder Audiovisual (PE-AV), the multimodal encoder that powers its newly released SAM Audio sound separation model. The company published the code on GitHub and weights on Hugging Face on December 16, 2025.

PE-AV extends Meta's original Perception Encoder, released in April, into the audio domain. The model learns joint representations across audio, video frames, and text through contrastive training on over 100 million videos, according to Meta. This allows it to extract feature vectors from either audio or video and align them in a shared embedding space, which SAM Audio then uses to identify and isolate specific sounds based on user prompts.
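Contrastive training of this kind typically means embeddings from different modalities can be compared directly with cosine similarity in the shared space. A minimal illustrative sketch of that idea, using random vectors as stand-ins for real encoder outputs (the article does not specify PE-AV's API or embedding dimension):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
dim = 512  # hypothetical shared embedding dimension

# Stand-ins for what an audiovisual encoder would produce:
audio_emb = rng.normal(size=dim)
video_emb = audio_emb + 0.1 * rng.normal(size=dim)  # the "matching" clip
other_emb = rng.normal(size=dim)                    # an unrelated clip

# In a well-aligned space, the matching audio-video pair scores higher
print(cosine_similarity(audio_emb, video_emb) >
      cosine_similarity(audio_emb, other_emb))  # → True
```

This is the property a downstream model like SAM Audio can exploit: a text or video prompt embedded into the same space can be matched against audio embeddings by similarity alone.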

The release includes multiple checkpoint sizes (small, base, and large variants) along with PE-A-Frame, a companion model for audio-frame temporal localization. Meta has published benchmark results showing strong performance on cross-modal retrieval benchmarks such as AudioCaps, Clotho, and VGGSound, though these are company-reported figures from the repository's own evaluations; independent verification hasn't been published yet.
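Cross-modal retrieval benchmarks like these are usually scored with recall@k: for each query in one modality, does the true match in the other modality land in the top k results by similarity? A hedged sketch of that metric on synthetic paired embeddings (not PE-AV outputs):

```python
import numpy as np

def recall_at_k(query_embs, gallery_embs, k=1):
    """Fraction of queries whose true match (same row index in the
    gallery) appears in the top-k results by cosine similarity."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = q @ g.T                           # pairwise cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]  # best-k gallery indices per query
    hits = [i in topk[i] for i in range(len(q))]
    return sum(hits) / len(hits)

rng = np.random.default_rng(1)
video = rng.normal(size=(10, 64))
audio = video + 0.2 * rng.normal(size=(10, 64))  # noisy matched pairs

print(recall_at_k(audio, video, k=1))
```

With well-aligned pairs the score approaches 1.0; unaligned embeddings would score near chance (1/gallery size for k=1).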

Meta is already using PE-AV internally to build creative audio tools across its apps, and the company has partnered with hearing-aid manufacturer Starkey to explore accessibility applications. The models are released under the Apache 2.0 license.

The Bottom Line: PE-AV gives developers access to the same audiovisual understanding engine Meta uses for SAM Audio, with code and weights available now on GitHub and Hugging Face.


QUICK FACTS

  • Released: December 16, 2025
  • License: Apache 2.0
  • Training data: 100+ million videos (company-reported)
  • Checkpoint sizes: Small, Base, Large variants available
  • Repository: github.com/facebookresearch/perception_models
  • Related release: SAM Audio, available in Segment Anything Playground
Tags: Meta AI, multimodal AI, open source, SAM Audio, embeddings
Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

