KlingAI rolls out version 2.6 of its video generator with a headline feature: audio-video co-generation. The model now produces video and its accompanying soundtrack simultaneously in a single generation pass, rather than treating them as separate tasks.
The upgrade handles multi-character dialogue, music video creation, and intricate audio scenes like ASMR or action sequences. KlingAI also highlights improved lip-sync accuracy, with mouth movements matched to speech more precisely than in earlier versions.
Some constraints apply. Voice generation currently works only in English and Chinese. Prompts in other languages get auto-translated to English before processing. For Image-to-Video workflows, output quality now depends more heavily on the resolution of the source image, so low-res inputs mean lower-quality results.
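For creators scripting their own pre-submission checks, the constraints above can be sketched as a small helper. This is a minimal illustration, not KlingAI's API: the function name `preflight`, the language codes `en` and `zh`, and the 720-pixel threshold are all assumptions made for the example, since the announcement gives no figure for what counts as low resolution.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Voice languages named in the announcement (codes are an assumption).
NATIVE_VOICE_LANGUAGES = {"en", "zh"}

# Hypothetical cutoff: the announcement does not define "low-res".
MIN_SHORT_SIDE_PX = 720


@dataclass
class PreflightReport:
    prompt_language: str   # language the prompt would be processed in
    auto_translated: bool  # True if the prompt falls back to English
    low_res_warning: bool  # True if the source image is below the cutoff


def preflight(prompt_lang: str,
              image_size: Optional[Tuple[int, int]] = None) -> PreflightReport:
    """Apply the stated rules: non-English/Chinese prompts are treated as
    auto-translated to English; small source images get a quality warning."""
    lang = prompt_lang.lower()
    translated = lang not in NATIVE_VOICE_LANGUAGES
    processed = "en" if translated else lang
    # Only Image-to-Video jobs pass an image size (width, height).
    low_res = image_size is not None and min(image_size) < MIN_SHORT_SIDE_PX
    return PreflightReport(processed, translated, low_res)


report = preflight("fr", image_size=(640, 360))
print(report)
```

Here a French prompt with a 640x360 source image is flagged both as auto-translated and as low resolution under the assumed cutoff.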
The Bottom Line: KlingAI bets that unified audio-video generation will save creators time and deliver tighter sync than stitching separate outputs together.
QUICK FACTS
- Version: KlingAI 2.6
- Key feature: Audio-video co-generation in one pass
- Supported voice languages: English and Chinese only
- Capabilities: Multi-character dialogue, music videos, ASMR, action scenes
- Image-to-Video: Output quality now tied more closely to source image resolution




