
CocoIndex Ships V1, Replaces DSL With Plain Python

Incremental data engine for AI agents drops the DSL and the Postgres dependency in V1.

Andrés Martínez
AI Content Writer
May 5, 2026 · 2 min read
Image: Abstract visualization of branching data pipelines, with selected segments highlighted to represent incremental updates flowing through nodes.

CocoIndex released V1 of its incremental data engine on April 22, scrapping the framework's DSL in favor of plain async Python. Cofounders Linghua Jin and George He announced the release in the company's launch post. The Apache 2.0 project is aimed at developers building RAG, knowledge graphs, and memory for long-running agents.

The framing borrows from Jeff Dean and Bill Dally at GTC 2026: agents now run roughly 50 times faster than humans while the tooling around them, per Dean, was "built for human speed." Nightly index rebuilds don't fit that loop. CocoIndex's pitch is to recompute only the chunks that actually changed and upsert only the rows that moved.
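The "recompute only what changed" idea can be illustrated with content fingerprints. The sketch below is not CocoIndex's API. The function names, the SHA-256 choice, and the in-memory dictionary are all assumptions made for illustration. It only shows the general technique of comparing each chunk's hash against the hash recorded on the previous run.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a stable content hash for one chunk (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_incremental_update(chunks: dict[str, str], seen: dict[str, str]):
    """Compare current chunks with fingerprints saved from the previous run.

    chunks: chunk_id -> current text
    seen:   chunk_id -> fingerprint recorded last time
    Returns (ids to recompute, ids that disappeared, new fingerprint state).
    """
    to_recompute = []
    new_state = {}
    for chunk_id, text in chunks.items():
        fp = fingerprint(text)
        new_state[chunk_id] = fp
        if seen.get(chunk_id) != fp:  # new chunk or changed content
            to_recompute.append(chunk_id)
    removed = [cid for cid in seen if cid not in chunks]
    return to_recompute, removed, new_state


# First run: nothing has been seen, so every chunk is processed.
state: dict[str, str] = {}
docs_v1 = {"a": "alpha", "b": "beta"}
changed_v1, removed_v1, state = plan_incremental_update(docs_v1, state)

# Second run: only the edited chunk "b" and the new chunk "c" need work.
docs_v2 = {"a": "alpha", "b": "beta edited", "c": "gamma"}
changed_v2, removed_v2, state = plan_incremental_update(docs_v2, state)
print(changed_v2)  # ['b', 'c']
```

In a real engine the fingerprint state would be persisted between runs rather than held in a dictionary, and the expensive step (embedding, extraction) would run only for the returned ids.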

Three other shifts ride along. Postgres is no longer a hard dependency: engine state now lives in an embedded LMDB file, so installation is one pip command. The engine also uses Python's type system directly, letting PIL images, pyarrow tables, and torch tensors pass through functions without wrappers. And sources and targets can be created at runtime, which allows, for example, one component per tenant or per config row.

The Rust core stayed put, handling change detection, fingerprinting, and target diffing. The managed-target contract works the same way: declare the desired state of a Postgres table or Kafka topic, and the engine handles create, alter, drop, insert, update, delete. Stop declaring something and it goes away.
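The declare-the-desired-state contract amounts to a diff between what should exist and what does exist. The following is a minimal sketch of that reconciliation step in plain Python, under the assumption that rows are keyed dictionaries. The `reconcile` function and the sample rows are hypothetical and do not reflect how CocoIndex's Rust core implements target diffing.

```python
def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compute the operations that move `actual` to `desired`.

    Keys present only in `desired` are inserted, keys whose values differ
    are updated, and keys no longer declared are deleted.
    """
    inserts = sorted(k for k in desired if k not in actual)
    updates = sorted(k for k in desired if k in actual and desired[k] != actual[k])
    deletes = sorted(k for k in actual if k not in desired)
    return {"insert": inserts, "update": updates, "delete": deletes}


# What the target currently holds (illustrative data).
actual_rows = {"doc1": {"title": "Intro"}, "doc2": {"title": "Setup"}}

# What the pipeline now declares: doc1 changed, doc2 is gone, doc3 is new.
desired_rows = {"doc1": {"title": "Introduction"}, "doc3": {"title": "FAQ"}}

plan = reconcile(desired_rows, actual_rows)
print(plan)  # {'insert': ['doc3'], 'update': ['doc1'], 'delete': ['doc2']}
```

Because `doc2` is no longer declared, it lands in the delete list, which mirrors the "stop declaring something and it goes away" behavior described above.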

By the company's own estimate, an equivalent in-house pipeline takes 10 to 20 engineers six months. Examples covering knowledge-graph extraction, multi-codebase summarization, and live CSV-to-Kafka flows are available in the project's v1 examples.


Bottom Line

CocoIndex V1 turns incremental data pipelines into plain async Python and stores engine state in a local LMDB file, removing Postgres as a dependency.

Quick Facts

  • Release date: April 22, 2026
  • License: Apache 2.0
  • Cofounders: Linghua Jin (CEO), George He (CTO)
  • Engine state: embedded LMDB, replacing Postgres
  • Targets supported: Postgres, LanceDB, Neo4j, Kafka, S3, SurrealDB, SQLite, files
Tags: CocoIndex, AI infrastructure, RAG, open source, data pipelines, AI agents
Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

CocoIndex V1 Drops DSL for Python-Native Pipelines | aiHola