Zero-Setup Graph Analytics: Diving into Kuzu v0

Subtitle: An embedded columnar graph database that actually feels like a library.

If you have ever tried to run graph algorithms on a dataset with millions of relationships, you know the drill: spin up a Neo4j instance, manage Docker, worry about memory, or hack something together with NetworkX and watch it crash.

Kuzu v0 arrives to fix that friction. It is not a wrapper. It is not a key-value store pretending to be a graph. It is a purpose-built, embedded, columnar graph database written in C++.

Because it is embedded, using it looks like using any other Python library:

```python
import kuzu

db = kuzu.Database("./my_graph")
conn = kuzu.Connection(db)

# Create the schema (Kuzu needs node and relationship tables defined before inserting data)
conn.execute("CREATE NODE TABLE Person(id INT64, name STRING, PRIMARY KEY(id))")
conn.execute("CREATE REL TABLE Knows(FROM Person TO Person, since INT64)")

# Insert data (Bob is added so the MATCH below has a second node to connect)
conn.execute("CREATE (p:Person {id: 1, name: 'Alice'})")
conn.execute("CREATE (p:Person {id: 2, name: 'Bob'})")
conn.execute(
    "MATCH (a:Person), (b:Person) WHERE a.id = 1 AND b.id = 2 "
    "CREATE (a)-[:Knows {since: 2020}]->(b)"
)

# Query: neighbors within 1 to 2 hops
results = conn.execute("""
    MATCH (p:Person)-[:Knows*1..2]->(friend:Person)
    WHERE p.name = 'Alice'
    RETURN friend.name, COUNT(*)
""")

# Iterate over the result rows
while results.has_next():
    print(results.get_next())
```
Kuzu stores properties in columns. Want the average age of all Person nodes? That's a sequential scan of one integer column. Want to count how many Knows relationships each person has? That's a column scan of src and dst.
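To make the column-scan idea concrete, here is a toy sketch in plain Python. It is not how Kuzu is implemented internally; the lists, the sample values, and the helper names `average_age` and `knows_degree` are all made up for illustration. It only shows why both questions above touch one or two columns instead of whole records.

```python
from collections import Counter

# Toy columnar layout: one list per property (hypothetical sample data).
person_id = [1, 2, 3, 4]
person_age = [34, 29, 41, 52]

# Knows relationships stored as two parallel columns of endpoint ids.
knows_src = [1, 1, 2, 3]
knows_dst = [2, 3, 3, 4]


def average_age(ages):
    """Sequential scan over a single integer column."""
    return sum(ages) / len(ages)


def knows_degree(ids, src, dst):
    """Count relationships per person by scanning only the src and dst columns."""
    out_degree = Counter(src)
    in_degree = Counter(dst)
    return {pid: out_degree[pid] + in_degree[pid] for pid in ids}


avg = average_age(person_age)
degrees = knows_degree(person_id, knows_src, knows_dst)
print(avg)
print(degrees)
```

Neither function ever reads a name or any other property. In a row-oriented layout, the same aggregates would have to step over every field of every record to reach the one value they need.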
```shell
pip install kuzu
```

Then head to the Kuzu documentation. The v0 release is stable enough to build on.
What graph workload would you run embedded? Let me know in the comments.

Disclaimer: I am not affiliated with the Kuzu team, just an engineer who appreciates well-designed data infrastructure.