NalaDB unifies key-value, graph and time-series in one engine. Every write is versioned, entities form a graph projected onto a temporal KV store, and Hybrid Logical Clocks preserve causal order — so a single query can time-travel, follow dependency chains and tell you what caused what, with a confidence score.
The database that remembers everything -- and knows what caused what.
NalaDB is a temporal key-value graph database with causal ordering. It stores every version of every value, models relationships between entities as a graph, and uses Hybrid Logical Clocks to track what caused what. One query can time-travel to last Tuesday, follow dependency chains, and tell you why your pump failed -- with a confidence score.
-- "The API gateway is down. What broke it?"
CAUSAL FROM api_gateway
AT "2025-03-08T14:32:05Z"
DEPTH 5 WINDOW 5m
WHERE confidence > 0.5
RETURN path, delta, confidence
ORDER BY confidence DESC
| confidence | delta | path |
+------------+-------+----------------------------------------------+
| 0.92 | 4.2s | redis_main -> checkout_svc -> api_gateway |
| 0.78 | 8.1s | redis_main -> session_svc -> api_gateway |
Redis went down first. The checkout service felt it 4 seconds later. The API gateway followed. One query. No log-grepping. No Slack threads.
Databases make you choose:
| You need... | Tool | But you lose... |
|---|---|---|
| Time-series data | InfluxDB, TimescaleDB | Relationships between things |
| Relationships & traversals | Neo4j, DGraph | Version history and time-travel |
| Fast key-value lookups | Redis, etcd | Both of the above |
Real systems don't fit in one box. A temperature sensor has readings (time-series), belongs to a machine (graph), and its spike 20 minutes ago caused the pump to overheat (causality). To answer "why did the pump fail?", you currently need three databases, a data pipeline, and a prayer.
NalaDB puts all three in one engine. Every write is versioned. Every entity can have relationships. Every change has a timestamp that preserves causal ordering. When something goes wrong, you ask the database directly.
This isn't three databases duct-taped together. The graph layer is projected onto the temporal KV store -- nodes and edges are just keys with a naming convention:
node:pump_01:meta -> {"type": "pump", "id": "pump_01"}
node:pump_01:prop:temperature -> 85.3 (version 1, t=100)
-> 87.1 (version 2, t=200)
-> 91.7 (version 3, t=300)
edge:e001:meta -> {"from": "sensor_01", "to": "pump_01", "relation": "monitors"}
graph:adj:pump_01:in -> ["e001", "e002"]
This means graph data automatically inherits every temporal capability -- versioning, time-travel, causal ordering, downsampling -- for free. There's no separate graph storage engine. No sync jobs. No eventual consistency between layers.
| Capability | KV alone | Graph alone | Temporal alone | NalaDB (all three) |
|---|---|---|---|---|
| "What is the current temperature?" | Fast O(1) lookup | -- | -- | Fast O(1) lookup |
| "What was it yesterday at 3pm?" | -- | -- | Point-in-time query | Point-in-time query |
| "What does this sensor monitor?" | -- | Traversal | -- | Traversal with temporal filter |
| "What did the topology look like last week?" | -- | -- | -- | Time-travel graph query |
| "Why did the pump fail?" | -- | -- | -- | Causal traversal with confidence |
| "What changed between deployments?" | -- | -- | -- | DIFF query on graph state |
| "Show me 24h of readings, downsampled for a dashboard" | -- | -- | History + LTTB | History + LTTB |
| "Which sensors have anomalous write patterns?" | -- | -- | -- | META query with statistics |
The "--" cells aren't just missing features. They're impossible without combining all three paradigms. A pure graph database can tell you the pump is connected to the sensor -- but not what the sensor reading was at the time of failure. A pure time-series database can show you the temperature spike -- but not which downstream equipment was affected. Only the combination answers the full question.
You can. Many teams do. Here's what it costs:
NalaDB eliminates all of this with a single binary, a single query language, and a single source of truth for both structure and history.
NalaDB isn't a drop-in replacement for every database. Here's what you're signing up for:
Pros:
Cons:
Start a 3-node cluster with Prometheus + Grafana:
make cluster
Check that everything is up:
docker compose -f docker/docker-compose.yml ps
Service endpoints:
| Service | Endpoint |
|---|---|
| NalaDB node 0 (leader) | localhost:7301 (gRPC) |
| NalaDB node 1 | localhost:7302 (gRPC) |
| NalaDB node 2 | localhost:7303 (gRPC) |
| Grafana dashboard | http://localhost:3000 (admin/naladb) |
| Prometheus | http://localhost:9093 |
# Store a sensor reading
grpcurl -plaintext -d '{"key":"sensor:temp_1","value":"MjUuMA=="}' \
localhost:7301 naladb.v1.KVService/Set
# Read it back
grpcurl -plaintext -d '{"key":"sensor:temp_1"}' \
localhost:7301 naladb.v1.KVService/Get
# Create a graph node
grpcurl -plaintext -d '{"type":"sensor","properties":{"location":"cGxhbnQtQQ=="}}' \
localhost:7301 naladb.v1.GraphService/CreateNode
# Build the CLI
go build -o bin/naladb-cli ./cmd/naladb-cli
# Connect to NalaDB
./bin/naladb-cli -addr localhost:7301
Interactive session:
NalaDB CLI (connected to localhost:7301)
Type a NalaQL query and press Enter. Use ';' to end multi-line queries.
Commands: \q quit \h help
naladb> MATCH (s:sensor)-[r:monitors]->(m:machine)
...> WHERE m.type = "hydraulic_press"
...> RETURN s.id, m.id, r.weight
...> LIMIT 5;
| m.id | r.weight | s.id |
+-----------------+----------+-----------------+
| hydraulic_001 | 0.95 | temp_sensor_01 |
| hydraulic_001 | 0.87 | vibr_sensor_03 |
| hydraulic_002 | 0.91 | temp_sensor_07 |
(3 rows, 1.204ms)
naladb> CAUSAL FROM temp_sensor_01
...> AT "2025-03-08T14:32:05Z"
...> DEPTH 3 WINDOW 5m
...> WHERE confidence > 0.5
...> RETURN path, delta, confidence
...> ORDER BY confidence DESC;
| confidence | delta | path |
+------------+-------+-------------------------------------------+
| 0.92 | 4.2s | temp_sensor_01 -> hydraulic_001 -> alert_7 |
| 0.78 | 8.1s | temp_sensor_01 -> hydraulic_001 |
(2 rows, 3.817ms)
naladb> GET history("node:temp_sensor_01:prop:temperature")
...> FROM "2025-03-08" TO "2025-03-09"
...> DOWNSAMPLE LTTB(5);
| hlc | key | value |
+--------------------+--------------------------------------+-------+
| 1741392000000000 | node:temp_sensor_01:prop:temperature | 21.3 |
| 1741413600000000 | node:temp_sensor_01:prop:temperature | 24.7 |
| 1741435200000000 | node:temp_sensor_01:prop:temperature | 28.1 |
| 1741456800000000 | node:temp_sensor_01:prop:temperature | 25.4 |
| 1741478400000000 | node:temp_sensor_01:prop:temperature | 22.0 |
(5 rows, 2.561ms)
naladb> DIFF (m:material)-[s:supplied_by]->(sup:supplier)
...> BETWEEN "2025-01-01" AND "2025-03-01"
...> RETURN edge_id, from, to, change;
| change | edge_id | from | to |
+---------+---------+--------------+--------------+
| added | e_1042 | steel_batch3 | supplier_new |
| removed | e_0871 | steel_batch1 | supplier_old |
(2 rows, 5.102ms)
naladb> META "node:sensor_*:prop:temperature"
...> WHERE write_rate > 10.0
...> RETURN key, write_rate, avg_interval, stddev_value;
| avg_interval | key | stddev_value | write_rate |
+--------------+--------------------------------------+--------------+------------+
| 1.2s | node:sensor_01:prop:temperature | 3.42 | 50.5 |
| 0.8s | node:sensor_07:prop:temperature | 5.17 | 75.2 |
(2 rows, 0.893ms)
naladb> SHOW NODES WHERE type = "sensor" LIMIT 3;
| id | properties | type |
+-----------------+------------------+--------+
| temp_sensor_01 | prop_value, unit | sensor |
| vibr_sensor_03 | prop_value | sensor |
| temp_sensor_07 | prop_value, unit | sensor |
(3 rows, 0.287ms)
naladb> \q
Bye!
Run a one-off query from the shell:
./bin/naladb-cli -e 'MATCH (n:sensor) RETURN n.id LIMIT 10'
NalaQL is a Cypher-inspired query language with temporal extensions. Here's one query per capability:
-- Graph pattern matching with temporal filter
MATCH (s:sensor)-[r:monitors]->(m:machine)
AT "2025-03-08T14:32:05Z"
WHERE r.weight > 0.5
RETURN s.id, m.id
-- Causal root-cause analysis
CAUSAL FROM pump_01
AT "2025-03-08T14:32:05Z"
DEPTH 5 WINDOW 30m
WHERE confidence > 0.5
RETURN path, delta, confidence
-- Graph with live KV data (bridging graph and time-series)
MATCH (s:Sensor)-[:BELONGS_TO]->(m:Machine)
WHERE m.criticality = "critical"
FETCH latest(s.readingKey) AS s.value
RETURN s.sensorId, s.metric, s.value
-- Temporal diff between two points in time
DIFF (m:material)-[s:supplied_by]->(sup:material)
BETWEEN "2025-01-01" AND "2025-03-01"
RETURN edge_id, from, to, change
-- Time-series history with visual downsampling
GET history("node:temp_sensor_305a:prop:temperature")
FROM "2025-03-08" TO "2025-03-09"
DOWNSAMPLE LTTB(288)
-- Key statistics (write rate, stddev, cardinality)
META "node:sensor_*:prop:temperature"
WHERE write_rate > 10.0
RETURN key, write_rate, stddev_value
See docs/nalaql.md for the full language specification.
┌──────────────────────────────────────────────────────────┐
│ gRPC API │
│ KVService │ GraphService │ WatchService │ Health │
├──────────────────────────────────────────────────────────┤
│ NalaQL Engine │
│ Lexer -> Parser -> Executor │
├────────────────────┬─────────────────────────────────────┤
│ Graph Layer │ Temporal KV Store │
│ (Node/Edge CRUD, │ (Set, Get, GetAt, Delete, │
│ Traverse, Causal) │ History, BatchSet) │
├────────────────────┴─────────────────────────────────────┤
│ Tiered Storage (Hot RAM + Cold Disk) │
│ │
│ HOT: vlog (last N versions) | Index | Sorted Keys │
│ COLD: Segments (Bloom+Sparse) | WAL | Blob Store │
│ Background Evictor (configurable retention) │
├──────────────────────────────────────────────────────────┤
│ RAFT Consensus | TTL Manager | Multi-Tenancy | Metrics │
├──────────────────────────────────────────────────────────┤
│ Hybrid Logical Clock (HLC) │
│ 48-bit Wall-Time | 4-bit NodeID | 12-bit Logical │
└──────────────────────────────────────────────────────────┘
For details, see docs/architecture.md.
Each use case includes a documentation page and a runnable example:
| Use Case | Documentation | Example |
|---|---|---|
| Predictive Maintenance | docs/usecases/predictive-maintenance.md | go run examples/predictive-maintenance/main.go |
| Fraud Detection | docs/usecases/fraud-detection.md | go run examples/fraud-detection/main.go |
| Supply Chain Transparency | docs/usecases/supply-chain.md | go run examples/supply-chain/main.go |
| Smart Building / Digital Twin | docs/usecases/smart-building.md | go run examples/smart-building/main.go |
| IT Infrastructure | docs/usecases/it-infrastructure.md | go run examples/it-infrastructure/main.go |
| Operation | Target | Notes |
|---|---|---|
| Current-Value Get | < 1 us | In-memory O(1) via sharded index |
| Point-in-Time (hot) | < 1 us | In-memory binary search |
| Point-in-Time (cold) | < 100 us | Bloom + Sparse Index + disk seek |
| Write Throughput | > 500k/s per node | Batch fsync, lock-free sharded index |
| History (last=100) | < 10 ms | In-memory or merged RAM + segment streaming |
| 3-Hop Traversal | < 50 ms | In-memory adjacency lists |
| Causal (depth=5) | < 100 ms | HLC ordering + BFS |
make bench
Requires Go 1.24+ and protoc (for proto generation).
git clone https://github.com/thatscalaguy/naladb.git
cd naladb
make build
./bin/naladb
# Server
docker run -p 7301:7301 -p 9090:9090 thatscalaguy/naladb:latest
# CLI (connect to a running NalaDB instance)
docker run --rm -it --network host thatscalaguy/naladb-cli:latest -addr localhost:7301
Download pre-built binaries from the releases page.
./naladb -addr :7301 -wal-dir data/wal -node-id 0
See docs/configuration.md for the full YAML reference covering:
max_clock_skew)NalaDB supports configurable memory retention via MaxMemoryVersions. This controls how many versions per key are kept in RAM -- older versions are served from on-disk segments.
storage:
max_memory_versions: 100 # Keep last 100 versions per key in RAM
eviction_interval: "30s" # Background eviction cycle
eviction_batch_size: 10000 # Keys processed per eviction cycle
| Setting | Behavior |
|---|---|
max_memory_versions: 0 |
Pure in-memory (default, all versions in RAM) |
max_memory_versions: 1 |
Minimal RAM: only latest version in vlog, rest on disk |
max_memory_versions: 100 |
Balanced: recent history in RAM, older on disk |
max_memory_versions: 1000000 |
Effectively in-memory for most workloads |
See docs/storage-model.md for capacity planning and the tiered storage architecture.
| Document | Description |
|---|---|
| Integration Guide | How to write data and run queries from any language (Go examples, gRPC code generation) |
| Architecture | How the layers fit together (HLC, WAL, Index, Segments, Graph, Causal, RAFT) |
| NalaQL | Full query language spec with EBNF grammar |
| API Reference | gRPC API with grpcurl and Go examples |
| Causal Analysis | Deep dive on the causal traversal algorithm |
| Clustering | RAFT setup, consistency levels, clock skew protection, Docker deployment |
| Configuration | Full YAML reference |
| Storage Model | Tiered storage, eviction, capacity planning |
| Concept: Why KV + Graph + Temporal | The design philosophy and comparison with alternatives |
NalaDB started over 10 years ago as a freelance project -- a predictive maintenance system built for a manufacturing customer. The first version used Ruby, Redis, and PostgreSQL. As the requirements grew, it was rewritten in Scala with Actors for better concurrency. The final rewrite in Go distilled the core idea into what NalaDB is today: a temporal-graph-kv store purpose-built for tracking causality over time.
The original customer recently allowed the project to be open-sourced. All customer-specific code has been removed and the git history was reset for a clean start. NalaDB is actively used in production, though development happens at a project-maintainer pace rather than a full-time team.
The original domain was industrial predictive maintenance (sensors, machines, failure chains), and that remains a strong use case. However, the current focus is shifting toward root cause analysis for software cloud infrastructure -- tracing cascading failures across services, dependencies, and infrastructure components.
make build # Build
make test # Run tests with race detection
make lint # Run linter
make bench # Run benchmarks
make proto # Generate protobuf stubs
make cluster # Start 3-node cluster
make test && make lint before submittingBuilt and maintained by Sven — need it in production, extended or reviewed?
GitHub ↗Hire the author →