Developer Documentation

JetGraph Documentation

Everything you need to connect, query, and build with JetGraph — the high-performance in-memory graph engine built in Rust.

⚡ Quick Start in 5 Minutes

Get JetGraph running locally and execute your first graph query — no build step, no configuration.

Step 1: Start JetGraph with Docker Compose

Save the following as docker-compose.yml and run docker compose up -d.

docker-compose.yml

services:
  graphengine:
    image: alhascan/jetgraph-demo:latest
    restart: unless-stopped
    ports:
      - "8080:8080"
      - "7687:7687"
    volumes:
      - graphengine-data:/data
    environment:
      RUST_LOG: info
      ENABLE_ADMIN_RESET: "true"
    healthcheck:
      test: ["CMD", "sh", "-c", "curl -sf http://localhost:8080/health || curl -sf http://localhost:8080/api/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 120s

  segment-evaluator:
    image: alhascan/jetgraph-demo:latest
    entrypoint: ["/usr/local/bin/segment-evaluator"]
    restart: unless-stopped
    environment:
      RUST_LOG: info
      GRAPH_ENGINE_ENDPOINT: http://graphengine:50051
      SEGMENT_EVALUATOR_CONFIG_DB: /data/seg-config
    volumes:
      - seg-config-data:/data/seg-config
    depends_on:
      graphengine:
        condition: service_healthy

  pattern-miner:
    image: alhascan/jetgraph-demo:latest
    entrypoint: ["/usr/local/bin/pattern-miner"]
    restart: unless-stopped
    environment:
      RUST_LOG: info
      GRAPH_ENGINE_ENDPOINT: http://graphengine:50051
      PATTERN_MINER_CONFIG_DB: /data/pm-config
      PATTERN_MINER_ADDR: 0.0.0.0:8082
    volumes:
      - pm-config-data:/data/pm-config
    depends_on:
      graphengine:
        condition: service_healthy

  graphengine-ui:
    image: alhascan/jetgraph-ui-demo:latest
    restart: unless-stopped
    ports:
      - "80:3000"
    environment:
      BOLT_URL: bolt://graphengine:7687
      NEO4J_USER: ""
      NEO4J_PASSWORD: ""
      GRAPH_HTTP_URL: http://graphengine:8080
      SEGMENT_API_URL: http://segment-evaluator:8081
      PATTERN_MINER_URL: http://pattern-miner:8082
    depends_on:
      graphengine:
        condition: service_healthy

volumes:
  graphengine-data:
  seg-config-data:
  pm-config-data:

Step 2: Verify the engine is ready

Give the container a few seconds to start, then check the health endpoint. The endpoint returns 503 until the engine is ready, so retry if the first call fails.

bash

curl http://localhost:8080/health
# → {"status":"ok","ready":true}
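
If you script the setup, polling is more reliable than a fixed wait. The following is a minimal sketch using only the Python standard library. The helper names (`is_ready`, `fetch_health`, `wait_until_ready`) and the retry defaults are illustrative assumptions, not part of JetGraph.

```python
import json
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # endpoint from the Quick Start above


def is_ready(body: bytes) -> bool:
    """Return True when a /health response body reports ready=true."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("ready") is True


def fetch_health(url: str = HEALTH_URL) -> bytes:
    """GET the health endpoint; return b"" on a 503, a refused connection, or a timeout."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.read()
    except (urllib.error.URLError, OSError):
        return b""


def wait_until_ready(fetch=fetch_health, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll until the engine reports ready or the attempts run out (hypothetical defaults)."""
    for _ in range(attempts):
        if is_ready(fetch()):
            return True
        time.sleep(delay)
    return False
```

Call `wait_until_ready()` after `docker compose up -d` and continue only when it returns `True`. The `fetch` argument is injectable so the loop can be exercised without a running container.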

Step 3: Load sample data with one click (optional)

Open the Admin UI at http://localhost, click Schema in the left nav, then click ⚡ Apply Schema & Load Sample Data in the Quick Start — Credit Card Fraud Space card. This provisions the Credit Card Fraud schema and seeds a representative dataset so every query in Analytics returns results. When the green All done banner appears, explore the data in Graph Explorer, Cypher Editor, or the Analytics pages. Prefer to bring your own schema? Skip this step and register it manually in Step 4 below.

Step 4: Or register a schema and write your first node

Skip this step if you loaded the sample dataset in Step 3. Otherwise, a schema must be declared and finalized once before any data can be written. Run these three calls in order.

bash — POST /cypher

# Call 1: register a node type
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerNodeType(\"USER\", \"string\") YIELD node_type_id RETURN node_type_id","parameters":{}}'

# Call 2: finalize the schema (required before any writes)
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.finalizeSchema() YIELD schema_version RETURN schema_version","parameters":{}}'

# Call 3: create a node
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CREATE (u:USER {external_id: $id}) RETURN u.external_id AS created","parameters":{"id":"user-001"}}'
# → {"columns":["created"],"rows":[["user-001"]]}

✅

The Admin UI at http://localhost gives you a Cypher editor, schema designer, and graph explorer — great for exploration without writing code.

Introduction

What is JetGraph?

JetGraph is a purpose-built, in-memory graph engine designed for applications that need real-time graph queries and decisions at high throughput. It stores your graph entirely in memory for sub-millisecond access, supports the Cypher query language, speaks the Bolt wire protocol (compatible with all official Neo4j drivers), and exposes a simple HTTP/Cypher API for any language.

JetGraph is not a general-purpose persistent database — it is purpose-built for high-velocity workloads where you need graph signals in real time: fraud detection, recommendation engines, anomaly detection, network security, and more.

Key Features

⚡

Sub-millisecond Queries

Entirely in-memory; O(1) velocity lookups via pre-computed rings.

🔤

Cypher Query Language

The industry-standard graph query language — expressive and readable.

🔌

Bolt Protocol

Works with existing Neo4j drivers over Bolt, so no driver changes are required.

🦀

Rust Performance

Built in Rust for predictable, low-latency performance under load.

📡

Streaming Ingestion

Ingest up to 35,000 events/sec from Kafka, webhooks, or direct API.

🔗

Risk Propagation

Automatic fraud contagion across the graph with DashMap-based O(1) reads.

When to Use a Graph Database

Graph databases shine when the relationships between entities are as important as the entities themselves. Use JetGraph when you need to:

Detect patterns across connected entities (e.g., shared devices, IPs, or accounts)
Compute real-time velocity and novelty signals (e.g., "how many times did this card transact in the last hour?")
Traverse multi-hop paths (e.g., "find all merchants connected to a flagged card via two hops")
Build recommendation systems based on shared relationships
Propagate risk or scores across a network automatically

Scenario	Relational DB	JetGraph
Simple row lookups by primary key	Ideal	Overhead
Multi-hop relationship traversal	Expensive JOINs	Native
Real-time velocity counting	Aggregate queries	O(1) pre-computed
Pattern detection across a network	Very complex	Cypher traversal
Durable, large-scale persistence	Ideal	Use upstream store

Running JetGraph

Prerequisites

Docker 24+ and Docker Compose v2 — required for the demo image
curl — for testing API calls from the terminal
Any Neo4j-compatible Bolt driver (optional, for Bolt connections)
Rust toolchain (optional, only for the jetgraph-client crate)

Exposed Ports

Port	Protocol	Purpose
`8080`	HTTP	Cypher REST API (`POST /cypher`), health, metrics
`7687`	TCP / Bolt	Bolt binary protocol — Neo4j driver compatible
`50051`	TCP / gRPC	High-throughput Rust client (`jetgraph-client` crate), streaming ingestion
`80`	HTTP	Admin UI (graphengine-ui container)

Health Check

bash

curl http://localhost:8080/health
# {"status":"ok","ready":true}

ℹ️

The engine starts with an empty in-memory graph. Schema must be registered and finalized before any data can be written. The schema persists to the mounted volume so it survives container restarts.

Connection — REST / Cypher API

The simplest way to interact with JetGraph from any language. Send a JSON body with a query (Cypher string) and optional parameters to POST /cypher.

Base URL

http://localhost:8080

Request Format

Field	Type	Description
`query`	string	A Cypher query string
`parameters`	object	Named parameters referenced as `$name` in the query

Response Format

Successful responses return a JSON object with columns and rows:

{"columns": ["id", "label"], "rows": [["user-001", "USER"]]}
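
The columns/rows shape is compact but awkward to consume directly. A small sketch of turning it into one dictionary per row follows; `rows_to_dicts` is a hypothetical helper name, and the input is the example response shown above.

```python
import json


def rows_to_dicts(response_text: str) -> list:
    """Pair each row with the column names from the same response."""
    payload = json.loads(response_text)
    columns = payload["columns"]
    return [dict(zip(columns, row)) for row in payload["rows"]]


records = rows_to_dicts('{"columns": ["id", "label"], "rows": [["user-001", "USER"]]}')
print(records)  # [{'id': 'user-001', 'label': 'USER'}]
```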

Example Requests

bash — create a node

curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "CREATE (u:USER {external_id: $id, email: $email}) RETURN u.external_id AS id",
    "parameters": {"id": "user-001", "email": "alice@example.com"}
  }'
# → {"columns":["id"],"rows":[["user-001"]]}

bash — match nodes

curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "MATCH (u:USER) RETURN u.external_id AS id LIMIT 10",
    "parameters": {}
  }'

bash — create a relationship

curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "MATCH (u:USER {external_id: $uid}), (m:MERCHANT {external_id: $mid}) CREATE (u)-[:TRANSACTS_AT]->(m) RETURN true AS created",
    "parameters": {"uid": "user-001", "mid": "merchant-42"}
  }'

Additional Endpoints

Method	Path	Description
`GET`	`/health`	Returns JSON with `ready=true`; 503 while loading snapshot
`GET`	`/metrics`	Prometheus text metrics — ingest rate, query counters, memory pressure, RCU retries

Error Responses

On error, JetGraph returns a non-2xx HTTP status with a JSON error body:

{"error": "Schema not finalized. Call db.finalizeSchema() before writing data."}

HTTP Status	Meaning
`200 OK`	Query executed successfully
`400 Bad Request`	Malformed query or invalid parameters
`409 Conflict`	Schema conflict or duplicate node type
`500 Internal Server Error`	Unexpected engine error — check logs
`503 Service Unavailable`	Engine still loading snapshot at startup — retry after a moment
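
One way a client might act on the status table above is sketched below: treat 503 as retryable and surface the `error` field for everything else. `CypherError`, `should_retry`, and `parse_response` are illustrative names, and treating only 503 as retryable is an assumption drawn from the table rather than a documented policy.

```python
import json

RETRYABLE_STATUSES = {503}  # engine still loading its snapshot


class CypherError(Exception):
    """Raised for any non-2xx response from POST /cypher."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def should_retry(status: int) -> bool:
    """Only a 503 is worth retrying; 400 and 409 will fail again unchanged."""
    return status in RETRYABLE_STATUSES


def parse_response(status: int, body: str) -> dict:
    """Return the parsed result on success, otherwise raise CypherError."""
    if 200 <= status < 300:
        return json.loads(body)
    try:
        message = json.loads(body).get("error", body)
    except (ValueError, AttributeError):
        message = body  # body was not the expected JSON error object
    raise CypherError(status, message)
```

A caller would typically loop while `should_retry(status)` holds, sleeping briefly between attempts, and let `CypherError` propagate for every other failure.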

Connection — Bolt Protocol

JetGraph speaks the Bolt binary protocol on port 7687, the same protocol used by Neo4j. This means any official or community Neo4j driver works out-of-the-box — no code changes, no new SDK to learn.

What is Bolt?

Bolt is a binary, connection-oriented protocol optimized for graph databases. It supports efficient serialisation of Cypher queries and results, pipelining, and authentication. Because JetGraph is Bolt-compatible, you can use drivers for Python, JavaScript, Java, Go, .NET, and more without modification.

Connection Details

Parameter	Value (demo mode)
URL	`bolt://localhost:7687`
Username	`""` (empty)
Password	`""` (empty)

Python Example

python

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

with driver.session() as session:
    # Create a node
    session.run(
        "CREATE (u:USER {external_id: $id})",
        id="user-bolt-001"
    )

    # Query nodes
    result = session.run("MATCH (u:USER) RETURN u.external_id AS id LIMIT 5")
    for record in result:
        print(record["id"])

driver.close()

JavaScript (Node.js) Example

javascript

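const neo4j = require('neo4j-driver');

// CommonJS modules do not support top-level await, so run the calls inside an async function.
async function main() {
  const driver = neo4j.driver(
    'bolt://localhost:7687',
    neo4j.auth.basic('', '')
  );
  const session = driver.session();

  try {
    const result = await session.run(
      'MATCH (u:USER) RETURN u.external_id AS id LIMIT 10'
    );
    result.records.forEach(r => console.log(r.get('id')));
  } finally {
    // Release the session and the driver even if the query fails
    await session.close();
    await driver.close();
  }
}

main().catch(console.error);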

Connection — Rust Client

The jetgraph-client crate provides a typed, ergonomic API over gRPC. It is the recommended client for Rust applications that need the highest throughput and the lowest latency.

🦀

The Rust client is open source — source, issues, and release notes live at github.com/JetGraphEngine/JetGraphClient. The full JetGraphEngine organization hosts all official repositories.

Installation

Add the crate to your Cargo.toml:

toml — Cargo.toml

[dependencies]
jetgraph-client = "*"
tokio            = { version = "1", features = ["full"] }

Minimal Working Example

rust

use jetgraph_client::{GraphClient, VelocityQuery};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
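    // Connect to the engine over gRPC. The Quick Start compose file publishes only
    // ports 8080 and 7687, so publish 50051 as well to connect from the host.
    let graph = GraphClient::connect("http://localhost:50051").await?;

    // Look up an existing node by external ID
    let card_id = graph.lookup_node("CARD", "card-001").await?;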

    // Query velocity: how many TRANSACTS_AT edges in the last hour?
    let count = graph
        .get_velocity_count(VelocityQuery {
            node:        card_id,
            edge_type:   "TRANSACTS_AT".into(),
            window_secs: 3600,
        })
        .await?
        .count;

    println!("Transactions in last hour: {}", count);
    Ok(())
}

Recommended Usage Pattern

For scoring workloads, follow the Query → Score → Insert three-phase pattern on every event:

Query: collect all graph signals (velocity, novelty, risk context) before scoring
Score: apply your business logic using the signals
Insert: always record the event edge, regardless of the decision

💡

Always write the edge even on a declined event. The graph needs the full history to compute accurate velocity counts and risk propagation going forward.

Data Modeling in JetGraph

Core Concepts

JetGraph organizes data into two primitives:

Nodes — entities in your domain (e.g., a user, a card, a merchant, a device)
Relationships (edges) — directed connections between two nodes (e.g., CARD -[:TRANSACTS_AT]-> MERCHANT)

Each node and edge has a type (also called a label) and an optional set of properties (key-value pairs).

Graph Diagram — Fraud Detection Domain

Schema Registration

Before any data can be written, you must declare your node types and edge types. This is done once at startup via the CALL db.* system procedures, then finalized with db.finalizeSchema().

bash — full schema setup

# Register node types (second arg is the ID type: "string" or "integer")
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerNodeType(\"CARD\", \"string\") YIELD node_type_id RETURN node_type_id","parameters":{}}'

curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerNodeType(\"MERCHANT\", \"string\") YIELD node_type_id RETURN node_type_id","parameters":{}}'

curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerNodeType(\"DEVICE\", \"string\") YIELD node_type_id RETURN node_type_id","parameters":{}}'

# Register an edge type
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerEdgeType({name:\"TRANSACTS_AT\",from_node_type:\"CARD\",to_node_type:\"MERCHANT\"}) YIELD edge_type_id RETURN edge_type_id","parameters":{}}'

curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerEdgeType({name:\"USES_DEVICE\",from_node_type:\"CARD\",to_node_type:\"DEVICE\"}) YIELD edge_type_id RETURN edge_type_id","parameters":{}}'

# Finalize — must be called after all types are registered
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.finalizeSchema() YIELD schema_version RETURN schema_version","parameters":{}}'
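
When the schema has more than a handful of types, generating the request bodies keeps the order (node types, then edge types, then finalize) explicit. This is a sketch only: the function names are hypothetical, the query strings mirror the literal form used in the curl calls above, and whether the `db.*` procedures accept `$parameters` instead of literals is not stated here, so names are validated before being embedded.

```python
import json
import re

_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
FINALIZE = "CALL db.finalizeSchema() YIELD schema_version RETURN schema_version"


def _checked(name: str) -> str:
    """Reject names that could break out of the quoted literal."""
    if not _NAME.fullmatch(name):
        raise ValueError(f"unsafe type name: {name!r}")
    return name


def register_node_type(name: str, id_type: str = "string") -> str:
    return ('CALL db.registerNodeType("%s", "%s") YIELD node_type_id RETURN node_type_id'
            % (_checked(name), _checked(id_type)))


def register_edge_type(name: str, src: str, dst: str) -> str:
    return ('CALL db.registerEdgeType({name:"%s",from_node_type:"%s",to_node_type:"%s"}) '
            "YIELD edge_type_id RETURN edge_type_id"
            % (_checked(name), _checked(src), _checked(dst)))


def schema_requests(node_types, edge_types) -> list:
    """Return JSON bodies for POST /cypher, with finalizeSchema always last."""
    queries = [register_node_type(n) for n in node_types]
    queries += [register_edge_type(*e) for e in edge_types]
    queries.append(FINALIZE)
    return [json.dumps({"query": q, "parameters": {}}) for q in queries]


bodies = schema_requests(
    ["CARD", "MERCHANT", "DEVICE"],
    [("TRANSACTS_AT", "CARD", "MERCHANT"), ("USES_DEVICE", "CARD", "DEVICE")],
)
```

Each element of `bodies` can be sent as the `-d` payload of one request, in list order.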

⚠️

Schema is immutable after finalization. Plan your node types and edge types carefully. In development you can wipe everything with CALL db.resetGraph() when ENABLE_ADMIN_RESET=true is set, or restart the container with a fresh volume.

Edge Types, Histograms & Activity Windows

Edge types define both the graph relationship (CARD → MERCHANT) and the feature storage kept for every edge pair. The storage layout is chosen once, when the edge type is registered. For ML/GNN use cases, the two most important read paths are: graph.histogram for node-level bucketed counts and graph.edgeState for edge-pair state.

Concept	Scope	What it stores	Typical use
Compact edge payload	One `(src, dst, edge_type)` pair	`tx_count`, `approx_sum`, `last_seen`, activity bitmap, optional 8-bin amount histogram, optional bool flag	Edge features such as count, amount sum, recency, velocity.
Activity bitmap	One edge pair	21 recent time ticks, 3 bits each. Each tick count saturates at 7.	Fast edge-level velocity windows: last 5 min, 10 min, 1 hour, etc.
Node histogram	One `(node, edge_type)` side	Two ring buffers: hourly slots and daily slots. Each slot has 8 amount/value buckets.	Node-level behaviour: amount distribution for a card over last 1h, 24h, 7d.

Registering a transaction edge with full numeric features

To get the full compact payload (numeric bins + approximate sum), register the edge type with bin_boundaries. The seven boundaries define eight buckets. tracked_property names the numeric value from ingest that is binned and summed, usually "amount" for payments.

cypher — PAYMENT edge with amount bins, ticks, and node histograms

CALL db.registerEdgeType({
  name:             "PAYMENT",
  from_node_type:   "CARD",
  to_node_type:     "MERCHANT",
  state_ttl_secs:   7776000,   // 7,776,000 s = 90 days

  // Full CompactEdgePayload: 7 thresholds → 8 amount buckets
  bin_boundaries:   [5, 25, 50, 100, 250, 500, 1000],
  tracked_property: "amount",

  // Edge-level velocity bitmap: 21 ticks × 5 minutes = 105 minutes max lookback
  activity_bitmap:  { tick_size_secs: 300 },

  // Node-level rolling histograms. Each slot stores 8 bucket counts.
  node_histogram:   {
    enabled_for_src: true,
    enabled_for_dst: false,
    hourly_slots:    24,
    daily_slots:     7
  }
}) YIELD edge_type_id
RETURN edge_type_id

Registration field	Meaning
`bin_boundaries`	Seven numeric thresholds. They create eight buckets: `<5`, `5–25`, `25–50`, …, `≥1000`.
`tracked_property`	The numeric input field that feeds `approx_sum` and the bucket counters. For payments this is usually `amount`.
`activity_bitmap.tick_size_secs`	The duration of one edge-level activity tick. With `300`, `[1, 2, 12]` means last 5 min, 10 min, and 1 hour.
`node_histogram.hourly_slots`	How many hourly histogram slots to keep. `24` keeps 24 hours of hourly detail.
`node_histogram.daily_slots`	How many daily histogram slots to keep. `7` keeps 7 days of daily detail.
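
To make the seven-thresholds-to-eight-buckets mapping concrete, here is a small sketch. It assumes each threshold is the inclusive lower bound of the next bucket, which is inferred from the `<5` and `≥1000` labels in the table rather than stated explicitly; `bucket_index` and `bucket_labels` are hypothetical helper names.

```python
from bisect import bisect_right

BIN_BOUNDARIES = [5, 25, 50, 100, 250, 500, 1000]  # from the PAYMENT example


def bucket_index(amount: float, boundaries=BIN_BOUNDARIES) -> int:
    """Index 0 is below the first threshold; index 7 is at or above the last."""
    return bisect_right(boundaries, amount)


def bucket_labels(boundaries=BIN_BOUNDARIES) -> list:
    """Human-readable label for each of the len(boundaries) + 1 buckets."""
    labels = [f"<{boundaries[0]}"]
    labels += [f"{lo}-{hi}" for lo, hi in zip(boundaries, boundaries[1:])]
    labels.append(f">={boundaries[-1]}")
    return labels


labels = bucket_labels()
print(labels[bucket_index(49.99)])  # 25-50
```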

Reading node histograms

graph.histogram returns aggregated bucket counts for one node and one edge type. It is node-level: for CARD → PAYMENT, it counts all PAYMENT edges from that card, not a single merchant edge.

cypher — last 24 hours and last 7 days

MATCH (c:CARD {external_id: "card-velocity-09"})

// Use the hourly ring and sum the most recent 24 hourly slots.
CALL graph.histogram(c, "PAYMENT", 24)
  YIELD buckets, counts AS hourly

// Use the daily ring and sum the most recent 7 daily slots.
CALL graph.histogram(c, "PAYMENT", null, 7)
  YIELD counts AS days

RETURN c.node_id, buckets, hourly, days

Example output counts = [0, 2, 12, 2, 0, 0, 0, 0] means: 0 events below 5, 2 events in 5–25, 12 events in 25–50, 2 events in 50–100, and none in the higher buckets.

Reading edge state

graph.edgeState reads one edge pair. It is the edge-level complement to graph.histogram. Use it for per-edge features such as tx_count, approx_sum, last_seen, boolean flags, and activity windows.

cypher — specific card → merchant edge state

MATCH (c:CARD {external_id: "card-velocity-09"})
CALL graph.edgeState(
  c,
  "merchant:merchant-uk-10",
  "PAYMENT",
  [1, 2, 12]
) YIELD tx_count, approx_sum, last_seen, bool_flag, activity_counts
RETURN c.node_id, tx_count, approx_sum, last_seen, bool_flag, activity_counts

If PAYMENT.activity_bitmap.tick_size_secs = 300, then [1, 2, 12] asks for counts over the last 5 minutes, 10 minutes, and 1 hour. The bitmap holds at most 21 ticks, so a 5-minute tick gives about 105 minutes of edge-level activity history. Longer windows should come from node histograms.
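
The window arithmetic can be modelled in a few lines. The sketch below is a toy model of a 21-tick bitmap with 3-bit saturating counters, built only from the numbers given in this section. The bit layout (newest tick in the lowest bits) and all function names are assumptions, and this is not the engine's implementation.

```python
TICKS = 21                 # ticks kept per edge pair
BITS_PER_TICK = 3          # each tick is a 3-bit counter
TICK_MAX = (1 << BITS_PER_TICK) - 1               # saturates at 7
BITMAP_MASK = (1 << (TICKS * BITS_PER_TICK)) - 1  # 63 bits in total


def advance(bitmap: int, ticks_elapsed: int) -> int:
    """Age the bitmap by whole ticks; the oldest ticks fall off the top."""
    if ticks_elapsed <= 0:
        return bitmap
    if ticks_elapsed >= TICKS:
        return 0
    return (bitmap << (BITS_PER_TICK * ticks_elapsed)) & BITMAP_MASK


def record(bitmap: int) -> int:
    """Count one event in the current tick, saturating at 7."""
    if (bitmap & TICK_MAX) < TICK_MAX:
        bitmap += 1
    return bitmap


def window_count(bitmap: int, ticks: int) -> int:
    """Sum the most recent `ticks` counters (capped at 21)."""
    ticks = min(ticks, TICKS)
    return sum((bitmap >> (i * BITS_PER_TICK)) & TICK_MAX for i in range(ticks))


def window_seconds(windows, tick_size_secs: int) -> list:
    """Translate tick windows such as [1, 2, 12] into seconds."""
    return [w * tick_size_secs for w in windows]
```

With `tick_size_secs = 300`, `window_seconds([1, 2, 12], 300)` gives `[300, 600, 3600]`. In this model, nine events recorded in a single tick read back as 7, which is why per-tick counts are a floor on true activity during bursts.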

💡

For embeddings: combine graph.histogram for node features (count distributions over 1h/24h/7d) with graph.edgeState for edge features (pair count, amount sum, recency, and short velocity windows).

Data Modeling Best Practices

Use meaningful, domain-specific labels — CARD, MERCHANT, DEVICE instead of generic names
Keep relationship types verb-like and directional — TRANSACTS_AT, USES_DEVICE, OWNED_BY
Identify entities by a stable external ID (e.g., card PAN hash, merchant ID), not by internal surrogate keys
Model shared attributes as nodes, not properties — a shared device should be a DEVICE node that multiple cards point to, not a string property on each card
Avoid packing everything into properties: anything you traverse or filter by is usually better modeled as a node or relationship

Querying the Graph — Cypher

JetGraph uses Cypher, the standard graph query language originally developed for Neo4j and now governed by the openCypher specification. If you know SQL, Cypher will feel natural — it uses a similar declarative style but describes patterns in the graph rather than joins between tables.

Basic Patterns

Cypher uses ASCII-art notation to express graph patterns:

(n:LABEL) — a node with a label
-[:EDGE_TYPE]-> — a directed relationship
(a)-[:EDGE]->(b) — node a connected to node b

CREATE — Insert a Node

cypher

// Create a CARD node with properties
CREATE (c:CARD {external_id: "card-001", country: "US"})
RETURN c.external_id AS id

MATCH — Query Nodes

cypher

// Find a specific card
MATCH (c:CARD {external_id: "card-001"})
RETURN c.external_id AS id, c.country AS country

// Find all cards (with limit — always paginate large result sets)
MATCH (c:CARD)
RETURN c.external_id AS id
LIMIT 100

CREATE — Insert a Relationship

cypher

// Connect a CARD to a MERCHANT
MATCH (c:CARD {external_id: "card-001"}),
      (m:MERCHANT {external_id: "merchant-42"})
CREATE (c)-[:TRANSACTS_AT {amount: 49.99, ts: 1712345678}]->(m)
RETURN true AS created

Traversal — Multi-hop Queries

cypher

// Find all merchants this card has visited
MATCH (c:CARD {external_id: "card-001"})-[:TRANSACTS_AT]->(m:MERCHANT)
RETURN m.external_id AS merchant

// Two-hop: other cards that share a device with this card
MATCH (c:CARD {external_id: "card-001"})-[:USES_DEVICE]->(d:DEVICE)
      <-[:USES_DEVICE]-(other:CARD)
WHERE other.external_id <> "card-001"
RETURN other.external_id AS related_card, d.fingerprint AS shared_device

Filtering with WHERE

cypher

MATCH (c:CARD)-[r:TRANSACTS_AT]->(m:MERCHANT)
WHERE r.amount > 500 AND m.country = "US"
RETURN c.external_id AS card, m.external_id AS merchant, r.amount
ORDER BY r.amount DESC
LIMIT 20

Parameterized Queries

Always use parameters (prefixed with $) instead of string interpolation to avoid injection and improve query plan reuse:

bash

curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "MATCH (c:CARD {external_id: $card_id})-[:TRANSACTS_AT]->(m:MERCHANT) RETURN m.external_id AS merchant",
    "parameters": {"card_id": "card-001"}
  }'

Aggregations

cypher

// Count transactions per merchant
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
RETURN m.external_id AS merchant, COUNT(*) AS txn_count
ORDER BY txn_count DESC
LIMIT 10

// Sum transaction amounts per card
MATCH (c:CARD)-[r:TRANSACTS_AT]->(m:MERCHANT)
RETURN c.external_id AS card, SUM(r.amount) AS total_spend

MERGE — Upsert Nodes and Relationships

MERGE matches an existing pattern or creates it if it does not exist. Use ON CREATE SET and ON MATCH SET to set properties conditionally:

cypher

// Upsert a node — create if absent, update timestamp if it exists
MERGE (c:CARD {external_id: $card_id})
ON CREATE SET c.created_at = $ts, c.country = $country
ON MATCH SET  c.last_seen  = $ts
RETURN c.external_id, c.created_at

// Upsert a relationship between two existing nodes
MATCH (c:CARD {external_id: $card_id}), (d:DEVICE {external_id: $device_id})
MERGE (c)-[:USES_DEVICE]->(d)
RETURN true AS linked

OPTIONAL MATCH

OPTIONAL MATCH works like a left outer join — if the pattern does not exist, the variables are bound to null rather than excluding the row:

cypher

// Return card with its device fingerprint, even if no device is linked
MATCH          (c:CARD {external_id: $card_id})
OPTIONAL MATCH (c)-[:USES_DEVICE]->(d:DEVICE)
RETURN c.external_id AS card, d.external_id AS device

WITH — Pipeline and Filter Mid-Query

WITH passes results from one query stage to the next, allowing intermediate filtering, aggregation, and variable re-binding:

cypher

// Find cards with more than 5 distinct merchants, then get their devices
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WITH c, COUNT(DISTINCT m) AS merchant_count
WHERE merchant_count > 5
MATCH (c)-[:USES_DEVICE]->(d:DEVICE)
RETURN c.external_id AS card, merchant_count, d.external_id AS device
LIMIT 50

UNWIND — Expand a List

UNWIND turns a list into individual rows, which is useful for batch operations driven by a parameter array:

cypher

// Create multiple nodes from a list parameter in one round-trip
UNWIND $card_ids AS cid
CREATE (c:CARD {external_id: cid})
RETURN c.external_id AS created

// Parameters: {"card_ids": ["card-001", "card-002", "card-003"]}
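
For lists too large for one request, the same query can be reused with the IDs split into batches. A minimal sketch follows. The batch size of 1000 is an arbitrary placeholder, not a documented limit, and `batched_unwind_requests` is a hypothetical helper name.

```python
import json

UNWIND_QUERY = (
    "UNWIND $card_ids AS cid "
    "CREATE (c:CARD {external_id: cid}) "
    "RETURN c.external_id AS created"
)


def batched_unwind_requests(card_ids, batch_size: int = 1000):
    """Yield one POST /cypher body per batch; the query text never changes."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(card_ids), batch_size):
        batch = card_ids[start:start + batch_size]
        yield json.dumps({"query": UNWIND_QUERY, "parameters": {"card_ids": batch}})


ids = [f"card-{i:03d}" for i in range(1, 6)]
bodies = list(batched_unwind_requests(ids, batch_size=2))  # 3 request bodies
```

Because only the parameter changes between batches, every request shares one query string.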

Data Manipulation

Insert Data

Use CREATE to insert new nodes and relationships. Both the source and destination nodes must already exist before creating a relationship.

bash — create node

curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CREATE (m:MERCHANT {external_id: $id, mcc: $mcc}) RETURN m.external_id","parameters":{"id":"merchant-42","mcc":"5411"}}'

Update Node Properties

Use SET to update or add properties on an existing node:

cypher

MATCH (c:CARD {external_id: "card-001"})
SET c.risk_score = 0.85, c.flagged = true
RETURN c.external_id, c.risk_score

Delete a Node

A node must have no relationships before it can be deleted. Use DETACH DELETE to remove both the node and all its relationships in one step:

cypher

// Delete node and all its relationships
MATCH (c:CARD {external_id: "card-001"})
DETACH DELETE c

Delete a Relationship

cypher

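// Deletes every TRANSACTS_AT relationship from card-001.
// Add {external_id: $merchant_id} to (m) to target a single merchant.
MATCH (c:CARD {external_id: "card-001"})-[r:TRANSACTS_AT]->(m:MERCHANT)
DELETE r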

Bulk Inserts

For bulk data loading, fire multiple POST /cypher requests in parallel. Each request is independent and thread-safe. For maximum throughput from Rust, use the jetgraph-client crate which batches requests over a persistent gRPC connection.

ℹ️

JetGraph can ingest up to 35,000 events per second via the streaming ingestion pipeline. For high-volume onboarding, use the bulk import tooling rather than individual /cypher POST requests.
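
For the parallel `POST /cypher` approach, a thread pool is usually enough from a script. The sketch below uses only the standard library. `post_cypher` and `send_all` are illustrative names, and the worker count of 8 is a placeholder to tune, not a recommendation from the engine.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

CYPHER_URL = "http://localhost:8080/cypher"


def post_cypher(body: str, url: str = CYPHER_URL) -> dict:
    """Send one JSON body to POST /cypher and return the parsed response."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read())


def send_all(bodies, send=post_cypher, max_workers: int = 8) -> list:
    """Send request bodies concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, bodies))
```

`pool.map` re-raises the first exception when its results are consumed, so a failed request stops the load instead of being silently skipped.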

Graph Analysis & Use Cases

Velocity Counting (O(1))

JetGraph pre-computes velocity counts using ring buffers, making time-window queries instant. Query them via the Rust client or via Cypher system procedures:

rust — velocity query

// Count TRANSACTS_AT edges from this card in the last 1 hour
let count = graph
    .get_velocity_count(VelocityQuery {
        node:        card_id,
        edge_type:   "TRANSACTS_AT".into(),
        window_secs: 3600,   // 1 hour
    })
    .await?
    .count;

// 24-hour window
let daily_count = graph
    .get_velocity_count(VelocityQuery {
        node:        card_id,
        edge_type:   "TRANSACTS_AT".into(),
        window_secs: 86400,
    })
    .await?
    .count;
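
To see why a ring buffer makes these reads independent of the number of stored events, consider the toy model below. It is a sketch of the general technique only. The slot size, slot count, and class name are assumptions, and the engine's actual data layout is not described here.

```python
class RingCounter:
    """Fixed number of time slots; old slots are overwritten in place."""

    def __init__(self, slot_secs: int, num_slots: int):
        self.slot_secs = slot_secs
        self.counts = [0] * num_slots
        self.epochs = [None] * num_slots  # which time slot each cell currently holds

    def add(self, ts: int) -> None:
        """Record one event at Unix timestamp `ts`."""
        epoch = ts // self.slot_secs
        i = epoch % len(self.counts)
        if self.epochs[i] != epoch:  # cell holds an expired slot: reuse it
            self.epochs[i] = epoch
            self.counts[i] = 0
        self.counts[i] += 1

    def count(self, now: int, window_secs: int) -> int:
        """Sum the slots covering the window; cost depends on slots, not events."""
        newest = now // self.slot_secs
        slots = min(len(self.counts), -(-window_secs // self.slot_secs))  # ceiling division
        total = 0
        for epoch in range(newest - slots + 1, newest + 1):
            i = epoch % len(self.counts)
            if self.epochs[i] == epoch:
                total += self.counts[i]
        return total


ring = RingCounter(slot_secs=60, num_slots=60)  # one hour at one-minute resolution
for ts in (0, 30, 3000):
    ring.add(ts)
print(ring.count(now=3599, window_secs=3600))  # 3
```

The read touches at most `num_slots` cells regardless of how many events were recorded, which is the property the velocity API relies on.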

Fraud Detection Pattern

Graph databases are uniquely effective for fraud detection because fraud rings are defined by connections. A card that shares a device with a flagged card is suspicious — even if the card itself has no prior fraud history.

rust — fraud scoring (full example)

async fn score_transaction(
    graph: &GraphClient,
    tx: &Transaction,
) -> Result<Decision> {
    // Phase 1: collect signals
    let card_id     = graph.lookup_node("CARD",     &tx.card_id).await?;
    let merchant_id = graph.lookup_node("MERCHANT", &tx.merchant_id).await?;

    // Velocity: transactions in the last 1h
    let txn_1h = graph.get_velocity_count(VelocityQuery {
        node: card_id, edge_type: "TRANSACTS_AT".into(), window_secs: 3600,
    }).await?.count;

    // Novelty: is this merchant new for this card?
    let is_new_merchant = !graph
        .edge_exists(card_id, merchant_id, "TRANSACTS_AT").await?;

    // Contagion: max fraud score from 1-hop neighbours
    let ctx = graph.get_fraud_context(FraudContextQuery {
        node: card_id,
    }).await?;

    // Phase 2: score
    let mut risk: f32 = 0.0;
    if is_new_merchant      { risk += 0.15; }
    if txn_1h > 30         { risk += 0.25; }
    risk += 0.5 * ctx.max_neighbor_fraud_score;

    let decision = match risk {
        r if r > 0.7 => Decision::Decline,
        r if r > 0.4 => Decision::Challenge,
        _           => Decision::Approve,
    };

    // Phase 3: insert edge (always, even on decline)
    graph.create_edge(CreateEdgeRequest {
        edge_type_name: "TRANSACTS_AT".into(),
        src: card_id, dst: merchant_id,
        properties: vec![prop("amount", tx.amount), prop("decision", &decision)],
    }).await?;

    // Propagate fraud score if declined
    if decision == Decision::Decline {
        graph.flag_node(FlagRequest {
            node: card_id, fraud_score: 0.85,
            reason: "auto_decline".into(),
        }).await?;
    }

    Ok(decision)
}
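
The scoring phase on its own is a pure function of the three signals, which makes it easy to unit-test apart from the graph calls. The sketch below restates the weights and thresholds from the Rust example. The function name and string decisions are illustrative, and the weights are the example's, not tuned values.

```python
def score(is_new_merchant: bool, txn_1h: int, max_neighbor_fraud_score: float):
    """Phase 2 only: combine novelty, velocity, and contagion into a decision."""
    risk = 0.0
    if is_new_merchant:
        risk += 0.15                      # novelty
    if txn_1h > 30:
        risk += 0.25                      # velocity
    risk += 0.5 * max_neighbor_fraud_score  # contagion from neighbours

    if risk > 0.7:
        decision = "decline"
    elif risk > 0.4:
        decision = "challenge"
    else:
        decision = "approve"
    return risk, decision


print(score(True, 31, 0.8)[1])   # decline
print(score(False, 10, 0.9)[1])  # challenge
print(score(False, 0, 0.0)[1])   # approve
```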

Ring Fraud Detection (Cypher)

Find cards that share a device with a known-fraudulent card — the classic "fraud ring" pattern:

cypher

// Cards that share a device with card-001 (two hops: CARD -> DEVICE <- CARD)
MATCH (seed:CARD {external_id: "card-001"})
       -[:USES_DEVICE]->(d:DEVICE)
       <-[:USES_DEVICE]-(suspect:CARD)
WHERE suspect.external_id <> "card-001"
RETURN suspect.external_id AS card, d.fingerprint AS shared_device

Recommendation System Pattern

Find merchants popular with other cards that share the same device as the current card — a graph-based collaborative filter:

cypher

MATCH (c:CARD {external_id: "card-001"})
       -[:USES_DEVICE]->(d:DEVICE)
       <-[:USES_DEVICE]-(peer:CARD)
       -[:TRANSACTS_AT]->(m:MERCHANT)
WHERE NOT (c)-[:TRANSACTS_AT]->(m)
RETURN m.external_id AS recommended_merchant, COUNT(DISTINCT peer) AS peer_count
ORDER BY peer_count DESC
LIMIT 5

Performance & Best Practices

Query Optimization Tips

Always anchor on a specific node first. Start your MATCH with a node that has a known external_id rather than scanning all nodes of a type.
Use velocity APIs for time-window counts. Do not use COUNT over traversals for time-windowed counts — the pre-computed O(1) velocity API is orders of magnitude faster.
Prefer shorter paths. Traversals up to 2–3 hops are fast. Deeper unbounded traversals should have a LIMIT to avoid full graph scans.
Use parameterized queries. This enables query plan reuse across calls with different values.
Limit result sets. Always add LIMIT to exploratory queries — especially in production where the graph may be large.

Efficient Traversal Patterns

Pattern	Recommended	Avoid
Count events in time window	Velocity API (`get_velocity_count`)	`COUNT` with filter over edges
Check if relationship exists	`edge_exists(src, dst, type)`	Full `MATCH` + `COUNT`
Find connected neighbours	1–2 hop `MATCH` with `LIMIT`	Unbounded variable-length paths
Get risk context	`get_fraud_context(node_id)`	Manually traversing and aggregating

Common Mistakes to Avoid

Writing data before calling db.finalizeSchema() — all writes will fail
Creating a relationship before both endpoint nodes exist (CREATE the source and destination nodes first)
Using string interpolation to build Cypher queries — always use parameters
Querying without LIMIT on large result sets in production
Skipping the edge insert when a transaction is declined — this breaks future velocity counts
Modeling shared attributes (devices, IPs) as string properties instead of nodes — you lose the ability to traverse them

Cypher Best Practices

These rules are derived from direct analysis of the JetGraph query planner logs. Each one maps to a specific engine optimisation, or the absence of one, that has measurable impact at scale. Follow them so that queries start from an O(1) index lookup and reuse a cached plan instead of scanning or replanning.

1. Always label every node in MATCH

The pushdown optimizer converts a WHERE node.external_id = $value filter into an O(1) IndexLookup only when it knows the node type. Without a label the engine produces NodeScan { node_type: None } — a full scan across every node in the graph, repeated once per edge type registered in the schema.

cypher — ✗ Avoid

// No label on (s) → full graph scan × number of edge types
MATCH (s)-[r]->(d)
WHERE s.external_id = $id
RETURN s, r, d

cypher — ✓ Prefer

// Typed nodes → pushdown rewrites NodeScan to O(1) IndexLookup
MATCH (s:CARD)-[r:TRANSACTS_AT]->(d:MERCHANT)
WHERE s.external_id = $id
RETURN s.external_id, d.external_id

2. Always use parameters — never embed literal values

Literal values baked into the query string each create a unique plan cache key. Every new literal triggers a full parse + plan cycle regardless of how many times a similar query has been run before. Parameters collapse every variation of a query into a single cached plan that is reused for every card ID, merchant ID, device fingerprint, or IP address.

cypher — ✗ Avoid

// Each card number is a separate cache key — replanning on every call
MATCH (c:CARD {external_id: "9792487647826207"})-[:TRANSACTS_AT]->(m:MERCHANT)
RETURN m.external_id AS merchant

cypher — ✓ Prefer

// One plan, cached forever — value supplied at runtime via parameters
MATCH (c:CARD {external_id: $card_id})-[:TRANSACTS_AT]->(m:MERCHANT)
RETURN m.external_id AS merchant

💡

This applies equally to device fingerprints, IP addresses, customer IDs, and every other lookup value. Any hardcoded string in the query string is a separate cache entry.
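The same rule can be sketched on the client side. This is a minimal illustration, assuming the REST request shape used elsewhere in this documentation (a JSON body with `query` and `parameters`). The helper name `build_cypher_request` is hypothetical; it only builds the request body and does not send it.

```python
import json

# The query text stays constant, so every call shares one plan cache key.
MERCHANTS_FOR_CARD = (
    "MATCH (c:CARD {external_id: $card_id})-[:TRANSACTS_AT]->(m:MERCHANT) "
    "RETURN m.external_id AS merchant"
)


def build_cypher_request(query, **parameters):
    """Serialize a query and its parameters into a JSON request body.

    Hypothetical helper: values travel in "parameters" and are never
    formatted into the query string itself.
    """
    return json.dumps({"query": query, "parameters": parameters})


body_a = build_cypher_request(MERCHANTS_FOR_CARD, card_id="9792487647826207")
body_b = build_cypher_request(MERCHANTS_FOR_CARD, card_id="4000000000000002")

# Both requests carry identical query text; only the parameters differ.
assert json.loads(body_a)["query"] == json.loads(body_b)["query"]
```

Because the query string is a module-level constant, there is no code path that can accidentally interpolate a value into it.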

3. Filter on `external_id` equality on the source node

The pushdown optimizer walks the plan tree and converts Filter(src.external_id = value) → NodeScan into an IndexLookup. This rewrite only fires when the filter is an equality on external_id and applies to the source variable of the expand. Filters on destination properties or on any other property remain as post-expansion filters (slower).

cypher — ✗ Avoid

// No anchor on the source — engine scans all CARDs then filters merchants
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WHERE m.name CONTAINS 'Airline'
RETURN c.external_id, m.name

cypher — ✓ Prefer

// Anchor on source external_id → pushdown fires → IndexLookup for the card
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WHERE c.external_id = $card_id
  AND m.name CONTAINS $name_fragment
RETURN c.external_id, m.name

// If you need to start from the merchant side, flip the traversal direction
MATCH (m:MERCHANT)<-[:TRANSACTS_AT]-(c:CARD)
WHERE m.external_id = $merchant_id
RETURN c.external_id

4. Always type every relationship

An untyped relationship -[r]-> causes the engine to fan out the expand across every edge type registered in the schema — one full expand per type. With six edge types registered, a single untyped MATCH compiles into six separate query plans. Always name the relationship type explicitly.

cypher — ✗ Avoid

// Untyped [r] → engine fans out over EdgeTypeId(0), (1), (2) … for every type
MATCH (c:CARD)-[r]->(n)
WHERE c.external_id = $card_id
RETURN type(r), n.external_id

cypher — ✓ Prefer

// Single explicit edge type → one targeted expand, one IndexLookup
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WHERE c.external_id = $card_id
RETURN m.external_id

// Need multiple edge types? Use UNION ALL — each branch is individually optimized
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT) WHERE c.external_id = $card_id
RETURN 'merchant' AS kind, m.external_id AS neighbor
UNION ALL
MATCH (c:CARD)-[:USES_DEVICE]->(d:DEVICE)   WHERE c.external_id = $card_id
RETURN 'device'   AS kind, d.external_id AS neighbor
UNION ALL
MATCH (c:CARD)-[:USES_IP]->(ip:IP)           WHERE c.external_id = $card_id
RETURN 'ip'       AS kind, ip.external_id AS neighbor

5. Canonical fraud context — the recommended multi-hop template

This is the recommended pattern for pulling the full risk context of a card in a single round-trip: typed nodes, typed relationships, parameterized, one plan cached for every card. Each UNION ALL branch is compiled and optimized independently.

cypher — ✓ Canonical fraud context query

MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WHERE c.external_id = $card_id
RETURN 'merchant' AS kind, m.external_id AS id, m.name AS label
UNION ALL
MATCH (c:CARD)-[:USES_DEVICE]->(d:DEVICE)
WHERE c.external_id = $card_id
RETURN 'device' AS kind, d.external_id AS id, '' AS label
UNION ALL
MATCH (c:CARD)-[:USES_IP]->(ip:IP)
WHERE c.external_id = $card_id
RETURN 'ip' AS kind, ip.external_id AS id, '' AS label

6. Anchor ring / co-occurrence queries on the known node

Shared-entity ring queries (cards sharing a device or IP) must start from a known, typed anchor. An unanchored ring causes a full scan of all devices before expanding to cards. Start from the entity you know.

cypher — ✗ Avoid

// No anchor → scans all DEVICE nodes, expands to all cards, then filters
MATCH (d:DEVICE)<-[:USES_DEVICE]-(c:CARD)
WHERE c.country = 'US'
RETURN d.external_id, c.external_id

cypher — ✓ Prefer

// Start from a known card → IndexLookup → expand to its devices → expand to sibling cards
MATCH (c1:CARD)-[:USES_DEVICE]->(d:DEVICE)<-[:USES_DEVICE]-(c2:CARD)
WHERE c1.external_id = $card_id
  AND c2.external_id <> $card_id
RETURN DISTINCT c2.external_id AS linked_card, d.external_id AS shared_device
LIMIT 50

// Or start from a known device
MATCH (d:DEVICE)<-[:USES_DEVICE]-(c:CARD)
WHERE d.external_id = $device_id
RETURN c.external_id AS linked_card
LIMIT 50

7. Always add LIMIT to traversals and scans

The engine propagates LIMIT hints down into NodeScan and VariableExpand nodes so BFS stops as soon as enough rows are found. Without a limit, traversals materialise the full result set before returning anything. This is especially important for variable-length paths.

cypher — ✓ Prefer

// LIMIT is pushed into the expand — BFS stops after finding 50 merchants
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WHERE c.external_id = $card_id
RETURN m.external_id, m.name
LIMIT 50

// Variable-length traversal: cap both hop count and result rows
MATCH (c:CARD)-[:TRANSACTS_AT*1..3]->(m:MERCHANT)
WHERE c.external_id = $card_id
RETURN DISTINCT m.external_id
LIMIT 100

8. Use `db.nodeStats()` for counts, not `MATCH (n:TYPE)`

A bare MATCH (n:CARD) RETURN count(n) materialises every node in that type before counting. db.nodeStats() reads a pre-maintained O(1) counter with no scan.

cypher — ✗ Avoid

// Materialises all nodes in the type before counting
MATCH (n:CARD) RETURN count(n) AS total

cypher — ✓ Prefer

// O(1) pre-computed counter — no scan
CALL db.nodeStats() YIELD type, count
WHERE type = 'CARD'
RETURN count

// If you must scan for exploration, always cap with LIMIT
MATCH (n:CARD) RETURN n.external_id LIMIT 25

9. Batch writes with `graph.ingest()`

Individual CREATE statements are planned and executed one at a time. For loading multiple nodes and edges in a single operation, use graph.ingest() — one round-trip, one plan, one atomic batch write regardless of how many entities are included.

cypher — ✓ Prefer for batch loads

CALL graph.ingest(
  [
    { node_type: 'CARD',     externalId: $card_id },
    { node_type: 'MERCHANT', externalId: $merchant_id }
  ],
  [
    { edge_type: 'TRANSACTS_AT', src: $card_id, dst: $merchant_id }
  ]
) YIELD ok, nodes_created, edges_created
RETURN ok, nodes_created, edges_created

10. Record transaction events with `graph.upsertEdge()`

For high-throughput edge upserts — recording payment events — use graph.upsertEdge() rather than MERGE-based patterns. It is a single O(1) procedure call that increments counters and updates the activity bitmap atomically, with no planner involved.

cypher — ✓ Prefer for event recording

CALL graph.upsertEdge(
  $edge_type,
  $src_typed_id,
  $dst_typed_id
) YIELD created_new, tx_count, approx_sum
RETURN created_new, tx_count, approx_sum

// Parameters example:
// { "edge_type": "TRANSACTS_AT",
//   "src_typed_id": "CARD:9792487647826207",
//   "dst_typed_id": "MERCHANT:M2_0000803" }
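A small sketch of building that parameter map in client code, assuming typed IDs always follow the `TYPE:external_id` form shown in the parameters example. The helper names `typed_id` and `upsert_edge_params` are hypothetical.

```python
def typed_id(node_type, external_id):
    """Join a node type and an external ID as "TYPE:external_id"."""
    if not node_type or ":" in node_type:
        raise ValueError("node_type must be non-empty and contain no ':'")
    return f"{node_type}:{external_id}"


def upsert_edge_params(edge_type, src_type, src_id, dst_type, dst_id):
    """Build the parameter map for the graph.upsertEdge() call above."""
    return {
        "edge_type": edge_type,
        "src_typed_id": typed_id(src_type, src_id),
        "dst_typed_id": typed_id(dst_type, dst_id),
    }


params = upsert_edge_params(
    "TRANSACTS_AT", "CARD", "9792487647826207", "MERCHANT", "M2_0000803"
)
```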

Quick Reference

#	Rule	Engine impact avoided
1	Label every node: `(c:CARD)` not `(c)`	Eliminates full-graph `NodeScan { node_type: None }`
2	Use `$parameters`, never inline literals	One cached plan per query shape instead of one per value
3	Filter `src.external_id = $value` on the source node	Triggers pushdown → O(1) `IndexLookup`
4	Type every relationship: `[:TRANSACTS_AT]` not `[r]`	Prevents N-way fan-out (one expand per edge type)
5	Use the UNION ALL fraud context template	One cached plan, all neighbours, one round-trip
6	Anchor ring queries on the known node	Avoids full DEVICE / IP scan in co-occurrence queries
7	Always add `LIMIT` to traversals	BFS stops early; limit is pushed into the expand node
8	Use `db.nodeStats()` for counts	O(1) counter vs full node materialisation
9	Batch writes with `graph.ingest()`	Single round-trip; bypasses per-statement planning
10	Edge events with `graph.upsertEdge()`	O(1) atomic counter update; no planner involved

Error Handling & Debugging

Common Errors

Error	Cause	Fix
`Schema not finalized`	Writing data before `db.finalizeSchema()`	Call `CALL db.finalizeSchema()` after all type registrations
`Unknown node type`	Using a label not registered in the schema	Register the type with `db.registerNodeType()` before finalizing
`Unknown edge type`	Using a relationship type not registered	Register with `db.registerEdgeType()` before finalizing
`Node not found`	Creating an edge to a node that doesn't exist	Always `CREATE` the destination node before creating an edge to it
`Duplicate node type`	Registering the same type twice	Each type name must be unique — restart with a fresh volume if needed in dev
Connection refused on port 8080	Container not started or health check failing	Check `docker compose logs graphengine` for startup errors

Debugging with Logs

bash

# Follow live logs from the graph engine
docker compose logs -f graphengine

# Increase verbosity (set RUST_LOG=debug in docker-compose.yml)
# Valid values: error | warn | info | debug | trace

Testing Connectivity

bash

# Health endpoint
curl -v http://localhost:8080/health

# Verify Bolt port is open
nc -zv localhost 7687

# Simple Cypher round-trip
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"RETURN 1 AS ping","parameters":{}}'
# → {"columns":["ping"],"rows":[[1]]}
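For scripted checks, the response can be reshaped into one dictionary per row. This is a minimal sketch that assumes every `/cypher` response has the `columns` / `rows` shape shown in the ping example; `rows_as_dicts` is a hypothetical helper name.

```python
import json


def rows_as_dicts(response_text):
    """Zip each row with the column names from a /cypher response body."""
    payload = json.loads(response_text)
    columns = payload["columns"]
    return [dict(zip(columns, row)) for row in payload["rows"]]


ping = rows_as_dicts('{"columns":["ping"],"rows":[[1]]}')
assert ping == [{"ping": 1}]
```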

Persistence & Durability

JetGraph is an in-memory engine — all data lives in RAM for sub-millisecond access. Durability is provided by a layered persistence model that writes to disk asynchronously without pausing query processing.

How Data Survives Restarts

📸

Full Snapshots

Complete graph serialized to disk nightly (02:30 by default) or on a configurable interval. Compressed with zstd.

⚡

Delta Files

Incremental changes written every 30–60 seconds. Applied on top of the base snapshot at startup for fast recovery.

🔗

Checkpoints

When the delta chain grows long, deltas are merged offline into a single checkpoint file — no engine pause required.

🛡️

Shutdown Delta

On graceful stop, a final delta is written before the full snapshot — so recent changes are safe even if the snapshot is interrupted.

Recovery Sequence at Startup

1

Load latest full snapshot

The most recent snapshot-*.bin file is deserialized into memory. The engine is marked ready immediately so queries can start while the Cuckoo filter rebuilds in the background.

2

Apply checkpoint (if any)

If a checkpoint file exists for this snapshot base, it is applied first to skip replaying the earliest deltas.

3

Replay remaining delta files

Any delta files newer than the checkpoint are applied in order, bringing the graph fully up to date.

ℹ️

The engine returns 503 from /health while loading a snapshot. Polls will succeed once loading completes — typically within seconds for small graphs, longer for very large ones. This is why start_period in the health check should be set generously.
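A client or deployment script can wait out the loading phase in the same way. The sketch below is an assumption-laden illustration: `wait_until_healthy` is a hypothetical helper, the probe is injected so the logic can be exercised without a running engine, and the 120-second default simply mirrors the `start_period` in the quick-start compose file.

```python
import time


def wait_until_healthy(probe, timeout_s=120.0, interval_s=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it returns HTTP 200 or timeout_s elapses.

    probe is any zero-argument callable returning an HTTP status code,
    for example a thin wrapper around GET /health. Returns True on 200.
    """
    deadline = clock() + timeout_s
    while True:
        if probe() == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated engine: two 503 responses while the snapshot loads, then 200.
statuses = iter([503, 503, 200])
ready = wait_until_healthy(lambda: next(statuses), sleep=lambda _s: None)
```

In real use, `probe` would issue the HTTP request and return its status code; connection errors during container start can be mapped to a non-200 value so the loop keeps polling.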

Emergency Snapshot (Memory Pressure)

When RSS exceeds memory_limit_bytes in config.toml and the default backpressure strategy is active (pressure_eviction_batch_size = 0), the engine automatically writes an emergency full snapshot, blocks new ingest, and logs an error. This protects data before the container OOM killer fires. Ingest resumes once RSS drops back below the limit.

Manual Snapshot

Trigger a snapshot on demand via Cypher without restarting:

cypher

CALL graph.saveSnapshot() YIELD ok, path
RETURN ok, path

Graceful Shutdown

Send SIGTERM (what docker compose stop / docker compose down sends) and the engine will write a final delta then a full snapshot before exiting. Set stop_grace_period: 300s on the graphengine service in the compose file to give it up to 5 minutes for very large graphs; Docker's default grace period is only 10 seconds, after which the container is killed. Never use SIGKILL directly, because it bypasses the shutdown snapshot.

Clustering & High Availability

JetGraph supports primary–standby clustering for high availability and read scale-out. The primary serves reads and writes; one or more standbys keep a live in-memory replica of the graph and serve reads only. Replication is delta-based and lag is typically under 35 seconds under normal ingest.

Architecture

A cluster is built from three small components: the graphengine process itself, running as primary or standby, and two sidecars that run alongside it.

🖥️

graphengine (primary)

Accepts ingest. Writes a full snapshot nightly and a delta file every 30 seconds.

📤

delta-replicator

Sidecar on the primary. Ships each new snapshot and delta to the standby via rsync over SSH within a few seconds.

🧩

delta-compactor

Sidecar on both nodes. Merges long chains of raw deltas into checkpoints so restart replay stays fast.

📥

graphengine (standby)

Runs with STANDBY_MODE=true. Polls for new delta files every 5 s and applies them to the live graph. Hot-reloads on new nightly snapshots without restarting.

┌────────────── PRIMARY ──────────────┐        ┌────────────── STANDBY ──────────────┐
│ graphengine (read + write)          │        │ graphengine   STANDBY_MODE=true     │
│   writes snapshot-*.bin  delta-*.bin│        │   applies deltas, hot-reloads       │
│ delta-replicator ───── rsync/ssh ────────────▶ /opt/graphengine/data/snapshots    │
│ delta-compactor                     │        │ delta-compactor                     │
└─────────────────────────────────────┘        └─────────────────────────────────────┘
             ≤ 30 s produce + ≈ 5 s ship   =   ≤ 35 s replication lag

What Gets Replicated

Schema — snapshots include the finalized schema; the standby reloads automatically when a new snapshot arrives.
Nodes and edges — every CREATE, MERGE, SET, DELETE, and graph.upsertEdge write produces dirty pairs that flow in the next delta file.
Edge state — velocity counters, activity bitmaps, and histograms are snapshot-bound and travel with the data.
Fraud flags — graph.flagNode / graph.unflagNode propagate through the normal delta channel.

Segment Evaluator and Pattern Miner sidecars maintain their own config stores; they should run on both hosts but do not participate in graph replication.

Replication Lag

Lag ≈ delta_interval_secs (30 s default) + POLL_INTERVAL_SECS (5 s default), an upper bound of about 35 seconds under normal load. For a tighter RPO, lower both intervals; the cost is more, smaller delta files. Data loss on an unplanned primary failure is bounded by this window.
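The arithmetic can be sketched as follows. The function names are hypothetical, and the tighter values in the second call are illustrative settings rather than recommendations.

```python
def replication_lag_bound(delta_interval_secs=30, poll_interval_secs=5):
    """Upper bound on standby lag in seconds: produce interval + poll interval."""
    return delta_interval_secs + poll_interval_secs


def deltas_per_day(delta_interval_secs=30):
    """How many delta files one day of continuous ingest produces."""
    return 86_400 // delta_interval_secs


default_bound = replication_lag_bound()        # 30 + 5
tight_bound = replication_lag_bound(10, 2)     # hypothetical tighter settings
```

With the defaults this gives a 35-second bound and 2,880 delta files per day; dropping the delta interval to 10 seconds triples the file count, which is the trade-off described above.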

Minimum Compose Setup

Clusters use two Compose files. The primary runs the engine plus the replicator sidecar; the standby runs the engine with STANDBY_MODE enabled.

yaml — primary (key services)

services:
  graphengine:
    image: ghcr.io/fraudmanagement/graphengine:main
    environment:
      ENABLE_ADMIN_RESET: "true"
      CONFIG_PATH: /config/config.toml
      MALLOC_CONF: "background_thread:true,dirty_decay_ms:1000,muzzy_decay_ms:10000,narenas:16"
    volumes:
      - /opt/graphengine/data:/data
      - ./config.toml:/config/config.toml:ro

  delta-replicator:
    image: ghcr.io/fraudmanagement/graphengine:main
    command: ["/usr/local/bin/delta-replicator"]
    environment:
      SNAPSHOT_DIR: /data/snapshots
      REPLICA_REMOTE: "${STANDBY_USER}@${STANDBY_HOST}:/opt/graphengine/data/snapshots/"
      REPLICA_SSH_KEY: /keys/replication.key
    volumes:
      - /opt/graphengine/data:/data
      - ./replication.key:/keys/replication.key:ro

yaml — standby (key service)

services:
  graphengine:
    image: ghcr.io/fraudmanagement/graphengine:main
    environment:
      STANDBY_MODE: "true"
      POLL_INTERVAL_SECS: "5"
      CONFIG_PATH: /config/config.toml
    volumes:
      - /opt/graphengine/data:/data
      - ./config.toml:/config/config.toml:ro

ℹ️

Port 22 (SSH) on the standby host must be reachable from the primary host — delta-replicator rsyncs over SSH directly, not through the Docker network.

Failover

1

Drain and stop the primary

Stop new writes at the load-balancer level, then run docker compose stop graphengine. On SIGTERM the engine writes a final delta and a full snapshot, taking up to stop_grace_period (300 s in the recommended configuration).

2

Wait for the standby to drain the delta queue

Tail the standby logs until no more "applying delta" lines appear. At that point the standby has applied everything the primary shipped, and the two engines hold the same graph state.

3

Promote the standby

Restart the engine with STANDBY_MODE removed (or swap in the primary compose file on that host). The data directory is already a full mirror, so there is no migration step.

4

Cut clients over

Update DNS or load-balancer config. Optionally reconfigure the old primary as the new standby: start it with STANDBY_MODE=true, point the new primary's STANDBY_HOST at it, and install the new replication key.

For an unplanned failover (primary host lost), promote the standby immediately. Data loss is bounded by the last delta the standby applied — at most ≈ 35 seconds.

Verifying Replication

bash

# Compare node counts on both endpoints — should match or differ by one delta
curl -sS -X POST http://PRIMARY:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.nodeStats() YIELD type, count RETURN type, count"}'

curl -sS -X POST http://STANDBY:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.nodeStats() YIELD type, count RETURN type, count"}'

# Watch the replicator ship files (on the primary host) and the standby apply them (on the standby host)
docker compose logs -f delta-replicator | grep shipping
docker compose logs -f graphengine      | grep "applying delta"
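The count comparison can also be automated. This is a minimal sketch, assuming the `db.nodeStats()` responses use the same `columns` / `rows` shape as the ping example under Testing Connectivity. The helper names and the sample numbers are made up for illustration.

```python
def counts_by_type(response):
    """Map node type -> count from a db.nodeStats() response dict."""
    type_idx = response["columns"].index("type")
    count_idx = response["columns"].index("count")
    return {row[type_idx]: row[count_idx] for row in response["rows"]}


def replication_drift(primary, standby):
    """Return primary-minus-standby counts for every type that differs."""
    p, s = counts_by_type(primary), counts_by_type(standby)
    return {t: p.get(t, 0) - s.get(t, 0)
            for t in sorted(set(p) | set(s))
            if p.get(t, 0) != s.get(t, 0)}


# Made-up sample responses for illustration only.
primary = {"columns": ["type", "count"], "rows": [["CARD", 1200], ["MERCHANT", 85]]}
standby = {"columns": ["type", "count"], "rows": [["CARD", 1197], ["MERCHANT", 85]]}
drift = replication_drift(primary, standby)
```

A small, short-lived drift is expected while a delta is in flight; a drift that keeps growing suggests the replicator or the standby has stalled.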

Operational Limits

Standbys are read-only. All writes must go to the primary; any write attempted on a standby returns an error.
Standbys must not evict independently. Set pressure_eviction_batch_size = 0 on standbys so the replica only deletes edges that arrive in the primary's delta stream.
Size the standby like the primary. It holds the same in-memory graph — size mem_limit identically and use the same (or slightly lower) memory_limit_bytes.
No cascading replication. Topology is primary-to-standby. Chained standbys are not supported.

📖

For step-by-step production setup — SSH key generation, full memory sizing tables up to 300 GiB, and every troubleshooting playbook — see the Clustering Administrator Guide shipped with the repository at docs/clustering-admin-guide.md.

Memory & Compression

JetGraph keeps the entire graph in RAM to guarantee sub-millisecond reads, so the per-node and per-edge footprint directly determines how large a graph one host can hold. The engine uses several compression and compaction techniques, trading a few CPU cycles for a much smaller footprint, so a single 96 GiB host can hold hundreds of millions of nodes and billions of edges.

Where Memory Goes

Use db.memoryUsage() for a live, fully-accounted breakdown, or GET /admin/memory for the same payload in JSON:

cypher

CALL db.memoryUsage()
YIELD total_bytes, payload_bytes, edge_pair_count, process_rss_bytes, breakdown
RETURN total_bytes, payload_bytes, edge_pair_count, process_rss_bytes, breakdown

Field	What it measures
`payload_bytes`	Pure graph data — nodes, edges, properties. Lower bound.
`total_bytes`	Payload + structural overhead (indirection, shards, stacks, slack). Aligns with RSS within a few percent on Linux.
`breakdown.compact_store_bytes`	Edge adjacency — the dominant term at scale.
`breakdown.pointer_indirection_bytes`	Internal `NodeId ↔ external_id` mapping.
`breakdown.slot_table_bytes`	Slot allocator overhead for the chunk arena.
`process_rss_bytes`	Kernel-reported process RSS (what Docker and the OOM killer see).

Per-Pair Edge Compression

Each edge type stores adjacency with a payload variant chosen at registration. Pick the smallest variant that still answers your queries:

Variant	~Bytes / pair	Keeps	Use for
Full (default)	~52 B	`tx_count`, `approx_sum`, `last_seen`, 21-tick activity bitmap, 8-bin histogram, optional bool flag	Event edges where you query velocity, amounts, and activity windows.
Slim	~32 B	Same as Full minus the per-bucket histogram.	High-cardinality event edges where histograms are not needed.
Static	~16 B	`value` + `last_seen` only. No inverse index maintained.	Structural edges (`USES_DEVICE`, `SIMILAR_TO`) where only existence or a score matters.

At 1 B edges, choosing Static over Full for a structural edge type saves 36 bytes per pair, or ≈ 36 GB (≈ 33.5 GiB) of RSS, and also avoids maintaining an inverse index that is never queried.
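The saving follows directly from the approximate per-pair sizes in the variant table. A short worked calculation, with hypothetical helper names:

```python
# Approximate bytes per edge pair, taken from the variant table above.
BYTES_PER_PAIR = {"Full": 52, "Slim": 32, "Static": 16}
GIB = 1024 ** 3


def savings_bytes(edge_pairs, current, target):
    """Bytes saved by storing edge_pairs pairs as target instead of current."""
    return edge_pairs * (BYTES_PER_PAIR[current] - BYTES_PER_PAIR[target])


saved = savings_bytes(1_000_000_000, "Full", "Static")
saved_gib = round(saved / GIB, 1)
```

For one billion pairs this yields 36,000,000,000 bytes, which is about 33.5 GiB. The per-pair sizes are approximate, so treat the result as an estimate rather than an exact RSS delta.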

cypher — register a static edge type

CALL db.registerEdgeType({
  name:           "USES_DEVICE",
  from_node_type: "CARD",
  to_node_type:   "DEVICE",
  minimal_payload: true
}) YIELD edge_type_id
RETURN edge_type_id

Other In-Memory Compression

🗂️

Dictionary-Encoded Strings

Recurring string values (MCCs, country codes, merchant categories) are stored once and referenced by a small integer ID.

🔢

Bit-Packed Booleans

Boolean flags (fraud, verified, active) are packed into bitmaps rather than using one byte per value.

🌲

Adaptive Inverse Index

Each destination's inverse row starts as a sorted Vec<NodeId> and is promoted to a BTreeSet only when fan-in crosses 10 K — giving cold rows tight cache layout and hot rows O(log N) writes.

📦

Chunked Rows

Adjacency rows are split into fixed-size chunks so writers clone only the affected chunk (not the whole row) — shrinking write amplification by ~170× on hot source nodes.
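To make the first two techniques concrete, here is a generic sketch of dictionary encoding and bit-packing. It is illustrative only and is not JetGraph's Rust implementation; the class names `StringDictionary` and `BitFlags` are hypothetical.

```python
class StringDictionary:
    """Store each distinct string once and hand out small integer IDs."""

    def __init__(self):
        self._ids = {}      # string -> id
        self._values = []   # id -> string

    def encode(self, value):
        if value not in self._ids:
            self._ids[value] = len(self._values)
            self._values.append(value)
        return self._ids[value]

    def decode(self, value_id):
        return self._values[value_id]


class BitFlags:
    """Pack one boolean per node into a bytearray, eight flags per byte."""

    def __init__(self, size):
        self._bits = bytearray((size + 7) // 8)

    def set(self, index, value):
        byte, bit = divmod(index, 8)
        if value:
            self._bits[byte] |= 1 << bit
        else:
            self._bits[byte] &= ~(1 << bit) & 0xFF

    def get(self, index):
        byte, bit = divmod(index, 8)
        return bool(self._bits[byte] & (1 << bit))

    def nbytes(self):
        return len(self._bits)


countries = StringDictionary()
encoded = [countries.encode(c) for c in ["US", "DE", "US", "US", "DE"]]
fraud = BitFlags(1000)
fraud.set(42, True)
```

Five country values are stored as two strings plus five small integers, and 1,000 boolean flags occupy 125 bytes instead of 1,000.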

On-Disk Compression

All snapshots, deltas, and checkpoints are written through zstd. Hot-path writers use level 1 for low CPU overhead; the offline compactor uses the ultra-fast -1 level.

Artifact	Codec	Typical size vs raw payload
`snapshot-<ts>.bin`	zstd level 1	3–6× smaller
`delta-<ts>-<seq>.bin`	zstd level 1	4–8× smaller
`checkpoint-<ts>.bin`	zstd level -1	Similar to snapshot — optimised for write speed, not ratio

Disk throughput is rarely the bottleneck — delta throughput is CPU-bound on the zstd encoder, not on disk I/O. On NVMe you can ingest 30 MB deltas every 30 seconds with RSS growth matching payload growth within a few percent.

Memory-Pressure Protection

JetGraph never exceeds its configured memory budget silently. memory_limit_bytes in config.toml is the engine's RSS soft limit (set it below the container's mem_limit). When RSS crosses the limit the engine takes one of two actions based on pressure_eviction_batch_size:

Strategy	Setting	Behaviour
Backpressure (default)	`pressure_eviction_batch_size = 0`	Blocks new ingest with `503 / RESOURCE_EXHAUSTED`, writes an emergency snapshot, and resumes automatically once RSS drops. No data is lost.
Oldest-edge eviction	`pressure_eviction_batch_size = 50_000`	Deletes the oldest edges across all types, proportional to per-type growth, until RSS falls below the limit. Ingest is not blocked — but old history is dropped.

⚠️

Never enable pressure eviction on a standby. A replica must only delete what arrives in the primary's delta stream; otherwise the two engines diverge.
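The decision between the two strategies can be summarised in a few lines. This is a sketch of the documented behaviour, not engine code. The function name is hypothetical, and treating RSS equal to the limit as pressure is an assumption taken from the jetgraph_memory_pressure metric description.

```python
def pressure_action(rss_bytes, memory_limit_bytes,
                    pressure_eviction_batch_size=0):
    """Return which strategy applies at the given RSS.

    "none"         - RSS is under the limit, ingest proceeds normally
    "backpressure" - block ingest and write an emergency snapshot
    "evict"        - delete the oldest edges in batches, ingest continues
    """
    if rss_bytes < memory_limit_bytes:
        return "none"
    if pressure_eviction_batch_size == 0:
        return "backpressure"
    return "evict"


limit = 64_424_509_440  # 75 % of an 80 GiB mem_limit
default_action = pressure_action(limit + 1, limit)
evicting_action = pressure_action(limit + 1, limit, 50_000)
```

On a standby the batch size should stay at 0, so only the "none" and "backpressure" outcomes are reachable there.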

Sizing Memory

Size memory_limit_bytes from Docker mem_limit, not from memswap_limit. Swap is an emergency cushion only: once graph pages are swapped out, query latency degrades severely. Start at 70–75 % of mem_limit and raise it only after seeing stable headroom in docker stats and db.memoryUsage().

Host RAM	mem_limit	Primary memory_limit_bytes (~75%)	Standby memory_limit_bytes (~70%)
16 GiB	`12g`	`9_663_676_416`	`9_019_432_960`
32 GiB	`24g`	`19_327_352_832`	`18_038_865_920`
64 GiB	`52g`	`41_876_111_360`	`39_084_369_920`
96 GiB	`80g`	`64_424_509_440`	`60_129_542_144`
300 GiB	`256g`	`206_158_430_208`	`192_414_534_860`
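For a mem_limit that is not in the table, the same values can be computed directly. A minimal sketch with a hypothetical helper name, using integer arithmetic so large byte counts are not subject to float rounding:

```python
GIB = 1024 ** 3


def memory_limit_bytes(mem_limit_gib, percent):
    """memory_limit_bytes as a whole percentage of Docker mem_limit."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return mem_limit_gib * GIB * percent // 100


primary_80g = memory_limit_bytes(80, 75)
standby_80g = memory_limit_bytes(80, 70)
```

The 80g row reproduces exactly (64_424_509_440 and 60_129_542_144). The table percentages are approximate, so a few other entries differ slightly from the computed values.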

jemalloc Tuning

JetGraph ships with jemalloc as its global allocator because glibc's default malloc fragments badly under high-churn graph writes. Recommended MALLOC_CONF in the container environment:

yaml

environment:
  MALLOC_CONF: "background_thread:true,dirty_decay_ms:1000,muzzy_decay_ms:10000,narenas:16"

⛔

Do not add metadata_thp:auto or percpu_arena:percpu — both inflate jemalloc's internal metadata by 1–2 GiB and obscure the engine's real RSS. The recommended string is the tested production default.

Watching Memory Metrics

Scrape GET /metrics with any Prometheus-compatible collector:

jetgraph_memory_pressure — 1 when RSS ≥ memory_limit_bytes, 0 otherwise. Alert on sustained 1.
jetgraph_rcu_retries_total — RCU retries on the compact neighbor store. A steady climb indicates hot-row write contention — consider sharding by a higher-cardinality source or moving the edge type to is_static.
jetgraph_inverse_lock_contentions_total — lock contention on the inverse index. Climbs when a destination node crosses 10 K in-neighbours and the adaptive index promotes to a BTreeSet.

Procedures Reference

All procedures are invoked via CALL procedure.name(args) YIELD col1, col2 RETURN ... over any connection (REST, Bolt, or gRPC).

Schema Procedures (`db.*`)

Procedure	Arguments	Yields	Description
`db.registerNodeType`	`name, id_kind`	`node_type_id`	Register a node type. `id_kind` is `"string"` or `"integer"`.
`db.registerEdgeType`	`{name, from_node_type, to_node_type, …}`	`edge_type_id`	Register an edge type. Optional fields include `bin_boundaries`, `tracked_property`, `activity_bitmap.tick_size_secs`, `node_histogram`, `minimal_payload`, `bool_property`, and `symmetric`.
`db.registerProperty`	`name, value_type`	`property_id`	Register a named property. Types: `"int"`, `"float"`, `"string"`, `"bool"`, `"timestamp"`.
`db.finalizeSchema`	—	`schema_version`	Lock the schema. Must be called before any data writes.
`db.schema`	—	`node_types, edge_types, properties`	Introspect the current schema definition.
`db.nodeStats`	—	`type, count`	O(1) node count per type — use instead of `MATCH (n:TYPE) RETURN count(n)`.
`db.memoryUsage`	—	`total_bytes, payload_bytes, edge_pair_count, breakdown, process_rss_bytes, …`	Full in-process memory model. `total_bytes` sums payload and structural terms (indirection, stacks, slack) and should align with `process_rss_bytes` on Linux; `payload_bytes` is the lower bound for schema data. High RSS at large scale reflects partitions × edge types × directions in the edge stores, so allocator settings alone will not remove it; reducing the footprint requires layout or partitioning changes.
`db.resetGraph`	—	`ok`	Wipe all data and schema. Only available when `ENABLE_ADMIN_RESET=true`.

Graph Procedures (`graph.*`)

Procedure	Description
`graph.ingest(nodes, edges)`	Batch upsert of nodes and edges in a single round-trip. Returns `ok, nodes_created, edges_created`.
`graph.upsertEdge(edge_type, src_typed_id, dst_typed_id)`	O(1) atomic edge upsert — increments tx_count and updates activity bitmap. Returns `created_new, tx_count, approx_sum`.
`graph.edgeState(src, dst, edge_type, windows?)`	Returns one edge pair's state: `tx_count`, `approx_sum`, `last_seen`, `bool_flag`, and `activity_counts`. The optional `windows` list is expressed in activity ticks.
`graph.lastNeighbor(node_typed_id, edge_type)`	Returns the most recently seen neighbor — useful for impossible-travel and last-location detection.
`graph.fraudContext(node_typed_id)`	Returns connected fraud nodes, max neighbor fraud score, and propagation depth.
`graph.flagNode(node_typed_id, score, reason)`	Mark a node as fraudulent and propagate the score to neighbors.
`graph.unflagNode(node_typed_id)`	Remove the fraud flag from a node.
`graph.histogram(node, edge_type, hours?, days?)`	Returns node-level histogram `buckets` and aggregated `counts`. Pass `hours` to use the hourly ring, or `null, days` to use the daily ring.
`graph.featureVector(node, edge_types, hours?, days?)`	Returns a compact flat vector of neighbour counts per edge type. For richer structured ML features, combine `graph.histogram`, `graph.edgeState`, or use the REST `/features/vector` endpoint.
`graph.findSimilar(node_typed_id, k)`	Jaccard-based k-nearest-neighbor lookup using the `SIMILAR_TO` edge index.
`graph.buildSimilarityGraph(edge_type, k)`	Batch-compute similarity edges for all nodes of a given edge type.
`graph.deleteNode(node_typed_id)`	Delete a node and detach all its relationships.
`graph.clearEdgeTypeData(edge_type)`	Remove all edges of a given type without touching nodes.
`graph.saveSnapshot()`	Trigger a full snapshot immediately. Returns `ok, path`.

Edge Type Variants

When registering edge types, the storage variant is determined by registration fields. Choose the smallest payload that still answers your feature queries:

Variant	Trigger	Physical payload	Features	When to use
Full / numeric	`bin_boundaries` present	`CompactEdgePayload` (~36 B payload)	`tx_count`, `approx_sum`, `last_seen`, 21-tick activity bitmap, 8 numeric bins, optional bool flag	Transaction edges where you query velocity, amounts, and amount buckets.
Slim	No `bin_boundaries`, not minimal	`SlimEdgePayload` (~16 B payload)	`tx_count`, `last_seen`, 21-tick activity bitmap, optional bool flag. No amount bins or `approx_sum`.	High-cardinality event edges where you need counts/velocity but not amount histograms.
Static	`minimal_payload: true`	`StaticEdgePayload` (~8 B payload)	`value` + `last_seen` only	Structural or derived-score edges such as `SIMILAR_TO`.

cypher — register a static edge type

// Static edges use an ~8 B payload — ideal for derived scores and similarity links
CALL db.registerEdgeType({
  name:            "SIMILAR_TO",
  from_node_type:  "CARD",
  to_node_type:    "CARD",
  minimal_payload: true,
  symmetric:       true
}) YIELD edge_type_id
RETURN edge_type_id

Metrics

JetGraph exposes a Prometheus-compatible metrics endpoint at GET /metrics on port 8080. Scrape it with any Prometheus-compatible collector or read it directly:

bash

curl http://localhost:8080/metrics

Metric	Description
`jetgraph_ingest_total`	Total ingest transactions processed
`jetgraph_cypher_queries_total`	Total Cypher queries executed (HTTP + Bolt + gRPC)
`jetgraph_grpc_requests_total`	Total gRPC calls by method
`jetgraph_rcu_retries_total`	RCU (read-copy-update) retries on the compact neighbor store — high values indicate write contention
`jetgraph_memory_pressure`	1 when RSS exceeds `memory_limit_bytes`, 0 otherwise
`jetgraph_inverse_lock_contentions_total`	Lock contention on the inverse neighbor index
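For ad-hoc checks without a Prometheus server, the scrape output can be parsed by hand. This is a minimal sketch that assumes the endpoint emits the standard Prometheus text exposition format; `parse_metrics` is a hypothetical helper and the sample values are made up.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition into {metric_name_with_labels: float}.

    Minimal parser: skips comments and blank lines and ignores optional
    trailing timestamps. It does not handle spaces inside label values.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            metrics[parts[0]] = float(parts[1])
    return metrics


# Made-up scrape output for illustration; real values will differ.
sample = """\
# TYPE jetgraph_memory_pressure gauge
jetgraph_memory_pressure 0
# TYPE jetgraph_rcu_retries_total counter
jetgraph_rcu_retries_total 17
"""
metrics = parse_metrics(sample)
under_pressure = metrics["jetgraph_memory_pressure"] == 1.0
```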

Segment Evaluator

The Segment Evaluator is a sidecar service that sits in front of the graph engine's ingest path. It evaluates configurable segment rules in real time on every transaction — automatically assigning entities to segments like "High Velocity", "New Merchant Risk", or "Fraud Ring Adjacent" based on graph signals.

ℹ️

The Segment Evaluator runs as a separate container (segment-evaluator) in the stack. It connects to the graph engine over gRPC and exposes its own HTTP API on port 8081.

How It Works

You define signals (individual graph metrics — velocity, novelty, fraud proximity) and segments (named groups with threshold rules over signals)
On every POST /ingest/evaluate call, the evaluator ingests the transaction into the graph engine and re-evaluates which segments the entity belongs to
Segment membership is stored as MEMBER_OF static edges to a Segment node — queryable via standard Cypher
Segments persist in a sled key-value store at /data/seg-config and survive restarts
Configuration can be managed via the API or by editing TOML files (seeded on first boot)

Key API Endpoints (port 8081)

Method	Path	Description
`POST`	`/ingest/evaluate`	Ingest a transaction and re-evaluate segments for the entities involved
`GET`	`/segments`	List all defined segments
`GET`	`/segments/:name/members`	List all current members of a segment
`POST`	`/segments/simulate`	Dry-run: evaluate signals for an entity without writing segment membership
`POST`	`/segments/sweep`	Re-evaluate all entities against all segments (useful after rule changes)
`GET`	`/segments/signals`	List all configured signals
`POST`	`/segments/node/lookup`	Look up which segments a specific entity belongs to
`GET/POST/DELETE`	`/config/signals`	Manage signal definitions
`GET/POST/DELETE`	`/config/segments`	Manage segment definitions
`POST`	`/segments/config/reload`	Hot-reload config from TOML files without restarting
`GET`	`/health`	Health check
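
Segment names such as "High Velocity" contain spaces, so they need percent-encoding when used in the `/segments/:name/members` path. The sketch below builds the request with Python's standard library without sending it. The `members_request` helper is hypothetical, and the base URL assumes port 8081 is reachable from where the code runs.

```python
from urllib.parse import quote
from urllib.request import Request

BASE = "http://localhost:8081"  # assumption: port 8081 is reachable from this host

def members_request(segment_name, base=BASE):
    """Build (but do not send) a GET request for a segment's current members."""
    # Percent-encode the name so spaces and slashes are safe inside the path.
    path = "/segments/" + quote(segment_name, safe="") + "/members"
    return Request(base + path, method="GET")

req = members_request("High Velocity")
print(req.get_method(), req.full_url)
# GET http://localhost:8081/segments/High%20Velocity/members
```

The request can be sent with `urllib.request.urlopen(req)`. The response format is not described in this section, so the sketch does not parse it.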

Querying Segments via Cypher

Once segments are assigned, you can query them like any other graph relationship:

cypher

// Find all cards in the "High Velocity" segment
MATCH (c:CARD)-[:MEMBER_OF]->(s:Segment)
WHERE s.external_id = "High Velocity"
RETURN c.external_id AS card
LIMIT 100

Pattern Miner

The Pattern Miner is a sidecar service that watches the graph engine's edge stream in real time and builds behavioral transition patterns — sequences of entities a node visited over time. These patterns power next-location prediction, impossible-travel detection, and anomaly scoring.

ℹ️

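The Pattern Miner runs as a separate container (pattern-miner) and exposes its API on port 8082. It subscribes to the graph engine's WatchEdgeUpserts CDC stream over gRPC. The Quick Start docker-compose.yml does not publish this port, so add a "8082:8082" port mapping to the pattern-miner service if you want to call the API from the host.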

How It Works

You define rules that watch a specific edge type (e.g., USES_IP)
For each new edge event, the miner calls graph.lastNeighbor to find the previous entity of that type, then writes a transition edge (e.g., NEXT_IP) capturing the sequence
Transition contexts accumulate in an in-memory PatternStore and are persisted in a sled store at /data/pm-config
The /patterns/predict/:node_id endpoint returns the most probable next entity based on historical transitions
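
The steps above can be pictured with a toy in-memory model. It is a sketch of the general idea (remember the last neighbor, count transitions, predict the most frequent successor), not the miner's actual PatternStore. The `TransitionSketch` class and its methods are hypothetical, and skipping repeated visits to the same neighbor is an assumption made for this example.

```python
from collections import Counter, defaultdict

class TransitionSketch:
    """Toy stand-in for transition tracking; not the real PatternStore."""

    def __init__(self):
        self.last_seen = {}                 # node -> most recent neighbor
        self.counts = defaultdict(Counter)  # previous neighbor -> Counter of next neighbors

    def observe(self, node, neighbor):
        """Handle one watched-edge event, e.g. a CARD using an IP."""
        prev = self.last_seen.get(node)     # plays the role of graph.lastNeighbor
        if prev is not None and prev != neighbor:
            # Analogous to writing a transition edge such as NEXT_IP.
            self.counts[prev][neighbor] += 1
        self.last_seen[node] = neighbor

    def predict(self, node):
        """Return the most frequent successor of the node's current neighbor, or None."""
        current = self.last_seen.get(node)
        if current is None or not self.counts[current]:
            return None
        return self.counts[current].most_common(1)[0][0]

store = TransitionSketch()
for ip in ["ip-A", "ip-B", "ip-A", "ip-B"]:
    store.observe("card-001", ip)
store.observe("card-002", "ip-A")

print(store.predict("card-002"))  # ip-B
```

Here transitions are pooled across all nodes, so `card-002` benefits from the history of `card-001`. Whether the real miner pools or separates histories per node is not stated in this section.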

Key API Endpoints (port 8082)

Method	Path	Description
`GET`	`/patterns/transitions`	List all tracked transition pairs across all rules
`GET`	`/patterns/predict/:node_id`	Predict the most likely next entity for a given node based on historical transitions
`GET`	`/patterns/path/:node_id`	Return the full transition path (sequence of entities) for a node
`GET`	`/patterns/context/:m1/:m2`	Return the transition context between two specific entities
`GET/POST`	`/config/rules`	List or create pattern rules (which edge type to watch)
`DELETE`	`/config/rules/:watch_edge`	Remove a rule
`GET/POST`	`/config/settings`	Manage global miner settings
`POST`	`/patterns/config/reload`	Hot-reload rules from TOML without restarting
`GET`	`/health`	Health check

Impossible-Travel Detection Example

bash

# Get the transition path for a card (sequence of IPs it has used)
curl http://localhost:8082/patterns/path/CARD:card-001

# Predict the next likely IP for this card
curl http://localhost:8082/patterns/predict/CARD:card-001
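
Once you have a transition path, an impossible-travel check reduces to comparing the implied travel speed between consecutive hops against a plausible maximum. The sketch below assumes you have already resolved each hop to a timestamp and approximate coordinates by some means not covered here. The `(epoch_seconds, lat, lon)` tuple shape, the `impossible_hops` helper, and the 900 km/h threshold are all illustrative assumptions, not part of the Pattern Miner API.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def impossible_hops(hops, max_kmh=900.0):
    """Return indices i where travelling from hop i-1 to hop i exceeds max_kmh.

    hops: list of (epoch_seconds, lat, lon) tuples in time order (hypothetical shape).
    """
    flagged = []
    for i in range(1, len(hops)):
        t0, lat0, lon0 = hops[i - 1]
        t1, lat1, lon1 = hops[i]
        dist_km = haversine_km(lat0, lon0, lat1, lon1)
        hours = (t1 - t0) / 3600.0
        # Any movement in zero (or negative) elapsed time counts as impossible.
        too_fast = dist_km > 0 if hours <= 0 else dist_km / hours > max_kmh
        if too_fast:
            flagged.append(i)
    return flagged

hops = [
    (0,    51.5074,  -0.1278),  # London
    (1800, 40.7128, -74.0060),  # New York, 30 minutes later
    (3600, 40.7128, -74.0060),  # New York again, 30 minutes later
]
print(impossible_hops(hops))  # [1]
```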

Code Examples Reference

Copy-paste-ready snippets for common operations.

REST API — Full Workflow

bash — full REST workflow

BASE=http://localhost:8080

### 1. Register schema
curl -sS -X POST $BASE/cypher -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerNodeType(\"CARD\",\"string\") YIELD node_type_id RETURN node_type_id","parameters":{}}'

curl -sS -X POST $BASE/cypher -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerNodeType(\"MERCHANT\",\"string\") YIELD node_type_id RETURN node_type_id","parameters":{}}'

curl -sS -X POST $BASE/cypher -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerEdgeType({name:\"TRANSACTS_AT\",from_node_type:\"CARD\",to_node_type:\"MERCHANT\"}) YIELD edge_type_id RETURN edge_type_id","parameters":{}}'

curl -sS -X POST $BASE/cypher -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.finalizeSchema() YIELD schema_version RETURN schema_version","parameters":{}}'

### 2. Create nodes
curl -sS -X POST $BASE/cypher -H 'Content-Type: application/json' \
  -d '{"query":"CREATE (c:CARD {external_id:$id}) RETURN c.external_id","parameters":{"id":"card-001"}}'

curl -sS -X POST $BASE/cypher -H 'Content-Type: application/json' \
  -d '{"query":"CREATE (m:MERCHANT {external_id:$id}) RETURN m.external_id","parameters":{"id":"merchant-42"}}'

### 3. Create relationship
curl -sS -X POST $BASE/cypher -H 'Content-Type: application/json' \
  -d '{"query":"MATCH (c:CARD {external_id:$cid}),(m:MERCHANT {external_id:$mid}) CREATE (c)-[:TRANSACTS_AT {amount:$amt}]->(m) RETURN true","parameters":{"cid":"card-001","mid":"merchant-42","amt":49.99}}'

### 4. Query the graph
curl -sS -X POST $BASE/cypher -H 'Content-Type: application/json' \
  -d '{"query":"MATCH (c:CARD {external_id:$id})-[:TRANSACTS_AT]->(m:MERCHANT) RETURN m.external_id AS merchant","parameters":{"id":"card-001"}}'
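
Every call in the workflow above posts the same JSON envelope (`query` plus `parameters`) to `/cypher`. For scripting, that can be wrapped in a small helper. The sketch below uses only Python's standard library and builds the request without sending it. The `cypher_request` name is hypothetical, not part of any JetGraph client.

```python
import json
from urllib.request import Request

BASE = "http://localhost:8080"

def cypher_request(query, parameters=None, base=BASE):
    """Build (but do not send) a POST /cypher request with a JSON body."""
    body = json.dumps({"query": query, "parameters": parameters or {}}).encode("utf-8")
    return Request(
        base + "/cypher",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = cypher_request(
    "MATCH (c:CARD {external_id:$id})-[:TRANSACTS_AT]->(m:MERCHANT) "
    "RETURN m.external_id AS merchant",
    {"id": "card-001"},
)
print(req.get_method(), req.full_url)  # POST http://localhost:8080/cypher
```

Passing values through `parameters` rather than formatting them into the query string avoids the shell-quoting problems that the curl version has to work around.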

Bolt — Python

python

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

def get_merchants_for_card(tx, card_id):
    result = tx.run(
        "MATCH (c:CARD {external_id: $id})-[:TRANSACTS_AT]->(m:MERCHANT) RETURN m.external_id AS merchant",
        id=card_id
    )
    return [r["merchant"] for r in result]

with driver.session() as session:
    merchants = session.execute_read(get_merchants_for_card, "card-001")
    print(merchants)

driver.close()

Rust Client — Complete Scoring Loop

rust

use jetgraph_client::{
    GraphClient, CreateEdgeRequest, VelocityQuery, FraudContextQuery, FlagRequest, prop,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
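    // Connect to the graph engine's gRPC endpoint.
    let graph = GraphClient::connect("http://localhost:50051").await?;

    // Resolve external IDs to internal node IDs.
    let card_id     = graph.lookup_node("CARD",     "card-001").await?;
    let merchant_id = graph.lookup_node("MERCHANT", "merchant-42").await?;

    // Signal 1: number of TRANSACTS_AT edges for this card in the last hour.
    let txn_1h = graph.get_velocity_count(VelocityQuery {
        node: card_id, edge_type: "TRANSACTS_AT".into(), window_secs: 3600,
    }).await?.count;

    // Signal 2: is this the card's first transaction at this merchant?
    let is_new = !graph.edge_exists(card_id, merchant_id, "TRANSACTS_AT").await?;
    // Signal 3: fraud context of the card's neighborhood.
    let ctx    = graph.get_fraud_context(FraudContextQuery { node: card_id }).await?;

    // Combine the signals into a simple additive risk score.
    let mut risk: f32 = 0.0;
    if is_new    { risk += 0.15; }
    if txn_1h > 30 { risk += 0.25; }
    risk += 0.5 * ctx.max_neighbor_fraud_score;

    // Record the transaction edge with the computed risk attached.
    graph.create_edge(CreateEdgeRequest {
        edge_type_name: "TRANSACTS_AT".into(),
        src: card_id, dst: merchant_id,
        properties: vec![prop("risk", risk)],
    }).await?;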

    println!("Risk score: {:.2}", risk);
    Ok(())
}
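
To see how the score in the loop above is composed, here is the same arithmetic as a pure function with one worked input. The weights (0.15, 0.25, 0.5) and the threshold of 30 come from the Rust example. The `risk_score` function and the sample inputs are illustrative only.

```python
def risk_score(is_new, txn_1h, max_neighbor_fraud_score):
    """Additive risk score using the same weights as the Rust scoring loop."""
    risk = 0.0
    if is_new:
        risk += 0.15   # first time this card transacts at this merchant
    if txn_1h > 30:
        risk += 0.25   # more than 30 transactions in the last hour
    risk += 0.5 * max_neighbor_fraud_score
    return risk

# New merchant, 45 transactions in the last hour, riskiest neighbor scored 0.6:
# 0.15 + 0.25 + 0.5 * 0.6 = 0.70
print(f"Risk score: {risk_score(True, 45, 0.6):.2f}")  # Risk score: 0.70
```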
