Developer Documentation

JetGraph Documentation

Everything you need to connect, query, and build with JetGraph — the high-performance in-memory graph engine built in Rust.

⚡ Quick Start in 5 Minutes

Get JetGraph running locally and execute your first graph query — no build step, no configuration.

  1. Start JetGraph with Docker Compose

    Save the following as docker-compose.yml and run docker compose up -d.

docker-compose.yml
services:
  graphengine:
    image: alhascan/jetgraph-demo:latest
    restart: unless-stopped
    ports:
      - "8080:8080"
      - "7687:7687"
    volumes:
      - graphengine-data:/data
    environment:
      RUST_LOG: info
      ENABLE_ADMIN_RESET: "true"
    healthcheck:
      test: ["CMD", "sh", "-c", "curl -sf http://localhost:8080/health || curl -sf http://localhost:8080/api/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 120s

  segment-evaluator:
    image: alhascan/jetgraph-demo:latest
    entrypoint: ["/usr/local/bin/segment-evaluator"]
    restart: unless-stopped
    environment:
      RUST_LOG: info
      GRAPH_ENGINE_ENDPOINT: http://graphengine:50051
      SEGMENT_EVALUATOR_CONFIG_DB: /data/seg-config
    volumes:
      - seg-config-data:/data/seg-config
    depends_on:
      graphengine:
        condition: service_healthy

  pattern-miner:
    image: alhascan/jetgraph-demo:latest
    entrypoint: ["/usr/local/bin/pattern-miner"]
    restart: unless-stopped
    environment:
      RUST_LOG: info
      GRAPH_ENGINE_ENDPOINT: http://graphengine:50051
      PATTERN_MINER_CONFIG_DB: /data/pm-config
      PATTERN_MINER_ADDR: 0.0.0.0:8082
    volumes:
      - pm-config-data:/data/pm-config
    depends_on:
      graphengine:
        condition: service_healthy

  graphengine-ui:
    image: alhascan/jetgraph-ui-demo:latest
    restart: unless-stopped
    ports:
      - "80:3000"
    environment:
      BOLT_URL: bolt://graphengine:7687
      NEO4J_USER: ""
      NEO4J_PASSWORD: ""
      GRAPH_HTTP_URL: http://graphengine:8080
      SEGMENT_API_URL: http://segment-evaluator:8081
      PATTERN_MINER_URL: http://pattern-miner:8082
    depends_on:
      graphengine:
        condition: service_healthy

volumes:
  graphengine-data:
  seg-config-data:
  pm-config-data:

  2. Verify the engine is ready

    Wait about 5 seconds for the container to start, then check the health endpoint.

bash
curl http://localhost:8080/health
# → {"status":"ok","ready":true}

  3. Load sample data with one click (optional)

    Open the Admin UI at http://localhost, click Schema in the left nav, then click ⚡ Apply Schema & Load Sample Data in the Quick Start — Credit Card Fraud Space card. This provisions the Credit Card Fraud schema and seeds a representative dataset so every query in Analytics returns results. When the green All done banner appears, explore the data in Graph Explorer, Cypher Editor, or the Analytics pages. Prefer to bring your own schema? Skip this step and register it manually in Step 4 below.

  4. Or register a schema and write your first node

    Skip this step if you loaded the sample dataset in Step 3. Otherwise, the schema must be declared once before any data can be written. Run these three calls in order.

bash — POST /cypher
# Step 1: register a node type
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerNodeType(\"USER\", \"string\") YIELD node_type_id RETURN node_type_id","parameters":{}}'

# Step 2: finalize the schema (required before any writes)
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.finalizeSchema() YIELD schema_version RETURN schema_version","parameters":{}}'

# Step 3: create a node
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CREATE (u:USER {external_id: $id}) RETURN u.external_id AS created","parameters":{"id":"user-001"}}'
# → {"columns":["created"],"rows":[["user-001"]]}

The Admin UI at http://localhost gives you a Cypher editor, schema designer, and graph explorer — great for exploration without writing code.

Introduction

What is JetGraph?

JetGraph is a purpose-built, in-memory graph engine designed for applications that need real-time graph queries and decisions at high throughput. It stores your graph entirely in memory for sub-millisecond access, supports the Cypher query language, speaks the Bolt wire protocol (compatible with all official Neo4j drivers), and exposes a simple HTTP/Cypher API for any language.

JetGraph is not a general-purpose persistent database — it is purpose-built for high-velocity workloads where you need graph signals in real time: fraud detection, recommendation engines, anomaly detection, network security, and more.

Key Features

Sub-millisecond Queries

Entirely in-memory; O(1) velocity lookups via pre-computed rings.

🔤

Cypher Query Language

The industry-standard graph query language — expressive and readable.

🔌

Bolt Protocol

Drop-in replacement for Neo4j drivers — no driver changes required.

🦀

Rust Performance

Built in Rust for predictable, low-latency performance under load.

📡

Streaming Ingestion

Ingest up to 35,000 events/sec from Kafka, webhooks, or direct API.

🔗

Risk Propagation

Automatic fraud contagion across the graph with DashMap-based O(1) reads.

When to Use a Graph Database

Graph databases shine when the relationships between entities are as important as the entities themselves. The table below compares a relational database with JetGraph for common scenarios:

| Scenario | Relational DB | JetGraph |
| --- | --- | --- |
| Simple row lookups by primary key | Ideal | Overhead |
| Multi-hop relationship traversal | Expensive JOINs | Native |
| Real-time velocity counting | Aggregate queries | O(1) pre-computed |
| Pattern detection across a network | Very complex | Cypher traversal |
| Durable, large-scale persistence | Ideal | Use upstream store |

Running JetGraph

Prerequisites

Exposed Ports

| Port | Protocol | Purpose |
| --- | --- | --- |
| 8080 | HTTP | Cypher REST API (POST /cypher), health, metrics |
| 7687 | TCP / Bolt | Bolt binary protocol, Neo4j driver compatible |
| 50051 | TCP / gRPC | High-throughput Rust client (jetgraph-client crate), streaming ingestion |
| 80 | HTTP | Admin UI (graphengine-ui container) |

Health Check

bash
curl http://localhost:8080/health
# {"status":"ok","ready":true}
ℹ️
The engine starts with an empty in-memory graph. Schema must be registered and finalized before any data can be written. The schema persists to the mounted volume so it survives container restarts.

Connection — REST / Cypher API

The simplest way to interact with JetGraph from any language. Send a JSON body with a query (Cypher string) and optional parameters to POST /cypher.

Base URL

http://localhost:8080

Request Format

| Field | Type | Description |
| --- | --- | --- |
| query | string | A Cypher query string |
| parameters | object | Named parameters referenced as $name in the query |

Response Format

Successful responses return a JSON object with columns and rows:

{"columns": ["id", "label"], "rows": [["user-001", "USER"]]}
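As a minimal sketch of the request and response shapes above, the following Python builds the JSON body for POST /cypher and turns a columns/rows response into a list of dictionaries. The helper names (build_cypher_body, rows_as_dicts, run_cypher) are illustrative and not part of any JetGraph SDK; only the endpoint and the field names come from this page.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # demo default from this page


def build_cypher_body(query, parameters=None):
    """Serialize the documented request format: a query plus optional parameters."""
    return json.dumps({"query": query, "parameters": parameters or {}}).encode("utf-8")


def rows_as_dicts(response):
    """Zip each row with the column names so results can be read by key."""
    columns = response["columns"]
    return [dict(zip(columns, row)) for row in response["rows"]]


def run_cypher(query, parameters=None, base_url=BASE_URL):
    """POST the query to /cypher and return the decoded rows (needs a running engine)."""
    request = urllib.request.Request(
        base_url + "/cypher",
        data=build_cypher_body(query, parameters),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return rows_as_dicts(json.load(reply))
```

With an engine running, run_cypher("MATCH (u:USER) RETURN u.external_id AS id LIMIT 10") would return a list such as [{"id": "user-001"}]; the two pure helpers can be used without a server.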

Example Requests

bash — create a node
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "CREATE (u:USER {external_id: $id, email: $email}) RETURN u.external_id AS id",
    "parameters": {"id": "user-001", "email": "alice@example.com"}
  }'
# → {"columns":["id"],"rows":[["user-001"]]}

bash — match nodes
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "MATCH (u:USER) RETURN u.external_id AS id LIMIT 10",
    "parameters": {}
  }'

bash — create a relationship
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "MATCH (u:USER {external_id: $uid}), (m:MERCHANT {external_id: $mid}) CREATE (u)-[:TRANSACTS_AT]->(m) RETURN true AS created",
    "parameters": {"uid": "user-001", "mid": "merchant-42"}
  }'

Additional Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Returns JSON with ready=true; 503 while loading snapshot |
| GET | /metrics | Prometheus text metrics: ingest rate, query counters, memory pressure, RCU retries |

Error Responses

On error, JetGraph returns a non-2xx HTTP status with a JSON error body:

{"error": "Schema not finalized. Call db.finalizeSchema() before writing data."}

| HTTP Status | Meaning |
| --- | --- |
| 200 OK | Query executed successfully |
| 400 Bad Request | Malformed query or invalid parameters |
| 409 Conflict | Schema conflict or duplicate node type |
| 500 Internal Server Error | Unexpected engine error; check logs |
| 503 Service Unavailable | Engine still loading snapshot at startup; retry after a moment |
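
One way a client might act on these status codes is sketched below: 503 is retried after a pause, 400 and 409 are treated as errors the caller must fix, and everything else is left for log inspection. The function names, the attempt count, and the delay are assumptions made for illustration; send stands for any callable that performs the HTTP request and returns (status, body).

```python
import time

RETRYABLE = {503}           # engine still loading its snapshot
CLIENT_ERRORS = {400, 409}  # fix the query or schema; retrying will not help


def classify(status):
    """Map the documented HTTP statuses to a coarse action."""
    if status == 200:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    if status in CLIENT_ERRORS:
        return "fix_request"
    return "check_logs"  # 500 and anything undocumented


def call_with_retry(send, attempts=5, delay_secs=1.0, sleep=time.sleep):
    """Call send() until the status is no longer retryable or attempts run out.

    send is any zero-argument callable returning (status, body).
    """
    for attempt in range(attempts):
        status, body = send()
        if classify(status) != "retry" or attempt == attempts - 1:
            return status, body
        sleep(delay_secs)
```

Passing sleep as a parameter keeps the helper easy to exercise in tests without real delays.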

Connection — Bolt Protocol

JetGraph speaks the Bolt binary protocol on port 7687, the same protocol used by Neo4j. This means any official or community Neo4j driver works out of the box, with no code changes and no new SDK to learn.

What is Bolt?

Bolt is a binary, connection-oriented protocol optimized for graph databases. It supports efficient serialisation of Cypher queries and results, pipelining, and authentication. Because JetGraph is Bolt-compatible, you can use drivers for Python, JavaScript, Java, Go, .NET, and more without modification.

Connection Details

| Parameter | Value (demo mode) |
| --- | --- |
| URL | bolt://localhost:7687 |
| Username | "" (empty) |
| Password | "" (empty) |

Python Example

python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

with driver.session() as session:
    # Create a node
    session.run(
        "CREATE (u:USER {external_id: $id})",
        id="user-bolt-001"
    )

    # Query nodes
    result = session.run("MATCH (u:USER) RETURN u.external_id AS id LIMIT 5")
    for record in result:
        print(record["id"])

driver.close()

JavaScript (Node.js) Example

javascript
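const neo4j = require('neo4j-driver');

// await is not allowed at the top level of a CommonJS module,
// so the calls are wrapped in an async function.
async function main() {
  const driver = neo4j.driver(
    'bolt://localhost:7687',
    neo4j.auth.basic('', '')
  );
  const session = driver.session();
  try {
    const result = await session.run(
      'MATCH (u:USER) RETURN u.external_id AS id LIMIT 10'
    );
    result.records.forEach(r => console.log(r.get('id')));
  } finally {
    // Release the session and driver even if the query fails
    await session.close();
    await driver.close();
  }
}

main().catch(console.error);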

Connection — Rust Client

The jetgraph-client crate provides a typed, ergonomic API over gRPC. It is the recommended client for Rust applications that need the highest throughput and the lowest latency.

🦀
The Rust client is open source — source, issues, and release notes live at github.com/JetGraphEngine/JetGraphClient. The full JetGraphEngine organization hosts all official repositories.

Installation

Add the crate to your Cargo.toml:

toml — Cargo.toml
[dependencies]
jetgraph-client = "*"
tokio = { version = "1", features = ["full"] }

Minimal Working Example

rust
use jetgraph_client::{GraphClient, VelocityQuery};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Connect to the engine
    let graph = GraphClient::connect("http://localhost:50051").await?;

    // Look up an existing node by external ID
    let card_id = graph.lookup_node("CARD", "card-001").await?;

    // Query velocity: how many TRANSACTS_AT edges in the last hour?
    let count = graph
        .get_velocity_count(VelocityQuery {
            node: card_id,
            edge_type: "TRANSACTS_AT".into(),
            window_secs: 3600,
        })
        .await?
        .count;

    println!("Transactions in last hour: {}", count);
    Ok(())
}

Recommended Usage Pattern

For scoring workloads, follow the three-phase Query → Score → Insert pattern on every event: query the graph for signals, score the event, then insert the edge.

💡
Always write the edge even on a declined event. The graph needs the full history to compute accurate velocity counts and risk propagation going forward.

Data Modeling in JetGraph

Core Concepts

JetGraph organizes data into two primitives: nodes and edges (relationships).

Each node and edge has a type (also called a label) and an optional set of properties (key-value pairs).

Graph Diagram — Fraud Detection Domain

[Diagram: fraud detection domain]
Node types (properties): :CARD (external_id), :MERCHANT (external_id, mcc), :DEVICE (fingerprint), :IP_ADDRESS (address, country), :ACCOUNT (external_id, risk)
Edge types: TRANSACTS_AT, USES_DEVICE, USES_IP, OWNED_BY

Schema Registration

Before any data can be written, you must declare your node types and edge types. This is done once at startup via the CALL db.* system procedures, then finalized with db.finalizeSchema().

bash — full schema setup
# Register node types (second arg is the ID type: "string" or "integer")
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerNodeType(\"CARD\", \"string\") YIELD node_type_id RETURN node_type_id","parameters":{}}'

curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerNodeType(\"MERCHANT\", \"string\") YIELD node_type_id RETURN node_type_id","parameters":{}}'

curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerNodeType(\"DEVICE\", \"string\") YIELD node_type_id RETURN node_type_id","parameters":{}}'

# Register an edge type
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerEdgeType({name:\"TRANSACTS_AT\",from_node_type:\"CARD\",to_node_type:\"MERCHANT\"}) YIELD edge_type_id RETURN edge_type_id","parameters":{}}'

curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerEdgeType({name:\"USES_DEVICE\",from_node_type:\"CARD\",to_node_type:\"DEVICE\"}) YIELD edge_type_id RETURN edge_type_id","parameters":{}}'

# Finalize: must be called after all types are registered
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.finalizeSchema() YIELD schema_version RETURN schema_version","parameters":{}}'

⚠️
Schema is immutable after finalization. Plan your node types and edge types carefully. In development you can wipe everything with CALL db.resetGraph() when ENABLE_ADMIN_RESET=true is set, or restart the container with a fresh volume.
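
Because the schema cannot change after finalization, it can help to generate the whole registration sequence from one declaration and review it before sending anything. The sketch below builds the request bodies in the required order (node types, edge types, then db.finalizeSchema()). The helper names are hypothetical; the procedure calls mirror the curl examples above, and the type names are interpolated only because they are trusted constants in your own code.

```python
import json


def register_node_type(name, id_type="string"):
    query = f'CALL db.registerNodeType("{name}", "{id_type}") YIELD node_type_id RETURN node_type_id'
    return {"query": query, "parameters": {}}


def register_edge_type(name, src, dst):
    query = (
        f'CALL db.registerEdgeType({{name:"{name}",from_node_type:"{src}",to_node_type:"{dst}"}}) '
        "YIELD edge_type_id RETURN edge_type_id"
    )
    return {"query": query, "parameters": {}}


def schema_plan(node_types, edge_types):
    """Return request bodies in the required order: nodes, edges, then finalize."""
    plan = [register_node_type(n) for n in node_types]
    plan += [register_edge_type(*e) for e in edge_types]
    plan.append({
        "query": "CALL db.finalizeSchema() YIELD schema_version RETURN schema_version",
        "parameters": {},
    })
    return plan


plan = schema_plan(
    ["CARD", "MERCHANT", "DEVICE"],
    [("TRANSACTS_AT", "CARD", "MERCHANT"), ("USES_DEVICE", "CARD", "DEVICE")],
)
for body in plan:
    print(json.dumps(body))  # each line is one POST /cypher body, in order
```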

Edge Types, Histograms & Activity Windows

Edge types define both the graph relationship (CARD → MERCHANT) and the feature storage kept for every edge pair. The storage layout is chosen once, when the edge type is registered. For ML/GNN use cases, the two most important read paths are: graph.histogram for node-level bucketed counts and graph.edgeState for edge-pair state.

| Concept | Scope | What it stores | Typical use |
| --- | --- | --- | --- |
| Compact edge payload | One (src, dst, edge_type) pair | tx_count, approx_sum, last_seen, activity bitmap, optional 8-bin amount histogram, optional bool flag | Edge features such as count, amount sum, recency, velocity. |
| Activity bitmap | One edge pair | 21 recent time ticks, 3 bits each. Each tick count saturates at 7. | Fast edge-level velocity windows: last 5 min, 10 min, 1 hour, etc. |
| Node histogram | One (node, edge_type) side | Two ring buffers: hourly slots and daily slots. Each slot has 8 amount/value buckets. | Node-level behaviour: amount distribution for a card over last 1h, 24h, 7d. |
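
To make the activity bitmap row concrete, here is a small Python model of 21 ticks at 3 bits each, packed into one integer, with each tick saturating at 7. Only those three numbers come from this page; the bit layout (current tick in the lowest bits) and the class itself are assumptions for illustration, not the engine's actual representation.

```python
TICKS = 21      # ticks kept per edge pair
BITS = 3        # bits per tick
MAX_COUNT = 7   # largest value 3 bits can hold; also the 3-bit mask
MASK = (1 << (TICKS * BITS)) - 1


class ActivityBitmap:
    """Tick 0 (the current tick) lives in the lowest 3 bits."""

    def __init__(self):
        self.bits = 0

    def record(self):
        """Count one event in the current tick, saturating at 7."""
        if (self.bits & MAX_COUNT) < MAX_COUNT:
            self.bits += 1

    def advance(self, ticks=1):
        """Move time forward; the oldest ticks fall off the end."""
        self.bits = (self.bits << (BITS * ticks)) & MASK

    def count(self, last_ticks):
        """Sum the counts of the most recent last_ticks ticks."""
        total = 0
        for i in range(min(last_ticks, TICKS)):
            total += (self.bits >> (BITS * i)) & MAX_COUNT
        return total
```

Ten events recorded in one tick read back as 7, which is why the bitmap suits short velocity windows rather than exact totals.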

Registering a transaction edge with full numeric features

To get the full compact payload (numeric bins + approximate sum), register the edge type with bin_boundaries. The seven boundaries define eight buckets. tracked_property names the numeric value from ingest that is binned and summed, usually "amount" for payments.

cypher — PAYMENT edge with amount bins, ticks, and node histograms
CALL db.registerEdgeType({
  name: "PAYMENT",
  from_node_type: "CARD",
  to_node_type: "MERCHANT",
  state_ttl_secs: 7776000,

  // Full CompactEdgePayload: 7 thresholds → 8 amount buckets
  bin_boundaries: [5, 25, 50, 100, 250, 500, 1000],
  tracked_property: "amount",

  // Edge-level velocity bitmap: 21 ticks × 5 minutes = 105 minutes max lookback
  activity_bitmap: { tick_size_secs: 300 },

  // Node-level rolling histograms. Each slot stores 8 bucket counts.
  node_histogram: {
    enabled_for_src: true,
    enabled_for_dst: false,
    hourly_slots: 24,
    daily_slots: 7
  }
})
YIELD edge_type_id
RETURN edge_type_id

| Registration field | Meaning |
| --- | --- |
| bin_boundaries | Seven numeric thresholds. They create eight buckets: <5, 5–25, 25–50, …, ≥1000. |
| tracked_property | The numeric input field that feeds approx_sum and the bucket counters. For payments this is usually amount. |
| activity_bitmap.tick_size_secs | The duration of one edge-level activity tick. With 300, [1, 2, 12] means last 5 min, 10 min, and 1 hour. |
| node_histogram.hourly_slots | How many hourly histogram slots to keep. 24 keeps 24 hours of hourly detail. |
| node_histogram.daily_slots | How many daily histogram slots to keep. 7 keeps 7 days of daily detail. |
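
The mapping from seven boundaries to eight buckets can be sketched with a binary search. This example assumes each boundary is the inclusive lower edge of the next bucket, which matches the "<5 ... ≥1000" wording above but is not confirmed for values that land exactly on a boundary; the function names are illustrative.

```python
from bisect import bisect_right

BIN_BOUNDARIES = [5, 25, 50, 100, 250, 500, 1000]  # seven thresholds from the example


def bucket_index(value, boundaries=BIN_BOUNDARIES):
    """Return the bucket (0..len(boundaries)) that value falls into.

    Assumes each boundary is the inclusive lower edge of the next bucket.
    """
    return bisect_right(boundaries, value)


def add_to_histogram(counts, value, boundaries=BIN_BOUNDARIES):
    """Increment the bucket counter for one tracked value."""
    counts[bucket_index(value, boundaries)] += 1
    return counts
```

For example, an amount of 49.99 lands in bucket 2 (25–50) and an amount of 1000 lands in bucket 7 (≥1000).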

Reading node histograms

graph.histogram returns aggregated bucket counts for one node and one edge type. It is node-level: for CARD → PAYMENT, it counts all PAYMENT edges from that card, not a single merchant edge.

cypher — last 24 hours and last 7 days
MATCH (c:card {external_id: "card-velocity-09"})

// Use the hourly ring and sum the most recent 24 hourly slots.
CALL graph.histogram(c, "PAYMENT", 24) YIELD buckets, counts AS hourly

// Use the daily ring and sum the most recent 7 daily slots.
CALL graph.histogram(c, "PAYMENT", null, 7) YIELD counts AS days

RETURN c.node_id, buckets, hourly, days

Example output counts = [0, 2, 12, 2, 0, 0, 0, 0] means: 0 events below 5, 2 events in 5–25, 12 events in 25–50, 2 events in 50–100, and none in the higher buckets.
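
The reading above can be automated by pairing each count with a label derived from the registered boundaries. This is a small client-side sketch; the helper names are made up for illustration, and the boundaries are the ones from the PAYMENT registration example.

```python
def bucket_labels(boundaries):
    """Build one label per bucket: '<first', 'lo–hi' pairs, then '≥last'."""
    labels = [f"<{boundaries[0]}"]
    labels += [f"{lo}–{hi}" for lo, hi in zip(boundaries, boundaries[1:])]
    labels.append(f"≥{boundaries[-1]}")
    return labels


def describe(counts, boundaries):
    """Pair each count with its bucket label, keeping only non-empty buckets."""
    return {label: n for label, n in zip(bucket_labels(boundaries), counts) if n}


boundaries = [5, 25, 50, 100, 250, 500, 1000]
print(describe([0, 2, 12, 2, 0, 0, 0, 0], boundaries))
# → {'5–25': 2, '25–50': 12, '50–100': 2}
```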

Reading edge state

graph.edgeState reads one edge pair. It is the edge-level complement to graph.histogram. Use it for per-edge features such as tx_count, approx_sum, last_seen, boolean flags, and activity windows.

cypher — specific card → merchant edge state
MATCH (c:card {external_id: "card-velocity-09"})
CALL graph.edgeState(
  c,
  "merchant:merchant-uk-10",
  "PAYMENT",
  [1, 2, 12]
)
YIELD tx_count, approx_sum, last_seen, bool_flag, activity_counts
RETURN c.node_id, tx_count, approx_sum, last_seen, bool_flag, activity_counts

If PAYMENT.activity_bitmap.tick_size_secs = 300, then [1, 2, 12] asks for counts over the last 5 minutes, 10 minutes, and 1 hour. The bitmap holds at most 21 ticks, so a 5-minute tick gives about 105 minutes of edge-level activity history. Longer windows should come from node histograms.
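
The window arithmetic above is simple enough to check in a few lines. The sketch below converts tick-count windows to seconds and reports the maximum lookback for a given tick size; the 21-tick capacity comes from this page, while the function names and the read-path suggestion helper are illustrative assumptions.

```python
MAX_TICKS = 21  # bitmap capacity stated on this page


def window_seconds(tick_size_secs, windows):
    """Convert tick-count windows such as [1, 2, 12] to seconds."""
    return [w * tick_size_secs for w in windows]


def max_lookback_secs(tick_size_secs):
    """Longest window the edge-level bitmap can cover."""
    return MAX_TICKS * tick_size_secs


def source_for(window_secs, tick_size_secs):
    """Suggest which read path can cover a window of the given length."""
    if window_secs <= max_lookback_secs(tick_size_secs):
        return "graph.edgeState"
    return "graph.histogram"


print(window_seconds(300, [1, 2, 12]))  # → [300, 600, 3600]
print(max_lookback_secs(300) // 60)     # → 105
```

Keep in mind that the two read paths differ in scope: graph.edgeState is per edge pair, while graph.histogram aggregates over all edges of that type from the node.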

💡
For embeddings: combine graph.histogram for node features (count distributions over 1h/24h/7d) with graph.edgeState for edge features (pair count, amount sum, recency, and short velocity windows).

Data Modeling Best Practices

Querying the Graph — Cypher

JetGraph uses Cypher, the standard graph query language originally developed for Neo4j and now governed by the openCypher specification. If you know SQL, Cypher will feel natural — it uses a similar declarative style but describes patterns in the graph rather than joins between tables.

Basic Patterns

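Cypher uses ASCII-art notation to express graph patterns: nodes are written in parentheses, such as (c:CARD), and relationships in square brackets between arrows, such as -[:TRANSACTS_AT]->.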

CREATE — Insert a Node

cypher
// Create a CARD node with properties
CREATE (c:CARD {external_id: "card-001", country: "US"})
RETURN c.external_id AS id

MATCH — Query Nodes

cypher
// Find a specific card
MATCH (c:CARD {external_id: "card-001"})
RETURN c.external_id AS id, c.country AS country

// Find all cards (with a limit; always paginate large result sets)
MATCH (c:CARD)
RETURN c.external_id AS id
LIMIT 100

CREATE — Insert a Relationship

cypher
// Connect a CARD to a MERCHANT
MATCH (c:CARD {external_id: "card-001"}),
      (m:MERCHANT {external_id: "merchant-42"})
CREATE (c)-[:TRANSACTS_AT {amount: 49.99, ts: 1712345678}]->(m)
RETURN true AS created

Traversal — Multi-hop Queries

cypher
// Find all merchants this card has visited
MATCH (c:CARD {external_id: "card-001"})-[:TRANSACTS_AT]->(m:MERCHANT)
RETURN m.external_id AS merchant

// Two-hop: other cards that share a device with this card
MATCH (c:CARD {external_id: "card-001"})-[:USES_DEVICE]->(d:DEVICE)
      <-[:USES_DEVICE]-(other:CARD)
WHERE other.external_id <> "card-001"
RETURN other.external_id AS related_card, d.fingerprint AS shared_device

Filtering with WHERE

cypher
MATCH (c:CARD)-[r:TRANSACTS_AT]->(m:MERCHANT)
WHERE r.amount > 500 AND m.country = "US"
RETURN c.external_id AS card, m.external_id AS merchant, r.amount
ORDER BY r.amount DESC
LIMIT 20

Parameterized Queries

Always use parameters (prefixed with $) instead of string interpolation to avoid injection and improve query plan reuse:

bash
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "MATCH (c:CARD {external_id: $card_id})-[:TRANSACTS_AT]->(m:MERCHANT) RETURN m.external_id AS merchant",
    "parameters": {"card_id": "card-001"}
  }'

Aggregations

cypher
// Count transactions per merchant (one matched row per TRANSACTS_AT edge)
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
RETURN m.external_id AS merchant, COUNT(c) AS tx_count
ORDER BY tx_count DESC
LIMIT 10

// Sum transaction amounts per card
MATCH (c:CARD)-[r:TRANSACTS_AT]->(m:MERCHANT)
RETURN c.external_id AS card, SUM(r.amount) AS total_spend

MERGE — Upsert Nodes and Relationships

MERGE matches an existing pattern or creates it if it does not exist. Use ON CREATE SET and ON MATCH SET to set properties conditionally:

cypher
// Upsert a node: create if absent, update timestamp if it exists
MERGE (c:CARD {external_id: $card_id})
ON CREATE SET c.created_at = $ts, c.country = $country
ON MATCH SET c.last_seen = $ts
RETURN c.external_id, c.created_at

// Upsert a relationship between two existing nodes
MATCH (c:CARD {external_id: $card_id}),
      (d:DEVICE {external_id: $device_id})
MERGE (c)-[:USES_DEVICE]->(d)
RETURN true AS linked

OPTIONAL MATCH

OPTIONAL MATCH works like a left outer join — if the pattern does not exist, the variables are bound to null rather than excluding the row:

cypher
// Return the card with its linked device ID; device is null if no device is linked
MATCH (c:CARD {external_id: $card_id})
OPTIONAL MATCH (c)-[:USES_DEVICE]->(d:DEVICE)
RETURN c.external_id AS card, d.external_id AS device

WITH — Pipeline and Filter Mid-Query

WITH passes results from one query stage to the next, allowing intermediate filtering, aggregation, and variable re-binding:

cypher
// Find cards with more than 5 distinct merchants, then get their devices
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WITH c, COUNT(DISTINCT m) AS merchant_count
WHERE merchant_count > 5
MATCH (c)-[:USES_DEVICE]->(d:DEVICE)
RETURN c.external_id AS card, merchant_count, d.external_id AS device
LIMIT 50

UNWIND — Expand a List

UNWIND turns a list into individual rows, which is useful for batch operations driven by a parameter array:

cypher
// Create multiple nodes from a list parameter in one round-trip
UNWIND $card_ids AS cid
CREATE (c:CARD {external_id: cid})
RETURN c.external_id AS created

// Parameters: {"card_ids": ["card-001", "card-002", "card-003"]}
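
When the ID list is large, a client will usually split it into batches and send one UNWIND request per batch. The sketch below shows one way to build those request bodies; the batch size of 500 is an arbitrary placeholder rather than a documented limit, and the helper names are illustrative.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


UNWIND_CREATE = (
    "UNWIND $card_ids AS cid "
    "CREATE (c:CARD {external_id: cid}) "
    "RETURN c.external_id AS created"
)


def unwind_bodies(card_ids, batch_size=500):
    """One POST /cypher body per batch; the query text is identical for all."""
    return [
        {"query": UNWIND_CREATE, "parameters": {"card_ids": batch}}
        for batch in chunked(card_ids, batch_size)
    ]
```

Because only the parameter array changes between batches, every request shares the same query string.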

Data Manipulation

Insert Data

Use CREATE to insert new nodes and relationships. Both the source and destination nodes must already exist before creating a relationship.

bash — create node
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CREATE (m:MERCHANT {external_id: $id, mcc: $mcc}) RETURN m.external_id","parameters":{"id":"merchant-42","mcc":"5411"}}'

Update Node Properties

Use SET to update or add properties on an existing node:

cypher
MATCH (c:CARD {external_id: "card-001"})
SET c.risk_score = 0.85, c.flagged = true
RETURN c.external_id, c.risk_score

Delete a Node

A node must have no relationships before it can be deleted. Use DETACH DELETE to remove both the node and all its relationships in one step:

cypher
// Delete node and all its relationships
MATCH (c:CARD {external_id: "card-001"})
DETACH DELETE c

Delete a Relationship

cypher
MATCH (c:CARD {external_id: "card-001"})-[r:TRANSACTS_AT]->(m:MERCHANT)
DELETE r

Bulk Inserts

For bulk data loading, fire multiple POST /cypher requests in parallel. Each request is independent and thread-safe. For maximum throughput from Rust, use the jetgraph-client crate which batches requests over a persistent gRPC connection.

ℹ️
JetGraph can ingest up to 35,000 events per second via the streaming ingestion pipeline. For high-volume onboarding, use the bulk import tooling rather than individual /cypher POST requests.
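
For the parallel POST /cypher approach described above, a thread pool is one straightforward option in Python. In this sketch, send is a placeholder for whatever function performs a single request (for example a thin wrapper around POST /cypher), and max_workers=8 is an arbitrary starting point, not a tuned or documented value.

```python
from concurrent.futures import ThreadPoolExecutor


def post_all(bodies, send, max_workers=8):
    """Send every request body concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, bodies))
```

pool.map preserves input order, so results[i] corresponds to bodies[i] even though requests complete out of order. An exception raised by send surfaces when the results are collected.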

Graph Analysis & Use Cases

Velocity Counting (O(1))

JetGraph pre-computes velocity counts using ring buffers, making time-window queries instant. Query them via the Rust client or via Cypher system procedures:

rust — velocity query
// Count TRANSACTS_AT edges from this card in the last 1 hour
let count = graph
    .get_velocity_count(VelocityQuery {
        node: card_id,
        edge_type: "TRANSACTS_AT".into(),
        window_secs: 3600, // 1 hour
    })
    .await?
    .count;

// 24-hour window
let daily_count = graph
    .get_velocity_count(VelocityQuery {
        node: card_id,
        edge_type: "TRANSACTS_AT".into(),
        window_secs: 86400,
    })
    .await?
    .count;
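
To illustrate why a ring buffer makes window counts cheap, here is a generic Python model: events are bucketed into a fixed number of time slots, so memory and read cost depend on the ring size rather than on how many events were recorded. This is a textbook sketch under assumed slot sizes, not JetGraph's internal structure, and counts are approximate at slot granularity.

```python
class VelocityRing:
    """Fixed number of time slots; cost does not grow with event count."""

    def __init__(self, slot_secs, slots):
        self.slot_secs = slot_secs
        self.counts = [0] * slots
        self.slot_ids = [None] * slots  # absolute slot number held by each cell

    def record(self, ts):
        """Count one event at unix-style timestamp ts (seconds)."""
        slot = ts // self.slot_secs
        i = slot % len(self.counts)
        if self.slot_ids[i] != slot:  # cell holds an expired slot: reuse it
            self.slot_ids[i] = slot
            self.counts[i] = 0
        self.counts[i] += 1

    def count(self, now, window_secs):
        """Events in the slots covering the last window_secs, capped by ring size."""
        newest = now // self.slot_secs
        span = min(window_secs // self.slot_secs, len(self.counts))
        total = 0
        for slot in range(newest - span + 1, newest + 1):
            i = slot % len(self.counts)
            if self.slot_ids[i] == slot:
                total += self.counts[i]
        return total
```

A ring with 60 one-minute slots answers any window up to an hour by reading at most 60 cells, regardless of traffic volume.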

Fraud Detection Pattern

Graph databases are uniquely effective for fraud detection because fraud rings are defined by connections. A card that shares a device with a flagged card is suspicious — even if the card itself has no prior fraud history.

rust — fraud scoring (full example)
// Transaction and Decision are application-defined types.
async fn score_transaction(
    graph: &GraphClient,
    tx: &Transaction,
) -> Result<Decision> {
    // Phase 1: collect signals
    let card_id = graph.lookup_node("CARD", &tx.card_id).await?;
    let merchant_id = graph.lookup_node("MERCHANT", &tx.merchant_id).await?;

    // Velocity: transactions in the last 1h
    let txn_1h = graph.get_velocity_count(VelocityQuery {
        node: card_id,
        edge_type: "TRANSACTS_AT".into(),
        window_secs: 3600,
    }).await?.count;

    // Novelty: is this merchant new for this card?
    let is_new_merchant = !graph
        .edge_exists(card_id, merchant_id, "TRANSACTS_AT").await?;

    // Contagion: max fraud score from 1-hop neighbours
    let ctx = graph.get_fraud_context(FraudContextQuery {
        node: card_id,
    }).await?;

    // Phase 2: score
    let mut risk: f32 = 0.0;
    if is_new_merchant { risk += 0.15; }
    if txn_1h > 30 { risk += 0.25; }
    risk += 0.5 * ctx.max_neighbor_fraud_score;

    let decision = match risk {
        r if r > 0.7 => Decision::Decline,
        r if r > 0.4 => Decision::Challenge,
        _ => Decision::Approve,
    };

    // Phase 3: insert edge (always, even on decline)
    graph.create_edge(CreateEdgeRequest {
        edge_type_name: "TRANSACTS_AT".into(),
        src: card_id,
        dst: merchant_id,
        properties: vec![prop("amount", tx.amount), prop("decision", &decision)],
    }).await?;

    // Propagate fraud score if declined
    if decision == Decision::Decline {
        graph.flag_node(FlagRequest {
            node: card_id,
            fraud_score: 0.85,
            reason: "auto_decline".into(),
        }).await?;
    }

    Ok(decision)
}
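
The scoring phase of the example above is pure arithmetic, so it can be isolated and unit-tested without a graph connection. The Python sketch below restates the same weights and thresholds; the function names are illustrative, and the numbers are the example's values, not recommended production settings.

```python
def risk_score(is_new_merchant, txn_1h, max_neighbor_fraud_score):
    """Combine the three signals with the weights used in the example."""
    risk = 0.0
    if is_new_merchant:
        risk += 0.15
    if txn_1h > 30:
        risk += 0.25
    risk += 0.5 * max_neighbor_fraud_score
    return risk


def decide(risk):
    """Apply the example's thresholds: above 0.7 decline, above 0.4 challenge."""
    if risk > 0.7:
        return "decline"
    if risk > 0.4:
        return "challenge"
    return "approve"
```

For instance, a new merchant, 31 transactions in the last hour, and a neighbour fraud score of 0.5 give 0.15 + 0.25 + 0.25 = 0.65, which falls in the challenge band.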

Ring Fraud Detection (Cypher)

Find cards that share a device with a known-fraudulent card — the classic "fraud ring" pattern:

cypher
// Cards that share a device with card-001 (two hops via a shared DEVICE)
MATCH (seed:CARD {external_id: "card-001"})
      -[:USES_DEVICE]->(d:DEVICE)
      <-[:USES_DEVICE]-(suspect:CARD)
WHERE suspect.external_id <> "card-001"
RETURN suspect.external_id AS card, d.fingerprint AS shared_device
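
The same shared-device traversal can be modelled in plain Python to see what the pattern returns on a small dataset. This is an in-memory illustration with made-up card and device IDs, useful for building test fixtures; it does not reflect how the engine executes the query.

```python
from collections import defaultdict


def cards_sharing_device(uses_device, seed):
    """uses_device is a list of (card, device) pairs.

    Returns {suspect_card: set of devices shared with seed}.
    """
    by_card, by_device = defaultdict(set), defaultdict(set)
    for card, device in uses_device:
        by_card[card].add(device)
        by_device[device].add(card)

    suspects = defaultdict(set)
    for device in by_card[seed]:         # hop 1: seed card -> its devices
        for other in by_device[device]:  # hop 2: device -> other cards
            if other != seed:
                suspects[other].add(device)
    return dict(suspects)


edges = [
    ("card-001", "dev-a"), ("card-002", "dev-a"),
    ("card-003", "dev-b"),
    ("card-001", "dev-c"), ("card-004", "dev-c"),
]
print(cards_sharing_device(edges, "card-001"))
```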

Recommendation System Pattern

Find merchants popular with other cards that share the same device as the current card — a graph-based collaborative filter:

cypher
MATCH (c:CARD {external_id: "card-001"})
      -[:USES_DEVICE]->(d:DEVICE)
      <-[:USES_DEVICE]-(peer:CARD)
      -[:TRANSACTS_AT]->(m:MERCHANT)
WHERE NOT (c)-[:TRANSACTS_AT]->(m)
RETURN m.external_id AS recommended_merchant, COUNT(peer) AS peer_count
ORDER BY peer_count DESC
LIMIT 5

Performance & Best Practices

Query Optimization Tips

Efficient Traversal Patterns

| Pattern | Recommended | Avoid |
| --- | --- | --- |
| Count events in time window | Velocity API (get_velocity_count) | COUNT with filter over edges |
| Check if relationship exists | edge_exists(src, dst, type) | Full MATCH + COUNT |
| Find connected neighbours | 1–2 hop MATCH with LIMIT | Unbounded variable-length paths |
| Get risk context | get_fraud_context(node_id) | Manually traversing and aggregating |

Common Mistakes to Avoid

Cypher Best Practices

These rules are derived from direct analysis of the JetGraph query planner logs. Each one maps to a specific engine optimisation — or the absence of one — that has measurable impact at scale. Follow them to keep every query O(1) or close to it.

1. Always label every node in MATCH

The pushdown optimizer converts a WHERE node.external_id = $value filter into an O(1) IndexLookup only when it knows the node type. Without a label the engine produces NodeScan { node_type: None } — a full scan across every node in the graph, repeated once per edge type registered in the schema.

cypher — ✗ Avoid
// No label on (s) → full graph scan × number of edge types
MATCH (s)-[r]->(d)
WHERE s.external_id = $id
RETURN s, r, d

cypher — ✓ Prefer
// Typed nodes → pushdown rewrites NodeScan to O(1) IndexLookup
MATCH (s:CARD)-[r:TRANSACTS_AT]->(d:MERCHANT)
WHERE s.external_id = $id
RETURN s.external_id, d.external_id

2. Always use parameters — never embed literal values

Literal values baked into the query string each create a unique plan cache key. Every new literal triggers a full parse + plan cycle regardless of how many times a similar query has been run before. Parameters collapse every variation of a query into a single cached plan that is reused for every card ID, merchant ID, device fingerprint, or IP address.

cypher — ✗ Avoid
// Each card number is a separate cache key, so the query is replanned on every call
MATCH (c:CARD {external_id: "9792487647826207"})-[:TRANSACTS_AT]->(m:MERCHANT)
RETURN m.external_id AS merchant

cypher — ✓ Prefer
// One plan, cached forever; the value is supplied at runtime via parameters
MATCH (c:CARD {external_id: $card_id})-[:TRANSACTS_AT]->(m:MERCHANT)
RETURN m.external_id AS merchant

💡
This applies equally to device fingerprints, IP addresses, customer IDs, and every other lookup value. Any hardcoded string in the query string is a separate cache entry.
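
The effect on the plan cache can be demonstrated with a toy cache keyed by the exact query text, which is how the behaviour is described above. The PlanCache class is a stand-in written for this example, not JetGraph's planner; it only counts how many distinct query strings would need planning.

```python
class PlanCache:
    """Toy cache keyed by the exact query text."""

    def __init__(self):
        self.plans = {}
        self.misses = 0

    def plan_for(self, query_text):
        if query_text not in self.plans:
            self.misses += 1  # stands in for a full parse + plan cycle
            self.plans[query_text] = object()
        return self.plans[query_text]


card_ids = ["card-001", "card-002", "card-003"]

# Literal values: every card ID produces a different query string.
literal_cache = PlanCache()
for cid in card_ids:
    literal_cache.plan_for(f'MATCH (c:CARD {{external_id: "{cid}"}}) RETURN c.external_id')

# Parameters: the query string never changes.
param_cache = PlanCache()
for cid in card_ids:
    param_cache.plan_for("MATCH (c:CARD {external_id: $card_id}) RETURN c.external_id")

print(literal_cache.misses, param_cache.misses)  # → 3 1
```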

3. Filter on external_id equality on the source node

The pushdown optimizer walks the plan tree and converts Filter(src.external_id = value) → NodeScan into an IndexLookup. This rewrite only fires when the filter is an equality on external_id and applies to the source variable of the expand. Filters on destination properties or on any other property remain as post-expansion filters (slower).

cypher — ✗ Avoid
// No anchor on the source: the engine scans all CARDs, then filters merchants
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WHERE m.name CONTAINS 'Airline'
RETURN c.external_id, m.name

cypher — ✓ Prefer
// Anchor on source external_id → pushdown fires → IndexLookup for the card
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WHERE c.external_id = $card_id AND m.name CONTAINS 'Airline'
RETURN c.external_id, m.name

// If you need to start from the merchant side, flip the traversal direction
MATCH (m:MERCHANT)<-[:TRANSACTS_AT]-(c:CARD)
WHERE m.external_id = $merchant_id
RETURN c.external_id

4. Always type every relationship

An untyped relationship -[r]-> causes the engine to fan out the expand across every edge type registered in the schema — one full expand per type. With six edge types registered, a single untyped MATCH compiles into six separate query plans. Always name the relationship type explicitly.

cypher — ✗ Avoid
// Untyped [r] → engine fans out over EdgeTypeId(0), (1), (2) … for every type
MATCH (c:CARD)-[r]->(n)
WHERE c.external_id = $card_id
RETURN type(r), n.external_id

cypher — ✓ Prefer
// Single explicit edge type → one targeted expand, one IndexLookup
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WHERE c.external_id = $card_id
RETURN m.external_id

// Need multiple edge types? Use UNION ALL; each branch is individually optimized
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WHERE c.external_id = $card_id
RETURN 'merchant' AS kind, m.external_id AS neighbor
UNION ALL
MATCH (c:CARD)-[:USES_DEVICE]->(d:DEVICE)
WHERE c.external_id = $card_id
RETURN 'device' AS kind, d.external_id AS neighbor
UNION ALL
MATCH (c:CARD)-[:USES_IP]->(ip:IP)
WHERE c.external_id = $card_id
RETURN 'ip' AS kind, ip.external_id AS neighbor

5. Canonical fraud context — the recommended multi-hop template

This is the recommended pattern for pulling the full risk context of a card in a single round-trip: typed nodes, typed relationships, parameterized, one plan cached for every card. Each UNION ALL branch is compiled and optimized independently.

cypher — ✓ Canonical fraud context query
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WHERE c.external_id = $card_id
RETURN 'merchant' AS kind, m.external_id AS id, m.name AS label
UNION ALL
MATCH (c:CARD)-[:USES_DEVICE]->(d:DEVICE)
WHERE c.external_id = $card_id
RETURN 'device' AS kind, d.external_id AS id, '' AS label
UNION ALL
MATCH (c:CARD)-[:USES_IP]->(ip:IP)
WHERE c.external_id = $card_id
RETURN 'ip' AS kind, ip.external_id AS id, '' AS label
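
If the set of branches varies between deployments, the UNION ALL text can be generated once at startup from a list of edge types and then reused for every card. The sketch below is one possible generator; the function name and tuple layout are assumptions, and the type names are interpolated only because they are trusted constants, while the card ID stays a parameter so the query text is identical for every call.

```python
def fraud_context_query(branches, anchor="CARD"):
    """Build a UNION ALL query; branches is a list of
    (kind, edge_type, node_type, label_expr) tuples of trusted constants."""
    parts = []
    for kind, edge_type, node_type, label_expr in branches:
        parts.append(
            f"MATCH (c:{anchor})-[:{edge_type}]->(n:{node_type}) "
            "WHERE c.external_id = $card_id "
            f"RETURN '{kind}' AS kind, n.external_id AS id, {label_expr} AS label"
        )
    return "\nUNION ALL\n".join(parts)


QUERY = fraud_context_query([
    ("merchant", "TRANSACTS_AT", "MERCHANT", "n.name"),
    ("device", "USES_DEVICE", "DEVICE", "''"),
    ("ip", "USES_IP", "IP", "''"),
])
print(QUERY)
```

Build QUERY once and send it with {"card_id": ...} as the only per-request value.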

6. Anchor ring / co-occurrence queries on the known node

Shared-entity ring queries (cards sharing a device or IP) must start from a known, typed anchor. An unanchored ring causes a full scan of all devices before expanding to cards. Start from the entity you know.

cypher — ✗ Avoid
// No anchor → scans all DEVICE nodes, expands to all cards, then filters
MATCH (d:DEVICE)<-[:USES_DEVICE]-(c:CARD)
WHERE c.country = 'US'
RETURN d.external_id, c.external_id
cypher — ✓ Prefer
// Start from a known card → IndexLookup → expand to its devices → expand to sibling cards
MATCH (c1:CARD)-[:USES_DEVICE]->(d:DEVICE)<-[:USES_DEVICE]-(c2:CARD)
WHERE c1.external_id = $card_id AND c2.external_id <> $card_id
RETURN DISTINCT c2.external_id AS linked_card, d.external_id AS shared_device
LIMIT 50

// Or start from a known device
MATCH (d:DEVICE)<-[:USES_DEVICE]-(c:CARD)
WHERE d.external_id = $device_id
RETURN c.external_id AS linked_card
LIMIT 50

7. Always add LIMIT to traversals and scans

The engine propagates LIMIT hints down into NodeScan and VariableExpand nodes so BFS stops as soon as enough rows are found. Without a limit, traversals materialise the full result set before returning anything. This is especially important for variable-length paths.

cypher — ✓ Prefer
// LIMIT is pushed into the expand: BFS stops after finding 50 merchants
MATCH (c:CARD)-[:TRANSACTS_AT]->(m:MERCHANT)
WHERE c.external_id = $card_id
RETURN m.external_id, m.name
LIMIT 50

// Variable-length traversal: cap both hop count and result rows
MATCH (c:CARD)-[:TRANSACTS_AT*1..3]->(m:MERCHANT)
WHERE c.external_id = $card_id
RETURN DISTINCT m.external_id
LIMIT 100

8. Use db.nodeStats() for counts, not MATCH (n:TYPE)

A bare MATCH (n:CARD) RETURN count(n) materialises every node in that type before counting. db.nodeStats() reads a pre-maintained O(1) counter with no scan.

cypher — ✗ Avoid
// Materialises all nodes in the type before counting
MATCH (n:CARD)
RETURN count(n) AS total
cypher — ✓ Prefer
// O(1) pre-computed counter, no scan
CALL db.nodeStats() YIELD type, count
WHERE type = 'CARD'
RETURN count

// If you must scan for exploration, always cap with LIMIT
MATCH (n:CARD)
RETURN n.external_id
LIMIT 25

9. Batch writes with graph.ingest()

Individual CREATE statements are planned and executed one at a time. For loading multiple nodes and edges in a single operation, use graph.ingest() — one round-trip, one plan, one atomic batch write regardless of how many entities are included.

cypher — ✓ Prefer for batch loads
CALL graph.ingest(
  [
    { node_type: 'CARD', externalId: $card_id },
    { node_type: 'MERCHANT', externalId: $merchant_id }
  ],
  [
    { edge_type: 'TRANSACTS_AT', src: $card_id, dst: $merchant_id }
  ]
)
YIELD ok, nodes_created, edges_created
RETURN ok, nodes_created, edges_created
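On the client side, a batch for this call can be assembled from raw transaction pairs. The following is a sketch under stated assumptions: the map keys (`node_type`, `externalId`, `edge_type`, `src`, `dst`) mirror the example above, `build_ingest_batch` is a hypothetical helper, and passing the two lists as `$nodes` and `$edges` parameters is an assumption that this page does not confirm.

```python
def build_ingest_batch(transactions):
    """Turn (card_id, merchant_id) pairs into node and edge lists.

    Each node is listed once even if it appears in many transactions;
    every transaction contributes one TRANSACTS_AT edge entry.
    """
    nodes, edges, seen = [], [], set()
    for card_id, merchant_id in transactions:
        for node_type, ext_id in (("CARD", card_id), ("MERCHANT", merchant_id)):
            if (node_type, ext_id) not in seen:
                seen.add((node_type, ext_id))
                nodes.append({"node_type": node_type, "externalId": ext_id})
        edges.append({"edge_type": "TRANSACTS_AT", "src": card_id, "dst": merchant_id})
    return {"nodes": nodes, "edges": edges}


batch = build_ingest_batch([("card-001", "merchant-42"), ("card-001", "merchant-7")])
```

If the engine accepts list parameters here, `batch` could be sent as the parameters of `CALL graph.ingest($nodes, $edges) YIELD ok, nodes_created, edges_created`; otherwise the same structure can be rendered into the inline form shown above.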

10. Record transaction events with graph.upsertEdge()

For high-throughput edge upserts — recording payment events — use graph.upsertEdge() rather than MERGE-based patterns. It is a single O(1) procedure call that increments counters and updates the activity bitmap atomically, with no planner involved.

cypher — ✓ Prefer for event recording
CALL graph.upsertEdge(
  $edge_type, $src_typed_id, $dst_typed_id
)
YIELD created_new, tx_count, approx_sum
RETURN created_new, tx_count, approx_sum

// Parameters example:
// { "edge_type": "TRANSACTS_AT",
//   "src_typed_id": "CARD:9792487647826207",
//   "dst_typed_id": "MERCHANT:M2_0000803" }
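The typed ids in the parameters example appear to follow a `TYPE:external_id` format. A small helper can build the parameter map so callers never concatenate these strings by hand. This is a sketch: `typed_id` and `upsert_edge_params` are hypothetical names, and the format is inferred only from the example above.

```python
def typed_id(node_type: str, external_id) -> str:
    """Join a node type and external id as 'TYPE:external_id'."""
    if not node_type or ":" in node_type:
        raise ValueError(f"invalid node type: {node_type!r}")
    return f"{node_type}:{external_id}"


def upsert_edge_params(edge_type, src_type, src_id, dst_type, dst_id) -> dict:
    """Build the parameter map for the graph.upsertEdge() call shown above."""
    return {
        "edge_type": edge_type,
        "src_typed_id": typed_id(src_type, src_id),
        "dst_typed_id": typed_id(dst_type, dst_id),
    }


params = upsert_edge_params(
    "TRANSACTS_AT", "CARD", "9792487647826207", "MERCHANT", "M2_0000803"
)
```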

Quick Reference

# | Rule | Engine impact avoided
1 | Label every node: (c:CARD) not (c) | Eliminates full-graph NodeScan { node_type: None }
2 | Use $parameters, never inline literals | One cached plan per query shape instead of one per value
3 | Filter src.external_id = $value on the source node | Triggers pushdown → O(1) IndexLookup
4 | Type every relationship: [:TRANSACTS_AT] not [r] | Prevents N-way fan-out (one expand per edge type)
5 | Use the UNION ALL fraud context template | One cached plan, all neighbours, one round-trip
6 | Anchor ring queries on the known node | Avoids full DEVICE / IP scan in co-occurrence queries
7 | Always add LIMIT to traversals | BFS stops early; limit is pushed into the expand node
8 | Use db.nodeStats() for counts | O(1) counter vs full node materialisation
9 | Batch writes with graph.ingest() | Single round-trip; bypasses per-statement planning
10 | Edge events with graph.upsertEdge() | O(1) atomic counter update; no planner involved

Error Handling & Debugging

Common Errors

Error | Cause | Fix
Schema not finalized | Writing data before db.finalizeSchema() | Call CALL db.finalizeSchema() after all type registrations
Unknown node type | Using a label not registered in the schema | Register the type with db.registerNodeType() before finalizing
Unknown edge type | Using a relationship type not registered | Register with db.registerEdgeType() before finalizing
Node not found | Creating an edge to a node that doesn't exist | Always CREATE the destination node before creating an edge to it
Duplicate node type | Registering the same type twice | Each type name must be unique; restart with a fresh volume if needed in dev
Connection refused on port 8080 | Container not started or health check failing | Check docker compose logs graphengine for startup errors

Debugging with Logs

bash
# Follow live logs from the graph engine
docker compose logs -f graphengine

# Increase verbosity (set RUST_LOG=debug in docker-compose.yml)
# Valid values: error | warn | info | debug | trace

Testing Connectivity

bash
# Health endpoint
curl -v http://localhost:8080/health

# Verify Bolt port is open
nc -zv localhost 7687

# Simple Cypher round-trip
curl -sS -X POST http://localhost:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"RETURN 1 AS ping","parameters":{}}'
# → {"columns":["ping"],"rows":[[1]]}

Persistence & Durability

JetGraph is an in-memory engine — all data lives in RAM for sub-millisecond access. Durability is provided by a layered persistence model that writes to disk asynchronously without pausing query processing.

How Data Survives Restarts

📸

Full Snapshots

Complete graph serialized to disk nightly (default 02:30 AM) or on a configurable interval. Compressed with zstd.

Delta Files

Incremental changes written every 30–60 seconds. Applied on top of the base snapshot at startup for fast recovery.

🔗

Checkpoints

When the delta chain grows long, deltas are merged offline into a single checkpoint file — no engine pause required.

🛡️

Shutdown Delta

On graceful stop, a final delta is written before the full snapshot — so recent changes are safe even if the snapshot is interrupted.

Recovery Sequence at Startup

  1. Load latest full snapshot

    The most recent snapshot-*.bin file is deserialized into memory. The engine is marked ready immediately so queries can start while the Cuckoo filter rebuilds in the background.

  2. Apply checkpoint (if any)

    If a checkpoint file exists for this snapshot base, it is applied first to skip replaying the earliest deltas.

  3. Replay remaining delta files

    Any delta files newer than the checkpoint are applied in order, bringing the graph fully up to date.

ℹ️
The engine returns 503 from /health while loading a snapshot. Polls will succeed once loading completes — typically within seconds for small graphs, longer for very large ones. This is why start_period in the health check should be set generously.
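A client or deployment script can apply the same idea as the container health check by polling /health until it returns 200. This is a minimal sketch using only the Python standard library. The helper names, retry count, and delay are illustrative choices, not JetGraph defaults.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for `url`, or 0 if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the snapshot is still loading
    except (urllib.error.URLError, OSError):
        return 0  # nothing is accepting connections yet


def wait_until_ready(probe, attempts: int = 30, delay: float = 2.0,
                     sleep=time.sleep) -> bool:
    """Call `probe()` until it returns 200 or `attempts` is exhausted."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Typical use would be `wait_until_ready(lambda: http_status("http://localhost:8080/health"))`. For large graphs, raise `attempts` or `delay` in the same spirit as a generous start_period.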

Emergency Snapshot (Memory Pressure)

When RSS memory exceeds memory_limit_bytes in config.toml, the engine automatically writes an emergency full snapshot, blocks new ingest, and logs an error. This protects data before the container OOM-killer fires. Ingest resumes once memory drops back below the limit.

Manual Snapshot

Trigger a snapshot on demand via Cypher without restarting:

cypher
CALL graph.saveSnapshot() YIELD ok, path RETURN ok, path

Graceful Shutdown

Send SIGTERM (what docker compose stop / docker compose down sends) and the engine will write a final delta then a full snapshot before exiting. The stop_grace_period: 300s in the compose file gives it up to 5 minutes for very large graphs. Never use SIGKILL directly — it bypasses the shutdown snapshot.

Clustering & High Availability

JetGraph supports primary–standby clustering for high availability and read scale-out. The primary serves reads and writes; one or more standbys keep a live in-memory replica of the graph and serve reads only. Replication is delta-based and lag is typically under 35 seconds under normal ingest.

Architecture

A cluster is built from four components: the primary and standby graphengine processes, plus two small sidecars that run alongside them:

🖥️

graphengine (primary)

Accepts ingest. Writes a full snapshot nightly and a delta file every 30 seconds.

📤

delta-replicator

Sidecar on the primary. Ships each new snapshot and delta to the standby via rsync over SSH within a few seconds.

🧩

delta-compactor

Sidecar on both nodes. Merges long chains of raw deltas into checkpoints so restart replay stays fast.

📥

graphengine (standby)

Runs with STANDBY_MODE=true. Polls for new delta files every 5 s and applies them to the live graph. Hot-reloads on new nightly snapshots without restarting.

┌────────────── PRIMARY ──────────────┐      ┌────────────── STANDBY ──────────────┐
│ graphengine (read + write)          │      │ graphengine STANDBY_MODE=true       │
│ writes snapshot-*.bin delta-*.bin   │      │ applies deltas, hot-reloads         │
│ delta-replicator ───── rsync/ssh ────────────▶ /opt/graphengine/data/snapshots   │
│ delta-compactor                     │      │ delta-compactor                     │
└─────────────────────────────────────┘      └─────────────────────────────────────┘
          ≤ 30 s produce + ≈ 5 s ship = ≤ 35 s replication lag

What Gets Replicated

Segment Evaluator and Pattern Miner sidecars maintain their own config stores; they should run on both hosts but do not participate in graph replication.

Replication Lag

Lag = delta_interval_secs (30 s default) + POLL_INTERVAL_SECS (5 s default) = ~35 seconds upper bound under normal load. For tighter RPO, lower both intervals at the cost of more but smaller delta files. Data loss on an unplanned primary failure is bounded by this window.
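The trade-off can be worked through numerically. The sketch below only restates the formula above; the function names are illustrative, and the 10 s / 2 s values in the second call are example settings, not recommendations.

```python
def replication_lag_bound(delta_interval_secs: int = 30,
                          poll_interval_secs: int = 5) -> int:
    """Upper bound on replication lag under normal load, in seconds."""
    return delta_interval_secs + poll_interval_secs


def deltas_per_hour(delta_interval_secs: int = 30) -> int:
    """Number of delta files the primary produces in one hour."""
    return 3600 // delta_interval_secs


print(replication_lag_bound(), deltas_per_hour())            # 35 120
print(replication_lag_bound(10, 2), deltas_per_hour(10))     # 12 360
```

Cutting the delta interval from 30 s to 10 s lowers the bound from 35 s to 12 s but triples the number of delta files to ship and compact.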

Minimum Compose Setup

Clusters use two Compose files. The primary runs the engine plus the replicator sidecar; the standby runs the engine with STANDBY_MODE enabled.

yaml — primary (key services)
services:
  graphengine:
    image: ghcr.io/fraudmanagement/graphengine:main
    environment:
      ENABLE_ADMIN_RESET: "true"
      CONFIG_PATH: /config/config.toml
      MALLOC_CONF: "background_thread:true,dirty_decay_ms:1000,muzzy_decay_ms:10000,narenas:16"
    volumes:
      - /opt/graphengine/data:/data
      - ./config.toml:/config/config.toml:ro
  delta-replicator:
    image: ghcr.io/fraudmanagement/graphengine:main
    command: ["/usr/local/bin/delta-replicator"]
    environment:
      SNAPSHOT_DIR: /data/snapshots
      REPLICA_REMOTE: "${STANDBY_USER}@${STANDBY_HOST}:/opt/graphengine/data/snapshots/"
      REPLICA_SSH_KEY: /keys/replication.key
    volumes:
      - /opt/graphengine/data:/data
      - ./replication.key:/keys/replication.key:ro
yaml — standby (key service)
services:
  graphengine:
    image: ghcr.io/fraudmanagement/graphengine:main
    environment:
      STANDBY_MODE: "true"
      POLL_INTERVAL_SECS: "5"
      CONFIG_PATH: /config/config.toml
    volumes:
      - /opt/graphengine/data:/data
      - ./config.toml:/config/config.toml:ro
ℹ️
Port 22 (SSH) on the standby host must be reachable from the primary host — delta-replicator rsyncs over SSH directly, not through the Docker network.

Failover

  1. Drain and stop the primary

    Stop new writes at the load-balancer level, then docker compose stop graphengine. On SIGTERM the engine writes a final delta and full snapshot, taking up to stop_grace_period (default 300 s).

  2. Wait for the standby to drain the delta queue

    Tail the standby logs until no more applying delta lines appear. At that point the two engines are byte-for-byte equivalent.

  3. Promote the standby

    Restart the engine with STANDBY_MODE removed (or swap in the primary compose file on that host). The data directory is already a full mirror, so no migration step is needed.

  4. Cut clients over

    Update DNS or load-balancer config. Optionally reconfigure the old primary as the new standby: point its STANDBY_HOST at the new primary and install the new replication key.

For an unplanned failover (primary host lost), promote the standby immediately. Data loss is bounded by the last delta the standby applied — at most ≈ 35 seconds.

Verifying Replication

bash
# Compare node counts on both endpoints; they should match or differ by one delta
curl -sS -X POST http://PRIMARY:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.nodeStats() YIELD type, count RETURN type, count"}'

curl -sS -X POST http://STANDBY:8080/cypher \
  -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.nodeStats() YIELD type, count RETURN type, count"}'

# Watch the replicator ship files and the standby apply them
docker logs -f delta-replicator | grep shipping
docker logs -f graphengine | grep "applying delta"
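Comparing the two responses by eye gets tedious with many node types. The sketch below diffs them programmatically. It assumes the responses use the same `{"columns": [...], "rows": [...]}` shape as the `RETURN 1 AS ping` example in Testing Connectivity; the helper names and the sample counts are made up for illustration.

```python
def counts_by_type(response: dict) -> dict:
    """Map node type to count from a /cypher response for db.nodeStats()."""
    cols = response["columns"]
    type_idx, count_idx = cols.index("type"), cols.index("count")
    return {row[type_idx]: row[count_idx] for row in response["rows"]}


def replication_gaps(primary: dict, standby: dict) -> dict:
    """Return {type: primary_count - standby_count} for types that differ."""
    p, s = counts_by_type(primary), counts_by_type(standby)
    return {
        t: p.get(t, 0) - s.get(t, 0)
        for t in sorted(set(p) | set(s))
        if p.get(t, 0) != s.get(t, 0)
    }


# Made-up sample responses: the standby is a few CARD nodes behind.
primary = {"columns": ["type", "count"], "rows": [["CARD", 1000], ["MERCHANT", 50]]}
standby = {"columns": ["type", "count"], "rows": [["CARD", 997], ["MERCHANT", 50]]}
print(replication_gaps(primary, standby))  # {'CARD': 3}
```

An empty result means the counts match. A small positive gap that closes within one replication interval is expected under active ingest.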

Operational Limits

📖
For step-by-step production setup — SSH key generation, full memory sizing tables up to 300 GiB, and every troubleshooting playbook — see the Clustering Administrator Guide shipped with the repository at docs/clustering-admin-guide.md.

Memory & Compression

JetGraph keeps the entire graph in RAM to guarantee sub-millisecond reads, so memory footprint directly limits how large a graph one host can serve. The engine uses several compression and compaction techniques, trading a few CPU cycles for a much smaller footprint, so that a single 96 GiB host can hold hundreds of millions of nodes and billions of edges.

Where Memory Goes

Use db.memoryUsage() for a live, fully-accounted breakdown, or GET /admin/memory for the same payload in JSON:

cypher
CALL db.memoryUsage() YIELD total_bytes, payload_bytes, edge_pair_count, process_rss_bytes, breakdown RETURN total_bytes, payload_bytes, edge_pair_count, process_rss_bytes, breakdown
Field | What it measures
payload_bytes | Pure graph data: nodes, edges, properties. Lower bound.
total_bytes | Payload + structural overhead (indirection, shards, stacks, slack). Aligns with RSS within a few percent on Linux.
breakdown.compact_store_bytes | Edge adjacency, the dominant term at scale.
breakdown.pointer_indirection_bytes | Internal NodeId ↔ external_id mapping.
breakdown.slot_table_bytes | Slot allocator overhead for the chunk arena.
process_rss_bytes | Kernel-reported process RSS (what Docker and the OOM killer see).

Per-Pair Edge Compression

Each edge type stores adjacency with a payload variant chosen at registration. Pick the smallest variant that still answers your queries:

Variant | ~Bytes / pair | Keeps | Use for
Full (default) | ~52 B | tx_count, approx_sum, last_seen, 21-tick activity bitmap, 8-bin histogram, optional bool flag | Event edges where you query velocity, amounts, and activity windows.
Slim | ~32 B | Same as Full minus the per-bucket histogram. | High-cardinality event edges where histograms are not needed.
Static | ~16 B | value + last_seen only. No inverse index maintained. | Structural edges (USES_DEVICE, SIMILAR_TO) where only existence or a score matters.

At 1 B edges, choosing Static over Full for a structural edge type saves about 36 B per pair, roughly 36 GB (≈ 33.5 GiB) of RSS, and avoids maintaining an inverse index that is never queried.
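The saving can be estimated for any pair of variants and any edge count. This sketch uses the approximate per-pair sizes from the variant table above; the function names are illustrative, and real RSS will differ because the table figures are themselves approximate.

```python
# Approximate bytes per edge pair, taken from the variant table above.
BYTES_PER_PAIR = {"Full": 52, "Slim": 32, "Static": 16}


def savings_bytes(from_variant: str, to_variant: str, pairs: int) -> int:
    """Estimated bytes saved by storing `pairs` edge pairs in a smaller variant."""
    return (BYTES_PER_PAIR[from_variant] - BYTES_PER_PAIR[to_variant]) * pairs


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


saved = savings_bytes("Full", "Static", 1_000_000_000)
print(f"{saved / 1e9:.0f} GB = {to_gib(saved):.1f} GiB")  # 36 GB = 33.5 GiB
```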

cypher — register a static edge type
CALL db.registerEdgeType({
  name: "USES_DEVICE",
  from_node_type: "CARD",
  to_node_type: "DEVICE",
  minimal_payload: true
})
YIELD edge_type_id
RETURN edge_type_id

Other In-Memory Compression

🗂️

Dictionary-Encoded Strings

Recurring string values (MCCs, country codes, merchant categories) are stored once and referenced by a small integer ID.

🔢

Bit-Packed Booleans

Boolean flags (fraud, verified, active) are packed into bitmaps rather than using one byte per value.

🌲

Adaptive Inverse Index

Each destination's inverse row starts as a sorted Vec<NodeId> and is promoted to a BTreeSet only when fan-in crosses 10 K — giving cold rows tight cache layout and hot rows O(log N) writes.

📦

Chunked Rows

Adjacency rows are split into fixed-size chunks so writers clone only the affected chunk (not the whole row) — shrinking write amplification by ~170× on hot source nodes.

On-Disk Compression

All snapshots, deltas, and checkpoints are written through zstd. Hot-path writers use level 1 for low CPU overhead; the offline compactor uses the ultra-fast -1 level.

Artifact | Codec | Typical size vs raw payload
snapshot-<ts>.bin | zstd level 1 | 3–6× smaller
delta-<ts>-<seq>.bin | zstd level 1 | 4–8× smaller
checkpoint-<ts>.bin | zstd level -1 | Similar to snapshot; optimised for write speed, not ratio

Disk throughput is rarely the bottleneck — delta throughput is CPU-bound on the zstd encoder, not on disk I/O. On NVMe you can ingest 30 MB deltas every 30 seconds with RSS growth matching payload growth within a few percent.

Memory-Pressure Protection

JetGraph never exceeds its configured memory budget silently. memory_limit_bytes in config.toml is the engine's RSS soft limit (set it below the container's mem_limit). When RSS crosses the limit the engine takes one of two actions based on pressure_eviction_batch_size:

Strategy | Setting | Behaviour
Backpressure (default) | pressure_eviction_batch_size = 0 | Blocks new ingest with 503 / RESOURCE_EXHAUSTED, writes an emergency snapshot, and resumes automatically once RSS drops. No data is lost.
Oldest-edge eviction | pressure_eviction_batch_size = 50_000 | Deletes the oldest edges across all types, proportional to per-type growth, until RSS falls below the limit. Ingest is not blocked, but old history is dropped.
⚠️
Never enable pressure eviction on a standby. A replica must only delete what arrives in the primary's delta stream; otherwise the two engines diverge.

Sizing Memory

Size memory_limit_bytes from Docker mem_limit, not from memswap_limit. Swap is only an emergency cushion: once graph pages are swapped out, query latency degrades catastrophically. Start at 70–75 % of mem_limit and raise it only after seeing stable headroom in docker stats and db.memoryUsage().

Host RAM | mem_limit | Primary memory_limit_bytes (~75%) | Standby memory_limit_bytes (~70%)
16 GiB | 12g | 9_663_676_416 | 9_019_432_960
32 GiB | 24g | 19_327_352_832 | 18_038_865_920
64 GiB | 52g | 41_876_111_360 | 39_084_369_920
96 GiB | 80g | 64_424_509_440 | 60_129_542_144
300 GiB | 256g | 206_158_430_208 | 192_414_534_860
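For a mem_limit that is not in the table, the value can be computed directly. This is a minimal sketch that treats the `g` suffix as GiB; the function names are illustrative. Several table entries match the exact percentage, while others appear to be rounded and differ from it by less than a megabyte.

```python
GIB = 1024 ** 3


def memory_limit_bytes(mem_limit_gib: int, percent: int) -> int:
    """Return `percent` % of a container mem_limit given in GiB, in bytes."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return mem_limit_gib * GIB * percent // 100


def as_config_literal(n_bytes: int) -> str:
    """Format with underscore separators, as the table above does."""
    return f"{n_bytes:_}"


print(as_config_literal(memory_limit_bytes(12, 75)))  # 9_663_676_416
print(as_config_literal(memory_limit_bytes(80, 70)))  # 60_129_542_144
```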

jemalloc Tuning

JetGraph ships with jemalloc as the system allocator because glibc's default fragments badly under high-churn graph writes. Recommended MALLOC_CONF in the container environment:

yaml
environment:
  MALLOC_CONF: "background_thread:true,dirty_decay_ms:1000,muzzy_decay_ms:10000,narenas:16"
Do not add metadata_thp:auto or percpu_arena:percpu — both inflate jemalloc's internal metadata by 1–2 GiB and obscure the engine's real RSS. The recommended string is the tested production default.

Watching Memory Metrics

Scrape GET /metrics with any Prometheus-compatible collector. The jetgraph_memory_pressure metric is 1 when RSS exceeds memory_limit_bytes and 0 otherwise.

Procedures Reference

All procedures are invoked via CALL procedure.name(args) YIELD col1, col2 RETURN ... over any connection (REST, Bolt, or gRPC).

Schema Procedures (db.*)

Procedure | Arguments | Yields | Description
db.registerNodeType | name, id_kind | node_type_id | Register a node type. id_kind is "string" or "integer".
db.registerEdgeType | {name, from_node_type, to_node_type, …} | edge_type_id | Register an edge type. Optional fields include bin_boundaries, tracked_property, activity_bitmap.tick_size_secs, node_histogram, minimal_payload, bool_property, and symmetric.
db.registerProperty | name, value_type | property_id | Register a named property. Types: "int", "float", "string", "bool", "timestamp".
db.finalizeSchema | (none) | schema_version | Lock the schema. Must be called before any data writes.
db.schema | (none) | node_types, edge_types, properties | Introspect the current schema definition.
db.nodeStats | (none) | type, count | O(1) node count per type. Use it instead of MATCH (n:TYPE) RETURN count(n).
db.memoryUsage | (none) | total_bytes, payload_bytes, edge_pair_count, breakdown, process_rss_bytes, … | Full in-process memory model: total_bytes sums payload and structural terms (indirection, stacks, slack) and should align with process_rss_bytes on Linux. payload_bytes is the schema data lower bound. High RSS at large scale reflects partitions × edge types × directions in the edge stores and cannot be tuned away with allocator settings alone; reducing footprint requires layout or partitioning changes.
db.resetGraph | (none) | ok | Wipe all data and schema. Only available when ENABLE_ADMIN_RESET=true.

Graph Procedures (graph.*)

Procedure | Description
graph.ingest(nodes, edges) | Batch upsert of nodes and edges in a single round-trip. Returns ok, nodes_created, edges_created.
graph.upsertEdge(edge_type, src_typed_id, dst_typed_id) | O(1) atomic edge upsert that increments tx_count and updates the activity bitmap. Returns created_new, tx_count, approx_sum.
graph.edgeState(src, dst, edge_type, windows?) | Returns one edge pair's state: tx_count, approx_sum, last_seen, bool_flag, and activity_counts. The optional windows list is expressed in activity ticks.
graph.lastNeighbor(node_typed_id, edge_type) | Returns the most recently seen neighbor. Useful for impossible-travel and last-location detection.
graph.fraudContext(node_typed_id) | Returns connected fraud nodes, max neighbor fraud score, and propagation depth.
graph.flagNode(node_typed_id, score, reason) | Mark a node as fraudulent and propagate the score to neighbors.
graph.unflagNode(node_typed_id) | Remove the fraud flag from a node.
graph.histogram(node, edge_type, hours?, days?) | Returns node-level histogram buckets and aggregated counts. Pass hours to use the hourly ring, or pass null for hours and a days value to use the daily ring.
graph.featureVector(node, edge_types, hours?, days?) | Returns a compact flat vector of neighbour counts per edge type. For richer structured ML features, combine graph.histogram and graph.edgeState, or use the REST /features/vector endpoint.
graph.findSimilar(node_typed_id, k) | Jaccard-based k-nearest-neighbor lookup using the SIMILAR_TO edge index.
graph.buildSimilarityGraph(edge_type, k) | Batch-compute similarity edges for all nodes of a given edge type.
graph.deleteNode(node_typed_id) | Delete a node and detach all its relationships.
graph.clearEdgeTypeData(edge_type) | Remove all edges of a given type without touching nodes.
graph.saveSnapshot() | Trigger a full snapshot immediately. Returns ok, path.

Edge Type Variants

When registering edge types, the storage variant is determined by registration fields. Choose the smallest payload that still answers your feature queries:

Variant | Trigger | Physical payload | Features | When to use
Full / numeric | bin_boundaries present | CompactEdgePayload (~36 B payload) | tx_count, approx_sum, last_seen, 21-tick activity bitmap, 8 numeric bins, optional bool flag | Transaction edges where you query velocity, amounts, and amount buckets.
Slim | No bin_boundaries, not minimal | SlimEdgePayload (~16 B payload) | tx_count, last_seen, 21-tick activity bitmap, optional bool flag. No amount bins or approx_sum. | High-cardinality event edges where you need counts/velocity but not amount histograms.
Static | minimal_payload: true | StaticEdgePayload (~8 B payload) | value + last_seen only | Structural or derived-score edges such as SIMILAR_TO.
cypher — register a static edge type
// Static edges use an ~8 B payload, ideal for derived scores and similarity links
CALL db.registerEdgeType({
  name: "SIMILAR_TO",
  from_node_type: "CARD",
  to_node_type: "CARD",
  minimal_payload: true,
  symmetric: true
})
YIELD edge_type_id
RETURN edge_type_id

Metrics

JetGraph exposes a Prometheus-compatible metrics endpoint at GET /metrics on port 8080. Scrape it with any Prometheus-compatible collector or read it directly:

bash
curl http://localhost:8080/metrics
Metric | Description
jetgraph_ingest_total | Total ingest transactions processed
jetgraph_cypher_queries_total | Total Cypher queries executed (HTTP + Bolt + gRPC)
jetgraph_grpc_requests_total | Total gRPC calls by method
jetgraph_rcu_retries_total | RCU (read-copy-update) retries on the compact neighbor store; high values indicate write contention
jetgraph_memory_pressure | 1 when RSS exceeds memory_limit_bytes, 0 otherwise
jetgraph_inverse_lock_contentions_total | Lock contention on the inverse neighbor index
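For a quick check without a full Prometheus setup, the text returned by /metrics can be parsed in a few lines. This is a simplified sketch of the Prometheus text format (comments skipped, labels kept as part of the series name, timestamps ignored). The `SAMPLE` payload, including the `method` label name and all values, is made up for illustration.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition into {series: value}."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        # The series name ends at the closing brace of its labels, if any,
        # otherwise at the first space.
        end = line.rindex("}") + 1 if "}" in line else line.find(" ")
        if end <= 0:
            continue
        fields = line[end:].split()
        if fields:
            samples[line[:end]] = float(fields[0])
    return samples


def under_memory_pressure(samples: dict[str, float]) -> bool:
    """True when jetgraph_memory_pressure reports 1."""
    return samples.get("jetgraph_memory_pressure", 0.0) >= 1.0


# Made-up sample payload; real output will contain more series.
SAMPLE = """\
# TYPE jetgraph_memory_pressure gauge
jetgraph_memory_pressure 0
jetgraph_rcu_retries_total 12
jetgraph_grpc_requests_total{method="Ingest"} 340
"""
print(under_memory_pressure(parse_metrics(SAMPLE)))  # False
```

In production a Prometheus alert rule on jetgraph_memory_pressure is the more usual approach; the parser is mainly handy in smoke tests and scripts.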

Segment Evaluator

The Segment Evaluator is a sidecar service that sits in front of the graph engine's ingest path. It evaluates configurable segment rules in real time on every transaction — automatically assigning entities to segments like "High Velocity", "New Merchant Risk", or "Fraud Ring Adjacent" based on graph signals.

ℹ️
The Segment Evaluator runs as a separate container (segment-evaluator) in the stack. It connects to the graph engine over gRPC and exposes its own HTTP API on port 8081.

How It Works

Key API Endpoints (port 8081)

Method | Path | Description
POST | /ingest/evaluate | Ingest a transaction and re-evaluate segments for the entities involved
GET | /segments | List all defined segments
GET | /segments/:name/members | List all current members of a segment
POST | /segments/simulate | Dry-run: evaluate signals for an entity without writing segment membership
POST | /segments/sweep | Re-evaluate all entities against all segments (useful after rule changes)
GET | /segments/signals | List all configured signals
POST | /segments/node/lookup | Look up which segments a specific entity belongs to
GET/POST/DELETE | /config/signals | Manage signal definitions
GET/POST/DELETE | /config/segments | Manage segment definitions
POST | /segments/config/reload | Hot-reload config from TOML files without restarting
GET | /health | Health check

Querying Segments via Cypher

Once segments are assigned, you can query them like any other graph relationship:

cypher
// Find all cards in the "High Velocity" segment
MATCH (c:CARD)-[:MEMBER_OF]->(s:Segment)
WHERE s.external_id = "High Velocity"
RETURN c.external_id AS card
LIMIT 100

Pattern Miner

The Pattern Miner is a sidecar service that watches the graph engine's edge stream in real time and builds behavioral transition patterns — sequences of entities a node visited over time. These patterns power next-location prediction, impossible-travel detection, and anomaly scoring.

ℹ️
The Pattern Miner runs as a separate container (pattern-miner) and exposes its API on port 8082. It subscribes to the graph engine's WatchEdgeUpserts CDC stream over gRPC.

How It Works

Key API Endpoints (port 8082)

Method | Path | Description
GET | /patterns/transitions | List all tracked transition pairs across all rules
GET | /patterns/predict/:node_id | Predict the most likely next entity for a given node based on historical transitions
GET | /patterns/path/:node_id | Return the full transition path (sequence of entities) for a node
GET | /patterns/context/:m1/:m2 | Return the transition context between two specific entities
GET/POST | /config/rules | List or create pattern rules (which edge type to watch)
DELETE | /config/rules/:watch_edge | Remove a rule
GET/POST | /config/settings | Manage global miner settings
POST | /patterns/config/reload | Hot-reload rules from TOML without restarting
GET | /health | Health check

Impossible-Travel Detection Example

bash
# Get the transition path for a card (sequence of IPs it has used)
curl http://localhost:8082/patterns/path/CARD:card-001

# Predict the next likely IP for this card
curl http://localhost:8082/patterns/predict/CARD:card-001

Code Examples Reference

Copy-paste-ready snippets for common operations.

REST API — Full Workflow

bash — full REST workflow
BASE=http://localhost:8080

### 1. Register schema
curl -sS -X POST "$BASE/cypher" -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerNodeType(\"CARD\",\"string\") YIELD node_type_id RETURN node_type_id","parameters":{}}'

curl -sS -X POST "$BASE/cypher" -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerNodeType(\"MERCHANT\",\"string\") YIELD node_type_id RETURN node_type_id","parameters":{}}'

curl -sS -X POST "$BASE/cypher" -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.registerEdgeType({name:\"TRANSACTS_AT\",from_node_type:\"CARD\",to_node_type:\"MERCHANT\"}) YIELD edge_type_id RETURN edge_type_id","parameters":{}}'

curl -sS -X POST "$BASE/cypher" -H 'Content-Type: application/json' \
  -d '{"query":"CALL db.finalizeSchema() YIELD schema_version RETURN schema_version","parameters":{}}'

### 2. Create nodes
curl -sS -X POST "$BASE/cypher" -H 'Content-Type: application/json' \
  -d '{"query":"CREATE (c:CARD {external_id:$id}) RETURN c.external_id","parameters":{"id":"card-001"}}'

curl -sS -X POST "$BASE/cypher" -H 'Content-Type: application/json' \
  -d '{"query":"CREATE (m:MERCHANT {external_id:$id}) RETURN m.external_id","parameters":{"id":"merchant-42"}}'

### 3. Create relationship
curl -sS -X POST "$BASE/cypher" -H 'Content-Type: application/json' \
  -d '{"query":"MATCH (c:CARD {external_id:$cid}),(m:MERCHANT {external_id:$mid}) CREATE (c)-[:TRANSACTS_AT {amount:$amt}]->(m) RETURN true","parameters":{"cid":"card-001","mid":"merchant-42","amt":49.99}}'

### 4. Query the graph
curl -sS -X POST "$BASE/cypher" -H 'Content-Type: application/json' \
  -d '{"query":"MATCH (c:CARD {external_id:$id})-[:TRANSACTS_AT]->(m:MERCHANT) RETURN m.external_id AS merchant","parameters":{"id":"card-001"}}'

Bolt — Python

python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

def get_merchants_for_card(tx, card_id):
    result = tx.run(
        "MATCH (c:CARD {external_id: $id})-[:TRANSACTS_AT]->(m:MERCHANT) RETURN m.external_id AS merchant",
        id=card_id
    )
    return [r["merchant"] for r in result]

with driver.session() as session:
    merchants = session.execute_read(get_merchants_for_card, "card-001")
    print(merchants)

driver.close()

Rust Client — Complete Scoring Loop

rust
use jetgraph_client::{
    GraphClient, CreateEdgeRequest, VelocityQuery, FraudContextQuery, FlagRequest, prop,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let graph = GraphClient::connect("http://localhost:50051").await?;

    let card_id = graph.lookup_node("CARD", "card-001").await?;
    let merchant_id = graph.lookup_node("MERCHANT", "merchant-42").await?;

    let txn_1h = graph.get_velocity_count(VelocityQuery {
        node: card_id,
        edge_type: "TRANSACTS_AT".into(),
        window_secs: 3600,
    }).await?.count;

    let is_new = !graph.edge_exists(card_id, merchant_id, "TRANSACTS_AT").await?;
    let ctx = graph.get_fraud_context(FraudContextQuery { node: card_id }).await?;

    let mut risk: f32 = 0.0;
    if is_new { risk += 0.15; }
    if txn_1h > 30 { risk += 0.25; }
    risk += 0.5 * ctx.max_neighbor_fraud_score;

    graph.create_edge(CreateEdgeRequest {
        edge_type_name: "TRANSACTS_AT".into(),
        src: card_id,
        dst: merchant_id,
        properties: vec![prop("risk", risk)],
    }).await?;

    println!("Risk score: {:.2}", risk);
    Ok(())
}