Common Pitfalls¶

Quine is a streaming graph — not a database. Many patterns that work in traditional databases or graph databases like Neo4j will silently produce wrong results, lose data, or cause severe performance problems in Quine. Read this page before writing your first ingest query.

For Cypher-specific differences from Neo4j, see Quine Cypher vs. Neo4j Cypher.

Ingest Queries¶

Every query must anchor nodes by ID¶

Any query that finds nodes by property instead of by ID triggers a full scan of every node in the graph. This includes MERGE with property matchers.

// WRONG: scans all nodes
MATCH (user) WHERE user.email = "alice@example.com" ...

// CORRECT: direct lookup
MATCH (user) WHERE id(user) = idFrom("user", "alice@example.com") ...
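
The MERGE case can be handled the same way. The following is a minimal sketch, assuming the ingest record is bound to $that and carries a hypothetical email field; the User label is also illustrative.

```cypher
// Avoid: a MERGE with a property matcher has to search for an existing match.
// MERGE (user:User {email: $that.email})

// ID-anchored alternative: address the node directly, then set its label and properties.
// The "user" prefix and the email field are assumptions for this example.
MATCH (user)
WHERE id(user) = idFrom("user", $that.email)
SET user:User, user.email = $that.email
```

Matching by ID does not require a prior create step, because the ID computed by idFrom() addresses the node whether or not it has been written to before.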

Details → Troubleshooting Ingest

Mismatched idFrom() arguments silently orphan data¶

If one ingest stream uses idFrom("order", orderId) and another uses idFrom(orderId) without the prefix, they address different nodes. Relationships between them will never form, and there is no error message.
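
As an illustration, the sketch below shows two ingest queries that are meant to address the same order node. The field names (orderId, shipmentId, total) and the FULFILLS edge type are hypothetical, and each commented section stands for a separate ingest query.

```cypher
// Ingest query for stream A (orders): addresses the order with a prefix.
MATCH (order)
WHERE id(order) = idFrom("order", $that.orderId)
SET order.total = $that.total

// Ingest query for stream B (shipments), WRONG: no prefix, so idFrom()
// yields a different ID and the edge attaches to an unrelated, empty node.
MATCH (order), (shipment)
WHERE id(order) = idFrom($that.orderId)
  AND id(shipment) = idFrom("shipment", $that.shipmentId)
CREATE (shipment)-[:FULFILLS]->(order)

// Ingest query for stream B, CORRECT: same arguments, in the same order, as stream A.
MATCH (order), (shipment)
WHERE id(order) = idFrom("order", $that.orderId)
  AND id(shipment) = idFrom("shipment", $that.shipmentId)
CREATE (shipment)-[:FULFILLS]->(order)
```

One way to reduce the risk is to agree on a single idFrom() argument list per node type and reuse it verbatim in every ingest query.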

Details → Troubleshooting Ingest

Ingest queries run in parallel — don't depend on other records¶

Each record is processed independently and concurrently. A query that assumes another record has already been ingested (e.g., looking up a customer node by property to attach an order) will silently skip records when the dependency hasn't been processed yet.

Solution: Make each ingest query self-contained. Every record should create all the nodes and edges it references, using idFrom() to address them. Order of arrival should not matter.
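
A self-contained ingest query might look like the sketch below. It assumes an order record bound to $that with hypothetical fields customerId, orderId, and total; the PURCHASED edge type is borrowed from the quick query example on this page.

```cypher
// Address both endpoints by computed ID instead of looking the customer up by property.
MATCH (customer), (order)
WHERE id(customer) = idFrom("customer", $that.customerId)
  AND id(order) = idFrom("order", $that.orderId)
// Write what this record knows about each node; other streams can add more later.
SET customer.customerId = $that.customerId,
    order.orderId = $that.orderId,
    order.total = $that.total
// Connect the two nodes regardless of which record arrived first.
CREATE (customer)-[:PURCHASED]->(order)
```

If a separate customer stream later sets more properties on the same idFrom("customer", ...) node, the result is the same whichever record is processed first.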

Details → Troubleshooting Ingest

Ingest queries should be idempotent¶

At-least-once delivery means records may be processed more than once after a restart. Design queries so that processing the same record twice produces the same graph state.
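
The difference can be sketched with two hypothetical queries, shown here as separate ingest queries with illustrative field and property names.

```cypher
// Idempotent: replaying the record assigns the same value again,
// so the graph ends up in the same state.
MATCH (order)
WHERE id(order) = idFrom("order", $that.orderId)
SET order.status = $that.status

// NOT idempotent: every replay of the same record increments the counter again.
MATCH (customer)
WHERE id(customer) = idFrom("customer", $that.customerId)
SET customer.orderCount = coalesce(customer.orderCount, 0) + 1
```

Assigning values taken from the record is safe to repeat; deriving a new value from the current graph state generally is not.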

Details → Delivery Guarantees

Standing Queries¶

Distinct ID mode fires once per root node, not per match¶

In the default Distinct ID mode (DISTINCT_ID, v1: DistinctId), once a pattern matches for a given root node, additional matches from that same root node do not emit new results. If you need a result for every match, use Multiple Values mode (MULTIPLE_VALUES, v1: MultipleValues).
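
As a rough sketch of how the two modes differ, consider the hypothetical patterns below. The edge type, property names, and return aliases are assumptions, and the mode itself is selected when the standing query is registered, not in the Cypher text.

```cypher
// Pattern for Distinct ID mode: only the root node's ID is returned,
// so a customer with many refunded orders produces a single result.
MATCH (customer)-[:PURCHASED]->(order)
WHERE order.status = "refunded"
RETURN DISTINCT id(customer) AS id

// Pattern for Multiple Values mode: each matching (customer, order) pair
// can produce its own result.
MATCH (customer)-[:PURCHASED]->(order)
WHERE order.status = "refunded"
RETURN id(customer) AS customerId, id(order) AS orderId
```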

Details → Standing Query Modes

Results are best-effort and can be dropped¶

Standing query outputs are not durably queued. Results can be lost if the output queue overflows, an output destination fails, or the process restarts. Treat standing query outputs as notifications, not authoritative records.

Details → Delivery Guarantees

New standing queries fire on existing data¶

When you register a standing query, it evaluates against all data already in the graph — not just new data arriving after registration. This can produce a burst of results from historical data.

Details → Troubleshooting Queries

Exploration and Ad-Hoc Queries¶

Sample queries and query bar queries need ID anchoring too¶

The "no indexes" rule applies to all queries, not just ingest. Sample queries in recipes and queries typed into the Exploration UI query bar are ad-hoc queries — they scan the full graph unless anchored by ID.

// WRONG: scans all nodes (labels are not indexed)
MATCH (p:Person) RETURN p

// WRONG: scans all nodes looking for property match
MATCH (n) WHERE n.type = "order" RETURN n LIMIT 10

// CORRECT: look up a specific node by computed ID
MATCH (n) WHERE id(n) = idFrom("customer", "CUST-123") RETURN n

// CORRECT: when no specific node is known, sample from recently accessed nodes
CALL recentNodes(10)

// CORRECT: sample recent nodes filtered by label
CALL recentNodes(1000) YIELD node AS nId
MATCH (n)
WHERE id(n) = nId AND labels(n) = ["Person"]
RETURN n

Always use idFrom() when the identity of the target node is known. When no specific node is known, use recentNodes() or recentNodeIds() to sample from recently accessed nodes. Never rely on label or property scans — Quine has no indexes, so MATCH (n:Label) is always a full graph scan.

Details → Quine Indexing

Quick queries are already node-anchored¶

Quick queries (right-click context menu on a node) receive the clicked node bound to the variable n, so they are already anchored to a specific starting point. The Cypher in a quick query should expand outward from n — not scan for unrelated nodes.

// Good quick query: expand from the clicked node
MATCH (n)-[:PURCHASED]->(order) RETURN order

// Bad quick query: ignores the starting node, scans the graph
MATCH (order:Order) RETURN order LIMIT 10

Performance¶

Counting nodes is expensive¶

MATCH (n) RETURN count(*) scans the entire graph. Unlike a database, Quine does not maintain a running node count.

Details → Streaming Graph vs. Database

Labels do not improve query performance¶

In Neo4j, MATCH (p:Person) uses a label index. In Quine, labels are stored as properties with no index — filtering by label still requires scanning all nodes.

Details → Cypher Differences

Supernodes degrade traversal performance¶

Nodes with thousands of edges (supernodes) cause performance problems when queries traverse outward from them. Supernodes that are only traversed to (as endpoints) are fine. Consider using properties instead of edges for high-cardinality relationships, or partitioning supernodes by time period.
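
Both mitigations can be sketched in an ingest query. The example assumes event records with hypothetical eventId, country, and day fields; the "country-day" prefix and OCCURRED_IN edge type are illustrative, and each commented section stands for a separate query.

```cypher
// Option 1: keep the high-cardinality value as a property, with no edge to a shared hub node.
MATCH (event)
WHERE id(event) = idFrom("event", $that.eventId)
SET event.country = $that.country

// Option 2: partition the hub by time period, so each bucket node
// collects only one day's worth of edges.
MATCH (event), (bucket)
WHERE id(event) = idFrom("event", $that.eventId)
  AND id(bucket) = idFrom("country-day", $that.country, $that.day)
SET bucket.country = $that.country,
    bucket.day = $that.day
CREATE (event)-[:OCCURRED_IN]->(bucket)
```

With the second option, a query that traverses outward from a bucket touches at most one day's events instead of every event for that country.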

Details → Diagnosing Bottlenecks

Default parallelism may not be optimal¶

Ingest parallelism defaults to 16. The right value depends on your data, queries, and infrastructure. Experimenting with this value can significantly improve throughput. When running multiple ingests on the same host, divide the optimal single-ingest parallelism across them. For example, if a single ingest performs best at 16, two concurrent ingests on that host would each start at about 8.

Details → Troubleshooting Ingest

Standing query backpressure slows ingest¶

If standing queries can't keep up with ingest, Quine pauses ingest to prevent result loss. Monitor the shared.valve.ingest metric — a non-zero value means ingest is being throttled.

Details → Diagnosing Bottlenecks

Operations¶

Graceful shutdown is required to prevent data loss¶

Use the POST /api/v2/system:shutdown endpoint to shut down cleanly. A hard kill (SIGKILL, container eviction) can lose data that hasn't been persisted yet.

Details → Operational Considerations

JVM heap should not exceed 16GB¶

Garbage collection pauses grow significantly above 16GB heap. 12GB is a good starting point. Additional physical memory beyond the heap is needed for off-heap overhead: budget 25–33% extra on top of the heap size, so a 12GB heap calls for roughly 15 to 16GB of physical memory.

Details → Configuration

Recipes use temporary storage by default¶

When Quine launches a recipe, it creates a temporary data store in the system temp directory. Each subsequent launch replaces the previous data. Use --force-config with a persistent data path to retain data between runs.

Details → Recipes Tutorial