Diagnosing Bottlenecks¶
Quine is a fully backpressured system. When one component can't keep up, it slows down upstream components rather than dropping data. This makes the system resilient, but it also means that a bottleneck in one area can manifest as slowness elsewhere.
This guide helps you identify where bottlenecks are occurring so you can focus optimization efforts effectively.
Common Symptoms¶
| Symptom | Possible Bottleneck |
|---|---|
| Low ingest rate despite available CPU | Standing queries or persistor |
| High CPU with low ingest rate | Inefficient queries or supernodes |
| Ingest rate drops periodically | Standing query backpressure |
| Standing query results delayed or dropped | Output destination or result queue overflow |
Key Metrics for Diagnosis¶
Ingest Rate¶
Metrics: ingest.{name}.count, ingest.{name}.bytes
The ingest rate shows how many records per second are being processed. Low ingest rates can have many causes, so use other metrics to narrow down the bottleneck.
Note: Ingest rate is reported as an exponentially weighted moving average, which can be volatile at the beginning and end of a stream. Allow at least 10 minutes for the rate to stabilize before drawing conclusions.
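If you are scripting around this metric, the sketch below shows one way to decide when the reported rate has settled enough to trust. The stabilization test, window size, tolerance, and sample values are illustrative assumptions, not part of Quine.

```python
def is_stable(samples, window=10, tolerance=0.05):
    """True once the last `window` rate samples vary by less than `tolerance`
    relative spread, i.e. the moving average has settled."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    lo, hi = min(recent), max(recent)
    return hi > 0 and (hi - lo) / hi <= tolerance

# Illustrative per-minute samples of the ingest.{name}.count rate (records/sec);
# the early values show the warm-up transient before the average settles.
rates = [120.0, 840.0, 2100.0, 4900.0, 5100.0, 5050.0, 5080.0,
         5060.0, 5075.0, 5070.0, 5065.0, 5072.0, 5068.0, 5071.0]
print(is_stable(rates))  # True: the last 10 samples agree within 5%
```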
Standing Query Backpressure Valve¶
Metric: shared.valve.ingest
This is a key diagnostic metric. When this gauge shows a non-zero value, it means ingest is being paused because the standing query result queue is filling up faster than results can be processed and delivered.
Results flow from the queue through the output query and then to the destination. The most common cause of backpressure is output queries that need optimization. If the output query performs expensive operations (such as additional graph traversals or lookups), it can become a bottleneck. The second most common cause is destination performance, including slow network connections, rate-limited APIs, or destinations at capacity.
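To automate this check, a minimal Python sketch follows. It assumes Quine's admin API serves the metric registry as JSON at GET /api/v1/admin/metrics on the host and port shown; the exact path and response shape can vary by version, so treat the fetch and parsing as assumptions and adjust to your deployment. The decision rule itself is the one stated above: any non-zero value of shared.valve.ingest means ingest is being paused by standing query output backpressure.

```python
import requests  # pip install requests

QUINE_URL = "http://localhost:8080"  # adjust to your deployment

def gauge_value(report, name):
    """Look up a gauge by name, tolerating either a dict keyed by metric name
    or a list of {"name": ..., "value": ...} objects in the JSON report."""
    gauges = report.get("gauges", [])
    if isinstance(gauges, dict):
        return gauges.get(name, {}).get("value")
    return next((g.get("value") for g in gauges if g.get("name") == name), None)

report = requests.get(f"{QUINE_URL}/api/v1/admin/metrics", timeout=5).json()
valve = gauge_value(report, "shared.valve.ingest") or 0

if valve:
    print(f"shared.valve.ingest = {valve}: ingest is paused -- review output"
          " query cost first, then destination throughput")
else:
    print("shared.valve.ingest = 0: standing query outputs are keeping up")
```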
Persistor Latency¶
Metrics: persistor.{query-type} timers
These metrics track how long persistence operations take. Key statistics to watch:
- avg (average): If high across all query types, indicates a general persistor bottleneck
- p95 (95th percentile): If p95 is high but avg is low, indicates occasional problematic queries, often caused by supernodes
- p99 (99th percentile): Can reveal rare but severe issues that p95 misses
When persistor latency is the bottleneck, the limiting resource is typically either I/O (disk throughput) or compute (CPU and memory on the persistor hosts).
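As a concrete illustration of how these statistics combine, here is a small Python sketch that applies the interpretation above to one timer's snapshot. The field names (mean, p95) and the 50 ms / 3x thresholds are assumptions chosen for the example, not official guidance.

```python
def classify_persistor_timer(stats):
    """Apply the avg/p95 interpretation to one persistor timer snapshot (ms)."""
    avg, p95 = stats["mean"], stats["p95"]
    if avg > 50:
        return "high average latency: general persistor bottleneck (check disk I/O, CPU, config)"
    if p95 > 3 * avg:
        return "high p95 with normal average: occasional slow operations, often supernodes"
    return "latency looks healthy"

# Illustrative snapshot: the average is fine but the tail is ~20x slower.
print(classify_persistor_timer({"mean": 4.0, "p95": 85.0}))
```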
Edge Count Histogram¶
Metrics: node.edge-counts.{bucket}
High counts in the larger buckets (2048-16383 or 16384-infinity) indicate the presence of supernodes. Supernodes are not inherently problematic, but they can cause performance issues in queries that traverse them.
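A quick way to act on this is to sum the high buckets and flag any non-zero total. The bucket names below follow the ranges mentioned above; confirm the exact metric names your Quine version emits.

```python
HIGH_BUCKETS = ("node.edge-counts.2048-16383", "node.edge-counts.16384-infinity")

def supernode_candidates(metrics):
    """Sum the counts in the two highest edge-count buckets."""
    return sum(int(metrics.get(bucket, 0)) for bucket in HIGH_BUCKETS)

# Illustrative histogram snapshot (counter name -> count).
snapshot = {
    "node.edge-counts.2048-16383": 41,
    "node.edge-counts.16384-infinity": 2,
}
n = supernode_candidates(snapshot)
print(f"{n} nodes in high edge-count buckets" + (" -- check for supernodes" if n else ""))
```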
Resource Utilization¶
Resource metrics (CPU, memory, network) are not reported by Quine and must be measured externally. See Operational Considerations for detailed guidance on resource planning. In general:
- High CPU utilization is normal and indicates good resource usage
- Low CPU utilization with low ingest rates suggests the bottleneck is elsewhere
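One way to combine these two observations is sketched below: pair an externally measured CPU figure (here via the psutil library, run on the Quine host) with the observed and expected ingest rates. The 50% threshold and the rates are illustrative assumptions.

```python
import psutil  # pip install psutil; run on the Quine host for a local reading

def interpret(cpu_percent, ingest_rate, expected_rate):
    """Cross-check externally measured CPU against the observed ingest rate."""
    if ingest_rate >= expected_rate:
        return "ingest is keeping up; high CPU on its own is not a problem"
    if cpu_percent < 50:
        return "low CPU and low ingest: the bottleneck is elsewhere (valve, persistor, network)"
    return "high CPU and low ingest: review ingest and standing query efficiency"

cpu = psutil.cpu_percent(interval=1)                             # external CPU measurement
print(interpret(cpu, ingest_rate=1200.0, expected_rate=5000.0))  # illustrative rates
```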
Identifying the Bottleneck¶
Step 1: Check Standing Query Backpressure¶
Start by checking shared.valve.ingest. If this metric shows non-zero values, standing query outputs are causing backpressure on ingest.
Next steps: First, review output query complexity and optimize any expensive operations. Second, check output destination throughput and capacity.
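To make "expensive operations" concrete, the sketch below contrasts two output queries, held as Python strings for comparison. The $that parameter, the data.id and meta.isPositiveMatch fields, and the CONNECTED_TO relationship are assumptions for illustration; the structural point is that the first query re-enters the graph and fans out on every standing query result, while the second only reshapes data the match already produced.

```python
# Expensive: re-enters the graph and performs a multi-hop traversal per result.
expensive_output = """
MATCH (n) WHERE id(n) = $that.data.id
MATCH (n)-[:CONNECTED_TO*1..3]->(m)
RETURN n, collect(m) AS neighborhood
"""

# Cheap: only reshapes data the standing query already produced.
cheap_output = """
RETURN $that.data.id AS id, $that.meta.isPositiveMatch AS matched
"""
```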
Step 2: Check Persistor Latency¶
If standing query backpressure is not the issue, check persistor latency metrics.
High average latency across all operations: The persistor is generally overloaded. Consider:
- Adding persistor resources
- Reviewing persistor configuration (journaling, snapshot settings)
- Checking persistor host disk I/O and CPU
High p95/p99 but normal average: Occasional operations are slow, often due to supernodes. Check the edge count histogram for confirmation.
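The same decision can be written as a small triage helper that looks across all persistor timers at once. The timer names, snapshot shape, and thresholds below are illustrative assumptions.

```python
def triage_persistor(timers):
    """Step 2 as code: distinguish general overload from supernode tail latency.
    `timers` maps persistor timer names to {"mean": ms, "p95": ms} snapshots."""
    slow_avg = [name for name, s in timers.items() if s["mean"] > 50]
    slow_tail = [name for name, s in timers.items() if s["p95"] > 3 * s["mean"]]
    if timers and len(slow_avg) == len(timers):
        return "high average latency on every operation: persistor overloaded"
    if slow_tail:
        return f"tail latency on {slow_tail}: check the edge-count histogram for supernodes"
    return "persistor latency within normal range"

# Illustrative timer names and values.
print(triage_persistor({
    "persistor.journal-read":  {"mean": 3.1, "p95": 6.0},
    "persistor.journal-write": {"mean": 4.2, "p95": 130.0},
}))
```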
Step 3: Check Resource Utilization¶
If neither standing queries nor the persistor appear to be the bottleneck:
- Low CPU on Quine hosts: Check network throughput.
- High CPU on Quine hosts: Review ingest query efficiency. Queries that don't anchor by ID cause expensive all-node scans. See Troubleshooting Queries for detailed query debugging techniques.
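To show the difference, the sketch below holds two ingest queries as Python strings: the first matches on a property and forces an all-node scan for every record, while the second anchors on an id computed with idFrom from the incoming record. $that (the deserialized record) and idFrom are Quine ingest conventions; the labels and field names here are made up for the example.

```python
# Unanchored: matching on a property makes Quine consider every node
# for each incoming record.
unanchored_ingest = """
MATCH (u) WHERE u.userId = $that.userId
SET u.lastSeen = $that.timestamp
"""

# Anchored: idFrom() derives a deterministic node id from the record,
# so the query goes straight to the right node.
anchored_ingest = """
MATCH (u) WHERE id(u) = idFrom('user', $that.userId)
SET u.userId = $that.userId, u.lastSeen = $that.timestamp
"""
```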
Step 4: Check for Supernodes¶
If the edge count histogram shows significant counts in the high buckets, supernodes may be impacting performance. Supernodes affect:
- Query performance when traversing edges
- Persistor performance when reading/writing node state
- Memory usage for caching node state
Quine Enterprise
Quine Enterprise includes supernode mitigation capabilities for production deployments. Compare editions.
Quick Reference¶
| Metric | Normal | Indicates Problem |
|---|---|---|
| shared.valve.ingest | 0 | Non-zero values indicate SQ output backpressure |
| persistor.*.avg | < 10ms | > 50ms suggests persistor bottleneck |
| persistor.*.p95 | Similar to avg | Much higher than avg suggests supernodes |
| node.edge-counts.16384-infinity | 0 or low | High values indicate supernodes |
| standing-queries.dropped.{name} | 0 | Non-zero means results are being lost |