
Collected Metrics

Upgrading from a previous version?

If you are updating dashboards or alerts after an upgrade, see Upgrading for migration steps and examples.

We expose a large number of JVM and application metrics via the Dropwizard Metrics library.

They can be exported by periodically writing CSV files, by logging (via SLF4J), by sending them to InfluxDB, and/or via JMX. By default, only the JMX reporter is enabled. To enable or configure the others, see the comments on the metrics-reporters setting in the Config Ref Manual, specifically the part for each reporter type (jmx, csv, influxdb, or slf4j). Some metrics are also exposed as JSON on the Metrics HTTP endpoint: GET /api/v2/system/metrics.
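As a minimal sketch, a script can poll the JSON endpoint using only the Python standard library. The base URL http://localhost:8080 is an assumption (point it at wherever your instance's API listens), and because this page does not describe the response schema, the hypothetical fetch_metrics helper below returns the decoded JSON without assuming any particular structure.

```python
import json
import urllib.request

# Assumption: adjust this base URL to match your deployment.
DEFAULT_BASE_URL = "http://localhost:8080"
METRICS_PATH = "/api/v2/system/metrics"


def metrics_url(base_url: str = DEFAULT_BASE_URL) -> str:
    """Join the base URL and the metrics path, tolerating a trailing slash."""
    return base_url.rstrip("/") + METRICS_PATH


def fetch_metrics(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0):
    """Issue GET /api/v2/system/metrics and return the decoded JSON body.

    No response schema is assumed here; inspect the result before relying on keys.
    """
    request = urllib.request.Request(
        metrics_url(base_url), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, print(json.dumps(fetch_metrics(), indent=2)) dumps whatever the endpoint returns. Only some metrics are exposed this way, so a reporter is still the way to collect the full set.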

Available Metrics

The metrics that we explicitly measure in our code are as follows.

  • quine
    • shard.shard-{n}
      • sleep-counters: Counters that track the sleep cycle (in aggregate) of nodes on the shard
        • removed
        • slept-failure
        • slept-success
        • woken
      • sleep-timers: Timers that measure the duration of sleep and wake operations on nodes
        • slept
        • woken
      • nodes-evicted: Meter tracking node evictions from memory (only emitted when enableDebugMetrics is set)
      • unlikely: Counters that track occurrences of supposedly unlikely (and generally bad) code paths
        • wake-up-failed: Despite repeated attempts, we could not wake up the requested node.
        • wake-up-error: An unexpected error was encountered when attempting to wake up a node; will retry.
        • hard-limit-reached: A node was blocked from being woken up because the hard limit on the number of active nodes has been reached; will retry.
        • actor-name-reserved
        • incomplete-shutdown: A shard did not complete shutdown cleanly.
    • node: Bucketed counters
      • edge-counts: A counter for the number of edges on nodes, split into buckets
        • 1-7
        • 8-127
        • 128-2047
        • 2048-16383
        • 16384-infinity
      • property-counts: A counter for the number of properties on nodes, split into buckets
        • 1-7
        • 8-127
        • 128-2047
        • 2048-16383
        • 16384-infinity
      • property-sizes: A histogram of property sizes (in bytes) observed since startup
    • ingest.{ingest-name}
      • count: Number of records ingested
      • bytes: Number of bytes ingested (aggregate data payload size)
      • query: Timer measuring the duration of ingest query executions
      • deserialization: Timer measuring the duration of ingest record deserialization
    • standing-queries
      • results.{standing-query-name}: Meter of results that were produced for a named standing query on this member
      • dropped.{standing-query-name}: Counter of results dropped for a named standing query on this member because too many messages were already in flight when the standing query applied backpressure. This should be zero.
      • states.{standing-query-id}: Histogram of the size (in bytes) of persistent standing query states.
      • queue-time.{standing-query-name}: Timer measuring how long standing query results spend in the result queue before being accepted for processing
  • persistor: All are timers, except snapshot-sizes, which is a histogram.
    • get-journal: Measures how long it takes to query a node's journal from the persistor
    • get-latest-snapshot: Measures how long it takes to retrieve a node's snapshot from the persistor
    • persist-event: Measures how long it takes to persist a change to a node's state.
    • persist-snapshot: Measures how long it takes to persist a node's snapshot.
    • set-standing-query-state: Measures how long it takes to persist standing query state.
    • get-standing-query-states: Measures how long it takes to retrieve standing query states.
    • snapshot-sizes: A histogram that measures the serialized size (in bytes) of a node's persisted snapshot.
  • shard.shard-{n}
    • delivery-relay-deduplicated: Counter of deduplicated message deliveries on this shard.
  • shared
    • valve.{name}: A gauge representing how many operations are currently pausing an ingest due to backpressure.
  • cache
    • {context}.insert: Timer tracking insert operations into internal caches (e.g. ingest-XYZ-deduplication, http-webpage-serve).
  • node
    • mailbox-sizes: A counter for the sizes of message mailboxes on nodes, split into buckets
      • 1-7
      • 8-127
      • 128-2047
      • 2048-16383
      • 16384-infinity
  • dgn-reg
    • count: Gauge measuring the number of in-memory registered DomainGraphNodes.
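The edge-counts, property-counts, and mailbox-sizes counters all share the same five buckets. As an illustration only (a hypothetical Python helper, not Quine's implementation), the sketch below maps a per-node count onto those bucket labels. It assumes both bounds of each range are inclusive, as written above, and that a count of 0 falls into no bucket.

```python
from typing import Iterable, Optional

# Bucket ranges as listed above: (inclusive lower bound, inclusive upper bound, label).
# An upper bound of None stands for "infinity".
BUCKETS = [
    (1, 7, "1-7"),
    (8, 127, "8-127"),
    (128, 2047, "128-2047"),
    (2048, 16383, "2048-16383"),
    (16384, None, "16384-infinity"),
]


def bucket_for(value: int) -> Optional[str]:
    """Return the bucket label for a count, or None if it is below every bucket.

    Assumption: a count of 0 (e.g. a node with no edges) is not counted.
    """
    for low, high, label in BUCKETS:
        if value >= low and (high is None or value <= high):
            return label
    return None


def bucket_histogram(values: Iterable[int]) -> dict:
    """Tally a collection of per-node counts into the labelled buckets."""
    counts = {label: 0 for _, _, label in BUCKETS}
    for v in values:
        label = bucket_for(v)
        if label is not None:
            counts[label] += 1
    return counts
```

For example, bucket_for(100) returns "8-127", and bucket_histogram([3, 100, 5000, 0]) tallies one value each into 1-7, 8-127, and 2048-16383 while skipping the 0.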
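Because the standing-queries dropped counters should stay at zero and the unlikely counters track code paths that should rarely run, both are natural alerting targets. The sketch below is a hypothetical helper that assumes you already have a flat mapping from full metric name to current counter value (collected from whichever reporter you enabled), and that full names are formed by joining the levels of the tree above with dots, for example shard.shard-0.unlikely.wake-up-failed. Both the input shape and the naming scheme are assumptions and may differ by reporter and version.

```python
import re
from typing import Dict, List

# Assumed name patterns, derived from the metric tree above. A leading prefix
# (such as one added by a reporter) is tolerated by the (^|\.) anchor.
_ALERT_PATTERNS = [
    re.compile(r"(^|\.)standing-queries\.dropped\."),  # should always be zero
    re.compile(r"(^|\.)shard-\d+\.unlikely\."),        # rare, generally bad code paths
]


def nonzero_alert_counters(counters: Dict[str, int]) -> List[str]:
    """Return, sorted, the names of 'should be zero' counters that are non-zero."""
    return sorted(
        name
        for name, value in counters.items()
        if value != 0 and any(p.search(name) for p in _ALERT_PATTERNS)
    )
```

An empty result means nothing needs attention; any returned name points at a standing query that is shedding results or a shard that hit an unlikely code path.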

Other libraries we use also export metrics through this mechanism. For example, the Cassandra client (the DataStax Java driver) can report metrics about its interaction with the Cassandra server. These metrics are optional and can be enabled in your config file; see https://docs.datastax.com/en/developer/java-driver/4.17/manual/core/metrics/#enabling-specific-driver-metrics.