Persistors¶
The graph operates in memory, but saves its data to disk. Because data is durably stored, Quine does not need to define any time-windows for matching up data in memory. This data is managed automatically so that it is transparent to the operation of the graph, and saved in a way that is fast for streaming data.
Quine's primary unit of data is the graph node. A node is defined by its ID and serves as a collection of properties and edges. Changes to that collection are what Quine saves to disk. These changes, sometimes called "deltas", are very small units of data appended to an append-only log, a strategy known as event sourcing. The persistor is the agent that stores and retrieves event-sourced data.
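As a rough illustration of event sourcing, the sketch below rebuilds a node's properties and edges by replaying its log of deltas in order. The event classes and the `replay` helper are hypothetical names chosen for this example; they are not Quine's internal types or on-disk format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PropertySet:
    key: str
    value: object


@dataclass(frozen=True)
class PropertyRemoved:
    key: str


@dataclass(frozen=True)
class EdgeAdded:
    edge_type: str
    other: str


@dataclass(frozen=True)
class EdgeRemoved:
    edge_type: str
    other: str


@dataclass
class NodeState:
    properties: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)


def apply_event(state, event):
    """Apply one delta to the in-memory node state."""
    if isinstance(event, PropertySet):
        state.properties[event.key] = event.value
    elif isinstance(event, PropertyRemoved):
        state.properties.pop(event.key, None)
    elif isinstance(event, EdgeAdded):
        state.edges.add((event.edge_type, event.other))
    elif isinstance(event, EdgeRemoved):
        state.edges.discard((event.edge_type, event.other))
    else:
        raise TypeError(f"unknown event: {event!r}")
    return state


def replay(journal):
    """Rebuild a node from nothing by replaying its append-only log in order."""
    state = NodeState()
    for event in journal:
        apply_event(state, event)
    return state


# The journal only ever grows; the current state is derived from it.
journal = [
    PropertySet("name", "Ada"),
    EdgeAdded("KNOWS", "node-2"),
    PropertySet("name", "Ada Lovelace"),
    EdgeAdded("KNOWS", "node-3"),
    EdgeRemoved("KNOWS", "node-2"),
]
state = replay(journal)
print(state.properties, state.edges)
```

Because each event is small and only appended, writes stay cheap, and replaying a prefix of the log yields the node as it was at an earlier point.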
The format of data saved on disk is conceptually a key-value pair, where the key is a node ID and the value is the append-only log of changes to the properties and edges of that node. While this is conceptually the storage format, in practice the choice of data storage medium can affect the actual format stored on disk. For instance, when using Cassandra as the backing store for the persistor, the key is a compound value of both the Node ID ("QuineID") and a unique timestamp. (Note: timestamps in Quine are a careful combination of human-readable wall-clock time and several variations of logical time.)
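To make the compound-key idea concrete, here is a minimal sketch of a sorted key-value store keyed by `(node_id, timestamp)`. It assumes timestamps are plain integers, which is a simplification of Quine's actual timestamps, and the `SortedJournalStore` class is a hypothetical stand-in rather than the Cassandra schema.

```python
import bisect


class SortedJournalStore:
    """Toy store whose keys are (node_id, timestamp) tuples kept in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of (node_id, timestamp)
        self._values = {}  # (node_id, timestamp) -> event

    def append(self, node_id, timestamp, event):
        key = (node_id, timestamp)
        if key in self._values:
            raise ValueError("timestamps must be unique per node")
        bisect.insort(self._keys, key)
        self._values[key] = event

    def events_for(self, node_id, up_to=None):
        """Return one node's events in time order, optionally only up to a timestamp."""
        # (node_id,) sorts before every (node_id, timestamp) key for that node.
        lo = bisect.bisect_left(self._keys, (node_id,))
        upper = float("inf") if up_to is None else up_to
        hi = bisect.bisect_right(self._keys, (node_id, upper))
        return [self._values[k] for k in self._keys[lo:hi]]


store = SortedJournalStore()
store.append("node-1", 10, "set name=Ada")
store.append("node-2", 11, "set name=Bob")
store.append("node-1", 12, "add edge KNOWS->node-2")
store.append("node-1", 15, "remove name")

print(store.events_for("node-1"))
print(store.events_for("node-1", up_to=12))
```

Keeping all of a node's events adjacent in key order means the full log, or the log up to some moment, can be read with a single range scan.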
Quine Persistence Event Configuration¶
Alongside the persistence store configuration, the persistence section of the config has settings that control when data is saved.
# configuration for which data to save about nodes and when to do so
persistence {
  # whether to log updates in between writing node snapshots. Without
  # this, data may be lost in the event of a crash, and historical
  # queries will have less fine-grained history to work with, but
  # performance will be greater, and disk usage will be less.
  journal-enabled = true

  # one of [on-node-sleep, on-node-update, never]. When to save a
  # snapshot of a node's current state, including any SingleId Standing
  # Queries registered on the node
  snapshot-schedule = on-node-sleep

  # whether only a single snapshot should be retained per-node. If false,
  # one snapshot will be saved at each timestamp against which a
  # historical query is made
  snapshot-singleton = false

  # when to save Standing Query partial result (only applies for the
  # `MultipleValues` mode -- `SingleId` Standing Queries always save when
  # a node saves a snapshot, regardless of this setting)
  standing-query-schedule = on-node-sleep
}
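The trade-off between the journal and snapshots can be sketched as follows. This is a simplified model under stated assumptions, not Quine's restore logic: timestamps are integers, a snapshot is a full copy of a node's properties, and the `restore` function is a hypothetical helper.

```python
def restore(snapshots, journal, at_time):
    """Rebuild a node's properties as of at_time.

    snapshots: list of (timestamp, properties) sorted by timestamp.
    journal:   list of (timestamp, key, value) sorted by timestamp;
               a value of None means the property was removed.
    """
    state, since = {}, float("-inf")
    # Start from the latest snapshot taken at or before the requested time.
    for ts, props in snapshots:
        if ts <= at_time:
            state, since = dict(props), ts
    # Then replay only the journal entries recorded after that snapshot.
    for ts, key, value in journal:
        if since < ts <= at_time:
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state


snapshots = [(10, {"name": "Ada"})]
journal = [
    (5, "name", "Ada"),
    (12, "age", 36),
    (14, "name", "Ada Lovelace"),
    (20, "age", None),
]

print(restore(snapshots, journal, 13))  # snapshot at 10 plus the update at 12
print(restore(snapshots, journal, 25))  # snapshot plus every later update
print(restore(snapshots, [], 25))       # no journal: only the snapshot survives
```

In this model, disabling the journal leaves only snapshot-level history, which matches the comment on `journal-enabled` about coarser historical queries and possible data loss after a crash.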
Local Persistors¶
RocksDB¶
RocksDB is a widely used implementation of a log-structured merge tree (LSMT). It is an ideal data store for Quine and is the default option for data storage locally.
Because RocksDB is the default, using it requires no configuration changes. To select RocksDB explicitly, use this setting:
quine.store.type=rocks-db
Warning
RocksDB is distributed by its authors as a binary artifact built for specific architectures and used from JVM applications through the Java Native Interface (JNI). If you try to start Quine with the default settings on an unsupported platform, you will get an error suggesting that you run with the option -Dquine.store.type=map-db to use MapDB instead.
MapDB¶
MapDB is an embedded database engine for the JVM that uses memory-mapped files. Since MapDB is written in a JVM language, it is included in Quine without any binary dependencies built for specific architectures. This makes MapDB the most portable option for a Persistor's data store.
MapDB does have some limitations, though. Memory-mapped files are generally limited to 2GB in size, and with MapDB, memory-mapped files larger than 2GB become very slow to use. Quine supports sharding the storage of a MapDB persistor into multiple files to work around this limitation. Even when sharded, however, memory-mapped files cause Quine to use off-heap memory, which in extreme circumstances can cause the process to use large amounts of RAM or lead to the operating system killing the process.
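The general idea behind sharding storage across several files can be sketched with a stable hash of the node ID. This is only an illustration of hash partitioning: the hash function, the file naming, and the `shard_for` and `shard_path` helpers are assumptions made for this example and do not describe how Quine assigns nodes to MapDB files.

```python
import hashlib


def shard_for(node_id: str, shard_count: int) -> int:
    """Map a node ID to a shard index using a hash that is stable across runs."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shard_path(node_id: str, shard_count: int, base: str = "quine") -> str:
    """Hypothetical file name for the shard that holds this node's data."""
    return f"{base}.part{shard_for(node_id, shard_count)}.db"


# Each node always lands in the same file, and files stay smaller on average.
for node_id in ["node-1", "node-2", "node-3"]:
    print(node_id, "->", shard_path(node_id, shard_count=4))
```

Spreading nodes over more files keeps each memory-mapped file smaller, though the total off-heap memory used across all files is not reduced by doing so.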
To use MapDB, set the following configuration setting:
quine.store.type=map-db
Remote Persistors¶
Cassandra¶
Apache Cassandra is a distributed NoSQL database. It is highly configurable and trusted by enterprise organizations around the world to manage very large amounts of data. Cassandra is an ideal data storage mechanism for a Quine persistor. With Cassandra, Quine instances can achieve the extremely high throughput, high availability, data replication, and failover strategies needed for production operation in the enterprise.
The Cassandra persistence config also connects Quine to Cassandra-compatible solutions like AstraDB and ScyllaDB. See the Cassandra Setup page for details on setting up and using Cassandra with Quine.
Migration¶
Each version of Quine is associated with a persistence version. The persistence version identifies the conventions used to map the graph to the model of the underlying persistor. Before it can become operational, Quine needs to modify the persisted data to match the conventions of the persistence version it will be using.
When possible, this will be done automatically. If manual intervention is required, a message will be logged describing what must be done to enable migrating persistence versions.
Persistence versions can only move from a lower version to a higher one. If there is more than one version between what is stored and what needs to be used by the running Quine instance, multiple migrations may be run in succession. The app version may advance without changing the persistence version.
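The forward-only, step-by-step behavior described above can be sketched like this. The sketch assumes integer persistence versions and a table of migration functions keyed by the version they upgrade from; the `migrate` function and both example migrations are hypothetical and do not reflect Quine's real version scheme or migration code.

```python
def migrate(stored_version, target_version, migrations, data):
    """Upgrade data one version at a time; never downgrade."""
    if stored_version > target_version:
        raise ValueError(
            f"cannot move from version {stored_version} down to {target_version}"
        )
    version, applied = stored_version, []
    while version < target_version:
        step = migrations.get(version)
        if step is None:
            # No automatic path: report what an operator would need to do.
            raise RuntimeError(
                f"no automatic migration from version {version}; "
                "manual intervention required"
            )
        data = step(data)
        applied.append((version, version + 1))
        version += 1
    return data, applied


def add_edge_list(data):
    # Hypothetical migration 1 -> 2: give every node record an "edges" list.
    return {n: {**rec, "edges": rec.get("edges", [])} for n, rec in data.items()}


def rename_props(data):
    # Hypothetical migration 2 -> 3: rename the "props" field to "properties".
    return {
        n: {("properties" if k == "props" else k): v for k, v in rec.items()}
        for n, rec in data.items()
    }


MIGRATIONS = {1: add_edge_list, 2: rename_props}

stored = {"node-1": {"props": {"name": "Ada"}}}
migrated, steps = migrate(1, 3, MIGRATIONS, stored)
print(steps)
print(migrated)
```

When the stored and target versions are equal, the loop does nothing, which corresponds to an app upgrade that leaves the persistence version unchanged.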
Backup and Export¶
Backup and export are delegated to the tools of the underlying persistor (e.g. Cassandra Backups).
Shutting down Quine before backing up is a simple way to ensure consistency, at the cost of downtime.