Ethereum Tag Propagation
Full Recipe¶
Shared by: Ethan Bell
This recipe models data from the Ethereum blockchain. Any account can be flagged as tainted, causing a tainted
tag to propagate through the graph and track the flow of transactions out of the flagged accounts.
Ethereum Tag Propagation Recipe
Scenario¶
Newly mined Ethereum transaction metadata is imported via a Server-Sent Events data source. Transactions are grouped by the block in which they were mined, then imported into the graph. Each wallet address is represented by a node, linked by an edge to each transaction sent or received by that account and to any blocks mined by that account. Quick queries allow marking an account as "tainted". The tainted flag is propagated along outgoing transaction paths via Standing Queries, recording the smallest degree of separation between a tainted source and an account receiving a transaction.
Note
The Ethereum diamond logo is property of the Ethereum Foundation, used under the terms of the Creative Commons Attribution 3.0 License.
Sample Data¶
Sample data is continuously sampled from the Ethereum blockchain and emitted as Server-Sent Events for use in this demo.
How it Works¶
The recipe installs two ingest queries, auto-named INGEST-1 and INGEST-2. The INGEST-1 query processes blocks, and INGEST-2 processes mined transactions. In both queries, idFrom is used to identify nodes from unique identifiers present in the dataset. For accounts, the address is the identifier; for blocks, the block hash is the identifier; and so on. Ethereum data uses hexadecimal strings for identifiers, sometimes with a built-in capitalization checksum. This means the address 0x19975E29111a6c85E282eBe409C272c15492c6Ad is the same address as 0x19975e29111a6c85e282ebe409c272c15492c6ad, just written slightly differently. To account for these variations in capitalization, toLower is used to convert the identifier to a consistent lower-case representation before resolving an id.
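The effect of lower-casing before resolving an id can be sketched in Python. The id_from function below is a hypothetical stand-in, not Quine's idFrom: it uses a name-based UUID only to show that a deterministic function of the normalized inputs maps both spellings of an address to the same node.

```python
import uuid

# Hypothetical stand-in for idFrom: any deterministic function of the inputs
# illustrates the idea. Quine's actual ID scheme is not reproduced here.
def id_from(kind: str, identifier: str) -> uuid.UUID:
    # Lower-case first so checksum-capitalized and plain hex forms agree.
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{kind}:{identifier.lower()}")

checksummed = "0x19975E29111a6c85E282eBe409C272c15492c6Ad"
lowercase = "0x19975e29111a6c85e282ebe409c272c15492c6ad"

# Both spellings resolve to one account id.
same_node = id_from("account", checksummed) == id_from("account", lowercase)
# The kind prefix keeps an account and a block with the same hex apart.
different_kind = id_from("account", lowercase) == id_from("block", lowercase)
print(same_node, different_kind)  # True False
```

Without the lower-casing step, the two spellings would resolve to two separate account nodes and the transaction history of one wallet would be split between them.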
INGEST-1¶
The INGEST-1 query processes streaming block_head events like:
id: 14566607_head
event: block_head
data: {
"number": 14566607,
"hash": "0xf3dafdda16a884f6ff2b1b0c0325eaadc70db022363e3af74ab5994f8cbc1f12",
"parentHash": "0xcd859249e97684f319173c284314307a11deaa2a708c8c5fcf377971e09abb01",
"sha3Uncles": "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
"logsBloom": "0x0",
"transactionsRoot": "0xa77b91fc4ee74bc1df28019e898a4ba17dd87fcc41c633cab25b4909ee56a60a",
"stateRoot": "0xf7869b706a212bfa504520674c3ef3350b187d31ef207b155fa548a4e59169df",
"receiptsRoot": "0x6669147c87b5cc857801372bed55ab6ddf3474d935b2b4e3b1ee1b95f4dc357b",
"miner": "0x829BD824B016326A401d083B33D092293333A830",
"difficulty": "13384256520560135",
"extraData": "0xe4b883e5bda9e7a59ee4bb99e9b1bc4a1621",
"gasLimit": 30029295,
"gasUsed": 3117128,
"timestamp": 1649710008,
"baseFeePerGas": "0xfcf7d67a0",
"nonce": "0xc1a22f3db05412ca",
"mixHash": "0xfaafcc9e2be300ba795954bed57a38e415330e6131e48e58770e8e678a16e869"
}
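For readers who want to inspect such events outside Quine, here is a minimal Python sketch of parsing one Server-Sent Events frame. It assumes each data line on the wire carries its own data: prefix (the sample above is pretty-printed for reading), and the frame below is a shortened, hypothetical excerpt of that sample.

```python
import json

def parse_sse_event(frame: str) -> dict:
    """Parse one Server-Sent Events frame into its id, event and data fields."""
    fields = {"data": []}
    for line in frame.splitlines():
        if not line or line.startswith(":"):
            continue  # blank lines and comment lines carry no field
        name, _, value = line.partition(":")
        value = value[1:] if value.startswith(" ") else value
        if name == "data":
            fields["data"].append(value)  # multiple data lines are joined below
        else:
            fields[name] = value
    fields["data"] = json.loads("\n".join(fields["data"]))
    return fields

# Shortened, hypothetical excerpt of the block_head sample.
frame = (
    "id: 14566607_head\n"
    "event: block_head\n"
    'data: {"number": 14566607, "miner": "0x829BD824B016326A401d083B33D092293333A830"}\n'
)
event = parse_sse_event(frame)
print(event["event"], event["data"]["number"])  # block_head 14566607
```

The parsed data object corresponds to what the ingest query refers to as $that.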
The ingest query identifies the (BA), (minerAcc), (blk), and (parentBlk) nodes and loads them into the graph.
- format:
    query: |-
      MATCH (BA), (minerAcc), (blk), (parentBlk)
      WHERE
        id(blk) = idFrom('block', toLower($that.hash))
        AND id(parentBlk) = idFrom('block', toLower($that.parentHash))
        AND id(BA) = idFrom('block_assoc', toLower($that.hash))
        AND id(minerAcc) = idFrom('account', toLower($that.miner))
      CREATE
        (minerAcc)<-[:mined_by]-(blk)-[:header_for]->(BA),
        (blk)-[:preceded_by]->(parentBlk)
      SET
        BA:block_assoc,
        BA.number = $that.number,
        BA.hash = $that.hash,
        blk:block,
        blk = $that,
        minerAcc:account,
        minerAcc.address = $that.miner
    type: CypherJson
  url: https://ethereum.demo.thatdot.com/blocks_head
  type: ServerSentEventsIngest
{
"format": {
"query": "MATCH (BA), (minerAcc), (blk), (parentBlk)\nWHERE\n id(blk) = idFrom('block', toLower($that.hash))\n AND id(parentBlk) = idFrom('block', toLower($that.parentHash))\n AND id(BA) = idFrom('block_assoc', toLower($that.hash))\n AND id(minerAcc) = idFrom('account', toLower($that.miner))\nCREATE\n (minerAcc)<-[:mined_by]-(blk)-[:header_for]->(BA),\n (blk)-[:preceded_by]->(parentBlk)\nSET\n BA:block_assoc,\n BA.number = $that.number,\n BA.hash = $that.hash,\n blk:block,\n blk = $that,\n minerAcc:account,\n minerAcc.address = $that.miner",
"type": "CypherJson"
},
"url": "https://ethereum.demo.thatdot.com/blocks_head",
"type": "ServerSentEventsIngest"
}
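As a reading aid, the graph shape that INGEST-1 builds can be sketched in plain Python. Node keys here are hypothetical (kind, lower-cased hex) pairs standing in for idFrom results, and the hashes are shortened placeholders rather than real values.

```python
def block_edges(block: dict) -> list:
    """Return the (source, edge label, target) triples INGEST-1 creates."""
    # Node keys mirror the idFrom arguments: a kind plus a lower-cased hex id.
    blk = ("block", block["hash"].lower())
    parent_blk = ("block", block["parentHash"].lower())
    ba = ("block_assoc", block["hash"].lower())
    miner_acc = ("account", block["miner"].lower())
    return [
        (blk, "mined_by", miner_acc),
        (blk, "header_for", ba),
        (blk, "preceded_by", parent_blk),
    ]

# Two consecutive, hypothetical block heads with shortened hashes.
parent = {"hash": "0xCD85", "parentHash": "0xAA01", "miner": "0x829B"}
child = {"hash": "0xF3DA", "parentHash": "0xCD85", "miner": "0x829B"}

parent_edges = block_edges(parent)
child_edges = block_edges(child)
for source, label, target in child_edges:
    print(source, label, target)
```

Because the parent key is derived from parentHash alone, the child's preceded_by edge lands on the same node that the parent's own event populates, regardless of which event arrives first.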
INGEST-2¶
The INGEST-2 query receives tx_mined events like:
id: 14566637: 0
event: tx_mined
data: {
"blockHash": "0x0d7782556aef00f1391a05a18ab229a70720780fe3c92eaff74738dee59649d0",
"blockNumber": 14566637,
"from": "0x19975E29111a6c85E282eBe409C272c15492c6Ad",
"gas": 42105,
"gasPrice": "203940950410",
"hash": "0x470294af9453f2cd1ec084456328da5c613585974e838fa088cef27246b2481e",
"input": "0x",
"nonce": 1,
"r": "0x8b52f40f28db1627e82fea7352f6d2ba1133dcac081b6939bd03ff397370586d",
"s": "0x8e7b2c69b1684873156090f238d42aad2c14315a08551a57dc5ed1aa45f0a76",
"to": "0x732Ec041e4Dc8c01B541B237dE5Ce794c51cF838",
"transactionIndex": 0,
"type": "0x0",
"v": "0x26",
"value": "168930787638413525"
}
The ingest query identifies the (BA), (toAcc), (fromAcc), and (tx) nodes and loads them into the graph.
- format:
    query: |-
      WITH true AS validTransactionRecord WHERE $that.to IS NOT NULL AND $that.from IS NOT NULL
      MATCH (BA), (toAcc), (fromAcc), (tx)
      WHERE
        id(BA) = idFrom('block_assoc', toLower($that.blockHash))
        AND id(toAcc) = idFrom('account', toLower($that.to))
        AND id(fromAcc) = idFrom('account', toLower($that.from))
        AND id(tx) = idFrom('transaction', toLower($that.hash))
      CREATE
        (tx)-[:defined_in]->(BA),
        (tx)-[:from]->(fromAcc),
        (tx)-[:to]->(toAcc)
      SET
        tx:transaction,
        BA:block_assoc,
        toAcc:account,
        fromAcc:account,
        tx = $that,
        fromAcc.address = $that.from,
        toAcc.address = $that.to
    type: CypherJson
  url: https://ethereum.demo.thatdot.com/mined_transactions
  type: ServerSentEventsIngest
{
"format": {
"query": "WITH true AS validTransactionRecord WHERE $that.to IS NOT NULL AND $that.from IS NOT NULL\nMATCH (BA), (toAcc), (fromAcc), (tx)\nWHERE\n id(BA) = idFrom('block_assoc', toLower($that.blockHash))\n AND id(toAcc) = idFrom('account', toLower($that.to))\n AND id(fromAcc) = idFrom('account', toLower($that.from))\n AND id(tx) = idFrom('transaction', toLower($that.hash))\nCREATE\n (tx)-[:defined_in]->(BA),\n (tx)-[:from]->(fromAcc),\n (tx)-[:to]->(toAcc)\nSET\n tx:transaction,\n BA:block_assoc,\n toAcc:account,\n fromAcc:account,\n tx = $that,\n fromAcc.address = $that.from,\n toAcc.address = $that.to",
"type": "CypherJson"
},
"url": "https://ethereum.demo.thatdot.com/mined_transactions",
"type": "ServerSentEventsIngest"
}
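The leading WITH ... WHERE guard in INGEST-2 is easy to miss, so here is a small Python sketch of the same logic. It assumes the guard exists to skip records with a missing party (on Ethereum, a contract-creation transaction has no "to" address); node keys and hashes are hypothetical placeholders.

```python
from typing import Optional

def transaction_edges(tx: dict) -> Optional[list]:
    """Return the triples INGEST-2 creates, or None for a skipped record."""
    # Mirrors the WITH ... WHERE guard: both parties must be present.
    if tx.get("to") is None or tx.get("from") is None:
        return None
    tx_node = ("transaction", tx["hash"].lower())
    ba = ("block_assoc", tx["blockHash"].lower())
    from_acc = ("account", tx["from"].lower())
    to_acc = ("account", tx["to"].lower())
    return [
        (tx_node, "defined_in", ba),
        (tx_node, "from", from_acc),
        (tx_node, "to", to_acc),
    ]

# Hypothetical records with shortened hashes.
transfer = {"hash": "0x4702", "blockHash": "0x0D77", "from": "0x1997", "to": "0x732E"}
creation = {"hash": "0x9ABC", "blockHash": "0x0D77", "from": "0x1997", "to": None}

transfer_edges = transaction_edges(transfer)
creation_edges = transaction_edges(creation)
print(transfer_edges)
print(creation_edges)  # None
```

Note that the block_assoc key is built from blockHash, the same value INGEST-1 uses for its (BA) node, which is how transactions attach to the block they were mined in.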
Running the Recipe¶
❯ java -jar quine-1.7.3.jar -r ethereum.yaml
Graph is ready
Running Recipe: Ethereum Tag Propagation
Using 6 node appearances
Using 7 quick queries
Using 2 sample queries
Running Standing Query STANDING-1
Running Ingest Stream INGEST-1
Running Ingest Stream INGEST-2
Quine web server available at http://localhost:8080
Observe that Quine is running in the terminal window and that the ingest queries are receiving data.
| => STANDING-1 count 0
| => INGEST-1 status is running and ingested 485
| => INGEST-2 status is running and ingested 34820
Reviewing chains¶
The nodes appearing in your graph are from the live Ethereum blockchain. They will continue to stream in as long as Quine is running the recipe.
Start exploring the graph by pulling a few recent blocks from the blockchain with the Recently Accessed Blocks sample query. Select the sample query in the query bar, then click the Query button. The query returns a sub-graph of recent blocks, each linked to the block that preceded it.
Take a moment to inspect a couple of the blocks to see the data stored as properties.
Click back into the query bar, clear the query, then submit the Sent and Received ETH sample query to see accounts that have sent and received transactions.
This query finds a series of Wei transactions chained from account to account. Arrange the graph so that you can see all of the nodes. Right-click on the node at the head of the chain and select "Outgoing Transactions" to create a synthetic edge between the accounts. Create a second synthetic edge between the second and third accounts.
Tip
Hold shift while moving a node to lock its position in place.
Taint a Node¶
Right-click on the origin node again and select "Mark as Tainted." This adds a tainted property to the node and sets it to a value of 0. A node with tainted=0 is a source of taint in our graph.
Notice that you begin to receive updates in the terminal window where you launched Quine. The Standing Query from the recipe produces these notices; let's look at it now.
A Standing Query is composed of two parts: a pattern query that detects a sub-graph shape, and an output query that acts on the matched sub-graph.
Standing Query
Pattern¶
The Cypher pattern query is evaluated continuously against the stream of data, looking for a match. When it matches, it triggers the output query to process the event.
Our standing query is always looking for tainted nodes via the existence of a tainted property.
MATCH
  (tainted:account)<-[:from]-(tx:transaction)-[:to]->(otherAccount:account),
  (tx)-[:defined_in]->(ba:block_assoc)
WHERE
  tainted.tainted IS NOT NULL
RETURN
  id(tainted) AS accountId,
  tainted.tainted AS oldTaintedLevel,
  id(otherAccount) AS otherAccountId
The results of the match pattern are sent to the output query.
The output query acts on the match to propagate the tainted tag. The value of tainted is the number of hops on the shortest known path from a tainted source.
Output¶
MATCH (tainted), (otherAccount)
WHERE
  tainted <> otherAccount
  AND id(tainted) = $that.data.accountId
  AND id(otherAccount) = $that.data.otherAccountId
WITH *, coll.min([($that.data.oldTaintedLevel + 1), otherAccount.tainted]) AS newTaintedLevel
SET otherAccount.tainted = newTaintedLevel
RETURN
  strId(tainted) AS taintedSource,
  strId(otherAccount) AS newlyTainted,
  newTaintedLevel
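The update rule in the output query can be illustrated with a small Python sketch. It assumes a missing tainted value on the receiving account is ignored by the minimum, which is what the query appears to rely on. The account names and transfers are hypothetical, and the loop handles each transfer once in arrival order; it does not model how Quine re-evaluates matches.

```python
from typing import Optional

def new_tainted_level(old_level: int, current: Optional[int]) -> int:
    """Level for the receiving account after a transaction from a tainted one."""
    candidate = old_level + 1
    # Assumption: a missing (null) level on the receiver is ignored by the minimum.
    return candidate if current is None else min(candidate, current)

# Account A was marked as tainted by hand, so it is a source (level 0).
levels = {"A": 0}
# Hypothetical transfers in arrival order: (sender, receiver).
transfers = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
for sender, receiver in transfers:
    # Only tainted senders propagate, and never to themselves.
    if sender != receiver and sender in levels:
        levels[receiver] = new_tainted_level(levels[sender], levels.get(receiver))
print(levels)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```

Account C is first reached through B at level 2, then lowered to 1 when the direct transfer from A arrives, which is the "shortest path wins" behavior that the minimum provides.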
A standing query is capable of sending notifications using the andThen clause in the API.
"andThen": {
"logLevel": "Info",
"logMode": "Complete",
"type": "PrintToStandardOut"
}
In our case, the results from the match are printed to standard out. These are the messages that you now see in your terminal window.
2022-04-13 11:05:14,877 Standing query `propagate-tainted` match: {"meta":{"isPositiveMatch":true,"resultId":"e3aa2a7c-b246-4896-b8b7-d4fea9904c91"},"data":{"taintedSource":"ed9899b5-e8a8-3a0b-9785-824f2cb1781b","newlyTainted":"981c7ef9-319a-35ba-90dd-401faf5de6a6","newTaintedLevel":3}}
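If you want to post-process these messages, the JSON payload can be pulled out of each line. The following is a minimal Python sketch that assumes the line layout shown above, a text prefix ending in "match: " followed by a single JSON object; the parse_match helper is hypothetical.

```python
import json

line = (
    "2022-04-13 11:05:14,877 Standing query `propagate-tainted` match: "
    '{"meta":{"isPositiveMatch":true,"resultId":"e3aa2a7c-b246-4896-b8b7-d4fea9904c91"},'
    '"data":{"taintedSource":"ed9899b5-e8a8-3a0b-9785-824f2cb1781b",'
    '"newlyTainted":"981c7ef9-319a-35ba-90dd-401faf5de6a6","newTaintedLevel":3}}'
)

def parse_match(log_line: str) -> dict:
    # Everything after the first "match: " marker is the JSON payload.
    _, _, payload = log_line.partition("match: ")
    return json.loads(payload)

match = parse_match(line)
print(match["data"]["newlyTainted"], match["data"]["newTaintedLevel"])
```

The data object carries the three values returned by the output query: taintedSource, newlyTainted, and newTaintedLevel.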
Tainted Tag Propagation¶
Clear your explorer window using the '<<' button, then run the "Tainted Accounts" query. This query will find the original account or accounts responsible for the taint in the graph.
Right-click on a tainted account (appears fuchsia) and select "Outgoing Tainted Transactions" to find the accounts that this account tainted. Hover over the account to see the tainted=1 property that indicates that this account is one hop away from the source of the taint.
Continue to taint and explore the graph as more of the nodes become tainted.
At any time, you can issue the following query to report the number of tainted nodes in the graph.
MATCH (n)
WHERE n.tainted IS NOT NULL
RETURN DISTINCT n.tainted, count(n)
ORDER BY n.tainted