
Ethereum Tag Propagation

Full Recipe

Shared by: Ethan Bell

This recipe models data on the Ethereum blockchain. Any account can be flagged as tainted, causing a tainted tag to propagate through the graph and track the flow of transactions out of the flagged account.

Ethereum Tag Propagation Recipe
version: 1
title: Ethereum Tag Propagation
contributor: https://github.com/emanb29
summary: Ethereum Blockchain model with tag propagation
description: |-
  Models data on the thoroughgoing Ethereum blockchain using tag propagation
  to track the flow of transactions from flagged accounts.

  Newly-mined Ethereum transaction metadata is imported via a Server-Sent Events data
  source. Transactions are grouped by the block in which they were mined then imported
  into the graph. Each wallet address is represented by a node, linked by an edge
  to each transaction sent or received by that account, and linked by an edge to any
  blocks mined by that account. Quick queries allow marking an account as "tainted".
  The tainted flag is propagated along outgoing transaction paths via Standing Queries
  to record the least degree of separation between a tainted source and an account
  receiving a transaction. Canonical (node-provided) capitalization is maintained where
  possible, with `toLower` being used for idFrom-based ID resolution to reflect the
  case-insensitive nature of bytestrings (eg addresses, hashes) used by Ethereum.

  The Ethereum diamond logo is property of the Ethereum Foundation, used under the
  terms of the Creative Commons Attribution 3.0 License.
iconImage: https://i.imgur.com/sSl6BQd.png
ingestStreams:
  - format:
      query: |-
        MATCH (BA), (minerAcc), (blk), (parentBlk)
        WHERE
          id(blk) = idFrom('block', toLower($that.hash))
          AND id(parentBlk) = idFrom('block', toLower($that.parentHash))
          AND id(BA) = idFrom('block_assoc', toLower($that.hash))
          AND id(minerAcc) = idFrom('account', toLower($that.miner))
        CREATE
          (minerAcc)<-[:mined_by]-(blk)-[:header_for]->(BA),
          (blk)-[:preceded_by]->(parentBlk)
        SET
          BA:block_assoc,
          BA.number = $that.number,
          BA.hash = $that.hash,
          blk:block,
          blk = $that,
          minerAcc:account,
          minerAcc.address = $that.miner
      type: CypherJson
    url: https://ethereum.demo.thatdot.com/blocks_head
    type: ServerSentEventsIngest
  - format:
      query: |-
        MATCH (BA), (toAcc), (fromAcc), (tx)
        WHERE
          id(BA) = idFrom('block_assoc', toLower($that.blockHash))
          AND id(toAcc) = idFrom('account', toLower($that.to))
          AND id(fromAcc) = idFrom('account', toLower($that.from))
          AND id(tx) = idFrom('transaction', toLower($that.hash))
        CREATE
          (tx)-[:defined_in]->(BA),
          (tx)-[:from]->(fromAcc),
          (tx)-[:to]->(toAcc)
        SET
          tx:transaction,
          BA:block_assoc,
          toAcc:account,
          fromAcc:account,
          tx = $that,
          fromAcc.address = $that.from,
          toAcc.address = $that.to
      type: CypherJson
    url: https://ethereum.demo.thatdot.com/mined_transactions
    type: ServerSentEventsIngest
standingQueries:
  - pattern:
      query: |-
        MATCH
          (tainted:account)<-[:from]-(tx:transaction)-[:to]->(otherAccount:account),
          (tx)-[:defined_in]->(ba:block_assoc)
        WHERE
          tainted.tainted IS NOT NULL
        RETURN
          id(tainted) AS accountId,
          tainted.tainted AS oldTaintedLevel,
          id(otherAccount) AS otherAccountId
      type: Cypher
      mode: MultipleValues
    outputs:
      propagate-tainted:
        query: |-
          MATCH (tainted), (otherAccount)
          WHERE
            tainted <> otherAccount
            AND id(tainted) = $that.data.accountId
            AND id(otherAccount) = $that.data.otherAccountId
          WITH *, coll.min([($that.data.oldTaintedLevel + 1), otherAccount.tainted]) AS newTaintedLevel
          SET otherAccount.tainted = newTaintedLevel
          RETURN
            strId(tainted) AS taintedSource,
            strId(otherAccount) AS newlyTainted,
            newTaintedLevel
        type: CypherQuery
        andThen:
          type: PrintToStandardOut
nodeAppearances:
  - predicate:
      dbLabel: block
      propertyKeys: [ ]
      knownValues: { }
    icon: cube
    label:
      prefix: 'Block '
      key: number
      type: Property
  - predicate:
      dbLabel: transaction
      propertyKeys: [ ]
      knownValues: { }
    icon: cash
    label:
      prefix: 'Wei Transfer: '
      key: value
      type: Property
  - predicate:
      dbLabel: account
      propertyKeys: [ ]
      knownValues:
        tainted: 0
    icon: social-bitcoin
    label:
      prefix: 'Account '
      key: address
      type: Property
    color: '#fb00ff'
  - predicate:
      dbLabel: account
      propertyKeys:
        - tainted
      knownValues: { }
    icon: social-bitcoin
    label:
      prefix: 'Account '
      key: address
      type: Property
    color: '#c94d44'
  - predicate:
      dbLabel: account
      propertyKeys: [ ]
      knownValues: { }
    icon: social-bitcoin
    label:
      prefix: 'Account '
      key: address
      type: Property
  - predicate:
      dbLabel: block_assoc
      propertyKeys: [ ]
      knownValues: { }
    icon: ios-folder
    label:
      prefix: 'Transactions in block '
      key: number
      type: Property
quickQueries:
  - predicate:
      propertyKeys: [ ]
      knownValues: { }
    quickQuery:
      name: Adjacent Nodes
      querySuffix: MATCH (n)--(m) RETURN DISTINCT m
      queryLanguage: Cypher
      sort: Node
  - predicate:
      propertyKeys: [ ]
      knownValues: { }
      dbLabel: account
    quickQuery:
      name: Outgoing transactions
      querySuffix: MATCH (n)<-[:from]-(tx)-[:to]->(m:account) RETURN m
      edgeLabel: Sent Tx To
      queryLanguage: Cypher
      sort: Node
  - predicate:
      propertyKeys: [ ]
      knownValues: { }
      dbLabel: account
    quickQuery:
      name: Incoming transactions
      querySuffix: MATCH (n)<-[:to]-(tx)-[:from]->(m:account) RETURN m
      edgeLabel: Got Tx From
      queryLanguage: Cypher
      sort: Node
  - predicate:
      propertyKeys: [ ]
      knownValues: { }
    quickQuery:
      name: Refresh
      querySuffix: RETURN n
      queryLanguage: Cypher
      sort: Node
  - predicate:
      propertyKeys: [ ]
      knownValues: { }
      dbLabel: account
    quickQuery:
      name: Mark as tainted and refresh
      querySuffix:
        SET n.tainted = 0
        WITH id(n) AS nId
        CALL { WITH nId
          MATCH (n) WHERE id(n) = nId
          RETURN n
        }
        RETURN n
      queryLanguage: Cypher
      sort: Node
  - predicate:
      propertyKeys: [ ]
      knownValues: { }
      dbLabel: account
    quickQuery:
      name: Incoming tainted transactions
      querySuffix:
        MATCH (n)<-[:to]-(tx)-[:from]->(m:account)
        WHERE m.tainted IS NOT NULL AND m<>n RETURN m
      edgeLabel: Got Tainted From
      queryLanguage: Cypher
      sort: Node
  - predicate:
      propertyKeys: [ ]
      knownValues: { }
    quickQuery:
      name: Local Properties
      querySuffix: RETURN id(n), properties(n)
      queryLanguage: Cypher
      sort: Text
sampleQueries:
  - name: Get a few recently-accessed blocks
    query:
      CALL recentNodes(1000) YIELD node AS nId
      MATCH (n:block)
      WHERE id(n) = nId
      RETURN n
  - name: Find accounts that have both sent and received ETH
    query:
      MATCH (downstream:account)<-[:to]-(tx1)-[:from]->(a:account)<-[:to]-(tx2)-[:from]->(upstream:account)
      WHERE
        tx1<>tx2 AND upstream <> downstream
        AND upstream <> a AND downstream <> a
      RETURN downstream, tx1, a, tx2, upstream LIMIT 1


Scenario

Newly-mined Ethereum transaction metadata is imported via a Server-Sent Events data source. Transactions are grouped by the block in which they were mined then imported into the graph. Each wallet address is represented by a node, linked by an edge to each transaction sent or received by that account, and linked by an edge to any blocks mined by that account. Quick queries allow marking an account as "tainted". The tainted flag is propagated along outgoing transaction paths via Standing Queries to record the least degree of separation between a tainted source and an account receiving a transaction.

Note

The Ethereum diamond logo is property of the Ethereum Foundation, used under the terms of the Creative Commons Attribution 3.0 License.

Sample Data

Sample data is continuously sampled from the Ethereum blockchain and emitted as Server-Sent Events for use in this demo.

How it Works

The recipe installs two ingest queries. They are auto-named INGEST-1 and INGEST-2. The INGEST-1 query processes blocks, and INGEST-2 processes mined transactions. In both queries, idFrom is used to identify nodes from unique identifiers present in the dataset. For accounts, the address is the identifier; for blocks, the block hash is the identifier; etc. Ethereum data uses hexadecimal strings for identifiers, sometimes with a built-in capitalization checksum. This means the address 0x19975E29111a6c85E282eBe409C272c15492c6Ad is the same address as 0x19975e29111a6c85e282ebe409c272c15492c6ad, just written slightly differently. To account for these variations in the hex representation's capitalization, before resolving an id, toLower is used to convert the identifier to consistent lower-case representation.
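
The effect of lower-casing before ID resolution can be sketched in Python. This is only an illustration: `id_from` and `NAMESPACE` are hypothetical stand-ins, and Quine's real `idFrom` uses its own hashing scheme rather than `uuid5`. The point is that normalizing case first makes both spellings of an address resolve to the same node ID.

```python
import uuid

# Hypothetical namespace for this sketch; not a value used by Quine.
NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")


def id_from(kind: str, identifier: str) -> uuid.UUID:
    """Derive a deterministic ID from a node kind and a hex identifier."""
    # Lower-case first so checksum-capitalized and plain forms collide.
    return uuid.uuid5(NAMESPACE, f"{kind}:{identifier.lower()}")


checksummed = "0x19975E29111a6c85E282eBe409C272c15492c6Ad"
lowercase = "0x19975e29111a6c85e282ebe409c272c15492c6ad"

# Both spellings of the address map to one account ID.
same = id_from("account", checksummed) == id_from("account", lowercase)
```

Including the kind (`account`, `block`, and so on) in the hashed value keeps an account and a block from sharing an ID even if their hex strings were identical.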

INGEST-1

The INGEST-1 query processes streaming block_head events like:

id: 14566607_head
event: block_head
data: {
    "number": 14566607,
    "hash": "0xf3dafdda16a884f6ff2b1b0c0325eaadc70db022363e3af74ab5994f8cbc1f12",
    "parentHash": "0xcd859249e97684f319173c284314307a11deaa2a708c8c5fcf377971e09abb01",
    "sha3Uncles": "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
    "logsBloom": "0x0",
    "transactionsRoot": "0xa77b91fc4ee74bc1df28019e898a4ba17dd87fcc41c633cab25b4909ee56a60a",
    "stateRoot": "0xf7869b706a212bfa504520674c3ef3350b187d31ef207b155fa548a4e59169df",
    "receiptsRoot": "0x6669147c87b5cc857801372bed55ab6ddf3474d935b2b4e3b1ee1b95f4dc357b",
    "miner": "0x829BD824B016326A401d083B33D092293333A830",
    "difficulty": "13384256520560135",
    "extraData": "0xe4b883e5bda9e7a59ee4bb99e9b1bc4a1621",
    "gasLimit": 30029295,
    "gasUsed": 3117128,
    "timestamp": 1649710008,
    "baseFeePerGas": "0xfcf7d67a0",
    "nonce": "0xc1a22f3db05412ca",
    "mixHash": "0xfaafcc9e2be300ba795954bed57a38e415330e6131e48e58770e8e678a16e869"
}
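
For readers unfamiliar with the event format, here is a minimal Python sketch of splitting one such event into its fields. It assumes the JSON payload arrives on `data:` lines as it would on the wire (the listing above is pretty-printed), and `parse_sse_event` is a hypothetical helper, not part of Quine, which handles this step itself.

```python
import json


def parse_sse_event(raw: str) -> dict:
    """Split one SSE event into its id, event name, and decoded JSON data."""
    fields = {"data": []}
    for line in raw.splitlines():
        if not line or line.startswith(":"):
            continue  # blank lines and comment lines carry no field
        name, _, value = line.partition(":")
        # A single leading space after the colon is not part of the value.
        value = value[1:] if value.startswith(" ") else value
        if name == "data":
            fields["data"].append(value)
        else:
            fields[name] = value
    # Multiple data lines are joined with newlines before decoding.
    fields["data"] = json.loads("\n".join(fields["data"]))
    return fields


raw_event = (
    "id: 14566607_head\n"
    "event: block_head\n"
    'data: {"number": 14566607, "miner": "0x829BD824B016326A401d083B33D092293333A830"}\n'
)
event = parse_sse_event(raw_event)
```

The decoded `data` object corresponds to what the ingest query sees as `$that`.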

The ingest query identifies (BA), (minerAcc), (blk), and (parentBlk) nodes and loads them into the graph.

    - format:
        query: |-
          MATCH (BA), (minerAcc), (blk), (parentBlk)
          WHERE
            id(blk) = idFrom('block', toLower($that.hash))
            AND id(parentBlk) = idFrom('block', toLower($that.parentHash))
            AND id(BA) = idFrom('block_assoc', toLower($that.hash))
            AND id(minerAcc) = idFrom('account', toLower($that.miner))
          CREATE
            (minerAcc)<-[:mined_by]-(blk)-[:header_for]->(BA),
            (blk)-[:preceded_by]->(parentBlk)
          SET
            BA:block_assoc,
            BA.number = $that.number,
            BA.hash = $that.hash,
            blk:block,
            blk = $that,
            minerAcc:account,
            minerAcc.address = $that.miner
        type: CypherJson
      url: https://ethereum.demo.thatdot.com/blocks_head
      type: ServerSentEventsIngest
POST /api/v1/ingest/INGEST-1
{
  "format": {
    "query": "MATCH (BA), (minerAcc), (blk), (parentBlk)\nWHERE\n  id(blk) = idFrom('block', toLower($that.hash))\n  AND id(parentBlk) = idFrom('block', toLower($that.parentHash))\n  AND id(BA) = idFrom('block_assoc', toLower($that.hash))\n  AND id(minerAcc) = idFrom('account', toLower($that.miner))\nCREATE\n  (minerAcc)<-[:mined_by]-(blk)-[:header_for]->(BA),\n  (blk)-[:preceded_by]->(parentBlk)\nSET\n  BA:block_assoc,\n  BA.number = $that.number,\n  BA.hash = $that.hash,\n  blk:block,\n  blk = $that,\n  minerAcc:account,\n  minerAcc.address = $that.miner",
    "type": "CypherJson"
  },
  "url": "https://ethereum.demo.thatdot.com/blocks_head",
  "type": "ServerSentEventsIngest"
}

INGEST-2

The INGEST-2 query receives tx_mined events like:

id: 14566637: 0
event: tx_mined
data: {
    "blockHash": "0x0d7782556aef00f1391a05a18ab229a70720780fe3c92eaff74738dee59649d0",
    "blockNumber": 14566637,
    "from": "0x19975E29111a6c85E282eBe409C272c15492c6Ad",
    "gas": 42105,
    "gasPrice": "203940950410",
    "hash": "0x470294af9453f2cd1ec084456328da5c613585974e838fa088cef27246b2481e",
    "input": "0x",
    "nonce": 1,
    "r": "0x8b52f40f28db1627e82fea7352f6d2ba1133dcac081b6939bd03ff397370586d",
    "s": "0x8e7b2c69b1684873156090f238d42aad2c14315a08551a57dc5ed1aa45f0a76",
    "to": "0x732Ec041e4Dc8c01B541B237dE5Ce794c51cF838",
    "transactionIndex": 0,
    "type": "0x0",
    "v": "0x26",
    "value": "168930787638413525"
}

The ingest query identifies (BA), (toAcc), (fromAcc), and (tx) and loads them into the graph.

  - format:
      query: |-
        MATCH (BA), (toAcc), (fromAcc), (tx)
        WHERE
          id(BA) = idFrom('block_assoc', toLower($that.blockHash))
          AND id(toAcc) = idFrom('account', toLower($that.to))
          AND id(fromAcc) = idFrom('account', toLower($that.from))
          AND id(tx) = idFrom('transaction', toLower($that.hash))
        CREATE
          (tx)-[:defined_in]->(BA),
          (tx)-[:from]->(fromAcc),
          (tx)-[:to]->(toAcc)
        SET
          tx:transaction,
          BA:block_assoc,
          toAcc:account,
          fromAcc:account,
          tx = $that,
          fromAcc.address = $that.from,
          toAcc.address = $that.to
      type: CypherJson
    url: https://ethereum.demo.thatdot.com/mined_transactions
    type: ServerSentEventsIngest
POST /api/v1/ingest/INGEST-2
{
  "format": {
    "query": "MATCH (BA), (toAcc), (fromAcc), (tx)\nWHERE\n  id(BA) = idFrom('block_assoc', toLower($that.blockHash))\n  AND id(toAcc) = idFrom('account', toLower($that.to))\n  AND id(fromAcc) = idFrom('account', toLower($that.from))\n  AND id(tx) = idFrom('transaction', toLower($that.hash))\nCREATE\n  (tx)-[:defined_in]->(BA),\n  (tx)-[:from]->(fromAcc),\n  (tx)-[:to]->(toAcc)\nSET\n  tx:transaction,\n  BA:block_assoc,\n  toAcc:account,\n  fromAcc:account,\n  tx = $that,\n  fromAcc.address = $that.from,\n  toAcc.address = $that.to",
    "type": "CypherJson"
  },
  "url": "https://ethereum.demo.thatdot.com/mined_transactions",
  "type": "ServerSentEventsIngest"
}
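
To make the shape created by INGEST-2 concrete, the following Python sketch lists the three edges that the CREATE clause adds for one tx_mined event. The `node_key` helper is a hypothetical, readable stand-in for `idFrom(kind, toLower(value))` and does not reproduce Quine's actual ID format.

```python
from typing import Dict, List, Tuple


def node_key(kind: str, identifier: str) -> str:
    # Stand-in for idFrom(kind, toLower(identifier)); not Quine's real ID format.
    return f"{kind}:{identifier.lower()}"


def transaction_edges(tx: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (source, edge label, target) triples mirroring the CREATE clause."""
    tx_node = node_key("transaction", tx["hash"])
    return [
        (tx_node, "defined_in", node_key("block_assoc", tx["blockHash"])),
        (tx_node, "from", node_key("account", tx["from"])),
        (tx_node, "to", node_key("account", tx["to"])),
    ]


tx_event = {
    "blockHash": "0x0d7782556aef00f1391a05a18ab229a70720780fe3c92eaff74738dee59649d0",
    "from": "0x19975E29111a6c85E282eBe409C272c15492c6Ad",
    "hash": "0x470294af9453f2cd1ec084456328da5c613585974e838fa088cef27246b2481e",
    "to": "0x732Ec041e4Dc8c01B541B237dE5Ce794c51cF838",
}
edges = transaction_edges(tx_event)
```

Every edge starts at the transaction node, which is why the standing query and quick queries traverse `<-[:from]-(tx)-[:to]->` to get from a sender to a receiver.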

Running the Recipe

 java -jar quine-1.7.2.jar -r ethereum.yaml
Graph is ready
Running Recipe: Ethereum Tag Propagation
Using 6 node appearances
Using 7 quick queries
Using 2 sample queries
Running Standing Query STANDING-1
Running Ingest Stream INGEST-1
Running Ingest Stream INGEST-2
Quine web server available at http://localhost:8080 

Observe that Quine is running in the terminal window and that the ingest queries are receiving data.

 | => STANDING-1 count 0
 | => INGEST-1 status is running and ingested 485
 | => INGEST-2 status is running and ingested 34820

Reviewing chains

The nodes appearing in your graph are from the live Ethereum blockchain. They will continue to stream in as long as Quine is running the recipe.

Start exploring the graph by pulling a few recent blocks from the blockchain with the "Get a few recently-accessed blocks" sample query. Select the sample query in the query bar, then click the Query button. The query returns a sub-graph of recent blocks, each linked to the block that preceded it.

Note

Click on the query bar for a list of sample queries.

Recently Accessed Blocks

Take a moment to inspect a couple of the blocks to see the data stored as properties.

Blocks from the Ethereum Blockchain

Click back into the query bar, clear the query, then submit the "Find accounts that have both sent and received ETH" sample query to see accounts that have both sent and received transactions.

Blocks that have sent and received Wei

This query finds a series of Wei transactions chained from account to account. Arrange the graph so that you can see all of the nodes. Right-click on the node at the head of the chain and select "Outgoing Transactions" to create a synthetic edge between the accounts. Create a second synthetic edge between the second and third accounts.

Tip

Hold shift while moving a node to lock its position in place.

Taint a Node

Right-click on the origin node again and select "Mark as tainted and refresh". This adds a tainted property to the node and sets its value to 0. A node with tainted=0 is a source of taint in the graph.

Notice that you begin to receive updates in the terminal window where you launched Quine. The recipe's standing query produces these notices; let's look at it now.

A Standing Query is composed of two parts: a pattern query that detects a sub-graph shape, and an output query that acts on the matched sub-graph.

Standing Query
standingQueries:
  - pattern:
      query: |-
        MATCH
          (tainted:account)<-[:from]-(tx:transaction)-[:to]->(otherAccount:account),
          (tx)-[:defined_in]->(ba:block_assoc)
        WHERE
          tainted.tainted IS NOT NULL
        RETURN
          id(tainted) AS accountId,
          tainted.tainted AS oldTaintedLevel,
          id(otherAccount) AS otherAccountId
      type: Cypher
      mode: MultipleValues
    outputs:
      propagate-tainted:
        query: |-
          MATCH (tainted), (otherAccount)
          WHERE
            tainted <> otherAccount
            AND id(tainted) = $that.data.accountId
            AND id(otherAccount) = $that.data.otherAccountId
          WITH *, coll.min([($that.data.oldTaintedLevel + 1), otherAccount.tainted]) AS newTaintedLevel
          SET otherAccount.tainted = newTaintedLevel
          RETURN
            strId(tainted) AS taintedSource,
            strId(otherAccount) AS newlyTainted,
            newTaintedLevel
        type: CypherQuery
        andThen:
          type: PrintToStandardOut

Pattern

The pattern query continuously evaluates the incoming stream of data, looking for a match. Each match triggers the output query to process the event.

Our standing query is always looking for tainted nodes, identified by the existence of a tainted property.

MATCH
  (tainted:account)<-[:from]-(tx:transaction)-[:to]->(otherAccount:account),
  (tx)-[:defined_in]->(ba:block_assoc)
WHERE
  tainted.tainted IS NOT NULL
RETURN
  id(tainted) AS accountId,
  tainted.tainted AS oldTaintedLevel,
  id(otherAccount) AS otherAccountId

The results of the match pattern are sent to the output query.

The output query acts on the match to propagate the tainted tag. The value of tainted is the length, in transactions, of the shortest known path from any tainted source to the account.

Output

MATCH (tainted), (otherAccount)
WHERE
  tainted <> otherAccount
  AND id(tainted) = $that.data.accountId
  AND id(otherAccount) = $that.data.otherAccountId
WITH *, coll.min([($that.data.oldTaintedLevel + 1), otherAccount.tainted]) AS newTaintedLevel
SET otherAccount.tainted = newTaintedLevel
RETURN
  strId(tainted) AS taintedSource,
  strId(otherAccount) AS newlyTainted,
  newTaintedLevel
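
The update rule in the output query can be sketched in Python over a tiny, made-up set of transfers. This is a simplified model rather than Quine's execution: it re-applies the rule until nothing changes, whereas the standing query reacts incrementally as matches arrive. It also assumes that an account with no tainted value simply takes the candidate level, which is how the sketch treats the null entry passed to `coll.min`. The account names and the `propagate` function are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def propagate(
    levels: Dict[str, Optional[int]],
    transfers: List[Tuple[str, str]],
) -> Dict[str, Optional[int]]:
    """Apply 'receiver = min(sender + 1, receiver)' until nothing changes."""
    levels = dict(levels)  # work on a copy
    changed = True
    while changed:
        changed = False
        for sender, receiver in transfers:
            old = levels.get(sender)
            # Untainted senders and self-transfers never propagate taint.
            if old is None or sender == receiver:
                continue
            candidate = old + 1
            current = levels.get(receiver)
            new = candidate if current is None else min(candidate, current)
            if new != current:
                levels[receiver] = new
                changed = True
    return levels


# (sender, receiver) pairs; account "a" is the tainted source at level 0.
transfers = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
result = propagate({"a": 0}, transfers)
```

Account `c` is reachable through `b` in two hops and directly from `a` in one, so taking the minimum leaves it at level 1, and `d` ends at level 2.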

A standing query is capable of sending notifications using the andThen clause in the API.

  "andThen": {
    "logLevel": "Info",
    "logMode": "Complete",
    "type": "PrintToStandardOut"
  }

In our case, the results from the match are printed to standard out. These are the messages that you now see in your terminal window.

2022-04-13 11:05:14,877 Standing query `propagate-tainted` match: {"meta":{"isPositiveMatch":true,"resultId":"e3aa2a7c-b246-4896-b8b7-d4fea9904c91"},"data":{"taintedSource":"ed9899b5-e8a8-3a0b-9785-824f2cb1781b","newlyTainted":"981c7ef9-319a-35ba-90dd-401faf5de6a6","newTaintedLevel":3}}
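
If you want to post-process these messages, a small Python sketch like the one below can pull the JSON payload out of a log line. It assumes the line layout shown above (a timestamp and query name, then ` match: `, then a JSON object); `parse_match_line` is a hypothetical helper and not part of Quine.

```python
import json


def parse_match_line(line: str) -> dict:
    """Return the JSON payload that follows ' match: ' in a log line."""
    _, _, payload = line.partition(" match: ")
    return json.loads(payload)


log_line = (
    "2022-04-13 11:05:14,877 Standing query `propagate-tainted` match: "
    '{"meta":{"isPositiveMatch":true,"resultId":"e3aa2a7c-b246-4896-b8b7-d4fea9904c91"},'
    '"data":{"taintedSource":"ed9899b5-e8a8-3a0b-9785-824f2cb1781b",'
    '"newlyTainted":"981c7ef9-319a-35ba-90dd-401faf5de6a6","newTaintedLevel":3}}'
)
match = parse_match_line(log_line)
level = match["data"]["newTaintedLevel"]
```

The `data` object carries the three values named in the output query's RETURN clause: `taintedSource`, `newlyTainted`, and `newTaintedLevel`.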

Tainted Tag Propagation

Clear your explorer window using the '<<' button, then run the "Tainted Accounts" query. This query will find the original account or accounts responsible for the taint in the graph.

Right-click on a tainted account (appears fuchsia) and select "Outgoing Tainted Transactions" to find the accounts that this account tainted. Hover over the account to see the tainted=1 property that indicates that this account is one hop away from the source of the taint.

Tainted Node

Continue to taint and explore the graph as more of the nodes become tainted.

At any time, you can issue the following query to report the number of tainted nodes at each taint level in the graph.

MATCH (n) 
WHERE n.tainted IS NOT NULL 
RETURN DISTINCT n.tainted, count(n) 
ORDER BY n.tainted
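
As a rough illustration of what this query computes, the Python sketch below groups a made-up list of node property maps by taint level. The sample nodes and the `count_by_level` function are hypothetical; in practice the graph does this work when you run the Cypher query.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_by_level(nodes: List[Dict[str, object]]) -> List[Tuple[int, int]]:
    """Count nodes per taint level, skipping nodes with no tainted property."""
    counts = Counter(
        props["tainted"] for props in nodes if props.get("tainted") is not None
    )
    # Sort by level, mirroring ORDER BY n.tainted.
    return sorted(counts.items())


# Made-up property maps; the second-to-last node is untainted.
nodes = [
    {"tainted": 0},
    {"tainted": 1},
    {"tainted": 1},
    {"address": "0xabc"},
    {"tainted": 2},
]
summary = count_by_level(nodes)
```

Each pair in `summary` is a taint level followed by the number of nodes at that level.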