Apache Log Analysis
Full Recipe¶
Shared by: Josh Cody
A simple recipe that parses incoming text for each line of an Apache web server access log and structures it into a graph. A useful introduction to standing queries, the powerful feature that makes Quine unique.
Apache Log Analysis Recipe
Scenario¶
This recipe loads a sample Apache log file and manifests disconnected nodes in a graph for basic analysis and metric reporting.
Sample Data¶
Note
Download the sample data to the same directory where you will run Quine.
A sample Apache web server access log dataset can be downloaded from https://recipes.quine.io/sample_apache_logs or fetched with the command below.
curl -L https://recipes.quine.io/sample_apache_logs -o apache.log
How it Works¶
The recipe reads log entries from the sample data file using an ingest stream to manifest a graph in Quine. A regular expression inside the ingest stream's Cypher query parses each log line and populates the properties of the node.
INGEST-1 processes the apache.log file:
- type: FileIngest
  path: $in_file
  format:
    type: CypherLine
    query: |-
      WITH text.regexFirstMatch($that, '(\\S+)\\s+\\S+\\s+(\\S+)\\s+\\[(.+)\\]\\s+\"(.*)\\s+(.*)\\s+(.*)\"\\s+([0-9]+)\\s+(\\S+)\\s+\"(.*)\"\\s+\"(.*)\"')
      AS r
      CREATE ({
        sourceIp: r[1],
        user: r[2],
        time: datetime(r[3], 'dd/MMM/yyyy:HH:mm:ss Z'),
        verb: r[4],
        path: r[5],
        httpVersion: r[6],
        status: r[7],
        size: r[8],
        referrer: r[9],
        agent: r[10],
        type: 'log'
      })
The same ingest stream expressed as JSON:
[
{
"type": "FileIngest",
"path": "$in_file",
"format": {
"type": "CypherLine",
"query": "WITH text.regexFirstMatch($that, '(\\\\S+)\\\\s+\\\\S+\\\\s+(\\\\S+)\\\\s+\\\\[(.+)\\\\]\\\\s+\\\"(.*)\\\\s+(.*)\\\\s+(.*)\\\"\\\\s+([0-9]+)\\\\s+(\\\\S+)\\\\s+\\\"(.*)\\\"\\\\s+\\\"(.*)\\\"')\nAS r\nCREATE ({\n sourceIp: r[1],\n user: r[2],\n time: datetime(r[3], 'dd/MMM/yyyy:HH:mm:ss Z'),\n verb: r[4],\n path: r[5],\n httpVersion: r[6],\n status: r[7],\n size: r[8],\n referrer: r[9],\n agent: r[10],\n type: 'log'\n})"
}
}
]
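To see what the regular expression captures, the pattern can be tried outside Quine. The sketch below is a rough Python equivalent and is not part of the recipe. The sample line is invented for illustration, the backslashes are single because a Python raw string needs no extra escaping layer, and the `strptime` format is an assumed counterpart of the `dd/MMM/yyyy:HH:mm:ss Z` pattern passed to `datetime`. Capture group `n` in Python lines up with `r[n]` in the query.

```python
import re
from datetime import datetime

# Same pattern as the ingest query, with one level of escaping removed.
LOG_PATTERN = re.compile(
    r'(\S+)\s+\S+\s+(\S+)\s+\[(.+)\]\s+"(.*)\s+(.*)\s+(.*)"\s+'
    r'([0-9]+)\s+(\S+)\s+"(.*)"\s+"(.*)"'
)

# Property names in capture-group order, mirroring r[1]..r[10] in the recipe.
FIELDS = ["sourceIp", "user", "time", "verb", "path",
          "httpVersion", "status", "size", "referrer", "agent"]


def parse_line(line):
    """Return a dict of node properties, or None if the line does not match."""
    m = LOG_PATTERN.search(line)
    if m is None:
        return None
    props = dict(zip(FIELDS, m.groups()))
    # Assumed Python counterpart of the 'dd/MMM/yyyy:HH:mm:ss Z' format.
    props["time"] = datetime.strptime(props["time"], "%d/%b/%Y:%H:%M:%S %z")
    props["type"] = "log"
    return props


# Hypothetical log line, invented for this example (not from the sample data).
sample = ('203.0.113.7 - - [17/May/2015:10:05:03 +0000] '
          '"GET /index.html HTTP/1.1" 200 1024 '
          '"http://example.com/" "Mozilla/5.0"')

node = parse_line(sample)
print(node["verb"], node["path"], node["status"])
```

In this sketch `status` and `size` remain strings, because regular expression capture groups are strings; only `time` is converted.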
A standing query is configured to detect nodes whose type property is 'log' and then create a relationship from each such node to a node representing its HTTP verb.
- pattern:
    type: Cypher
    query: MATCH (l) WHERE l.type = 'log' RETURN DISTINCT id(l) AS id
    mode: DistinctId
  outputs:
    verb:
      type: CypherQuery
      query: |-
        MATCH (l) WHERE id(l) = $that.data.id
        MATCH (v) WHERE id(v) = idFrom('verb', l.verb)
        SET v.type = 'verb',
          v.verb = l.verb
        CREATE (l)-[:verb]->(v)
The same standing query expressed as JSON:
[
{
"pattern": {
"type": "Cypher",
"query": "MATCH (l) WHERE l.type = 'log' RETURN DISTINCT id(l) AS id",
"mode": "DistinctId"
},
"outputs": {
"verb": {
"type": "CypherQuery",
"query": "MATCH (l) WHERE id(l) = $that.data.id\nMATCH (v) WHERE id(v) = idFrom('verb', l.verb)\nSET v.type = 'verb',\n v.verb = l.verb\nCREATE (l)-[:verb]->(v)"
}
}
}
]
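The important part of this output query is `idFrom('verb', l.verb)`: the same inputs always yield the same node ID, so every log node with a given verb ends up linked to one shared verb node. The Python sketch below is only a conceptual model of that behaviour, not Quine's implementation. `id_from` is a hypothetical stand-in built on `uuid5`, the log node IDs are likewise made up, and the three log entries are invented.

```python
import uuid


def id_from(*parts):
    """Hypothetical stand-in for idFrom: a deterministic ID derived from the inputs."""
    return uuid.uuid5(uuid.NAMESPACE_OID, "/".join(map(str, parts)))


nodes = {}   # node id -> properties
edges = []   # (log id, "verb", verb id)


def on_log_node(log_id):
    """Mimic the standing query output for one matched log node."""
    log = nodes[log_id]
    verb_id = id_from("verb", log["verb"])
    # Look the verb node up by its computed ID, creating it on first use.
    verb_node = nodes.setdefault(verb_id, {})
    verb_node.update(type="verb", verb=log["verb"])
    edges.append((log_id, "verb", verb_id))


# Three invented log nodes; two of them share the verb GET.
for i, verb in enumerate(["GET", "GET", "POST"]):
    log_id = id_from("log", i)
    nodes[log_id] = {"type": "log", "verb": verb}
    on_log_node(log_id)

verb_nodes = [n for n in nodes.values() if n["type"] == "verb"]
print(len(verb_nodes), len(edges))
```

Three log nodes produce three edges but only two verb nodes, which is why counting the edges into a verb node gives the number of requests that used that verb.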
Running the Recipe¶
❯ java -jar quine-1.7.3.jar -r apache_log.yaml --recipe-value in_file=apache.log
Graph is ready
Running Recipe: Apache Log Analytics
Using 1 sample queries
Running Standing Query STANDING-1
Running Ingest Stream INGEST-1
Status query URL is http://localhost:8080#MATCH%20%28l%29%2D%5Brel%3Averb%5D%2D%3E%28v%29%20WHERE%20l%2Etype%20%3D%20%27log%27%20AND%20v%2Etype%20%3D%20%27verb%27%20AND%20v%2Everb%20%3D%20%27GET%27%20RETURN%20count%28rel%29%20AS%20get%5Fcount
Quine web server available at http://localhost:8080
INGEST-1 status is completed and ingested 10000
Summary¶
The recipe contains a status query that emits a link in the console window for viewing the count of GET requests in the log file. The status query also updates the console with a running count of GET entries as they are encountered.
Status query URL is http://localhost:8080#MATCH%20%28l%29%2D%5Brel%3Averb%5D%2D%3E%28v%29%20WHERE%20l%2Etype%20%3D%20%27log%27%20AND%20v%2Etype%20%3D%20%27verb%27%20AND%20v%2Everb%20%3D%20%27GET%27%20RETURN%20count%28rel%29%20AS%20get%5Fcount
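The part of the status query URL after the `#` is simply the Cypher query, percent-encoded. As a small illustration, the following Python snippet (standard library only, not part of the recipe) decodes it back into readable Cypher.

```python
from urllib.parse import unquote, urlsplit

# The status query URL printed by Quine, split across lines for readability.
status_url = (
    "http://localhost:8080#MATCH%20%28l%29%2D%5Brel%3Averb%5D%2D%3E%28v%29"
    "%20WHERE%20l%2Etype%20%3D%20%27log%27"
    "%20AND%20v%2Etype%20%3D%20%27verb%27"
    "%20AND%20v%2Everb%20%3D%20%27GET%27"
    "%20RETURN%20count%28rel%29%20AS%20get%5Fcount"
)

# The fragment (everything after '#') holds the encoded query.
query = unquote(urlsplit(status_url).fragment)
print(query)
```

The decoded text is the query that produces the `get_count` value shown in the console.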
At the time of writing this recipe, the sample data file contains 9951 GET requests.
---[ Status Query result 1 ]------------
get_count | 9951 count 10000
----------+-----------------------------
Tip
Quick Queries are available by right-clicking on a node.
Quick Query | Node Type | Description |
---|---|---|
Adjacent Nodes | All | Display the nodes that are adjacent to this node. |
Refresh | All | Refresh the content stored in a node. |
Local Properties | All | Display the properties stored by the node. |
Build your skills¶
What Cypher query could you write to return a count for other HTTP verbs in the log file?
Solution
We solved this by making the status query less specific: instead of filtering on GET, it returns each distinct value of the verb property together with its count.
Enter this query into the Exploration UI and hit Shift+Enter.
MATCH (l)-[rel:verb]->(v)
WHERE l.type = 'log' AND v.type = 'verb'
RETURN DISTINCT v.verb, count(rel)
Our results:
v.verb | count(rel) |
---|---|
"OPTIONS" | 1 |
"POST" | 5 |
"HEAD" | 42 |
"GET" | 9951 |