Apache Log Analysis

Full Recipe

Shared by: Josh Cody

A simple recipe that parses each line of an Apache web server access log and structures it into a graph. It is a useful introduction to standing queries, the feature that sets Quine apart.

Apache Log Analysis Recipe
version: 1
title: Apache Log Analytics
contributor: https://github.com/joshcody
summary: ''
description: ''
ingestStreams:
  - type: FileIngest
    path: $in_file
    format:
      type: CypherLine
      query: |-
        WITH text.regexFirstMatch($that, '(\\S+)\\s+\\S+\\s+(\\S+)\\s+\\[(.+)\\]\\s+\"(.*)\\s+(.*)\\s+(.*)\"\\s+([0-9]+)\\s+(\\S+)\\s+\"(.*)\"\\s+\"(.*)\"')
        AS r
        CREATE ({
          sourceIp: r[1],
          user: r[2],
          time: datetime(r[3], 'dd/MMM/yyyy:HH:mm:ss Z'),
          verb: r[4],
          path: r[5],
          httpVersion: r[6],
          status: r[7],
          size: r[8],
          referrer: r[9],
          agent: r[10],
          type: 'log'
        })
standingQueries:
  - pattern:
      type: Cypher
      query: MATCH (l) WHERE l.type = 'log' RETURN DISTINCT id(l) AS id
      mode: DistinctId
    outputs:
      verb:
        type: CypherQuery
        query: |-
          MATCH (l) WHERE id(l) = $that.data.id
          MATCH (v) WHERE id(v) = idFrom('verb', l.verb)
          SET v.type = 'verb',
            v.verb = l.verb
          CREATE (l)-[:verb]->(v)
nodeAppearances: [ ]
quickQueries: [ ]
sampleQueries:
  - name: Count HTTP GET Requests
    query: >-
      MATCH (l)-[rel:verb]->(v)
      WHERE l.type = 'log' AND v.type = 'verb' AND v.verb = 'GET'
      RETURN count(rel) AS get_count
statusQuery:
  cypherQuery: >-
    MATCH (l)-[rel:verb]->(v)
    WHERE l.type = 'log' AND v.type = 'verb' AND v.verb = 'GET'
    RETURN count(rel) AS get_count

Scenario

This recipe loads a sample Apache log file and manifests disconnected nodes in a graph for basic analysis and metric reporting.

Sample Data

Note

Download the sample data to the same directory where you will run Quine.

A sample Apache web server access log dataset can be downloaded from https://recipes.quine.io/sample_apache_logs or fetched with the command below.

curl -L https://recipes.quine.io/sample_apache_logs -o apache.log

How it Works

The recipe reads log entries from the sample data file using an ingest stream and manifests them as a graph in Quine. A regular expression inside the ingest stream's Cypher query parses each log line, and the capture groups populate the properties of a new node.

INGEST-1 processes the apache.log file:

  - type: FileIngest
    path: $in_file
    format:
      type: CypherLine
      query: |-
        WITH text.regexFirstMatch($that, '(\\S+)\\s+\\S+\\s+(\\S+)\\s+\\[(.+)\\]\\s+\"(.*)\\s+(.*)\\s+(.*)\"\\s+([0-9]+)\\s+(\\S+)\\s+\"(.*)\"\\s+\"(.*)\"')
        AS r
        CREATE ({
          sourceIp: r[1],
          user: r[2],
          time: datetime(r[3], 'dd/MMM/yyyy:HH:mm:ss Z'),
          verb: r[4],
          path: r[5],
          httpVersion: r[6],
          status: r[7],
          size: r[8],
          referrer: r[9],
          agent: r[10],
          type: 'log'
        })
POST /api/v1/ingest/INGEST-1
[
  {
    "type": "FileIngest",
    "path": "$in_file",
    "format": {
      "type": "CypherLine",
      "query": "WITH text.regexFirstMatch($that, '(\\\\S+)\\\\s+\\\\S+\\\\s+(\\\\S+)\\\\s+\\\\[(.+)\\\\]\\\\s+\\\"(.*)\\\\s+(.*)\\\\s+(.*)\\\"\\\\s+([0-9]+)\\\\s+(\\\\S+)\\\\s+\\\"(.*)\\\"\\\\s+\\\"(.*)\\\"')\nAS r\nCREATE ({\n  sourceIp: r[1],\n  user: r[2],\n  time: datetime(r[3], 'dd/MMM/yyyy:HH:mm:ss Z'),\n  verb: r[4],\n  path: r[5],\n  httpVersion: r[6],\n  status: r[7],\n  size: r[8],\n  referrer: r[9],\n  agent: r[10],\n  type: 'log'\n})"
    }
  }
]
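
To see what the regular expression captures, the sketch below applies the same pattern to a single log line in Python. The sample line and the names `LOG_PATTERN`, `SAMPLE_LINE`, and `parse_line` are illustrative assumptions, not part of the recipe. The pattern is the recipe's regex with one level of escaping removed, since a Python raw string does not need the doubled backslashes used in the recipe's query string. Group numbering matches the recipe: index 0 is the whole match, and 1 through 10 are the capture groups.

```python
import re
from datetime import datetime

# The recipe's regex, written as a Python raw string (single backslashes).
LOG_PATTERN = re.compile(
    r'(\S+)\s+\S+\s+(\S+)\s+\[(.+)\]\s+"(.*)\s+(.*)\s+(.*)"\s+'
    r'([0-9]+)\s+(\S+)\s+"(.*)"\s+"(.*)"'
)

# Hypothetical log line in Apache combined format, made up for illustration.
SAMPLE_LINE = (
    '192.0.2.10 - - [17/May/2015:10:05:03 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"http://example.com/" "Mozilla/5.0"'
)


def parse_line(line):
    """Return a dict shaped like the node created by the ingest query, or None."""
    m = LOG_PATTERN.search(line)
    if m is None:
        return None
    return {
        "sourceIp": m.group(1),
        "user": m.group(2),
        # Python analogue of the 'dd/MMM/yyyy:HH:mm:ss Z' format in the recipe.
        "time": datetime.strptime(m.group(3), "%d/%b/%Y:%H:%M:%S %z"),
        "verb": m.group(4),
        "path": m.group(5),
        "httpVersion": m.group(6),
        "status": m.group(7),   # capture groups are text, so this stays a string
        "size": m.group(8),
        "referrer": m.group(9),
        "agent": m.group(10),
        "type": "log",
    }


record = parse_line(SAMPLE_LINE)
print(record["verb"], record["path"], record["status"])
```

Note that `status` and `size` are captured as text; neither this sketch nor the recipe's query converts them to numbers.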

A standing query is configured to detect nodes that have a type of log and then create relationships between the nodes and their verbs.

  - pattern:
      type: Cypher
      query: MATCH (l) WHERE l.type = 'log' RETURN DISTINCT id(l) AS id
      mode: DistinctId
    outputs:
      verb:
        type: CypherQuery
        query: |-
          MATCH (l) WHERE id(l) = $that.data.id
          MATCH (v) WHERE id(v) = idFrom('verb', l.verb)
          SET v.type = 'verb',
            v.verb = l.verb
          CREATE (l)-[:verb]->(v)
POST /api/v1/query/standing/STANDING-1
  [
    {
      "pattern": {
        "type": "Cypher",
        "query": "MATCH (l) WHERE l.type = 'log' RETURN DISTINCT id(l) AS id",
        "mode": "DistinctId"
      },
      "outputs": {
        "verb": {
          "type": "CypherQuery",
          "query": "MATCH (l) WHERE id(l) = $that.data.id\nMATCH (v) WHERE id(v) = idFrom('verb', l.verb)\nSET v.type = 'verb',\n  v.verb = l.verb\nCREATE (l)-[:verb]->(v)"
        }
      }
    }
  ]
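
The role of `idFrom` in the output query can be sketched outside Quine. `idFrom` derives a node ID deterministically from its arguments, so every log node with the same verb resolves to the same verb node. The Python below is only a simulation under that assumption: `uuid5` stands in for Quine's actual ID computation, and `id_from`, `link_verbs`, and the sample records are hypothetical names introduced for illustration.

```python
import uuid


def id_from(*parts):
    """Stand-in for idFrom: the same arguments always yield the same ID.

    Assumption: uuid5 is used purely for illustration; Quine computes
    its node IDs differently.
    """
    return uuid.uuid5(uuid.NAMESPACE_OID, "\x1f".join(map(str, parts)))


def link_verbs(log_nodes):
    """Mimic the standing query output on an in-memory set of nodes.

    Returns one verb node per distinct verb, plus one 'verb' edge from
    each log node to its verb node.
    """
    verb_nodes = {}  # verb node ID -> properties
    edges = []       # (log node ID, edge label, verb node ID)
    for log_id, props in log_nodes.items():
        if props.get("type") != "log":
            continue  # the standing query pattern only matches log nodes
        vid = id_from("verb", props["verb"])
        # Setting the same properties again is harmless, like SET in the query.
        verb_nodes[vid] = {"type": "verb", "verb": props["verb"]}
        edges.append((log_id, "verb", vid))
    return verb_nodes, edges


logs = {
    1: {"type": "log", "verb": "GET"},
    2: {"type": "log", "verb": "GET"},
    3: {"type": "log", "verb": "POST"},
}
verb_nodes, edges = link_verbs(logs)
print(len(verb_nodes), "verb nodes,", len(edges), "edges")
```

Because the ID depends only on the verb, the two GET entries share a single verb node, which is what makes counting `verb` relationships per node meaningful later on.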

Running the Recipe

 java -jar quine-1.7.3.jar -r apache_log.yaml --recipe-value in_file=apache.log
Graph is ready
Running Recipe: Apache Log Analytics
Using 1 sample queries
Running Standing Query STANDING-1
Running Ingest Stream INGEST-1
Status query URL is http://localhost:8080#MATCH%20%28l%29%2D%5Brel%3Averb%5D%2D%3E%28v%29%20WHERE%20l%2Etype%20%3D%20%27log%27%20AND%20v%2Etype%20%3D%20%27verb%27%20AND%20v%2Everb%20%3D%20%27GET%27%20RETURN%20count%28rel%29%20AS%20get%5Fcount
Quine web server available at http://localhost:8080
INGEST-1 status is completed and ingested 10000 

Summary

The recipe contains a status query. When the recipe starts, Quine prints a link in the console that opens the query's results, here the count of GET requests in the log file. The console also shows a running count of GET entries as they are ingested.

Status query URL is http://localhost:8080#MATCH%20%28l%29%2D%5Brel%3Averb%5D%2D%3E%28v%29%20WHERE%20l%2Etype%20%3D%20%27log%27%20AND%20v%2Etype%20%3D%20%27verb%27%20AND%20v%2Everb%20%3D%20%27GET%27%20RETURN%20count%28rel%29%20AS%20get%5Fcount
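
The part of the status query URL after `#` is simply the status query, percent-encoded. As a small illustration, the Python sketch below decodes it with the standard library; `STATUS_URL` is the URL printed above, split across lines only for readability.

```python
from urllib.parse import unquote, urlsplit

# The status query URL from the console output, concatenated from pieces.
STATUS_URL = (
    "http://localhost:8080#MATCH%20%28l%29%2D%5Brel%3Averb%5D%2D%3E%28v%29"
    "%20WHERE%20l%2Etype%20%3D%20%27log%27"
    "%20AND%20v%2Etype%20%3D%20%27verb%27"
    "%20AND%20v%2Everb%20%3D%20%27GET%27"
    "%20RETURN%20count%28rel%29%20AS%20get%5Fcount"
)

# The fragment (everything after '#') holds the encoded Cypher query.
query = unquote(urlsplit(STATUS_URL).fragment)
print(query)
```

The decoded text is the same Cypher as the recipe's `statusQuery`, written on one line.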

At the time of writing this recipe, the sample data file contains 9951 GET requests.

---[ Status Query result 1 ]------------
get_count | 9951 count 10000
----------+-----------------------------

Tip

Quick Queries are available by right-clicking on a node.

Quick Query      | Node Type | Description
-----------------+-----------+----------------------------------------------------
Adjacent Nodes   | All       | Display the nodes that are adjacent to this node.
Refresh          | All       | Refresh the content stored in a node.
Local Properties | All       | Display the properties stored by the node.

Build your skills

What Cypher query could you write to return a count for other HTTP verbs in the log file?

Solution

We solved this by making the status query less specific: the filter on GET is removed, and the query returns each distinct verb property together with its count.

Enter this query into the Exploration UI and hit Shift+Enter.

MATCH (l)-[rel:verb]->(v) 
WHERE l.type = 'log' AND v.type = 'verb' 
RETURN DISTINCT v.verb, count(rel)

Our results:

v.verb    | count(rel)
----------+-----------
"OPTIONS" | 1
"POST"    | 5
"HEAD"    | 42
"GET"     | 9951
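
The solution query groups the `verb` relationships by verb and counts each group. For a rough cross-check outside Quine, the same grouping can be sketched in Python directly over log lines. This is a simplified sketch, not the recipe's method: `REQUEST` is a hypothetical, shorter pattern that only looks for the quoted request field, `count_verbs` is an illustrative helper, and the three sample lines are made up.

```python
import re
from collections import Counter

# Simplified pattern (an assumption, not the recipe's regex): the first
# quoted field made of exactly three whitespace-separated tokens.
REQUEST = re.compile(r'"(\S+)\s+\S+\s+\S+"')


def count_verbs(lines):
    """Count HTTP verbs across an iterable of access log lines."""
    counts = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Made-up lines for illustration only.
sample = [
    '192.0.2.10 - - [17/May/2015:10:05:03 +0000] "GET /a HTTP/1.1" 200 10 "-" "curl/8.0"',
    '192.0.2.11 - - [17/May/2015:10:05:04 +0000] "POST /b HTTP/1.1" 201 20 "-" "curl/8.0"',
    '192.0.2.12 - - [17/May/2015:10:05:05 +0000] "GET /c HTTP/1.1" 404 30 "-" "curl/8.0"',
]
verb_counts = count_verbs(sample)
print(verb_counts.most_common())
```

To try it on the sample data, pass an open file handle for `apache.log` to `count_verbs`. Because the pattern is simplified, its counts may differ slightly from the graph query on unusual lines.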