This post is about a log parser I quickly put together with the help of the Elasticsearch – Logstash – Kibana (ELK) stack and Docker.
The ELK stack is undoubtedly a phenomenal solution for centralized log analysis. Logstash beautifully breaks down every detail in the log and takes care of pushing it into Elasticsearch, which Kibana then displays as neat charts.
In this case, we configure Logstash to additionally generate geographic data based on the IP address in each log entry. We also have an Nginx instance, which forwards incoming requests to Kibana.
Like I said, we use Docker for the entire setup, so the only thing installed on the host machine is Docker. Every other environment is spun up on demand, using configurations that can be persisted on disk. That persistence is particularly important: with Docker and modern-day applications, environments themselves become configuration. Gone are the days when we had to create environments and template them as VMware / VirtualBox images. Docker allows you to script every single aspect of an environment into files that can be checked into a source code repository such as Git.
We start off by having Docker and Docker Compose installed on the host machine. My go-to website for server how-to's has got to be DigitalOcean; the documentation they have is just unparalleled, in my opinion. They have how-to's on installing both Docker and Docker Compose.
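For a rough idea of what those guides boil down to, here is a minimal sketch assuming an Ubuntu host and a Compose release from around the time of this stack – do follow the DigitalOcean guides for the authoritative, up-to-date steps.

sudo apt-get update
sudo apt-get install -y docker.io   # or set up the docker-ce repository as per the guide
sudo curl -L "https://github.com/docker/compose/releases/download/1.23.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker --version && docker-compose --version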
We start by creating a docker-compose.yml file that has the necessary information to bring up our environment.
docker-compose.yml
version: '3.2'

services:
  nginx:
    image: nginx
    container_name: container_nginx
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    ports:
      - "80:80"

  elasticsearch:
    labels:
      com.example.service: "elasticsearch"
      com.example.description: "For searching and indexing data"
    image: elasticsearch:6.6.1
    container_name: container_es
    volumes:
      - esdata:/usr/share/elasticsearch/data/

  kibana:
    labels:
      com.example.service: "kibana"
      com.example.description: "Data visualisation and for log aggregation"
    image: kibana:6.6.1
    container_name: container_kibana
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch:9200
    depends_on:
      - elasticsearch

  logstash:
    labels:
      com.example.service: "logstash"
      com.example.description: "For logging data"
    image: logstash:6.6.1
    container_name: container_logstash
    volumes:
      - ./logs:/logstash_dir
      - ./logstash.conf:/data/logstash.conf
    command: logstash -f /data/logstash.conf
    depends_on:
      - elasticsearch

volumes:
  esdata:
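Before walking through the file, a quick sanity check never hurts – docker-compose can validate the YAML and print the resolved configuration:

docker-compose config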
To give a brief walk-through of the file, we have four services listed – nginx, elasticsearch, kibana and logstash.
The nginx service uses the nginx:latest image from Docker Hub. The service also bind-mounts a configuration file, which makes Nginx act as a proxy to our kibana service. It also maps port 80 in the container to port 80 on the host machine.
nginx.conf
events { }

http {
  server {
    listen 80;
    server_name kibana;

    # Expose via volumes if required
    error_log /var/log/nginx/kibana.error.log;
    access_log /var/log/nginx/kibana.access.log;

    location / {
      rewrite ^/(.*) /$1 break;
      proxy_ignore_client_abort on;
      proxy_pass "http://kibana:5601/app/kibana";
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
    }
  }
}
elasticsearch is our next service. The service has a few labels that define its purpose. The image is defined to be elasticsearch:6.6.1 on Docker Hub. Like nginx and our other services, the image is pulled from Docker Hub on first boot-up. The service defines a named volume called esdata. This ensures that the data we have in Elasticsearch is persisted on the host machine.
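If you are curious where that data actually lives on the host, Docker can tell you. Note that the real volume name is prefixed with the Compose project name, which defaults to the name of the directory containing docker-compose.yml – substitute yours below.

docker volume ls
docker volume inspect <project>_esdata   # e.g. elk_esdata if the directory is named elk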
kibana, our next service, in addition to the properties we have already seen, defines an environment variable that points to the elasticsearch service we already defined. A reference can be maintained across services, with the service name doubling as a hostname within a virtual network created by Docker. Since requests are not expected to reach Kibana directly (they go through Nginx), we expose no ports. The service finally has a depends_on declaration, which tells Docker to bring up the dependency service – elasticsearch in this case – before spinning up Kibana.
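A quick way to see this name resolution in action, once the stack is up, is to call Elasticsearch by its service name from inside the Kibana container – assuming curl is available in the image, which it is in the official 6.x Elastic images:

docker exec container_kibana curl -s http://elasticsearch:9200

If the Docker network is wired up correctly, this returns the Elasticsearch cluster banner as JSON.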
logstash, our final service, has a couple of volumes mounted. One is the directory in which our .log files live – a sub-directory called logs. A configuration file is also mounted. The service also has a command that is executed once the container is up. For the sake of completeness, a container can be said to be a running instance of an image. Here is the configuration file that gets mounted:
logstash.conf
input {
  file {
    path => "/logstash_dir/*.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  # Extract log file contents into variables
  dissect {
    mapping => {
      "message" => "%{ip} - - [%{ts} %{+ts}] %{HTTPMethod} %{targetDownload} %{HTTPProtocol} %{HTTPStatus}"
    }
  }

  # Update the elasticsearch import date with the actual download date
  date {
    match => [ "ts", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }

  # Get geo coordinates based on the IP address
  geoip {
    source => [ "ip" ]
    add_tag => [ "IP_Geo_Decoded" ]
  }

  # Identify filename and filetype
  ruby {
    code => "event.set('targetDownloadFilename', event.get('targetDownload').split('/').last); event.set('targetDownloadFiletype', event.get('targetDownloadFilename').split('.').last)"
  }

  # Add fingerprinting to avoid duplicates
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "MURMUR3"
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{+yyyy-MM-dd}"
    document_type => "downloadlogs"
    document_id => "%{[@metadata][fingerprint]}"
  }
}
Let’s look at our logstash.conf in a little more detail. The type of logs being analyzed is a little clearer here: we are looking at download logs for various files. The configuration file defines three main sections – input, filter and output. The input section defines the source, which is the container-local directory we have volume-mounted the log files into – /logstash_dir. The output section, on the other hand, defines the destination to post our Logstash-parsed data to – in this case, it's our elasticsearch service.
The filter section within logstash.conf is a little more interesting. It has five sub-sections. The first, dissect, defines how each log entry must be broken down into variables that are eventually pushed to Elasticsearch. This is done by defining variables inline, while maintaining the pattern of the log entry. The second, date, ensures Elasticsearch uses the actual file download date in all queries, as opposed to the time of insertion (from Logstash to Elasticsearch). The third, geoip, decodes the IP address listed in the log file down to the level of a country / city name. It also gives us lat/lng coordinates, among other things. The fourth, ruby, defines a simple Ruby script to further break down a variable into multiple other variables. In this case, we break down the targetDownload variable, which contains the full path of the download URI, into two more variables that contain the file name and the file type. The final section, fingerprint, defines an algorithm used to hash each entry, so that duplicate entries are avoided during insertion. You can modify these filter sub-sections to match the data in your log files.
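To make the dissect mapping concrete, here is a made-up log line in the shape this configuration expects:

203.0.113.42 - - [12/Feb/2019:14:03:21 +0000] GET /files/reports/summary.pdf HTTP/1.1 200

dissect would extract ip = 203.0.113.42, ts = 12/Feb/2019:14:03:21 +0000, HTTPMethod = GET, targetDownload = /files/reports/summary.pdf, HTTPProtocol = HTTP/1.1 and HTTPStatus = 200. The ruby filter then adds targetDownloadFilename = summary.pdf and targetDownloadFiletype = pdf, and geoip resolves the ip field to a location (for a real, routable address – this particular one comes from a documentation range and will not geo-decode).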
We now have all the scripts required for booting up our environment, and one command to do it. Bringing up the environment also imports all our logs – remember the command that our Logstash container executes on startup?
Let’s execute. Run,
docker-compose up -d
If everything was set up right, after a few minutes you should have Kibana come up on port 80 of the host machine – with the data from your logs!
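If it does not, a couple of commands help figure out what is going on – list the container states and tail the Logstash logs:

docker-compose ps
docker-compose logs -f logstash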
To collect new logs, you just need to restart the Logstash service. New data will show up in Kibana after a few minutes.
docker-compose restart logstash
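Since port 9200 of Elasticsearch is not exposed to the host in our compose file, you can peek at the indices from inside the container instead – again assuming curl is present in the image:

docker exec container_es curl -s "http://localhost:9200/_cat/indices?v"

The document count of the logs-* indices listed there should grow as new log files are picked up.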
Time to get going with custom dashboards / charts on Kibana!
Good luck! 🙂