Kim Rudolph

Centralized Logging with the ELK Stack

This tutorial will guide you through the steps to set up an elasticsearch - logstash - kibana (ELK) stack. Each software component runs in its own docker container so that single components can be replaced easily. All test instances run on the same host. That host also acts as a log-producing client, pushing log events to the ELK stack with a logstash-forwarder instance.

A shorter and faster way to test the stack is to use one of the several docker images that contain the full ELK stack, or to follow one of the other tutorials describing an ELK installation without docker.

ELK alternatives are Splunk as a paid-only product or Graylog as another OSS option.

Search Backend Elasticsearch

Data Container

The first container is a simple esdata data container for the elasticsearch data. It provides the option to easily start over by destroying that container, or to reuse a snapshot of the data.

$ docker run -d -v /data --name esdata ubuntu
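
If a snapshot of the data is needed later, the volume can be archived from a throwaway container and restored the same way. This is only a sketch; the backup directory and the archive name are arbitrary.

$ docker run --rm --volumes-from esdata -v ~/elk/backup:/backup ubuntu tar czf /backup/esdata.tar.gz /data
$ docker run --rm --volumes-from esdata -v ~/elk/backup:/backup ubuntu tar xzf /backup/esdata.tar.gz -C /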

Configuration

Elasticsearch needs a minimal configuration to know where to store its data and logs. Later on, the kibana frontend needs to communicate with the elasticsearch API. Configuring a wildcard CORS policy prevents any access issues which might occur during this test setup.

~/elk/config/elasticsearch.yml

path:
  data: /data/data
  logs: /data/log
  plugins: /data/plugins
  work: /data/work
http.cors.allow-origin: "/.*/"
http.cors.enabled: true
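
Once the elasticsearch container created below is running, the CORS setup can be verified with a request that carries an Origin header; with the wildcard policy the response should echo the origin back in an Access-Control-Allow-Origin header. The origin value is just an example.

$ curl -i -H "Origin: http://example.com" localhost:9200/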

A basic log configuration file should be created to be able to check the elasticsearch logs for errors. Throwing all data and logs into a single container might not be the best idea for a production environment.

~/elk/config/logging.yml

es.logger.level: INFO
rootLogger: ${es.logger.level}, file
logger:
  action: DEBUG
appender:
  file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

Create Container

Finally, an instance of the official elasticsearch image is started.

$ docker run -d --volumes-from esdata -v ~/elk/config:/usr/share/elasticsearch/config -p 9200:9200 -p 9300:9300 --name elasticsearch elasticsearch

Health Check and Logging

If the elasticsearch instance started successfully, the status can be retrieved via the _cluster API endpoint. After the first shard has been added, the cluster health status switches from green to yellow, because the replica shards cannot be assigned anywhere on a single-node cluster.

$ curl localhost:9200/_cluster/health?pretty

first shard added

{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 1,
  "active_shards" : 1,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1
}

As configured, the elasticsearch logs are stored in the esdata container and can be tailed using a simple volume mount.

$ docker run --volumes-from esdata ubuntu tail -f /data/log/elasticsearch.log

Search Dashboard Kibana

Kibana is a frontend for data visualization using the elasticsearch REST API. The only configuration needed is the elasticsearch endpoint.

~/elk/kibana/Dockerfile

FROM ubuntu

RUN apt-get update

RUN apt-get install -y curl

RUN curl -s https://download.elasticsearch.org/kibana/kibana/kibana-4.0.0-linux-x64.tar.gz | tar xz

RUN mv kibana-4.0.0-linux-x64 /kibana

EXPOSE 5601

CMD sed -i 's|^elasticsearch_url:.*$|elasticsearch_url: '"\"${ELASTICSEARCH_URL}\""'|' /kibana/config/kibana.yml && /kibana/bin/kibana
$ docker build -t kibana .

Kibana stores all frontend configurations in a .kibana index in elasticsearch. As long as the esdata container is not destroyed, the kibana instance can be destroyed, updated or restarted without data loss.

$ docker run -d -p 80:5601 --link elasticsearch:elasticsearch -e ELASTICSEARCH_URL=http://elasticsearch:9200 --name kibana kibana
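
Whether kibana could reach elasticsearch can be checked directly against the API once the frontend has been opened at least once; the .kibana index should then exist. This is just a quick sanity check.

$ curl localhost:9200/.kibana/_search?pretty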

Log Processing with Logstash

Logstash is the processing part of the ELK stack. It receives log messages, filters them, defines relevant fields and then stores the results in an elasticsearch index.

SSL Key Creation

The communication between clients pushing log events and logstash receiving them is secured by TLS. A small go script taken from logstash-forwarder issue #221 helps with setting the server IP address or DNS name for the created certificate.

$ curl https://raw.githubusercontent.com/driskell/log-courier/develop/src/lc-tlscert/lc-tlscert.go -o ~/elk/lc-tlscert.go

The logstash test setup uses logstash as the DNS name. The number of days the certificate is valid should be set to something more than a year. If that time passes, the certificate has to be regenerated and distributed to every host which produces and pushes log files.

~/elk

$ go build lc-tlscert.go
$ ./lc-tlscert
...
Common name: logstash
...
DNS or IP address 1: logstash
DNS or IP address 2: 
...
Number of days: 3650
...

The created certificate files are renamed only for simplicity.

~/elk

$ cp selfsigned.crt ssl/logstash.crt
$ cp selfsigned.key ssl/logstash.key
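
Before distributing the certificate, the common name and the validity period can be double-checked with openssl (assuming openssl is available on the host):

$ openssl x509 -in ssl/logstash.crt -noout -subject -dates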

Configuration

The logstash configuration has three parts: input, filter and output. lumberjack is a network protocol that enables logstash to receive log messages as events. Only a port and the path to the created certificate are needed to get started.

~/elk/config/logstash.conf

input {
  lumberjack {
    port => 5000
    ssl_certificate => "/ssl/logstash.crt"
    ssl_key => "/ssl/logstash.key"
  }
}
...

Java applications often produce log messages like 2015-01-01 12:13:14,995 [main] INFO de.stuff.StuffHandling - stuff happened. A grok filter can be used to extract the relevant fields from such a message. The grokdebug website is a great help for finding matching patterns. Processed logfiles should use the actual log timestamp instead of the logstash timestamp at which the entry was processed; that enables time-independent log file processing without altered timestamp identifiers. The date filter replaces the automatically generated @timestamp field with the date field from the log message. There are a lot of other filters to extract and modify log messages, but for this test case this simple configuration should be sufficient.

~/elk/config/logstash.conf

...
filter {
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:date} %{SYSLOG5424SD:thread} %{WORD:logelevel} %{JAVACLASS:class} - %{GREEDYDATA:message}" ]
    overwrite => [ "message" ]
  }
  date {
    locale => "de"
    match => ["date", "YYYY-MM-dd HH:mm:ss,SSS"]
    timezone => "Europe/Berlin"
    target => "@timestamp"
  }
}
...

Finally, the resulting data has to be sent to the elasticsearch server. The host parameter is set to elasticsearch, because the elasticsearch and logstash instances are simply linked in this local test setup.

~/elk/config/logstash.conf

...
output {
  elasticsearch { 
    host => "elasticsearch"
    protocol => http
  }
}

Build Image

~/elk/logstash/Dockerfile

FROM java:8-jre

RUN curl -s https://download.elasticsearch.org/logstash/logstash/logstash-1.4.2.tar.gz | tar xz

RUN mv logstash-1.4.2 logstash

VOLUME /config
VOLUME /log
VOLUME /ssl

EXPOSE 5000

CMD /logstash/bin/logstash agent --config /config/logstash.conf --log /log/logstash.log
$ docker build -t logstash .

Create Container

The logstash instance does not store any data and can be restarted with docker restart logstash if the configuration changes. If any new index fields are introduced, the kibana index pattern has to be refreshed.

$ docker run -d --link elasticsearch:elasticsearch -p 5000:5000 -v ~/elk/config:/config -v ~/elk/log:/log -v ~/elk/ssl:/ssl --name logstash logstash
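
After a filter change, a quick way to confirm that new fields actually reached elasticsearch is to restart the container and inspect the index mapping:

$ docker restart logstash
$ curl 'localhost:9200/logstash-*/_mapping?pretty'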

Debugging Logstash Filters

For logstash filter debugging purposes, a simpler logstash configuration can be used. The test input can be piped to a minimal container instance which is immediately destroyed afterwards. The elasticsearch output definition is optional.

~/elk/test/logstash.conf

input {
  stdin {}
}
filter {
  # put the filters here
}
output {
  stdout { codec => "rubydebug" }
  elasticsearch {
    host => "elasticsearch"
    protocol => http
  }
}
$ cat logfile.log | docker run --rm -i --link elasticsearch:elasticsearch -v ~/elk/test:/config logstash
$ echo "test" | docker run --rm -i --link elasticsearch:elasticsearch -v ~/elk/test:/config logstash

The example log message and the corresponding grok filter produce the following fields.

{
       "message" => "stuff happened",
      "@version" => "1",
    "@timestamp" => "2015-01-01T11:13:14.995Z",
          "host" => "381bbee0eafc",
          "date" => "2015-01-01 12:13:14,995",
        "thread" => "[main]",
     "logelevel" => "INFO",
         "class" => "de.stuff.StuffHandling"
}
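
Lines that do not match the grok pattern are not dropped; logstash tags the event with _grokparsefailure instead, which is easy to spot in the rubydebug output. The input line here is just an example.

$ echo "no timestamp here" | docker run --rm -i --link elasticsearch:elasticsearch -v ~/elk/test:/config logstash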

Pushing Log Events with logstash-forwarder

At this point, the ELK stack provides an endpoint to push log events to, stores the filtered data in an index and also provides a frontend to access that data. The last piece is the logstash-forwarder tool that watches logfiles and pushes any changes to the logstash instance.

Build Image

The installation procedure needs pleaserun, which itself needs Ruby 2.0.0 through a dependency on the mustache gem. Therefore ubuntu:14.10, and not ubuntu:latest (which points to ubuntu:14.04 and only provides Ruby 1.9.3), is used as the base image.

~/elk/logstash-forwarder/Dockerfile

FROM ubuntu:14.10

RUN apt-get update

RUN apt-get install -y wget git golang ruby ruby-dev irb ri rdoc build-essential libssl-dev zlib1g-dev

RUN gem install fpm pleaserun

RUN git clone git://github.com/elasticsearch/logstash-forwarder.git /tmp/logstash-forwarder

RUN cd /tmp/logstash-forwarder && go build

RUN cd /tmp/logstash-forwarder && make deb

RUN dpkg -i /tmp/logstash-forwarder/*.deb

RUN rm -rf /tmp/*

VOLUME /config
VOLUME /log
VOLUME /ssl

CMD /opt/logstash-forwarder/bin/logstash-forwarder -config /config/config.json
$ docker build -t logstash-forwarder .

Configuration

There are two parts that have to be configured: the network part describes where to push the logs to, and the files part points to the log files that should be watched. In this basic example, all files matching *.log in the mounted /log path are watched.

~/elk/config/config.json

{
  "network": {
    "servers": [ "logstash:5000" ],
    "ssl certificate": "/ssl/logstash.crt",
    "ssl key": "/ssl/logstash.key",
    "ssl ca": "/ssl/logstash.crt",
    "timeout": 15
  },
  "files": [
    {
      "paths": [ "/log/*.log" ],
      "fields": { "type": "logs" }
    }
  ]
}

Test log entries should follow the configured pattern.

~/elk/logfiles/example.log

2015-01-02 12:13:14,995 [main] INFO de.stuff.StuffHandling - more stuff happened
2015-01-03 12:13:14,995 [main] INFO de.stuff.StuffHandling - other stuff happened

Create Container

$ docker run -d -h testserver -v ~/elk/config:/config -v ~/elk/ssl:/ssl -v ~/elk/logfiles:/log --name logstash-forwarder logstash-forwarder

The logstash-forwarder instance log shows the connection procedure to the logstash instance.

$ docker logs -f logstash-forwarder
2015/01/02 18:26:50.491646 Waiting for 1 prospectors to initialise
2015/01/02 18:26:50.491721 All prospectors initialised with 0 states to persist
2015/01/02 18:26:50.491783 Loading client ssl certificate: /ssl/logstash.crt and /ssl/logstash.key
2015/01/02 18:26:50.617653 Setting trusted CA from file: /ssl/logstash.crt
2015/01/02 18:26:50.618132 Connecting to [172.17.0.90]:5000 (logstash) 
2015/01/02 18:26:50.752030 Connected to 172.17.0.90

Log file detection and processing is also shown.

2015/01/02 18:29:00.494792 Launching harvester on new file: /log/example.log
2015/01/02 18:29:00.494864 harvest: "/log/example.log" (offset snapshot:0)
2015/01/02 18:29:03.015543 Registrar: processing 2 events
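
Appending another matching line to the watched file should show up as one more processed event in the forwarder log shortly afterwards. The log line itself is only an example.

$ echo "2015-01-04 12:13:14,995 [main] INFO de.stuff.StuffHandling - appended stuff" >> ~/elk/logfiles/example.log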

Check the Setup

All container instances are running on the same host. That makes it easy to check if everything is working properly.

If all containers are running, the following containers should be available. The COMMAND, CREATED and STATUS fields are omitted from the output.

$ docker ps -a
CONTAINER ID IMAGE         ... PORTS                                          NAMES
fc77686e7722 logstash      ... 0.0.0.0:5000->5000/tcp                         logstash
d51896999769 kibana        ... 0.0.0.0:80->5601/tcp                           kibana
6e8c5d1c2362 elasticsearch ... 0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp elasticsearch
9fedfe682766 ubuntu:14.10  ...                                                esdata

The elasticsearch status is available via its API. In this case, several documents have already been indexed. Kibana created the .kibana index to store the configuration of dashboards etc.

$ curl localhost:9200/_cat/indices?v
health status index               pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   logstash-2015.01.03   5   1    1530782            0    589.5mb        589.5mb 
yellow open   logstash-2015.01.02   5   1     149380            0       59mb           59mb 
yellow open   .kibana               1   1          6            0     23.8kb         23.8kb

Kibana, running at localhost:80, shows which search requests were executed to display the data in the frontend. They can be copied and also run manually against the elasticsearch API.

$ curl 'localhost:9200/logstash-2015.01.02/_search?q=*&pretty'

If Kibana does not show any results although documents were indexed, the time range in the upper right corner needs to be extended. A basic visualization is a time series with the number of processed log events.

Kibana chart
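
The same time series can be requested directly from elasticsearch with a date_histogram aggregation on the @timestamp field. This is only a sketch; the index pattern and the interval are arbitrary.

$ curl -s 'localhost:9200/logstash-*/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "events_over_time": {
      "date_histogram": { "field": "@timestamp", "interval": "hour" }
    }
  }
}'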