Logstash

What is Logstash?

Logstash web site

Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (for searching, for example). Speaking of searching, logstash comes with a web interface for searching and drilling into all of your logs.

Why should I use this tool?

For many reasons:

  • very simple
  • easier than parsing many log files in many folders
  • you can try and adopt it in less than 10 minutes
  • structured logs with an index
  • distributed

OK, fine, I will try

Prepare the setup

Download the latest version; in our example it's 1.2.2.

mkdir /data/Logstash && cd /data/Logstash
wget https://download.elasticsearch.org/logstash/logstash/logstash-1.2.2-flatjar.jar

Create a simple config file; we will just parse the syslog files on the local system.

logstash-local.conf

input {
  file {
    path => ['/var/log/*.log']
    type => 'system logs'
  }
}
output {
  elasticsearch {
    embedded => true
  }
}

This config file has two parts. In the first one we define the source; here it is the syslog files of the server, but we could also use a syslog TCP stream or an Apache log file. The second part is the output, where we push the information; we use an embedded Elasticsearch, and later we can use a real Elasticsearch cluster.
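If you want to experiment with this input/output model without touching any log files, a minimal config can read events typed on the terminal and print them straight back. This is a sketch for logstash 1.2.x; stdin and stdout are stock plugins, but double-check the options against your version:

```text
input {
  stdin { type => 'stdin' }
}
output {
  stdout { debug => true }
}
```

Run it the same way as below (without the `-- web` part) and type a line: logstash will echo it back as a structured event.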

Start logstash with embedded ES and the embedded Kibana3 viewer:

java -jar logstash-1.2.2-flatjar.jar agent -f logstash-local.conf -- web

And that's all; now open the Kibana web page at http://localhost:9292. You should see something like this:

Kibana3 screenshot

Production setup

Standalone

This is very simple, we just have to install a few elements:

  • Elasticsearch
  • a webserver for hosting Kibana3
  • logstash with config

In this example our server is logstash-server; I recommend using FQDNs, as usual, in your config.

elasticsearch

You can use the version of your choice: a system package, a zip, or a tarball.

If we use the tarball, we have to download and start the server ourselves:

mkdir -p /data/Logstash/Src
cd /data/Logstash
wget -O Src/elasticsearch-<version>.tar.gz https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-<version>.tar.gz
tar xf Src/elasticsearch-<version>.tar.gz && cd elasticsearch-<version>
./bin/elasticsearch -f

a webserver

You can use any webserver; I will not explain how to configure apache, nginx, or lighttpd.

We share the Kibana folder with the web server; we can use the tarball or git:

cd /data/Logstash/
git clone https://github.com/elasticsearch/kibana.git
sed -i 's_elasticsearch: "http://"+window.location.hostname+":9200"_elasticsearch: "http://logstash-server:9200"_' kibana/src/config.js

logstash

Same as the previous setup, but we will use a real and more interesting config file.

We use some syslog directives and add some tags.

logstash.conf

input {
  syslog {
    port => 514
    type => "syslog"
    tags => ["syslog"]
  }
  file {
    type => "apache"
    format => json_event
    path => ['/var/log/httpd/json_access.log']
    tags => ["apache_json"]
  }
  snmptrap {
    type => "snmptrap"
    yamlmibdir => "/data/groux/yaml"
    port => 1062
  }
}
filter {
  syslog_pri { }
}
output {
  elasticsearch {
    host => 'logstash-server'
  }
}

The config help is on the logstash web site. With this config we get some nice features, and logstash acts as:

  • a syslog tcp/udp server [like rsyslogd/sysklogd]
  • an snmp trap receiver [like snmptrapd]
  • a specific apache log receiver (the same is possible with nginx/lighttpd)

The filter section is very simple and you can adapt it to your needs; there are many possibilities.
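For example, if you received raw (non-JSON) Apache access log lines, you could extract fields with a grok filter and fix the event timestamp with a date filter. This is only a sketch: grok, date, and the stock COMBINEDAPACHELOG pattern all exist in logstash 1.2.x, but check the exact syntax against your version:

```text
filter {
  # parse raw combined-format access log lines into fields
  grok {
    type => "apache"
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }
  # use the timestamp from the log line instead of the arrival time
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
```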

specific apache log config

We use logstash as a web log parser; for this we use a machine-readable (JSON) log format for the access log, and syslog for the error log.

The ErrorLog directive sends errors to syslog (a logstash input):

ErrorLog syslog
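If you want to keep Apache errors apart from the rest of the syslog traffic, Apache also accepts an explicit facility (this is standard Apache syntax; local1 here is just an example choice):

```text
ErrorLog syslog:local1
```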

Access log format: we must use a temporary log file with a special output format.

LogFormat "{ \
  \"@source\":\"file://${HOSTNAME}//var/log/httpd/json_access_log\", \
  \"@source_host\": \"${HOSTNAME}\", \
  \"@source_path\": \"/var/log/httpd/json_access_log\", \
  \"@message\": \"%v %h %l %u %t \\\"%r\\\" %>s %b\", \
  \"@fields\": { \
    \"timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \
    \"clientip\": \"%a\", \
    \"duration\": %D, \
    \"status\": %>s, \
    \"request\": \"%U%q\", \
    \"urlpath\": \"%U\", \
    \"urlquery\": \"%q\", \
    \"method\": \"%m\", \
    \"bytes\": %B, \
    \"vhost\": \"%v\", \
    \"host alias\": \"%{Host}i\", \
    \"referer\": \"%{Referer}i\", \
    \"agent\": \"%{User-agent}i\" \
  } \
}" logstash_apache_json
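The format still has to be attached to a log file. A CustomLog directive pointing at the path watched by the logstash file input does that (assuming the standard httpd log directory used elsewhere in this setup):

```text
CustomLog /var/log/httpd/json_access.log logstash_apache_json
```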

Inspired by the logstash cookbook.

And now enjoy

With redundancy

Now we want a system without a SPOF, so every component must be redundant, with failover.

In a previous chapter we saw how to create a VIP (virtual IP) and a redundant web server.

the logical schema

Logical Architecture

the physical schema

elasticsearch

Elasticsearch is natively redundant when cluster mode is enabled: we install it on several servers and use the same config (only node.name must differ on each node).

config/elasticsearch.yml

cluster.name: ESCluster
node.name: server1
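On the second server, the same cluster.name with a different node.name is enough for the nodes to find each other and join the same cluster (server2 here is just an example name):

```text
cluster.name: ESCluster
node.name: server2
```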

rsyslog

It's a standard config with one specific directive to log remote syslog messages. We listen on TCP:1514 rather than the standard port, because logstash acts as the syslog receiver and forwards to the rsyslog servers.

/etc/rsyslog.d/10-loghost.conf

# Provides TCP syslog reception
$ModLoad imtcp.so
$InputTCPServerRun 1514

$template DailyPerHostLogs,"/data/logs/%FROMHOST%/%$YEAR%-%$MONTH%-%$DAY%/%syslogfacility-text%.log"

# Log everything in dedicated log files
*.* -?DailyPerHostLogs
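To see what the template does: a message received on 2013-11-20 from a host named web1 with facility daemon would, for example, end up in:

```text
/data/logs/web1/2013-11-20/daemon.log
```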

All remote syslog messages go to logstash, which sends them on to both servers of our rsyslog setup.

Don't forget to send local syslog messages to logstash too (standard config).

logstash

Same as the previous setup, but the output must send syslog messages to the 2 servers and to the Elasticsearch cluster, not a single server.

logstash.conf

input {
  syslog {
    port => 514
    type => "syslog"
    tags => ["syslog"]
  }
  file {
    type => "apache"
    format => json_event
    path => ['/var/log/httpd/json_access.log']
    tags => ["apache_json"]
  }
  snmptrap {
    type => "snmptrap"
    yamlmibdir => "/data/groux/yaml"
    port => 1062
  }
}
filter {
  syslog_pri { }
}
output {
  elasticsearch {
    cluster => 'ESCluster'
  }
  syslog {
    host => "syslogd1"
    port => "1514"
    rfc => "rfc5424"
  }
  syslog {
    host => "syslogd2"
    port => "1514"
    rfc => "rfc5424"
  }
}

And that's it.

I still have to make a better setup for the logstash output, or perhaps another syslog setup. Keep in mind that it's an awesome tool and it can save your mind, because plain text log files are a pain in the ass.