Logstash
What is Logstash?
Logstash web site
Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (for searching, for example). Speaking of searching, Logstash comes with a web interface for searching and drilling into all of your logs.
Why should I use this tool?
For many reasons:
- very simple
- easier than parsing many log files in many folders
- you can try it and adopt it in less than 10 minutes
- structured logs with index
- distributed
OK, fine, I will try
Prepare the setup
Download the latest version; in our example it is 1.2.2.
mkdir /data/Logstash && cd /data/Logstash
wget https://download.elasticsearch.org/logstash/logstash/logstash-1.2.2-flatjar.jar
Create a simple config file. Here we just parse the syslog files on the local system.
logstash-local.conf
input {
  file {
    path => ['/var/log/*.log']
    type => 'system logs'
  }
}
output {
  elasticsearch {
    embedded => true
  }
}
This config file has two parts. The first defines the source: here it is the syslog files of the server, but we could also use a syslog TCP stream or an Apache log file. The second part is the output, where we push the events: we use an embedded Elasticsearch for now, and later we can use a real Elasticsearch cluster.
Start Logstash with the embedded Elasticsearch and the embedded Kibana 3 viewer.
java -jar logstash-1.2.2-flatjar.jar agent -f logstash-local.conf -- web
And that's all. Now open the Kibana web page at http://localhost:9292; you should see your log events there.
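If the Kibana page stays empty, it can help to ask Elasticsearch directly whether anything was indexed. The sketch below is only an illustration: it assumes the embedded Elasticsearch answers HTTP on localhost:9200 and that Logstash writes to its default daily indices named logstash-YYYY.MM.dd (dated in UTC). The function names are made up for this example.

```python
import json
from datetime import date
from urllib.request import urlopen


def index_name(day):
    """Name of the daily index Logstash uses by default for the given day."""
    return f"logstash-{day:%Y.%m.%d}"


def search_url(day, host="localhost", port=9200):
    """URL of a match-all search on that day's index (host/port are assumptions)."""
    return f"http://{host}:{port}/{index_name(day)}/_search?q=*&size=0"


def count_events(day, host="localhost", port=9200):
    """Ask Elasticsearch how many events the daily index holds.

    Needs a running Elasticsearch; nothing in this file calls it automatically.
    """
    with urlopen(search_url(day, host, port), timeout=5) as response:
        return json.load(response)["hits"]["total"]
```

With Logstash running, calling count_events(date.today()) from a Python shell should return a number that grows as new log lines arrive.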
Production setup
Standalone
This is very simple; we have to install a few components:
- Elasticsearch
- a webserver for hosting Kibana3
- logstash with config
In this example our server is logstash-server; I recommend using the FQDN in your config, as usual.
elasticsearch
You can use the version of your choice, from a system package, a zip, or a tarball.
If we use the tarball, we have to download it and start the server:
mkdir -p /data/Logstash/Src
cd /data/Logstash
wget -O Src/elasticsearch-<version>.tar.gz https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-<version>.tar.gz
tar xf Src/elasticsearch-<version>.tar.gz && cd elasticsearch-<version>
./bin/elasticsearch -f
a webserver
You can use any web server; I will not explain how to configure Apache, nginx, or lighttpd.
We share the Kibana folder with the web server; we can use the tarball or git:
cd /data/Logstash/
git clone https://github.com/elasticsearch/kibana.git
sed -i 's_elasticsearch: "http://"+window.location.hostname+":9200"_elasticsearch: "http://logstash-server:9200"_' kibana/src/config.js
logstash
Same as the previous setup, but we will use a real and more interesting config file.
We use some syslog directives and add some tags.
logstash.conf
input {
  syslog {
    port => 514
    type => "syslog"
    tags => ["syslog"]
  }
  file {
    type => "apache"
    format => json_event
    path => ['/var/log/httpd/json_access.log']
    tags => ["apache_json"]
  }
  snmptrap {
    type => "snmptrap"
    yamlmibdir => "/data/groux/yaml"
    port => 1062
  }
}
filter {
  syslog_pri { }
}
output {
  elasticsearch {
    host => 'logstash-server'
  }
}
The config help is on the Logstash web site. This config has some nice features; Logstash acts as a:
- syslog tcp/udp server [like rsyslogd/sysklogd]
- snmp trap receiver [like snmptrapd]
- specific Apache log receiver (the same is possible with nginx/lighttpd)
The filter section is very simple and you can adapt it to your needs; there are many possibilities.
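To make the filter section less of a black box, here is a small sketch of the arithmetic behind a syslog priority value, which is what the syslog_pri filter decodes: the number between angle brackets at the start of a syslog message is facility * 8 + severity. This is an illustration of the standard syslog encoding, not the filter's actual code, and the short names used here are conventional ones that may differ from the labels Logstash emits.

```python
# Conventional short names, indexed by numeric code (illustrative labels).
FACILITIES = [
    "kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
    "uucp", "cron", "authpriv", "ftp", "ntp", "audit", "alert", "clock",
    "local0", "local1", "local2", "local3",
    "local4", "local5", "local6", "local7",
]
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def decode_pri(pri):
    """Split a syslog priority value into facility and severity."""
    if not 0 <= pri <= 191:  # 23 * 8 + 7 is the largest valid value
        raise ValueError(f"priority out of range: {pri}")
    facility_code, severity_code = divmod(pri, 8)
    return {
        "facility_code": facility_code,
        "facility": FACILITIES[facility_code],
        "severity_code": severity_code,
        "severity": SEVERITIES[severity_code],
    }
```

For example, a message starting with <13> decodes to facility user and severity notice, since 13 = 1 * 8 + 5.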
specific apache log config
We use Logstash as a web log parser. For this we write the access log in a structured, machine-readable (JSON) format, and send the error log to syslog.
ErrorLog directive to send errors to syslog (the Logstash input):
ErrorLog syslog
Access log format: we must use a temporary log file with a special output format.
LogFormat "{ \
\"@source\":\"file://${HOSTNAME}//var/log/httpd/json_access_log\",\
\"@source_host\": \"${HOSTNAME}\", \
\"@source_path\": \"/var/log/httpd/json_access_log\", \
\"@message\": \"%v %h %l %u %t \\\"%r\\\" %>s %b\", \
\"@fields\": { \
\"timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \
\"clientip\": \"%a\", \
\"duration\": %D, \
\"status\": %>s, \
\"request\": \"%U%q\", \
\"urlpath\": \"%U\", \
\"urlquery\": \"%q\", \
\"method\": \"%m\", \
\"bytes\": %B, \
\"vhost\": \"%v\", \
\"host alias\": \"%{Host}i\", \
\"referer\": \"%{Referer}i\", \
\"agent\": \"%{User-agent}i\" \
} \
}" logstash_apache_json
Inspired by the Logstash cookbook.
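To see what this format produces, the sketch below parses one access log line the way any JSON consumer would. The sample line is hand-written for illustration (host names, IP address and values are hypothetical), following the field layout of the LogFormat above. Note that duration, status and bytes are unquoted in the format, so they come out as numbers.

```python
import json

# Hypothetical line, shaped like the output of the LogFormat above.
SAMPLE_LINE = (
    '{ "@source":"file://web1//var/log/httpd/json_access_log",'
    ' "@source_host": "web1",'
    ' "@source_path": "/var/log/httpd/json_access_log",'
    ' "@message": "www.example.com 192.0.2.10 - - [05/Nov/2013:09:03:07 +0100]'
    ' \\"GET /index.html?x=1 HTTP/1.1\\" 200 512",'
    ' "@fields": { "timestamp": "2013-11-05T09:03:07+0100",'
    ' "clientip": "192.0.2.10", "duration": 1834, "status": 200,'
    ' "request": "/index.html?x=1", "urlpath": "/index.html",'
    ' "urlquery": "?x=1", "method": "GET", "bytes": 512,'
    ' "vhost": "www.example.com", "host alias": "www.example.com",'
    ' "referer": "-", "agent": "curl/7.29.0" } }'
)


def summarize(line):
    """Parse one JSON access log line and keep a few useful fields."""
    event = json.loads(line)
    fields = event["@fields"]
    return {
        "host": event["@source_host"],
        "method": fields["method"],
        "urlpath": fields["urlpath"],
        "status": fields["status"],             # integer, from %>s
        "duration_us": fields["duration"],      # integer microseconds, from %D
        "bytes": fields["bytes"],               # integer, from %B
    }
```

Running summarize on a few real lines from the temporary log file is a quick way to confirm that Apache writes valid JSON before pointing Logstash at it.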
And now enjoy
With redundancy
Now we want a system without a SPOF (single point of failure), so every component must be redundant, with failover.
In the previous chapter we saw how to create a VIP (virtual IP) and a redundant web server.
the logical schema
the physical schema
elasticsearch
Elasticsearch is natively redundant when cluster mode is enabled: we install it on several servers and use the same config on each.
config/elasticsearch.yml
cluster.name: ESCluster
node.name: server1
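In practice the nodes share the cluster name and differ only by their node name. As a small sketch of that idea, the helper below renders the two-line config for each server; the function names are made up for this example, and any node name other than server1 is a hypothetical placeholder.

```python
def render_elasticsearch_yml(cluster_name, node_name):
    """Return the minimal elasticsearch.yml content for one node."""
    return f"cluster.name: {cluster_name}\nnode.name: {node_name}\n"


def render_cluster(cluster_name, node_names):
    """Map each node name to its config text; only node.name changes."""
    return {
        name: render_elasticsearch_yml(cluster_name, name)
        for name in node_names
    }
```

Calling render_cluster("ESCluster", ["server1", "server2"]) gives one config text per server, both carrying the same cluster.name.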
rsyslog
It is a standard config with one specific directive to log remote syslog messages. We listen on TCP port 1514, not the standard port, because Logstash acts as the syslog receiver and forwards the messages to the rsyslog servers.
/etc/rsyslog.d/10-loghost.conf
# Provides TCP syslog reception
$ModLoad imtcp.so
$InputTCPServerRun 1514
$template DailyPerHostLogs,"/data/logs/%FROMHOST%/%$YEAR%-%$MONTH%-%$DAY%/%syslogfacility-text%.log"
# Log everything in dedicated log files
*.* -?DailyPerHostLogs
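The DailyPerHostLogs template decides where each message lands on disk. The sketch below mimics the substitution in Python to show the resulting layout; it is an illustration only, the function name is made up, and it assumes rsyslog renders the year with four digits and the month and day with two.

```python
from datetime import date

# Same layout as the DailyPerHostLogs template, with Python placeholders.
TEMPLATE = "/data/logs/{fromhost}/{year:04d}-{month:02d}-{day:02d}/{facility}.log"


def daily_per_host_path(fromhost, day, facility):
    """Build the log file path for a sending host, a date and a facility name."""
    return TEMPLATE.format(
        fromhost=fromhost,
        year=day.year,
        month=day.month,
        day=day.day,
        facility=facility,
    )
```

A daemon message from a host called web1 on 5 November 2013 would end up in /data/logs/web1/2013-11-05/daemon.log, so each host gets one directory per day and one file per facility.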
All remote syslog messages go to Logstash first, which then forwards them to rsyslog on both servers of our rsyslog setup.
Don't forget to send local syslog messages to Logstash too (standard config).
logstash
Same as before, but the output must send syslog messages to the two rsyslog servers, and to the Elasticsearch cluster rather than to a single server.
logstash.conf
input {
  syslog {
    port => 514
    type => "syslog"
    tags => ["syslog"]
  }
  file {
    type => "apache"
    format => json_event
    path => ['/var/log/httpd/json_access.log']
    tags => ["apache_json"]
  }
  snmptrap {
    type => "snmptrap"
    yamlmibdir => "/data/groux/yaml"
    port => 1062
  }
}
filter {
  syslog_pri { }
}
output {
  elasticsearch {
    cluster => 'ESCluster'
  }
  syslog {
    host => "syslogd1"
    port => "1514"
    rfc => "rfc5424"
  }
  syslog {
    host => "syslogd2"
    port => "1514"
    rfc => "rfc5424"
  }
}
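A simple way to test the whole chain is to send a hand-made syslog message to Logstash and check that it shows up both in Kibana and in the rsyslog files. The sketch below builds a classic BSD-style (RFC 3164) message and sends it over UDP; the function names, the tag and the message text are made up for this example, and it assumes the Logstash syslog input is reachable on UDP port 514.

```python
import socket
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_rfc3164(facility, severity, when, host, tag, text):
    """Build '<PRI>Mmm dd hh:mm:ss host tag: text' (day is space-padded)."""
    pri = facility * 8 + severity
    stamp = f"{MONTHS[when.month - 1]} {when.day:>2} {when:%H:%M:%S}"
    return f"<{pri}>{stamp} {host} {tag}: {text}"


def send_udp(message, server, port=514):
    """Send one syslog message as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (server, port))
```

For instance, send_udp(format_rfc3164(1, 5, datetime.now(), "web1", "test", "hello logstash"), "logstash-server") sends a user.notice message; if everything is wired correctly it should appear in Elasticsearch and in the per-host directories on both rsyslog servers.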
And that's it.
I still have to make a better setup for the Logstash output, or perhaps another syslog setup. Keep in mind that it's just an awesome tool and it can save your sanity, because plain text log files are a pain in the ass.