
Distributive - Health Check













Introduction


Distributive is a tool for running distributed health checks in datacenters. It was designed with Consul in mind, but is platform agnostic. It is simple to configure (with JSON checklists) and easy to deploy and run.


Exit codes follow the conventions that Consul, Kubernetes, Sensu, and Nagios recognize:


Exit code 0 - Checklist is passing

Exit code 1 - Checklist is warning

Any other code - Checklist is failing

As of right now, only exit codes 0 and 1 are used, even if a checklist fails.
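Here is a minimal sketch of how a calling script, such as a cron job, might interpret these exit codes. The function name `distributive_status` is hypothetical. Only the 0 / 1 / other convention comes from the text above.

```shell
# Hypothetical helper: translate a Distributive exit code into the
# passing / warning / failing convention described above.
distributive_status() {
  case "$1" in
    0) echo "passing" ;;  # checklist is passing
    1) echo "warning" ;;  # checklist is warning
    *) echo "failing" ;;  # any other code: checklist is failing
  esac
}
```

Called as `distributive -d /etc/distributive.d/; distributive_status $?`, it prints `warning` when a check fails, because Distributive currently exits with 1 in that case.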


Installation


Install Golang

sudo yum install golang -y


$ go version

go version go1.4.2 linux/amd64


Install Distributive

sudo rpm -i https://github.com/CiscoCloud/distributive/releases/download/v0.2/distributive-0.2-1.x86_64.rpm


Verify Distributive


 

$ distributive --help

NAME:

   Distributive - Perform distributed health tests


USAGE:

   Distributive [global options] command [command options] [arguments...]


VERSION:

   0.2


COMMANDS:

   help, h      Shows a list of commands or help for one command


GLOBAL OPTIONS:

   --verbosity 'warn'                           info | debug | fatal | error | panic | warn

   --file, -f                                   Read a checklist from a file

   --url, -u                                    Read a checklist from a URL

   --directory, -d '/etc/distributive.d/'       Read all of the checklists in this directory

   --stdin, -s                                  Read data piped from stdin as a checklist

   --help, -h                                   show help

   --version, -v                                print the version



FATA[0000] Neither file, URL, directory, nor stdin specified. Try --help.



Getting Started


Install Kafka


Follow the Kafka How-To to install Kafka.


Create the Distributive config directory


All Distributive application checklists are kept in this directory.

When invoked with -d, Distributive reads every checklist in this directory, runs each one, and reports the passed and failed checks.

sudo mkdir /etc/distributive.d
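Distributive parses every file in this directory as a checklist, so it can help to confirm that each one is valid JSON before running it. The sketch below does this. It assumes `python3` is installed and uses its standard `json.tool` module only as a parser. The function name `validate_checklists` is hypothetical.

```shell
# Hypothetical pre-flight check: report every *.json file under a directory
# (including subdirectories) that does not parse as JSON.
# Assumes python3 is installed; json.tool is used only as a JSON parser.
# Assumes checklist paths contain no whitespace (find output is word-split).
validate_checklists() {
  dir="${1:-/etc/distributive.d}"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 2; }
  bad=0
  for f in $(find "$dir" -type f -name '*.json'); do
    if ! python3 -m json.tool "$f" >/dev/null 2>&1; then
      echo "invalid JSON: $f" >&2
      bad=1
    fi
  done
  return "$bad"
}
```

Run `validate_checklists /etc/distributive.d` before invoking Distributive. A non-zero return means at least one file needs fixing. Valid JSON does not guarantee valid check names, though.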


Define a Distributive checklist for the Kafka topic list, the sshd service, and a port


A checklist can specify a variety of checks, from commands and services to network ports.


 

$ sudo cat /etc/distributive.d/kafka.json

{

    "Name": "Kafka Checklist",

    "Notes": "A checklist that has checks, for kafka!",

    "Checklist": [

        {

                "Name": "Kafka check list of topic",

                "Notes": "Check kafka topic list is present.",

                "Check": "command",

                "Parameters": ["bin/kafka-topics.sh --list --zookeeper localhost:2181"]

        },

        {

                "Name": "SSHD service",

                "Notes": "Check sshd service status.",

                "Check": "systemctlActive",

                "Parameters": ["sshd"]

        },

        {

                "Name": "Misc server port",

                "Notes": "Check misc server port.",

                "Check": "port",

                "Parameters": ["9099"]

        }



    ]

}


Execute Distributive


$ distributive -d /etc/distributive.d/

WARN[0001] Check(s) failed, printing checklist report    checklist=Kafka Checklist report=

Total: 3

Passed: 2

Failed: 1

Port not open:

        Specified: 9099

        Actual:
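The report says port 9099 is not open. You can cross-check this by hand with the sketch below, which looks for a listening TCP socket in the kernel's `/proc/net/tcp` and `/proc/net/tcp6` tables (Linux only). This check is independent of Distributive and does not necessarily work the way its `port` check does. The function name `port_listening` is hypothetical.

```shell
# Hypothetical manual cross-check (Linux only): succeed if some TCP socket is
# in LISTEN state on the given port, according to the kernel's socket tables.
# Extra arguments override the table files to read (handy for testing).
port_listening() {
  port="$1"
  shift
  [ "$#" -gt 0 ] || set -- /proc/net/tcp /proc/net/tcp6
  # Ports appear as 4-digit uppercase hex, e.g. 9099 -> 238B.
  hex=$(printf '%04X' "$port")
  # Field 2 is local_address (ADDR:PORT), field 4 the state; 0A means LISTEN.
  # cat skips missing files (e.g. no IPv6); header lines never match "0A".
  cat "$@" 2>/dev/null | awk -v hex="$hex" '
    $4 == "0A" { n = split($2, a, ":"); if (a[n] == hex) found = 1 }
    END { exit (found ? 0 : 1) }'
}
```

`ss -ltn` from iproute2 lists the same listening sockets in a human-readable form.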


Distributive detecting service state



The sshd check in the JSON above demonstrates Distributive detecting whether a service is up or down.

Similarly, checks for networking, services, and packages can be enforced.


 

# sshd is running


$ systemctl status sshd

sshd.service - OpenSSH server daemon

   Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled)


$ distributive -d /etc/distributive.d/

WARN[0001] Check(s) failed, printing checklist report    checklist=Kafka Checklist report=

Total: 3

Passed: 2

Failed: 1

Port not open:

        Specified: 9099

        Actual:


# sshd is stopped


$ systemctl stop sshd


$ distributive -d /etc/distributive.d/

WARN[0001] Check(s) failed, printing checklist report    checklist=Kafka Checklist report=

Total: 3

Passed: 1

Failed: 2

Service not active:

         Specified: sshd

         Actual ActiveState=Inactive

Port not open:

        Specified: 9099

        Actual:


# sshd is started again


$ systemctl start sshd


$ distributive -d /etc/distributive.d/

WARN[0001] Check(s) failed, printing checklist report    checklist=Kafka Checklist report=

Total: 3

Passed: 2

Failed: 1

Port not open:

        Specified: 9099

        Actual:
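When the sshd check fails, the report prints the unit's ActiveState. The sketch below lets you compare that with systemd's own view. It uses the standard `systemctl show -p ActiveState` invocation, which prints `ActiveState=<state>`. The function name `unit_active_state` is hypothetical.

```shell
# Hypothetical helper: print the systemd ActiveState of a unit
# ("active", "inactive", "failed", ...), the property Distributive
# reports as "ActiveState=..." when a systemctlActive check fails.
unit_active_state() {
  systemctl show -p ActiveState "$1" | cut -d= -f2
}
```

While sshd is running, `unit_active_state sshd` prints `active`. After `systemctl stop sshd`, it prints `inactive`.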




Distributive health-check JSONs in the Worker VM


Packaging the Distributive health-check JSONs in the Worker VM involves the following steps:


0. Directory structure for JSONs


Distributive Parent Directory : /etc/distributive.d


Images Directory : /etc/distributive.d/images

Resource Image Directory : /etc/distributive.d/images/resource

Deployment Image Directory : /etc/distributive.d/images/deployment


Services Directory : /etc/distributive.d/services

Logging Service Directory : /etc/distributive.d/services/logging


Controller Directory : /etc/distributive.d/controller

Service Controller Directory : /etc/distributive.d/controller/service

App Controller Directory : /etc/distributive.d/controller/app
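The layout above could be created at image-build time with a few `mkdir -p` calls, as in the sketch below. The function name is hypothetical. The optional argument exists only so you can try the layout under a scratch directory without root.

```shell
# Hypothetical helper: create the Distributive directory layout listed above.
# mkdir -p also creates the intermediate images/, services/ and controller/.
create_distributive_tree() {
  base="${1:-/etc/distributive.d}"
  mkdir -p \
    "$base/images/resource" \
    "$base/images/deployment" \
    "$base/services/logging" \
    "$base/controller/service" \
    "$base/controller/app"
}
```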



1. Platform base team

The Distributive config directory is pre-created in the worker VM image at /etc/distributive.d/.


2. Service team (e.g. Kafka or any other application)

Each application team writes the Distributive health-check JSON for its application.

The application's installation copies that JSON into the Distributive directory.
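The copy step might look like the sketch below in an application's installer. It refuses a checklist that is not valid JSON and otherwise installs it world-readable. The sketch assumes `python3` is available, and the function name `install_checklist` is hypothetical.

```shell
# Hypothetical install step for an application package: refuse to install a
# checklist that is not valid JSON, otherwise copy it into the Distributive
# directory with world-readable permissions.
# Assumes python3 is installed (json.tool is used only as a JSON parser).
install_checklist() {
  src="$1"
  dest_dir="${2:-/etc/distributive.d}"
  if ! python3 -m json.tool "$src" >/dev/null 2>&1; then
    echo "not valid JSON, skipping: $src" >&2
    return 1
  fi
  mkdir -p "$dest_dir" &&
    cp "$src" "$dest_dir/" &&
    chmod 0644 "$dest_dir/$(basename "$src")"
}
```

For example, an installer running as root might call `install_checklist kafka.json`. That file name is only an illustration.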


3. Distributive invocation with the config directory

Distributive is invoked with the config directory, so every JSON checklist in it is executed and a Distributive report is generated.
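This page does not say whether `-d` also reads checklists in subdirectories such as `images/resource`. If it does not, one option is a wrapper like the sketch below. It runs Distributive once per directory that contains JSON files and returns the highest exit code seen. The name `run_all_checklists` is hypothetical.

```shell
# Hypothetical wrapper: run Distributive once per directory under BASE that
# directly contains *.json checklists, and return the highest exit code seen.
# Since 0 = passing, 1 = warning and anything higher = failing, the highest
# code is also the most severe one.
# Assumes directory names contain no whitespace.
run_all_checklists() {
  base="${1:-/etc/distributive.d}"
  worst=0
  for d in $(find "$base" -type d); do
    # Skip directories without checklists of their own.
    ls "$d"/*.json >/dev/null 2>&1 || continue
    distributive -d "$d"
    rc=$?
    if [ "$rc" -gt "$worst" ]; then
      worst=$rc
    fi
  done
  return "$worst"
}
```

If `-d` does recurse, this wrapper would run nested checklists more than once, so check the behaviour of your Distributive version first.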


4. Register Distributive as a service with Consul

Distributive is registered as a service in Consul, with a check that runs it against the Distributive config directory.

If any check in the Distributive JSONs fails, Distributive exits with code 1 and Consul marks the service check as warning. An admin can then run Distributive manually to get the detailed checklist report and take corrective action.


The Distributive service is registered in Consul with the following JSON:


 

$ sudo cat /etc/consul.d/distributive.json

{

  "service": {

    "name": "Distributive",

    "check": {

      "script": "/usr/bin/distributive -d /etc/distributive.d/",

      "interval": "10s"

    }

  }

}
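Distributive currently exits with 1 even when a checklist fails, so Consul reports this check as warning rather than critical. If failures should show up as critical, one option is to point the `script` above at a small wrapper instead, as in the sketch below. The function names and the `DISTRIBUTIVE_BIN` variable are hypothetical.

```shell
# Map a Distributive exit code to a Consul script-check exit code.
# Consul: 0 = passing, 1 = warning, anything else = critical.
# Distributive uses 1 for a failing checklist, so every non-zero code
# is escalated to 2 (critical).
to_consul_code() {
  if [ "$1" -eq 0 ]; then
    echo 0
  else
    echo 2
  fi
}

# Run Distributive against a checklist directory and return the mapped code.
distributive_consul_check() {
  "${DISTRIBUTIVE_BIN:-/usr/bin/distributive}" -d "${1:-/etc/distributive.d/}"
  rc=$?
  return "$(to_consul_code "$rc")"
}
```

To use it, save these functions in a `#!/bin/sh` script whose last line is `distributive_consul_check; exit $?`. That script's path would then replace `/usr/bin/distributive -d /etc/distributive.d/` in the `script` field.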



Getting Started with Consul


Install Consul


$ mkdir consul

$ cd consul

$ wget  https://dl.bintray.com/mitchellh/consul/0.5.1_linux_amd64.zip

$ unzip 0.5.1_linux_amd64.zip


$ sudo mkdir /etc/consul.d

$ sudo cat /etc/consul.d/distributive.json

{

  "service": {

    "name": "Distributive",

    "check": {

      "script": "/usr/bin/distributive -f /etc/distributive.d/checklist.json -d ''",

      "interval": "10s"

    }

  }

}


Distributive Config


$ sudo ls /etc/distributive.d

samples


$ sudo cat /etc/distributive.d/checklist.json

{

    "Name": "My first checklist",

    "Notes": "A checklist that has checks, really!",

    "Checklist": [

         {

                "Name": "Git installation check",

                "Notes": "If I don't have git, I don't know what I'll do.",

                "Check": "Installed",

                "Parameters": ["git"]

        }

    ]

}



After more checks are added, the checklist looks like this:

$ sudo cat /etc/distributive.d/checklist.json

{

    "Name": "My first checklist",

    "Notes": "A checklist that has checks, really!",

    "Checklist": [

         {

                "Name": "Git installation check",

                "Notes": "If I don't have git, I don't know what I'll do.",

                "Check": "Installed",

                "Parameters": ["git"]

        },

        {

                "Name": "Git installation check",

                "Notes": "If I don't have git, I don't know what I'll do.",

                "Check": "Installed",

                "Parameters": ["docker"]

        },

        {

            "Check": "port",

            "Parameters": ["41483"]

        },

        {

            "Check" : "interface",

            "Parameters" : ["docker0"]

        }

    ]

}


Start Consul


$ ./consul agent -server -bootstrap-expect=1 -data-dir /tmp/consul -config-dir /etc/consul.d -dc dc-distributive -node=consul-distributive-1 --bind=192.168.0.167
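Once the agent is running, you can read back the state of the Distributive check from Consul's HTTP API. The sketch below queries `/v1/health/checks/Distributive` and prints each check's `Status`, for example `passing`, `warning` or `critical`. It assumes the HTTP API is on the default `localhost:8500` and that `curl` and `python3` are installed. Both function names are hypothetical.

```shell
# Print the Status of every health check in a JSON array such as the one
# returned by Consul's /v1/health/checks/<service> endpoint, read from stdin.
# Assumes python3 is installed.
print_check_statuses() {
  python3 -c 'import json, sys; sys.stdout.write("".join(c["Status"] + "\n" for c in json.load(sys.stdin)))'
}

# Query the local agent (default HTTP address localhost:8500) for the checks
# of the "Distributive" service. Assumes curl is installed.
distributive_check_status() {
  curl -s "http://${1:-localhost:8500}/v1/health/checks/Distributive" |
    print_check_statuses
}
```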



Run Distributive Standalone


$ /usr/bin/distributive -f /etc/distributive.d/checklist.json --verbosity="info"

INFO[0000] Creating checklist(s)...                      path=/etc/distributive.d/checklist.json type=file

INFO[0000] Running checklist: My first checklist

INFO[0000] Check passed                                  name=Git installation check type=Installed

INFO[0000] Check passed                                  name=Git installation check type=Installed

INFO[0000] Check failed                                  name= type=port

INFO[0000] Check failed                                  name= type=interface

WARN[0000] Check(s) failed, printing checklist report    checklist=My first checklist report=

Total: 4

Passed: 2

Failed: 2

Port not open:

        Specified: 41483

        Actual:

Interface does not exist:

        Specified: docker0

        Actual: lo, eno16777728, vboxnet0
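To feed the summary into other tooling, the sketch below sums the `Total:`, `Passed:` and `Failed:` lines across every checklist report in the output. It assumes those lines look exactly as in the output above, which may differ in other Distributive versions. The name `report_counts` is hypothetical.

```shell
# Hypothetical helper: read Distributive output on stdin and print
# "total passed failed", summed over all checklist reports it contains.
report_counts() {
  awk '/^Total:/  { t += $2 }
       /^Passed:/ { p += $2 }
       /^Failed:/ { f += $2 }
       END { print t + 0, p + 0, f + 0 }'
}
```

This page does not say whether the report goes to stdout or stderr, so capture both: `/usr/bin/distributive -f /etc/distributive.d/checklist.json 2>&1 | report_counts`. Judging by the WARN line, the report is printed only when checks fail, so a fully passing run would yield `0 0 0`.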

















