Friday, 8 April 2011

Git is good!

5.30pm. So I started a new top-secret project at work that I can only describe as 'project black bear' (a great name, I'm sure you'll agree). The name actually serves two purposes: firstly it gives the project a name until we come up with a better one, and secondly it keeps the whole project hush-hush within the college.

So my team all cracked open their editor of choice and started some fun PHP/MySQL hacking. We then hit the point where we needed to merge our code. Thankfully I had already had a play with git a couple of months ago, so I set up a new project. A couple of pulls, commits, and pushes later we were all running the same code and pushing to the central repository whenever we made a major change.

We were then able to monitor progress via the commit messages and a great CGI script, gitweb. Sure, we did hit the odd commit error, however this was easily fixed with a quick diff and merge.

After a pretty constructive day I'm pretty sure it's time to crack open a cold beer and enjoy the rarity of British sun (not the paper).

Thursday, 31 March 2011

nrpe check_smtp & check_mysql notes

4.12pm. Today I have been playing with nagios plugins that allow me to monitor every service on my development box (Sherpa).

NB: These are local NRPE based checks (the daemon only runs on localhost).

* 'check_smtp' is set up by installing 'nagios-plugins-smtp.x86_64' and then exposed to NRPE by adding the following command to the '/etc/nagios/nrpe.cfg' file:
command[check_smtp]=/usr/lib64/nagios/plugins/check_smtp -H localhost
I can then check this from my NetworkMonitor box via:
# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_smtp
SMTP OK - 0.013 sec. response time|time=0.013064s;;;0.000000
* 'check_mysql' is a little more difficult as it requires a new MySQL user to be created, but it is easily done. Again we install 'nagios-plugins-mysql.x86_64', but we have to perform the following before editing the '/etc/nagios/nrpe.cfg' file:
# mysql -u xxxxx -p
mysql> CREATE USER 'nrpe'@'localhost' IDENTIFIED BY '*****';
mysql> exit
Now we can test the command locally via:
# /usr/lib64/nagios/plugins/check_mysql -H localhost -u nrpe -p *****
or just add the following to the nrpe.cfg file.
command[check_mysql]=/usr/lib64/nagios/plugins/check_mysql -H localhost -u nrpe -p *****
Don't forget to restart the local nrpe service ('service nrpe restart' on CentOS)!
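The text after the | in the check output above (e.g. 'time=0.013064s;;;0.000000') is Nagios performance data, in the form label=value[unit];warn;crit;min;max. As an illustration only, here is a minimal Python sketch that splits one line of plugin output into its status text and perfdata fields. It assumes simple labels without spaces or quotes; the function and field names are my own, not part of Nagios.

```python
import re
from typing import List, NamedTuple, Tuple


class PerfDatum(NamedTuple):
    label: str
    value: float
    uom: str       # unit of measure, e.g. "s" or "%" (may be empty)
    warn: str      # warning threshold as written by the plugin (may be empty)
    crit: str      # critical threshold (may be empty)
    minimum: str
    maximum: str


# Numeric value followed by an optional unit, e.g. "0.013064s"
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)([a-zA-Z%]*)$")


def parse_plugin_output(line: str) -> Tuple[str, List[PerfDatum]]:
    """Split one line of plugin output into (status_text, [PerfDatum, ...]).

    Assumes labels contain no spaces (quoted labels with spaces are not handled).
    """
    text, _, perf = line.partition("|")
    data = []
    for item in perf.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        fields += [""] * (5 - len(fields))  # pad missing warn/crit/min/max
        match = _VALUE_RE.match(fields[0])
        if not match:
            continue  # skip anything this sketch cannot parse
        data.append(PerfDatum(label.strip("'"), float(match.group(1)),
                              match.group(2), *fields[1:5]))
    return text.strip(), data


if __name__ == "__main__":
    status, perf = parse_plugin_output(
        "SMTP OK - 0.013 sec. response time|time=0.013064s;;;0.000000")
    print(status)
    for datum in perf:
        print(datum)
```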

Now on to adding the service definitions on our Nagios box:
define service{
     use generic-service
     host_name Sherpa.xxxxxxxxxxxx
     service_description postfix SMTP Service Check
     check_command check_nrpe!check_smtp
}
define service{
     use generic-service
     host_name Sherpa.xxxxxxxxxxxx
     service_description MySQLd Service Check
     check_command check_nrpe!check_mysql
}
Reload Nagios, job done.
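Since every NRPE check gets an almost identical service block, the definitions can be generated rather than typed out. A minimal Python sketch under my own assumptions: the helper name is hypothetical, and the host name is a placeholder because the real one is elided above.

```python
from typing import Dict


def service_definition(host_name: str, description: str, nrpe_command: str,
                       template: str = "generic-service") -> str:
    """Render one Nagios service definition that runs a check via check_nrpe."""
    return (
        "define service{\n"
        f"     use {template}\n"
        f"     host_name {host_name}\n"
        f"     service_description {description}\n"
        f"     check_command check_nrpe!{nrpe_command}\n"
        "}\n"
    )


if __name__ == "__main__":
    host = "sherpa.example.com"  # hypothetical placeholder host name
    checks: Dict[str, str] = {
        "check_smtp": "postfix SMTP Service Check",
        "check_mysql": "MySQLd Service Check",
    }
    for command, description in checks.items():
        print(service_definition(host, description, command), end="")
```

The output can be pasted into (or redirected to) a services config file on the Nagios box before reloading.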

Wednesday, 30 March 2011

Nagios NRPE: Unable to read output

3.25pm. Today I started looking into rolling out nagios monitoring over our *nix servers after the success of recently adding Windows Servers (via NSClient++) and our Core Switches (via SNMP).

Most of my servers are based on CentOS 5.5, so a quick yum search for nrpe (the Linux equivalent of NSClient++) provided the required results:
nagios-plugins-nrpe.x86_64 : Provides nrpe plugin for Nagios
nrpe.x86_64 : Host/service/network monitoring agent for Nagios
Installing both packages went pretty smoothly; they added the required nrpe user and service (I just needed to run 'chkconfig nrpe on' to ensure it started on boot). I then added the NetworkMonitor host IP to the allowed_hosts line in nrpe.cfg:
allowed_hosts=127.0.0.1,192.168.200.46
Annoyingly, this is where I hit a roadblock. I was able to test communication from the Nagios/NetworkMonitor box to the NRPE client:
# /usr/lib64/nagios/plugins/check_nrpe -H Sherpa
NRPE v2.12
However, when it came to retrieving any useful data such as check_users or check_load, I received the following error:
# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_users
NRPE: Unable to read output
# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_load
NRPE: Unable to read output
A quick look in the Nagios plugins folder on Sherpa (/usr/lib64/nagios/plugins) provided the answer: it contained the check_nrpe plugin (that I had installed earlier) and not a lot else. Sure enough, I had only installed the check_nrpe plugin, not the full set of plugins I had first assumed. Back to yum:
# yum search nagios-plugins-load
nagios-plugins-load.x86_64 : Nagios Plugin - check_load
So it looks like there is a separate package for each check; that's worth knowing! I can apply this logic to every item I now want to check on this host, or, if I am feeling lazy, install:
nagios-plugins-all.x86_64 : Nagios Plugins - All plugins
Which does the lot! Now to retest....

# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_users
USERS OK - 1 users currently logged in |users=1;5;10;0

# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_load
OK - load average: 0.00, 0.01, 0.00|load1=0.000;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.000;5.000;20.000;0;

Success!
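The root cause here was an nrpe.cfg command pointing at a plugin binary that was never installed. As a rough sketch (my own helper, not an NRPE tool), this Python script lists every command[...] entry in nrpe.cfg whose executable is missing, which would have flagged check_users and check_load straight away. It assumes the simple command[name]=/path/to/plugin args format used above and ignores include files.

```python
import os
import re
from typing import Dict

# Matches lines such as: command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]=(?P<cmd>.+)$")


def missing_plugins(config_text: str) -> Dict[str, str]:
    """Return {command_name: plugin_path} for each nrpe.cfg command whose
    executable does not exist on this machine."""
    missing: Dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # commented-out command, ignore
        match = COMMAND_RE.match(line)
        if not match:
            continue
        plugin = match.group("cmd").split()[0]  # first word is the binary
        if not os.path.exists(plugin):
            missing[match.group("name")] = plugin
    return missing


if __name__ == "__main__":
    cfg_path = "/etc/nagios/nrpe.cfg"  # the CentOS location used above
    if os.path.exists(cfg_path):
        with open(cfg_path) as fh:
            for name, path in missing_plugins(fh.read()).items():
                print(f"{name}: {path} not found "
                      "(install the matching nagios-plugins-* package)")
```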

Thursday, 24 March 2011

Monitoring a HP Procurve 5412zl switch interface with Nagios (SNMP)

8.50am. Yesterday I had a couple of minutes spare to build on my Nagios setup. Since setting up Nagios I have been looking into monitoring the fibre interfaces that connect my core and satellite switches. Unfortunately I was not able to find any good examples of this being done before, so I set out to try and set it up myself.

To complete this task I used the check_snmp plugin as a base and built up my own command to check the interface status. After running an snmpwalk on the switch (snmpwalk -v1 -c public IPHERE > ~/Procurve_snmpwalk) I discovered the following MIB objects that I can use:

ifSpeed.X (speed of link)
ifOperStatus.X (link up or down)
ifOutOctets.X & ifInOctets.X
ifOutUcastPkts.X & ifInUcastPkts.X
ifOutNUcastPkts.X & ifInNUcastPkts.X

The X indicates the port (interface) index; however, to use this integer we need to check which index maps to which interface. This can be done by querying the index with the ifDescr.X object. The output of my snmpwalk already gave me what I needed, however it can also be checked manually on the command line via:
/usr/lib64/nagios/plugins/check_snmp -H %HostnameHere% -C public -o ifDescr.X
By changing the value of X, I am presented with the interface name. For example in my case ID 2 = Port A2 on the switch.
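Rather than querying each index by hand, the saved snmpwalk file can be turned into an index-to-name table. A small Python sketch, assuming net-snmp's default output format (e.g. 'IF-MIB::ifDescr.2 = STRING: A2'); if your snmpwalk prints numeric OIDs instead, the regular expression would need adjusting.

```python
import os
import re
from typing import Dict

# Assumed line format: IF-MIB::ifDescr.2 = STRING: A2   (value may also be quoted)
IFDESCR_RE = re.compile(r'ifDescr\.(\d+)\s*=\s*STRING:\s*"?([^"]*)"?\s*$')


def interface_index(walk_text: str) -> Dict[int, str]:
    """Map interface index (the X in ifOperStatus.X) to its ifDescr name."""
    table: Dict[int, str] = {}
    for line in walk_text.splitlines():
        match = IFDESCR_RE.search(line)
        if match:
            table[int(match.group(1))] = match.group(2)
    return table


if __name__ == "__main__":
    path = os.path.expanduser("~/Procurve_snmpwalk")  # file saved by the snmpwalk above
    if os.path.exists(path):
        with open(path) as fh:
            for index, name in sorted(interface_index(fh.read()).items()):
                print(f"{index}\t{name}")
```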

Now that I know the interface ID for the port, I can start to build up commands based on the MIB objects I discovered above. To start with I wanted a simple interface check (up/down), so I added the following to the /etc/nagios/objects/commands.cfg file:

define command{
     command_name check_interface_up
     command_line $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o $ARG2$ -w 0:1 -c 0:1 -l 'Link Status'
}

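I have set the warning and critical ranges to 0:1. Nagios plugins raise an alert when the returned value falls outside the given range (inclusive), so a value of 1 is OK and anything higher goes critical. I am unsure if this is the best way to do it, but it does work: the switch returns 1 for interface up and 2 for interface down.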
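To see why 0:1 behaves this way, here is a Python sketch of the threshold range syntax described in the Nagios plugin development guidelines ('10' means 0:10, '10:' means 10 or more, '~:10' has no lower bound, and a leading '@' inverts the test). It is my own simplified re-implementation for illustration, not check_snmp's code. Note that with identical -w and -c ranges the critical test always wins, so the warning never fires. Any ifOperStatus value other than up(1), such as down(2) or testing(3), ends up CRITICAL.

```python
import math


def should_alert(value: float, spec: str) -> bool:
    """Return True if `value` triggers an alert for a Nagios-style range string."""
    inverted = spec.startswith("@")  # '@' means: alert when INSIDE the range
    if inverted:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        start_s, end_s = "0", spec  # a bare "10" means "0:10"
    start = -math.inf if start_s == "~" else float(start_s or 0)
    end = math.inf if end_s == "" else float(end_s)
    within = start <= value <= end
    return within if inverted else not within


def interface_state(oper_status: int, warn: str = "0:1", crit: str = "0:1") -> str:
    """Classify an ifOperStatus value the way -w/-c ranges would (critical first)."""
    if should_alert(oper_status, crit):
        return "CRITICAL"
    if should_alert(oper_status, warn):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for status in (1, 2):
        print(status, interface_state(status))
```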

I then added the following to my switch template:

define service{
     use generic-service
     host_name Block-B-5412zl
     service_description Block-A-3500yl uplink 2 status
     check_command check_interface_up!public!ifOperStatus.2
}

Using the ! symbol as a separator, I am able to pass different arguments to the command: $ARG1$ and $ARG2$.

$ARG1$ = the SNMP community string
$ARG2$ = the SNMP object to query, here ifOperStatus.2; I can replace this with the other objects I mentioned earlier.
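To make the ! mechanism concrete, here is a simplified Python sketch of how a check_command value expands into the final command line: the first field picks the command, each following field becomes $ARG1$, $ARG2$ and so on, and other macros such as $USER1$ and $HOSTADDRESS$ are filled in from elsewhere. This is an illustration under my own assumptions (no escaped \! handling, hypothetical IP address), not Nagios's actual macro engine.

```python
from typing import Dict


def expand_check_command(check_command: str, command_line: str,
                         macros: Dict[str, str]) -> str:
    """Expand e.g. 'check_interface_up!public!ifOperStatus.2' against a
    command_line template, substituting $ARGn$ and the given macros."""
    _name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg
    result = command_line
    for key, val in values.items():
        result = result.replace(f"${key}$", val)
    return result


if __name__ == "__main__":
    template = ("$USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o $ARG2$ "
                "-w 0:1 -c 0:1 -l 'Link Status'")
    print(expand_check_command(
        "check_interface_up!public!ifOperStatus.2", template,
        {"USER1": "/usr/lib64/nagios/plugins",
         "HOSTADDRESS": "192.0.2.10"}))  # hypothetical switch address
```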

That's pretty much it: reload Nagios and I am now able to monitor switch interfaces. Next task is monitoring the traffic flowing through the interfaces by graphing ifOutUcastPkts.X & ifInUcastPkts.X in Cacti.
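The packet and octet objects listed earlier are ever-increasing counters, so traffic graphs come from the difference between two polls divided by the polling interval. The 32-bit counters roll over back to zero. Cacti/RRDtool handle this for you with a COUNTER data source; the Python sketch below only illustrates the arithmetic, assuming at most one wrap between samples.

```python
def counter_rate(prev: int, curr: int, seconds: float, bits: int = 32) -> float:
    """Per-second rate between two SNMP counter samples, allowing for one wrap."""
    if seconds <= 0:
        raise ValueError("sample interval must be positive")
    delta = curr - prev
    if delta < 0:
        delta += 2 ** bits  # the counter passed its maximum and restarted at zero
    return delta / seconds


if __name__ == "__main__":
    # Two made-up ifInUcastPkts readings taken 300 seconds apart
    print(counter_rate(1_000_000, 1_030_000, 300))
```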

Friday, 18 March 2011

4.50pm. I was recently told about the idea of publishing Nagios alerts to a private Twitter account.

As it was a nice quiet day, I had a look at implementing the scripts today. I had a few problems getting the tweets through our HTTP proxy, but managed to solve this by adding:

$ENV{http_proxy}='http://proxy:8080';

to the top of the nitter(v2) script. I then moved the script to /etc/nagios/ and I was away.
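For comparison, the same proxy trick works in Python: urllib's default ProxyHandler reads http_proxy/https_proxy from the environment. The sketch below also formats an alert from the values Nagios passes to notification commands (e.g. $HOSTNAME$, $SERVICEDESC$, $SERVICESTATE$, $SERVICEOUTPUT$) and trims it to the 140-character tweet limit. This is a hypothetical helper, not how nitter works, and it does not perform the OAuth-signed request to Twitter.

```python
import os
import urllib.request

TWEET_LIMIT = 140  # Twitter's per-tweet character limit at the time of writing


def format_alert(host: str, service: str, state: str, output: str) -> str:
    """Build a short alert message, trimmed to fit in one tweet."""
    text = f"{state}: {host}/{service} - {output}"
    if len(text) > TWEET_LIMIT:
        text = text[: TWEET_LIMIT - 3] + "..."
    return text


def build_opener_with_env_proxy() -> urllib.request.OpenerDirector:
    """Opener whose ProxyHandler picks up http_proxy from the environment."""
    return urllib.request.build_opener(urllib.request.ProxyHandler())


if __name__ == "__main__":
    os.environ["http_proxy"] = "http://proxy:8080"  # same value as the Perl fix
    print(urllib.request.getproxies().get("http"))
    print(format_alert("Sherpa", "postfix SMTP Service Check", "CRITICAL",
                       "Connection refused"))
```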


Friday's goal: complete.
10.50am. So I have had to look this command up a couple of times now as I keep forgetting. The command for installing CPAN modules into Perl is as follows:

perl -MCPAN -e shell
cpan> install Type::ModuleName

Today's reason for its use? Installing a Perl script to handle Twitter OAuth authentication.
Why? Sending alerts from Nagios to Twitter.

Wednesday, 16 March 2011

3.50pm. This week I have mostly been installing and configuring: Nagios.

Cutting straight to the verdict: awesome.

So it takes a little while to set up, but once you have learnt the slightly odd templates/services/commands/hosts system it's just a case of replication.

So far I have configured it to monitor:
* the majority of network printers,
* our Windows servers,
* our Linux servers,
* ....and a handful of ProCurve switches.

I did have to disable the alerts on the Network Printers as they became incredibly annoying when office staff were saving the planet by turning them off at night.

Next step is to gather more statistics from the Linux servers and monitor key fibre interfaces on the switches.