Friday, 8 April 2011

Git is good!

5.30pm. So I started off a new top secret project at work that I can only describe as 'project black bear', a great name I am sure you'll agree, which actually serves two purposes: firstly giving the project a name until we come up with a better one, and secondly keeping the whole project hush hush within the college.

So my team all cracked open their editor of choice and started some fun PHP/MySQL hacking. We all then hit a point when we needed to merge our code. Thankfully I had already had a play with git a couple of months ago and set up a new project. A couple of pulls, commits, and pushes later we were all running the same code and pushing to the central project whenever we made a major change.

We were then able to monitor the progress via the commit messages and a great CGI script: gitweb. Sure, we did receive the odd commit error; however, this was easily fixed with a quick diff and merge.

After a pretty constructive day I'm pretty sure it's time to crack open a cold beer and enjoy the rarity of British sun (not the paper).

Thursday, 31 March 2011

nrpe check_smtp & check_mysql notes

4.12pm. Today I have been playing with nagios plugins that allow me to monitor every service on my development box (Sherpa).

NB: These are local NRPE based checks (the daemon only runs on localhost).

* 'check_smtp' is set up by installing 'nagios-plugins-smtp.x86_64' and then monitored by nrpe by adding the following command to the '/etc/nagios/nrpe.cfg' file.
command[check_smtp]=/usr/lib64/nagios/plugins/check_smtp -H localhost
I can then check this from my NetworkMonitor box via:
# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_smtp
SMTP OK - 0.013 sec. response time|time=0.013064s;;;0.000000
* 'check_mysql' is a little more difficult as it requires a new MySQL user to be created, but easily done. Again we install 'nagios-plugins-mysql.x86_64' but have to perform the following before editing the '/etc/nagios/nrpe.cfg' file.
# mysql -u xxxxx -p
mysql> CREATE USER 'nrpe'@'localhost' IDENTIFIED BY '*****';
mysql> exit
Now we can test the command locally via:
# /usr/lib64/nagios/plugins/check_mysql -H localhost -u nrpe -p *****
or just add the following to the nrpe.cfg file.
command[check_mysql]=/usr/lib64/nagios/plugins/check_mysql -H localhost -u nrpe -p *****
Don't forget to restart the local nrpe service(!).

Now onto adding the service templates on our Nagios box:
define service{
use generic-service
host_name Sherpa.xxxxxxxxxxxx
service_description postfix SMTP Service Check
check_command check_nrpe!check_smtp
}
define service{
use generic-service
host_name Sherpa.xxxxxxxxxxxx
service_description MySQLd Service Check
check_command check_nrpe!check_mysql
}
reload nagios, job done.

Wednesday, 30 March 2011

Nagios NRPE: Unable to read output

3.25pm. Today I started looking into rolling out nagios monitoring over our *nix servers after the success of recently adding Windows Servers (via NSClient++) and our Core Switches (via SNMP).

Most of my servers are based on CentOS 5.5 so a quick search for nrpe (the Linux alternative to NSClient++) on yum provided the required results:
nagios-plugins-nrpe.x86_64 : Provides nrpe plugin for Nagios
nrpe.x86_64 : Host/service/network monitoring agent for Nagios
Installing both packages went pretty well, adding the required nrpe user and service (just needed to 'chkconfig nrpe on' to ensure it started on boot). I then added the NetworkMonitor host IP to the nrpe.cfg file.
allowed_hosts=127.0.0.1,192.168.200.46
Annoyingly this is where I hit a road block. I was able to test communication from the Nagios/NetworkMonitor box to the NRPE client:
# /usr/lib64/nagios/plugins/check_nrpe -H Sherpa
NRPE v2.12
However when it came to retrieving any useful data such as check_users or check_load I received the following error:
# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_users
NRPE: Unable to read output
# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_load
NRPE: Unable to read output
A quick look in the NRPE plugins folder (/usr/lib64/nagios/plugins) provided the answer: it only contained the check_nrpe plugin (that I had installed earlier) and not a lot else. Sure enough I had only installed the nagios nrpe plugin (and not all the nrpe plugins, as I first thought). Back to yum:
# yum search nagios-plugins-load
nagios-plugins-load.x86_64 : Nagios Plugin - check_load
So it looks like there is a plugin for each check, that's worth knowing! I can apply this logic to every item I now want to check on this host; or if I am feeling lazy install:
nagios-plugins-all.x86_64 : Nagios Plugins - All plugins
Which does the lot! Now to retest....

# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_users
USERS OK - 1 users currently logged in |users=1;5;10;0

# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_load
OK - load average: 0.00, 0.01, 0.00|load1=0.000;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.000;5.000;20.000;0;

Success!
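
Everything after the '|' in those replies is performance data, laid out as label=value;warn;crit;min. As a minimal sketch of pulling that apart on the command line (parse_perfdata is a hypothetical helper name of mine, not something shipped with Nagios or NRPE, and it assumes unquoted labels with warn and crit fields present):

```shell
#!/bin/sh
# Sketch: split the performance data (the part after '|') of a plugin
# output line into one "label value warn crit" row per metric.
# parse_perfdata is a hypothetical helper, not part of Nagios or NRPE.
# Assumes labels contain no spaces and each metric carries warn and crit.
parse_perfdata() {
    output=$1
    case $output in
        *'|'*) perf=${output#*'|'} ;;   # keep everything after the first '|'
        *) return 0 ;;                   # no performance data in this line
    esac
    # Unquoted on purpose: metrics are separated by whitespace.
    for item in $perf; do
        label=${item%%=*}
        values=${item#*=}
        value=${values%%;*}
        rest=${values#*;}
        warn=${rest%%;*}
        rest=${rest#*;}
        crit=${rest%%;*}
        printf '%s value=%s warn=%s crit=%s\n' "$label" "$value" "$warn" "$crit"
    done
}

parse_perfdata 'USERS OK - 1 users currently logged in |users=1;5;10;0'
```

Fed the check_load reply instead, it prints one row each for load1, load5 and load15.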

Thursday, 24 March 2011

Monitoring a HP Procurve 5412zl switch interface with Nagios (SNMP)

8.50am. Yesterday I had a couple of minutes spare to build on my Nagios setup. Since setting up Nagios I have been looking into monitoring the fibre interfaces that connect my core and satellite switches. Unfortunately I was not able to find any good examples of this done before so I set out to try and set it up myself.

To complete this task I used the check_snmp plugin as a base and built up my own command to check the interface status. After running through an snmpwalk on the switch (snmpwalk -v1 -c public IPHERE >~/Procurve_snmpwalk) I discovered the following MIBs that I can use:

ifSpeed.X (speed of link)
ifOperStatus.X (link up or down)
ifOutOctets.X & ifInOctets.X
ifOutUcastPkts.X & ifInUcastPkts.X
ifOutNUcastPkts.X & ifInNUcastPkts.X

The X indicates the Port ID; however, to use this integer we need to check which ID links to which interface. This can be achieved by querying the ID using the ifDescr.X MIB. The output of my snmpwalk already gave me the results I required, but they can also be checked manually on the command line via:
/usr/lib64/nagios/plugins/check_snmp -H %HostnameHere% -C public -o ifDescr.X
By changing the value of X, I am presented with the interface name. For example in my case ID 2 = Port A2 on the switch.
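
Since the snmpwalk output was already saved to a file, the whole ID to port name mapping can be listed in one go rather than one query at a time. A minimal sketch, assuming the dump contains lines shaped like 'IF-MIB::ifDescr.2 = STRING: A2' (the usual net-snmp layout, but worth checking against your own file); list_interfaces is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: print "index name" pairs from a saved snmpwalk dump.
# list_interfaces is a hypothetical helper; the expected line layout
# ("...ifDescr.N = STRING: name") is an assumption about the dump format.
list_interfaces() {
    # $1: path to the saved snmpwalk output
    sed -n 's/^.*ifDescr\.\([0-9][0-9]*\) = STRING: \(.*\)$/\1 \2/p' "$1"
}

# Usage (path taken from the snmpwalk command above):
#   list_interfaces ~/Procurve_snmpwalk
```

Lines for other objects such as ifOperStatus are simply skipped, so the dump does not need trimming first.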

Now I know what the interface ID for the port is, I can start to build up commands based on the MIBs I discovered above. To start with I wanted a simple interface check (up/down) so I wrote the following to the /etc/nagios/objects/commands.cfg file:

define command{
     command_name check_interface_up
     command_line $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o $ARG2$ -w 0:1 -c 0:1 -l 'Link Status'
}

I have set the warning & critical ranges to 0:1. I am unsure if this is the best way to do it, but it does work. The switch returns 1 for interface up and 2 for interface down.
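
The reason 0:1 works is that a min:max range alerts when the returned value falls outside it, so a 2 (down) lands outside 0 to 1 and trips the critical threshold. Here is a rough sketch of that logic in plain shell; in_range and link_state are hypothetical helpers for illustration only, they handle just simple integer min:max ranges, and the messages are not check_snmp's real output:

```shell
#!/bin/sh
# Sketch of how a "min:max" threshold range is evaluated: the value is
# fine inside the range and alerts outside it.
# in_range and link_state are hypothetical helpers, not part of check_snmp.
in_range() {
    value=$1
    range=$2
    min=${range%%:*}
    max=${range#*:}
    [ "$value" -ge "$min" ] && [ "$value" -le "$max" ]
}

# Map an ifOperStatus value to a Nagios-style exit code:
# 0 (OK) inside the range 0:1, 2 (CRITICAL) outside it.
link_state() {
    if in_range "$1" "0:1"; then
        echo "OK: value $1 is inside 0:1"
        return 0
    fi
    echo "CRITICAL: value $1 is outside 0:1"
    return 2
}

link_state 1                            # interface up
link_state 2 || echo "exit code $?"     # interface down
```

Because 1 is the only healthy value, a tighter range such as 1:1 should behave the same for up and down while also flagging an unexpected 0; I have not tried that against the switch, so treat it as a suggestion.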

I then added the following to my switch template:

define service{
     use generic-service
     host_name Block-B-5412zl
     service_description Block-A-3500yl uplink 2 status
     check_command check_interface_up!public!ifOperStatus.2
}

Using the ! symbol as a separator I am able to pass different variables to the command: $ARG1$ and $ARG2$.

$ARG1$ = the community string checked
$ARG2$ = This is where I define the SNMP MIB ifOperStatus.2, however I can replace this with different MIBs that I mentioned earlier.
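
To make the substitution concrete, here is a small sketch that mimics what Nagios does with the check_command string: split on '!', treat the first piece as the command name and the rest as $ARG1$ and $ARG2$. expand_check is a hypothetical helper (Nagios does this internally), and 192.0.2.10 is a placeholder standing in for $HOSTADDRESS$:

```shell
#!/bin/sh
# Sketch: mimic how a check_command string is split on '!' and dropped
# into the command_line template. expand_check is a hypothetical helper;
# the real expansion happens inside Nagios.
expand_check() {
    host=$2            # stands in for $HOSTADDRESS$
    old_ifs=$IFS
    IFS='!'
    set -- $1          # unquoted on purpose: split the string on '!'
    IFS=$old_ifs
    # $1 = command name, $2 = $ARG1$ (community), $3 = $ARG2$ (object)
    printf '%s -> check_snmp -H %s -C %s -o %s -w 0:1 -c 0:1\n' \
        "$1" "$host" "$2" "$3"
}

expand_check 'check_interface_up!public!ifOperStatus.2' '192.0.2.10'
```

Swapping ifOperStatus.2 for another object in the service definition changes only the -o value; the command definition stays as it is.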

That's pretty much it: reload nagios and I am now able to monitor switch interfaces. Next task is monitoring the traffic flowing through the interfaces by monitoring ifOutUcastPkts.X & ifInUcastPkts.X in cacti.

Friday, 18 March 2011

4.50pm. I was recently told about the idea of publishing nagios alerts to a private twitter account.

As it was a nice quiet day I had a look at implementing the scripts today. I had a few problems getting the tweets through our http proxy, but managed to solve this problem by adding:

$ENV{http_proxy}='http://proxy:8080';

to the top of the nitter(v2) script. I then moved the script to /etc/nagios/ and I was away.


Friday's goal: complete.
10.50am. So I have had to look this command up a couple of times now as I keep forgetting. The command for installing CPAN modules into Perl is as follows:

perl -MCPAN -e shell
cpan> install Type::ModuleName

Today's reason for its use? Installing a Perl script to process twitter OAuth authentication.
Why? Sending alerts from Nagios to twitter.

Wednesday, 16 March 2011

3.50pm. This week I have mostly been installing and configuring up: Nagios.

Cutting straight to the verdict: awesome.

So it takes a little while to set up, but once you have learnt the slightly odd templates/services/commands/hosts system it's just a case of replication.

So far I have configured it to monitor:
* the majority of network printers,
* our Windows servers,
* our Linux servers,
* ....and a handful of procurve switches.

I did have to disable the alerts on the Network Printers as they became incredibly annoying when office staff were saving the planet by turning them off at night.

Next step is to gather more statistics from the linux servers, and monitor key fibre interfaces on the switches.

Wednesday, 9 March 2011

6.45pm. It's done. The data mining/extraction script I have been working on in Perl is finally complete. Last week I set myself a goal of automating the search process that we as website users perform daily. I went to work and set myself up a dummy search site and had a play. A couple of days later (along with a few beers along the way) I have accomplished the following:

* The ability to log in to access controlled sites that use session tracking.
* Download a copy of every searchable record, starting with id 1 and ending with 10000.
* Extract the data from the local files, and build a CSV file.
* Clean up the CSV file, removing all null entries, and blank lines.
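
The real script is Perl, but the final clean-up step is easy to illustrate with a shell one-liner. This is a sketch under my own assumption about what counts as a null entry: a row where nothing is left once commas, quotes and whitespace are stripped. clean_csv is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: drop blank lines and rows with no data from a CSV file.
# clean_csv is a hypothetical helper; "no data" here means nothing is
# left after removing commas, double quotes and whitespace (an assumption).
clean_csv() {
    # $1: path to the CSV file; cleaned rows go to stdout
    awk '{
        line = $0
        gsub(/[", \t\r]/, "", line)   # strip separators, quotes, whitespace
        if (line != "") print $0      # keep the original row untouched
    }' "$1"
}

# Usage (file names are placeholders):
#   clean_csv records_raw.csv > records.csv
```

Rows with some empty fields are kept; only rows that are empty throughout are dropped.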

The end result?

A 700 line CSV containing information I require from an external searchable system.

Now where's that beer....

Friday, 4 March 2011

10.00pm. Okay, so after having quite a productive day in the office I decided to continue my brain workout by picking up some more Perl code.

I wrote a couple of scripts that are able to download data from search pages, including a version that is able to pass login information. This made the whole challenge slightly more difficult, but in the end I was able to utilise the --keep-session-cookies and --save-cookies/--load-cookies arguments of wget to achieve my goals.

The next step was to read in an array of search variables to pass to wget, easily done. I then started looking at automating the whole process by looking at search forms that used unique IDs in their POST data; with some friendly $int++ I was able to generate a whole bunch of searches with success.
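
As a rough illustration of the login-then-loop idea, the sketch below only prints the wget commands rather than running them, so nothing is fetched. The wget options are real, but the host name, the login.php and search.php paths, the form field names and the output file names are all placeholders of mine, not the site I was working against:

```shell
#!/bin/sh
# Sketch: print (not run) the wget commands for a login followed by a
# run of record IDs. BASE_URL, the page names and the POST field names
# are placeholders; only the wget options themselves are real.
BASE_URL='http://search.example.com'
COOKIES='cookies.txt'

# Log in once and save the session cookie to a file.
login_cmd() {
    printf 'wget --keep-session-cookies --save-cookies %s --post-data "user=USER&pass=PASS" -O /dev/null %s/login.php\n' \
        "$COOKIES" "$BASE_URL"
}

# One search per ID, reusing the saved cookie; the counter plays the
# part of $int++ in the Perl version.
fetch_cmds() {
    id=$1
    last=$2
    while [ "$id" -le "$last" ]; do
        printf 'wget --load-cookies %s --post-data "id=%s" -O record_%s.html %s/search.php\n' \
            "$COOKIES" "$id" "$id" "$BASE_URL"
        id=$((id + 1))
    done
}

login_cmd
fetch_cmds 1 3
```

Piping the output to sh would actually run the requests, which is worth holding off on until the printed commands look right.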

Next step: Trawling through the pages to extract the data I require.

8.00pm. So over the last 2 days at work I have been setting up and configuring nagios. It's been on my to-do list for a while now (years) and I have finally found a couple of days to crack on with it. It's pretty easy to set up, and once you have worked out the difference between templates, hostgroups, and servicegroups you're all set.

I guess the only frustrating part is the constant personal email bombardment (every hour) if you leave something misconfigured in the office, can't really blame the product for that though.

Thursday, 3 March 2011

11.40am. Well that was easy. I upgraded our virtualisation infrastructure to XenServer 5.6 a couple of weeks ago but at the time I forgot to upgrade the XenTools packages. With 5 minutes spare today I was able to complete this task by performing the following:

* Attach the XenTools ISO by selecting VM->Install XenServer Tools.
* Mount the DVD:
mount /dev/xvdd /mnt
* Run the script on the mounted DVD.
/mnt/Linux/install.sh
* Unmount the DVD
umount /mnt
* Restart the VM.
* Check under Virtualisation State (found under the general tab for that VM) it says: Optimized

Fin.

Tuesday, 1 March 2011

7.50pm. So I watched 'the social network' over the weekend, and after watching the 'hacking' scene at the beginning of the film, I can confirm that downloading the jpeg pictures via an apache index is indeed kids' stuff.

In case you were wondering I accomplished it via:
wget -r -A .jpeg APACHE_INDEX_PAGE_URL_HERE

4.00pm. The synchronisation finally finished, only took 14hrs! Now I am struggling again with the telephony box that we had to restart yesterday. It appears to be running on IIS 5.1 Personal Web Server with its oh so joyful 10 simultaneous user limit. It seems to register each transaction I complete under Opera as a new connection and then drops me to a 403 after 10 clicks. Joy.

I can imagine the response from their tech support now; "Have you tried restarting the server?"

Monday, 28 February 2011

4.16pm. Well the synchronisation is still going, and I am off home soon. I will check it again in the morning.

In the meantime I have been learning a little Perl with the oh so popular emacs(23) editor. It appears to be pretty similar syntax wise to PHP right now, but hopefully I will discover some more advanced features soon.

2.40pm. Ok so backups complete on the Internet payment system, however I need to remember that performing a full synchronisation between our database and the externally hosted database takes hours!

11.30am. So it looks like the lack of space on the Windows partition on our print server has had a negative effect on our Internet payments system that is running on the same box. I am in the process of re-running the database backups now after a stuck process managed to stop the whole system.

All in a day's work as a sysadmin I guess :-)

10.30am. So I just got back to work after the half term holidays (enjoyed a great week skiing), and sure enough it appears that all hell has broken loose since I have been away. Glancing at the 'ICT Support Dashboard' as I walk in I see a plethora of "cannot print" tickets.

A quick glance at the primary print server shows me the problem: there was no space left on the Windows partition. A quick clear up and a restart of the print spooler fixes that. Note to self: need to get a good network monitoring application installed to catch these problems before they arise.

Walking out of the office I am stopped by the Reception staff who tell me the answerphone system has stopped working; a quick reboot of the telephony server and we're back in action. Not a lot I can do about this, as the software is written by a small company who consider "reboot the server" the fix for all their software's problems.

Now it's time to check the weekly backup ran and tackle the 3 or 4 malware-infested laptops.