Friday, 8 April 2011
Git is good!
So my team all cracked open their editor of choice and started some fun PHP/MySQL hacking. We then hit a point where we needed to merge our code; thankfully I had already had a play with git a couple of months ago and set up a new project. A couple of pulls, commits, and pushes later, we were all running the same code and pushing to the central project whenever we made a major change.
We were then able to monitor progress via the commit messages and a great CGI script: gitweb. Sure, we did receive the odd commit error, but this was easily fixed with a quick diff merge.
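The day-to-day loop looked roughly like the sketch below; the repository paths, names, and commit messages are all invented, and a throwaway local bare repository stands in for the central project so the example is self-contained:

```shell
#!/bin/sh
# Hedged sketch of the pull/commit/push loop described above; everything
# here (paths, names, messages) is made up for illustration.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/central.git"        # the shared central project

git clone -q "$tmp/central.git" "$tmp/dev1"  # first developer's checkout
cd "$tmp/dev1"
git config user.name "Dev One"
git config user.email "dev1@example.com"
echo "<?php phpinfo();" > index.php
git add index.php
git commit -qm "Add index page"              # commit locally
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"                 # push the major change upstream
git -C "$tmp/central.git" symbolic-ref HEAD "refs/heads/$branch"

git clone -q "$tmp/central.git" "$tmp/dev2"  # a teammate pulls the same code
ls "$tmp/dev2"
```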
After a pretty constructive day, I'm pretty sure it's time to crack open a cold beer and enjoy the rarity of British sun (not the paper).
Thursday, 31 March 2011
nrpe check_smtp & check_mysql notes
NB: These are local NRPE based checks (the daemon only runs on localhost).
* 'check_smtp' is set up by installing 'nagios-plugins-smtp.x86_64' and then monitored by nrpe by adding the following command to the '/etc/nagios/nrpe.cfg' file:

command[check_smtp]=/usr/lib64/nagios/plugins/check_smtp -H localhost

I can then check this from my NetworkMonitor box via:

# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_smtp
SMTP OK - 0.013 sec. response time|time=0.013064s;;;0.000000

* 'check_mysql' is a little more difficult as it requires a new MySQL user to be created, but that's easily done. Again we install 'nagios-plugins-mysql.x86_64', but we have to perform the following before editing the '/etc/nagios/nrpe.cfg' file:

# mysql -u xxxxx -p
mysql> CREATE USER 'nrpe'@'localhost' IDENTIFIED BY '*****';
mysql> exit

Now we can test the command locally via:

# /usr/lib64/nagios/plugins/check_mysql -H localhost -u nrpe -p *****

or just add the following to the nrpe.cfg file:

command[check_mysql]=/usr/lib64/nagios/plugins/check_mysql -H localhost -u nrpe -p *****

Don't forget to restart the local nrpe service(!).
Now onto adding the service templates on our Nagios box:
define service{
use generic-service
host_name Sherpa.xxxxxxxxxxxx
service_description postfix SMTP Service Check
check_command check_nrpe!check_smtp
}
define service{
use generic-service
host_name Sherpa.xxxxxxxxxxxx
service_description MySQLd Service Check
check_command check_nrpe!check_mysql
}
Reload Nagios, job done.
Wednesday, 30 March 2011
Nagios NRPE: Unable to read output
Most of my servers are based on CentOS 5.5, so a quick search for nrpe (the Linux alternative to NSClient++) on yum provided the required results:

nrpe.x86_64 : Host/service/network monitoring agent for Nagios
nagios-plugins-nrpe.x86_64 : Provides nrpe plugin for Nagios

Installing both packages went pretty well. I added the required nrpe user and service (just needed to 'chkconfig nrpe on' to ensure it started on boot), then added the NetworkMonitor host IP to the nrpe.cfg file:

allowed_hosts=127.0.0.1,192.168.200.46

Annoyingly this is where I hit a road block. I was able to test communication from the Nagios/NetworkMonitor box to the NRPE client:

# /usr/lib64/nagios/plugins/check_nrpe -H Sherpa
NRPE v2.12

However, when it came to retrieving any useful data such as check_users or check_load I received the following error:

# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_users
NRPE: Unable to read output
# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_load
NRPE: Unable to read output

A quick look in the NRPE plugins folder (/usr/lib64/nagios/plugins) provided the answer; it only contained the check_nrpe plugin (that I had installed earlier) and not a lot else. Sure enough, I had only installed the Nagios nrpe plugin (and not all the nrpe plugins that I first thought). Back to yum:

# yum search nagios-plugins-load
nagios-plugins-load.x86_64 : Nagios Plugin - check_load

So it looks like there is a plugin for each check; that's worth knowing! I can apply this logic to every item I now want to check on this host, or if I am feeling lazy install:

nagios-plugins-all.x86_64 : Nagios Plugins - All plugins

Which does the lot! Now to retest....

# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_users
USERS OK - 1 users currently logged in |users=1;5;10;0
# /usr/lib64/nagios/plugins/check_nrpe -H sherpa -c check_load
OK - load average: 0.00, 0.01, 0.00|load1=0.000;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.000;5.000;20.000;0;

Success!
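For reference, the check_users and check_load commands that answered above come from the sample command definitions shipped in /etc/nagios/nrpe.cfg on CentOS; they look roughly like this (the thresholds are the stock examples, and they match the warning/critical values in the perf data above):

command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20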
Thursday, 24 March 2011
Monitoring a HP Procurve 5412zl switch interface with Nagios (SNMP)
To complete this task I used the check_snmp plugin as a base and built up my own command to check the interface status. After running through an snmpwalk on the switch (snmpwalk -v1 IPHERE -c public > ~/Procurve_snmpwalk) I discovered the following MIBs that I can use:
ifSpeed.X (speed of link)
ifOperStatus.X (link up or down)
ifOutOctets.X & ifInOctets.X
ifOutUcastPkts.X & ifInUcastPkts.X
ifOutNUcastPkts.X & ifInNUcastPkts.X
The X indicates the Port ID; however, to use this integer we need to check which ID links to which interface. This can be achieved by querying the ID using the ifDescr.X MIB. My snmpwalk output already gave me the results I required, but they can be checked manually on the command line via:

/usr/lib64/nagios/plugins/check_snmp -H %HostnameHere% -C public -o ifDescr.X

By changing the value of X, I am presented with the interface name. For example, in my case ID 2 = Port A2 on the switch.
Now that I know the interface ID for the port, I can start to build up commands based on the MIBs I discovered above. To start with I wanted a simple interface check (up/down), so I wrote the following to the /etc/nagios/objects/commands.cfg file:
define command{
command_name check_interface_up
command_line $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o $ARG2$ -w 0:1 -c 0:1 -l 'Link Status'
}
I have set the warning & critical ranges to 0:1. I am unsure if this is the best way to do it, but it does work. The switch returns 1 for interface up and 2 for interface down.
I then added the following to my switch template:
define service{
use generic-service
host_name Block-B-5412zl
service_description Block-A-3500yl uplink 2 status
check_command check_interface_up!public!ifOperStatus.2
}
Using the ! symbol as a separator, I am able to pass different variables to the command: $ARG1$ and $ARG2$.
$ARG1$ = the SNMP community string to use
$ARG2$ = the SNMP MIB to query, in this case ifOperStatus.2; I can replace this with the other MIBs I mentioned earlier.
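As a toy illustration of the substitution (no SNMP involved, and the resulting command line is just printed, not run):

```shell
#!/bin/sh
# Nagios splits the check_command string on '!' and maps the pieces onto
# the $ARG1$/$ARG2$ placeholders in the command definition above.
check_command='check_interface_up!public!ifOperStatus.2'
IFS='!' read -r cmd arg1 arg2 <<EOF
$check_command
EOF
echo "check_snmp -H <host> -C $arg1 -o $arg2 -w 0:1 -c 0:1 -l 'Link Status'"
```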
That's pretty much it: reload Nagios and I am now able to monitor switch interfaces. The next task is monitoring the traffic flowing through the interfaces by monitoring ifOutUcastPkts.X & ifInUcastPkts.X in Cacti.
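For the traffic side, the arithmetic is just a counter delta over the polling interval. A minimal sketch with made-up sample values; these Ucast counters are 32-bit, so a wrap past 2^32 has to be handled:

```shell
#!/bin/sh
# Packets/sec from two successive 32-bit SNMP counter samples.
# The counter values and interval below are invented for illustration.
prev=4294900000   # first sample of ifInUcastPkts.2
curr=120000       # second sample, 300 seconds later (counter has wrapped)
interval=300
awk -v p="$prev" -v c="$curr" -v t="$interval" 'BEGIN {
  d = c - p
  if (d < 0) d += 4294967296        # counter wrapped past 2^32
  printf "%.1f pkts/sec\n", d / t
}'
```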
Friday, 18 March 2011
As it was a nice quiet day, I had a look at implementing the scripts today. I had a few problems getting the tweets through our HTTP proxy, but managed to solve this by adding:
$ENV{http_proxy}='http://proxy:8080';
to the top of the nitter(v2) script. I then moved the script to /etc/nagios/ and I was away.
Friday's goal: complete.
Today's reason for its use? Installing a Perl script to process Twitter OAuth authentication.
Why? Sending alerts from Nagios to Twitter.
perl -MCPAN -e shell
cpan> install Type::ModuleName
Wednesday, 16 March 2011
Cutting straight to the verdict: awesome.
So it takes a little while to set up, but once you have learnt the slightly odd templates/services/commands/hosts system it's just a case of replication.
So far I have configured it to monitor:
* the majority of network printers,
* our Windows servers,
* our Linux servers,
* ....and a handful of procurve switches.
I did have to disable the alerts on the Network Printers as they became incredibly annoying when office staff were saving the planet by turning them off at night.
Next step is to gather more statistics from the linux servers, and monitor key fibre interfaces on the switches.
Wednesday, 9 March 2011
* The ability to log in to access-controlled sites that use session tracking.
* Download a copy of every searchable record, starting with id 1 and ending with id 10000.
* Extract the data from the local files, and build a CSV file.
* Clean up the CSV file, removing all null entries, and blank lines.
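The clean-up step in particular is a one-liner; here is a hedged sketch with invented sample data, assuming 'NULL' is the null marker used in the export:

```shell
#!/bin/sh
# Drop blank lines and all-NULL rows from a CSV.
# The sample data and the 'NULL' marker are assumptions for illustration.
cd "$(mktemp -d)"
cat > records.csv <<'EOF'
id,name,email
1,Alice,alice@example.com

NULL,NULL,NULL
2,Bob,bob@example.com
EOF
grep -v '^$' records.csv | grep -v '^\(NULL,\)*NULL$' > cleaned.csv
cat cleaned.csv
```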
The end result?
A 700-line CSV containing the information I require from an external searchable system.
Now where's that beer....
Friday, 4 March 2011
I wrote a couple of scripts that are able to download data from search pages, including a version that is able to pass login information. This made the whole challenge slightly more difficult; in the end I was able to utilise the --keep-session-cookies & --save-cookies/--load-cookies arguments of wget to achieve my goals.
The next step was to read in an array of search variables to pass to wget; easily done. I then started looking at automating the whole process by targeting search forms that use unique IDs in their POST data; with some friendly $int++ I was able to generate a whole bunch of searches with success.
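The $int++ loop amounts to something like the following; the URL, POST field, and file names are all invented, and echo stands in for the real wget call so the sketch stays offline:

```shell
#!/bin/sh
# Generate one wget invocation per record id (1..3 here instead of 1..10000).
# URL and field names are hypothetical; echo replaces the actual download.
cd "$(mktemp -d)"
i=1
while [ "$i" -le 3 ]; do
  echo "wget --load-cookies cookies.txt --post-data 'search_id=$i'" \
       "http://example.com/search -O result_$i.html" >> commands.txt
  i=$((i + 1))
done
cat commands.txt
```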
Next step: Trawling through the pages to extract the data I require.
8.00pm. So over the last two days at work I have been setting up and configuring Nagios. It's been on my to-do list for a while now (years) and I have finally found a couple of days to crack on with it. It's pretty easy to set up, and once you have worked out the difference between templates, hostgroups, and servicegroups you're all set.
I guess the only frustrating part is the constant personal email bombardment (every hour) if you leave something misconfigured in the office, can't really blame the product for that though.
Thursday, 3 March 2011
11.40am. Well that was easy. I upgraded our virtualisation infrastructure to Xenserver 5.6 a couple of weeks ago but at the time I forgot to upgrade the XenTools packages. With 5 minutes spare today I was able to complete this task by performing the following:
* Attach the XenTools ISO by selecting VM->Install XenServer Tools.
* Mount the DVD:
mount /dev/xvdd /mnt
* Run the script on the mounted DVD.
/mnt/Linux/install.sh
* Unmount the DVD
umount /mnt
* Restart the VM.
* Check that the Virtualisation State (found under the General tab for that VM) says: Optimized
Fin.
Tuesday, 1 March 2011
7.50pm. So I watched 'The Social Network' over the weekend, and after watching the 'hacking' scene at the beginning of the film, I can confirm that downloading the jpeg pictures via an Apache index is indeed kids' stuff.
In case you were wondering, I accomplished it via:
wget -r -A .jpeg APACHE_INDEX_PAGE_URL_HERE
4.00pm. The synchronisation finally finished; it only took 14hrs! Now I am struggling with the telephony box we had to restart yesterday: it appears to be running on IIS 5.1 Personal Web Server with its oh-so-joyful limit of 10 simultaneous users. It registers each transaction I complete under Opera as a new connection, then drops me to a 403 after 10 clicks. Joy.
I can imagine the response from their tech support now; "Have you tried restarting the server?"
Monday, 28 February 2011
4.16pm. Well the synchronisation is still going, and I am off home soon. I will check it again in the morning.
In the meantime I have been learning a little Perl with the oh-so-popular emacs(23) editor. Syntax-wise it appears pretty similar to PHP so far, but hopefully I will discover some more advanced features soon.
11.30am. So it looks like the lack of space on the Windows partition of our print server has had a negative effect on the Internet payments system running on the same box. I am in the process of re-running the database backups now, after a stuck process managed to stop the whole system.
All in a days work as a sysadmin I guess :-)
A quick glance at the primary print server shows me the problem: there is no space left on the Windows partition. A quick clear-up and a restart of the print spooler fixes that. Note to self: I need to get a good network monitoring application installed to catch these problems before they arise.
Walking out of the office I am stopped by the reception staff, who tell me the answerphone system has stopped working; a quick reboot of the telephony server and we're back in action. Not a lot I can do about this: the software is written by a small company who consider "reboot the server" to be the fix for all their software's problems.
Now its time to check the weekly backup ran and tackle the