<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>intuitive engineering &#187; nagios</title>
	<atom:link href="http://dougmunsinger.com/tag/nagios/feed" rel="self" type="application/rss+xml" />
	<link>http://dougmunsinger.com</link>
	<description>doug munsinger</description>
	<lastBuildDate>Mon, 07 Jun 2010 16:44:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Fall through code to a success&#8230;</title>
		<link>http://dougmunsinger.com/2009/12/fall-through-code-to-a-success.html</link>
		<comments>http://dougmunsinger.com/2009/12/fall-through-code-to-a-success.html#comments</comments>
		<pubDate>Thu, 10 Dec 2009 15:58:10 +0000</pubDate>
		<dc:creator>doug</dc:creator>
				<category><![CDATA[shell]]></category>
		<category><![CDATA[bash shell]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[nagios]]></category>

		<guid isPermaLink="false">http://dougmunsinger.com/2009/12/fall-through-code-to-a-success.html</guid>
		<description><![CDATA[I found a piece of code in a nagios alerting script that returns &#8220;success&#8221; no matter what is happening with the jboss application it is checking. This script had been perpetuated as a service alerting script for years in the environs I work in, edited and passed on as working. It reads: if [failure code here]; [...]]]></description>
			<content:encoded><![CDATA[<p>I found a piece of code in a nagios alerting script that returns &#8220;success&#8221; no matter what is happening with the jboss application it is checking.  </p>
<p>This script had been perpetuated as a service alerting script for years in the environs I work in, edited and passed on as working.  It reads:</p>
<p>if [failure code here]; then<br />
    # return failure to nagios nrpe daemon<br />
else<br />
   # return success to nrpe<br />
fi</p>
<p>The failure test never matched, even when the jboss instance it was looking for was not present on the server.  Because the code falls through to success, everything looks just fine, all of the time.  </p>
<p>The incorrect failure test struck me first when I looked through the script.  The more basic flaw in logic hit me after.  I think an alerting script should NEVER drop through a conditional to a final success.  The test should be for success, with the fall-through going to failure.  That would have prevented the false sense of security: a success test that failed to match would have raised alerts right away and been dealt with immediately.</p>
<p>Something like: </p>
<p>if [success code here]; then<br />
    # return success to nagios nrpe daemon<br />
else<br />
   # return failure to nrpe<br />
fi</p>
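<p>A minimal sketch of that shape in shell, under some assumptions: <code>nagios_check</code> is a hypothetical helper name, and the command it wraps stands in for whatever really proves the application healthy.  Only an explicit success reports OK; a failed test, a missing binary or a bad call all land on a failure code.</p>

```shell
# Sketch only: nagios_check is a hypothetical helper, not part of nagios or nrpe.
# nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN

# nagios_check LABEL COMMAND [ARGS...]
# Runs COMMAND as the success test. Everything that is not an explicit
# success falls through to a failure state.
nagios_check() {
    # without a command there is nothing to test, so refuse to report OK
    if [ "$#" -lt 2 ]; then
        echo "UNKNOWN: usage: nagios_check LABEL COMMAND [ARGS...]"
        return 3
    fi
    label=$1
    shift
    if "$@" > /dev/null 2> /dev/null; then
        echo "OK: $label"
        return 0
    else
        # test failed, command not found, command crashed: all end up here
        echo "CRITICAL: $label"
        return 2
    fi
}
```

<p>Called as <code>nagios_check "jboss" pgrep -f jboss</code> (the probe itself is an assumption, use whatever genuinely proves the service is up), a check that cannot find the instance reports CRITICAL instead of quietly passing.  A plugin script would then exit with the function's return code.</p>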
<p>Obvious.  But an epiphany anyway.</p>
<p>&#8211;doug</p>
]]></content:encoded>
			<wfw:commentRss>http://dougmunsinger.com/2009/12/fall-through-code-to-a-success.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>how to monitor ibm mq from nagios</title>
		<link>http://dougmunsinger.com/2008/10/how-to-monitor-ibm-mq-from-nagios.html</link>
		<comments>http://dougmunsinger.com/2008/10/how-to-monitor-ibm-mq-from-nagios.html#comments</comments>
		<pubDate>Wed, 22 Oct 2008 00:21:45 +0000</pubDate>
		<dc:creator>doug</dc:creator>
				<category><![CDATA[websphere]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[nagios]]></category>

		<guid isPermaLink="false">http://dougmunsinger.com/?p=546</guid>
		<description><![CDATA[&#160; This was one of the search terms that found an article here&#8230; I hadn&#8217;t addressed this directly, but I use Nagios to monitor my company&#8217;s server environment, and specifically implemented that monitoring for IBM Websphere MQ. For MQ, I run nagios monitoring against queue depth and processes. I installed three plugins to run against [...]]]></description>
			<content:encoded><![CDATA[<p>&nbsp;</p>
<p>This was one of the search terms that found an article here&#8230;  I hadn&#8217;t addressed this directly, but I use Nagios to monitor my company&#8217;s server environment, and specifically implemented that monitoring for IBM WebSphere MQ. </p>
<p>For MQ, I run nagios monitoring against queue depth and processes.  I installed three plugins to run against WebSphere.  Of these, one was developed for my company&#8217;s needs (qdepth), one was changed slightly (channels), and the last (message age) was debugged, found not to measure accurately, and never fixed.   </p>
<p>Here&#8217;s the nagios console for the WebSphere MQ server.  The &#8220;message age&#8221; in the title of the second qdepth check is deceptive &#8211; that service is actually checking qdepth&#8230;</p>
<p>&nbsp;</p>
<p><a href="../../../images/posts/2008/nagios.png" rel="shadowbox[post-546];player=img;" title="websphere MQ nagios server result"><img src="../../../images/posts/2008/nagios_400.png" title="nagios" alt="nagios"/></a></p>
<p>&nbsp;</p>
<p>This is the commands section from the nrpe.cfg file on the WebSphere MQ server. </p>
<p>&nbsp;</p>
<p><code><br />
command[check_mq_channel]=/usr/local/nagios/libexec/check_mq_channel.sh $ARG1$ $ARG2$<br />
command[check_mq_msgage]=/usr/local/nagios/libexec/check_mq_msgage.sh $ARG1$ $ARG2$ $ARG3$ $ARG4$<br />
command[wmq_check_qdepth]=/usr/local/nagios/libexec/wmq_check_qdepth.pl $ARG1$ $ARG2$ $ARG3$<br />
</code></p>
<p>&nbsp;</p>
<p>Of these, we are really only using the qdepth monitoring.   The channels start when triggered, so an inactive state is fine, but the plugin as written only tests for &#8220;running&#8221; and would alert on an idle channel.  The message age plugin, as I mentioned,  doesn&#8217;t actually work.  </p>
<p> When I first looked at setting this messaging up and then  monitoring it, I searched for &#8220;nagios monitoring MQ webshere&#8221;  and found several pre-written plugins.  I took each plugin and tested it for usability, for accurate results, and for whether it met our monitoring needs. </p>
<p>In testing, the message age plugin returned a hard-coded result rather than measuring anything and returning a valid answer.  I started to fix it, set it aside, and haven&#8217;t resolved it.  I don&#8217;t recall the source for the plugin.  Check each piece of code you download from the internet &#8211; it may have gone through extensive development and testing, or it could just as easily have been hacked together in an hour.  Your mileage may seriously vary and I would highly recommend you verify any of this before you bet your job on it.  </p>
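<p>One way to do that kind of verification, sketched in shell under stated assumptions: <code>expect_state</code> is a hypothetical helper name, and the idea is simply to run a plugin against inputs whose correct answer you already know and compare the exit code with the nagios state you expect.  A plugin that returns a hard-coded result fails as soon as it is pointed at something that should alert.</p>

```shell
# Sketch only: expect_state is a hypothetical test helper, not a nagios tool.
# nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN

# expect_state EXPECTED COMMAND [ARGS...]
# Runs COMMAND and passes only if its exit code equals EXPECTED.
expect_state() {
    if [ "$#" -lt 2 ]; then
        echo "FAIL: usage: expect_state EXPECTED COMMAND [ARGS...]"
        return 1
    fi
    expected=$1
    shift
    actual=0
    "$@" > /dev/null 2> /dev/null || actual=$?
    if [ "$actual" -eq "$expected" ]; then
        echo "pass: '$*' returned $actual"
        return 0
    else
        # a mismatch, or a non-numeric EXPECTED, ends up here
        echo "FAIL: '$*' returned $actual, expected $expected"
        return 1
    fi
}
```

<p>For example, <code>expect_state 2 /usr/local/nagios/libexec/check_mq_channel.sh NO.SUCH.CHANNEL QMGR01</code> (the channel and queue manager names here are made up) should pass, because a channel that does not exist must never come back as OK.</p>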
<p>Here&#8217;s the qdepth plugin &#8211; I think I wrote or re-wrote this pretty much from scratch, but the original concept for parsing runmqsc output came from one of the plugins I downloaded, written by Kyle O&#8217;Donnell &#8211; the channel plugin has his original author credit intact.  This plugin has alerted once to an increasing qdepth, which turned out to be an issue with an SSL certificate. </p>
<p>&nbsp;</p>
<hr/>
<code><br />
#! /bin/perl</code></p>
<p><code>## wmq_check_qdepth.pl<br />
#<br />
# nrpe (nagios) script to check websphere qdepth</code></p>
<p><code># uses runmqsc binary<br />
#<br />
# display queue ('APP.REQUEST')<br />
#      8 : display queue ('APP.REQUEST')<br />
#      AMQ8409: Display Queue details.<br />
#      QUEUE(APP.REQUEST)                 TYPE(QLOCAL)<br />
#      ACCTQ(QMGR)                             ALTDATE(2008-01-22)<br />
#      ALTTIME(14.18.23)                       BOQNAME( )<br />
#      BOTHRESH(0)                             CLUSNL( )<br />
#      CLUSTER( )                              CLWLPRTY(0)<br />
#      CLWLRANK(0)                             CLWLUSEQ(QMGR)<br />
#      CRDATE(2008-01-22)                      CRTIME(14.18.23)<br />
#      CURDEPTH(0)                             DEFBIND(OPEN)<br />
#      DEFPRTY(0)                              DEFPSIST(NO)<br />
#      DEFSOPT(SHARED)                         DEFTYPE(PREDEFINED)<br />
#      DESCR( )                                DISTL(NO)<br />
#      GET(ENABLED)                            HARDENBO<br />
#      INITQ( )                                IPPROCS(0)<br />
#      MAXDEPTH(5000)                          MAXMSGL(4194304)<br />
#      MONQ(QMGR)                              MSGDLVSQ(PRIORITY)<br />
#      NOTRIGGER                               NPMCLASS(NORMAL)<br />
#      OPPROCS(0)                              PROCESS( )<br />
#      PUT(ENABLED)                            QDEPTHHI(80)<br />
#      QDEPTHLO(20)                            QDPHIEV(DISABLED)<br />
#      QDPLOEV(DISABLED)                       QDPMAXEV(ENABLED)<br />
#      QSVCIEV(NONE)                           QSVCINT(999999999)<br />
#      RETINTVL(999999999)                     SCOPE(QMGR)<br />
#      SHARE                                   STATQ(QMGR)<br />
#      TRIGDATA( )                             TRIGDPTH(1)<br />
#      TRIGMPRI(0)                             TRIGTYPE(FIRST)<br />
#      USAGE(NORMAL)</code></p>
<p><code>###  Variables  ###</code></p>
<p><code># test values set if this flag is true (1)<br />
### THIS MUST BE SET TO 0 IN PRODUCTION!!! ###<br />
my $test = 0;</code></p>
<p><code># debug flag (adds messages)<br />
my $debug = 0;<br />
my $LOG = "/tmp/wmq_check_qdepth.pl.log";</code></p>
<p><code># runmqsc binary<br />
my $MQSC = "/opt/mqm/bin/runmqsc";</code></p>
<p><code>###    ARGS    ###</code></p>
<p><code># first argument is warn level<br />
my $WARN = shift;<br />
# second arg is critical level<br />
my $CRIT = shift;</p>
<p># third arg is queue name<br />
my $QUEUE = shift;</code></p>
<p><code># set for dev purposes<br />
if ($test) {<br />
    $WARN = 5;<br />
    $CRIT = 10;<br />
    $QUEUE = "1A33.EVG.REQUEST";<br />
}</code></p>
<p><code># validate<br />
# WARN and CRIT must be greater than 0 and CRIT must be greater than WARN<br />
unless (($WARN > 0) &#038;&#038; ($CRIT > 0)) {<br />
    print ("Command Failed:  WARN and CRIT levels must be greater than 0\n");<br />
    exit 3;<br />
}<br />
unless ($CRIT > $WARN) {<br />
    print ("Command Failed:  CRIT must be greater than WARN\n");<br />
    exit 3;  # nagios UNKNOWN; 4 is not a defined plugin return code<br />
}</code></p>
<p><code>###    Subs    ###</code></p>
<p><code>###    MAIN    ###</code></p>
<p><code># run query<br />
my $result = `echo "display queue ('${QUEUE}')" | $MQSC | grep CURDEPTH`;<br />
print ("result: $result\n") if $debug;<br />
# parse result<br />
my @lines = split ("\n", $result);  # divide into an array by end of line...<br />
                                    # each element of the array will contain a single line<br />
# set variables<br />
my ($PARAM, $VALUE);</code></p>
<p><code>for my $line (@lines) {<br />
    # each line is one or two elements like "QDPLOEV(DISABLED)                       QDPMAXEV(ENABLED)"<br />
    # divide those...<br />
    my ($first, $discard) = split (' ', $line);<br />
    print ("\$first: $first   \$discard $discard\n") if $debug;<br />
    ($PARAM, $VALUE) = split ('\(', $first);<br />
    $VALUE =~ s/\)//;<br />
    print ("\$PARAM: $PARAM    \$VALUE: $VALUE\n") if $debug;<br />
}</code></p>
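<p><code># testing value<br />
$VALUE = 13 if $test;<br />
# no numeric CURDEPTH parsed (runmqsc failed, unknown queue): exit 3 as UNKNOWN,<br />
# otherwise an undefined $VALUE compares equal to 0 below and reports OK<br />
unless (defined $VALUE &#038;&#038; $VALUE =~ /^\d+$/) {<br />
    print ("UNKNOWN:  could not read qdepth for $QUEUE\n");<br />
    exit 3;<br />
}<br />
# check for $WARN and $CRIT levels, exit 0 as OK, 1 as warn or 2 as critical<br />
if ($VALUE == 0) {<br />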
    print ("OK:  found qdepth for $QUEUE at 0\n");<br />
    exit 0;<br />
} elsif ($VALUE < $WARN) {<br />
    print ("OK:   found qdepth for $QUEUE at $VALUE\n");<br />
    exit 0;<br />
} elsif (($VALUE >= $WARN) &#038;&#038; ($VALUE < $CRIT)) {<br />
    print ("WARN: qdepth of $QUEUE is at $VALUE:  exceeds WARN thresh of $WARN\n");<br />
    exit 1;<br />
} elsif ($VALUE >= $CRIT) {<br />
    print ("CRITICAL:  qdepth for $QUEUE at $VALUE: exceeds CRITICAL thresh of $CRIT\n");<br />
    exit 2;<br />
}<br />
</code></p>
<hr/>
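<p>The same parse-and-threshold logic can be sketched in shell, as an illustration rather than a replacement for the plugin above.  The function names <code>parse_curdepth</code> and <code>qdepth_state</code> are hypothetical.  The sketch takes the runmqsc output on stdin so it can be tried without a queue manager, tests for OK explicitly, and lets everything else fall through to a failure state.</p>

```shell
# Sketch only: parse_curdepth and qdepth_state are hypothetical names.
# nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN

# parse_curdepth: reads runmqsc "display queue" output on stdin and prints
# the number inside CURDEPTH(...). Prints nothing if no numeric depth is found.
parse_curdepth() {
    sed -n 's/.*CURDEPTH(\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# qdepth_state WARN CRIT DEPTH
# Prints a status line and returns the matching nagios state.
qdepth_state() {
    warn=$1
    crit=$2
    depth=$3
    # an empty or non-numeric depth means the query itself failed
    case "$depth" in
        ''|*[!0-9]*)
            echo "UNKNOWN: could not read CURDEPTH"
            return 3
            ;;
    esac
    if [ "$depth" -lt "$warn" ]; then
        echo "OK: qdepth at $depth"
        return 0
    elif [ "$depth" -lt "$crit" ]; then
        echo "WARN: qdepth at $depth, WARN threshold $warn"
        return 1
    else
        # at or above CRIT, or thresholds that could not be compared
        echo "CRITICAL: qdepth at $depth, CRIT threshold $crit"
        return 2
    fi
}
```

<p>Piping the real query through <code>parse_curdepth</code> and handing the result to <code>qdepth_state</code> means an empty answer from runmqsc reports UNKNOWN rather than a depth of 0.</p>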
<p>&nbsp;</p>
<p>This is the channel status plugin &#8211; I may have re-written the original data gathering runmqsc string, but the majority of the plugin remained intact&#8230;</p>
<p>&nbsp;</p>
<hr/>
<code><br />
#!/bin/ksh<br />
#<br />
# check queue manager status<br />
#<br />
# Kyle O'Donnell <kyle[dot]odonnell[at]gmail[dot]com><br />
#<br />
#$Id: check_mq_channel,v 1.2 2007/04/04 14:36:02 kodonnel Exp $<br />
#<br />
# debug<br />
DATE=`date`<br />
LOG="/tmp/nrpe_check_mq_channel.sh.log"<br />
echo "" >> $LOG<br />
echo $DATE >> $LOG<br />
echo "" >> $LOG<br />
[ $# -ne 2 ] &#038;&#038; echo "usage: $0 <channel> <queue manager>" &#038;&#038;  exit 3<br />
channel=$1<br />
qmgr=$2<br />
echo "channel: $channel  qmanager: $qmgr" >> $LOG<br />
RUNMQSC="/opt/mqm/bin/runmqsc"<br />
chanstatus=`echo "dis chs(${channel}) status" | ${RUNMQSC} ${qmgr} | grep -i "status(running)"`<br />
echo "channel status result:  $chanstatus" >> $LOG<br />
if echo $chanstatus |grep -i "status(running)" > /dev/null 2>&#038;1; then<br />
        STATE=0<br />
        printf "${channel} on ${qmgr} running"<br />
        echo ""<br />
        echo ""<br />
else<br />
        STATE=2<br />
        printf "${channel} on ${qmgr} not running"<br />
        echo ""<br />
        echo ""<br />
fi<br />
echo "state:  $STATE" >> $LOG<br />
exit $STATE;<br />
</code></p>
<hr/>
<p>&nbsp;</p>
<p>Here&#8217;s the server.cfg file for the WebSphere MQ machine on the nagios server:</p>
<p>&nbsp;</p>
<hr/>
<p><code><br />
define service {<br />
        use                             generic-service<br />
        host_name                       mq1<br />
        service_description             Host Alive<br />
        check_period                    24x7<br />
        contact_groups                  unix-administrators<br />
        notification_period             24x7<br />
        check_command                   check-host-alive<br />
}</code></p>
<p><code>define service {<br />
        use                             generic-service<br />
        host_name                       mq1<br />
        service_description             Sonic Bridge java process<br />
        check_period                    24x7<br />
        contact_groups                  esb-administrators<br />
        notification_period             24x7<br />
        check_command                   check_unix_proc!mqm!1!java<br />
}</code></p>
<p><code>define service {<br />
        use                             generic-service<br />
        host_name                       mq1<br />
        service_description             SSB queue depth EVGPQM01.DEAD.QUEUE message age<br />
        check_period                    24x7<br />
        contact_groups                  systems-services,help_desk<br />
        notification_period             24x7<br />
        check_command                   wmq_check_qdepth!1!3!QMGR01!QMGR01.DEAD.QUEUE<br />
}</code></p>
<p><code>define service {<br />
        use                             generic-service<br />
        host_name                       mq1<br />
        service_description             server queue depth APPLICATION.RESPONSE<br />
        check_period                    24x7<br />
        contact_groups                  systems-services,help_desk<br />
        notification_period             24x7<br />
        check_command                   wmq_check_qdepth!5!10!APPLICATION.RESPONSE<br />
}</code></p>
<p><code>define service {<br />
        use                             generic-service<br />
        host_name                       mq1<br />
        service_description             server queue depth OPPOSITE-QMGR<br />
        check_period                    24x7<br />
        contact_groups                  systems-services,help_desk<br />
        notification_period             24x7<br />
        check_command                   wmq_check_qdepth!5!10!OPPOSITE-QMGR<br />
}</code></p>
<p><code>define service {<br />
        use                             generic-service<br />
        host_name                       mq1<br />
        service_description             WMQ command server<br />
        check_period                    24x7<br />
        contact_groups                  systems-services,help_desk<br />
        notification_period             24x7<br />
        check_command                   check_unix_proc!mqm!1!amqpcsea<br />
}</code></p>
<p><code>define service {<br />
        use                             generic-service<br />
        host_name                       mq1<br />
        service_description             WMQ Critical process manager<br />
        check_period                    24x7<br />
        contact_groups                  systems-services,help_desk<br />
        notification_period             24x7<br />
        check_command                   check_unix_proc!mqm!1!amqzmuc0<br />
}</code></p>
<hr/>
<p>&nbsp;</p>
<p>The strategy is to monitor qdepth and the processes specific to IBM WebSphere MQ on the WebSphere MQ server, along with the normal UNIX processes and disk space.  </p>
<p>&nbsp;</p>
<p>&mdash; dsm</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://dougmunsinger.com/2008/10/how-to-monitor-ibm-mq-from-nagios.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

