Monitoring Netezza database
with Nagios
Frank Pantaleo
Introduction & Agenda
A couple of Ws
State of monitoring Netezza
Monitoring Netezza with Nagios
Future direction
A couple of Ws - Why
Why are we monitoring Netezza?
How much money does your business lose when IT is down?
7 million each year from IT downtime
Gartner (2005) pegs the hourly cost of downtime for computer
networks at $42,000
A data center outage by itself can cost an average of $5,600 per
minute
Outages damage a company's reputation
Now scale this up to the cloud: for every hour it is
not up and running, Amazon.com takes a hit of almost $5 million
Monitoring allows you to be more proactive
It allows upper management to plan for DB growth
(including secondary effects, e.g. DR, tape, and disk for
backup)
A couple of Ws - What
What are we looking for in a monitor?
Universal monitoring
Efficient Alert Notifications (also allows your IT staff
to tell each other when something is being worked
on)
Web Dashboard (one stop shopping!)
Issue Escalation (separate lists for warning, high)
Distributed Monitoring and Scalability (high
availability)
A couple of Ws - What
What are we looking for in a monitor? (cont)
Reporting (how many times was this service down?)
External Application Integration (can I enable my
current applications to allow for early issue
notification?)
Open source solution
State of Netezza monitoring
Monitoring systems available for
Netezza
Netezza event monitor: ships with the product
Netezza portal: ships with the product
Commercial offerings: Brightlight Consulting Observation
Deck
State of Netezza monitoring
Netezza comes with 34 alerts
Alert actions have limited responses
Email
Script execution
In version 7.1, an alert can automatically create a support ticket
Configuration can be done through the NPS client or the command line
interface on the Netezza server
State of Netezza monitoring
Examples of Netezza 7.1 stock sample
alerts
Disk Full
SPU Full
Hardware Failed
Hardware needs attention
Hardware restarted
Hardware service requested
Heat threshold exceeded
History capture event
History load event
HwvoltageFaultAuto
NPSNoLongerOnline
RegenFault
RunAwayQuery
No custom events allowed
State of Netezza monitoring
Netezza Portal
Face on glass monitoring
Custom queries can be added to the monitor
All queries can be seen as numeric or graphic
No alerting
Tool can also be used for maintaining database
objects, users, events, and sessions
If you are using LDAP, the portal can't take advantage of
it. When you log in to the portal you will be using
your DB username/password
Netezza monitoring using Nagios
What are we monitoring in Netezza?
Table Locks by non-EDW statements during EDW
batch cycle
User queries exceeding 1 hour (90% of the time these are
poorly formed queries)
User queries during EDW batch cycle (depends on
SLA)
Age of backup older than SLA
LDAP server available for SSO
Netezza monitoring using Nagios
What are we monitoring in Netezza? (cont)
SPU space unbalanced (generally a side effect of
poor distribution)
State of EDW e.g. loading files, file processing
complete
Late arrival of files preventing the EDW from meeting
SLAs
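Most of the checks listed above reduce to comparing one measured value against an SLA. As an illustration, here is a minimal sketch of the "age of backup older than SLA" check, written in Python (one of the alternative languages these slides mention) rather than the Perl used in production. The function name, the 24-hour SLA in the usage comment, and the warn_fraction threshold are all assumptions made for illustration; how the timestamp of the last backup is obtained from Netezza is deliberately left out.

```python
from datetime import datetime

# Nagios plugin status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def backup_age_status(last_backup, now, sla_hours, warn_fraction=0.8):
    """Return (code, message) for a backup that finished at `last_backup`.

    `sla_hours` and `warn_fraction` are hypothetical thresholds: warn once
    the backup age passes warn_fraction of the SLA, go critical once it
    exceeds the SLA itself.
    """
    if last_backup is None:
        return UNKNOWN, "No backup record found"
    age_hours = (now - last_backup).total_seconds() / 3600.0
    if age_hours > sla_hours:
        return CRITICAL, "Backup is %.1f hours old (SLA %d hours)" % (age_hours, sla_hours)
    if age_hours > sla_hours * warn_fraction:
        return WARNING, "Backup is %.1f hours old, nearing SLA of %d hours" % (age_hours, sla_hours)
    return OK, "Backup is %.1f hours old" % age_hours


# Example call (timestamps are made up):
#   backup_age_status(datetime(2015, 1, 1, 6, 0), datetime.now(), sla_hours=24)
```

The same shape (measure, compare, return a code plus one line of text) would apply to long-running queries or late-arriving files.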
Netezza monitoring using Nagios
Architecture options with Nagios
Sensors live on the Nagios monitoring server
Sensors live on the database server and are controlled
by NRPE. This is what we went with, based on
customer security rules.
Scripting language is Perl, but it could be any
language that can query the database
and handle the responses. Other options
include Bash, Java, Python, and C.
Netezza monitoring using Nagios
Architecture options with Nagios (cont)
Active: NRPE is an intermediary for running scripts
and bringing results back to Nagios.
Passive: SNMP is an option, but the currently provided
alerts need to be tied into an SNMP agent that reports
status. Netezza doesn't raise SNMP alerts out of the box.
Netezza monitoring using Nagios
Passive alerts require snmp trap software
Nagios server must be enabled to receive
alerts
http://hyper-choi.blogspot.com/2012/12/nagios-snmp-trap-part-1-snmptt.html
http://hyper-choi.blogspot.com/2013/01/nagios-snmp-trap-part-2-configuration.html
Once Nagios is enabled, Netezza events must
be changed to make Nagios aware there is an
issue
http://netezzaadmin.wordpress.com/2011/10/07/usingnetezzas-event-manager-to-generate-snmp-traps
Netezza monitoring using Nagios
Passive alerts architecture
Netezza monitoring using Nagios
Active alerts require NRPE to be installed
Checking is done using shell script and Perl
Option 1: Perl DBI ODBC
Downside is you have to have an exposed user/password. In
this case it was against IT policy, so I stopped using this
option.
Upside is that with this option all agents could live on the Nagios server
Option 2: Perl package supplied by Netezza
Downside is this is the equivalent of admin, so you can do
anything
Upside is no username/password configuration
Agents must live on the database server
Netezza monitoring using Nagios
Active Alert architecture
Netezza monitoring using Nagios
Active Alert agent writing (interface
requirements)
MUST set a return code, e.g.
# 0 OK
# 1 WARNING
# 2 CRITICAL
# 3 UNKNOWN
Nagios dashboard displays associated text
# $check_passed is a hypothetical flag; put your check logic here
if ($check_passed) {
    print "Ok\n";
    exit 0;
} else {
    print "Error please look at tablexyz\n";
    exit 2;
}
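The two interface requirements above (an exit code between 0 and 3, plus one line of text for the dashboard) are language independent. The sketch below restates them in Python, one of the alternative sensor languages mentioned earlier. The helper names plugin_result and finish are made up for illustration, and prefixing the text with the status name is a common plugin convention rather than something these slides require.

```python
import sys

# Status codes a Nagios plugin reports through its process exit code.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def plugin_result(code, detail):
    """Return (exit_code, status_line); unexpected codes become UNKNOWN."""
    if code not in STATUS_NAMES:
        code = 3
    return code, "%s - %s" % (STATUS_NAMES[code], detail)


def finish(code, detail):
    """Print the status line for the dashboard and exit with the code."""
    code, line = plugin_result(code, detail)
    print(line)
    sys.exit(code)


# Example: finish(2, "please look at tablexyz")
```

Keeping the formatting in plugin_result separate from the sys.exit call makes the mapping easy to test without terminating the process.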
Netezza monitoring using Nagios
Active alerts - NRPE configuration on Netezza server
If using the Perl package, commands must run as the nz
user, so /etc/nagios/nrpe.cfg must use the following
nrpe_user=nz
nrpe_group=nz
Once a sensor (Perl script) is written and tested, it
must be added to the nrpe.cfg file.
command[check_nz_longqry]=/export/home/nz/scripts/check_nz_longqry.pl
Best practice - Request that /etc/nagios/nrpe.cfg be
open to read/write for the nz user
Netezza monitoring using Nagios
Active alerts - How does NRPE work on the Nagios
server?
define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 300
}
define service{
    use                     generic-service
    host_name               proddb
    service_description     NZSQL Long query
    check_command           check_nrpe!check_nz_longqry!
    notifications_enabled   0
}
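Conceptually, the service definition above asks the remote side to run the sensor, then reads back its exit code and its output, giving up after the 300 seconds passed with -t. The Python sketch below imitates that contract locally so a sensor can be exercised by hand before it is wired into nrpe.cfg. It is not how NRPE itself is implemented, and the name run_check is an assumption made for this example.

```python
import subprocess


def run_check(argv, timeout_seconds=300):
    """Run a sensor command and return (exit_code, first_output_line).

    Anything outside the 0-3 plugin range, a timeout, or a command that
    cannot be started is reported as 3 (UNKNOWN).
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return 3, "UNKNOWN - check timed out after %d seconds" % timeout_seconds
    except OSError as exc:
        return 3, "UNKNOWN - could not run check: %s" % exc
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else "(no output)"
    code = proc.returncode if proc.returncode in (0, 1, 2, 3) else 3
    return code, first_line


# Example, using the sensor path from nrpe.cfg:
#   run_check(["/export/home/nz/scripts/check_nz_longqry.pl"])
```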
Netezza monitoring using Nagios
Active Alerts - Perl programming using SQL.pm
package
Invocation
use lib "/nz/kit/share/perl";
use nz::SQL;
Package can only be used by the nz owner
NO username & password
my ($KITDIR, $DATADIR);
$DATADIR = "/nz/data.1.0";
$KITDIR = "/nz/kit";
nz::SQL::config(KITDIR => $KITDIR, DATADIR => $DATADIR);
Best practice - use alarm timers around SQL statements
Handy variables after each SQL execution: $qresp->{nrows},
plus ncols, colid, and qtype
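The alarm-timer best practice keeps a hung query from hanging the sensor. In Perl this is done with alarm and a signal handler. The sketch below shows the same pattern in Python for readers writing sensors in that language. The names QueryTimeout and run_with_alarm are hypothetical, and the approach only works on Unix-like systems, in the main thread, with whole-second timeouts.

```python
import signal


class QueryTimeout(Exception):
    """Raised when the wrapped call runs longer than the alarm allows."""


def _on_alarm(signum, frame):
    raise QueryTimeout()


def run_with_alarm(func, seconds):
    """Call func() and return its result, or raise QueryTimeout.

    Uses SIGALRM, so it is Unix only and must run in the main thread.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return func()
    finally:
        # Always cancel the pending alarm and restore the old handler.
        signal.alarm(0)
        signal.signal(signal.SIGALRM, previous)


# Example: wrap the database call, and report UNKNOWN (3) on QueryTimeout.
```

A sensor would typically catch QueryTimeout and exit with UNKNOWN (3), so that a slow database shows up on the dashboard rather than as a stuck check.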
Netezza monitoring using Nagios
Perl programming using SQL.pm package
(continued)
Interface example: nz::SQL::query($dbname, $sql). Unlike DBI,
the database must be named every time you query.
Result sets are not held open in the database (unlike DBI); they are in Perl
memory
Result set traversal is done using Perl foreach, e.g.
foreach my $row (@{$qresp->{data}}) {
    ($blocker_username, $blocker_sql, $blockee_username,
     $blockee_sql) = @$row;
    # process one blocker/blockee pair here
}
Best practice: if you can, avoid walking the result set and work only
with counts, e.g. nrows.
This is the most efficient approach, especially for a
Nagios alert check that is going to run several times a day.
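A count-only check can be reduced to one small function: take the row count the query returned and compare it with two thresholds. The Python sketch below illustrates the idea. The function name and the default thresholds (warn at 1 row, critical at 5) are assumptions chosen for the example, not values from the production sensors.

```python
def status_from_count(nrows, warn_at=1, crit_at=5):
    """Map a row count to (nagios_code, status_line).

    warn_at and crit_at are hypothetical thresholds: counts at or above
    crit_at are CRITICAL, counts at or above warn_at are WARNING.
    """
    if nrows is None or nrows < 0:
        return 3, "UNKNOWN - no row count returned"
    if nrows >= crit_at:
        return 2, "CRITICAL - %d matching rows" % nrows
    if nrows >= warn_at:
        return 1, "WARNING - %d matching rows" % nrows
    return 0, "OK - %d matching rows" % nrows


# Example: a long-running-query check that found 2 offending queries
#   status_from_count(2)
```

Because only the count crosses from the database to the sensor, the check stays cheap even when it runs many times a day.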
Future direction
Data graphing
Expand areas that we are monitoring for in
Netezza
Integrate into a product offering (Observation
Deck) from Brightlight that collects NZHIST for
customer
Predict when we are going to outgrow our current
processing and database needs
Conclusion
Key takeaways are
Using Nagios can give your company an extensible
event monitor. Understanding Nagios architecture is
important to a stable and working monitoring setup. Once
you understand the architecture, writing an agent is
trivial. If you can write SQL to detect an event, then you
can write an agent.
Other reading materials on this
subject
The URLs provided in this document have the recipes for how to
set up Nagios, SNMP traps, and Netezza. Please visit those
sites to get that info.
Questions?
Any questions?
Thanks!
Reference
http://www.thegeekstuff.com/2010/08/monitoring-softwarecriteria/
http://exchange.nagios.org/directory/Tutorials/Install-andConfigure-NRPE-in-CentOS-and-Red-Hat/details
http://www01.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.portal.doc/c_portal_welcome.html
http://www.networkworld.com/article/2329877/infrastructuremanagement/how-to-quantify-downtime.html
The End
Frank Pantaleo