
Nagios Network Monitoring Overview

This document provides an overview of Nagios, an open source network monitoring software. It discusses key Nagios concepts like checks, plugins, templates, host and service configuration, notifications, and using host and service groups for efficient configuration. Installation on Ubuntu is demonstrated, and the configuration files and directories are outlined. Templates are used to define default settings and inheritance. Hosts, services, contacts and notifications are configured through object definitions in configuration files.

Uploaded by

AlcorNoque

These materials are licensed under the Creative Commons Attribution-Noncommercial 3.0 Unported license
(http://creativecommons.org/licenses/by-nc/3.0/)
NAGIOS
Network Management & Monitoring
Introduction
Network Monitoring Tools
• Availability
• Reliability
• Performance
Nagios actively monitors the availability of devices and services
Introduction
• Possibly the most used open source network monitoring software
• Web interface for viewing status, browsing history, scheduling downtime etc.
• Sends out alerts via e-mail. Can be configured to use other mechanisms, e.g. SMS
Example: Service Detail view
Features
Utilizes topology to determine dependencies.
- Differentiates between what is down vs. what is unreachable. Avoids running unnecessary checks and sending redundant alarms
Allows you to define how to send notifications based on combinations of:
- Contacts and lists of contacts
- Devices and groups of devices
- Services and groups of services
- Defined hours by persons or groups.
- The state of a service.
Plugins
Plugins are used to verify services and devices:
- Nagios architecture is simple enough that writing new plugins is fairly easy in the language of your choice.
- There are many, many plugins available (thousands).
  - http://exchange.nagios.org/
  - http://nagiosplugins.org/
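As a rough illustration of how small a plugin can be, here is a sketch of a hypothetical free-disk-space check in Python. It relies only on the usual plugin convention: print one status line, optionally followed by `|` and performance data, and return 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN) as the exit code. The option names and default thresholds are illustrative assumptions. They are not those of the stock check_disk plugin.

```python
import argparse
import shutil

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(free_pct: float, warn: float, crit: float) -> int:
    """Map a free-space percentage to a plugin state (lower is worse)."""
    if free_pct < crit:
        return CRITICAL
    if free_pct < warn:
        return WARNING
    return OK


def main(argv: list[str]) -> int:
    # Hypothetical options: -p path, -w / -c thresholds in percent free
    parser = argparse.ArgumentParser(description="Sketch of a free-disk check")
    parser.add_argument("-p", "--path", default="/")
    parser.add_argument("-w", "--warning", type=float, default=20.0)
    parser.add_argument("-c", "--critical", type=float, default=10.0)
    args = parser.parse_args(argv)
    try:
        usage = shutil.disk_usage(args.path)
    except OSError as exc:
        # Could not even run the check: report UNKNOWN, not CRITICAL
        print(f"DISK UNKNOWN - cannot stat {args.path}: {exc}")
        return UNKNOWN
    free_pct = 100.0 * usage.free / usage.total
    state = evaluate(free_pct, args.warning, args.critical)
    # One status line; the text after "|" is optional performance data
    print(f"DISK {STATE_NAMES[state]} - {free_pct:.1f}% free on {args.path} "
          f"| free_pct={free_pct:.1f}%;{args.warning};{args.critical}")
    return state
```

Saved as an executable script, it would end with `sys.exit(main(sys.argv[1:]))` so that Nagios sees the exit code. It would then be wired in through a `define command` block like any other plugin.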
Pre-installed plugins in Ubuntu
/usr/lib/nagios/plugins
/etc/nagios-plugins/config
How checks work
• Periodically Nagios calls a plugin to test the state of each service. Possible responses are:
- OK
- WARNING
- CRITICAL
- UNKNOWN
• If a service is not OK it goes into a soft error state. After a number of retries (default 3) it goes into a hard error state. At that point an alert is sent.
• You can also trigger external event handlers based on these state transitions
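The soft/hard logic above can be modelled in a few lines. This is a simplified sketch under stated assumptions: a single service, `max_check_attempts` of 3, and alerts only on hard state changes. Real Nagios also takes retry intervals, notification options, flapping and downtime into account.

```python
from dataclasses import dataclass


@dataclass
class CheckState:
    """Simplified soft/hard state tracking for one service (a sketch)."""
    max_check_attempts: int = 3
    state: str = "OK"
    state_type: str = "HARD"
    attempt: int = 1

    def process(self, result: str) -> bool:
        """Apply one check result; return True if an alert would be sent."""
        if result == "OK":
            was_hard_problem = self.state != "OK" and self.state_type == "HARD"
            self.state, self.state_type, self.attempt = "OK", "HARD", 1
            return was_hard_problem  # recovery alert only after a hard problem
        if self.state_type == "HARD" and self.state != "OK":
            # Already in a hard problem state: alert only if the state changes
            changed = result != self.state
            self.state = result
            return changed
        # OK -> problem starts at attempt 1; a soft problem keeps counting
        self.attempt = 1 if self.state == "OK" else self.attempt + 1
        self.state = result
        if self.attempt >= self.max_check_attempts:
            self.state_type = "HARD"
            return True
        self.state_type = "SOFT"
        return False
```

Feeding it CRITICAL, CRITICAL, CRITICAL, CRITICAL, OK gives alerts only on the third result (the hard state) and on the final recovery. A problem that clears while still soft produces no alert at all.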
How checks work continued
Parameters
- Normal checking interval
- Retry interval (i.e. when not OK)
- Maximum number of retries
- Time period for performing checks
- Time period for sending notifications
Scheduling
- Nagios spreads its checks throughout the time
period to even out the workload
- Web UI shows when next check is scheduled
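A simplified sketch of the idea follows. It first interleaves services across hosts, so that one host does not receive several checks in a row. It then spaces the first checks evenly across one check interval. The function names are hypothetical. Nagios's own calculations are more involved and are controlled by options such as `service_inter_check_delay_method` and `service_interleave_factor` in nagios.cfg.

```python
from itertools import zip_longest


def interleave(services_by_host: dict[str, list[str]]) -> list[str]:
    """Round-robin across hosts so one host's checks are not back to back."""
    rounds = zip_longest(*services_by_host.values())
    return [svc for rnd in rounds for svc in rnd if svc is not None]


def initial_offsets(services: list[str], check_interval_min: float) -> dict[str, float]:
    """Spread first checks evenly over one interval; offsets in seconds."""
    if not services:
        return {}
    delay = check_interval_min * 60 / len(services)
    return {svc: i * delay for i, svc in enumerate(services)}
```

With two hosts, two services each and a 5-minute interval, the four checks start 75 seconds apart. Consecutive checks alternate between pc1 and pc2.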
The concept of parents
Hosts can have parents:
The parent of a PC connected to a switch would be the switch.
Allows us to specify the dependencies between devices.
Avoids sending alarms when parent does not respond.
A node can have multiple parents (dual homed).
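The down/unreachable distinction can be sketched as follows. A host that fails its own check is DOWN if it has no parents, or if at least one parent is UP. It is UNREACHABLE if every parent is itself down or unreachable. This is a minimal sketch. It assumes the parent map has no cycles (Nagios rejects circular parent relationships when verifying the config) and it ignores retries and check timing.

```python
def classify(parents: dict[str, list[str]], responds: set[str]) -> dict[str, str]:
    """Return UP/DOWN/UNREACHABLE for every host in `parents`.

    parents maps host -> list of parent hosts ([] means directly reachable
    from the Nagios server); responds holds hosts whose own check passed.
    """
    result: dict[str, str] = {}

    def state(host: str) -> str:
        if host not in result:
            if host in responds:
                result[host] = "UP"
            elif not parents[host] or any(state(p) == "UP" for p in parents[host]):
                # No parents, or some path to it is up: the host itself is down
                result[host] = "DOWN"
            else:
                # Every parent is down or unreachable: we cannot tell
                result[host] = "UNREACHABLE"
        return result[host]

    for host in parents:
        state(host)
    return result
```

For example, if rtr1 answers but sw does not, sw is DOWN and the PCs behind it are UNREACHABLE, so only one real problem is reported. A dual-homed host stays DOWN (not UNREACHABLE) as long as either parent is up.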
Network viewpoint
Where you locate your Nagios server will determine your point of view of the network.
The Nagios server becomes the "root" of your dependency tree
Network viewpoint
Demo Nagios
Installation
In Debian/Ubuntu
# apt-get install nagios3
Key directories
/etc/nagios3
/etc/nagios3/conf.d
/etc/nagios-plugins/config
/usr/lib/nagios/plugins
/usr/share/nagios3/htdocs/images/logos
Nagios web interface is here:
http://pcN.ws.nsrc.org/nagios3/
Configuration
• Configuration defined in text files
- /etc/nagios3/conf.d/*.cfg
- Details at http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html
• The default config is broken into several files with different objects in different files, but actually you can organise it how you like
• Always verify before restarting Nagios, otherwise your monitoring system may die!
- nagios3 -v /etc/nagios3/nagios.cfg
Hosts and services configuration
Based on templates
- This saves lots of time by avoiding repetition
There are default templates with default parameters for a:
- generic host (generic-host_nagios2.cfg)
- generic service (generic-service_nagios2.cfg)
Individual settings can be overridden
Defaults are all sensible
Monitoring a single host
define host {
host_name pc1
alias pc1 in group 1
address pc1.ws.nsrc.org
use generic-host
}
• This is a minimal working config
- You are just pinging the host; Nagios will warn that you are not monitoring any services
• The filename can be anything ending in .cfg
• Organise your devices however you like, e.g. related hosts in the same file
pcs.cfg
(use generic-host: copy settings from this template)
Generic host template
define host {
name generic-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across restarts
check_command check-host-alive
max_check_attempts 10
notification_interval 0
notification_period 24x7
notification_options d,u,r
contact_groups admins
register 0 ; DONT REGISTER THIS DEFINITION
; ITS NOT A REAL HOST, JUST A TEMPLATE!
}

generic-host_nagios2.cfg
Overriding defaults
define host {
host_name pc1
alias pc1 in group 1
address pc1.ws.nsrc.org
use generic-host
notification_interval 120
contact_groups admins,managers
}
pcs.cfg
All settings can be overridden per host
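The effect of `use` can be pictured as a dictionary merge: start from the template's values, then let the host's own settings win. This is a minimal sketch with stated assumptions. It handles a single template per object, although Nagios also accepts a comma-separated list. It also treats `name` and `register` as template-only settings that are not copied to pc1 (otherwise `register 0` would stop pc1 from being a real host).

```python
TEMPLATE_ONLY = ("name", "register", "use")


def resolve(obj: dict[str, str], templates: dict[str, dict[str, str]]) -> dict[str, str]:
    """Flatten `use` inheritance: local values override template values."""
    merged: dict[str, str] = {}
    parent = obj.get("use")
    if parent is not None:
        # Templates may themselves `use` another template, so recurse first
        inherited = resolve(templates[parent], templates)
        merged.update({k: v for k, v in inherited.items() if k not in TEMPLATE_ONLY})
    # The object's own settings are applied last, so they win
    merged.update({k: v for k, v in obj.items() if k != "use"})
    return merged
```

Resolving the pc1 definition above against a cut-down generic-host gives `notification_interval` 120 and `contact_groups` admins,managers from pc1 itself. Everything else, such as `max_check_attempts` 10, comes from the template.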
Defining services (direct way)
define host {
host_name pc1
alias pc1 in group 1
address pc1.ws.nsrc.org
use generic-host
}

define service {
host_name pc1
service_description HTTP
check_command check_http
use generic-service
}

define service {
host_name pc1
service_description SSH
check_command check_ssh
use generic-service
}
pcs.cfg
service template
service pc1,HTTP
plugin
Service checks
• The combination of host + service is a unique identifier for the service check, e.g.
- pc1,HTTP
- pc1,SSH
- pc2,HTTP
- pc2,SSH
• check_command points to the plugin
• The service template pulls in settings for how often the check is done, and who to alert and when
Generic service template
define service{
name generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_interval 0
is_volatile 0
check_period 24x7
normal_check_interval 5
retry_check_interval 1
max_check_attempts 4
notification_period 24x7
notification_options w,u,c,r
contact_groups admins
register 0 ; DONT REGISTER THIS DEFINITION
}
generic-service_nagios2.cfg*
*Comments have been removed.
Overriding defaults
Again, settings can be overridden per service
define service {
host_name pc1
service_description HTTP
check_command check_http
use generic-service
contact_groups admins,managers
max_check_attempts 3
}
services_nagios2.cfg
Repeated service checks
• Often we are monitoring an identical service on many hosts
• To avoid duplication, a better way is to define a service check for all hosts in a hostgroup
Creating hostgroups
define hostgroup {
hostgroup_name http-servers
alias HTTP servers
members pc1,pc2
}

define hostgroup {
hostgroup_name ssh-servers
alias SSH servers
members pc1,pc2
}
hostgroups_nagios2.cfg
Monitoring services in hostgroups
define service {
hostgroup_name http-servers
service_description HTTP
check_command check_http
use generic-service
}

define service {
hostgroup_name ssh-servers
service_description SSH
check_command check_ssh
use generic-service
}
services_nagios2.cfg
e.g. if hostgroup http-servers contains pc1 and pc2 then Nagios creates HTTP service checks for both hosts. The service checks are called pc1,HTTP and pc2,HTTP
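The expansion described above can be sketched in a few lines. Each hostgroup-based service definition becomes one check per member host, keyed by the "host,service" identifier. This is an illustration only. The dictionaries stand in for parsed config, and `expand_services` is a hypothetical helper, not part of Nagios.

```python
def expand_services(hostgroups: dict[str, list[str]],
                    services: list[dict[str, str]]) -> dict[str, str]:
    """Turn hostgroup-based service definitions into per-host checks.

    Returns {"host,service_description": check_command}.
    """
    checks: dict[str, str] = {}
    for svc in services:
        # hostgroup_name may list several groups separated by commas
        for group in svc["hostgroup_name"].split(","):
            for host in hostgroups[group.strip()]:
                # Same host+service from two groups collapses to one check
                checks[f"{host},{svc['service_description']}"] = svc["check_command"]
    return checks
```

With the http-servers and ssh-servers groups defined earlier, two service definitions yield four checks: pc1,HTTP, pc2,HTTP, pc1,SSH and pc2,SSH.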
Alternative view
• Instead of saying "this hostgroup contains these PCs" you can say "this PC belongs to these hostgroups"
• No need for the members line in the hostgroups file
Alternative group membership
define host {
host_name pc1
alias pc1 in group 1
address pc1.ws.nsrc.org
use generic-host
hostgroups ssh-servers,http-servers
}

define host {
host_name pc2
alias pc2 in group 1
address pc2.ws.nsrc.org
use generic-host
hostgroups ssh-servers,http-servers
}

pcs.cfg
Hosts and services conveniently defined in the same place
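Both styles describe the same membership. The sketch below rebuilds each group's member list from the `hostgroups` line of the host definitions. The result is what a `members` line in the hostgroups file would otherwise have listed. `members_from_hosts` is a hypothetical helper working on already-parsed definitions.

```python
def members_from_hosts(hosts: list[dict[str, str]]) -> dict[str, list[str]]:
    """Invert host -> hostgroups into hostgroup -> member hosts."""
    groups: dict[str, list[str]] = {}
    for host in hosts:
        for group in host.get("hostgroups", "").split(","):
            group = group.strip()
            if group:  # hosts without a hostgroups line contribute nothing
                groups.setdefault(group, []).append(host["host_name"])
    return groups
```

For pc1 and pc2 as defined above, both ssh-servers and http-servers end up with the members pc1,pc2, matching the earlier hostgroup definitions.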
Other uses for hostgroups
define host {
host_name pc1
alias pc1 in group 1
address pc1.ws.nsrc.org
use generic-host
hostgroups ssh-servers,http-servers,debian-servers
}
pcs.cfg
Choosing icons for the status map
define hostextinfo {
hostgroup_name debian-servers
notes Debian GNU/Linux servers
icon_image base/debian.png
statusmap_image base/debian.gd2
}
extinfo_nagios2.cfg
Optional: servicegroups
define servicegroup {
servicegroup_name mail-services
alias Services comprising the mail platform
members web1,HTTP,web2,HTTP,mail1,IMAP,db1,MYSQL
}
servicegroups.cfg
• You can also group together services into a servicegroup
• This is so related or dependent services can be viewed together in the web interface
• The services themselves must already exist
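The `members` line of a servicegroup is a flat list of alternating host and service names. Because the services must already exist, a quick sanity check is to split the list into pairs and look each pair up. This is a sketch: `parse_members` and `missing_services` are hypothetical helpers, and the set of defined services would in practice come from your parsed config.

```python
def parse_members(members: str) -> list[tuple[str, str]]:
    """Split 'host1,svc1,host2,svc2,...' into (host, service) pairs."""
    items = [item.strip() for item in members.split(",")]
    if len(items) % 2:
        raise ValueError("servicegroup members must be host,service pairs")
    return list(zip(items[0::2], items[1::2]))


def missing_services(members: str, defined: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Pairs listed in the servicegroup that have no matching service."""
    return [pair for pair in parse_members(members) if pair not in defined]
```

The mail-services example above splits into web1/HTTP, web2/HTTP, mail1/IMAP and db1/MYSQL. An odd number of items usually means a host or service name was left out.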
Configuring topology
define host {
host_name pc1
alias pc1 in group 1
address pc1.ws.nsrc.org
use generic-host
parents rtr1
}
• This means pc1 is on the far side of rtr1
• If rtr1 goes down, pc1 is marked unreachable rather than down
• Prevents a cascade of alerts if rtr1 goes down
• Also allows Nagios to draw a cool status map
pcs.cfg
parent host
Another view of configuration
RTR
define host {
use generic-host
host_name rtr
alias Gateway Router
address 10.10.0.254
}
SW
define host {
use generic-host
host_name sw
alias Backbone Switch
address 10.10.0.253
parents rtr
}
RTR3
define host {
use generic-host
host_name rtr3
alias router 3
address 10.10.3.254
parents sw
}
PC11!
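The `parents` lines are enough to reconstruct the dependency tree that Nagios draws, rooted at the Nagios server. Hosts with no parents hang directly off the root. This is a sketch: "nagios" is just a hypothetical label for the monitoring server, and a dual-homed host simply appears under each of its parents.

```python
def tree_lines(parents: dict[str, list[str]], root: str = "nagios") -> list[str]:
    """Render the parent/child topology as indented lines, root first."""
    children: dict[str, list[str]] = {}
    for host, host_parents in parents.items():
        # No parents means the host sits next to the Nagios server
        for parent in host_parents or [root]:
            children.setdefault(parent, []).append(host)
    lines: list[str] = []

    def walk(node: str, depth: int) -> None:
        lines.append("  " * depth + node)
        for child in sorted(children.get(node, [])):
            walk(child, depth + 1)

    walk(root, 0)
    return lines
```

For the three definitions above this prints nagios, then rtr, sw and rtr3, each indented one level deeper than its parent.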
Out-of-Band (OOB) notifications
A critical item to remember: an SMS or message system that is independent from your network.
- You can utilize a cell phone connected to the Nagios server, or a USB dongle with SIM card
- You can use packages like:
  gammu: http://wammu.eu/
  gnokii: http://www.gnokii.org/
  sms-tools: http://smstools3.kekekasvi.com/
References
Nagios web site
http://www.nagios.org/
Nagios plugins site
http://www.nagiosplugins.org/
Nagios System and Network Monitoring, by Wolfgang Barth. Good book about Nagios.
Unofficial Nagios plugin site
http://exchange.nagios.org/
A Debian tutorial on Nagios
http://www.debianhelp.co.uk/nagios.htm
Commercial Nagios support
http://www.nagios.com/
Questions?



Additional Details
A few additional slides you may find useful or informative.
Features, features, features!
Allows you to acknowledge an event.
- A user can add comments via the GUI
You can define maintenance periods
- By device or a group of devices
Maintains availability statistics and generates reports
Can detect flapping and suppress additional notifications.
Allows for multiple notification methods:
- e-mail, pager, SMS, winpopup, audio, etc...
Allows you to define notification levels for escalation
Notification Options (Host)
Host state:
When configuring a host you can be notified
on the following conditions:
d: DOWN
u: UNREACHABLE
r: RECOVERY
f: FLAPPING (start/end)
s: SCHEDULED DOWNTIME (start/end)
n: NONE
Notification Options (Service)
Service state:
When configuring a service you can be
notified on the following conditions:
w: WARNING
c: CRITICAL
u: UNKNOWN
r: RECOVERY
f: FLAPPING (start/end)
s: SCHEDULED DOWNTIME (start/end)
n: NONE
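One detail worth noticing in the two lists: `u` means UNREACHABLE for hosts but UNKNOWN for services. The sketch below checks an event against a `notification_options` string such as `d,u,r` or `w,u,c,r` using the right table for the object type. It is a simplified illustration that ignores timeperiods, downtime state and escalations. `should_notify` is a hypothetical helper.

```python
HOST_OPTIONS = {"d": "DOWN", "u": "UNREACHABLE", "r": "RECOVERY",
                "f": "FLAPPING", "s": "SCHEDULED DOWNTIME"}
SERVICE_OPTIONS = {"w": "WARNING", "c": "CRITICAL", "u": "UNKNOWN", "r": "RECOVERY",
                   "f": "FLAPPING", "s": "SCHEDULED DOWNTIME"}


def should_notify(options: str, event: str, table: dict[str, str]) -> bool:
    """True if `event` is enabled by a notification_options string."""
    flags = {flag.strip() for flag in options.split(",")}
    if "n" in flags:  # "n" (NONE) disables notifications entirely
        return False
    return any(table.get(flag) == event for flag in flags)
```

With the generic templates shown earlier, a host notifies on DOWN but not on FLAPPING (`d,u,r`), while a service notifies on CRITICAL (`w,u,c,r`).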
Configuration files (Official)
Debian/Ubuntu config file layout
Located in /etc/nagios3/
Important files include:
• nagios.cfg: Main configuration file.
• cgi.cfg: Controls the web interface and security options.
• commands.cfg: The commands that Nagios uses for notifications.
• conf.d/*: All other configuration goes here!
Configuration files continued
Under conf.d/*
• contacts_nagios2.cfg: users and groups
• extinfo_nagios2.cfg: make your UI pretty
• generic-host_nagios2.cfg: default host template
• generic-service_nagios2.cfg: default service template
• host-gateway_nagios3.cfg: upstream router definition
• hostgroups_nagios2.cfg: groups of nodes
• localhost_nagios2.cfg: definition of the Nagios host
• services_nagios2.cfg: what services to check
• timeperiods_nagios2.cfg: when to check and whom to notify
Configuration files continued
Under conf.d some other possible config files:
• servicegroups.cfg: Groups of nodes and services
• pcs.cfg: Sample definition of PCs (hosts)
• switches.cfg: Definitions of switches (hosts)
• routers.cfg: Definitions of routers (hosts)
Main configuration details
Global settings
File: /etc/nagios3/nagios.cfg
Says where other configuration files are.
General Nagios behavior:
- For large installations you should tune the installation via this file.
- See: Tuning Nagios for Maximum Performance
http://nagios.sourceforge.net/docs/3_0/tuning.html
CGI configuration
/etc/nagios3/cgi.cfg
- You can change the CGI directory if you wish
- Authentication and authorization for Nagios use:
- Activate authentication via Apache's .htpasswd mechanism, or using RADIUS or LDAP.
- Users can be assigned rights via the following variables:
• authorized_for_system_information
• authorized_for_configuration_information
• authorized_for_system_commands
• authorized_for_all_services
• authorized_for_all_hosts
• authorized_for_all_service_commands
• authorized_for_all_host_commands
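Each of these variables takes a comma-separated list of authenticated user names, and `*` grants the right to every authenticated user. The sketch below parses `key=value` lines from a cgi.cfg-style text and answers whether a user holds a given right. It is an illustration only. The sample user names (nagiosadmin, guest) are assumptions, and the real CGIs apply further rules, such as contacts seeing their own hosts and services.

```python
def parse_cgi_cfg(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blanks and # comments."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def is_authorized(settings: dict[str, str], variable: str, user: str) -> bool:
    """True if `user` is listed in `variable`, or the list contains '*'."""
    users = {u.strip() for u in settings.get(variable, "").split(",") if u.strip()}
    return "*" in users or user in users
```

A variable that is absent from the file grants nothing, which matches the cautious default of giving rights only where they are listed.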
Time Periods
# '24x7'
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}

This defines the base periods that control checks, notifications, etc.
- Default: 24 x 7
- Could adjust as needed, such as work-week only.
- Could define a new time period for outside of regular hours, etc.
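To make the day-by-day ranges concrete, here is a sketch that decides whether a moment falls inside a period given as weekday -> "HH:MM-HH:MM[,HH:MM-HH:MM]". It assumes only the simple weekday format shown in the 24x7 definition. Date exceptions, `exclude` and other Nagios timeperiod features are not handled. The "workweek" period in the usage note is hypothetical.

```python
from datetime import datetime

# Index matches datetime.weekday(): Monday == 0 ... Sunday == 6
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def to_minutes(hhmm: str) -> int:
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def in_timeperiod(period: dict[str, str], when: datetime) -> bool:
    """True if `when` falls inside one of that day's ranges (end exclusive)."""
    spec = period.get(DAYS[when.weekday()])
    if not spec:  # day not listed: outside the period
        return False
    now = when.hour * 60 + when.minute
    for rng in spec.split(","):
        start, end = rng.strip().split("-")
        if to_minutes(start) <= now < to_minutes(end):
            return True
    return False
```

Treating 24:00 as minute 1440 lets "00:00-24:00" cover the whole day. A work-week period built from `{day: "09:00-17:00" for day in DAYS[:5]}` is inside on a Monday morning and outside on Saturday.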
Configuring service/host checks
define command {
command_name check_ssh
command_line /usr/lib/nagios/plugins/check_ssh '$HOSTADDRESS$'
}

define command {
command_name check_ssh_port
command_line /usr/lib/nagios/plugins/check_ssh -p '$ARG1$' '$HOSTADDRESS$'
}
/etc/nagios-plugins/config/ssh.cfg
Notice the same plugin can be invoked in different ways (commands).
Command and arguments are separated by exclamation marks (!).
e.g. to check SSH on a non-standard port, you can do it like this:
define service {
hostgroup_name ssh-servers-2222
service_description SSH-2222
check_command check_ssh_port!2222
use generic-service
}
Here 2222 is passed to the command as $ARG1$.
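The way `check_ssh_port!2222` turns into a command line can be sketched as plain string substitution. Split on `!`, use the first part to pick the command definition, put the rest into `$ARG1$`, `$ARG2$`, ..., then fill in host macros such as `$HOSTADDRESS$`. This simplified illustration only knows the macros you pass in; Nagios itself supports many more macros and its own escaping rules. `expand_command` is a hypothetical helper.

```python
def expand_command(check_command: str, commands: dict[str, str],
                   macros: dict[str, str]) -> str:
    """Expand 'name!arg1!arg2' into the final command line."""
    name, *args = check_command.split("!")
    line = commands[name]
    # $ARG1$ is the first value after the command name, $ARG2$ the next, ...
    for i, arg in enumerate(args, start=1):
        line = line.replace(f"$ARG{i}$", arg)
    # Then fill in object macros such as $HOSTADDRESS$
    for key, value in macros.items():
        line = line.replace(f"${key}$", value)
    return line
```

Applied to the two ssh.cfg commands with `HOSTADDRESS` set to pc1.ws.nsrc.org, `check_ssh_port!2222` becomes `/usr/lib/nagios/plugins/check_ssh -p '2222' 'pc1.ws.nsrc.org'`.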
Notification commands
Allows you to utilize any command you wish.
We could use this to generate tickets in RT.
# 'notify-by-email' command definition
define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "Service: $SERVICEDESC$\nHost: $HOSTNAME$\nIn: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\nInfo: $SERVICEOUTPUT$\nDate: $SHORTDATETIME$" | /bin/mail -s '$NOTIFICATIONTYPE$: $HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$' $CONTACTEMAIL$
}

From: [email protected]
To: router_group@localdomain
Subject: Host DOWN alert for TLD1-RTR!
Date: Thu, 29 Jun 2006 15:13:30 -0700


Host: gw
In: Core_Routers
State: DOWN
Address: 192.0.2.100
Date/Time: 06-29-2006 15:13:30
Info: CRITICAL - Plugin timed out after 6 seconds
Group service configuration
# check that ssh services are running
define service {
hostgroup_name ssh-servers
service_description SSH
check_command check_ssh
use generic-service
notification_interval 0
}
The service_description is important if you plan to create Service
Groups. Here is a sample Service Group definition:
define servicegroup{
servicegroup_name Webmail
alias web-mta-storage-auth
members srvr1,HTTP,srvr1,SMTP,srvr1,POP, \
srvr1,IMAP,srvr1,RAID,srvr1,LDAP, \
srvr2,HTTP,srvr2,SMTP,srvr2,POP, \
srvr2,IMAP,srvr2,RAID,srvr2,LDAP
}
Screen Shots
A few sample screen shots from a Nagios
install.
General View
Host Detail
Host Groups Overview
Service Groups Overview
Collapsed tree status map
Marked-up circular status map
More sample screenshots
Many more sample
Nagios screenshots
available here:

http://www.nagios.org/about/screenshots
