Nagios Exercises
PART I
-----------------------------------------------------------------------------
1. Install Nagios version 3
Do this as root.
# apt-get install nagios3
2. Create the Web user password file:
# htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin
New password:
Re-type new password:
Please use the class password.
3. You should already have a working Nagios!
- Open a browser, and go to
http://localhost/nagios3/
- At the login prompt, log in as:
user: nagiosadmin
pass:
4. Let's look at the interface together...
# cd /etc/nagios3/
# ls -l
-rw-r--r-- 1 root root 1882 2008-12-18 13:42 apache2.conf
-rw-r--r-- 1 root root 10524 2008-12-18 13:44 cgi.cfg
-rw-r--r-- 1 root root 2429 2008-12-18 13:44 commands.cfg
drwxr-xr-x 2 root root 4096 2009-02-14 12:33 conf.d
-rw-r--r-- 1 root root 26 2009-02-14 12:36 htpasswd.users
-rw-r--r-- 1 root root 42539 2008-12-18 13:44 nagios.cfg
-rw-r----- 1 root nagios 1293 2008-12-18 13:42 resource.cfg
drwxr-xr-x 2 root root 4096 2009-02-14 12:32 stylesheets
# ls -l conf.d/
-rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
-rw-r--r-- 1 root root 418 2008-12-18 13:42 extinfo_nagios2.cfg
-rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
-rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
-rw-r--r-- 1 root root 210 2009-02-14 12:33 host-gateway_nagios3.cfg
-rw-r--r-- 1 root root 976 2008-12-18 13:42 hostgroups_nagios2.cfg
-rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
-rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
-rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg
Notice that most files in the conf.d directory still carry "_nagios2" in
their names: the package did not rename them, and they are the same files
used by the Nagios version 2 Ubuntu package. Only the host-gateway
configuration file was updated, so it has been renamed with "_nagios3".
PART II
-----------------------------------------------------------------------------
1. According to what we saw in class, let's add a new host
- Pick any PC in the room, i.e. something other than pc10! The example
below uses pc10 as a placeholder; substitute the name of the PC you picked.
# cd /etc/nagios3/conf.d/
# vi pc10.cfg
define host {
use generic-host
host_name pc10
alias PC 10 at APRICOT2009
address _______________ [pc10's IP address here]
}
... Save and quit
2. Let's create a new hostgroup for the occasion, and add our host
to it
- Edit the file hostgroups_nagios2.cfg and add a new group:
# vi hostgroups_nagios2.cfg
define hostgroup {
hostgroup_name servers
alias APRICOT PCs
members pc10
}
3. Now let's associate some services with that host
# vi services_nagios2.cfg
- Find the section called "check that ssh services are running",
and change the line:
hostgroup_name ssh-servers
to
hostgroup_name ssh-servers, servers
4. Verify that your configuration file is OK:
# nagios3 -v /etc/nagios3/nagios.cfg
... You should get:
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the check.
5. Reload/Restart Nagios
# /etc/init.d/nagios3 restart
NOTES:
- If you use version 2 of Nagios in Ubuntu and you have installed
the Ubuntu package (apt-get install nagios2), then there is a bug in
the Ubuntu init script (/etc/init.d/nagios2).
Instead of "restart", do the following each time you make changes:
# /etc/init.d/nagios2 stop
# /etc/init.d/nagios2 start
Otherwise you will end up with multiple Nagios instances running.
If that has already happened, you can resolve the problem with:
# ps auxwww | grep nagios
# killall nagios2
# /etc/init.d/nagios2 start
This bug appears to have been fixed in the Nagios version 3 package
in Ubuntu Server 8.10.
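Steps 4 and 5 can be chained so that Nagios is only restarted when the
pre-flight check passes. The following is a minimal sketch, not part of the
original exercise. The function name safe_restart and the variables
NAGIOS_BIN, NAGIOS_CFG and NAGIOS_INIT are made up for illustration, and the
sketch assumes that "nagios3 -v" exits with a non-zero status when it finds
errors.

```shell
# Sketch only: safe_restart, NAGIOS_BIN, NAGIOS_CFG and NAGIOS_INIT are
# illustrative names, not part of the Nagios package.
NAGIOS_BIN=${NAGIOS_BIN:-nagios3}
NAGIOS_CFG=${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios3}

safe_restart() {
    # Run the pre-flight check first; its report is printed as usual.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        # Only reached when the check exited with status 0.
        "$NAGIOS_INIT" restart
    else
        echo "Configuration check failed; Nagios was NOT restarted." >&2
        return 1
    fi
}
```

Run "safe_restart" as root after each configuration change. A failed check
leaves the running Nagios untouched.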
6. Go to the web interface (http://localhost/nagios3) and check the host
you just added
7. Add ALL the PCs in the room!
- Add all the PCs in the room to the config
- Check HTTP for all PCs in the room
- Remember to verify the configuration file!
- I suggest that you create a single config file called pcs.cfg
to do this.
NOTE:
- This requires a bit of planning, but you should have all the elements
for doing this...
- Think carefully about the logical structure of the files -- it should be
possible for you to do this without too much work!
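One possible way to avoid typing every host definition by hand is to generate
pcs.cfg from a list of names and addresses. This is a sketch under stated
assumptions, not the official solution: the helper name make_pcs_cfg is made
up, the input format ("name address" per line) is an assumption, and it
relies on the "servers" hostgroup from step 2 already existing so that each
host can join it with the "hostgroups" directive.

```shell
# Sketch only: make_pcs_cfg is an illustrative helper name.
# Reads "name address" pairs on standard input and prints one
# host definition per pair on standard output.
make_pcs_cfg() {
    while read -r name address; do
        # Skip empty lines in the input list.
        [ -n "$name" ] || continue
        cat <<EOF
define host {
    use         generic-host
    host_name   $name
    alias       $name at APRICOT2009
    address     $address
    hostgroups  servers
}

EOF
    done
}
```

For example, with a file pcs.txt that you write yourself (one "pcN address"
pair per line), you could run:
make_pcs_cfg < pcs.txt > /etc/nagios3/conf.d/pcs.cfg
Leave out any host that is already defined elsewhere (such as pc10.cfg), and
remove that host from the "members" line of the hostgroup if you list it
here, since duplicate definitions will fail the pre-flight check. Because all
PCs are then in the "servers" hostgroup, the HTTP check can be enabled the
same way as the ssh check in step 3.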
PART III
-----------------------------------------------------------------------------
1. Now let's create a complete Nagios configuration for our
classroom network.
NOTES:
- This requires more planning. You have switches, routers, and
the NOC (if you wish to add it). In addition, the IP addresses
that you use for your network's router, the classroom router,
and the other network's router depend on your position in the
network.
- Use the internal IP addresses of your network's router
and of the gateway router.
- Note that the switches are not running Telnet, they are
using ssh. So you should do either an ssh check on them or
a standard ping check (the Nagios default).
- It is important that you properly define the parents of your
devices. Some examples are given below. Devices can have
more than one parent, and in our classroom this is the case: the
two switches lan1-lan2-sw and lan3-lan4-sw each have two parents,
since each has a single administrative interface but is
connected to two routers.
3.) Create a file to define the configuration for your routers,
for example "/etc/nagios3/conf.d/routers.cfg". There should be
six entries in this file.
Sample entry:
define host {
use generic-host
host_name lan1-gw
alias router for 140.0/28 net (pc1-pc4)
address 169.223.140.14
parents mgmt-sw
}
4.) Create a file to define the configuration for your switches,
for example "/etc/nagios3/conf.d/switches.cfg". There should be
three entries in this file.
Sample entry:
define host {
use generic-host
host_name lan1-lan2-sw
alias Switch for lan1-gw and lan2-gw routers
address 169.223.140.210
parents lan1-gw, lan2-gw
}
5.) In the file "/etc/nagios3/conf.d/hostgroups_nagios2.cfg"
create hostgroups for all the routers, switches and
pcs in the classroom.
Sample entry:
# hostgroup definition for APRICOT 2009 Network Management Workshop
define hostgroup {
hostgroup_name routers
alias Cisco Routers at APRICOT 2009
members lan1-gw,lan2-gw,lan3-gw,lan4-gw,lan5-gw,mgmt-gw
}
6.) In the file "/etc/nagios3/conf.d/services_nagios2.cfg" you
define which service checks are run on which groups (not
individual devices).
Sample entry:
# check that ping-only hosts are up
define service {
hostgroup_name routers,switches,servers
service_description PING
check_command check_ping!100.0,20%!500.0,60%
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
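In a check_command line the "!" characters separate the command name from
its arguments: the first argument becomes $ARG1$ (the warning threshold) and
the second becomes $ARG2$ (the critical threshold), each given as round-trip
time in milliseconds and packet loss in percent. The sketch below shows how
such a line maps to a plugin call you can run by hand. It assumes the
packaged check_ping command calls /usr/lib/nagios/plugins/check_ping with
-H, -w and -c; the helper name expand_check_ping is made up for illustration.

```shell
# Sketch only: expand_check_ping is an illustrative helper name.
# Prints the plugin command line that corresponds to a check_command value.
expand_check_ping() {
    check_command=$1     # e.g. 'check_ping!100.0,20%!500.0,60%'
    host_address=$2      # what Nagios would substitute for $HOSTADDRESS$
    old_ifs=$IFS
    IFS='!'
    # Split on "!": $1 = command name, $2 = $ARG1$, $3 = $ARG2$
    set -- $check_command
    IFS=$old_ifs
    printf '%s -H %s -w %s -c %s\n' \
        "/usr/lib/nagios/plugins/$1" "$host_address" "$2" "$3"
}

# Usage (single quotes keep the shell from interpreting "!"):
#   expand_check_ping 'check_ping!100.0,20%!500.0,60%' 169.223.140.14
# prints:
#   /usr/lib/nagios/plugins/check_ping -H 169.223.140.14 -w 100.0,20% -c 500.0,60%
```

Running the printed command by hand is a quick way to see what a check would
report before adding it to the configuration.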
7.) The file "/etc/nagios3/conf.d/extinfo_nagios2.cfg" defines
extra display details for each device. For instance, here are
some sample entries you could use to build prettier Nagios
results for your various devices:
================ extinfo_nagios2.cfg ===================
define hostextinfo {
hostgroup_name servers
icon_image base/ubuntu.png
icon_image_alt Debian GNU/Linux
vrml_image ubuntu.png
statusmap_image base/ubuntu.gd2
notes_url http://noc.mgmt.conference.apricot.net/trac/netmanage/wiki/servers
}
define hostextinfo {
hostgroup_name routers
icon_image cook/router.png
icon_image_alt Cisco Routers (2811)
vrml_image router.png
statusmap_image cook/router.gd2
notes_url http://noc.mgmt.conference.apricot.net/trac/netmanage/wiki/routers
}
define hostextinfo {
hostgroup_name switches
icon_image cook/switch.png
icon_image_alt Cisco Switches
vrml_image switch.png
statusmap_image cook/switch.gd2
}
================ extinfo_nagios2.cfg ===================
NOTES:
- You don't have the "ubuntu.*" icons by default. If
you get an error about this when restarting Nagios,
then change "ubuntu.*" to be "linux.*".
- We have additional images available for you to use.
You can download these from the Nagios Plugins and
Add-Ons Exchange site at:
http://www.nagiosexchange.org/
- To get the Ubuntu icons for nagios you can do the following:
# cd /tmp
# wget http://noc.mgmt.conference.apricot.net/software/imagepack-ubuntu.tar
# tar xvf imagepack-ubuntu.tar
# cd logos
# sudo mv * /usr/share/nagios/htdocs/images/logos/base/.
Now you will have the Ubuntu logos available to use in Nagios.
8. If you have gotten here and are still reading you can download
an entire set of Nagios configuration files for this network
that will only need a few changes for your machine. These are
available here:
http://noc/configs/etc/nagios3/
You can copy the files using wget or scp. For instance:
$ su -
# cd /etc/nagios3
# scp -r inst@noc:/var/www/share/conf/etc/nagios3/* .
would overwrite whatever you have in your /etc/nagios3
directory and sub-directories with these preconfigured files.
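Since the copy overwrites your own work, you may want to save it first. The
following is a small sketch, not part of the original exercise; the helper
name backup_nagios_cfg and the default destination /root are assumptions.

```shell
# Sketch only: backup_nagios_cfg is an illustrative helper name.
# Archives a configuration directory and prints the archive's path.
backup_nagios_cfg() {
    src=${1:-/etc/nagios3}     # directory to save
    dest=${2:-/root}           # where the archive is written
    archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C changes to the parent directory so the archive holds relative paths.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" &&
        echo "$archive"
}
```

Run "backup_nagios_cfg" as root before the scp. To look inside the archive
later, use "tar -tzf" followed by the path it printed.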
9.) You still need to update a few files, including:
/etc/nagios3/conf.d/routers.cfg
/etc/nagios3/conf.d/pcs.cfg
You should make sure that you have the correct IP
addresses defined in routers.cfg for your network view,
and you will want to comment out your own PC's entry in
the file pcs.cfg.
You may have to make additional changes and troubleshoot
them using the "Nagios pre-flight check":
# nagios3 -v /etc/nagios3/nagios.cfg
Remember to restart Nagios for changes to take effect.
PART IV
-----------------------------------------------------------------------------
1.) Here we will tie Nagios and Trac together to help
document your network. The concept is quite simple.
First, go to your local Trac project install
page at:
http://localhost/trac/netmanage
Log in as the admin user so that you can edit the Trac
wiki.
2.) Create an entry for your PC in the wiki. You can do this by
clicking on the "Edit this page" button and entering in a
link like this (example for PC1, use your PC number instead):
[wiki:PC1 PC1] : '''169.223.140.1'''
Save the page.
Alternatively, have a look at the main classroom wiki to see
what has been done:
http://noc.mgmt.conference.apricot.net/
3.) Click on the PC1 item that's grey with a question mark. Now
create this page. Enter in some text about your PC and save
the page.
4.) In Nagios you need to edit the file:
/etc/nagios3/conf.d/extinfo_nagios2.cfg
and update your PC's entry in this file with a line like this:
notes_url http://localhost/trac/netmanage/wiki/PC1
You can place this on a line after the "host_name" entry.
Remember to change "PC1" to your PC's number.
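If your extinfo_nagios2.cfg has no entry for your PC yet, a complete entry
could look like the output of the sketch below. This is an assumption about
the layout, not a copy of the classroom file, and the helper name
make_pc_extinfo is made up for illustration.

```shell
# Sketch only: make_pc_extinfo is an illustrative helper name.
# Prints a hostextinfo entry that links one host to its Trac wiki page.
make_pc_extinfo() {
    host=$1     # Nagios host_name, e.g. pc1
    page=$2     # Trac wiki page name, e.g. PC1
    cat <<EOF
define hostextinfo {
    host_name   $host
    notes_url   http://localhost/trac/netmanage/wiki/$page
}
EOF
}
```

For example, "make_pc_extinfo pc1 PC1" prints an entry for pc1. If your PC
already has an entry in the file, add the notes_url line to that entry
instead of appending a second one.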
5.) Restart Nagios.
6.) If you look in your Nagios Service Detail view there should now be
a new icon next to your machine's entry. This looks like a folder.
Click on this and the URL you entered for the notes_url entry in
the extinfo_nagios2.cfg file will open. You can also click on
the machine's icon in the graph views, then click again, and this
page will open.
PART V (OPTIONAL)
-----------------------------------------------------------------------------
1.) Now we will create a plug-in for Nagios. This plug-in will do the
following:
* Ping a set of (external) servers.
* If one server is down a warning will be generated.
* If two servers are down a critical state will be generated.
This will be part of our scripting session. The instructions for
doing this are here:
http://ws.edu.isoc.org/workshops/2008/ait-net-manage/presos/scripting/bash.html
These were written for Nagios version 2, but are fine for version 3. Just
replace occurrences of "/etc/nagios2" with "/etc/nagios3".
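To give an idea of the logic before the scripting session, here is a minimal
sketch of such a plug-in. It is not the script from the instructions linked
above. The function names probe_host and check_servers are made up, and the
ping options (-c 1 -W 2) assume the Linux ping command. Nagios reads the
plug-in's exit status: 0 means OK, 1 means WARNING and 2 means CRITICAL.

```shell
# Sketch only: probe_host and check_servers are illustrative names.

# Return 0 if the host answers a single ping within 2 seconds.
probe_host() {
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# Ping every host given as an argument and report in Nagios plug-in style.
check_servers() {
    down=0
    down_list=""
    for host in "$@"; do
        if ! probe_host "$host"; then
            down=$((down + 1))
            down_list="$down_list $host"
        fi
    done
    if [ "$down" -ge 2 ]; then
        echo "CRITICAL: $down servers down:$down_list"
        return 2
    elif [ "$down" -eq 1 ]; then
        echo "WARNING: 1 server down:$down_list"
        return 1
    fi
    echo "OK: all $# servers reachable"
    return 0
}

# In a real plug-in file the last lines would be something like:
#   check_servers "$@"
#   exit $?
```

Keeping the probe in its own function makes it easy to swap ping for another
test later without touching the warning and critical logic.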
PART VI
-----------------------------------------------------------------------------
1.) We will update our Nagios contacts definition,
"/etc/nagios3/conf.d/contacts_nagios3.cfg", to add a local user
that will receive alerts for certain conditions.
2.) Next we will add another user for our RT ticketing system so
that a ticket is automatically generated for specific events.
3.) Edit the file "/etc/nagios3/conf.d/contacts_nagios3.cfg":
# vi /etc/nagios3/conf.d/contacts_nagios3.cfg
In a web browser, open the sample contacts_nagios3.cfg file
and adapt it to work with what you have. Basically, just
replace yours with this one.
Go to:
http://noc.mgmt.conference.apricot.net/configs/etc/nagios3/conf.d/ \
contacts_nagios3.cfg
4.) Once the file is updated, you may notice the two lines that read:
service_notification_commands notify-service-ticket-by-email
host_notification_commands notify-host-ticket-by-email
The "notify-service-ticket-by-email" and "notify-host-ticket-by-email"
commands are new. You need to create these in the file
/etc/nagios3/commands.cfg.
This is not strictly necessary. For purposes of this exercise you can
replace these two commands with:
service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email
and skip part "4a" if you wish.
4a) These two commands are set aside so that, if you wish, you can adjust the
formatting of the email that Nagios sends to be more friendly to
the RT ticketing system. This is up to you. To create these two commands
we simply copy the original commands and rename them in
/etc/nagios3/commands.cfg.
The easiest way to see this is to open a web browser and go to:
http://noc.mgmt.conference.apricot.net/configs/etc/nagios3/commands.cfg
and then copy and paste the new items into the commands.cfg file
on your machine. Note that you could change the names of these if you wish,
as long as you match the new names to what is in the
/etc/nagios3/conf.d/contacts_nagios3.cfg file.
5.) Once you have updated your contacts_nagios3.cfg file, then run the
Nagios pre-flight check:
# nagios3 -v /etc/nagios3/nagios.cfg
If it all looks good, then restart Nagios:
# /etc/init.d/nagios3 restart
Or, less intrusively:
# /etc/init.d/nagios3 reload
6.) Now we need to create a proper alias in our /etc/aliases file using
the rt-mailgate program to pipe email from Nagios to RT and to the
correct queue.
Edit the file /etc/aliases:
# vi /etc/aliases
And add the following lines to the bottom of the file:
alerts: "|/usr/bin/rt-mailgate --queue 'Network Management' --action correspond --url http://localhost/rt"
alerts-comment: "|/usr/bin/rt-mailgate --queue 'Network Management' --action comment --url http://localhost/rt"
Also verify that the file contains a line that reads:
root: netmanage
This tells the mail system to deliver all mail sent to root@localhost to the
netmanage account instead.
Save the file and quit. In reality we'll only be using the "alerts" alias
at this time.
After you've saved and exited from the /etc/aliases file run:
# newaliases
which lets the Postfix MTA know about changes to /etc/aliases. If you run
into any problems with errors about rt-mailgate, verify that it
is installed by doing:
# apt-get install rt3.6-clients
This should have been done when you first installed RT.
7.) Now go to the RT instance installed on your machine:
http://localhost/rt
Log in as "root".
Click on the "Configuration" link, then "Queues", then "New queue". Be sure that you
fill in the "Queue Name" field with "Network Management" - including the
upper-case 'N' and 'M'.
You only need to fill in Queue Name and Description. Click the "Save Changes"
button on the lower right of the screen.
Now click the "User Rights" link. You'll see that the 'root' user has no
rights on this queue. Give your 'root' user enough rights on this queue to
at least see tickets in the queue and see the queue itself. If you want you
can be lazy and highlight all the rights and assign 'root' everything. You have
to press "Modify User Rights" to do this.
At this point log out of RT and log back in. You should see the Network
Management queue listed on the right of the page.
Now you need to generate a Nagios alert so that a ticket is created in RT. As
you may have noticed in /etc/nagios3/conf.d/contacts_nagios3.cfg, the Nagios "alerts"
contact only receives notifications if a service is in the "c" or "critical" state,
or if a host is "d" or "down". In addition, in the file
/etc/nagios3/conf.d/generic-service_nagios2.cfg there is a line that reads:
notification_interval 0
This ensures that Nagios will only send one (1) email per critical or down
state. If this is set to something else, then you will generate multiple
tickets, which is not good.
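For reference, a contact that is notified only for critical services and
down hosts could look like the output of the sketch below. This is NOT the
workshop's actual contacts_nagios3.cfg. The helper name make_alerts_contact
and the alias text are made up, and the sketch assumes the "24x7" time period
from timeperiods_nagios2.cfg and the two ticket commands from step 4.

```shell
# Sketch only: make_alerts_contact is an illustrative helper, and the
# contact it prints is NOT the workshop's actual contacts_nagios3.cfg.
make_alerts_contact() {
    email=${1:-alerts@localhost}    # address handled by the "alerts" alias
    cat <<EOF
define contact {
    contact_name                    alerts
    alias                           RT ticket gateway
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    c
    host_notification_options       d
    service_notification_commands   notify-service-ticket-by-email
    host_notification_commands      notify-host-ticket-by-email
    email                           $email
}
EOF
}
```

The options "c" and "d" are what limit notifications to critical services
and down hosts. A contact also has to be a member of a contact group that
your hosts and services use before it receives anything.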
Now try to trigger an alert in Nagios, which should in turn create a ticket in RT.
You could check for a service on your neighbor's PC that
does not exist. You could pull the network cable on your neighbor's PC so that
it appears to be down. Otherwise, your instructor will come up with something
as well.
Last update 22 February 2009 by HA