Nagios Exercises




PART I
-----------------------------------------------------------------------------

1. Install Nagios version 3
   
    Do this as root.

    # apt-get install nagios3

2. Create the Web user password file:

    # htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin

    New password:
    Re-type new password:

   Please use the class password.


3. You should already have a working Nagios!

    - Open a browser, and go to

    http://localhost/nagios3/

    - At the login prompt, log in as:

        user: nagiosadmin
        pass: (the class password you set in step 2)

4. Let's look at the interface together...

    # cd /etc/nagios3/

    # ls -l 
    -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
    -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
    -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
    -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
    -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
    -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets
    
    # ls -l conf.d/

    -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
    -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
    -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
    -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
    -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
    -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
    -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
    -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
    -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg

    Notice that the package has not renamed the files in the conf.d
    directory: they are the same files used by the Nagios version 2
    Ubuntu package, hence the "_nagios2" suffix. Only the host-gateway
    configuration file was updated, so it alone has been renamed.

PART II
-----------------------------------------------------------------------------

1. According to what we saw in class, let's add a new host

    - Pick any PC in the room other than pc10, and substitute its name
      and IP address for pc10 in the example below.

    # cd /etc/nagios3/conf.d/

    # vi pc10.cfg

define host {
    use         generic-host
    host_name   pc10
    alias       PC 10 at APRICOT2009 
    address     _______________       [pc10's IP address here]
}

    ... Save and quit

2. Let's create a new hostgroup for the occasion, and add our host
   to it

    - Edit the file hostgroups_nagios2.cfg and add a new group:

    # vi hostgroups_nagios2.cfg

define hostgroup {
    hostgroup_name  servers
    alias           APRICOT PCs
    members         pc10
}

3. Now let's associate some services with that host

    # vi services_nagios2.cfg

    - Find the section called "check that ssh services are running",
      and change the line:

hostgroup_name                  ssh-servers

    to

hostgroup_name                  ssh-servers, servers



4. Verify that your configuration file is OK:

    # nagios3 -v /etc/nagios3/nagios.cfg 

    ... You should get:

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the check.


5. Reload/Restart Nagios

    # /etc/init.d/nagios3 restart

   NOTES:

   - If you use Nagios version 2 and installed the Ubuntu package
     (apt-get install nagios2), there is a bug in the Ubuntu init
     script (/etc/init.d/nagios2). Each time you make changes, do the
     following instead of a restart:

    # /etc/init.d/nagios2 stop
    # /etc/init.d/nagios2 start

     Otherwise you will end up with multiple Nagios instances running.
     If that happens, you can resolve it with:

    # ps auxwww | grep nagios
    # killall nagios2
    # /etc/init.d/nagios2 start

     This bug appears to have been fixed in the Nagios version 3 package
     in Ubuntu Server 8.10.
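    Steps 4 and 5 can be combined so that Nagios is only restarted when
    the pre-flight check is clean. The following is a minimal bash sketch,
    assuming the Ubuntu nagios3 paths used above. The NAGIOS_BIN,
    NAGIOS_CFG and NAGIOS_INIT variables and the two function names are our
    own (hypothetical); a non-zero exit status from "nagios3 -v" or a
    non-zero "Total Errors:" count (as shown in step 4) counts as a failure.

```shell
#!/bin/bash
# Sketch: run the Nagios pre-flight check and restart only if it is clean.
# Defaults are the Ubuntu nagios3 paths used in this exercise; override the
# (hypothetical) variables below if yours differ.
NAGIOS_BIN=${NAGIOS_BIN:-nagios3}
NAGIOS_CFG=${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios3}

# Succeed only if "nagios3 -v" exits with status 0 AND reports zero errors;
# otherwise show its output on stderr so you can see what went wrong.
preflight_ok() {
    local out status
    out=$("$NAGIOS_BIN" -v "$NAGIOS_CFG" 2>&1)
    status=$?
    if [ "$status" -eq 0 ] && printf '%s\n' "$out" | grep -q '^Total Errors: *0 *$'; then
        return 0
    fi
    printf '%s\n' "$out" >&2
    return 1
}

# Restart Nagios through its init script, but only after a clean check.
safe_restart() {
    if preflight_ok; then
        "$NAGIOS_INIT" restart
    else
        echo "Pre-flight check failed; Nagios was NOT restarted." >&2
        return 1
    fi
}
```

    Save it in a file of your choice, load it with ". <file>" in your root
    shell, and run "safe_restart" after each configuration change.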


6. Go to the web interface (http://localhost/nagios3) and check the host
   you just added


7. Add ALL the PCs in the room!

    - Add all the PCs in the room to the config

    - Check HTTP for all PCs in the room

    - Remember to verify the configuration file!

    - I suggest that you create a single config file called pcs.cfg
      to do this.

    NOTE:

    - This requires a bit of planning, but you should have all the elements
      for doing this...

    - Think carefully about the logical structure of the files -- it should be
      possible for you to do this without doing too much work!
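    Since every PC entry has the same shape, a short shell loop can write
    pcs.cfg for you. This is only a sketch: it assumes, hypothetically,
    that the PCs are named pc1 to pcN and that pcN uses the address
    169.223.140.N (as PC1 uses 169.223.140.1 in Part IV). Check the real
    addresses in your room. The function names are our own.

```shell
#!/bin/bash
# Sketch: generate host definitions for the classroom PCs.
# ASSUMPTION (hypothetical, adjust to your room): PCs are named pc1..pcN
# and pcN has the address 169.223.140.N.

# Print one "define host" block per PC, from pc<first> to pc<last>.
gen_pc_hosts() {
    local first=$1 last=$2 n
    for ((n = first; n <= last; n++)); do
        cat <<EOF
define host {
    use         generic-host
    host_name   pc$n
    alias       PC $n at APRICOT2009
    address     169.223.140.$n
}

EOF
    done
}

# Print the same PCs as a comma-separated list for a hostgroup "members" line.
gen_pc_members() {
    local first=$1 last=$2 n list=""
    for ((n = first; n <= last; n++)); do
        list+="${list:+,}pc$n"
    done
    printf '%s\n' "$list"
}
```

    For example, for pc1 to pc20 (adjust the range):

    # gen_pc_hosts 1 20 > /etc/nagios3/conf.d/pcs.cfg
    # gen_pc_members 1 20

    Put the printed list in the "members" line of your servers hostgroup,
    add "servers" to the hostgroup_name line of the HTTP check in
    services_nagios2.cfg (the same way you did for ssh in step 3), and
    remove pc10.cfg so that pc10 is not defined twice.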



PART III
-----------------------------------------------------------------------------

1. Now let's create a complete Nagios configuration for our
   classroom network.

   NOTES:

   - This requires more planning. You have switches, routers, and
     the NOC (if you wish to add it). In addition, the IP addresses
     that you use for your network's router, the classroom router,
     and the other networks' routers depend on your position in the
     network.

   - Use the internal IP addresses for your network's router and
     for the gateway router.

   - Note that the switches are not running Telnet; they are
     using ssh. So you should do either an ssh check on them or
     a standard ping check (the Nagios default).

   - It is important that you properly define the parent for
     devices. Some examples are given below. Devices can have
     more than one parent, and in our classroom this is true. The
     two switches lan1-lan2-sw and lan3-lan4-sw have two parents
     since they have a single administrative interface, but they
     are connected to two routers each.

3.) Create a file to define the configuration for your routers.
    Maybe "/etc/nagios3/conf.d/routers.cfg". There should be
    six entries in this file.

    Sample entry:

define host {
    use        generic-host
    host_name  lan1-gw
    alias      router for 140.0/28 net (pc1-pc4)
    address    169.223.140.14
    parents    mgmt-sw
}


4.) Create a file to define the configuration for your switches.
    Maybe "/etc/nagios3/conf.d/switches.cfg". There should be
    three entries in this file.

    Sample entry:

define host {
    use        generic-host
    host_name  lan1-lan2-sw
    alias      Switch for lan1-gw and lan2-gw routers
    address    169.223.140.210
    parents    lan1-gw, lan2-gw
}
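    Router and switch entries differ only in their name, alias, address
    and parents, so the same trick works for routers.cfg and switches.cfg.
    The sketch below uses a hypothetical helper, host_def, that prints one
    entry and only adds a parents line when you give one; several parents
    are passed as a single comma-separated argument, as in the switch
    sample above.

```shell
#!/bin/bash
# Sketch: print one Nagios host definition.
# Arguments: name, alias, address, and an optional parents list
# (comma-separated, e.g. "lan1-gw, lan2-gw").
host_def() {
    local name=$1 alias=$2 address=$3 parents=$4
    printf 'define host {\n'
    printf '    use        generic-host\n'
    printf '    host_name  %s\n' "$name"
    printf '    alias      %s\n' "$alias"
    printf '    address    %s\n' "$address"
    # Only emit a parents line when one was given.
    if [ -n "$parents" ]; then
        printf '    parents    %s\n' "$parents"
    fi
    printf '}\n\n'
}
```

    For example, run from /etc/nagios3/conf.d, using the two samples above:

    # host_def lan1-gw "router for 140.0/28 net (pc1-pc4)" 169.223.140.14 mgmt-sw >> routers.cfg
    # host_def lan1-lan2-sw "Switch for lan1-gw and lan2-gw routers" 169.223.140.210 "lan1-gw, lan2-gw" >> switches.cfg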


5.) In the file "/etc/nagios3/conf.d/hostgroups_nagios2.cfg"
    create hostgroups for all the routers, switches and
    PCs in the classroom.

    Sample entry:

# hostgroup definition for APRICOT 2009 Network Management Workshop
define hostgroup {
        hostgroup_name routers
        alias          Cisco Routers at APRICOT 2009
        members        lan1-gw,lan2-gw,lan3-gw,lan4-gw,lan5-gw,mgmt-gw
}


6.) In the file "/etc/nagios3/conf.d/services_nagios2.cfg" you
    define what groups (not individual devices) will have what
    service checks run on them.

    Sample entry:


# check that ping-only hosts are up
define service {
        hostgroup_name                  routers,switches,servers
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}


7.) The file "/etc/nagios3/conf.d/extinfo_nagios2.cfg" defines
    details for each device defined. For instance, here are
    some sample entries you could use to build prettier Nagios
    results for your various devices:

================ extinfo_nagios2.cfg ===================
define hostextinfo {
        hostgroup_name   servers
        icon_image       base/ubuntu.png
        icon_image_alt   Debian GNU/Linux
        vrml_image       ubuntu.png
        statusmap_image  base/ubuntu.gd2
        notes_url        http://noc.mgmt.conference.apricot.net/trac/netmanage/wiki/servers
}

define hostextinfo {
        hostgroup_name   routers
        icon_image       cook/router.png
        icon_image_alt   Cisco Routers (2811)
        vrml_image       router.png
        statusmap_image  cook/router.gd2
        notes_url        http://noc.mgmt.conference.apricot.net/trac/netmanage/wiki/routers
}

define hostextinfo {
        hostgroup_name   switches
        icon_image       cook/switch.png
        icon_image_alt   Cisco Switches
        vrml_image       switch.png
        statusmap_image  cook/switch.gd2
}
================ extinfo_nagios2.cfg ===================

    NOTES:

    - You don't have the "ubuntu.*" icons by default. If 
      you get an error about this when restarting Nagios,
      then change "ubuntu.*" to be "linux.*".
    - We have additional images available for you to use.
      You can download these from the Nagios Plugins and
      Add Ons Exchange site at:

      http://www.nagiosexchange.org/

    - To get the Ubuntu icons for Nagios you can do the following:

    # cd /tmp
    # wget http://noc.mgmt.conference.apricot.net/software/imagepack-ubuntu.tar
    # tar xvf imagepack-ubuntu.tar
    # cd logos
    # sudo mv * /usr/share/nagios/htdocs/images/logos/base/.


    Now you will have the Ubuntu logos available to use in Nagios.


8. If you have gotten here and are still reading, you can download
   an entire set of Nagios configuration files for this network
   that will only need a few changes for your machine. These are
   available here:

     http://noc/configs/etc/nagios3/

   You can copy the files using wget or scp. For instance:

     $ su -
     # cd /etc/nagios3
     # scp -r inst@noc:/var/www/share/conf/etc/nagios3/* .

   would overwrite whatever you have in your /etc/nagios3
   directory and sub-directories with these preconfigured files.
   Note that the cd comes after "su -": "su -" starts a login shell
   in root's home directory, so a cd done beforehand would be lost.
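    Because the scp command above overwrites your existing files, you may
    want to keep a copy first so that you can compare or roll back. A
    minimal sketch; the function name and the ".bak-<timestamp>" naming
    are our own choices:

```shell
#!/bin/bash
# Sketch: copy a configuration directory to <dir>.bak-<timestamp>,
# preserving permissions and ownership, and print the name of the copy.
# Defaults to /etc/nagios3.
backup_config() {
    local src=${1:-/etc/nagios3} dest
    dest="${src%/}.bak-$(date +%Y%m%d-%H%M%S)"
    cp -a "$src" "$dest" && printf '%s\n' "$dest"
}
```

    Run "backup_config" as root before the scp. The copy sits next to
    /etc/nagios3 rather than inside it, so Nagios will not try to read
    the old .cfg files.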

9.) You still need to update a few files, including:

     /etc/nagios3/conf.d/routers.cfg
     /etc/nagios3/conf.d/pcs.cfg

    You should make sure that you have the correct IP
    addresses defined in routers.cfg for your network view,
    and you will want to comment out the entry for your own
    PC in pcs.cfg.

    You may have to make additional changes and troubleshoot
    them using the "Nagios pre-flight check":

    # nagios3 -v /etc/nagios3/nagios.cfg

    Remember to restart Nagios for changes to take effect.
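    If you prefer not to edit pcs.cfg by hand, awk can comment out one
    host's block. A sketch, assuming every "define host" block ends with a
    line containing only "}" (true for the samples in this exercise);
    comment_out_host is a hypothetical helper that prints the result, so
    you can review it before replacing the file.

```shell
#!/bin/bash
# Sketch: print a Nagios config file with the "define host" block for one
# host commented out (each of its lines prefixed with "#"). Assumes every
# block ends with a line containing only "}".
comment_out_host() {
    local file=$1 host=$2
    awk -v host="$host" '
        # print the buffered block lines, optionally prefixed, then reset
        function emit(prefix,    i) {
            for (i = 1; i <= n; i++) print prefix lines[i]
            n = 0
        }
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { inblock = 1; hit = 0 }
        inblock {
            lines[++n] = $0
            if ($1 == "host_name" && $2 == host) hit = 1
            if ($0 ~ /^[ \t]*[}][ \t]*$/) {
                emit(hit ? "#" : "")
                inblock = 0
            }
            next
        }
        { print }
        END { emit("") }
    ' "$file"
}
```

    For example, if your own machine is pc10:

    # cd /etc/nagios3/conf.d
    # comment_out_host pcs.cfg pc10 > pcs.cfg.new && mv pcs.cfg.new pcs.cfg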



PART IV
-----------------------------------------------------------------------------

1.) Here we will tie in the ability of Nagios and Trac to work
    together to help document your network. The concept is
    quite simple. First, go to your local Trac project install
    page at:

    http://localhost/trac/netmanage

    Log in as the admin user so that you can edit the Trac
    wiki.

2.) Create an entry for your PC in the wiki. You can do this by
    clicking on the "Edit this page" button and entering in a
    link like this (example for PC1, use your PC number instead):

    [wiki:PC1 PC1] : '''169.223.140.1'''

    Save the page.

    Alternatively, have a look at the main classroom wiki to see
    what has been done:

    http://noc.mgmt.conference.apricot.net/


3.) Click on the PC1 item that's grey with a question mark. Now
    create this page. Enter in some text about your PC and save
    the page.

4.) In Nagios you need to edit the file:

    /etc/nagios3/conf.d/extinfo_nagios2.cfg

    and update your PC's entry in this file with a line like this:

    notes_url       http://localhost/trac/netmanage/wiki/PC1

    You can place this on a line after the "host_name" entry.
    Remember to change "PC1" to your PC's number.

5.) Restart Nagios. 

6.) If you look in your Nagios Service Detail view there should now be
    a new icon next to your machine's entry. This looks like a folder.
    Click on this and the URL you entered for the notes_url entry in
    the extinfo_nagios2.cfg file will open. You can also click on
    the machine's icon in the graph views, then click again and this
    page will open.



PART V (OPTIONAL)
-----------------------------------------------------------------------------

1.) Now we will create a plug-in for Nagios. This plug-in will do the
    following:

    * Ping a set of (external) servers.
    * If one server is down a warning will be generated.
    * If two servers are down a critical state will be generated.

    This will be part of our scripting session. The instructions for
    doing this are here:

    http://ws.edu.isoc.org/workshops/2008/ait-net-manage/presos/scripting/bash.html

    These were written for Nagios version 2, but are fine for version 3. Just
    replace occurrences of "/etc/nagios2" with "/etc/nagios3".
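    As a preview, here is a minimal bash sketch of such a plug-in; it is
    not the version from the workshop instructions. It relies only on the
    standard Nagios plug-in convention of printing one status line and
    exiting with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The
    servers are passed as arguments; the function names and the single
    ping with a 2-second timeout are our own (hypothetical) choices.

```shell
#!/bin/bash
# Sketch of a "ping several servers" Nagios plug-in.
# Standard Nagios plug-in exit codes:
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# Map the number of unreachable servers to a Nagios state (as exit status).
state_for_down() {
    if [ "$1" -ge 2 ]; then
        return $STATE_CRITICAL
    elif [ "$1" -eq 1 ]; then
        return $STATE_WARNING
    fi
    return $STATE_OK
}

# Ping each server given as an argument once, with a 2-second timeout,
# print a one-line status and return the matching exit code.
check_servers() {
    local host down=0 failed="" state=0
    if [ $# -eq 0 ]; then
        echo "UNKNOWN - no servers given"
        return $STATE_UNKNOWN
    fi
    for host in "$@"; do
        if ! ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            down=$((down + 1))
            failed+=" $host"
        fi
    done
    state_for_down "$down" || state=$?
    case $state in
        0) echo "OK - all $# servers answered ping" ;;
        1) echo "WARNING - 1 server down:$failed" ;;
        *) echo "CRITICAL - $down servers down:$failed" ;;
    esac
    return $state
}
```

    To use it as a plug-in, add the line

    check_servers "$@"; exit $?

    at the end of the file, make the file executable, and define a command
    for it in commands.cfg as described in the scripting instructions.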
    


PART VI
-----------------------------------------------------------------------------

1.) We will update our Nagios contacts definition,
    "/etc/nagios3/conf.d/contacts_nagios3.cfg", to add a local user
    that will receive alerts for certain conditions.

2.) Next we will add another user for our RT ticketing system so
    that a ticket is automatically generated for specific events.


3.) Edit the file "/etc/nagios3/conf.d/contacts_nagios3.cfg":

    # vi /etc/nagios3/conf.d/contacts_nagios3.cfg

    In a web browser open up the sample contacts_nagios3.cfg file
    and adapt it to work with what you have. Basically, just
    replace yours with this one.

    Go to:

    http://noc.mgmt.conference.apricot.net/configs/etc/nagios3/conf.d/contacts_nagios3.cfg

4.) Once the file is updated, you may notice the two lines that read:

        service_notification_commands   notify-service-ticket-by-email
        host_notification_commands      notify-host-ticket-by-email

    The "notify-service-ticket-by-email" and "notify-host-ticket-by-email"
    commands are new. You need to create these in the file
    /etc/nagios3/commands.cfg.

    This is not strictly necessary. For the purposes of this exercise you can
    replace these two commands with:

        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email

    and skip step 4a if you wish.

4a) These two commands are set aside so that, if you wish, you can adjust the
    formatting of the email that Nagios sends to be friendlier to the RT
    ticketing system. This is up to you. To create these two commands
    we simply copy the original commands and rename them in
    /etc/nagios3/commands.cfg.

    The easiest way to see this is to open a web browser and go to:

    http://noc.mgmt.conference.apricot.net/configs/etc/nagios3/commands.cfg

    and then you can copy and paste the new items into your commands.cfg file
    on your machine. Note that you could change the names of these if you wish,
    as long as you match the new names to what is in the
    /etc/nagios3/conf.d/contacts_nagios3.cfg file.
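    If you would rather not copy and paste, the two new commands can be
    produced from the existing ones. A sketch, assuming the Ubuntu
    commands.cfg layout where each "define command" block ends with a line
    containing only "}"; clone_command is a hypothetical helper that
    prints the renamed copy, which you then append to the file.

```shell
#!/bin/bash
# Sketch: print a copy of one "define command" block from a Nagios config
# file, with its command_name changed. Assumes each block ends with a line
# that contains only "}".
clone_command() {
    local file=$1 oldname=$2 newname=$3
    awk -v oldname="$oldname" -v newname="$newname" '
        /^[ \t]*define[ \t]+command/ { inblock = 1; buf = ""; found = 0 }
        inblock {
            line = $0
            if ($1 == "command_name" && $2 == oldname) {
                sub(oldname, newname, line)   # rename only this line
                found = 1
            }
            buf = buf line "\n"
            if ($0 ~ /^[ \t]*[}][ \t]*$/) {   # end of the block
                if (found) printf "%s", buf
                inblock = 0
            }
        }
    ' "$file"
}
```

    For example:

    # cd /etc/nagios3
    # clone_command commands.cfg notify-service-by-email notify-service-ticket-by-email > /tmp/ticket-cmds.cfg
    # clone_command commands.cfg notify-host-by-email notify-host-ticket-by-email >> /tmp/ticket-cmds.cfg
    # cat /tmp/ticket-cmds.cfg >> commands.cfg

    Writing to a temporary file first avoids reading from and appending to
    commands.cfg at the same time.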

5.) Once you have updated your contacts_nagios3.cfg file, then run the 
    Nagios pre-flight check:

    # nagios3 -v /etc/nagios3/nagios.cfg

    If it all looks good, then restart Nagios:

    # /etc/init.d/nagios3 restart

    A less intrusive alternative is:

    # /etc/init.d/nagios3 reload

6.) Now we need to create a proper alias in our /etc/aliases file using
    the rt-mailgate program to pipe email from Nagios to RT and to the 
    correct queue. 

    Edit the file /etc/aliases:

    # vi /etc/aliases

    And add the following lines to the bottom of the file:

    alerts:         "|/usr/bin/rt-mailgate --queue 'Network Management' --action correspond --url http://localhost/rt"
    alerts-comment: "|/usr/bin/rt-mailgate --queue 'Network Management' --action comment --url http://localhost/rt"

    While you are in the file, verify that there is also a line that reads:

    root: netmanage

    This tells the mail system to deliver all mail sent to root@localhost to the
    netmanage account instead.    

    Save the file and quit. In reality we'll only be using the "alerts" alias 
    at this time.

    After you've saved and exited from the /etc/aliases file run:

    # newaliases

    which lets the Postfix MTA know about changes to /etc/aliases. If you run
    into any problems with errors about rt-mailgate, verify that it
    is installed by doing:

    # apt-get install rt3.6-clients

    This should have been done when you first installed RT.
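    Before testing with RT, it is worth checking that the aliases are
    really there. A small sketch (has_alias is our own hypothetical
    helper) that looks for each entry at the start of a line:

```shell
#!/bin/bash
# Sketch: succeed if an aliases file has an entry "<name>:" at the start
# of a line. "alerts" does not match "alerts-comment:" because the name
# must be followed directly by ":".
has_alias() {
    local file=$1 name=$2
    grep -q "^${name}:" "$file"
}
```

    For example:

    # for a in alerts alerts-comment root; do has_alias /etc/aliases "$a" || echo "missing: $a"; done

    With Postfix you can also query the database built by newaliases, for
    instance "postalias -q alerts /etc/aliases", which should print the
    rt-mailgate pipe.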

7.) Now you should go to your RT instance installed on your machine.

    http://localhost/rt

    log in as "root".

    Click on the "Configuration" link, "Queues", "New queue": Be sure that you
    fill in the "Queue Name" field with "Network Management" - including the 
    upper-case 'N' and 'M'.

    You only need to fill in Queue Name and Description. Click the "Save Changes"
    button on the lower right of the screen. 

    Now click the "User Rights" link. You'll see that the 'root' user has no
    rights on this queue. Give your 'root' user enough rights on this queue to
    at least see tickets in the queue and see the queue itself. If you want you
    can be lazy and highlight all the rights and assign 'root' everything. You have
    to press "Modify User Rights" to do this.

    At this point log out of RT and log back in. You should see the Network 
    Management queue listed on the right of the page.

    Now you need to generate a Nagios alert so that a ticket is created in RT. As
    you may have noticed in /etc/nagios3/conf.d/contacts_nagios3.cfg, the Nagios
    contact for the "alerts" address only sends notifications if a service is in
    the "c" (critical) state or if a host is "d" (down). In addition, in the file
    /etc/nagios3/conf.d/generic-service_nagios2.cfg there is a line that reads:

    notification_interval           0

    This ensures that Nagios will only send one (1) email per critical or down
    state. If this is set to something else, then you will generate multiple
    tickets, which is not good. 

    Try to make Nagios generate an alert, which should in turn create a ticket
    in RT. You could check for a service on your neighbor's PC that does not
    exist, or you could pull the network cable on your neighbor's PC so that
    it appears to be down. Otherwise, your instructor will come up with
    something as well.






Last update 22 February 2009 by HA