AfNOG Network Management Tutorial

Nagios Installation and Configuration

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

PART I
------

**** READ! ******

[Exercise 1 has already been done for you. The nagiosadmin password is the
same as your "classroom" password. Please ask your instructor if you do not
know what this is. Please skip to Exercise 2.]

1. Install Nagios version 3. You can do this as root, or as the tldadmin
   user with the "sudo" command:

        # apt-get install nagios3

   Then create the Web user password file:

        # htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin
        New password:
        Re-type new password:

   We suggest you use the standard user password you use in class.

2. You should already have a working Nagios!

   - Open a browser, and go to http://localhost/nagios3/
   - At the login prompt, log in as:

        user: nagiosadmin
        pass: (your classroom password)

3. Let's look at the interface together...
        # cd /etc/nagios3/
        # ls -l
        -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
        -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
        -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
        drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
        -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
        -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
        -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
        drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets

        # ls -l conf.d/
        -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
        -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
        -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
        -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
        -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
        -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
        -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
        -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
        -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg

   Notice that the package has not renamed the files in the conf.d
   directory: they are the same files used by the Nagios version 2 Ubuntu
   package, which is why their names end in "_nagios2". Only the
   host-gateway configuration file was updated, so it has been renamed.


PART II
Configuring Equipment
-----------------------------------------------------------------------------

1. Let's configure Nagios to start monitoring another computer in our
   classroom:

   - Pick any PC attached to the same router as your PC. Refer to the
     class network diagram for help.

        # cd /etc/nagios3/conf.d/
        # vi pcX.cfg            (Where X is some number)

        define host {
            use         generic-host
            host_name   pcX
            alias       PC X Network Management Tutorial
            address     _______________   [pcX's IP address here]
        }

     ... Save and quit

2.
Let's create a new hostgroup for the occasion, and add our host to it

   - Edit the file hostgroups_nagios2.cfg and add a new group:

        # vi hostgroups_nagios2.cfg

        define hostgroup {
            hostgroup_name  servers
            alias           RouterN PCs
            members         pcX
        }

3. Now let's associate some services with that host

        # vi services_nagios2.cfg

   - Find the section called "check that ssh services are running", and
     change the line:

        hostgroup_name  ssh-servers

     to

        hostgroup_name  ssh-servers,servers

4. Verify that your configuration file is OK:

        # nagios3 -v /etc/nagios3/nagios.cfg

   ... You should get:

        Total Warnings: 0
        Total Errors:   0

        Things look okay - No serious problems were detected during the check.

5. Reload/Restart Nagios

        # /etc/init.d/nagios3 restart

6. Go to the web interface (http://localhost/nagios3) and check the host
   you just added

7. Add ALL the PCs attached to your router.

   - Remember to verify the configuration file!
   - I suggest that you create a single config file called pcs.cfg to do
     this.
   - You will repeat steps 1, 2 and 3 from above. When you edit the file
     hostgroups_nagios2.cfg to update the members of the servers group,
     the format of the members statement is:

        members  pcX,pcY,pcZ,...

   - If you do not know the names or IP addresses of all the PCs attached
     to your router, refer to the classroom Network Diagram, available
     either in the classroom or on the class web site:

        http://noc/

     If this URL is not valid, ask your instructor where you can see a
     network diagram of the class. You will need this diagram to fill in
     information in the remaining exercises.

8. Add the routers and switches in your classroom

   - Create files called "routers.cfg" and "switches.cfg" in
     /etc/nagios3/conf.d
   - In the routers file you need to add entries for each router. Here is
     a sample initial entry for the gateway router for the classroom.
     You need to find the host name and IP address from the classroom
     network diagram and replace the "xxxx"s below with the correct
     information:

        define host {
            use         generic-host
            host_name   xxxxxxxx
            alias       gw router
            address     xxx.xxx.xxx.xxx
        }

     Then add entries for the other routers.

   - There are several switches. Do the same in the switches.cfg file.
   - Remember to look at the network diagram if you do not know their
     names or IP addresses.
   - Use the Nagios "pre-flight" check to verify that your configuration
     is correct:

        # nagios3 -v /etc/nagios3/nagios.cfg

   - You may see some warnings, because no services are defined yet for
     these new entries. This is OK and we will take care of it later.

9. Reload/Restart Nagios

        # /etc/init.d/nagios3 restart

   - Take a look at http://localhost/nagios3 to see your changes.
   - Click on the "Status Map" link to see how things look.


PART III
Defining Parents
-----------------------------------------------------------------------------

1. Define parents for your hardware devices

   - Remember that Nagios is smart about what to check based on the state
     of your network. This "smartness" is largely driven by the concept of
     parent relationships. Each device in our network (except for the
     classroom gateway router) has a parent device. You need to define
     that parent for each pc, router and switch in the files pcs.cfg,
     switches.cfg and routers.cfg.

   - This is extremely simple. To get you started, here is an updated
     entry in the file pcs.cfg for pcX, whose parent is switchY:

        define host {
            use         generic-host
            host_name   pcX
            alias       PC X Network Management Tutorial
            address     _______________   [pcX's IP address here]
            parents     switchY
        }

   - Note: use the host name, not the IP address, for parents entries.
   - Repeat this process for all the devices you have defined.
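     Typing the same host block for every device is repetitive. As a
     minimal sketch (not part of the original exercise), the shell
     function below prints a host definition that includes a parents
     line. make_host is a hypothetical helper, and the host names and
     192.0.2.x addresses are placeholders; take the real values from the
     network diagram.

```shell
#!/bin/sh
# Sketch: print one Nagios host definition that includes a "parents" line.
# Arguments: host_name, alias, address, parent host_name.
# Every value passed in below is a placeholder, not real classroom data.
make_host() {
    printf 'define host {\n'
    printf '    use        generic-host\n'
    printf '    host_name  %s\n' "$1"
    printf '    alias      %s\n' "$2"
    printf '    address    %s\n' "$3"
    printf '    parents    %s\n' "$4"
    printf '}\n'
}

# Example: two hypothetical PCs behind the same hypothetical switch.
make_host pc1 "PC 1 Network Management Tutorial" 192.0.2.11 sw1
make_host pc2 "PC 2 Network Management Tutorial" 192.0.2.12 sw1
```

     Redirect the output to a scratch file, review it, and copy what you
     need into pcs.cfg. Then run the pre-flight check as usual.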
     If you do not know the name of the parent device, or are confused
     about the network layout for the classroom, remember to use the
     network diagram.

   - Once you are done, be sure to run:

        # nagios3 -v /etc/nagios3/nagios.cfg

     to check on the status of your work.

2. Restart Nagios and review the Status Map

        # /etc/init.d/nagios3 restart

   - Now click on the Status Map link again. It should look quite
     different!

3. Create entries for all PCs, switches and routers in the classroom.

   - Create PC entries for each PC attached to each switch in the
     classroom.
   - Create entries for each switch attached to each router.
   - Create entries for each router attached to the backbone switch for
     the classroom.
   - Define an entry for the NOC server attached to the classroom
     backbone switch.
   - Define an entry for the backbone switch attached to the classroom
     gateway router.
   - You can use the classroom network diagram to do this.

   If you complete all these entries and have defined the parent of each
   PC, switch and router, then your status map should correctly represent
   the physical layout of the classroom.


PART IV
Defining Services
-----------------------------------------------------------------------------

1. Determine what services to define for what devices

   - This is core to how you use Nagios and network monitoring tools in
     general. So far we are simply using ping to verify that physical
     hosts are up on our network. The next step is to decide what
     services you wish to monitor for each host.

   - In this particular class we have:

        routers:  running ssh
        switches: running ssh and, possibly, telnet
        pcs:      All pcs are running ssh and http
                  All student pcs are running an snmp daemon

     So, let's configure Nagios to check all of these services on these
     devices.

2. Check that telnet is running on the workshop switches. If the switches
   in your workshop are not running telnet you can skip this exercise.
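   If you are not sure whether a switch accepts telnet connections, you
   can test it by hand before defining the service. The sketch below
   assumes the check_tcp plugin is installed in the plugin directory this
   tutorial uses (/usr/lib/nagios/plugins). probe_telnet is a hypothetical
   wrapper, and the address in the usage comment is a placeholder.

```shell
#!/bin/sh
# Sketch: ask the check_tcp plugin whether a host answers on TCP port 23.
# PLUGIN_DIR defaults to the plugin directory used in this tutorial;
# override it if your plugins live elsewhere.
PLUGIN_DIR=${PLUGIN_DIR:-/usr/lib/nagios/plugins}

probe_telnet() {
    # Report UNKNOWN (plugin exit code 3) if the plugin is not installed.
    if [ ! -x "$PLUGIN_DIR/check_tcp" ]; then
        echo "UNKNOWN: $PLUGIN_DIR/check_tcp not found"
        return 3
    fi
    # Exit code 0 means the port answered; 2 means it did not.
    "$PLUGIN_DIR/check_tcp" -H "$1" -p 23
}

# Usage (replace the address with one of your switches):
#   probe_telnet 192.0.2.20
```

   If the probe reports a refused connection or a timeout for every
   switch, telnet is probably disabled and you can skip this exercise.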
   - You will need to edit the file
     /etc/nagios3/conf.d/services_nagios2.cfg to define the
     "check_telnet" service check and the group of hosts it applies to.

   - Edit the file services_nagios2.cfg:

        # vi /etc/nagios3/conf.d/services_nagios2.cfg

     At the bottom of the file add the new service definition. It will
     look like this:

        # check that telnet is running
        define service {
            hostgroup_name         telnet-servers
            service_description    Telnet
            check_command          check_telnet
            use                    generic-service
            notification_interval  0 ; set > 0 if you want to be renotified
        }

   - By default Nagios (on Ubuntu) is pre-configured with web, ssh and
     ping service definitions. It turns out that, once we are completely
     done, you may not need the ping service definition - but don't
     remove it yet!

   - Notice the parameter that says:

        hostgroup_name  telnet-servers

     We need to create this hostgroup before we try to restart Nagios.
     Edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg and at the
     bottom of the file add the following entry:

        # A list of your telnet-accessible devices (older switches)
        define hostgroup {
            hostgroup_name  telnet-servers
            alias           Telnet servers
            members         xxxx,xxxx,xxxx,xxxx
        }

     Note the "members" line. Replace each "xxxx" with the name you used
     in the host_name directive for a switch in the switches.cfg file.

   - Save your changes and check your configuration:

        # nagios3 -v /etc/nagios3/nagios.cfg

   - Restart Nagios and see if you notice the changes you've made. Note
     that the actual check of the telnet service will most likely be in a
     "pending" state at first.

3. Verify that SSH is running on the routers and workshop PCs

   - In the file services_nagios2.cfg there is already an entry for the
     SSH service check, so you do not need to create one. Instead, you
     simply need to redefine the "ssh-servers" entry in the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg.
     The initial entry in the file looks like:

        # A list of your ssh-accessible servers
        define hostgroup {
            hostgroup_name  ssh-servers
            alias           SSH servers
            members         localhost
        }

     What do you think you should change? Correct, the "members" line.
     You should remove "localhost" and add entries for all the classroom
     pcs, routers and the switches that run ssh. With this information
     and the network diagram you should be able to complete this entry.

   - Once you are done, run the pre-flight check:

        # nagios3 -v /etc/nagios3/nagios.cfg

     If everything looks good, then restart Nagios and see your changes
     in the Nagios web interface.

4. Check that http is running on all the workshop PCs.

   - As with ssh, there is already a check_http service defined, and it
     automatically applies to the http-servers group. (Note that you can
     add additional groups of hosts for any service check if you wish.)
     So, you need to update the "http-servers" entry in the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg to include all the
     workshop PCs running http (i.e. Apache Web Server).

   - See the previous exercise and make the appropriate change to do
     this. If you have any questions, ask your instructor for help.

5. Check that SNMP is running on the classroom PCs.

   - First you will need to add the appropriate service check for SNMP
     in the file /etc/nagios3/conf.d/services_nagios2.cfg. This is where
     Nagios is impressive. There are hundreds, if not thousands, of
     service checks available via the various Nagios sites on the web.
     You can see what plugins Ubuntu installed with the nagios3 package
     by looking in the following directory:

        # ls /usr/lib/nagios/plugins

     As you'll see, there is already a check_snmp plugin available to us.
     If you are interested in the options the plugin takes, you can
     execute the plugin from the command line by typing:

        # /usr/lib/nagios/plugins/check_snmp

     to see what options are available, etc.
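     The plugin directory is long, so it can help to filter the listing.
     The sketch below is one way to do that; find_plugins is a
     hypothetical helper name, and PLUGIN_DIR simply defaults to the
     directory shown above.

```shell
#!/bin/sh
# Sketch: list installed plugins whose file name contains a given word.
PLUGIN_DIR=${PLUGIN_DIR:-/usr/lib/nagios/plugins}

find_plugins() {
    # -i makes the match case-insensitive; "--" guards against patterns
    # that begin with a dash.
    ls "$PLUGIN_DIR" 2>/dev/null | grep -i -- "$1"
}

# Usage:
#   find_plugins snmp
#   "$PLUGIN_DIR/check_snmp" --help | less
```

     The standard plugins accept --help, which prints the full option
     list rather than the short usage message.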
     You can use the check_snmp plugin and Nagios to create very complex
     or specific system checks.

   - To see the service/host checks that have already been created using
     the check_snmp plugin, look in /etc/nagios-plugins/config/snmp.cfg.
     You will see that there are a lot of preconfigured checks using
     snmp, including:

        snmp_load      snmp_cpustats  snmp_procname  snmp_disk
        snmp_mem       snmp_swap      snmp_procs     snmp_users
        snmp_mem2      snmp_swap2     snmp_mem3      snmp_swap3
        snmp_disk2     snmp_tcpopen   snmp_tcpstats  snmp_bgpstate

        check_netapp_uptime     check_netapp_cpuload
        check_netapp_numdisks   check_compaq_thermalCondition

     And, even better, you can create additional service checks quite
     easily. To verify that snmpd (the SNMP service on Linux) is running,
     we need to ask SNMP a question. If we don't get an answer, then
     Nagios can assume that the SNMP service is down on that host.
     Service checks such as check_http, check_ssh and check_telnet work
     the same way.

   - In our case, let's create a new service check and call it
     "check_system". This service check will connect to the specified
     host, use the private community string we have defined in class, and
     ask a question of snmp on that host - in this case we'll ask for the
     System Description, or the OID "sysDescr.0".

   - To do this, start by editing the file
     /etc/nagios-plugins/config/snmp.cfg:

        # vi /etc/nagios-plugins/config/snmp.cfg

     At the top (or the bottom, your choice) add the following entry to
     the file:

        # 'check_system' command definition
        define command{
            command_name  check_system
            command_line  /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C '$ARG1$' -o sysDescr.0
        }

     Note that "command_line" is a single line.

   - Now you need to edit the file
     /etc/nagios3/conf.d/services_nagios2.cfg and add this service check.
     We'll run this check against all our servers in the classroom, that
     is, the hostgroup "debian-servers".

   - Edit the file /etc/nagios3/conf.d/services_nagios2.cfg

        # vi /etc/nagios3/conf.d/services_nagios2.cfg

     At the bottom of the file add the following definition:

        # check that snmp is up on all servers
        define service {
            hostgroup_name         debian-servers
            service_description    SNMP
            check_command          check_system!xxxxxx
            use                    generic-service
            notification_interval  0 ; set > 0 if you want to be renotified
        }

     The "xxxxxx" is the private community string previously defined in
     class. Note that we pass the community string here as an argument
     ($ARG1$) rather than hard-coding it in the snmp.cfg file earlier.

   - Now verify that your changes are correct and restart Nagios.

   - If you click on the Service Detail menu choice in the web interface
     you should see the SNMP check appear.


PART V
Create More Host Groups
-----------------------------------------------------------------------------

1. Update /etc/nagios3/conf.d/hostgroups_nagios2.cfg

   - For the following exercises it will be very useful to have created
     or updated the following hostgroups:

        debian-servers
        routers
        switches

     If you edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg you
     will see an entry for debian-servers that contains only localhost.
     Update this entry to include all the classroom PCs, including the
     noc (this assumes that you created a "noc" entry in your pcs.cfg
     file).

        # vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg

     Update the entry that says:

        # A list of your Debian GNU/Linux servers
        define hostgroup {
            hostgroup_name  debian-servers
            alias           Debian GNU/Linux Servers
            members         localhost
        }

     so that the "members" parameter contains something like the line
     below. Use your classroom network diagram to confirm the exact
     number and names of the machines in your workshop. The entry below
     _is not_ correct for your classroom.

        members  noc,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,pc33,pc34,pc35,pc36,pc37

   - Once you have done this, add in two more entries.
     One for routers and one for switches. Call these entries "routers"
     and "switches".

   - When you are done, be sure to verify your work and restart Nagios.


PART V
Extended Host Information ("making your graphs pretty")
-----------------------------------------------------------------------------

1. Update extinfo_nagios2.cfg

   - If you would like to use appropriate icons for your defined hosts in
     Nagios, this is where you do it. We have three types of devices:

        Cisco routers
        Cisco switches
        Ubuntu servers

     There is a fairly large repository of icon images available for you
     to use, located here:

        /usr/share/nagios/htdocs/images/logos/

     These were installed by default as dependent packages of the nagios3
     package in Ubuntu. In some cases you can find model-specific icons
     for your hardware, but to make things simpler we will use the
     following icons for our hardware:

        /usr/share/nagios/htdocs/images/logos/base/debian.*
        /usr/share/nagios/htdocs/images/logos/cook/router.*
        /usr/share/nagios/htdocs/images/logos/cook/switch.*

   - The next step is to edit the file
     /etc/nagios3/conf.d/extinfo_nagios2.cfg and tell Nagios what image
     you would like to use to represent your devices.

        # vi /etc/nagios3/conf.d/extinfo_nagios2.cfg

     Here is what an entry for your routers looks like (there is already
     an entry for debian-servers that will work as is). Note that the
     router model (2811) is not all that important. The image used
     represents a router in general.

        define hostextinfo {
            hostgroup_name   routers
            icon_image       cook/router.png
            icon_image_alt   Cisco Routers (2811)
            vrml_image       router.png
            statusmap_image  cook/router.gd2
        }

     Now add an entry for your switches. Once you are done, check your
     work and restart Nagios. Take a look at the Status Map in the web
     interface. It should look much nicer.


PART VI
Create Service Groups
-----------------------------------------------------------------------------

1. Create service groups for ssh and http for each set of pcs.

   - The idea here is to create three service groups.
     Each service group will cover the group of PCs connected to one
     router: xxxxxxx, yyyyyy, zzzzzz, etc. We want to see these PCs
     grouped together, with the status of their ssh and http services.
     To do this, create and edit the file:

        # vi /etc/nagios3/conf.d/servicegroups.cfg

     Here is a sample of the service group for the first router xxxxxx:

        define servicegroup{
            servicegroup_name  group 1 services
            alias              pcs 1-10
            members            pc1,SSH,pc1,HTTP,pc2,SSH,pc2,HTTP,pc3,SSH,
                               pc3,HTTP,pc4,SSH,pc4,HTTP,pc5,SSH,pc5,HTTP,
                               pc6,SSH,pc6,HTTP,pc7,SSH,pc7,HTTP,pc8,SSH,
                               pc8,HTTP,pc9,SSH,pc9,HTTP,pc10,SSH,pc10,HTTP
        }

     The members list is a series of host,service pairs. It is wrapped
     here for readability; enter it as a single line in the file.

   - The above example assumes that pcs 1-10 are connected to one router.
     Look at your network diagram to see how your classroom is
     configured.

   - Save your changes, verify your work and restart Nagios. Now if you
     click on the Servicegroup menu items in the Nagios web interface you
     should see this information grouped together.


PART VII
Configure Guest Access to the Nagios Web Interface
-----------------------------------------------------------------------------

1. Edit /etc/nagios3/cgi.cfg to give read-only guest user access to the
   Nagios web interface.

   - By default Nagios is configured to give full r/w access via the
     Nagios web interface to the user nagiosadmin. Through the cgi.cfg
     file you can change the name of this user, add other users, change
     how you authenticate users, control which users have access to
     which resources, and more.

   - First, let's create a "guest" user and password in the
     htpasswd.users file.

        # htpasswd /etc/nagios3/htpasswd.users guest

     You can use any password you want (or none). A password of "guest"
     is not a bad choice.

   - Next, edit the file /etc/nagios3/cgi.cfg and look for what type of
     access has been given to the nagiosadmin user.
     By default you will see the following directives (note that there
     are comments between each directive):

        authorized_for_system_information=nagiosadmin
        authorized_for_configuration_information=nagiosadmin
        authorized_for_system_commands=nagiosadmin
        authorized_for_all_services=nagiosadmin
        authorized_for_all_hosts=nagiosadmin
        authorized_for_all_service_commands=nagiosadmin
        authorized_for_all_host_commands=nagiosadmin

     Now let's tell Nagios to allow the "guest" user some access to
     information via the web interface. You can choose whatever you
     would like, but this is pretty typical:

        authorized_for_system_information=nagiosadmin,guest
        authorized_for_configuration_information=nagiosadmin,guest
        authorized_for_system_commands=nagiosadmin
        authorized_for_all_services=nagiosadmin,guest
        authorized_for_all_hosts=nagiosadmin,guest
        authorized_for_all_service_commands=nagiosadmin
        authorized_for_all_host_commands=nagiosadmin

   - Once you make the changes, save the file cgi.cfg, verify your work
     and restart Nagios.

   - To see if you can log in as the "guest" user you may need to clear
     the cookies in your web browser. The web interface will look the
     same as before. The difference is that a number of actions available
     via the web interface (forcing a service/host check, scheduling
     checks, adding comments, etc.) will not work for the guest user.


Last update 30 May 2010 by HA