Nagios 3.0 Jumpstart Guide For Linux – Overview, Installation and Configuration
1.7.0) and another for PPTP sessions (iso.3.6.1.4.1.3076.2.1.2.17. 1.9.0). In the above example, VPN LAN-2-LAN active sessions has exceeded the critical limit of 100.
Object Identifier (OID) is arranged in a hierarchical Management Information Base (MIB) tree with roots and branches based on the internet standard.
Introduction
This guide is intended to provide you with simple instructions on how to install Nagios from source (code) on Fedora and have it monitoring your local machine inside of 20 minutes. No advanced installation options are discussed here - just the basics that will work for 95% of users who want to get started.
These instructions were written based on a standard Fedora Core 6 Linux distribution.
What You'll End Up With
If you follow these instructions, here's what you'll end up with:
download/ for links to the latest versions). These directions were tested with Nagios 3.1.1 and Nagios Plugins 1.4.11.
contacts.cfg config file with your favorite editor and change the email address associated with the nagiosadmin contact definition to the address you'd like to use for receiving alerts.
windows.cfg to find additional object definitions. That's where you'll be adding Windows host and service definitions. That configuration file already contains some sample host, hostgroup, and service definitions. For the *first* Windows machine you monitor, you can simply modify the sample host and service definitions in that file, rather than creating new ones.
Installing the Windows Agent
Before you can begin monitoring private services and attributes of Windows machines, you'll need to install an agent on those machines. I recommend using the NSClient++ addon, which can be found at http://sourceforge.net/projects/nscplus. These instructions will take you through a basic installation of the NSClient++ addon, as well as the configuration of Nagios for monitoring the Windows machine.
1. Download the latest stable version of the NSClient++ addon from http://sourceforge.net/projects/nscplus
2. Unzip the NSClient++ files into a new C:\NSClient++ directory
3. Open a command prompt and change to the C:\NSClient++ directory
4. Register the NSClient++ system service with the following command:
" argument (where PASSWORD is the password you specified on the Windows machine) like this:
switch.cfg to find additional object definitions. That's where you'll be adding host and service definitions for routers and switches. That configuration file already contains some sample host, hostgroup, and service definitions. For the *first* router/switch you monitor, you can simply modify the sample host and service definitions in that file, rather than creating new ones.
Configuring Nagios
You'll need to create some object definitions in order to monitor a new router/switch.
Open the switch.cfg file for editing.
log. Here's the service definition I use to monitor the bandwidth data that's stored in the log file...
1.log" option that gets passed to the check_local_mrtgtraf command tells the plugin which MRTG log file to read from. The "AVG" option tells it that it should use average bandwidth statistics. The "1000000,2000000" options are the warning thresholds (in bytes) for incoming and outgoing traffic rates. The "5000000,5000000" options are the critical thresholds (in bytes) for incoming and outgoing traffic rates. The "10" option causes the plugin to return a CRITICAL state if the MRTG log file is older than 10 minutes (it should be updated every 5 minutes).
Save the file.
Restarting Nagios
Once you've added the new host and service definitions to the switch.cfg file, you're ready to start monitoring the router/switch. To do this, you'll need to verify your configuration and restart Nagios.
If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!
printer.cfg to find additional object definitions. That's where you'll be adding host and service definitions for the printer. That configuration file already contains some sample host, hostgroup, and service definitions. For the *first* printer you monitor, you can simply modify the sample host and service definitions in that file, rather than creating new ones.
Configuring Nagios
You'll need to create some object definitions in order to monitor a new printer.
Open the printer.cfg file for editing.
download/ for the link to the latest version).
Let us discuss the overview, installation and configuration of Nagios, a powerful open source monitoring solution for hosts and services.
I. Overview of Nagios
II. 8 steps for installing nagios on Linux:
- Download the nagios and plugins
- Take care of the prerequisites
- Create user and group for nagios
- Install nagios
- Configure the web interface
- Compile and install nagios plugins
- Start Nagios
- Login to web interface
I. Overview of Nagios
Nagios is a host and service monitoring tool. Following are some of the features of Nagios.
- Monitor equipment such as servers, switches, routers, firewalls, power supplies etc.
- Monitor services such as disk space, CPU usage, memory usage, temperature of the equipment, HTTP, Mail, SSH etc.
- Nagios can monitor pretty much anything, e.g. hosts, services, databases, applications etc.
- Nagios has an extensible plugin interface for monitoring user-defined services. There are a lot of plugins available for Nagios. Visit NagiosPlugins and NagiosExchange to review the available user-developed plugins.
- It can send out various notifications (email, pager etc.) when a problem occurs and when it gets resolved.
- Web interface to view current status, notifications, problem history, log files etc.
Fig: Nagios Web UI
II. 8 steps for installing nagios on Linux:
1. Download the nagios and plugins
Download the following files from Nagios.org and move them to /home/downloads:
- nagios-3.0.1.tar.gz
- nagios-plugins-1.4.11.tar.gz
2. Take care of the prerequisites
- Make sure apache is working on the server by verifying from browser: http://localhost
- Verify whether gcc is installed
[root@localhost]# rpm -qa | grep gcc
gcc-3.4.6-8
compat-gcc-32-3.2.3-47.3
libgcc-3.4.6-8
compat-libgcc-296-2.96-132.7.2
compat-gcc-32-c++-3.2.3-47.3
gcc-c++-3.4.6-8
- Verify whether GD is installed
[root@localhost]# rpm -qa gd
gd-2.0.28-5.4E
3. Create user and group for nagios
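[root@localhost]# useradd nagios
[root@localhost]# passwd nagios
[root@localhost]# groupadd nagcmd
[root@localhost]# usermod -a -G nagcmd nagios
[root@localhost]# usermod -a -G nagcmd apache
Note: The -a option appends nagcmd to the existing supplementary groups of each user. Without it, usermod -G replaces the user's supplementary group list.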
4. Install nagios
[root@localhost]# tar xvf nagios-3.0.1.tar.gz
[root@localhost]# cd nagios-3.0.1
[root@localhost]# ./configure --with-command-group=nagcmd
[root@localhost]# make all
[root@localhost]# make install
[root@localhost]# make install-init
[root@localhost]# make install-config
[root@localhost]# make install-commandmode
Note: make install-init installs the init script that the chkconfig and service commands in step 7 rely on.
Following are some additional parameters that you can pass to ./configure to customize your installation. I used only --with-command-group as shown above.
--prefix /opt/nagios              Where to put the Nagios files
--with-cgiurl /nagios/cgi-bin     Web server url where the cgi's will be available
--with-htmurl /nagios             Web server url where nagios will be available
--with-nagios-user nagios         user account under which Nagios will run
--with-nagios-group nagios        group account under which Nagios will run
--with-command-group nagcmd       group account which will allow the apache user to submit commands to Nagios
At the end of the configure output, it will display a summary as shown below:
*** Configuration summary for nagios 3.0.1 05-28-2008 ***:
General Options:
-------------------------
Nagios executable: nagios
Nagios user/group: nagios,nagios
Command user/group: nagios,nagcmd
Embedded Perl: no
Event Broker: yes
Install ${prefix}: /usr/local/nagios
Lock file: ${prefix}/var/nagios.lock
Check result directory: ${prefix}/var/spool/checkresults
Init directory: /etc/rc.d/init.d
Apache conf.d directory: /etc/httpd/conf.d
Mail program: /bin/mail
Host OS: linux-gnu
Web Interface Options:
------------------------
HTML URL: http://localhost/nagios/
CGI URL: http://localhost/nagios/cgi-bin/
Traceroute (used by WAP): /bin/traceroute
5. Configure the web interface.
[root@localhost]# make install-webconf
[root@localhost]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
6. Compile and install nagios plugins
[root@localhost]# tar xvf nagios-plugins-1.4.11.tar.gz
[root@localhost]# cd nagios-plugins-1.4.11
[root@localhost]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
[root@localhost]# make
[root@localhost]# make install
Note: On Red Hat, the ./configure command mentioned above did not work; it hung while displaying the message: checking for redhat spopen problem… Add --enable-redhat-pthread-workaround to the ./configure command as a workaround for this problem, as shown below.
[root@localhost]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround
7. Start Nagios
- Add the nagios to the startup routine:
[root@localhost]# chkconfig --add nagios
[root@localhost]# chkconfig nagios on
- Verify to make sure there are no errors in the nagios configuration file:
[root@localhost]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
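If you want to script this check, the summary lines of the pre-flight output can be inspected before Nagios is started. The following is only a minimal sketch: preflight_ok is a hypothetical helper name, and it assumes the output contains a "Total Errors:" line in the form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): reads pre-flight output on stdin
# and succeeds only if it reports "Total Errors: 0".
preflight_ok() {
    # Extract the number that follows "Total Errors:".
    errors=$(sed -n 's/^Total Errors:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
    [ "$errors" = "0" ]
}

# Sample text copied from the verification output shown above.
sample='Total Warnings: 0
Total Errors: 0'

if printf '%s\n' "$sample" | preflight_ok; then
    echo "configuration OK"
else
    echo "configuration has errors"
fi
```

In real use you would pipe the output of the nagios -v command into preflight_ok instead of the sample text.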
- Start the nagios
[root@localhost]# service nagios start
Starting nagios: done.
8. Login to web interface
Nagios Web URL: http://localhost/nagios/
Use the userid and password that were created in step 5 above.
III. Configuration files overview
The first configuration change to make is to replace the default email address in the /usr/local/nagios/etc/objects/contacts.cfg file with your own email address.
Following are the three major configuration files located under /usr/local/nagios/etc
- nagios.cfg – This is the primary Nagios configuration file, where a lot of global parameters that control Nagios can be defined.
- cgi.cfg – This file has configuration information for the Nagios web interface.
- resource.cfg – If you have to pass some sensitive information (username, password etc.) to a plugin to monitor a specific service, you can define it here. This file is readable only by the nagios user and group.
Following are the other configuration files, located under /usr/local/nagios/etc/objects
- contacts.cfg – All the contacts who need to be notified should be defined here. You can specify the name, email address, what type of notifications they need to receive, and the time period during which this particular contact should receive notifications.
- commands.cfg – All the commands to check services are defined here. You can use the $HOSTNAME$ and $HOSTADDRESS$ macros in the command line; Nagios substitutes the corresponding hostname or host IP address automatically.
- timeperiods.cfg – Defines the time periods. For example, if you want a service to be monitored only during business hours, define a time period called businesshours and specify the hours that you would like to monitor.
- templates.cfg – Multiple host or service definitions that have similar characteristics can use a template, where all the common characteristics are defined. Using templates is a time saver.
- localhost.cfg – Defines the monitoring for the local host. This is a sample configuration file that comes with the Nagios installation; you can use it as a baseline to define other hosts that you would like to monitor.
- printer.cfg – Sample config file for printer
- switch.cfg – Sample config file for switch
- windows.cfg – Sample config file for a windows machine
------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------
windows machine and the various services running on the windows server using the nagios monitoring server. The following three sections are covered in this article.
I. Overview
II. 4 steps to install nagios on remote windows host
- Install NSClient++ on the remote windows server
- Modify the NSClient++ Service
- Modify the NSC.ini
- Start the NSClient++ Service
III. 6 configuration steps on nagios monitoring server
- Verify check_nt command and windows-server template
- Uncomment windows.cfg in /usr/local/nagios/etc/nagios.cfg
- Modify /usr/local/nagios/etc/objects/windows.cfg
- Define windows services that should be monitored.
- Enable Password Protection
- Verify Configuration and Restart Nagios.
I. Overview
The following three steps happen, at a very high level, when Nagios (installed on the nagios-server) monitors a service (e.g. disk space usage) on the remote Windows host.
- Nagios will execute check_nt command on nagios-server and request it to monitor disk usage on remote windows host.
- The check_nt on the nagios-server will contact the NSClient++ service on remote windows host and request it to execute the USEDDISKSPACE on the remote host.
- The results of the USEDDISKSPACE command will be returned back by NSClient++ daemon to the check_nt on nagios-server.
Following flow summarizes the above explanation:
Nagios Server (check_nt) —–> Remote host (NSClient++) —–> USEDDISKSPACE
Nagios Server (check_nt) <—– Remote host (NSClient++) <—– USEDDISKSPACE (returns disk space usage)
II. 4 steps to setup nagios on remote windows host
1. Install NSClient++ on the remote windows server
Download NSCP 0.3.1 (NSClient++-Win32-0.3.1.msi) from NSClient++ Project. NSClient++ is an open source windows service that allows performance metrics to be gathered by Nagios for windows services. Go through the following five NSClient++ installation steps to get the installation completed.
(1) NSClient++ Welcome Screen
(2) License Agreement Screen
(3) Select Installation option and location. Use the default option and click next.
(4) Ready to Install Screen. Click on Install to get it started.
(5) Installation completed Screen.
2. Modify the NSClient++ Service
Go to Control Panel -> Administrative Tools -> Services. Double click on the “NSClientpp (Nagios) 0.3.1.14 2008-03-12 w32” service and select the check-box that says “Allow service to interact with desktop” as shown below.
3. Modify the NSC.ini
(1) Modify NSC.ini and uncomment *.dll: Edit the C:\Program Files\NSClient++\NSC.ini file and uncomment everything under [modules] except RemoteConfiguration.dll and CheckWMI.dll
[modules]
;# NSCLIENT++ MODULES
;# A list with DLLs to load at startup.
; You will need to enable some of these for NSClient++ to work.
; ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
; * *
; * N O T I C E ! ! ! - Y O U H A V E T O E D I T T H I S *
; * *
; ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
FileLogger.dll
CheckSystem.dll
CheckDisk.dll
NSClientListener.dll
NRPEListener.dll
SysTray.dll
CheckEventLog.dll
CheckHelpers.dll
;CheckWMI.dll
;
; RemoteConfiguration IS AN EXTREM EARLY IDEA SO DONT USE FOR PRODUCTION ENVIROMNEMTS!
;RemoteConfiguration.dll
; NSCA Agent is a new beta module use with care!
NSCAAgent.dll
; LUA script module used to write your own "check deamon" (sort of) early beta.
LUAScript.dll
; Script to check external scripts and/or internal aliases, early beta.
CheckExternalScripts.dll
; Check other hosts through NRPE extreme beta and probably a bit dangerous!
NRPEClient.dll
(2) Modify NSC.ini and uncomment allowed_hosts. Edit the C:\Program Files\NSClient++\NSC.ini file, uncomment allowed_hosts under [Settings] and add the IP address of the nagios-server.
;# ALLOWED HOST ADDRESSES
; This is a comma-delimited list of IP address of hosts that are allowed to talk to the all daemons.
; If leave this blank anyone can access the deamon remotly (NSClient still requires a valid password).
; The syntax is host or ip/mask so 192.168.0.0/24 will allow anyone on that subnet access
allowed_hosts=192.168.1.2/255.255.255.0
Note: allowed_host is located under [Settings], [NSClient] and [NRPE] section. Make sure to change allowed_host under [Settings] for this purpose.
(3) Modify NSC.ini and uncomment port. Edit the C:\Program Files\NSClient++\NSC.ini file and uncomment the port# under [NSClient] section
;# NSCLIENT PORT NUMBER
; This is the port the NSClientListener.dll will listen to.
port=12489
(4) Modify NSC.ini and specify password. You can also specify a password the nagios server needs to use to remotely access the NSClient++ agent.
[Settings]
;# OBFUSCATED PASSWORD
; This is the same as the password option but here you can store the password in an obfuscated manner.
; *NOTICE* obfuscation is *NOT* the same as encryption, someone with access to this file can still figure out the
; password. Its just a bit harder to do it at first glance.
;obfuscated_password=Jw0KAUUdXlAAUwASDAAB
;
;# PASSWORD
; This is the password (-s) that is required to access NSClient remotely. If you leave this blank everyone will be able to access the daemon remotly.
password=My2Secure$Password
4. Start the NSClient++ Service
Start the NSClient++ service either from Control Panel -> Administrative Tools -> Services -> select “NSClientpp (Nagios) 0.3.1.14 2008-03-12 w32” and click on Start, or by clicking on Start -> All Programs -> NSClient++ -> Start NSClient++ (Win32). Please note that this will start NSClient++ as a windows service.
Later, if you modify anything in the NSC.ini file, you should restart the “NSClientpp (Nagios) 0.3.1.14 2008-03-12 w32” service from the windows Services panel.
III. 6 configuration steps on nagios monitoring server
1. Verify check_nt command and windows-server template
Verify that the check_nt command is enabled under /usr/local/nagios/etc/objects/commands.cfg
# 'check_nt' command definition
define command{
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
        }
Verify that the windows-server template is enabled under /usr/local/nagios/etc/objects/templates.cfg
# Windows host definition template - This is NOT a real host, just a template!
define host{
        name                    windows-server  ; The name of this host template
        use                     generic-host    ; Inherit default values from the generic-host template
        check_period            24x7            ; By default, Windows servers are monitored round the clock
        check_interval          5               ; Actively check the server every 5 minutes
        retry_interval          1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts      10              ; Check each server 10 times (max)
        check_command           check-host-alive ; Default command to check if servers are "alive"
        notification_period     24x7            ; Send notification out at any time - day or night
        notification_interval   30              ; Resend notifications every 30 minutes
        notification_options    d,r             ; Only send notifications for specific host states
        contact_groups          admins          ; Notifications get sent to the admins by default
        hostgroups              windows-servers ; Host groups that Windows servers should be a member of
        register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
        }
2. Uncomment windows.cfg in /usr/local/nagios/etc/nagios.cfg
# Definitions for monitoring a Windows machine
cfg_file=/usr/local/nagios/etc/objects/windows.cfg
3. Modify /usr/local/nagios/etc/objects/windows.cfg
By default a sample host definition for a windows server is given under windows.cfg, modify this to reflect the appropriate windows server that needs to be monitored through nagios.
# Define a host for the Windows machine we'll be monitoring
# Change the host_name, alias, and address to fit your situation
define host{
        use             windows-server          ; Inherit default values from a template
        host_name       remote-windows-host     ; The name we're giving to this host
        alias           Remote Windows Host     ; A longer name associated with the host
        address         192.168.1.4             ; IP address of the remote windows host
        }
4. Define windows services that should be monitored.
Following are the default windows services that are already enabled in the sample windows.cfg. Make sure to update the host_name on these services to reflect the host_name defined in the above step.
define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     NSClient++ Version
        check_command           check_nt!CLIENTVERSION
        }
define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     Uptime
        check_command           check_nt!UPTIME
        }
define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     CPU Load
        check_command           check_nt!CPULOAD!-l 5,80,90
        }
define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     Memory Usage
        check_command           check_nt!MEMUSE!-w 80 -c 90
        }
define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     C:\ Drive Space
        check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90
        }
define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     W3SVC
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
        }
define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     Explorer
        check_command           check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
        }
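To see how a check_command value such as check_nt!USEDDISKSPACE!-l c -w 80 -c 90 relates to the check_nt command definition, the sketch below mimics the substitution in plain shell: the fields separated by "!" become $ARG1$ and $ARG2$, and the host's address replaces $HOSTADDRESS$. This is an illustration only, not Nagios code. expand_check_nt is a hypothetical name, and the libexec path standing in for $USER1$ is an assumption based on the default install location used in this article.

```shell
#!/bin/sh
# Illustration only: build the command line that the check_nt definition
# would produce for a given host address and check_command value.
expand_check_nt() {
    host_address=$1     # value of the host's "address" directive
    check_command=$2    # e.g. 'check_nt!USEDDISKSPACE!-l c -w 80 -c 90'

    # The first "!"-separated field is the command name; the rest are arguments.
    name=${check_command%%"!"*}
    rest=${check_command#"$name"}
    rest=${rest#"!"}
    arg1=${rest%%"!"*}                      # becomes $ARG1$
    case $rest in
        *"!"*) arg2=${rest#*"!"} ;;         # becomes $ARG2$
        *)     arg2= ;;
    esac

    # Assumption: $USER1$ points at /usr/local/nagios/libexec.
    line="/usr/local/nagios/libexec/$name -H $host_address -p 12489 -v $arg1 $arg2"
    printf '%s\n' "${line% }"               # trim the space left by an empty $ARG2$
}

expand_check_nt 192.168.1.4 'check_nt!USEDDISKSPACE!-l c -w 80 -c 90'
```

The printed line is what you could run by hand on the nagios-server to test a single check against the Windows host.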
5. Enable Password Protection
If you specified a password in the NSC.ini file of NSClient++ on the Windows machine, you’ll need to modify the check_nt command definition to include the password. Modify the /usr/local/nagios/etc/objects/commands.cfg file and add the password as shown below.
define command{
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s My2Secure$Password -v $ARG1$ $ARG2$
        }
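One thing to watch for with a password like the one above: Nagios uses "$" to delimit macros in command_line, and as far as I know its object definition documentation says a literal dollar sign must be written as "$$". The sketch below, with the hypothetical helper name escape_nagios_dollars, shows that transformation. Whether the password also needs shell quoting depends on your setup and is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: double every "$" so that Nagios treats it as a literal
# dollar sign instead of the start of a macro.
escape_nagios_dollars() {
    printf '%s\n' "$1" | sed 's/\$/$$/g'
}

# Prints the password in the form to paste after -s in command_line.
escape_nagios_dollars 'My2Secure$Password'
```

A simpler alternative is to choose a password without "$" in it.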
6. Verify Configuration and Restart Nagios.
Verify the nagios configuration files as shown below.
[nagios-server]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
Restart nagios as shown below.
[nagios-server]# /etc/rc.d/init.d/nagios stop
Stopping nagios: .done.
[nagios-server]# /etc/rc.d/init.d/nagios start
Starting nagios: done.
Verify the status of the various services running on the remote windows host from the Nagios web UI (http://nagios-server/nagios) as shown below.
------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------
How To Monitor Remote Linux Host using Nagios 3.0
I. Overview
II. 6 steps to install Nagios plugin and NRPE on remote host.
- Download Nagios Plugins and NRPE Add-on
- Create nagios account
- Install Nagios Plugins
- Install NRPE
- Setup NRPE to run as daemon
- Modify the /usr/local/nagios/etc/nrpe.cfg
III. 4 Configuration steps on the Nagios monitoring server to monitor remote host:
- Download NRPE Add-on
- Install check_nrpe
- Create host and service definition for remote host
- Restart the nagios service
I. Overview:
The following three steps happen, at a very high level, when Nagios (installed on the nagios-server) monitors a service (e.g. disk space usage) on the remote Linux host.
- Nagios will execute check_nrpe command on nagios-server and request it to monitor disk usage on remote host using check_disk command.
- The check_nrpe on the nagios-server will contact the NRPE daemon on remote host and request it to execute the check_disk on remote host.
- The results of the check_disk command will be returned back by NRPE daemon to the check_nrpe on nagios-server.
Following flow summarizes the above explanation:
Nagios Server (check_nrpe) —–> Remote host (NRPE daemon) —–> check_disk
Nagios Server (check_nrpe) <—– Remote host (NRPE daemon) <—– check_disk (returns disk space usage)
II. 6 steps to install Nagios Plugins and NRPE on the remote host
1. Download Nagios Plugins and NRPE Add-on
Download following files from Nagios.org and move to /home/downloads:
- nagios-plugins-1.4.11.tar.gz
- nrpe-2.12.tar.gz
2. Create nagios account
[remotehost]# useradd nagios
[remotehost]# passwd nagios
3. Install nagios-plugin
[remotehost]# cd /home/downloads
[remotehost]# tar xvfz nagios-plugins-1.4.11.tar.gz
[remotehost]# cd nagios-plugins-1.4.11
[remotehost]# export LDFLAGS=-ldl
[remotehost]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround
[remotehost]# make
[remotehost]# make install
[remotehost]# chown nagios:nagios /usr/local/nagios
[remotehost]# chown -R nagios:nagios /usr/local/nagios/libexec/
Note: On Red Hat, the ./configure command hung for me with the message: “checking for redhat spopen problem…”. The --enable-redhat-pthread-workaround option shown above is the workaround for this problem.
4. Install NRPE
[remotehost]# cd /home/downloads
[remotehost]# tar xvfz nrpe-2.12.tar.gz
[remotehost]# cd nrpe-2.12
[remotehost]# ./configure
[remotehost]# make all
[remotehost]# make install-plugin
[remotehost]# make install-daemon
[remotehost]# make install-daemon-config
[remotehost]# make install-xinetd
5. Setup NRPE to run as daemon (i.e as part of xinetd):
- Modify /etc/xinetd.d/nrpe to add the IP address of the Nagios monitoring server to the only_from directive. Note that there is a space between 127.0.0.1 and the Nagios monitoring server IP address (in this example, the Nagios monitoring server IP address is 192.168.1.2).
only_from = 127.0.0.1 192.168.1.2
- Modify the /etc/services and add the following at the end of the file.
nrpe 5666/tcp # NRPE
- Start the service
[remotehost]# service xinetd restart
- Verify whether NRPE is listening
[remotehost]# netstat -at | grep nrpe
tcp 0 0 *:nrpe *:* LISTEN
- Verify to make sure the NRPE is functioning properly
[remotehost]# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.12
6. Modify the /usr/local/nagios/etc/nrpe.cfg
The nrpe.cfg file located on the remote host contains the commands that are needed to check the services on the remote host. By default the nrpe.cfg comes with few standard check commands as samples. check_users and check_load are shown below as an example.
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
In all the check commands, “-w” stands for “Warning” and “-c” stands for “Critical”. For example, in the check_disk command below, if the available disk space gets to 20% or less, nagios will send a warning message. If it gets to 10% or less, nagios will send a critical message. Change the values of the “-c” and “-w” parameters below depending on your environment.
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
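The threshold logic described above can be sketched in a few lines of shell. This is not the check_disk source: classify_free_space and state_exit_code are hypothetical names, and the "or less" comparison simply follows the description in this article, so the real plugin may treat the exact boundary value differently. The exit codes 0, 1, 2 and 3 are the conventional plugin return codes for OK, WARNING, CRITICAL and UNKNOWN.

```shell
#!/bin/sh
# Sketch of the -w / -c decision for free disk space, in whole percent.
# Usage: classify_free_space FREE_PCT WARN_PCT CRIT_PCT
classify_free_space() {
    free=$1; warn=$2; crit=$3
    if [ "$free" -le "$crit" ]; then
        echo "CRITICAL"
    elif [ "$free" -le "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Conventional plugin exit code for each state name.
state_exit_code() {
    case $1 in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN
    esac
}

# Thresholds from the check_disk example: -w 20% -c 10%
classify_free_space 35 20 10
classify_free_space 15 20 10
classify_free_space 10 20 10
```

Checking the critical threshold first matters: a value of 10% is also below the warning threshold, and the more severe state has to win.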
Note: You can execute any of the commands shown in the nrpe.cfg on the command line on remote host and see the results for yourself. For e.g. When I executed the check_disk command on the command line, it displayed the following:
[remotehost]# /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
DISK CRITICAL - free space: / 6420 MB (10% inode=98%);| /=55032MB;51792;58266;0;64741
In the above example, since the free disk space on /dev/hda1 is only 10% , it is displaying the CRITICAL message, which will be returned to nagios server.
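The part of the plugin output after the "|" character is performance data, which by plugin convention has the form label=value;warn;crit;min;max. As a rough illustration, the sketch below splits the sample line shown above into those fields. parse_perfdata is a hypothetical helper, and it assumes a single performance data item on the line.

```shell
#!/bin/sh
# Hypothetical helper: split "text | label=value;warn;crit;min;max" into fields.
parse_perfdata() {
    line=$1
    perf=${line#*"|"}       # keep everything after the first "|"
    perf=${perf# }          # drop one leading space, if present
    label=${perf%%=*}       # text before the first "="
    fields=${perf#*=}       # value;warn;crit;min;max

    # Split the remaining fields on ";" into positional parameters.
    old_ifs=$IFS
    IFS=';'
    set -- $fields
    IFS=$old_ifs

    printf 'label=%s value=%s warn=%s crit=%s min=%s max=%s\n' \
        "$label" "${1-}" "${2-}" "${3-}" "${4-}" "${5-}"
}

parse_perfdata 'DISK CRITICAL - free space: / 6420 MB (10% inode=98%);| /=55032MB;51792;58266;0;64741'
```

For the sample line, the value field is the used space on / (55032MB), while the human-readable text before the "|" reports the free space.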
III. 4 Configuration steps on the Nagios monitoring server to monitor remote host:
1. Download NRPE Add-on
Download nrpe-2.12.tar.gz from Nagios.org and move it to /home/downloads.
2. Install check_nrpe on the nagios monitoring server
[nagios-server]# tar xvfz nrpe-2.12.tar.gz
[nagios-server]# cd nrpe-2.12
[nagios-server]# ./configure
[nagios-server]# make all
[nagios-server]# make install-plugin
./configure will give a configuration summary as shown below:
*** Configuration summary for nrpe 2.12 05-31-2008 ***:
General Options:
-------------------------
NRPE port: 5666
NRPE user: nagios
NRPE group: nagios
Nagios user: nagios
Nagios group: nagios
Note: I got the “checking for SSL headers… configure: error: Cannot find ssl headers” error message while performing ./configure. Install openssl-devel as shown below and run the ./configure again to fix the problem.
[nagios-server]# rpm -ivh openssl-devel-0.9.7a-43.16.i386.rpm krb5-devel-1.3.4-47.i386.rpm zlib-devel-1.2.1.2-1.2.i386.rpm e2fsprogs-devel-1.35-12.5.el4.i386.rpm
warning: openssl-devel-0.9.7a-43.16.i386.rpm: V3 DSA signature: NOKEY, key ID db42a60e
Preparing…              ########################################### [100%]
   1:e2fsprogs-devel    ########################################### [ 25%]
   2:krb5-devel         ########################################### [ 50%]
   3:zlib-devel         ########################################### [ 75%]
   4:openssl-devel      ########################################### [100%]
Verify whether nagios monitoring server can talk to the remotehost.
[nagios-server]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.3
NRPE v2.12
Note: 192.168.1.3 is the IP address of the remotehost where NRPE and the nagios plugins were installed, as explained in Section II above.
3. Create host and service definition for remotehost
Create a new configuration file /usr/local/nagios/etc/objects/remotehost.cfg to hold the host and service definitions for this particular remotehost. A good starting point is to copy localhost.cfg to remotehost.cfg and modify it according to your needs.
Host definition sample:
define host{
    use linux-server
    host_name remotehost
    alias Remote Host
    address 192.168.1.3
    contact_groups admins
}
Service definition sample:
define service{
    use generic-service
    host_name remotehost
    service_description Root Partition
    contact_groups admins
    check_command check_nrpe!check_disk
}
Note: In all the above examples, replace remotehost with the corresponding hostname of your remotehost.
4. Restart the nagios service
Reload nagios as shown below and log in to the nagios web UI (http://nagios-server/nagios/) to verify the status of the remotehost Linux server that was added to nagios for monitoring.
[nagios-server]# service nagios reload
------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------
How To Monitor Network Switch and Ports Using Nagios
Nagios is hands-down the best tool for monitoring hosts and network equipment. Using Nagios plugins, you can monitor pretty much anything.
I use Nagios intensively and it gives me peace of mind knowing that I will get an alert on my phone, when there is a problem. More than that, if warning levels are setup properly, Nagios will proactively alert you before a problem becomes critical.
In this article, I’ll explain how to configure Nagios to monitor a network switch and its active ports.
1. Enable switch.cfg in nagios.cfg
Uncomment the switch.cfg line in /usr/local/nagios/etc/nagios.cfg as shown below.
[nagios-server]# grep switch.cfg /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/switch.cfg
2. Add new hostgroup for switches in switch.cfg
Add the following switches hostgroup to the /usr/local/nagios/etc/objects/switch.cfg file.
define hostgroup{
    hostgroup_name switches
    alias Network Switches
}
3. Add a new host for the switch to be monitored
In this example, I’ve defined a host to monitor the core switch in the /usr/local/nagios/etc/objects/switch.cfg file. Change the address directive to the IP address of your switch.
define host{
    use generic-switch
    host_name core-switch
    alias Cisco Core Switch
    address 192.168.1.50
    hostgroups switches
}
4. Add common services for all switches
Displaying the uptime of the switch and verifying whether switch is alive are common services for all switches. So, define these services under the switches hostgroup_name as shown below.
# Service definition to ping the switch using check_ping
define service{
    use generic-service
    hostgroup_name switches
    service_description PING
    check_command check_ping!200.0,20%!600.0,60%
    normal_check_interval 5
    retry_check_interval 1
}
# Service definition to monitor switch uptime using check_snmp
define service{
    use generic-service
    hostgroup_name switches
    service_description Uptime
    check_command check_snmp!-C public -o sysUpTime.0
}
5. Add service to monitor port bandwidth usage
check_local_mrtgtraf uses the Multi Router Traffic Grapher (MRTG), so you need to install MRTG for this to work properly. The *.log file mentioned below should point to the MRTG log file on your system.
define service{
    use generic-service
    host_name core-switch
    service_description Port 1 Bandwidth Usage
    check_command check_local_mrtgtraf!/var/lib/mrtg/192.168.1.11_1.log!AVG!1000000,2000000!5000000,5000000!10
}
6. Add service to monitor an active switch port
Use check_snmp to monitor a specific port as shown below. The following two services monitor port#1 and port#5. To add additional ports, change the value of ifOperStatus.n accordingly, where n is the port#.
# Monitor status of port number 1 on the Cisco core switch
define service{
    use generic-service
    host_name core-switch
    service_description Port 1 Link Status
    check_command check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
}
# Monitor status of port number 5 on the Cisco core switch
define service{
    use generic-service
    host_name core-switch
    service_description Port 5 Link Status
    check_command check_snmp!-C public -o ifOperStatus.5 -r 1 -m RFC1213-MIB
}
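When many ports need the same check, the per-port definitions can be generated instead of typed by hand. The following Python sketch prints one service block per port number, using the same directives as the two definitions above. The function name and its defaults are hypothetical; review the generated text before pasting it into switch.cfg.

```python
def port_status_service(port, host_name="core-switch", community="public"):
    """Build a 'Port N Link Status' service definition as text.

    Hypothetical helper: it only varies the port number in ifOperStatus.n.
    """
    check = (
        f"check_snmp!-C {community} -o ifOperStatus.{port} "
        "-r 1 -m RFC1213-MIB"
    )
    lines = [
        f"# Monitor status of port number {port} on {host_name}",
        "define service{",
        "    use generic-service",
        f"    host_name {host_name}",
        f"    service_description Port {port} Link Status",
        f"    check_command {check}",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Generate definitions for ports 1 and 5, as in the article.
    print("\n".join(port_status_service(p) for p in (1, 5)))
```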
7. Add services to monitor multiple switch ports together
Sometimes you may need to monitor the status of multiple ports together, i.e. Nagios should send you an alert even if only one of the ports is down. In this case, define the following service to monitor multiple ports.
# Monitor ports 1 - 6 on the Cisco core switch.
define service{
    use generic-service
    host_name core-switch
    service_description Ports 1-6 Link Status
    check_command check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB, -o ifOperStatus.2 -r 1 -m RFC1213-MIB, -o ifOperStatus.3 -r 1 -m RFC1213-MIB, -o ifOperStatus.4 -r 1 -m RFC1213-MIB, -o ifOperStatus.5 -r 1 -m RFC1213-MIB, -o ifOperStatus.6 -r 1 -m RFC1213-MIB
}
8. Validate configuration and restart nagios
Verify the nagios configuration to make sure there are no warnings and errors.
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
Restart the nagios server to start monitoring the switch.
# /etc/rc.d/init.d/nagios stop
Stopping nagios: .done.
# /etc/rc.d/init.d/nagios start
Starting nagios: done.
Verify the status of the switch from the Nagios web UI: http://{nagios-server}/nagios as shown below:
Fig: Nagios GUI displaying status of a Network Switch
9. Troubleshooting
Issue1: Nagios GUI displays “check_mrtgtraf: Unable to open MRTG log file” error message for the Port bandwidth usage
Solution1: make sure the *.log file defined in the check_local_mrtgtraf service is pointing to the correct location.
Issue2: Nagios UI displays “Return code of 127 is out of bounds – plugin may be missing” error message for Port Link Status.
Solution2: Make sure both the net-snmp and net-snmp-utils packages are installed. In my case, I was missing the net-snmp-utils package and installing it resolved this issue as shown below.
[nagios-server]# rpm -qa | grep net-snmp
net-snmp-libs-5.1.2-11.el4_6.11.2
net-snmp-5.1.2-11.el4_6.11.2
[nagios-server]# rpm -ivh net-snmp-utils-5.1.2-11.EL4.10.i386.rpm
Preparing... ########################################### [100%]
1:net-snmp-utils ########################################### [100%]
[nagios-server]# rpm -qa | grep net-snmp
net-snmp-libs-5.1.2-11.el4_6.11.2
net-snmp-5.1.2-11.el4_6.11.2
net-snmp-utils-5.1.2-11.EL4.10
------------------------------------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ --------------------------
4 Steps to Define Nagios Contacts With Email and Pager Notification
Nagios is one of the best open source server and network monitoring solutions available. Using the flexible nagios framework, you can monitor pretty much anything (including databases and custom applications). This article explains, in 4 simple steps, how to set up contact definitions for the people who will be notified when a host or service has any issues.
1. Define Generic Contact Template in templates.cfg
Nagios installation gives a default generic contact template that can be used as a reference to build your contacts. Please note that all the directives mentioned in the generic-contact template below are mandatory. So, if you’ve decided not to use the generic-contact template definition in your contacts, you should define all these mandatory definitions inside your contacts yourself.
The following generic-contact is already available under /usr/local/nagios/etc/objects/templates.cfg. Also, the templates.cfg is included in the nagios.cfg by default as shown below. Please note that any of the directives mentioned in the templates.cfg can be overridden when you define a real contact using this generic-contact template.
# grep templates /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
Note: generic-contact is available under /usr/local/nagios/etc/objects/templates.cfg
define contact{
    name generic-contact
    service_notification_period 24x7
    host_notification_period 24x7
    service_notification_options w,u,c,r,f,s
    host_notification_options d,u,r,f,s
    service_notification_commands notify-service-by-email
    host_notification_commands notify-host-by-email
    register 0
}
- name – This defines the name of the contact template (generic-contact).
- service_notification_period – This defines when nagios can send notification about services issues (for example, Apache down). By default this is the 24x7 timeperiod, which is defined under /usr/local/nagios/etc/objects/timeperiods.cfg
- host_notification_period – This defines when nagios can send notification about host issues (for example, server crashed). By default, this is the 24x7 timeperiod.
- service_notification_options – This defines the type of service notification that can be sent out. By default this defines all possible service states including flapping events. This also includes the scheduled service downtime activities.
- host_notification_options – This defines the type of host notifications that can be sent out. By default this defines all possible host states including flapping events. This also includes the scheduled host downtime activities.
- service_notification_commands – By default this defines that the contact should get notification about service issues (for example, database down) via email. You can also define additional commands and add it to this directive. For example, you can define your own notify-service-by-sms command.
- host_notification_commands – By default this defines that the contact should get notification about host issues (for example, host down) via email. You can also define additional commands and add it to this directive. For example, you can define your own notify-host-by-sms command.
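The single-letter values of service_notification_options and host_notification_options in the template above can be decoded with a small lookup table. The sketch below uses the standard Nagios meanings of each letter; the function and table names are hypothetical and exist only for this illustration.

```python
# Meaning of each flag in service_notification_options.
SERVICE_OPTIONS = {
    "w": "warning",
    "u": "unknown",
    "c": "critical",
    "r": "recovery",
    "f": "flapping start/stop",
    "s": "scheduled downtime start/end",
    "n": "none",
}

# Meaning of each flag in host_notification_options.
HOST_OPTIONS = {
    "d": "down",
    "u": "unreachable",
    "r": "recovery",
    "f": "flapping start/stop",
    "s": "scheduled downtime start/end",
    "n": "none",
}


def describe_options(value, table):
    """Translate a comma-separated option string into readable names."""
    return [table[flag.strip()] for flag in value.split(",")]


if __name__ == "__main__":
    print("service:", describe_options("w,u,c,r,f,s", SERVICE_OPTIONS))
    print("host:   ", describe_options("d,u,r,f,s", HOST_OPTIONS))
```

Note that "u" means unknown for a service but unreachable for a host, which is why the two directives need separate tables.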
2. Define Individual Contacts in contacts.cfg
Once you’ve confirmed that the generic-contact template is defined properly, you can start defining individual contacts for all the people in your organization who might ever receive notifications from nagios. Please note that defining a contact does not by itself mean that they’ll get notifications. Later you have to associate the contact with either a service or a host definition, as explained in the later sections. So, feel free to define all possible contacts here (for example, Developers, DBAs, Sysadmins, IT-Manager, Customer Service Manager, Top Management etc.).
Note: Define these contacts in /usr/local/nagios/etc/objects/contacts.cfg
define contact{
    contact_name sgupta
    use generic-contact
    alias Sanjay Gupta (Developer)
    email sgupta@sureshkumarpakalapati.in
    pager 333-333@pager.sureshkumarpakalapati.in
}
define contact{
    contact_name jbourne
    use generic-contact
    alias Jason Bourne (Sysadmin)
    email jbourne@sureshkumarpakalapati.in
}
3. Define Contact Groups with Multiple Contacts in contacts.cfg
Once you’ve defined the individual contacts, you can also group them together to send the appropriate notifications. For example, only DBAs need to be notified when the database service goes down, so a db-admins group may be required. Similarly, maybe only Unix system administrators need to be notified when Apache goes down, so a unix-admins group may be required. Feel free to define as many groups as you think are required. Later you can use these groups in the individual service and host definitions.
Note: Define contact groups in /usr/local/nagios/etc/objects/contacts.cfg
define contactgroup{
    contactgroup_name db-admins
    alias Database Administrators
    members jsmith, jdoe, mraj
}
define contactgroup{
    contactgroup_name unix-admins
    alias Linux System Administrator
    members jbourne, dpatel, mshankar
}
4. Attach Contact Groups or Individual Contacts to Service and Host Definitions
Once you’ve defined the individual contacts and contact groups, it is time to start attaching them to a specific host or service definition as shown below.
Note: Following host is defined under /usr/local/nagios/etc/objects/servers/email-server.cfg. This can be any host definition file.
define host{
    use linux-server
    host_name email-server
    alias Corporate Email Server
    address 192.168.1.14
    contact_groups unix-admins
}
Note: Following is defined under /usr/local/nagios/etc/objects/servers/db-server.cfg. This can be any host definition file.
define service{
    use generic-service
    host_name prod-db
    service_description CPU Load
    contact_groups unix-admins
    check_command check_nrpe!check_load
}
define service{
    use generic-service
    host_name prod-db
    service_description MySQL Database Status
    contact_groups db-admins
    check_command check_mysql_db
}
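To see who ends up being notified for a given service, the relationship between services, contact groups and contacts can be modelled as two lookups. The Python sketch below copies the group members and service names from the definitions above into plain dictionaries; the function name is hypothetical, and Nagios itself resolves this from the object files rather than from code like this.

```python
# Contact groups and members, as defined in contacts.cfg above.
CONTACT_GROUPS = {
    "db-admins": ["jsmith", "jdoe", "mraj"],
    "unix-admins": ["jbourne", "dpatel", "mshankar"],
}

# (host_name, service_description) -> contact_groups attached to it.
SERVICE_CONTACT_GROUPS = {
    ("prod-db", "CPU Load"): ["unix-admins"],
    ("prod-db", "MySQL Database Status"): ["db-admins"],
}


def contacts_for(host_name, service_description):
    """Return the sorted contact names attached to one service."""
    groups = SERVICE_CONTACT_GROUPS.get((host_name, service_description), [])
    members = set()
    for group in groups:
        members.update(CONTACT_GROUPS.get(group, []))
    return sorted(members)


if __name__ == "__main__":
    for key in SERVICE_CONTACT_GROUPS:
        print(key, "->", contacts_for(*key))
```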
------------------------------------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ -------------------------
How To Monitor VPN Active Sessions and Temperature Using Nagios
Previously we discussed how to use Nagios to monitor a Linux and Windows server. In this article, let us review how to monitor active sessions and temperature of a VPN device using Nagios. You can monitor pretty much anything about a piece of hardware using the nagios check_snmp plug-in.
1. Identify a cfg file to define host, hostgroup and services for VPN device
You can either create a new vpn.cfg file or re-use one of the existing .cfg files. In this article, I’ve added the VPN service and hostgroup definition to an existing switch.cfg file. Make sure the switch.cfg line in the nagios.cfg file is not commented out, as shown below.
# grep switch.cfg /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/switch.cfg
2. Add new hostgroup for VPN device in switch.cfg
Add the following ciscovpn hostgroup to the /usr/local/nagios/etc/objects/switch.cfg file.
define hostgroup{
    hostgroup_name ciscovpn
    alias Cisco VPN Concentrator
}
3. Add new host for VPN device in switch.cfg
In this example, I’ve defined two hosts in the /usr/local/nagios/etc/objects/switch.cfg file: one for the primary and another for the secondary Cisco VPN concentrator. Change the address directive to the IP address of your VPN device.
define host{
    use generic-host
    host_name cisco-vpn-primary
    alias Cisco VPN Concentrator Primary
    address 192.168.1.7
    check_command check-host-alive
    max_check_attempts 10
    notification_interval 120
    notification_period 24x7
    notification_options d,r
    contact_groups admins
    hostgroups ciscovpn
}
define host{
    use generic-host
    host_name cisco-vpn-secondary
    alias Cisco VPN Concentrator Secondary
    address 192.168.1.9
    check_command check-host-alive
    max_check_attempts 10
    notification_interval 120
    notification_period 24x7
    notification_options d,r
    contact_groups admins
    hostgroups ciscovpn
}
4. Add new services to monitor VPN active sessions and temperature in switch.cfg
Add the “Temperature” service and “Active VPN Sessions” service to the /usr/local/nagios/etc/objects/switch.cfg file.
define service{
    use generic-service
    hostgroup_name ciscovpn
    service_description Temperature
    is_volatile 0
    check_period 24x7
    max_check_attempts 4
    normal_check_interval 10
    retry_check_interval 2
    contact_groups admins
    notification_interval 960
    notification_period 24x7
    check_command check_snmp!-l Temperature -o .1.3.6.1.4.1.3076.2.1.2.22.1.29.0,.1.3.6.1.4.1.3076.2.1.2.22.1.33.0 -w 37,:40 -c :40,:45
}
define service{
    use generic-service
    hostgroup_name ciscovpn
    service_description Active VPN Sessions
    is_volatile 0
    check_period 24x7
    max_check_attempts 4
    normal_check_interval 5
    retry_check_interval 1
    contact_groups admins
    notification_interval 960
    notification_period 24x7
    check_command check_snmp!-l ActiveSessions -o 1.3.6.1.4.1.3076.2.1.2.17.1.7.0,1.3.6.1.4.1.3076.2.1.2.17.1.9.0 -w :70,:8 -c :75,:10
}
5. Validate the check_snmp from command line
The check_snmp plug-in uses the ‘snmpget’ command from the NET-SNMP package. Make sure net-snmp is installed on your system as shown below. If not, download it from the NET-SNMP website.
# rpm -qa | grep -i net-snmp
net-snmp-libs-5.1.2-11.el4_6.11.2
net-snmp-5.1.2-11.el4_6.11.2
net-snmp-utils-5.1.2-11.EL4.10
Make sure check_snmp works from the command line as shown below.
# /usr/local/nagios/libexec/check_snmp -H 192.168.1.7 \
  -P 2c -l Temperature -w :35,:40 -c :40,:45 \
  -o .1.3.6.1.4.1.3076.2.1.2.22.1.29.0,.1.3.6.1.4.1.3076.2.1.2.22.1.33.0
Temperature OK - 35 38 | iso.3.6.1.4.1.3076.2.1.2.22.1.29.0=35 iso.3.6.1.4.1.3076.2.1.2.22.1.33.0=38
# /usr/local/nagios/libexec/check_snmp -H 192.168.1.7 \
  -P 2c -l ActiveSessions -w :80,:40 -c :100,:50 \
  -o 1.3.6.1.4.1.3076.2.1.2.17.1.7.0,1.3.6.1.4.1.3076.2.1.2.17.1.9.0
ActiveSessions CRITICAL - *110* 20 | iso.3.6.1.4.1.3076.2.1.2.17.1.7.0=110 iso.3.6.1.4.1.3076.2.1.2.17.1.9.0=20
In this example, the following parameters are passed to check_snmp:
- -H, --hostname=ADDRESS Host name, IP Address, or unix socket (must be an absolute path)
- -P, --protocol=[1|2c|3] SNMP protocol version
- -l, --label=STRING Prefix label for output from plugin, i.e. Temperature or ActiveSessions
- -w, --warning=INTEGER_RANGE(s) Range(s) which will not result in a WARNING status
- -c, --critical=INTEGER_RANGE(s) Range(s) which will not result in a CRITICAL status
- -o, --oid=OID(s) Object identifier(s) or SNMP variables whose value you wish to query. Make sure to refer to the manual of your device to see all the supported and available OIDs for your equipment. If you have more than one OID, separate them with commas.
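The way the -w and -c ranges map to a status can be illustrated with a short Python sketch. It is a simplified reading modelled on the Nagios plugin range convention, under two assumptions: an empty lower bound (as in ":40") is treated as 0, and each comma-separated range is applied to the OID in the same position. It is not the parser check_snmp actually uses, and it ignores the "~" and "@" forms.

```python
import math


def parse_range(text):
    """Parse 'lo:hi', ':hi', 'lo:' or 'hi' into (lo, hi) bounds."""
    if ":" in text:
        lo_text, hi_text = text.split(":", 1)
    else:
        lo_text, hi_text = "", text
    lo = float(lo_text) if lo_text else 0.0       # assumption: empty lower bound is 0
    hi = float(hi_text) if hi_text else math.inf  # empty upper bound is unbounded
    return lo, hi


def in_range(value, text):
    """True when the value lies inside the range, i.e. no alert."""
    lo, hi = parse_range(text)
    return lo <= value <= hi


def evaluate(values, warning, critical):
    """Return OK, WARNING or CRITICAL for one value per queried OID."""
    state = "OK"
    for value, warn, crit in zip(values, warning.split(","), critical.split(",")):
        if not in_range(value, crit):
            return "CRITICAL"
        if not in_range(value, warn):
            state = "WARNING"
    return state


if __name__ == "__main__":
    # Values and thresholds taken from the two command-line runs above.
    print("Temperature", evaluate([35, 38], ":35,:40", ":40,:45"))
    print("ActiveSessions", evaluate([110, 20], ":80,:40", ":100,:50"))
```

With the values from the two runs above, the sketch agrees with the plugin output: the temperature readings stay inside both ranges, while 110 active sessions falls outside the critical range of 0 to 100.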
6. Validate configuration and restart nagios
Verify the nagios configuration to make sure there are no warnings and errors.
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
Restart the nagios server to start monitoring the VPN device.
# /etc/rc.d/init.d/nagios stop
Stopping nagios: .done.
# /etc/rc.d/init.d/nagios start
Starting nagios: done.
Verify the status of the ActiveSessions and Temperature of the VPN device from the Nagios web UI (http://{nagios-server}/
Fig – Nagios Web UI showing VPN Device Status
7. Troubleshooting
Issue: check_snmp works without any issues from the Linux command line, but the Nagios web UI displays the following error:
Status Information: SNMP problem - No data received from host CMD: /usr/bin/snmpget -t 1 -r 5 -m '' -v 1 [authpriv] 192.168.1.7:161
Solution: Make sure the check_command definition for the check_snmp plugin in the switch.cfg file is properly defined. The arguments to the check_snmp command should match the check_snmp definition in the /usr/local/nagios/etc/
check_command check_snmp!Temperature!.1.3.6.1.4.1.3076.2.1.2.22.1.29.0,.1.3.6.1.4.1.3076.2.1.2.22.1.33.0!37,:40!:40,:45
[Note: This is wrong, as it is passing 4 arguments to the check_snmp command. The value after each exclamation mark is considered as one argument: !{argument1}!{argument2}]
check_command check_snmp!-l Temperature -o .1.3.6.1.4.1.3076.2.1.2.22.1.29.0,.1.3.6.1.4.1.3076.2.1.2.22.1.33.0 -w 37,:40 -c :40,:45
[Note: This is correct, as it is passing 1 argument to the check_snmp command. The value after the exclamation mark is considered as one argument: !{argument1}]
In the check_snmp command definition shown below, there is only one $ARG1$ argument. So, in the switch.cfg, while defining the check_snmp, you need to pass only one argument as shown above.
# 'check_snmp' command definition
define command{
    command_name check_snmp
    command_line $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}
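The effect of the exclamation marks can be reproduced with a few lines of Python. This sketch splits a check_command value on "!" and substitutes the pieces into the command_line as $ARG1$, $ARG2$ and so on. It assumes $USER1$ points to /usr/local/nagios/libexec and ignores escaped exclamation marks; the function name is hypothetical and this is not how Nagios is implemented internally.

```python
COMMAND_LINE = "$USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$"

OIDS = ".1.3.6.1.4.1.3076.2.1.2.22.1.29.0,.1.3.6.1.4.1.3076.2.1.2.22.1.33.0"
WRONG = f"check_snmp!Temperature!{OIDS}!37,:40!:40,:45"
CORRECT = f"check_snmp!-l Temperature -o {OIDS} -w 37,:40 -c :40,:45"


def expand_check_command(check_command, command_line, host_address,
                         user1="/usr/local/nagios/libexec"):
    """Split on '!' and fill $ARGn$ macros; unused arguments are dropped."""
    name, *args = check_command.split("!")
    expanded = command_line.replace("$USER1$", user1)
    expanded = expanded.replace("$HOSTADDRESS$", host_address)
    for index, arg in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", arg)
    return name, args, expanded


if __name__ == "__main__":
    for label, value in (("wrong", WRONG), ("correct", CORRECT)):
        _, args, line = expand_check_command(value, COMMAND_LINE, "192.168.1.7")
        print(f"{label}: {len(args)} argument(s)")
        print(f"  {line}")
```

In the wrong form only the first piece, "Temperature", reaches the plugin because the command definition references $ARG1$ alone, so the OIDs and thresholds are silently lost.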
Recommended Reading
These are the two best books that cover the latest Nagios 3. I strongly recommend that you read both of these books to gain a detailed understanding of Nagios.
http://nagios.sourceforge.net/
- Nagios and the plugins will be installed underneath /usr/local/nagios
- Nagios will be configured to monitor a few aspects of your local system (CPU load, disk usage, etc.)
- The Nagios web interface will be accessible at http://localhost/nagios/
- Apache
- PHP
- GCC compiler
- GD development libraries
yum install httpd php
yum install gcc glibc glibc-common
yum install gd gd-devel
1) Create Account Information
Become the root user.
su -l
Create a new nagios user account and give it a password.
/usr/sbin/useradd -m nagios
passwd nagios
Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.
/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd apache
2) Download Nagios and the Plugins
Create a directory for storing the downloads.
mkdir ~/downloads
cd ~/downloads
Download the source code tarballs of both Nagios and the Nagios plugins (visit http://www.nagios.org/
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.3.tar.gz
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.11.tar.gz
3) Compile and Install Nagios
Extract the Nagios source code tarball.
cd ~/downloads
tar xzf nagios-3.2.3.tar.gz
cd nagios-3.2.3
Run the Nagios configure script, passing the name of the group you created earlier like so:
./configure --with-command-group=nagcmd
Compile the Nagios source code.
make all
Install binaries, init script, sample config files and set permissions on the external command directory.
make install
make install-init
make install-config
make install-commandmode
Don't start Nagios yet - there's still more that needs to be done...
4) Customize Configuration
Sample configuration files have now been installed in the /usr/local/nagios/etc directory. These sample files should work fine for getting started with Nagios. You'll need to make just one change before you proceed...
Edit the /usr/local/nagios/etc/objects/
vi /usr/local/nagios/etc/objects/contacts.cfg
5) Configure the Web Interface
Install the Nagios web config file in the Apache conf.d directory.
make install-webconf
Create a nagiosadmin account for logging into the Nagios web interface. Remember the password you assign to this account - you'll need it later.
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Restart Apache to make the new settings take effect.
service httpd restart
Note: Consider implementing the enhanced CGI security measures described here to ensure that your web authentication credentials are not compromised.
6) Compile and Install the Nagios Plugins
Extract the Nagios plugins source code tarball.
cd ~/downloads
tar xzf nagios-plugins-1.4.11.tar.gz
cd nagios-plugins-1.4.11
Compile and install the plugins.
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
7) Start Nagios
Add Nagios to the list of system services and have it automatically start when the system boots.
chkconfig --add nagios
chkconfig nagios on
Verify the sample Nagios configuration files.
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors, start Nagios.
service nagios start
8) Modify SELinux Settings
Fedora ships with SELinux (Security Enhanced Linux) installed and in Enforcing mode by default. This can result in "Internal Server Error" messages when you attempt to access the Nagios CGIs.
See if SELinux is in Enforcing mode.
getenforce
Put SELinux into Permissive mode.
setenforce 0
To make this change permanent, you'll have to modify the settings in /etc/selinux/config and reboot.
Instead of disabling SELinux or setting it to permissive mode, you can use the following command to run the CGIs under SELinux enforcing/targeted mode:
chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
chcon -R -t httpd_sys_content_t /usr/local/nagios/share/
For information on running the Nagios CGIs under Enforcing mode with a targeted policy, visit the Nagios Support Portal or Nagios Community Wiki.
9) Login to the Web Interface
You should now be able to access the Nagios web interface at the URL below. You'll be prompted for the username (nagiosadmin) and password you specified earlier.
http://localhost/nagios/
Click on the "Service Detail" navbar link to see details of what's being monitored on your local machine. It will take a few minutes for Nagios to check all the services associated with your machine, as the checks are spread out over time.
10) Other Modifications
Make sure your machine's firewall rules are configured to allow access to the web server if you want to access the Nagios interface remotely.
Configuring email notifications is out of the scope of this documentation. While Nagios is currently configured to send you email notifications, your system may not yet have a mail program properly installed or configured. Refer to your system documentation, search the web, or look to the Nagios Support Portal or Nagios Community Wiki for specific instructions on configuring your system to send email messages to external addresses. More information on notifications can be found here.
11) You're Done
Congratulations! You successfully installed Nagios. Your journey into monitoring is just beginning. You'll no doubt want to monitor more than just your local machine, so check out the following docs...
Security Considerations
Introduction
This is intended to be a brief overview of some things you should keep in mind when installing Nagios, so as to set it up in a secure manner.
Your monitoring box should be viewed as a backdoor into your other systems. In many cases, the Nagios server might be allowed access through firewalls in order to monitor remote servers. In almost all cases, it is allowed to query those remote servers for various information. Monitoring servers are always given a certain level of trust in order to query remote systems. This presents a potential attacker with an attractive backdoor to your systems. An attacker might have an easier time getting into your other systems if they compromise the monitoring server first. This is particularly true if you are making use of shared SSH keys in order to monitor remote systems.
If an intruder has the ability to submit check results or external commands to the Nagios daemon, they have the potential to submit bogus monitoring data, drive you nuts with bogus notifications, or cause event handler scripts to be triggered. If you have event handler scripts that restart services, cycle power, etc. this could be particularly problematic.
Another area of concern is the ability for intruders to sniff monitoring data (status information) as it comes across the wire. If communication channels are not encrypted, attackers can gain valuable information by watching your monitoring information. Take as an example the following situation: An attacker captures monitoring data on the wire over a period of time and analyzes the typical CPU and disk load usage of your systems, along with the number of users that are typically logged into them. The attacker is then able to determine the best time to compromise a system and use its resources (CPU, etc.) without being noticed.
Here are some tips to help ensure that you keep your systems secure when implementing a Nagios-based monitoring solution...
Best Practices
- Use a Dedicated Monitoring Box. I would recommend that you install Nagios on a server that is dedicated to monitoring (and possibly other admin tasks). Protect your monitoring server as if it were one of the most important servers on your network. Keep running services to a minimum and lock down access to it via TCP wrappers, firewalls, etc. Since the Nagios server is allowed to talk to your servers and may be able to poke through your firewalls, allowing users access to your monitoring server can be a security risk. Remember, it's always easier to gain root access through a system security hole if you have a local account on a box.
- Don't Run Nagios As Root. Nagios doesn't need to run as root, so don't do it. You can tell Nagios to drop privileges after startup and run as another user/group by using the nagios_user and nagios_group directives in the main config file. If you need to execute event handlers or plugins which require root access, you might want to try using sudo.
- Lock Down The Check Result Directory. Make sure that only the nagios user is able to read/write in the check result path. If users other than nagios (or root) are able to write to this directory, they could send fake host/service check results to the Nagios daemon. This could result in annoyances (bogus notifications) or security problems (event handlers being kicked off).
- Lock Down The External Command File. If you enable external commands, make sure you set proper permissions on the /usr/local/nagios/var/rw directory. You only want the Nagios user (usually nagios) and the web server user (usually nobody, httpd, apache2, or www-data) to have permissions to write to the command file. If you've installed Nagios on a machine that is dedicated to monitoring and admin tasks and is not used for public accounts, that should be fine. If you've installed it on a public or multi-user machine (not recommended), allowing the web server user to have write access to the command file can be a security problem. After all, you don't want just any user on your system controlling Nagios through the external command file. In this case, I would suggest only granting write access on the command file to the nagios user and using something like CGIWrap to run the CGIs as the nagios user instead of nobody.
- Require Authentication In The CGIs. I would strongly suggest requiring authentication for accessing the CGIs. Once you do that, read the documentation on the default rights that authenticated contacts have, and only authorize specific contacts for additional rights as necessary. Instructions on setting up authentication and configuring authorization rights can be found here. If you disable the CGI authentication features using the use_authentication directive in the CGI config file, the command CGI will refuse to write any commands to the external command file. After all, you don't want the world to be able to control Nagios do you?
- Implement Enhanced CGI Security Measures. I would strongly suggest that you consider implementing enhanced security measures for the CGIs as described here. These measures can help ensure that the username/password you use to access the Nagios web interface are not intercepted by third parties.
- Use Full Paths In Command Definitions. When you define commands, make sure you specify the full path (not a relative one) to any scripts or binaries you're executing.
- Hide Sensitive Information With $USERn$ Macros. The CGIs read the main config file and object config file(s), so you don't want to keep any sensitive information (usernames, passwords, etc) in there. If you need to specify a username and/or password in a command definition use a $USERn$ macro to hide it. $USERn$ macros are defined in one or more resource files. The CGIs will not attempt to read the contents of resource files, so you can set more restrictive permissions (600 or 660) on them. See the sample resource.cfg file in the base of the Nagios distribution for an example of how to define $USERn$ macros.
- Strip Dangerous Characters From Macros. Use the illegal_macro_output_chars directive to strip dangerous characters from the $HOSTOUTPUT$, $SERVICEOUTPUT$, $HOSTPERFDATA$, and $SERVICEPERFDATA$ macros before they're used in notifications, etc. Dangerous characters can be anything that might be interpreted by the shell, thereby opening a security hole. An example of this is the presence of backtick (`) characters in the $HOSTOUTPUT$, $SERVICEOUTPUT$, $HOSTPERFDATA$, and/or $SERVICEPERFDATA$ macros, which could allow an attacker to execute an arbitrary command as the nagios user (one good reason not to run Nagios as the root user).
- Secure Access to Remote Agents. Make sure you lock down access to agents (NRPE, NSClient, SNMP, etc.) on remote systems using firewalls, access lists, etc. You don't want everyone to be able to query your systems for status information. This information could be used by an attacker to execute remote event handler scripts or to determine the best times to go unnoticed.
- Secure Communication Channels. Make sure you encrypt communication channels between different Nagios installations and between your Nagios servers and your monitoring agents whenever possible. You don't want someone to be able to sniff status information going across your network. This information could be used by an attacker to determine the best times to go unnoticed.
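To make the "Strip Dangerous Characters From Macros" item above more concrete, here is a minimal Python sketch of what stripping a set of shell-sensitive characters from plugin output amounts to. The character set and the function name are assumptions chosen for illustration. Nagios does this internally based on the illegal_macro_output_chars directive, and this sketch is not its actual implementation.

```python
# Hypothetical set of characters to strip. In Nagios the real set comes from
# the illegal_macro_output_chars directive in the main config file.
ILLEGAL_CHARS = "`~$&|'\"<>"


def strip_illegal_chars(output: str, illegal: str = ILLEGAL_CHARS) -> str:
    """Return output with every character listed in `illegal` removed."""
    return "".join(ch for ch in output if ch not in illegal)


if __name__ == "__main__":
    # Backticks in plugin output could otherwise reach a shell via a notification command.
    raw = "DISK OK `touch /tmp/x` done"
    print(strip_illegal_chars(raw))
```

Removing the characters outright (rather than escaping them) keeps the logic simple, at the cost of slightly altering the text that ends up in notifications.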
Monitoring Windows Machines
Introduction
This document describes how you can monitor "private" services and attributes of Windows machines, such as:
- Memory usage
- CPU load
- Disk usage
- Service states
- Running processes
- etc.
- Perform first-time prerequisites
- Install a monitoring agent on the Windows machine
- Create new host and service definitions for monitoring the Windows machine
- Restart the Nagios daemon
- A check_nt command definition has been added to the commands.cfg file. This allows you to use the check_nt plugin to monitor Windows services.
- A Windows server host template (called windows-server) has already been created in the templates.cfg file. This allows you to add new Windows host definitions in a simple manner.
vi /usr/local/nagios/etc/nagios.cfg
Remove the leading pound (#) sign from the following line in the main configuration file:
#cfg_file=/usr/local/nagios/etc/objects/windows.cfg
Save the file and exit.
What did you just do? You told Nagios to look to the /usr/local/nagios/etc/objects/
nsclient++ /install
5. Install the NSClient++ systray with the following command ('SysTray' is case-sensitive):
nsclient++ SysTray
6. Open the services manager and make sure the NSClientpp service is allowed to interact with the desktop (see the 'Log On' tab of the services manager). If it isn't already allowed to interact with the desktop, check the box to allow it to.
7. Edit the NSC.INI file (located in the C:\NSClient++ directory) and make the following changes:
- Uncomment all the modules listed in the [modules] section, except for CheckWMI.dll and RemoteConfiguration.dll
- Optionally require a password for clients by changing the 'password' option in the [Settings] section.
- Uncomment the 'allowed_hosts' option in the [Settings] section. Add the IP address of the Nagios server to this line, or leave it blank to allow all hosts to connect.
- Make sure the 'port' option in the [NSClient] section is uncommented and set to '12489' (the default port).
nsclient++ /start
9. If installed properly, a new icon should appear in your system tray. It will be a yellow circle with a black 'M' inside.
10. Success! The Windows server can now be added to the Nagios monitoring configuration...
Configuring Nagios
Now it's time to define some object definitions in your Nagios configuration files in order to monitor the new Windows machine.
Open the windows.cfg file for editing.
vi /usr/local/nagios/etc/objects/windows.cfg
Add a new host definition for the Windows machine that you're going to monitor. If this is the *first* Windows machine you're monitoring, you can simply modify the sample host definition in windows.cfg. Change the host_name, alias, and address fields to appropriate values for the Windows box.
define host{
use windows-server ; Inherit default values from a Windows server template (make sure you keep this line!)
host_name winserver
alias My Windows Server
address 192.168.1.2
}
Good. Now you can add some service definitions (to the same configuration file) in order to tell Nagios to monitor different aspects of the Windows machine. If this is the *first* Windows machine you're monitoring, you can simply modify the sample service definitions in windows.cfg.
Note: Replace "winserver" in the example definitions below with the name you specified in the host_name directive of the host definition you just added.
Add the following service definition to monitor the version of the NSClient++ addon that is running on the Windows server. This is useful when it comes time to upgrade your Windows servers to a newer version of the addon, as you'll be able to tell which Windows machines still need to be upgraded to the latest version of NSClient++.
define service{
use generic-service
host_name winserver
service_description NSClient++ Version
check_command check_nt!CLIENTVERSION
}
Add the following service definition to monitor the uptime of the Windows server.
define service{
use generic-service
host_name winserver
service_description Uptime
check_command check_nt!UPTIME
}
Add the following service definition to monitor the CPU utilization on the Windows server and generate a CRITICAL alert if the 5-minute CPU load is 90% or more or a WARNING alert if the 5-minute load is 80% or greater.
define service{
use generic-service
host_name winserver
service_description CPU Load
check_command check_nt!CPULOAD!-l 5,80,90
}
Add the following service definition to monitor memory usage on the Windows server and generate a CRITICAL alert if memory usage is 90% or more or a WARNING alert if memory usage is 80% or greater.
define service{
use generic-service
host_name winserver
service_description Memory Usage
check_command check_nt!MEMUSE!-w 80 -c 90
}
Add the following service definition to monitor usage of the C:\ drive on the Windows server and generate a CRITICAL alert if disk usage is 90% or more or a WARNING alert if disk usage is 80% or greater.
define service{
use generic-service
host_name winserver
service_description C:\ Drive Space
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}
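The CPU, memory, and disk definitions above all follow the same pattern: one warning threshold and one critical threshold, both in percent. As a rough Python sketch of that pattern (the function name is hypothetical, and the real check_nt plugin may treat exact boundary values differently), the mapping from a measured value to a state looks like this:

```python
def classify(value: float, warn: float, crit: float) -> str:
    """Map a percentage to a Nagios-style state name.

    Follows the wording used above: CRITICAL at `crit` or more,
    WARNING at `warn` or more, otherwise OK.
    """
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Thresholds taken from the sample definitions: -w 80 -c 90
    for used in (42.0, 85.0, 97.5):
        print(used, classify(used, warn=80, crit=90))
```

Checking the critical threshold first matters: a value of 95 is above both thresholds, and the more severe state should win.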
Add the following service definition to monitor the W3SVC service state on the Windows machine and generate a CRITICAL alert if the service is stopped.
define service{
use generic-service
host_name winserver
service_description W3SVC
check_command check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
}
Add the following service definition to monitor the Explorer.exe process on the Windows machine and generate a CRITICAL alert if the process is not running.
define service{
use generic-service
host_name winserver
service_description Explorer
check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
}
That's it for now. You've added some basic services that should be monitored on the Windows box. Save the configuration file.
Password Protection
If you specified a password in the NSClient++ configuration file on the Windows machine, you'll need to modify the check_nt command definition to include the password. Open the commands.cfg file for editing.
vi /usr/local/nagios/etc/objects/commands.cfg
Change the definition of the check_nt command to include the "-s PASSWORD" argument (where PASSWORD is the password you specified on the Windows machine), like this:
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s PASSWORD -v $ARG1$ $ARG2$
}
Save the file.
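If it isn't obvious how a service's check_command (for example check_nt!CPULOAD!-l 5,80,90) turns into the command line above, the following Python sketch shows the idea: the text before the first "!" names the command, and the remaining pieces fill $ARG1$, $ARG2$, and so on. This is a simplified illustration, not how Nagios itself is implemented. The function name is hypothetical, and the $USER1$ value of /usr/local/nagios/libexec is an assumption about where your plugins are installed.

```python
import re


def expand_check_command(check_command: str, command_lines: dict, macros: dict) -> str:
    """Expand a 'name!arg1!arg2' check_command into a full command line."""
    name, *args = check_command.split("!")
    values = dict(macros)
    # Each "!"-separated piece after the command name becomes $ARG1$, $ARG2$, ...
    for index, arg in enumerate(args, start=1):
        values[f"$ARG{index}$"] = arg
    # Replace every $NAME$ token; macros with no value (e.g. an unused $ARG2$) become empty.
    expanded = re.sub(
        r"\$[A-Z0-9_]+\$",
        lambda match: values.get(match.group(0), ""),
        command_lines[name],
    )
    return expanded.strip()


if __name__ == "__main__":
    commands = {
        "check_nt": "$USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s PASSWORD -v $ARG1$ $ARG2$",
    }
    macros = {
        "$USER1$": "/usr/local/nagios/libexec",  # assumed plugin directory
        "$HOSTADDRESS$": "192.168.1.2",
    }
    print(expand_check_command("check_nt!CPULOAD!-l 5,80,90", commands, macros))
```

Seeing the expanded line is handy when debugging, because you can paste it into a shell on the Nagios server and run the plugin by hand.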
Restarting Nagios
You're done with modifying the Nagios configuration, so you'll need to verify your configuration files and restart Nagios.
If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!
Monitoring Publicly Available Services
Introduction
This document describes how you can monitor publicly available services, applications and protocols. By "public" I mean services that are accessible across the network - either the local network or the greater Internet. Examples of public services include HTTP, POP3, IMAP, FTP, and SSH. There are many more public services that you probably use on a daily basis. These services and applications, as well as their underlying protocols, can usually be monitored by Nagios without any special access requirements.
Private services, in contrast, cannot be monitored with Nagios without an intermediary agent of some kind. Examples of private services associated with hosts are things like CPU load, memory usage, disk usage, current user count, process information, etc. These private services or attributes of hosts are not usually exposed to external clients. This situation requires that an intermediary monitoring agent be installed on any host that you need to monitor such information on. More information on monitoring private services on different types of hosts can be found in the documentation on monitoring Windows machines and Linux/Unix machines.
Tip: Occasionally you will find that information on private services and applications can be monitored with SNMP. The SNMP agent allows you to remotely monitor otherwise private (and inaccessible) information about the host. For more information about monitoring services using SNMP, check out the documentation on monitoring switches and routers.
Note: These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample commands.cfg and localhost.cfg config files.
Plugins For Monitoring Services
When you find yourself needing to monitor a particular application, service, or protocol, chances are good that a plugin exists to monitor it. The official Nagios plugins distribution comes with plugins that can be used to monitor a variety of services and protocols. There are also a large number of contributed plugins that can be found in the contrib/ subdirectory of the plugin distribution. The NagiosExchange.org website hosts a number of additional plugins that have been written by users, so check it out when you have a chance.
If you don't happen to find an appropriate plugin for monitoring what you need, you can always write your own. Plugins are easy to write, so don't let this thought scare you off. Read the documentation on developing plugins for more information.
I'll walk you through monitoring some basic services that you'll probably use sooner or later. Each of these services can be monitored using one of the plugins that gets installed as part of the Nagios plugins distribution. Let's get started...
Creating A Host Definition
Before you can monitor a service, you first need to define a host that is associated with the service. You can place host definitions in any object configuration file specified by a cfg_file directive or placed in a directory specified by a cfg_dir directive. If you have already created a host definition, you can skip this step.
For this example, let's say you want to monitor a variety of services on a remote host. Let's call that host remotehost. The host definition can be placed in its own file or added to an already existing object configuration file. Here's what the host definition for remotehost might look like:
define host{
use generic-host ; Inherit default values from a template
host_name remotehost ; The name we're giving to this host
alias Some Remote Host ; A longer name associated with the host
address 192.168.1.50 ; IP address of the host
hostgroups allhosts ; Host groups this host is associated with
}
Now that a definition has been added for the host that will be monitored, we can start defining services that should be monitored. As with host definitions, service definitions can be placed in any object configuration file.
Creating Service Definitions
For each service you want to monitor, you need to define a service in Nagios that is associated with the host definition you just created. You can place service definitions in any object configuration file specified by a cfg_file directive or placed in a directory specified by a cfg_dir directive.
Some example service definitions for monitoring common public services (HTTP, FTP, etc.) are given below.
Monitoring HTTP
Chances are you're going to want to monitor web servers at some point - either yours or someone else's. The check_http plugin is designed to do just that. It understands the HTTP protocol and can monitor response time, error codes, strings in the returned HTML, server certificates, and much more.
The commands.cfg file contains a command definition for using the check_http plugin. It looks like this:
define command{
name check_http
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}
A simple service definition for monitoring the HTTP service on the remotehost machine might look like this:
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description HTTP
check_command check_http
}
This simple service definition will monitor the HTTP service running on remotehost. It will produce alerts if the web server doesn't respond within 10 seconds or if it returns HTTP error codes (403, 404, etc.). That's all you need for basic monitoring. Pretty simple, huh?
Tip: For more advanced monitoring, run the check_http plugin manually with --help as a command-line argument to see all the options you can give the plugin. This --help syntax works with all of the plugins I'll cover in this document.
A more advanced definition for monitoring the HTTP service is shown below. This service definition will check to see if the /download/index.php URI contains the string "latest-version.tar.gz". It will produce an error if the string isn't found, the URI isn't valid, or the web server takes longer than 5 seconds to respond.
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description Product Download Link
check_command check_http!-u /download/index.php -t 5 -s "latest-version.tar.gz"
}
Monitoring FTP
When you need to monitor FTP servers, you can use the check_ftp plugin. The commands.cfg file contains a command definition for using the check_ftp plugin, which looks like this:
define command{
command_name check_ftp
command_line $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
}
A simple service definition for monitoring the FTP server on remotehost would look like this:
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description FTP
check_command check_ftp
}
This service definition will monitor the FTP service and generate alerts if the FTP server doesn't respond within 10 seconds.
A more advanced service definition is shown below. This service will check the FTP server running on port 1023 on remotehost. It will generate an alert if the server doesn't respond within 5 seconds or if the server response doesn't contain the string "Pure-FTPd [TLS]".
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description Special FTP
check_command check_ftp!-p 1023 -t 5 -e "Pure-FTPd [TLS]"
}
Monitoring SSH
When you need to monitor SSH servers, you can use the check_ssh plugin. The commands.cfg file contains a command definition for using the check_ssh plugin, which looks like this:
define command{
command_name check_ssh
command_line $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
}
A simple service definition for monitoring the SSH server on remotehost would look like this:
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description SSH
check_command check_ssh
}
This service definition will monitor the SSH service and generate alerts if the SSH server doesn't respond within 10 seconds.
A more advanced service definition is shown below. This service will check the SSH server and generate an alert if the server doesn't respond within 5 seconds or if the server version string doesn't match "OpenSSH_4.2".
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description SSH Version Check
check_command check_ssh!-t 5 -r "OpenSSH_4.2"
}
Monitoring SMTP
The check_smtp plugin can be used for monitoring your email servers. The commands.cfg file contains a command definition for using the check_smtp plugin, which looks like this:
define command{
command_name check_smtp
command_line $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
}
A simple service definition for monitoring the SMTP server on remotehost would look like this:
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description SMTP
check_command check_smtp
}
This service definition will monitor the SMTP service and generate alerts if the SMTP server doesn't respond within 10 seconds.
A more advanced service definition is shown below. This service will check the SMTP server and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "mygreatmailserver.com".
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description SMTP Response Check
check_command check_smtp!-t 5 -e "mygreatmailserver.com"
}
Monitoring POP3
The check_pop plugin can be used for monitoring the POP3 service on your email servers. The commands.cfg file contains a command definition for using the check_pop plugin, which looks like this:
define command{
command_name check_pop
command_line $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$
}
A simple service definition for monitoring the POP3 service on remotehost would look like this:
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description POP3
check_command check_pop
}
This service definition will monitor the POP3 service and generate alerts if the POP3 server doesn't respond within 10 seconds.
A more advanced service definition is shown below. This service will check the POP3 service and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "mygreatmailserver.com".
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description POP3 Response Check
check_command check_pop!-t 5 -e "mygreatmailserver.com"
}
Monitoring IMAP
The check_imap plugin can be used for monitoring the IMAP4 service on your email servers. The commands.cfg file contains a command definition for using the check_imap plugin, which looks like this:
define command{
command_name check_imap
command_line $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
}
A simple service definition for monitoring the IMAP4 service on remotehost would look like this:
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description IMAP
check_command check_imap
}
This service definition will monitor the IMAP4 service and generate alerts if the IMAP server doesn't respond within 10 seconds.
A more advanced service definition is shown below. This service will check the IMAP4 service and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "mygreatmailserver.com".
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description IMAP4 Response Check
check_command check_imap!-t 5 -e "mygreatmailserver.com"
}
Restarting Nagios
Once you've added the new host and service definitions to your object configuration file(s), you're ready to start monitoring them. To do this, you'll need to verify your configuration and restart Nagios.
If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!
Monitoring Linux/Unix Machines
Introduction
This document describes how you can monitor "private" services and attributes of Linux/UNIX servers, such as:
- CPU load
- Memory usage
- Disk usage
- Logged in users
- Running processes
- etc.
Monitoring Routers and Switches
Introduction
This document describes how you can monitor the status of network switches and routers. Some cheaper "unmanaged" switches and hubs don't have IP addresses and are essentially invisible on your network, so there's no way to monitor them. More expensive switches and routers have addresses assigned to them and can be monitored by pinging them or using SNMP to query status information.
I'll describe how you can monitor the following things on managed switches, hubs, and routers:
- Packet loss, round trip average
- SNMP status information
- Bandwidth / traffic rate
- Perform first-time prerequisites
- Create new host and service definitions for monitoring the device
- Restart the Nagios daemon
- Two command definitions (check_snmp and check_local_mrtgtraf) have been added to the commands.cfg file. These allow you to use the check_snmp and check_mrtgtraf plugins to monitor network routers.
- A switch host template (called generic-switch) has already been created in the templates.cfg file. This allows you to add new router/switch host definitions in a simple manner.
vi /usr/local/nagios/etc/nagios.cfg
Remove the leading pound (#) sign from the following line in the main configuration file:
#cfg_file=/usr/local/nagios/etc/objects/switch.cfg
Save the file and exit.
What did you just do? You told Nagios to look to the /usr/local/nagios/etc/objects/
vi /usr/local/nagios/etc/objects/switch.cfg
Add a new host definition for the switch that you're going to monitor. If this is the *first* switch you're monitoring, you can simply modify the sample host definition in switch.cfg. Change the host_name, alias, and address fields to appropriate values for the switch.
define host{
use generic-switch ; Inherit default values from a template
host_name linksys-srw224p ; The name we're giving to this switch
alias Linksys SRW224P Switch ; A longer name associated with the switch
address 192.168.1.253 ; IP address of the switch
hostgroups allhosts,switches ; Host groups this switch is associated with
}
Monitoring Services
Now you can add some service definitions (to the same configuration file) to monitor different aspects of the switch. If this is the *first* switch you're monitoring, you can simply modify the sample service definition in switch.cfg.
Note: Replace "linksys-srw224p" in the example definitions below with the name you specified in the host_name directive of the host definition you just added.
Monitoring Packet Loss and RTA
Add the following service definition in order to monitor packet loss and round trip average between the Nagios host and the switch every 5 minutes under normal conditions.
define service{
use generic-service ; Inherit values from a template
host_name linksys-srw224p ; The name of the host the service is associated with
service_description PING ; The service description
check_command check_ping!200.0,20%!600.0,60% ; The command used to monitor the service
normal_check_interval 5 ; Check the service every 5 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until its final/hard state is determined
}
This service will be:
- CRITICAL if the round trip average (RTA) is greater than 600 milliseconds or the packet loss is 60% or more
- WARNING if the RTA is greater than 200 ms or the packet loss is 20% or more
- OK if the RTA is less than 200 ms and the packet loss is less than 20%
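The three rules above can be written down as a small Python sketch. It parses threshold strings in the same "RTA,loss%" form used in the check_command (200.0,20% and 600.0,60%) and applies the rules as listed. The function names are hypothetical, and this is an illustration of the logic described here, not the check_ping plugin's source. Note that the rules above don't say what happens at an RTA of exactly 200 ms; the sketch treats that as OK.

```python
from typing import Tuple


def parse_threshold(spec: str) -> Tuple[float, float]:
    """Split a '200.0,20%' style threshold into (rta_ms, loss_percent)."""
    rta, loss = spec.split(",")
    return float(rta), float(loss.strip().rstrip("%"))


def classify_ping(rta_ms: float, loss_pct: float, warn: str, crit: str) -> str:
    """Apply the CRITICAL / WARNING / OK rules listed above."""
    warn_rta, warn_loss = parse_threshold(warn)
    crit_rta, crit_loss = parse_threshold(crit)
    if rta_ms > crit_rta or loss_pct >= crit_loss:
        return "CRITICAL"
    if rta_ms > warn_rta or loss_pct >= warn_loss:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for rta, loss in ((35.0, 0.0), (250.0, 0.0), (40.0, 60.0)):
        print(rta, loss, classify_ping(rta, loss, "200.0,20%", "600.0,60%"))
```

Either measurement on its own is enough to raise the state, which is why the conditions are joined with "or".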
define service{
use generic-service ; Inherit values from a template
host_name linksys-srw224p
service_description Uptime
check_command check_snmp!-C public -o sysUpTime.0
}
In the check_command directive of the service definition above, the "-C public" tells the plugin that the SNMP community name to be used is "public" and the "-o sysUpTime.0" indicates which OID should be checked.
If you want to ensure that a specific port/interface on the switch is in an up state, you could add a service definition like this:
define service{
use generic-service ; Inherit values from a template
host_name linksys-srw224p
service_description Port 1 Link Status
check_command check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
}
In the example above, the "-o ifOperStatus.1" refers to the OID for the operational status of port 1 on the switch. The "-r 1" option tells the check_snmp plugin to return an OK state if "1" is found in the SNMP result (1 indicates an "up" state on the port) and CRITICAL if it isn't found. The "-m RFC1213-MIB" is optional and tells the check_snmp plugin to only load the "RFC1213-MIB" instead of every single MIB that's installed on your system, which can help speed things up.
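To show what the "-r 1" matching described above boils down to, here is a tiny Python sketch. It treats the pattern as a regular expression searched for anywhere in the SNMP result, and returns OK on a match and CRITICAL otherwise. The function name is hypothetical and the sample result strings are made up for illustration; the exact text check_snmp compares against depends on your device and MIBs.

```python
import re


def snmp_regex_state(result: str, pattern: str) -> str:
    """Return OK if `pattern` is found anywhere in `result`, else CRITICAL."""
    return "OK" if re.search(pattern, result) else "CRITICAL"


if __name__ == "__main__":
    print(snmp_regex_state("1", "1"))        # port reported as up
    print(snmp_regex_state("2", "1"))        # port reported as down
    print(snmp_regex_state("up(1)", "1"))    # textual form that still contains "1"
```

Because the pattern is searched for rather than matched exactly, a result such as "10" would also count as a match. If that matters for the OID you're checking, a stricter pattern like "^1$" is one way to tighten it.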
That's it for the SNMP monitoring example. There are a million things that can be monitored via SNMP, so it's up to you to decide what you need and want to monitor. Good luck!
Tip: You can usually find the OIDs that can be monitored on a switch by running the following command (replace 192.168.1.253 with the IP address of the switch):
snmpwalk -v1 -c public 192.168.1.253 -m ALL .1
Monitoring Bandwidth / Traffic Rate
If you're monitoring bandwidth usage on your switches or routers using MRTG, you can have Nagios alert you when traffic rates exceed thresholds you specify. The check_mrtgtraf plugin (which is included in the Nagios plugins distribution) allows you to do this.
You'll need to let the check_mrtgtraf plugin know what log file the MRTG data is being stored in, along with thresholds, etc. In my example, I'm monitoring one of the ports on a Linksys switch. The MRTG log file is stored in /var/lib/mrtg/192.168.1.253_1.
define service{
use generic-service ; Inherit values from a template
host_name linksys-srw224p
service_description Port 1 Bandwidth Usage
check_command check_local_mrtgtraf!/var/lib/mrtg/192.168.1.253_1.log!AVG!1000000,2000000!5000000,5000000!10
}
In the example above, the "/var/lib/mrtg/192.168.1.253_
Monitoring Network Printers
Introduction
This document describes how you can monitor the status of networked printers. Specifically, HP printers that have internal/external JetDirect cards/devices, or other print servers (like the Troy PocketPro 100S or the Netgear PS101) that support the JetDirect protocol.
The check_hpjd plugin (which is part of the standard Nagios plugins distribution) allows you to monitor the status of JetDirect-capable printers which have SNMP enabled. The plugin is capable of detecting the following printer states:
- Paper Jam
- Out of Paper
- Printer Offline
- Intervention Required
- Toner Low
- Insufficient Memory
- Open Door
- Output Tray is Full
- and more...
- Perform first-time prerequisites
- Create new host and service definitions for monitoring the printer
- Restart the Nagios daemon
- A check_hpjd command definition has been added to the commands.cfg file. This allows you to use the check_hpjd plugin to monitor network printers.
- A printer host template (called generic-printer) has already been created in the templates.cfg file. This allows you to add new printer host definitions in a simple manner.
vi /usr/local/nagios/etc/nagios.cfg
Remove the leading pound (#) sign from the following line in the main configuration file:
#cfg_file=/usr/local/nagios/etc/objects/printer.cfg
Save the file and exit.
What did you just do? You told Nagios to look to the /usr/local/nagios/etc/objects/
vi /usr/local/nagios/etc/objects/printer.cfg
Add a new host definition for the networked printer that you're going to monitor. If this is the *first* printer you're monitoring, you can simply modify the sample host definition in printer.cfg. Change the host_name, alias, and address fields to appropriate values for the printer.
define host{
use generic-printer ; Inherit default values from a template
host_name hplj2605dn ; The name we're giving to this printer
alias HP LaserJet 2605dn ; A longer name associated with the printer
address 192.168.1.30 ; IP address of the printer
hostgroups allhosts ; Host groups this printer is associated with
}
Now you can add some service definitions (to the same configuration file) to monitor different aspects of the printer. If this is the *first* printer you're monitoring, you can simply modify the sample service definition in printer.cfg.
Note: Replace "hplj2605dn" in the example definitions below with the name you specified in the host_name directive of the host definition you just added.
Add the following service definition to check the status of the printer. The service uses the check_hpjd plugin to check the status of the printer every 10 minutes by default. The SNMP community string used to query the printer is "public" in this example.
define service{
use generic-service ; Inherit values from a template
host_name hplj2605dn ; The name of the host the service is associated with
service_description Printer Status ; The service description
check_command check_hpjd!-C public ; The command used to monitor the service
normal_check_interval 10 ; Check the service every 10 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until its final/hard state is determined
}
Add the following service definition to ping the printer every 10 minutes by default. This is useful for monitoring RTA, packet loss, and general network connectivity.
define service{
use generic-service
host_name hplj2605dn
service_description PING
check_command check_ping!3000.0,80%!5000.0,100%
normal_check_interval 10
retry_check_interval 1
}
Save the file.
Restarting Nagios
Once you've added the new host and service definitions to the printer.cfg file, you're ready to start monitoring the printer. To do this, you'll need to verify your configuration and restart Nagios.
If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!
Upgrading Nagios
Upgrading From Previous Nagios 3.x Releases
As newer alpha, beta, and stable releases of Nagios 3.x are released, you should strongly consider upgrading as soon as possible. Newer releases usually contain critical bug fixes, so it's important to stay up to date.
Assuming you've already installed Nagios from source code as described in the quickstart guide, you can install newer versions of Nagios 3.x easily. You don't even need root access to do it, as everything that needed to be done as root was done during the initial install. Here's the upgrade process...
Make sure you have a good backup of your existing Nagios installation and configuration files. If anything goes wrong or doesn't work, this will allow you to roll back to your old version.
Become the nagios user. Debian/Ubuntu users should use sudo -s nagios.
su -l nagios
Remove the following old HTML files that were used by the web frontend. They have been replaced by PHP equivalents.
rm /usr/local/nagios/share/{main,side,index}.html
Download the source code tarball of the latest version of Nagios (visit http://www.nagios.org/).
wget http://osdn.dl.sourceforge.net/sourceforge/nagios/nagios-3.x.tar.gz
Extract the Nagios source code tarball.
tar xzf nagios-3.x.tar.gz
cd nagios-3.x
Run the Nagios configure script, passing the name of the group used to control external command file permissions like so:
./configure --with-command-group=nagcmd
Compile the Nagios source code.
make all
Install updated binaries, documentation, and web interface. Your existing configuration files will not be overwritten by this step.
make install
Verify your configuration files. Correct any errors shown here before proceeding with the next step.
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Restart Nagios. Debian/Ubuntu users should use /etc/init.d/nagios restart.
/sbin/service nagios restart
That's it - you're done!
Upgrading From Nagios 2.x
It shouldn't be too difficult to upgrade from Nagios 2.x to Nagios 3. The upgrade is essentially the same as what is described above for upgrading to newer 3.x releases. You will, however, have to change your configuration files a bit so they work with Nagios 3:
- The old service_reaper_frequency variable in the main config file has been renamed to check_result_reaper_frequency.
- The old $NOTIFICATIONNUMBER$ macro has been deprecated in favor of new $HOSTNOTIFICATIONNUMBER$ and $SERVICENOTIFICATIONNUMBER$ macros.
- The old parallelize directive in service definitions is now deprecated and no longer used, as all service checks are run in parallel.
- The old aggregate_status_updates option has been removed. All status file updates are now aggregated at a minimum interval of 1 second.
- Extended host and extended service definitions have been deprecated. They are still read and processed by Nagios, but it is recommended that you move the directives found in these definitions to your host and service definitions, respectively.
- The old downtime_file file variable in the main config file is no longer supported, as scheduled downtime entries are now saved in the retention file. To preserve existing downtime entries, stop Nagios 2.x and append the contents of your old downtime file to the retention file.
- The old comment_file file variable in the main config file is no longer supported, as comments are now saved in the retention file. To preserve existing comments, stop Nagios 2.x and append the contents of your old comment file to the retention file.
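For the last two items, "append the contents of your old file to the retention file" is a plain file append. A minimal Python sketch is shown below. The function name is hypothetical, and the paths in the usage comment are placeholders, since the actual locations depend on the downtime_file, comment_file, and retention file settings in your own Nagios 2.x configuration. Make sure Nagios is stopped and you have a backup before touching the retention file.

```python
import shutil
from pathlib import Path


def append_file(source: Path, target: Path) -> None:
    """Append the full contents of `source` to the end of `target`."""
    with source.open("rb") as src, target.open("ab") as dst:
        shutil.copyfileobj(src, dst)


# Example usage (placeholder paths, adjust to your own installation):
#   append_file(Path("/path/to/old/downtime.dat"), Path("/path/to/retention.dat"))
#   append_file(Path("/path/to/old/comments.dat"), Path("/path/to/retention.dat"))
```

Opening the target in append mode leaves the existing retention data untouched and only adds the old entries after it.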
- Stop Nagios
- Back up your existing Nagios installation
- Configuration files
- Main config file (usually nagios.cfg)
- Resource config file (usually resource.cfg)
- CGI config file (usually cgi.cfg)
- All your object definition files
- Retention file (usually retention.dat)
- Current Nagios log file (usually nagios.log)
- Archived Nagios log files
- Uninstall the original RPM or APT package
- Install Nagios from source by following the quickstart guide
- Restore your original Nagios configuration files, retention file, and log files
- Verify your configuration and start Nagios