
Configuring Nagios on Ubuntu to Monitor Remote Hosts

February 9, 2018 ⁄ General

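This series of articles records the steps the author followed to install and configure Nagios monitoring. Every step has been tested; corrections are welcome.
About Nagios:

  Nagios is a free, open-source network monitoring tool. It can monitor the state of Windows, Linux, and Unix hosts, network devices such as switches and routers, printers, and more. When a system or service enters an abnormal state, it alerts the site's operations staff by email or SMS right away, and it sends a recovery notice by email or SMS once the state returns to normal.

  This article explains in detail how to configure Nagios on Ubuntu 12.04 Server to monitor a remote host through NRPE.

  The configuration of the main Nagios monitoring server was covered in the previous part: "ubuntu 安装配置 nagios" (Installing and Configuring Nagios on Ubuntu)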
I. Preparation
1. Update the Ubuntu system
sudo apt-get update    
sudo apt-get upgrade


2. Install the basic dependencies:
sudo apt-get install build-essential
sudo apt-get install libssl0.9.8 libssl-dev openssl    # openssl appears to be installed already
sudo apt-get install libgd2-noxpm libgd2-noxpm-dev        
sudo apt-get install apache2    # prevents a "Connection refused" error from check_http
(After the Nagios plugins are installed you can test HTTP with: /usr/local/nagios/libexec/check_http -H 127.0.0.1
Failure output: "Connection refused" followed by "HTTP CRITICAL - Unable to open TCP socket"
After starting Apache with: service apache2 start
check again; the expected output is: HTTP OK: HTTP/1.1 200 OK - 452 bytes in 0.001 second response time |time=0.001221s;;;0.000000 size=452B;;;0 )
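Nagios derives a service's state from the plugin's exit code rather than from the text it prints: 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN. The following is a minimal sketch of a wrapper that runs any plugin and prints the resulting state. The function names nagios_state_name and run_plugin are illustrative and not part of Nagios, and treating any other exit code as UNKNOWN is an assumption of this sketch.

```shell
#!/bin/sh
# Map a Nagios plugin exit code to its state name.
# Codes outside 0-2 are reported as UNKNOWN in this sketch.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run a plugin command line, show its output, then print the state.
# Returns the plugin's own exit code so callers can still test it.
run_plugin() {
    rc=0
    "$@" || rc=$?
    echo "state: $(nagios_state_name "$rc")"
    return "$rc"
}
```

For example, run_plugin /usr/local/nagios/libexec/check_http -H 127.0.0.1 prints the plugin output followed by the state line.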
3. Download the required packages into the /usr/local/src directory
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.15.tar.gz    
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz


4. Add the nagios user and group
groupadd nagios    
useradd -g nagios -s /sbin/nologin nagios
5. Install the Nagios plugins on the monitored machine
tar zxvf nagios-plugins-1.4.15.tar.gz    
cd nagios-plugins-1.4.15     
./configure --with-nagios-user=nagios --with-nagios-group=nagios     
make     
make install    
 
Change the owner and group of the nagios directory
chown -R nagios:nagios /usr/local/nagios/


6. Install NRPE on the monitored machine
tar zxvf nrpe-2.12.tar.gz    
cd nrpe-2.12     
./configure


Error:    checking for SSL libraries... configure: error: Cannot find ssl libraries
Fix: create a simple symbolic link /usr/lib/libssl.so => /usr/lib/x86_64-linux-gnu/libssl.so:
ln -s /usr/lib/x86_64-linux-gnu/libssl.so /usr/lib/libssl.so  
The path /usr/lib/x86_64-linux-gnu/libssl.so may be different on your system; you can look it up with the command whereis ssl. On 32-bit Ubuntu it may be /usr/lib/i386-linux-gnu/libssl.so
Then run again:
./configure
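Because the location of libssl.so depends on the architecture, the lookup before creating the symbolic link can be scripted. The sketch below checks a list of candidate directories and prints the first one that contains libssl.so. The helper name find_libssl_dir and the default candidate list are assumptions made for illustration.

```shell
#!/bin/sh
# Print the first directory that contains libssl.so and return 0,
# or return 1 when none of the candidates has it.
# With no arguments, common Ubuntu multiarch directories are tried.
find_libssl_dir() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/lib/x86_64-linux-gnu /usr/lib/i386-linux-gnu /usr/lib
    fi
    for dir in "$@"; do
        if [ -e "$dir/libssl.so" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Possible use before rerunning ./configure (needs root):
#   dir=$(find_libssl_dir) && [ "$dir" != /usr/lib ] && \
#       ln -s "$dir/libssl.so" /usr/lib/libssl.so
```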
Build and install:
make all     
make install-plugin     
make install-daemon     
make install-daemon-config


Change the owner and group of the nagios directory
chown -R nagios:nagios /usr/local/nagios/


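7. Edit the NRPE configuration file so that the monitoring host can reach NRPE on the monitored host. By default the NRPE configuration only allows the local machine to access the NRPE daemon.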
vi /usr/local/nagios/etc/nrpe.cfg    
# The default is 127.0.0.1 (local access only); separate multiple IPs with commas
allowed_hosts=127.0.0.1,192.168.0.102
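A quick way to confirm that the monitoring host really is listed is to parse the allowed_hosts line. The sketch below does an exact match on each comma-separated entry. The function name nrpe_host_allowed is hypothetical, and the sketch assumes a single allowed_hosts line containing plain IP addresses.

```shell
#!/bin/sh
# Return 0 when the given IP appears in allowed_hosts of an nrpe.cfg file.
# Usage: nrpe_host_allowed /usr/local/nagios/etc/nrpe.cfg 192.168.0.102
nrpe_host_allowed() {
    cfg=$1
    ip=$2
    # Extract the value after "allowed_hosts=" and drop any whitespace.
    hosts=$(sed -n 's/^[[:space:]]*allowed_hosts[[:space:]]*=[[:space:]]*//p' "$cfg" \
        | tail -n 1 | tr -d '[:space:]')
    # Wrap the list in commas so only whole entries match.
    case ",$hosts," in
        *",$ip,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Wrapping both the list and the address in commas keeps 192.168.0.10 from matching 192.168.0.102.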
Configure the commands (some may already be defined):
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200 
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p / 
command[check_http]=/usr/local/nagios/libexec/check_http -H 127.0.0.1 -w 5 -c 10  
command[check_ping]=/usr/local/nagios/libexec/check_ping -H 127.0.0.1 -w 3000.0,80% -c 5000.0,100% -p 5  
command[check_ssh]=/usr/local/nagios/libexec/check_ssh -4 127.0.0.1  
command[check_swap]=/usr/local/nagios/libexec/check_swap  -w 30% -c 10%
(Note: make sure each command's path is correct. If it is wrong, the web page may report the error "NRPE: Unable to read output", shown in yellow.)
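Since a wrong plugin path is the usual cause of that error, the paths can be checked before restarting the daemon. The sketch below reads every command[...] line of an nrpe.cfg file and reports entries whose plugin is missing or not executable. The function name check_nrpe_commands is made up for this example, and the sketch assumes the plugin path is the first word of each command line and contains no spaces.

```shell
#!/bin/sh
# Report command[...] entries whose plugin path cannot be executed.
# Returns 0 when every plugin is usable, 1 otherwise.
# Usage: check_nrpe_commands /usr/local/nagios/etc/nrpe.cfg
check_nrpe_commands() {
    cfg=$1
    status=0
    while IFS= read -r line; do
        # Only look at active command definitions; skip comments and other keys.
        case "$line" in
            "command["*"]="*) ;;
            *) continue ;;
        esac
        name=${line#"command["}
        name=${name%%"]"*}
        cmdline=${line#*=}
        plugin=${cmdline%% *}    # first word of the command line
        if [ ! -x "$plugin" ]; then
            echo "$name: $plugin is missing or not executable"
            status=1
        fi
    done < "$cfg"
    return "$status"
}
```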


8. Verify NRPE:
Restart NRPE:
killall nrpe   
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d    
Check on the monitored machine: /usr/local/nagios/libexec/check_nrpe -H localhost
Check from the monitoring host: /usr/local/nagios/libexec/check_nrpe -H <monitored machine IP>
On success it returns the NRPE version: NRPE v2.12
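The daemon may need a moment after it is started, so a script that runs the check right after the restart can fail spuriously. The sketch below retries the check a few times and succeeds only when the reply starts with the "NRPE v" version banner shown above. The function names nrpe_responds and wait_for_nrpe are illustrative, and the one-second delay between attempts is an arbitrary choice.

```shell
#!/bin/sh
# Succeed when the given command prints an NRPE version banner.
# The command is passed as arguments, e.g. check_nrpe -H localhost.
nrpe_responds() {
    reply=$("$@" 2>/dev/null) || return 1
    case "$reply" in
        "NRPE v"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Try up to N times, sleeping one second between attempts.
# Usage: wait_for_nrpe 5 /usr/local/nagios/libexec/check_nrpe -H localhost
wait_for_nrpe() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        nrpe_responds "$@" && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}
```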


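9. Add the items to be monitored on the remote machine to the Nagios configuration on the monitoring server
Create the monitored machine's configuration file linuxmachine1.cfg, based on the standard localhost.cfg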
mkdir -p /usr/local/nagios/etc/machines    # cp fails if the target directory does not exist yet
cp /usr/local/nagios/etc/objects/localhost.cfg /usr/local/nagios/etc/machines/linuxmachine1.cfg
vi /usr/local/nagios/etc/machines/linuxmachine1.cfg    
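The contents are as follows (values such as the host name, address, host group, and check commands differ from localhost.cfg and need to be changed):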
# Define a host for the  machine
define host{
        use                          linux-server            ; Name of host template to use
        host_name               linux-machine1
        alias                        linux-machine1
        address                   192.168.0.103
        }
# Define a hostgroup for Linux machines
define hostgroup{
        hostgroup_name  linux-machines-group1 ; The name of the hostgroup
        alias                   Linux Machines Group1 ; Long name of the group
        members            linux-machine1        ; Comma separated list of hosts that belong to this group
        }
# SERVICE DEFINITIONS
# Define a service to "ping" the target machine
define service{
        use                             generic-service         ; Name of service template to use
        host_name                  linux-machine1
        service_description       PING
        check_command         check_nrpe!check_ping
        }
# Define a service to check the disk space of the root partition
# Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
        use                             generic-service         ; Name of service template to use
        host_name                   linux-machine1
        service_description       Root Partition
        check_command          check_nrpe!check_disk
        }
# Define a service to check the number of currently logged in users.
# Thresholds come from check_users in nrpe.cfg on the monitored machine:
# warning if > 5 users, critical if > 10 users.
define service{
        use                             generic-service         ; Name of service template to use
        host_name                  linux-machine1
        service_description      Current Users
        check_command         check_nrpe!check_users
        }
# Define a service to check the number of currently running procs.
# Thresholds come from check_procs in nrpe.cfg on the monitored machine:
# warning if > 150 processes, critical if > 200 processes.
define service{
        use                             generic-service         ; Name of service template to use
        host_name                  linux-machine1
        service_description      Total Processes
        check_command         check_nrpe!check_procs
        }
# Define a service to check the load on the machine. 
define service{
        use                             generic-service         ; Name of service template to use
        host_name                   linux-machine1
        service_description       Current Load
        check_command          check_nrpe!check_load
        }
# Define a service to check the swap usage of the machine.
# Per check_swap in nrpe.cfg: warning if less than 30% of swap is free, critical if less than 10% is free
define service{
        use                             generic-service         ; Name of service template to use
        host_name                  linux-machine1
        service_description      Swap Usage
        check_command         check_nrpe!check_swap
        }
# Define a service to check SSH on the machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.
define service{
        use                             generic-service         ; Name of service template to use
        host_name                  linux-machine1
        service_description      SSH
        check_command         check_nrpe!check_ssh
        notifications_enabled    0
        }
# Define a service to check HTTP on the machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.
define service{
        use                             generic-service         ; Name of service template to use
        host_name                  linux-machine1
        service_description      HTTP
        check_command         check_nrpe!check_http
        notifications_enabled   0
        }

Save and exit, then add the file's path to the Nagios configuration file /usr/local/nagios/etc/nagios.cfg
vi /usr/local/nagios/etc/nagios.cfg
Add: cfg_file=/usr/local/nagios/etc/machines/linuxmachine1.cfg
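The service definitions above call check_nrpe!check_ping and similar, which only works if the monitoring server has a Nagios command named check_nrpe. The sketch below adds the cfg_file line and a commonly used check_nrpe command definition, and it is safe to run more than once. The function names are hypothetical. It also assumes, without this article confirming it, that the check_nrpe plugin is installed under the directory $USER1$ points to and that commands live in /usr/local/nagios/etc/objects/commands.cfg.

```shell
#!/bin/sh
# Append a cfg_file= entry to nagios.cfg unless the exact line already exists.
ensure_cfg_file() {
    nagios_cfg=$1
    object_cfg=$2
    entry="cfg_file=$object_cfg"
    grep -qxF "$entry" "$nagios_cfg" || echo "$entry" >> "$nagios_cfg"
}

# Append a check_nrpe command definition to a commands file if none is present.
ensure_check_nrpe_command() {
    commands_cfg=$1
    if grep -Eq '^[[:space:]]*command_name[[:space:]]+check_nrpe[[:space:]]*$' "$commands_cfg"; then
        return 0
    fi
    # Quoted delimiter: $USER1$, $HOSTADDRESS$ and $ARG1$ are Nagios macros,
    # so the shell must not expand them.
    cat >> "$commands_cfg" <<'EOF'

define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
EOF
}

# Possible use on the monitoring server (paths as assumed above):
#   ensure_cfg_file /usr/local/nagios/etc/nagios.cfg \
#       /usr/local/nagios/etc/machines/linuxmachine1.cfg
#   ensure_check_nrpe_command /usr/local/nagios/etc/objects/commands.cfg
```

With this definition, the text after the exclamation mark in check_nrpe!check_disk becomes $ARG1$, which is the name of the command[...] entry in nrpe.cfg on the monitored machine.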


Add the contact information for the user who watches the linux-machines-group1 host group
vi /usr/local/nagios/etc/objects/contacts.cfg
Change the nagiosadmin entry to:
define contact{
        contact_name                    nagiosuser1             ; Short name of user
        use                              generic-contact         ; Inherit default values from generic-contact template (defined above)
        alias                            Nagios Admin            ; Full name of user
        email                            xxx@163.com       ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
        }
Change the contactgroup as follows:
define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 nagiosuser1
        }
10. The configuration is complete; verify that it contains no errors
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
    
If there are no errors, restart Nagios
killall nagios    
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg    
Check the running status:    /usr/local/nagios/bin/nagiostats
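The verify-then-restart sequence of this step can be combined so that a broken configuration never takes the running Nagios down. A minimal sketch follows. The function name safe_restart_nagios and the NAGIOS_BIN and NAGIOS_CFG variables are assumptions for illustration, with defaults set to the paths used in this article.

```shell
#!/bin/sh
# Paths default to the ones used in this article; override them if needed.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Restart Nagios only when the configuration check (-v) passes.
safe_restart_nagios() {
    if ! "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        echo "configuration check failed; Nagios left untouched" >&2
        return 1
    fi
    # Stop the old process; ignore the error when Nagios is not running.
    killall nagios 2>/dev/null || true
    "$NAGIOS_BIN" -d "$NAGIOS_CFG"
}
```

When the check fails, run the -v command by hand to see the full error report, since this sketch discards it.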


11. Restart apache2 and check the result in the web interface
service apache2 restart
Visit http://<Nagios host IP>/nagios, log in with the user name nagiosuser1 and its password, and view the page.



