Nagios
Technical Notes
Introduction
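- Main Config: controls the behavior of the Nagios daemon; it is read by both the daemon and the CGIs
- Resource File: stores user-defined macros; permissions 660
- Object Definition Files: define hosts, services, host groups, commands, etc.; they determine what is monitored and how
- CGI Config File: tells the CGIs where the main config file is, so they know how Nagios is configured and where the object definitions are stored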
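The permission note for the resource file can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the accepted modes (600 and 660) are the ones these notes mention.

```python
import os
import stat

# Permission modes these notes recommend for the resource file.
ALLOWED_MODES = {0o600, 0o660}

def is_allowed_mode(st_mode):
    """Return True if the permission bits of st_mode are exactly 600 or 660."""
    return stat.S_IMODE(st_mode) in ALLOWED_MODES

def resource_file_mode_ok(path):
    """Check a file on disk; path is whatever resource_file points to."""
    return is_allowed_mode(os.stat(path).st_mode)
```

For a source install the call would probably be `resource_file_mode_ok("/usr/local/nagios/etc/resource.cfg")`.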
Config Contents
- Main config files are in /usr/local/nagios/etc
- Nagios plugins are in /usr/local/nagios/libexec
Debug
- /etc/init.d/nagios checkconfig
- journalctl -xe
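The configuration can also be verified by calling the daemon with its `-v` flag and the main config file. The sketch below wraps that call in Python; the binary and config paths are assumptions based on a source install under /usr/local/nagios, and the function names are hypothetical.

```python
import subprocess

# Assumed paths for a source install; adjust to the local layout.
NAGIOS_BIN = "/usr/local/nagios/bin/nagios"
MAIN_CFG = "/usr/local/nagios/etc/nagios.cfg"

def build_verify_command(nagios_bin=NAGIOS_BIN, main_cfg=MAIN_CFG):
    """Build the argv for `nagios -v <main config>` (verify configuration)."""
    return [nagios_bin, "-v", main_cfg]

def verify_config(nagios_bin=NAGIOS_BIN, main_cfg=MAIN_CFG):
    """Run the verification and return (ok, combined output)."""
    proc = subprocess.run(
        build_verify_command(nagios_bin, main_cfg),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    # A zero exit status is taken to mean the config passed verification.
    return proc.returncode == 0, proc.stdout
```

The output usually names the object file and line that failed, which is quicker to read than the journal.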
Installation
- Quick install
curl https://assets.nagios.com/downloads/nagiosxi/install.sh | sh
- Manual install
cd /tmp
wget http://assets.nagios.com/downloads/nagiosxi/xi-latest.tar.gz
tar xzf xi-latest.tar.gz
cd nagiosxi
./fullinstall
Web Display
- cd /etc/httpd/conf.d/
nagios.cfg
- log_file=<file_name>
- log_file=/usr/local/nagios/var/nagios.log
- Sets the location of the main log file; errors found in the config are recorded here. The file is subject to log rotation
- cfg_file=<file_name>
- cfg_file=/usr/local/nagios/etc/hosts.cfg
- Location of an object config file
- cfg_dir=<directory_name>
- cfg_dir=/usr/local/nagios/etc/commands
- Location of an object config directory; files in it must have the .cfg extension, and subdirectories are searched recursively for config files
- object_cache_file=<file_name>
- object_cache_file=/usr/local/nagios/var/objects.cache
- default : ‘/dev/null’
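- When Nagios is (re)started, a cached copy of the object definitions is stored here
- While Nagios is running, the object definition files can be edited without affecting the running instance
- This file is used by the CGIs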
- precached_object_file=<file_name>
- precached_object_file=/usr/local/nagios/var/objects.precache
- Holds pre-processed object definitions; speeds up startup when there are many object definitions
- resource_file=<file_name>
- resource_file=/usr/local/nagios/etc/resource.cfg
- Holds sensitive information; the CGIs do not read this file. Set its permissions to 600 or 660
- temp_file=<file_name>
- temp_file=/usr/local/nagios/var/nagios.tmp
- Nagios creates this file when updating data and deletes it when it is no longer needed
- temp_path=<dir_name>
- temp_path=/tmp
- scratch space for creating temporary files used during the monitoring process
- status_file=<file_name>
- status_file=/usr/local/nagios/var/status.dat
- default : ‘/dev/null’
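- Stores the current status, comment, and downtime information
- The CGIs use this file to show the monitoring status in the web interface, so they need read access to it
- This file is deleted every time Nagios stops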
- status_update_interval=< seconds >
- status_update_interval=15
- How often the status file is updated, in seconds; the minimum is 1 second
- nagios_user=<username/UID>
- nagios_user=nagios
- set the effective user that the Nagios process should run as
- nagios_group=<groupname/GID>
- nagios_group=nagios
- set the effective group that the Nagios process should run as
- host_down_disable_service_checks=<0/1>
- host_down_disable_service_checks=1
- This option will disable all service checks if the host is not in an UP state
- New config in Version 4
- enable_notifications=<0/1>
- enable_notifications=1
- Determines whether Nagios sends out notifications for hosts and services when it initially (re)starts
- execute_service_checks=<0/1>
- execute_service_checks=1
- If this option is disabled, Nagios will not actively execute any service checks and will remain in a sort of “sleep” mode (it can still accept passive checks unless you’ve disabled them)
- accept_passive_service_checks=<0/1>
- accept_passive_service_checks=1
- execute_host_checks=<0/1>
- execute_host_checks=1
- Determines whether Nagios executes on-demand and regularly scheduled host checks when it initially (re)starts
- If this option is disabled, Nagios will not actively execute any host checks, although it can still accept passive host checks unless you’ve disabled them
- accept_passive_host_checks=<0/1>
- accept_passive_host_checks=1
- enable_event_handlers=<0/1>
- enable_event_handlers=1
- log_rotation_method=<n/h/d/w/m>
- log_rotation_method=d
- log_current_states=<0/1>
- log_current_states=1
- Determines whether Nagios logs the current host and service states at the beginning of a newly created log file after log rotation occurs
- log_archive_path=< path >
- log_archive_path=/usr/local/nagios/var/archives/
- This is the directory where Nagios should place log files that have been rotated.
- check_external_commands=<0/1>
- External command check option: determines whether Nagios checks the command file for commands that should be executed
- This option must be enabled if you plan on using the command CGI to issue commands via the web interface
- command_file=<file_name>
- command_file=/usr/local/nagios/var/rw/nagios.cmd
- The command CGI writes commands to this file. The external command file is implemented as a named pipe
- check_for_updates=<0/1>
- check_for_updates=1
- Automatically checks whether a new Nagios release or patch is available
- bare_update_check=<0/1>
- bare_update_check=0
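The external command file described above takes one command per line in the form `[unix_time] COMMAND;arg1;arg2`. A minimal sketch of writing such a line from Python follows. The function names are hypothetical, `SCHEDULE_FORCED_HOST_CHECK` is just one example command, and the host name `web01` is made up.

```python
import time

def format_external_command(name, args=(), now=None):
    """Format one external command line: [unix_time] NAME;arg1;arg2"""
    ts = int(time.time() if now is None else now)
    fields = [name] + [str(a) for a in args]
    return "[{}] {}\n".format(ts, ";".join(fields))

def submit_external_command(command_file, name, args=()):
    """Write one command to the named pipe configured as command_file.

    Opening a named pipe for writing blocks until a reader (the Nagios
    daemon) has it open, so this hangs if Nagios is not running.
    """
    with open(command_file, "w") as pipe:
        pipe.write(format_external_command(name, args))

# Example line for a forced host check of the hypothetical host "web01".
line = format_external_command(
    "SCHEDULE_FORCED_HOST_CHECK", ["web01", 1500000000], now=1500000000
)
```

The writing user needs write access to the pipe, which is what the command group set up during installation is for.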
Object definition
define host {
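host_name Name of this host; referenced by other definitions
alias Descriptive label, similar to a comment
address IP address or FQDN
parents Nearest upstream device (apparently not used here)
check_command Command that checks whether the host is up; if omitted, Nagios assumes the host is always up
check_interval Interval between regular checks, in units of interval_length from the main Nagios config
retry_interval Interval between rechecks while the host is in a soft state; once max_check_attempts is reached the state becomes hard
max_check_attempts Number of times check_command is tried before the state becomes hard
check_period Time period during which active checks run
process_perf_data Whether performance data is processed
retain_status_information Whether status information is retained across restarts; requires the global retain_state_information to be 1
retain_nonstatus_information Whether non-status information is retained across restarts
contact_groups Contact groups to notify; separate multiple groups with ","
notification_interval Time between two notifications while the problem state persists, in units of interval_length
notification_period Time period during which notifications are sent
notification_options States that trigger notifications: d = DOWN, u = UNREACHABLE, r = recovery, f = flapping, s = scheduled downtime, n = none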
}
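To make the directive list above concrete, the sketch below renders a filled-in host definition from a Python dict. Every value is a hypothetical example: the host `web01`, the address, and the `admins` contact group are invented, and `check-host-alive` and `24x7` are assumed to be defined elsewhere in the object config.

```python
def render_host(directives):
    """Render a dict of directives as a Nagios `define host { ... }` block."""
    width = max(len(name) for name in directives)
    lines = ["define host {"]
    for name, value in directives.items():
        # Pad directive names so the values line up in one column.
        lines.append("    {}  {}".format(name.ljust(width), value))
    lines.append("}")
    return "\n".join(lines)

# Hypothetical example values, for illustration only.
example_host = {
    "host_name": "web01",
    "alias": "Example web server",
    "address": "192.0.2.10",
    "check_command": "check-host-alive",
    "check_interval": 5,
    "retry_interval": 1,
    "max_check_attempts": 3,
    "check_period": "24x7",
    "contact_groups": "admins",
    "notification_interval": 30,
    "notification_period": "24x7",
    "notification_options": "d,u,r",
}

print(render_host(example_host))
```

With interval_length at its default of 60 seconds, this example checks the host every 5 minutes and rechecks every minute while it is in a soft state.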
Installation
Required tools
yum install unzip wget httpd php php-cli gd gd-devel gcc glibc glibc-common net-snmp
Download the main program
cd /usr/local/src
wget http://liquidtelecom.dl.sourceforge.net/project/nagios/nagios-4.x/nagios-4.3.2/nagios-4.3.2.tar.gz
tar xzf nagios-4.3.2.tar.gz
cd nagios-4.3.2
sudo ./configure --with-command-group=nagioscmd
make all
make install
make install-init
make install-config
make install-commandmode
make install-webconf
Download the plugins (installed to /usr/local/nagios/libexec)
cd /usr/local/src
wget http://nagios-plugins.org/download/nagios-plugins-2.2.1.tar.gz
tar xzf nagios-plugins-2.2.1.tar.gz
cd nagios-plugins-2.2.1
./configure --with-nagios-user=nagios --with-nagios-group=nagioscmd
make
make install
Account and password for web monitoring
- Note: there must be no whitespace between account names in the comma-separated lists
htpasswd -c /usr/local/nagios/etc/htpasswd.users {account_name}
authorized_for_system_information=...,account_name
authorized_for_configuration_information=...,account_name
authorized_for_system_commands=...,account_name
authorized_for_all_hosts=...,account_name
authorized_for_all_service_commands=...,account_name
authorized_for_all_host_commands=...,account_name
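Adding an account to each of the authorization lists above by hand is easy to get wrong, especially the no-whitespace rule. The sketch below does it on the text of the CGI config file. `add_account` is a hypothetical helper, and the account names in the usage note are only examples.

```python
# The authorization directives listed in these notes.
AUTH_KEYS = (
    "authorized_for_system_information",
    "authorized_for_configuration_information",
    "authorized_for_system_commands",
    "authorized_for_all_hosts",
    "authorized_for_all_service_commands",
    "authorized_for_all_host_commands",
)

def add_account(cgi_cfg_text, account):
    """Append account to each authorized_for_* list, without spaces around commas."""
    out = []
    for line in cgi_cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in AUTH_KEYS:
            names = [n.strip() for n in value.split(",") if n.strip()]
            if account not in names:
                names.append(account)
            # Rebuild the line with a plain comma separator.
            line = "{}={}".format(key.strip(), ",".join(names))
        out.append(line)
    return "\n".join(out)
```

For example, `add_account("authorized_for_all_hosts=nagiosadmin", "alice")` gives `authorized_for_all_hosts=nagiosadmin,alice`. Commented-out lines and other directives are left untouched.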
Configuration
nagios.cfg
# Log file location
log_file=/var/spool/nagios/nagios.log
# Command definitions
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
# Contact information
cfg_file=/usr/local/nagios/etc/objects/contactgroup.cfg
# CSCC-related contact/group information
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/cs.cfg
cfg_file=/usr/local/nagios/etc/objects/csgroup.cfg
# Monitored service definitions
cfg_file=/usr/local/nagios/etc/objects/service.cfg
# Cache the object definitions to avoid inconsistencies on start/restart
object_cache_file=/var/spool/nagios/objects.cache
# Pre-cached object file, used to speed up startup
precached_object_file=/var/spool/nagios/objects.precache
# Resource file location, e.g. for the Nagios plugin path
resource_file=/usr/local/nagios/etc/resource.cfg
# File where Nagios stores the current check results (status)
status_file=/var/spool/nagios/status.dat
# Interval between status file updates, in seconds
status_update_interval=10
# User/group Nagios runs as
nagios_user=nagios
nagios_group=nagios
# Enable external command checking; without it the command CGI cannot be used
check_external_commands=1
# Command check interval; default 15s, -1 checks as often as possible
command_check_interval=-1
command_file=/var/spool/nagios/rw/nagios.cmd
external_command_buffer_slots=4096
# PID (lock) file
lock_file=/var/spool/nagios/nagios.lock
# Temp file used while Nagios is running
temp_file=/var/spool/nagios/nagios.tmp
# Temp directory used while Nagios is running
temp_path=/tmp
#
event_broker_options=-1
log_rotation_method=d
log_archive_path=/var/spool/nagios/archives
# Use syslog and log notifications
use_syslog=1
log_notifications=1
# 1: log, 0: do not log
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
#
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
# Maximum number of check processes run at the same time (0 = no limit)
max_concurrent_checks=0
check_result_reaper_frequency=10
max_check_result_reaper_time=30
check_result_path=/var/spool/nagios/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
...... to be filled in later
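A listing like the one above can be inspected programmatically. The sketch below is a minimal parser, not how Nagios itself reads the file: it keeps every value in a list because directives such as cfg_file repeat, and it skips comment lines and anything that is not key=value. `parse_main_config` is a hypothetical name.

```python
def parse_main_config(text):
    """Parse key=value lines; repeated keys (e.g. cfg_file) collect into lists."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        config.setdefault(key.strip(), []).append(value.strip())
    return config

sample = """# Log file location
log_file=/var/spool/nagios/nagios.log
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
status_update_interval=10
"""
config = parse_main_config(sample)
```

Here `config["cfg_file"]` holds both object file paths in order, which is handy for checking that every referenced file exists before restarting Nagios.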