Add Disk Usage Projection and Persistent CPU Usage Checks #218

Open
wants to merge 1 commit into
base: master

@jpfir jpfir commented Dec 30, 2024

This PR introduces two new monitoring checks for use with puppet-nagios:

  1. Disk Usage Projection Check (disk_projection):

    • Tracks disk usage trends and projects when a disk will fill based on current patterns.
    • Customizable thresholds for triggering alerts (default: 12 hours).
    • Allows exclusion of specific filesystem types (e.g., tmpfs).
    • Stores historical data for 7 days in /var/tmp/disk_usage_data.
    • Reduces noise by ignoring minor fluctuations below a configurable threshold.
  2. Persistent CPU Usage Check (cpu_persistent):

    • Monitors CPU usage at the system level, triggering alerts only after sustained high usage.
    • Configurable warning and critical thresholds, along with duration parameters.
    • Prevents false positives caused by transient spikes in CPU activity.
    • Uses vmstat to sample system-wide CPU usage.
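
The "sustained high usage" idea behind cpu_persistent can be illustrated with a short Python sketch. This is not the plugin shipped in this PR: the function names, the `min_samples` parameter, and the rule that every recent sample must exceed a threshold are assumptions made for illustration. It assumes the usual Linux `vmstat` layout (two header lines, with an `id` column for idle percentage).

```python
from typing import List, Sequence

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_vmstat_usage(output: str) -> List[float]:
    """Convert `vmstat` text output into CPU usage percentages (100 - idle).

    Assumes the second line is the column header and contains an `id` column.
    """
    lines = output.strip().splitlines()
    idle_col = lines[1].split().index("id")
    return [100.0 - float(line.split()[idle_col]) for line in lines[2:]]


def sustained_status(usage: Sequence[float], warn: float, crit: float,
                     min_samples: int) -> int:
    """Alert only if ALL of the last `min_samples` readings exceed a threshold.

    A single spike therefore never raises an alert on its own.
    """
    recent = list(usage)[-min_samples:]
    if len(recent) < min_samples:
        return OK  # not enough history yet to call the load "sustained"
    if all(u >= crit for u in recent):
        return CRITICAL
    if all(u >= warn for u in recent):
        return WARNING
    return OK
```

With `warn=80`, `crit=90`, and `min_samples=3`, the readings 50, 96, 97 stay OK because one of the last three samples is below the warning threshold, whereas 95, 96, 97 would be CRITICAL. Note that the first data row of a plain `vmstat` run reports averages since boot, so a real check would likely discard it.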

Key Changes:

  • Added new Puppet classes nagios::check::disk_projection and nagios::check::cpu_persistent.
  • Created and deployed respective NRPE plugins for both checks.
  • Included Hiera configuration examples for easy integration and customization.
  • Enhanced scripts with detailed header comments for clarity and maintainability.

Testing:

  • Manually tested the scripts for both checks under simulated conditions.
  • Verified proper configuration and execution via NRPE and Nagios.

Notes:
These checks are meant to warn before a disk fills and to alert on CPU usage only when it stays high, which should cut alert noise from transient spikes and minor fluctuations.

- Implemented a new `disk_projection` check to monitor disk usage trends and project time until full.
- Added a `cpu_persistent` check to monitor sustained CPU usage.
Both checks are fully configurable via Hiera and integrated with NRPE and Nagios service definitions.
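
To make "project time until full" concrete, here is a minimal Python sketch of one possible approach: linear extrapolation between the oldest and newest stored samples. It is not the plugin from this PR. The function names, the `(timestamp, used_bytes)` sample format, and the `min_growth_bytes` parameter are hypothetical; only the 7-day retention and the 12-hour default threshold come from the description above.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, bytes used)

SEVEN_DAYS = 7 * 86400  # retention period stated in the PR description


def prune(samples: Sequence[Sample], now: float,
          max_age: float = SEVEN_DAYS) -> List[Sample]:
    """Drop samples older than `max_age` seconds."""
    return [(t, u) for (t, u) in samples if now - t <= max_age]


def hours_until_full(samples: Sequence[Sample], capacity_bytes: float,
                     min_growth_bytes: float = 0) -> Optional[float]:
    """Linearly extrapolate from the first and last sample.

    Returns None when there is too little data or when growth does not
    exceed `min_growth_bytes` (treated as noise, so no projection is made).
    """
    if len(samples) < 2:
        return None
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    elapsed, growth = t1 - t0, u1 - u0
    if elapsed <= 0 or growth <= min_growth_bytes:
        return None
    rate = growth / elapsed  # bytes per second
    return max(capacity_bytes - u1, 0) / rate / 3600.0


def should_alert(hours: Optional[float], threshold_hours: float = 12.0) -> bool:
    """Alert when the projected fill time is within the threshold."""
    return hours is not None and hours <= threshold_hours
```

For example, a disk of capacity 100 that grows from 50 to 60 units used in one hour is projected to fill in about 4 hours, which is inside the 12-hour default and would trigger an alert. A real check might prefer a regression over all samples rather than two endpoints; the two-point version is used here only because it is easy to follow.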

Added missing manifests for the checks.

Changed the service_description for the CPU usage check to an inline value.