Watchdog Help

How the watchdog operates, how maintenance mode works, and why the email protocol is designed to stay reliable without causing alert fatigue.

How It Operates

  • The watchdog collector runs at regular intervals and writes JSON artifacts for hosts, services, storage, network, devices, and collector events.
  • The dashboard reads those artifacts from `/home/shared/webprojects/operations/watchdog/output` and summarizes current health.
  • Each artifact keeps the raw tone of every check (healthy, warning, critical, or unknown), while the summary can apply maintenance-aware effective tones for operator-facing triage.
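A minimal sketch of how a summary could tally raw tones across the output directory. It assumes each artifact is a JSON object with a top-level `tone` field, like the summary shown further down; `load_artifacts` and `count_raw_tones` are hypothetical names, not the watchdog's actual code.

```python
import json
from pathlib import Path

TONES = ("healthy", "warning", "critical", "unknown")


def load_artifacts(output_dir):
    """Yield (filename, parsed JSON) for each *.json file in output_dir.

    Unreadable or malformed files yield None so they can be counted as unknown.
    """
    for path in sorted(Path(output_dir).glob("*.json")):
        try:
            data = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            data = None
        yield path.name, data


def count_raw_tones(artifacts):
    """Tally raw tones; a missing, non-object, or unrecognised tone counts as unknown."""
    counts = dict.fromkeys(TONES, 0)
    for _name, data in artifacts:
        tone = data.get("tone") if isinstance(data, dict) else None
        counts[tone if tone in TONES else "unknown"] += 1
    return counts
```

Usage: `count_raw_tones(load_artifacts("/home/shared/webprojects/operations/watchdog/output"))` returns a dict shaped like the `counts` block of the summary. The real collector may exclude its own summary file from the tally; this sketch does not.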

Email Safety Protocol

  • Immediate email is sent only when a new active critical appears, when the active critical set materially changes, or when a low-frequency reminder is due.
  • Recovery email is sent when active criticals clear.
  • A second monitoring path checks that collection and alert evaluation are still fresh, so a silent failure of the primary alert flow is less likely to go unnoticed.
  • A weekly monitoring-health digest confirms the monitoring path is alive and reminds you about maintained incidents.
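The immediate-email rules above can be read as a small decision function. This is a hypothetical sketch: the `AlertState` record, the `decide_email` name, and the 24-hour reminder interval are assumptions, and it treats any change to the set of active critical IDs as material, which may be stricter than the real protocol.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Iterable, Optional

# Hypothetical cadence for the "low-frequency reminder"; the real value is not documented.
REMINDER_INTERVAL = timedelta(hours=24)


@dataclass
class AlertState:
    """Hypothetical record of what was last emailed and when."""
    last_critical_ids: FrozenSet[str] = frozenset()
    last_sent_at: Optional[datetime] = None


def decide_email(state: AlertState, active_critical_ids: Iterable[str], now: datetime) -> Optional[str]:
    """Return "critical", "reminder", "recovery", or None (send nothing)."""
    current = frozenset(active_critical_ids)
    if not current:
        # Recovery mail only when criticals were previously active; no healthy interval mail.
        return "recovery" if state.last_critical_ids else None
    if current != state.last_critical_ids:
        # A new critical appeared or the set changed (any change counts as material here).
        return "critical"
    if state.last_sent_at is None or now - state.last_sent_at >= REMINDER_INTERVAL:
        return "reminder"
    return None
```

After any send, the caller would store the current set and send time as the new state (an empty set after a recovery), so the same incident does not page again on the next interval.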

Maintenance Mode

  • Maintenance mode does not hide the raw issue.
  • It keeps the underlying artifact or check tone intact, but suppresses immediate critical paging and downgrades the top-level summary from critical to warning.
  • Maintained incidents are still listed in the dashboard and the weekly monitoring-health email so they are not forgotten.
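Maintenance-aware effective tones can be sketched as follows. The function names are hypothetical, and the severity ordering (in particular placing `unknown` between `healthy` and `warning`) is an assumption rather than documented dashboard behaviour.

```python
from typing import Iterable, Tuple

# Assumed ordering from least to most severe; the real dashboard may rank differently.
SEVERITY = ("healthy", "unknown", "warning", "critical")


def effective_tone(raw_tone: str, maintained: bool) -> str:
    """Downgrade a maintained critical to warning; the raw tone itself is never modified."""
    if maintained and raw_tone == "critical":
        return "warning"
    return raw_tone


def summarize(items: Iterable[Tuple[str, bool]]) -> Tuple[str, int]:
    """Return (worst effective tone, number of maintained criticals) for (raw_tone, maintained) pairs."""
    worst = "healthy"
    maintained_criticals = 0
    for raw_tone, maintained in items:
        if maintained and raw_tone == "critical":
            maintained_criticals += 1  # still counted, so it can be listed for review
        tone = effective_tone(raw_tone, maintained)
        if tone not in SEVERITY:
            tone = "unknown"  # treat unrecognised tones as unknown rather than failing
        if SEVERITY.index(tone) > SEVERITY.index(worst):
            worst = tone
    return worst, maintained_criticals
```

Returning the maintained-critical count alongside the top-level tone is what lets the dashboard and weekly digest keep listing maintained incidents instead of dropping them.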

Why This Is Safer

  • Routine all-clear mail on every interval is avoided, because repeated healthy notices train operators to delete alerts without reading them.
  • The weakness of failure-only mail (silence could mean either healthy or broken) is covered by the second freshness monitor and the weekly proof-of-life email.
  • Maintained issues remain reviewable, so maintenance mode is not a silent bypass.
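The second freshness monitor only needs to know whether the latest summary is recent. A minimal sketch, assuming it reads the `updated_at` timestamp from the summary artifact and that 30 minutes is an acceptable maximum age; both the threshold and the `summary_is_fresh` name are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical limit; it should exceed the collector interval with some margin.
MAX_SUMMARY_AGE = timedelta(minutes=30)


def summary_is_fresh(updated_at: str, now: Optional[datetime] = None,
                     max_age: timedelta = MAX_SUMMARY_AGE) -> bool:
    """Return True if an ISO-8601 timestamp with a UTC offset is no older than max_age.

    A timestamp in the future (clock skew) is reported as not fresh so it gets looked at.
    """
    now = now if now is not None else datetime.now(timezone.utc)
    stamp = datetime.fromisoformat(updated_at)
    age = now - stamp
    return timedelta(0) <= age <= max_age
```

Ideally this check would run from a scheduler independent of the collector, so a single failure cannot silence both the primary alert flow and its monitor.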

Config path: /home/shared/webprojects/operations/watchdog/config/watchdog.conf. This page writes maintenance rules to that file.

Critical

Current watchdog summary across generated artifacts: 8 healthy, 0 warning, 5 critical, 0 unknown.

Generated: 2026-05-06 22:22
Watchdog Path: /home/shared/webprojects/operations/watchdog/output
Output Path Exists: Yes

Events / Raw

Recent alerts, collector stderr, and raw drill-down output.

Status: CRITICAL
Latest Artifact: collector.json
Last Updated: 2026-05-06 22:15
Artifacts: 1

Live Check Summary

Artifacts: 0 healthy, 0 warning, 1 critical, 0 unknown; 0 maintained, 0 maintained criticals
Checks: 0 healthy, 0 warning, 0 critical, 0 unknown; 0 maintained

Operator Notes

  • Store recent alert history and collector failures here for operator drill-down.
  • Use this tab for raw outputs rather than overloading the overview tab.

Collected Artifacts

Latest artifact details and the individual checks reported for this section.

Collector

Latest watchdog collector run metadata.

Status: CRITICAL
Artifact File: collector.json
Updated: 2026-05-06 22:15
Checks: 0
Maintenance: Off

Entering maintenance mode writes a maintenance rule into the watchdog config, so the issue remains visible but no longer pages as an active critical.

Generated Files

Embedded Summary

{
    "label": "Critical",
    "tone": "critical",
    "detail": "Watchdog collector completed with 11 artifacts, including 0 warning, 3 critical, 0 maintained critical, and 0 unknown states.",
    "updated_at": "2026-05-06T12:15:02+00:00",
    "counts": {
        "healthy": 8,
        "warning": 0,
        "critical": 3,
        "unknown": 0,
        "suppressed_critical": 0,
        "maintained": 0
    },
    "alert_items": [
        {
            "kind": "check",
            "id": "backuppc_last_success",
            "label": "BackupPC / BackupPC host homefile-data",
            "detail": "Latest completed successful backup for BackupPC host homefile-data is full #2685 from 2026-04-30 22:02 (age 6.0 days). Recent successful backups: #2685 full 2026-04-30 22:02 (6.0 days old); #2683 incr 2026-03-16 22:17 (51.0 days old); #2682 incr 2026-03-15 22:18 (52.0 days old)."
        },
        {
            "kind": "check",
            "id": "arduino-pool_freshness",
            "label": "Arduino Pool Controller / Arduino Pool Controller",
            "detail": "Freshness file not found: /home/shared/webprojects/operations/watchdog/heartbeats/arduino-pool.heartbeat"
        },
        {
            "kind": "check",
            "id": "solar-inverter_freshness",
            "label": "Solar Inverter / Solar Inverter",
            "detail": "Freshness age is 77029 minutes for /home/shared/webprojects/operations/watchdog/heartbeats/solar-inverter.heartbeat."
        }
    ],
    "maintained_items": []
}

Summary Artifact

Identical to the embedded summary above.
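The two device alerts in the summary come from heartbeat freshness checks. This sketch produces detail strings in the same shape, assuming freshness is measured from the heartbeat file's modification time against a per-device threshold in minutes; `check_heartbeat` is a hypothetical helper, not the collector's actual code, and the real collector may also emit a warning tone before critical.

```python
import os
import time
from typing import Optional, Tuple


def check_heartbeat(path: str, max_age_minutes: float,
                    now: Optional[float] = None) -> Tuple[str, str]:
    """Return (tone, detail) for a heartbeat file, judged by its modification time.

    Hypothetical helper: a missing file is critical; otherwise the file's age in
    whole minutes is compared with max_age_minutes.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return "critical", f"Freshness file not found: {path}"
    age_minutes = max(0, int((now - mtime) // 60))  # clamp clock skew to zero
    tone = "healthy" if age_minutes <= max_age_minutes else "critical"
    return tone, f"Freshness age is {age_minutes} minutes for {path}."
```

For scale, the solar inverter's reported age of 77029 minutes is roughly 53.5 days (77029 / 1440 ≈ 53.5).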