Watchdog Help

How the watchdog operates, how maintenance mode works, and why the email protocol is designed to alert reliably without causing alert fatigue.

How It Operates

  • The watchdog collector runs at regular intervals and writes JSON artifacts for hosts, services, storage, network, devices, and collector events.
  • The dashboard reads those artifacts from `/home/shared/webprojects/operations/watchdog/output` and summarizes current health.
  • Each artifact contains raw check tones, while the summary can apply maintenance-aware effective tones for operator-facing triage.
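The read-and-summarize step described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: it assumes every artifact is a JSON object with a top-level `tone` field (as the summary artifact has), and the function name `summarize_tones` is hypothetical. Whether the summary artifact itself lives in the same directory is not stated on this page, so the sketch simply counts every `*.json` file it finds.

```python
import json
from collections import Counter
from pathlib import Path

# Path taken from this page; everything else below is an assumption.
OUTPUT_DIR = Path("/home/shared/webprojects/operations/watchdog/output")


def summarize_tones(output_dir: Path) -> Counter:
    """Count artifacts by top-level tone (assumed field name: "tone")."""
    counts = Counter()
    for path in sorted(output_dir.glob("*.json")):
        try:
            artifact = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Unreadable or malformed artifacts are counted as unknown
            # rather than silently skipped.
            counts["unknown"] += 1
            continue
        if isinstance(artifact, dict):
            counts[str(artifact.get("tone", "unknown"))] += 1
        else:
            counts["unknown"] += 1
    return counts
```

Calling `summarize_tones(OUTPUT_DIR)` would return a `Counter` keyed by tone. Counting unreadable files as unknown is a deliberate choice in this sketch so that a broken artifact cannot look like a healthy one.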

Email Safety Protocol

  • Immediate email is sent only when a new active critical appears, when the active critical set materially changes, or when a low-frequency reminder is due.
  • Recovery email is sent when active criticals clear.
  • A second monitoring path checks that collection and alert evaluation are still fresh, so silent failure of the primary alert flow is less likely to go unnoticed.
  • A weekly monitoring-health digest proves the monitoring path is alive and reminds you about maintained incidents.
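The send rules above can be read as a small decision function over the previous and current sets of active critical IDs. The sketch below is one possible reading under stated assumptions: any difference between the two sets counts as a "material" change, the reminder interval is a placeholder of 24 hours (the page only says "low-frequency"), and the names `email_action` and `REMINDER_INTERVAL` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import AbstractSet, Optional

# Assumed value; the page does not state the real reminder interval.
REMINDER_INTERVAL = timedelta(hours=24)


def email_action(
    previous: AbstractSet[str],
    current: AbstractSet[str],
    last_sent: Optional[datetime],
    now: datetime,
) -> Optional[str]:
    """Return which email to send for this interval, or None for no email."""
    if previous and not current:
        return "recovery"          # active criticals have cleared
    if not current:
        return None                # healthy interval: stay quiet
    if current - previous:
        return "new_critical"      # at least one critical was not active before
    if current != previous:
        return "changed"           # set shrank but did not clear (assumed material)
    if last_sent is None or now - last_sent >= REMINDER_INTERVAL:
        return "reminder"          # unchanged criticals, reminder is due
    return None                    # unchanged criticals, reminder not yet due
```

The ordering matters: recovery is checked first so that a cleared set never falls through to the quiet branch, and the reminder is only considered once the set is known to be unchanged.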

Maintenance Mode

  • Maintenance mode does not hide the raw issue.
  • It keeps the underlying artifact or check tone intact, but suppresses immediate critical paging and turns the top-level summary into a warning instead of a critical.
  • Maintained incidents are still listed in the dashboard and the weekly monitoring-health email so they are not forgotten.
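The raw-versus-effective distinction can be illustrated with a short sketch. It assumes checks are identified by string IDs, that maintenance rules resolve to a set of maintained IDs, and that tones rank as healthy < unknown < warning < critical. None of these details are confirmed by this page, and the function names are hypothetical.

```python
from typing import AbstractSet, Mapping

# Assumed severity ordering; the position of "unknown" is a guess.
RANK = {"healthy": 0, "unknown": 1, "warning": 2, "critical": 3}


def effective_tone(raw_tone: str, maintained: bool) -> str:
    """Downgrade a maintained critical to a warning; leave other tones alone."""
    if maintained and raw_tone == "critical":
        return "warning"
    return raw_tone


def summary_tone(raw_tones: Mapping[str, str], maintained_ids: AbstractSet[str]) -> str:
    """Worst effective tone across checks. The raw tones are never modified."""
    worst = "healthy"
    for check_id, raw in raw_tones.items():
        tone = effective_tone(raw, check_id in maintained_ids)
        if tone not in RANK:
            tone = "unknown"  # unrecognized tones are treated as unknown
        if RANK[tone] > RANK[worst]:
            worst = tone
    return worst
```

For example, with raw tones `{"backuppc_last_success": "critical", "gateway": "healthy"}`, the summary is `critical` when nothing is maintained and `warning` when `backuppc_last_success` is maintained, while the raw mapping still reports `critical` for that check.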

Why This Is Safer

  • No email is sent on healthy intervals, because repeated all-clear mail trains operators to delete alerts automatically.
  • Failure-only mail is backed by the second freshness monitor and the weekly proof-of-life email, so silence is not mistaken for health.
  • Maintained issues remain reviewable, so maintenance mode is not a silent bypass.
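A freshness check of the kind the second monitor relies on might look like the sketch below. It assumes freshness is judged by a file's modification time against a per-file threshold. The function name `check_freshness` and the threshold parameter are hypothetical, and the message wording only mirrors the freshness details shown in the summary artifact on this page.

```python
import time
from pathlib import Path
from typing import Optional, Tuple


def check_freshness(
    path: Path, max_age_minutes: float, now: Optional[float] = None
) -> Tuple[str, str]:
    """Return (tone, detail) for a heartbeat-style file based on its mtime."""
    now = time.time() if now is None else now
    try:
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        # A missing file is treated as a failure, never as "nothing to report".
        return "critical", f"Freshness file not found: {path}"
    age_minutes = (now - mtime) / 60
    tone = "critical" if age_minutes > max_age_minutes else "healthy"
    return tone, f"Freshness age is {age_minutes:.0f} minutes for {path}."
```

Running the same check against the collector's own output files from a separate scheduler entry is one way to notice when the primary collection or alert path has stopped, since a stalled collector cannot report its own failure.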

Config path: `/home/shared/webprojects/operations/watchdog/config/watchdog.conf`. This page can edit the maintenance rules stored in that file.

Critical

Current watchdog summary across generated artifacts: 8 healthy, 0 warning, 3 critical, 0 unknown.

Generated
2026-05-06 22:17
Watchdog Path
/home/shared/webprojects/operations/watchdog/output
Output Path Exists
Yes

Network & Access

Connectivity, certs, and edge dependency health.

HEALTHY
Latest Artifact
access.json
Last Updated
2026-05-06 22:15
Artifacts
1

Live Check Summary

Artifact Healthy
1
Artifact Warning
0
Artifact Critical
0
Artifact Unknown
0
Maintained Artifacts
0
Maintained Criticals
0
Checks Healthy
2
Checks Warning
0
Checks Critical
0
Checks Unknown
0
Maintained Checks
0

Operator Notes

  • Include internet reachability, DNS, VPN, TLS expiry, and endpoint checks.
  • Separate public-facing checks from internal dependency checks.

Collected Artifacts

Latest artifact details and the individual checks reported for this section.

Network & Access

Connectivity, edge access, and endpoint reachability checks.

HEALTHY
Artifact File
access.json
Updated
2026-05-06 22:15
Checks
2
Maintenance
Off
Enter Maintenance Mode

This writes a maintenance rule into the watchdog config so the issue remains visible but no longer pages as an active critical.

  • Gateway HEALTHY
    Ping to 172.20.7.1 succeeded.
  • Internet HEALTHY
    Ping to 1.1.1.1 succeeded.

Summary Artifact

{
    "label": "Critical",
    "tone": "critical",
    "detail": "Watchdog collector completed with 11 artifacts, including 0 warning, 3 critical, 0 maintained critical, and 0 unknown states.",
    "updated_at": "2026-05-06T12:15:02+00:00",
    "counts": {
        "healthy": 8,
        "warning": 0,
        "critical": 3,
        "unknown": 0,
        "suppressed_critical": 0,
        "maintained": 0
    },
    "alert_items": [
        {
            "kind": "check",
            "id": "backuppc_last_success",
            "label": "BackupPC / BackupPC host homefile-data",
            "detail": "Latest completed successful backup for BackupPC host homefile-data is full #2685 from 2026-04-30 22:02 (age 6.0 days). Recent successful backups: #2685 full 2026-04-30 22:02 (6.0 days old); #2683 incr 2026-03-16 22:17 (51.0 days old); #2682 incr 2026-03-15 22:18 (52.0 days old)."
        },
        {
            "kind": "check",
            "id": "arduino-pool_freshness",
            "label": "Arduino Pool Controller / Arduino Pool Controller",
            "detail": "Freshness file not found: /home/shared/webprojects/operations/watchdog/heartbeats/arduino-pool.heartbeat"
        },
        {
            "kind": "check",
            "id": "solar-inverter_freshness",
            "label": "Solar Inverter / Solar Inverter",
            "detail": "Freshness age is 77029 minutes for /home/shared/webprojects/operations/watchdog/heartbeats/solar-inverter.heartbeat."
        }
    ],
    "maintained_items": []
}
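For scripting against the summary artifact, a small reader along these lines can turn it into plain triage lines. The field names come from the JSON above. The shape of `maintained_items` entries is an assumption (the list is empty in this sample, so they are guessed to match `alert_items`), and `triage_lines` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def triage_lines(summary: Dict[str, Any]) -> List[str]:
    """Render a summary artifact as a header line plus one line per item."""
    counts = summary.get("counts", {})
    tones = ("healthy", "warning", "critical", "unknown")
    header = f"{summary.get('label', '?')}: " + ", ".join(
        f"{counts.get(tone, 0)} {tone}" for tone in tones
    )
    lines = [header]
    for item in summary.get("alert_items", []):
        lines.append(f"  ALERT {item.get('id', '?')}: {item.get('label', '?')}")
    # Assumed to share the alert_items shape; not confirmed by the sample.
    for item in summary.get("maintained_items", []):
        lines.append(f"  MAINTAINED {item.get('id', '?')}: {item.get('label', '?')}")
    return lines
```

Given the artifact above (parsed with `json.load`), the first line would read `Critical: 8 healthy, 0 warning, 3 critical, 0 unknown`, followed by one `ALERT` line for each of the three alert items.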