Watchdog Help

How the watchdog operates, how maintenance mode works, and how the email protocol stays reliable without causing alert fatigue.

How It Operates

  • The watchdog collector runs at regular intervals and writes JSON artifacts for hosts, services, storage, network, devices, and collector events.
  • The dashboard reads those artifacts from `/home/shared/webprojects/operations/watchdog/output` and summarizes current health.
  • Each artifact contains raw check tones, while the summary can apply maintenance-aware effective tones for operator-facing triage.
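
As a rough illustration of the summarizing step, the sketch below counts artifact tones in a directory of JSON files. It assumes each artifact is a JSON object with a top-level "tone" key, as the summary artifact on this page has; the function name `summarize_artifacts` is hypothetical and this is not the dashboard's actual code.

```python
import json
from collections import Counter
from pathlib import Path

# Tones shown in the dashboard summary; anything else is counted as "unknown".
KNOWN_TONES = ("healthy", "warning", "critical", "unknown")


def summarize_artifacts(output_dir):
    """Count artifact tones in a directory of JSON files.

    Assumes each artifact is a JSON object with a top-level "tone" key.
    Unreadable or malformed files are counted as "unknown" rather than skipped,
    so a broken artifact cannot silently disappear from the totals.
    """
    counts = Counter({tone: 0 for tone in KNOWN_TONES})
    for path in sorted(Path(output_dir).glob("*.json")):
        try:
            artifact = json.loads(path.read_text(encoding="utf-8"))
            tone = artifact.get("tone", "unknown")
        except (OSError, ValueError, AttributeError):
            tone = "unknown"
        counts[tone if tone in KNOWN_TONES else "unknown"] += 1
    return dict(counts)
```

Calling `summarize_artifacts("/home/shared/webprojects/operations/watchdog/output")` would return a dict with one count per tone. A real summary would probably need to exclude the summary artifact itself from the count.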

Email Safety Protocol

  • Immediate email is sent only when a new active critical appears, when the active critical set materially changes, or when a low-frequency reminder is due.
  • Recovery email is sent when active criticals clear.
  • A second monitoring path checks that collection and alert evaluation are still fresh, so silent failure of the primary alert flow is less likely to go unnoticed.
  • A weekly monitoring-health digest proves the monitoring path is alive and reminds you about maintained incidents.
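
The send rules above can be sketched as a small decision function. This is a minimal sketch under stated assumptions: any change to the set of active critical ids is treated as "material", the reminder interval of 24 hours is a made-up placeholder, and the name `decide_email` is hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical reminder interval; the real value is not stated on this page.
REMINDER_INTERVAL = timedelta(hours=24)


def decide_email(previous_ids, current_ids, last_sent, now):
    """Return "alert", "recovery", "reminder", or None.

    previous_ids / current_ids: ids of active (non-maintained) criticals
    from the previous and the current collector run.
    last_sent: datetime of the last alert or reminder mail, or None.
    """
    previous_ids, current_ids = set(previous_ids), set(current_ids)
    if current_ids != previous_ids:
        if not current_ids:
            # All active criticals cleared: send a recovery mail.
            return "recovery"
        # A new critical appeared or the active set changed.
        return "alert"
    if current_ids and (last_sent is None or now - last_sent >= REMINDER_INTERVAL):
        # Same criticals still active and the low-frequency reminder is due.
        return "reminder"
    # Healthy or unchanged state: stay silent.
    return None
```

Keeping the decision in one pure function makes the silent cases (healthy, or unchanged and recently reported) easy to test.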

Maintenance Mode

  • Maintenance mode does not hide the raw issue.
  • It keeps the underlying artifact or check tone intact, but suppresses immediate critical paging and turns the top-level summary into a warning instead of a critical.
  • Maintained incidents are still listed in the dashboard and the weekly monitoring-health email so they are not forgotten.
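
The split between raw and effective tones might look like the sketch below. The check shape (dicts with "id" and "tone" keys) and the function names are assumptions made for illustration; only the behavior, keeping the raw tone and downgrading a maintained critical to a warning, comes from the description above.

```python
def effective_tone(raw_tone, maintained):
    """Downgrade a maintained critical to a warning; leave other tones alone."""
    if maintained and raw_tone == "critical":
        return "warning"
    return raw_tone


def triage(checks, maintained_ids):
    """Split checks into pageable criticals and maintained criticals.

    checks: list of dicts with "id" and "tone" keys (an assumed shape).
    The raw "tone" is never modified; "effective_tone" is added to a copy.
    """
    maintained_ids = set(maintained_ids)  # accept any iterable of ids
    annotated, pageable, maintained = [], [], []
    for check in checks:
        is_maintained = check["id"] in maintained_ids
        item = dict(check, effective_tone=effective_tone(check["tone"], is_maintained))
        annotated.append(item)
        if check["tone"] == "critical":
            # Maintained criticals stay listed, but in their own bucket.
            (maintained if is_maintained else pageable).append(item)
    return annotated, pageable, maintained
```

Only the `pageable` list would feed immediate email, while `maintained` would feed the dashboard and the weekly digest.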

Why This Is Safer

  • No mail is sent on healthy intervals, because repeated all-clear mail trains operators to delete alerts without reading them.
  • Failure-only mail on its own cannot distinguish "all healthy" from "monitoring is down"; the second freshness monitor and the weekly proof-of-life email close that gap.
  • Maintained issues remain reviewable, so maintenance mode is not a silent bypass.
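
A freshness check of the kind the second monitor relies on could be sketched as follows. It assumes freshness is judged by a file's modification time, and the `max_age_minutes` threshold is a hypothetical parameter; the detail strings mirror the wording of the freshness alerts in the summary artifact on this page.

```python
import os
import time


def freshness_status(path, max_age_minutes, now=None):
    """Return (tone, detail) for a heartbeat or output file based on its mtime.

    now: optional Unix timestamp, mainly useful for testing.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        # A missing heartbeat is treated as a failure, never as "no news".
        return "critical", f"Freshness file not found: {path}"
    age_minutes = int((now - mtime) // 60)
    tone = "critical" if age_minutes > max_age_minutes else "healthy"
    return tone, f"Freshness age is {age_minutes} minutes for {path}."
```

Running the same check against the collector's own output from a separate scheduler entry is one way to notice when the primary alert flow has stopped.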

Config path: /home/shared/webprojects/operations/watchdog/config/watchdog.conf. This page can edit the maintenance rules stored in that file.

Critical

Current watchdog summary across generated artifacts: 8 healthy, 0 warning, 3 critical, 0 unknown.

Generated: 2026-05-06 22:18
Watchdog Path: /home/shared/webprojects/operations/watchdog/output
Output Path Exists: Yes

Services

Functional status of web, database, mail, VPN, and application services.

Status: HEALTHY
Latest Artifact: mariadb.json
Last Updated: 2026-05-06 22:15
Artifacts: 5

Live Check Summary

Artifact Healthy: 5
Artifact Warning: 0
Artifact Critical: 0
Artifact Unknown: 0
Maintained Artifacts: 0
Maintained Criticals: 0
Checks Healthy: 7
Checks Warning: 0
Checks Critical: 0
Checks Unknown: 0
Maintained Checks: 0

Operator Notes

  • Prefer functional checks over process-only checks.
  • Include response checks for Apache, MariaDB, Exim, OpenVPN, BackupPC, and BigchainDB.
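
To illustrate the difference from a process-only check, here is a minimal sketch of two functional checks: a TCP reachability check and an HTTP response check. The function names and timeouts are hypothetical, and treating only status 200 as healthy is an assumption based on the check details shown on this page.

```python
import socket
import urllib.error
import urllib.request


def tcp_check(host, port, timeout=3.0):
    """Return (tone, detail) depending on whether a TCP connection succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "healthy", f"TCP port {port} is reachable on {host}."
    except OSError as exc:
        return "critical", f"TCP port {port} is not reachable on {host}: {exc}"


def http_check(url, timeout=5.0):
    """Return (tone, detail) based on the HTTP status returned for url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        status = exc.code
    except OSError as exc:
        # Covers URLError, refused connections, and timeouts.
        return "critical", f"HTTP health check failed for {url}: {exc}"
    tone = "healthy" if status == 200 else "critical"
    return tone, f"HTTP health check returned {status} for {url}."
```

A process can be running while its port is closed or its application is returning errors, which is why a response check gives a stronger signal than a check that the service is active.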

Collected Artifacts

Latest artifact details and the individual checks reported for this section.

MariaDB

MariaDB service checks.

Status: HEALTHY
Artifact File: mariadb.json
Updated: 2026-05-06 22:15
Checks: 1
Raw File: Open raw file
Maintenance: Off
Enter Maintenance Mode: writes a maintenance rule into the watchdog config so the issue remains visible but no longer pages as an active critical.

  • MariaDB HEALTHY
    TCP port 3306 is reachable on 172.20.7.22.

BackupPC

BackupPC service checks.

Status: HEALTHY
Artifact File: backuppc.json
Updated: 2026-05-06 22:15
Checks: 1
Raw File: Open raw file
Maintenance: Off

  • BackupPC HEALTHY
    HTTP health check returned 200 for http://172.20.11.3:8080/BackupPC_Admin/.

OpenVPN

OpenVPN service checks.

Status: HEALTHY
Artifact File: openvpn.json
Updated: 2026-05-06 22:15
Checks: 1
Raw File: Open raw file
Maintenance: Off

  • OpenVPN HEALTHY
    TCP port 1194 is reachable on 172.20.7.23.

Exim

Exim service checks.

Status: HEALTHY
Artifact File: exim.json
Updated: 2026-05-06 22:15
Checks: 1
Raw File: Open raw file
Maintenance: Off

  • Exim HEALTHY
    TCP port 25 is reachable on 172.20.7.21.

Apache

Apache service checks.

Status: HEALTHY
Artifact File: apache.json
Updated: 2026-05-06 22:15
Checks: 3
Raw File: Open raw file
Maintenance: Off

  • Apache HEALTHY
    apache2 service is active.
  • Apache HEALTHY
    TCP port 80 is reachable on 127.0.0.1.
  • Apache HEALTHY
    HTTP health check returned 200 for http://127.0.0.1/.

Summary Artifact

{
    "label": "Critical",
    "tone": "critical",
    "detail": "Watchdog collector completed with 11 artifacts, including 0 warning, 3 critical, 0 maintained critical, and 0 unknown states.",
    "updated_at": "2026-05-06T12:15:02+00:00",
    "counts": {
        "healthy": 8,
        "warning": 0,
        "critical": 3,
        "unknown": 0,
        "suppressed_critical": 0,
        "maintained": 0
    },
    "alert_items": [
        {
            "kind": "check",
            "id": "backuppc_last_success",
            "label": "BackupPC / BackupPC host homefile-data",
            "detail": "Latest completed successful backup for BackupPC host homefile-data is full #2685 from 2026-04-30 22:02 (age 6.0 days). Recent successful backups: #2685 full 2026-04-30 22:02 (6.0 days old); #2683 incr 2026-03-16 22:17 (51.0 days old); #2682 incr 2026-03-15 22:18 (52.0 days old)."
        },
        {
            "kind": "check",
            "id": "arduino-pool_freshness",
            "label": "Arduino Pool Controller / Arduino Pool Controller",
            "detail": "Freshness file not found: /home/shared/webprojects/operations/watchdog/heartbeats/arduino-pool.heartbeat"
        },
        {
            "kind": "check",
            "id": "solar-inverter_freshness",
            "label": "Solar Inverter / Solar Inverter",
            "detail": "Freshness age is 77029 minutes for /home/shared/webprojects/operations/watchdog/heartbeats/solar-inverter.heartbeat."
        }
    ],
    "maintained_items": []
}
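
A summary artifact with the shape shown above could be turned into a short operator report along these lines. This is a sketch only: the function name `describe_summary` is hypothetical, and it assumes entries in "maintained_items" carry the same "id" and "label" keys as entries in "alert_items", which this page cannot confirm because that list is empty here.

```python
import json


def describe_summary(summary):
    """Return report lines (one header plus one per item) for a summary dict."""
    counts = summary.get("counts", {})
    lines = [
        "{label}: {h} healthy, {w} warning, {c} critical, {u} unknown".format(
            label=summary.get("label", "Unknown"),
            h=counts.get("healthy", 0),
            w=counts.get("warning", 0),
            c=counts.get("critical", 0),
            u=counts.get("unknown", 0),
        )
    ]
    for item in summary.get("alert_items", []):
        lines.append(f"  ACTIVE {item['id']}: {item['label']}")
    # Assumed to share the alert_items shape; see the note above.
    for item in summary.get("maintained_items", []):
        lines.append(f"  MAINTAINED {item['id']}: {item['label']}")
    return lines


def load_summary(path):
    """Read a summary artifact from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Listing active and maintained items separately keeps the report consistent with the rule that maintenance mode suppresses paging without hiding the issue.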