# Service Backup Standard

## Purpose

Define minimum backup and restore expectations for managed homelab services so new services do not become fragile snowflakes.

This standard complements:

- `docs/templates/vm-lxc-service-template.md`
- service runbooks
- `infra/proxmox-registry.yaml`
- `docs/server-change-log.md`

Proxmox snapshots are useful rollback tools, but snapshots alone are not sufficient for data-bearing or configuration-bearing services.

## Backup Classes

### Experimental

For disposable trials or services with no important unique data.

Minimum expectations:

- Document backup class in the service template, registry, or runbook.
- Take a Proxmox snapshot before risky changes when practical.
- No scheduled application backup is required.
- Rebuild or redeploy notes should exist if the service is kept.

### Standard

For useful managed services that can be rebuilt but have configuration worth preserving.

Minimum expectations:

- Back up service configuration and deployment definitions.
- Prefer application/config-level backup over relying only on VM/LXC snapshots.
- Keep at least one off-guest copy when practical.
- Encrypt backups if they contain credentials, tokens, private keys, private URLs, user data, or sensitive configuration.
- Document restore steps in the service runbook.
- Perform a restore or rebuild test after initial backup setup and after meaningful backup-process changes.

Examples: Homepage, SearXNG, reverse proxy, internal DNS.

### Critical

For services where data loss or unrecoverability would materially harm operations.

Minimum expectations:

- Application-level backup is required.
- Off-guest copy is required.
- Offline or user-controlled copy is required where practical.
- Encryption is required for sensitive data.
- Recovery material must not live only inside the service being backed up.
- Restore test is required before treating the service as authoritative.
- Backup failures block risky migrations or increased dependency until resolved.
- Backup verification and restore-test evidence must be logged.

Examples: Vaultwarden, authoritative Nextcloud data, future authoritative task/calendar/document stores.

### Manual-only

For services where automated backup is not yet approved or where manual operation is intentionally safer during bootstrap.

Minimum expectations:

- Manual backup command/checklist is documented.
- Destination, encryption method, and recovery-material ownership are documented by reference only.
- Last successful manual backup is recorded.
- A follow-up ticket is required if the service is intended to become standard or critical long-term.

## Service-Type Defaults

| Service type | Default class | Backup scope | Notes |
|---|---|---|---|
| Secrets vault | Critical | database/data directory, config, attachments if enabled, service definitions, backup scripts/config | Must be encrypted, off-guest, and restore-tested before authoritative use. |
| Config-as-code dashboard | Standard | dashboard config directory, compose/service definition, DNS/proxy references by documentation link | No secrets should be stored in dashboard config. |
| Reverse proxy | Standard; critical if many critical services depend on it | Nginx sites/snippets, TLS/renewal references, service definitions, routing inventory | Backup private keys only via approved encrypted path; never commit them. |
| Internal DNS | Standard; critical if clients depend on it exclusively | resolver config, local records/zones, upstream policy, service definition, router/Tailscale integration notes | Verify representative records after restore. |
| Stateless/customized web app | Standard if customized; experimental if disposable | compose/service definition, app config, custom settings, proxy/DNS references | Add data/database backup only if app stores unique state. |
| Assistant workspace/build host | Standard or critical depending on unique state | repo clone/remotes, ignored operational config references, local-only scripts/config, SSH/access references | Git does not cover ignored operational state. |

## Snapshot vs Application-Level Backup

Use Proxmox snapshots for:

- pre-update rollback
- quick recovery from failed configuration changes
- known-good bootstrap states

Do not use snapshots as the only backup for:

- secrets vaults
- databases
- user files
- authoritative service configuration
- anything requiring off-host/offline recovery

Application-level or config-level backup should be the durable recovery path.

## Encryption and Recovery Material

Encrypt backups when they contain:

- secrets, credentials, tokens, or private keys
- private service configuration
- user data
- sensitive logs or exports

Rules:

- Do not store encryption private keys, passwords, recovery codes, or tokens in git.
- Record locations/procedures by reference only.
- Recovery material for a critical service must be accessible without relying on that same service.
- Vaultwarden's current `age`-encrypted backup model is the reference critical-service example.
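As an illustration of keeping key material out of scripts and git, the sketch below builds an `age` encryption command from a public recipient only. It is a minimal sketch, not the Vaultwarden backup script: the function name, paths, and the `.age` output suffix are hypothetical, and it assumes the `age` CLI with its `--encrypt`, `--recipient`, and `--output` options.

```python
from pathlib import Path
from typing import List


def build_age_encrypt_command(source: Path, recipient: str, dest_dir: Path) -> List[str]:
    """Return an argv list that encrypts `source` into `dest_dir` for one recipient.

    Only the public recipient is handled here. The matching identity
    (private key) stays outside this code and outside git.
    """
    # age identities (private keys) begin with "AGE-SECRET-KEY-"; refuse them
    # so a private key is never passed on a command line by mistake.
    if recipient.upper().startswith("AGE-SECRET-KEY-"):
        raise ValueError("expected a public recipient, not a private identity")
    output = dest_dir / (source.name + ".age")
    return [
        "age", "--encrypt",
        "--recipient", recipient,
        "--output", str(output),
        str(source),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a host where `age` is installed. Building the command separately keeps it easy to review and log without touching any secret.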

## Restore-Test Expectations

Minimum verification for any backup:

- backup command exits successfully
- expected artifact exists
- artifact is non-empty
- checksum/integrity check passes where available
- artifact is outside the source data directory
- off-guest copy is verified if required by class
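The file-level checks in this list can be sketched as a small helper. This is an illustrative sketch under stated assumptions: the function names are hypothetical, SHA-256 stands in for whatever integrity check a service actually uses, and the backup command's exit status and the off-guest copy still need to be checked separately.

```python
import hashlib
from pathlib import Path
from typing import List, Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(
    artifact: Path, source_dir: Path, expected_sha256: Optional[str] = None
) -> List[str]:
    """Return the list of failed checks; an empty list means all checks passed."""
    if not artifact.is_file():
        return ["expected artifact does not exist"]
    failures = []
    if artifact.stat().st_size == 0:
        failures.append("artifact is empty")
    # Resolve both paths so symlinks and relative paths cannot hide containment.
    if artifact.resolve().is_relative_to(source_dir.resolve()):
        failures.append("artifact is inside the source data directory")
    if expected_sha256 is not None and sha256_of(artifact) != expected_sha256:
        failures.append("checksum mismatch")
    return failures
```

Returning the failed checks rather than a boolean makes the result easy to paste into a runbook or change-log entry as verification evidence. `Path.is_relative_to` requires Python 3.9 or later.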

Restore-test expectations by class:

- Experimental: no formal restore test required; rebuild notes are acceptable.
- Standard: restore/rebuild test after initial backup setup and after meaningful backup-process changes.
- Critical: isolated restore test required before authoritative use and after major changes.
- Manual-only: manual restore path must be documented before promotion to standard/critical.

Restore tests should use an isolated target when practical and must not overwrite production data.

## Retention and Offline Destination Policy

### Backup Destination Terms

- Guest-local: backup remains on the source VM/LXC, for example `/var/backups/<service>/`. This is useful for quick rollback but does not satisfy off-guest requirements.
- Off-guest: backup is copied to another managed host, currently Nimrod LXC 104 under `/home/piagent/backups/<service>/`. This protects against source guest loss but is still online infrastructure.
- Offline / user-controlled: backup is copied to storage controlled outside the homelab service stack, such as the user's workstation, removable media, or another user-managed offline location. This is required for critical recovery where practical.

### Retention Defaults

During bootstrap, services may use manual/no-pruning retention while backup scripts and restore tests are being proven. This is temporary and should be replaced before scheduled automation is treated as complete.

Default retention once automated or regular manual backups exist:

| Class | Guest-local retention | Off-guest retention | Offline/user-controlled retention |
|---|---:|---:|---|
| Experimental | optional latest only | not required | not required |
| Standard | latest 3 artifacts or 14 days | latest 5 artifacts or 30 days | optional; recommended after major known-good config milestones |
| Critical | latest 7 artifacts or 30 days | latest 14 artifacts or 60 days | required where practical; keep at least latest known-good plus one prior known-good copy |
| Manual-only | no automatic pruning until class is finalized | no automatic pruning until class is finalized | follow service-specific checklist |

Retention may be increased for services with high change risk, low artifact size, or difficult rebuild paths.
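The retention defaults can be expressed as data plus a selection function. The sketch below is illustrative only. It assumes one reading of "latest N artifacts or D days": an artifact is kept if it is among the newest N or is younger than D days, so only artifacts failing both tests become prunable. That interpretation, and all names in the code, are assumptions rather than part of this standard.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Retention(NamedTuple):
    keep_latest: int  # always keep this many newest artifacts
    keep_days: int    # also keep anything younger than this many days


# Values copied from the retention table. Experimental and manual-only
# are omitted because they have no automatic pruning defaults.
RETENTION = {
    ("standard", "guest-local"): Retention(3, 14),
    ("standard", "off-guest"): Retention(5, 30),
    ("critical", "guest-local"): Retention(7, 30),
    ("critical", "off-guest"): Retention(14, 60),
}


def select_prunable(
    timestamps: List[datetime], policy: Retention, now: datetime
) -> List[datetime]:
    """Return artifact timestamps eligible for pruning, oldest first."""
    newest_first = sorted(timestamps, reverse=True)
    protected = set(newest_first[: policy.keep_latest])
    cutoff = now - timedelta(days=policy.keep_days)
    return sorted(t for t in newest_first if t not in protected and t < cutoff)
```

Under this reading, a service that stops producing backups never loses its newest N artifacts to age-based pruning, which is the more conservative choice.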

### Current Bootstrap Policy

For current bootstrap services:

- Homepage, Unbound, and reverse proxy are `standard`; their Nimrod off-guest copies, kept alongside guest-local copies, satisfy bootstrap off-guest expectations.
- SearXNG is `standard` and uses a sanitized config backup by user direction. Its service `secret_key` is redacted from backups and must be regenerated on restore, so no `age` identity is required for SearXNG.
- Vaultwarden is `critical`; the Nimrod off-guest copy is necessary but not sufficient. The user must maintain an offline/user-controlled encrypted copy and recovery key outside Vaultwarden before critical secrets are migrated.
- TLS private keys, tokens, provider credentials, and vault recovery material must not be added to unencrypted standard backups.

### Offline Copy Expectations

- Standard services do not require routine offline copies unless they become operationally critical or include sensitive material.
- Critical services require offline/user-controlled recovery material where practical.
- Recovery material for a critical service must not depend on the service being recovered.
- Offline copy procedures should record locations by reference only, not secrets or key values.

### Implementation Notes

Initial implementation can proceed without credentials, router changes, or new external services:

1. Keep existing manual/no-pruning bootstrap backups until pruning is implemented per-service.
2. Add pruning logic to backup scripts using artifact count first; avoid deleting unknown/manual artifacts until naming is consistent.
3. Apply `standard` retention to Homepage, Unbound, reverse proxy, and SearXNG.
4. Apply `critical` retention to Vaultwarden after restore testing and user confirmation of offline encrypted copy ownership.
5. Log retention activation and any restore/decrypt test evidence in `docs/server-change-log.md`.
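Step 2 above, count-based pruning that leaves unknown or manual artifacts alone, might look like the following dry-run planner. It is a sketch under assumptions: the `<service>-YYYYMMDDTHHMMSSZ.tar.gz` naming convention (with an optional `.age` suffix) is hypothetical, as are the function names. Anything that does not match the pattern is never proposed for deletion.

```python
import re
from typing import List


def artifact_pattern(service: str):
    """Compile the (hypothetical) naming convention for one service."""
    return re.compile(
        rf"{re.escape(service)}-(\d{{8}}T\d{{6}}Z)\.tar\.gz(\.age)?"
    )


def plan_count_prune(names: List[str], service: str, keep_latest: int) -> List[str]:
    """Return recognised artifact names beyond the newest `keep_latest`, oldest first.

    This only plans deletions; it does not touch the filesystem.
    """
    if keep_latest < 1:
        raise ValueError("keep_latest must be at least 1")
    pattern = artifact_pattern(service)
    recognised = []
    for name in names:
        match = pattern.fullmatch(name)
        if match:  # unknown/manual names are skipped, never pruned
            recognised.append((match.group(1), name))
    # Fixed-width UTC timestamps sort chronologically as plain text.
    recognised.sort()
    surplus = max(len(recognised) - keep_latest, 0)
    return [name for _, name in recognised[:surplus]]
```

A backup script could log the returned plan first and only delete once the output has been reviewed for a few runs.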

## Documentation Requirements

Each managed service should record these fields in its template/runbook and registry entry where applicable:

- Backup class
- Backup scope
- Backup destination
- Encryption method/reference
- Schedule
- Retention
- Last backup verified
- Restore test required
- Last restore test
- Recovery material owner/location (by reference only)

Operational backup changes and restore tests must be logged in `docs/server-change-log.md`.
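One way to keep the documentation fields listed above from drifting is a small completeness check. The sketch below is illustrative: the field names are hypothetical and do not reflect the actual schema of `infra/proxmox-registry.yaml`, and every value is a reference or description, never a secret.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class BackupRecord:
    """Hypothetical per-service record mirroring the documentation fields."""
    backup_class: str
    backup_scope: str
    backup_destination: str
    encryption_reference: Optional[str] = None
    schedule: Optional[str] = None
    retention: Optional[str] = None
    last_backup_verified: Optional[str] = None
    restore_test_required: Optional[bool] = None
    last_restore_test: Optional[str] = None
    recovery_material_reference: Optional[str] = None  # a pointer, never the key itself


def missing_fields(record: BackupRecord) -> List[str]:
    """Return names of fields that are still unset or empty."""
    return [
        f.name for f in fields(record)
        if getattr(record, f.name) in (None, "")
    ]
```

An explicit `False` for `restore_test_required` counts as filled in, so an experimental service can record that no restore test is needed without being flagged.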

## Current Initial Trial

LXC 107 `homepage` is the first low-risk standard-class trial because it is useful, non-critical, file-configured, and should not contain secrets. Its service runbook records the active backup command, destination, and verification evidence.
