# DevOps QRSPI Framework

## Purpose

Adapt Nimrod's QRSPI-style software workflow to infrastructure and operations work, where changes affect shared, stateful systems such as Proxmox, DNS, TLS, backups, monitoring, and service dashboards.

## Core Principle

Multiple agents may research and plan in parallel, but writes to shared infrastructure must be coordinated through explicit ownership, locks, source-of-truth registries, verification, and rollback notes.

## Phase Mapping

| QRSPI Phase | DevOps Equivalent | Output |
|---|---|---|
| Question | Clarify the requested infrastructure outcome, risks, ownership, and dependencies | Ticket questions |
| Research | Inspect live state and existing docs without mutating systems | Research artifact / inventory notes |
| Structure | Break work into safe, testable slices | Implementation slices |
| Plan | Produce a preflight checklist, the required locks, a rollback path, and verification commands | Change plan |
| Implement | Execute only the approved slice while holding needed lock(s) | Infrastructure change |
| Verify / Record | Confirm actual state, update registry, ticket, runbook, and change log | Review artifact + docs updates |
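The phase ordering in the table can be encoded so that a ticket cannot jump straight to Implement. In this hypothetical Python sketch, phases are an `IntEnum`, skipping ahead is rejected, and only Implement requires a mutation lock. Returning to an earlier phase (for example, from Verify / Record back to Plan after a failed check) is allowed; that is a design assumption, not a rule the table states.

```python
from enum import IntEnum


class Phase(IntEnum):
    """DevOps QRSPI phases, in table order."""
    QUESTION = 1
    RESEARCH = 2
    STRUCTURE = 3
    PLAN = 4
    IMPLEMENT = 5
    VERIFY_RECORD = 6


def advance(current: Phase, target: Phase) -> Phase:
    """Return the new phase, refusing to skip ahead.

    Moving forward by one, staying put, or returning to an earlier phase
    (e.g. re-planning after failed verification) are all permitted.
    """
    if target > current + 1:
        raise ValueError(f"cannot skip from {current.name} to {target.name}")
    return target


def requires_lock(phase: Phase) -> bool:
    """Only the Implement phase mutates shared infrastructure."""
    return phase is Phase.IMPLEMENT
```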

## Required Controls for Shared Infrastructure

### 1. Mutation Locks

Acquire a lock file under `state/locks/` before modifying a shared system.

For Proxmox mutation work, use:

```text
state/locks/proxmox.lock
```

Only one write-capable agent should hold a Proxmox mutation lock at a time. Read-only inventory/research may proceed in parallel.
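A minimal Python sketch of taking and releasing `state/locks/proxmox.lock`, assuming the lock is a file on a local filesystem created atomically with `O_CREAT | O_EXCL`. The JSON body (`owner`, `ticket`, `acquired_at`) and the helper names `acquire_lock` and `release_lock` are hypothetical, not a format this repo defines.

```python
import json
import os
import time
from pathlib import Path


def acquire_lock(path: Path, owner: str, ticket: str) -> bool:
    """Try to create the lock file; return False if another agent holds it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # caller can create it on a local filesystem.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    # Record who holds the lock and why, so a stale lock can be judged by a human.
    with os.fdopen(fd, "w") as f:
        json.dump({"owner": owner, "ticket": ticket, "acquired_at": time.time()}, f)
    return True


def release_lock(path: Path, owner: str) -> None:
    """Delete the lock, but only if `owner` is the recorded holder."""
    data = json.loads(path.read_text())
    if data["owner"] != owner:
        raise PermissionError(f"lock held by {data['owner']}, not {owner}")
    path.unlink()
```

Exclusive create may not be atomic on every network filesystem, and the sketch does not expire stale locks left by a crashed agent; the owner and ticket metadata exist so someone can decide whether to clear one manually.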

### 2. Infrastructure Registry

Use `infra/proxmox-registry.yaml` as the local source of truth for planned/known VM IDs, names, IPs, owners, service purpose, and lifecycle status.

Before allocating VM IDs, IPs, hostnames, or storage, agents must check both:

1. The registry in this repo
2. The live Proxmox state
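Checking both sources amounts to treating an ID as taken if either one knows about it. This Python sketch assumes the registry IDs have already been parsed from `infra/proxmox-registry.yaml` and the live IDs fetched from Proxmox (for example via `pvesh get /cluster/resources --type vm`); the function names and the 100 to 999 search range are hypothetical choices. VMs and LXCs share one ID space in Proxmox, so both kinds belong in each set.

```python
from typing import Dict, List, Set


def next_free_vmid(registry_ids: Set[int], live_ids: Set[int],
                   start: int = 100, end: int = 999) -> int:
    """Return the lowest ID in [start, end] absent from BOTH sources."""
    taken = registry_ids | live_ids  # planned registry entries count as taken
    for vmid in range(start, end + 1):
        if vmid not in taken:
            return vmid
    raise RuntimeError(f"no free VMID between {start} and {end}")


def id_drift(registry_ids: Set[int], live_ids: Set[int]) -> Dict[str, List[int]]:
    """Report disagreements between the registry and live Proxmox state."""
    return {
        # On the cluster but unknown to the registry: needs recording.
        "live_not_in_registry": sorted(live_ids - registry_ids),
        # In the registry but not live: may be planned, or may be stale.
        "registry_not_live": sorted(registry_ids - live_ids),
    }
```

The same union-then-search approach applies to IPs and hostnames. `id_drift` is included because a mismatch between the two sources is itself worth resolving before allocating anything.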

### 3. Managed VM/LXC Templates

VMs and LXCs that agents create or manage should follow repeatable templates covering at least:

- Naming and VMID/LXC ID allocation
- DNS registration expectations
- TLS/certificate expectations
- Backup policy and retention class
- Monitoring/dashboard registration
- Update schedule and patching owner
- SSH/admin access model
- Secrets handling
- Logging and verification commands
- Rollback/delete procedure

Until automation exists, these templates may be Markdown checklists. Later they should become Ansible roles, Terraform/OpenTofu modules, Proxmox templates, cloud-init snippets, or equivalent.
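While the templates are still checklists, a small validator can flag VM/LXC records that skip an item. This hypothetical Python sketch assumes each VM/LXC is described by a flat key/value record (for example, an entry loaded from the registry); the field names in `REQUIRED_FIELDS` are illustrative, not an existing schema.

```python
from typing import Any, Dict, List

# Hypothetical keys, one or more per template item listed above.
REQUIRED_FIELDS = (
    "name", "vmid",                      # naming and ID allocation
    "dns",                               # DNS registration expectations
    "tls",                               # TLS/certificate expectations
    "backup_policy", "retention_class",  # backup policy and retention class
    "monitoring",                        # monitoring/dashboard registration
    "update_schedule", "patch_owner",    # update schedule and patching owner
    "access_model",                      # SSH/admin access model
    "secrets",                           # secrets handling
    "logging", "verify_commands",        # logging and verification commands
    "rollback",                          # rollback/delete procedure
)


def missing_template_fields(record: Dict[str, Any]) -> List[str]:
    """List required fields that are absent or empty in a VM/LXC record.

    An explicit value such as "deferred: <ticket>" counts as filled in,
    so deliberate deferrals pass while silent omissions do not.
    """
    return [key for key in REQUIRED_FIELDS if not record.get(key)]
```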

### 4. DevOps/Software Crossover Rule

Some tasks are hybrid:

- DevOps task requiring software development: e.g., build a dashboard registration tool, a Pi extension, a provisioning script, a health checker, or an API integration.
- Software task requiring DevOps: e.g., deploy a web app, provision a database, configure TLS, or create a backup/restore process.

Hybrid tasks should be split into linked tickets/specs when the risks differ.

Use this rule of thumb:

- Code changes follow software QRSPI: tests, diffs, review, commits.
- Infrastructure changes follow DevOps QRSPI: locks, registry, preflight, live verification, rollback, server change log.
- A hybrid task is not done until both sides meet their acceptance criteria.
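The completion rule for hybrid work can be expressed as a predicate over linked tickets. In this hypothetical Python sketch, `Ticket`, `HybridTicket`, and the `"software"`/`"devops"` track labels are illustrative names, not part of any existing tracker.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ticket:
    key: str
    track: str  # "software" or "devops"
    done: bool = False


@dataclass
class HybridTicket:
    key: str
    linked: List[Ticket] = field(default_factory=list)

    def is_done(self) -> bool:
        """Done only when both tracks are linked and every linked ticket is done."""
        tracks = {t.track for t in self.linked}
        return {"software", "devops"} <= tracks and all(t.done for t in self.linked)
```

Requiring both tracks to be present means a hybrid ticket cannot be closed by finishing only the code side before the infrastructure side has been split out.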

## Standard DevOps Preflight Checklist

Before any infrastructure mutation:

- [ ] Ticket exists with scope and acceptance criteria
- [ ] Spec or plan exists for non-trivial, security-sensitive, or data-affecting work
- [ ] Live state inspected
- [ ] Registry checked
- [ ] VMID/LXC ID available, if applicable
- [ ] IP/DNS name available, if applicable
- [ ] Storage and compute capacity checked
- [ ] Backup/snapshot/rollback approach identified
- [ ] Required mutation lock acquired
- [ ] User approval obtained for destructive, externally exposed, or security-sensitive changes
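The checklist works as a gate: any unchecked item blocks the mutation. A hypothetical Python sketch of that gate, where the item keys are illustrative and mirror the list. Conditional items can be marked not applicable only by setting them explicitly to `None`, so an agent cannot pass the gate by forgetting an item.

```python
from typing import Dict, List, Optional

# Items that must always be satisfied before a mutation.
ALWAYS = (
    "ticket_with_acceptance_criteria",
    "live_state_inspected",
    "registry_checked",
    "capacity_checked",
    "rollback_identified",
    "lock_acquired",
)
# Items that may be declared not applicable by setting them to None.
CONDITIONAL = (
    "spec_or_plan",      # non-trivial, security, or data-affecting work
    "vmid_available",
    "ip_dns_available",
    "user_approval",     # destructive, exposed, or security-sensitive work
)


def preflight_blockers(checks: Dict[str, Optional[bool]]) -> List[str]:
    """Return the checklist items that still block the change."""
    blockers = [item for item in ALWAYS if checks.get(item) is not True]
    # A conditional item must be answered explicitly: True passes, None means
    # "not applicable", and False or a missing key blocks.
    blockers += [item for item in CONDITIONAL
                 if item not in checks or checks[item] is False]
    return blockers
```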

## Standard DevOps Completion Checklist

After implementation:

- [ ] Intertwined tickets/dependencies are documented, especially temporary local work that overlaps with a broader platform ticket
- [ ] An independent reviewer/verifier has checked meaningful infrastructure changes before the ticket is marked done
- [ ] Service/VM/LXC reachable as expected
- [ ] DNS/TLS status documented
- [ ] Backup policy documented or explicitly deferred
- [ ] Dashboard/monitoring registration completed or ticketed
- [ ] Update schedule documented or ticketed
- [ ] Registry updated
- [ ] Relevant runbook updated
- [ ] `docs/server-change-log.md` updated for server-side operational changes
- [ ] Rollback notes recorded
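Recording can be scripted so every server-side change lands in `docs/server-change-log.md` together with its rollback note. This Python sketch uses a simple Markdown entry layout invented for illustration; if the log already has a format, follow that instead. `append_change_log` is a hypothetical helper.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_change_log(log: Path, ticket: str, host: str, summary: str,
                      rollback: str, when: Optional[datetime] = None) -> str:
    """Append one change entry to the log and return the text written."""
    if when is None:
        when = datetime.now(timezone.utc)
    # Normalize to UTC; astimezone treats a naive datetime as local time.
    when = when.astimezone(timezone.utc)
    # Hypothetical entry layout: heading with UTC time, ticket, and host,
    # followed by the change summary and its rollback note.
    entry = (
        f"## {when.strftime('%Y-%m-%d %H:%M')} UTC {ticket} ({host})\n\n"
        f"- Change: {summary}\n"
        f"- Rollback: {rollback}\n\n"
    )
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```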
