# Nimrod Memory, Retrieval, and Self-Improvement

## Purpose

Nimrod is the user's personal assistant first. Software development, DevOps, sysadmin, D&D support, documentation, project management, and other roles are hats that Nimrod may wear.

Fresh contexts must stay lightweight, but Nimrod must still route a wide variety of requests to the right durable knowledge without requiring the user to repeat known information.

## Core Principles

1. **Light entrypoint, rich retrieval**
   - `AGENTS.md` should remain a compact router, not a giant memory dump.
   - It should point to indexes and policies that allow progressive disclosure.

2. **Remember durable facts deliberately**
   - Important operational details, user preferences, infrastructure facts, workflows, and decisions should be written to durable files.

3. **Retrieve before asking again**
   - If the user has likely provided a fact before, search the repo/indexes before asking them to repeat it.

4. **Verify when facts can go stale**
   - Infrastructure state, credentials, IPs, service status, and tool behavior should be verified against files or live systems before risky action.

5. **Fix repeated failures at the root**
   - If multiple fresh-context agents miss the same fact, the system needs a better index, startup pointer, template, or retrieval script.

6. **Do not store secrets unsafely**
   - Record where secrets live (locations, aliases, and access procedures); do not commit the raw values.
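
Principle 4 can be made mechanical by recording when each fact was last verified. The following is a minimal sketch, assuming a hypothetical `Fact` record and illustrative freshness windows per memory class; none of these names or thresholds come from the existing repo.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical freshness windows per memory class; adjust as needed.
MAX_AGE = {
    "service_state": timedelta(hours=1),
    "infrastructure": timedelta(days=7),
    "preference": timedelta(days=365),
}

# Classes whose facts are re-checked before any risky action.
VOLATILE = ("service_state", "infrastructure")


@dataclass
class Fact:
    """A remembered fact plus when it was last checked (assumed schema)."""
    key: str
    value: str
    memory_class: str
    verified_at: datetime


def needs_verification(fact: Fact, now: datetime, risky: bool) -> bool:
    """Return True when the fact should be re-checked before acting."""
    # Unknown classes get a zero window, so they are always treated as stale.
    max_age = MAX_AGE.get(fact.memory_class, timedelta(0))
    stale = now - fact.verified_at >= max_age
    volatile = fact.memory_class in VOLATILE
    return stale or (risky and volatile)
```

With these assumed windows, a day-old IP address is trusted for read-only work but is re-verified before any mutation.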

## Memory Classes

| Class | Examples | Storage | Verification |
|---|---|---|---|
| User preferences | working style, priorities, learning goals | `AGENTS.md`, role docs, project docs | Ask when ambiguous |
| Operating procedures | QRSPI, DevOps locks, handover rules | `docs/` | Follow current docs |
| Infrastructure facts | hostnames, IPs, VM IDs, SSH aliases | `systems/`, `infra/`, `.pi/ssh/hosts.json` where appropriate | Verify against live system before mutation |
| Secrets metadata | token file paths, key aliases, vault names | runbooks, `.gitignore`-protected paths | Never print secrets; test access safely |
| Service state | deployed apps, URLs, backups, update policy | `systems/status.md`, service runbooks, registry files | Verify before changes |
| Decisions | chosen tools and rationale | project decision logs | Revisit when conditions change |
| Mistakes/incidents | repeated failures, bad handovers, broken assumptions | tickets, reviews, change logs | Convert into process improvements |

## Defaultable User Preferences

Agents should not repeatedly ask questions whose answers are already known or covered by defaults.

Current service-deployment defaults are documented in:

- `docs/service-deployment-defaults.md`

Important examples:

- New homelab services default to LAN + Tailscale access, not public exposure.
- Service names generally use simple hostnames under `dropcutstud.io`, e.g. `tickets.dropcutstud.io`.
- LXC is preferred over full VM unless a VM is required.
- Markdown tickets remain the current source of truth until a deliberate replacement/migration is approved.
- Backup, DNS, TLS, dashboard, and update details should be driven by service templates/tickets rather than asked from scratch each time.
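
One way to apply such defaults is to merge them under whatever the request already specifies and then ask only about the remaining gaps. The sketch below assumes hypothetical field names (`name`, `access`, `domain`, `runtime`); the real keys and values would come from `docs/service-deployment-defaults.md`.

```python
# Hypothetical encoding of the defaults listed above; field names are
# illustrative and not taken from docs/service-deployment-defaults.md.
SERVICE_DEFAULTS = {
    "access": "lan+tailscale",
    "domain": "dropcutstud.io",
    "runtime": "lxc",
}

# Fields every deployment needs. Anything still missing after the merge
# is the only thing worth asking the user about.
REQUIRED_FIELDS = ("name", "access", "domain", "runtime")


def resolve_service_settings(request: dict) -> tuple[dict, list[str]]:
    """Merge a request over the defaults and list what is still unknown."""
    settings = {**SERVICE_DEFAULTS, **request}  # explicit values win
    if "name" in settings:
        settings["hostname"] = f"{settings['name']}.{settings['domain']}"
    missing = [f for f in REQUIRED_FIELDS if f not in settings]
    return settings, missing
```

A request that supplies only `{"name": "tickets"}` resolves to the hostname `tickets.dropcutstud.io` on LXC with nothing left to ask, while an explicit `"runtime": "vm"` overrides the default.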

## Retrieval Order for Fresh Contexts

When handling a request, use this order before asking the user to repeat known information:

1. Read `AGENTS.md` for startup routing.
2. Identify the domain/hat: personal assistant, DevOps, software, D&D, etc.
3. Search targeted indexes:
   - `projects/index.md`
   - `systems/inventory.md`
   - `systems/status.md`
   - `systems/network-plan.md`
   - `docs/service-deployment-defaults.md` for homelab service defaults
   - `infra/`
   - `.pi/ssh/README.md`
   - `.pi/ssh/hosts.json` when SSH target details are needed
   - relevant runbooks
   - active tickets
4. Use `rg`/scripts for exact terms before broad reading.
5. Read only the narrow docs needed for the task.
6. Verify volatile facts before acting.
7. Ask the user only for facts that are missing, ambiguous, sensitive, or require judgment.
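
Steps 3 and 4 above can be approximated without extra tooling. The following is a minimal sketch of a helper that searches the listed indexes for exact terms before anything is read in full. The function name `search_indexes` is hypothetical, the path list simply mirrors step 3, and `rg` remains the faster choice where it is installed.

```python
import re
from pathlib import Path

# Index locations from step 3, in lookup order. Directories are walked
# recursively; paths that do not exist are skipped.
INDEX_PATHS = [
    "AGENTS.md",
    "projects/index.md",
    "systems/inventory.md",
    "systems/status.md",
    "systems/network-plan.md",
    "docs/service-deployment-defaults.md",
    "infra",
    ".pi/ssh/README.md",
    ".pi/ssh/hosts.json",
]


def search_indexes(root: str, terms: list[str], paths=INDEX_PATHS) -> list[tuple[str, int, str]]:
    """Return (relative_path, line_number, line) for each matching line."""
    if not terms:
        return []
    # Terms are matched literally and case-insensitively.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    base = Path(root)
    hits = []
    for entry in paths:
        target = base / entry
        if target.is_dir():
            files = sorted(p for p in target.rglob("*") if p.is_file())
        elif target.is_file():
            files = [target]
        else:
            files = []
        for path in files:
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    hits.append((path.relative_to(base).as_posix(), number, line.strip()))
    return hits
```

Returning file and line references rather than whole documents keeps the context small and tells the agent which narrow doc to open in step 5.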

## Proxmox Connection Details Case Study

Failure observed:

- The user has had to provide Proxmox connection details to multiple fresh-context agents.

Root cause class:

- Durable details either were not recorded, were recorded in a place agents did not know to check, or were not indexed from the fresh-context entrypoint.

System improvement required:

- Proxmox access metadata should be discoverable through infrastructure indexes and/or `.pi/ssh/hosts.json`.
- Secret values should remain protected, but aliases, host references, and safe connection-test procedures should be documented.
- Future agents should search existing infrastructure docs before asking for Proxmox details again.

Suggested retrieval command:

```sh
# -i also matches "Proxmox"/"PVE"; single quotes keep the shell away from the regex
rg -n -i 'proxmox|pve|vmid|192\.168|piagent|ssh' AGENTS.md systems infra .pi runbooks docs tickets/active
```
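
When the details live in a JSON registry, a small lookup helper avoids loading the whole file into context and keeps secret values out of the output. The sketch below assumes a hypothetical layout in which the file is a JSON object mapping each alias to a flat object of fields; the actual schema of `.pi/ssh/hosts.json` may differ, and the secret-marker list is an assumed convention.

```python
import json

# Substrings that mark a field name as secret-bearing (assumed convention).
SECRET_MARKERS = ("password", "token", "secret", "private_key")


def _is_secret(key: str) -> bool:
    """Return True when a field name looks like it holds a secret."""
    return any(marker in key.lower() for marker in SECRET_MARKERS)


def find_hosts(registry_json: str, keyword: str) -> dict:
    """Return registry entries mentioning keyword, with secrets redacted."""
    registry = json.loads(registry_json)
    needle = keyword.lower()
    matches = {}
    for alias, fields in registry.items():
        # Redact first, so secret values are never searched or returned.
        safe = {k: "<redacted>" if _is_secret(k) else v for k, v in fields.items()}
        haystack = " ".join([alias, *map(str, safe.values())]).lower()
        if needle in haystack:
            matches[alias] = safe
    return matches
```

Because redaction happens before matching, searching for a token value returns nothing, which is the behavior wanted for a helper whose output may end up in a transcript.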

## Mistake-to-Improvement Loop

When Nimrod makes or discovers a recurring mistake:

1. **Capture**
   - Write the issue in the active ticket or create a new ticket.

2. **Classify**
   - Missing memory, bad retrieval, stale fact, poor handover, tool gap, unclear policy, bad verification, or user-preference mismatch.

3. **Find root cause**
   - Ask: why did the agent not know or not verify this?

4. **Patch the system**
   - Update one or more of:
     - `AGENTS.md`
     - an index file
     - a runbook
     - a template
     - a script
     - a ticket/spec process

5. **Add verification**
   - Include a concrete check future agents can run.

6. **Keep entrypoint light**
   - Add a pointer, not a full dump, to startup docs.
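
The loop above maps naturally onto a small record that reports which steps are still open. This is a minimal sketch: the `Improvement` class and its field names are hypothetical, while the classification values mirror step 2.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailureClass(Enum):
    """Classification values taken from step 2 of the loop."""
    MISSING_MEMORY = "missing memory"
    BAD_RETRIEVAL = "bad retrieval"
    STALE_FACT = "stale fact"
    POOR_HANDOVER = "poor handover"
    TOOL_GAP = "tool gap"
    UNCLEAR_POLICY = "unclear policy"
    BAD_VERIFICATION = "bad verification"
    PREFERENCE_MISMATCH = "user-preference mismatch"


@dataclass
class Improvement:
    """One pass through the loop (hypothetical record layout)."""
    summary: str                  # step 1: capture
    failure_class: FailureClass   # step 2: classify
    root_cause: str = ""          # step 3: why it was missed
    patched_files: list[str] = field(default_factory=list)  # step 4
    verification: str = ""        # step 5: a command or check to run

    def open_steps(self) -> list[str]:
        """List the loop steps that are still incomplete."""
        steps = []
        if not self.root_cause:
            steps.append("find root cause")
        if not self.patched_files:
            steps.append("patch the system")
        if not self.verification:
            steps.append("add verification")
        return steps
```

A ticket template could carry the same fields in plain markdown; the point of the sketch is only that an improvement is not done until `open_steps()` is empty.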

## Indexing Direction

Start simple and durable, and move to a later stage only when the earlier ones stop being enough:

1. Linked markdown indexes
2. Structured YAML/JSON registries for infra/service facts
3. `rg` and small helper scripts
4. Generated indexes from markdown metadata
5. SQLite/search index
6. Vector database/RAG with citations back to files

RAG is a retrieval aid, not a source of truth. Source files remain authoritative.
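
Stage 4 can begin as a very small script. The sketch below builds a linked index from each markdown file's first heading. Treating the first `# ` line as the title is an assumption, as are the function names and the output layout.

```python
from pathlib import Path


def first_heading(path: Path) -> str:
    """Return the text of the first '# ' line, or the file stem."""
    # Limitation: a '# ' comment inside a code fence would also match.
    for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return path.stem


def build_index(root: str, title: str = "Index") -> str:
    """Render a markdown list linking every .md file under root."""
    base = Path(root)
    lines = [f"# {title}", ""]
    for path in sorted(base.rglob("*.md")):
        rel = path.relative_to(base).as_posix()
        lines.append(f"- [{first_heading(path)}]({rel})")
    return "\n".join(lines) + "\n"
```

Because the index is regenerated from the source files, it stays a retrieval aid and never becomes a second copy of the facts.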

## Fresh-Context Design Rule

A fresh agent should know:

- who Nimrod is
- what the main operating rules are
- where to look next
- how not to cause damage
- how to improve the system when it fails

It should not preload every project detail, historical summary, or service runbook.
