# Ticket: Mid-2026 harness engineering research and implementation plan

## Metadata
- Type: Ticket
- Status: Planned
- Project: Pi / Agent Harness Engineering
- Created: 2026-05-17
- Updated: 2026-05-17
- Priority: Medium

## Goal

Research current (mid-2026) agent harness engineering practice and produce an implementation plan for improving Pi/Nimrod's architecture, safety, observability, and operator workflows.

## Why

Pi is becoming a long-lived assistant system. Harness engineering practices are evolving quickly around agent control, tool boundaries, evals, memory, task orchestration, observability, secure browsing, and human-in-the-loop approvals. We should review current practice periodically and upgrade the system deliberately rather than ad hoc.

## Scope

Included:
- Research current agent harness/agentic engineering concepts once safe web browsing is available or the user provides sources.
- Compare findings against this repo's existing lightweight governance model.
- Identify practical upgrades for Pi/Nimrod.
- Produce an implementation roadmap with small tickets/specs.
- Prioritize safety, auditability, resumability, and user usefulness.

Potential research areas:
- Tool permissioning and least-privilege execution.
- Human approval gates and rollback paths.
- Persistent task/state systems.
- Memory design and retrieval strategies.
- Evals/regression tests for agents.
- Observability, traces, logs, and incident review.
- Multi-agent role separation.
- Secure browsing and prompt-injection defenses.
- Background workers and queues.
- Local/private models versus hosted APIs.
- Agent UX over chat, tickets, and docs.

Not included:
- Blindly adopting new frameworks without evaluation.
- Replacing Pi's harness before documenting risks and a migration plan.
- Using web-sourced instructions as trusted commands.
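
One way to hold the last exclusion in practice is to keep web-sourced text in a distinct type that the command path refuses to accept. The following is a minimal sketch under stated assumptions: `UntrustedText`, `execute_operator_command`, and `quote_for_review` are hypothetical names, not part of Pi's harness, and type-tagging alone is not a complete prompt-injection defense.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UntrustedText:
    """Text fetched from the web; treated as data, never as instructions."""
    content: str
    source: str


def execute_operator_command(command: object) -> str:
    """Accept commands only as plain strings supplied by the operator."""
    if isinstance(command, UntrustedText):
        # Web-sourced content must never reach the command path.
        raise PermissionError(f"refusing to execute content from {command.source}")
    if not isinstance(command, str):
        raise TypeError("command must be a string")
    # Placeholder: this sketch only reports what it would do; nothing runs.
    return f"would run: {command}"


def quote_for_review(text: UntrustedText, limit: int = 200) -> str:
    """Render untrusted content as a labeled quotation for human review."""
    excerpt = text.content[:limit]
    return f"[untrusted, from {text.source}] {excerpt!r}"
```

The design choice here is that untrusted content can still be read, quoted, and summarized, but it has to be explicitly restated by the operator before it can act as a command.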

## Acceptance Criteria

This ticket is done when:
- [ ] A research brief exists with dated sources and citations.
- [ ] Findings are mapped to Pi/Nimrod's current architecture.
- [ ] A prioritized implementation roadmap exists.
- [ ] Follow-up tickets/specs are created for selected upgrades.
- [ ] Security and operational risks are documented.

## Questions

- Which sources should be preferred: academic papers, engineering blogs, framework docs, industry reports, or specific projects?
- Should this become a recurring quarterly review?
- Which implementation areas matter most first: safety, memory, browsing, background tasks, or UX?

## Plan / Next Actions

- [ ] Wait for safe browsing capability or use user-provided source links.
- [ ] Collect and summarize relevant mid-2026 sources.
- [ ] Compare concepts against `docs/agentic-engineering-lite.md` and `docs/assistant-role-architecture.md`.
- [ ] Draft upgrade roadmap.
- [ ] Create follow-up implementation tickets.

## 2026-05-17 Spec Advancement

Confidence level: medium for the research categories; low for any claim about current mid-2026 practice until safe browsing exists.

Decisions now stable:
- This ticket depends on safe browsing or user-provided sources.
- Research should be converted into implementation tickets, not remain an abstract reading list.
- Existing docs `docs/agentic-engineering-lite.md` and `docs/assistant-role-architecture.md` are the baseline to compare against.

Proposed research structure:
- Source table: title, URL/path, date accessed, credibility, key claims, implementation relevance.
- Themes: tool boundaries, evals, memory, human approvals, background queues, observability, multi-agent separation, browsing security, UI/UX.
- Output: roadmap with near-term, medium-term, and defer categories.
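
The source table and roadmap buckets above could be captured as small record types so that every entry carries the same fields. This is only an illustrative sketch: `SourceRecord`, `Horizon`, `RoadmapItem`, and `to_markdown_row` are hypothetical names not taken from the repo, and the example entry is a placeholder rather than a real source.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Horizon(Enum):
    """Roadmap categories named in the proposed output."""
    NEAR_TERM = "near-term"
    MEDIUM_TERM = "medium-term"
    DEFER = "defer"


@dataclass
class SourceRecord:
    """One row of the source table (hypothetical schema)."""
    title: str
    location: str              # URL or repo path
    date_accessed: date
    credibility: str           # free-text rating, scale still to be decided
    key_claims: list[str] = field(default_factory=list)
    implementation_relevance: str = ""

    def to_markdown_row(self) -> str:
        """Format the record as a single Markdown table row."""
        cells = [
            self.title,
            self.location,
            self.date_accessed.isoformat(),
            self.credibility,
            "; ".join(self.key_claims),
            self.implementation_relevance,
        ]
        return "| " + " | ".join(cells) + " |"


@dataclass
class RoadmapItem:
    """A candidate upgrade, linked back to the sources that motivate it."""
    title: str
    horizon: Horizon
    sources: list[SourceRecord] = field(default_factory=list)


# Placeholder entry for illustration only; not a real source.
example = SourceRecord(
    title="Example source",
    location="docs/example.md",
    date_accessed=date(2026, 5, 17),
    credibility="unrated",
    key_claims=["claim A", "claim B"],
    implementation_relevance="to be assessed",
)
print(example.to_markdown_row())
```

Linking each `RoadmapItem` to its `SourceRecord` entries keeps the roadmap traceable to dated sources, which matches the acceptance criterion on citations.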

Updated next actions:
- [ ] Wait for safe browsing guard or collect user-provided sources.
- [ ] Create `projects/pi-harness/research-mid-2026.md`.
- [ ] Map findings to current Pi architecture and create follow-up tickets.

## Notes

- User requested this as Pi upgrade item 4 on 2026-05-17.
- The current assistant environment does not have open web browsing, so this work should begin once safe browsing is available or the user provides sources.
