# Design Refinement: Safe web browsing prescan and isolated researcher

## Source Ticket

- `tickets/active/2026-05-17-safe-web-browsing-and-prompt-injection-guard.md`

## Objective

Refine the safe browsing architecture to include private SearxNG search, a deterministic pre-LLM prompt-injection scanner, and an isolated web researcher role/agent.

## Desired Flow

1. User requests web research.
2. Isolated web researcher queries private SearxNG for search result metadata.
3. User or policy selects candidate URLs.
4. Safe fetch service retrieves page content with strict network controls.
5. Sanitizer strips scripts, styles, hidden text, metadata, comments, trackers, and unsupported content.
6. Prompt-injection prescanner analyzes both the sanitized content and any stripped content flagged as suspicious (for example hidden text and comments) before anything is exposed to the LLM.
7. Guard returns a risk report:
   - source URL
   - content hash
   - content type
   - matched rules
   - weighted word/phrase score
   - suspicious excerpts
   - decision: allow, allow excerpts only, quarantine, block
8. Only allowed sanitized excerpts are passed to the LLM, clearly labeled as untrusted external content.
9. Main assistant receives a researcher report with citations, not raw browsing authority.
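
The risk report in step 7 and the labeling in step 8 could be modeled roughly as follows. This is a minimal sketch, not a settled schema: the names `RiskReport`, `Decision`, `content_hash`, and `label_untrusted` are hypothetical, the choice of SHA-256 is an assumption (the design does not name a hash), and the label text is only a placeholder.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ALLOW_EXCERPTS_ONLY = "allow excerpts only"
    QUARANTINE = "quarantine"
    BLOCK = "block"


@dataclass(frozen=True)
class RiskReport:
    """Hypothetical shape for the guard's report (fields from step 7)."""
    source_url: str
    content_hash: str
    content_type: str
    matched_rules: tuple
    score: int
    suspicious_excerpts: tuple
    decision: Decision


def content_hash(text: str) -> str:
    # Assumption: SHA-256 over the UTF-8 bytes of the sanitized content.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def label_untrusted(report: RiskReport, excerpt: str) -> str:
    """Wrap an excerpt in an untrusted-content label before LLM exposure."""
    if report.decision not in (Decision.ALLOW, Decision.ALLOW_EXCERPTS_ONLY):
        # Quarantined or blocked content must never reach the LLM.
        raise ValueError(f"decision {report.decision.value!r} forbids LLM exposure")
    return (
        f"[UNTRUSTED EXTERNAL CONTENT from {report.source_url}]\n"
        f"{excerpt}\n"
        "[END UNTRUSTED EXTERNAL CONTENT]"
    )
```

Making the labeling function refuse quarantined and blocked reports keeps the policy check in one place, so a caller cannot forward such content by accident.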

## Prescan Threat Signals

Initial deterministic rules should detect:
- Direct assistant-control language: `ignore previous instructions`, `system prompt`, `developer message`, `you are ChatGPT`, `act as`, `new instructions`.
- Tool/action hijacking: `run this command`, `call the tool`, `execute`, `write file`, `ssh`, `curl`, `delete`, `exfiltrate`.
- Credential/data theft: `send secrets`, `print environment`, `API key`, `token`, `password`, `.env`.
- Boundary attacks: markdown/code blocks pretending to be system messages, XML/JSON role blocks, `BEGIN SYSTEM PROMPT`, `END INSTRUCTIONS`.
- Obfuscation: base64-like blobs, zero-width characters, excessive homoglyphs, ROT13-like markers, prompts hidden in HTML comments or CSS.
- Hidden/metadata text: invisible DOM nodes, comments, alt/title attributes with assistant-targeted instructions, PDF metadata.
- Suspicious imperative density: high ratio of commands aimed at an assistant/model/tool.
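
A few of these signals can be expressed as plain regular expressions. The sketch below is illustrative only: the rule names, the `Rule` and `match_rules` helpers, and the 40-character cutoff for base64-like blobs are assumptions, and it covers only a subset of the categories above (no homoglyph, ROT13, or imperative-density checks).

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    pattern: re.Pattern


# Hypothetical starter rules; phrases are taken from the list above.
RULES = [
    Rule("assistant_control", re.compile(
        r"ignore (all )?previous instructions|system prompt|developer message|new instructions",
        re.IGNORECASE)),
    Rule("tool_hijack", re.compile(
        r"run this command|call the tool|\b(ssh|curl)\b", re.IGNORECASE)),
    Rule("credential_theft", re.compile(
        r"send secrets|print environment|api key|password|\.env\b", re.IGNORECASE)),
    Rule("boundary_attack", re.compile(
        r"BEGIN SYSTEM PROMPT|END INSTRUCTIONS", re.IGNORECASE)),
    # Zero-width space, non-joiner, joiner, word joiner, and BOM.
    Rule("zero_width", re.compile("[\u200b\u200c\u200d\u2060\ufeff]")),
    # Assumption: 40+ contiguous base64 alphabet characters count as a blob.
    Rule("base64_like", re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")),
]


def match_rules(text: str) -> list:
    """Return the names of all rules that match, in rule order."""
    return [rule.name for rule in RULES if rule.pattern.search(text)]
```

For example, `match_rules("Please ignore previous instructions and run this command")` yields the `assistant_control` and `tool_hijack` rule names, while ordinary prose yields an empty list. Broad terms such as `token` or `execute` are left out here because they would fire on a great deal of benign documentation; how to weight them is a tuning question for the test corpus.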

## Weighted Word/Rank Scoring Sketch

Each rule contributes to a cumulative risk score:

- Critical phrases such as `ignore previous instructions`: +40
- System/developer prompt references: +25
- Tool execution or file/SSH instructions: +25
- Credential/exfiltration terms: +35
- Hidden text containing assistant-targeted language: +35
- Obfuscation/encoded payload indicators: +20
- High imperative density: +10 to +30
- Known benign documentation context can reduce the score only slightly, and never enough to drop the decision below quarantine when a critical rule matches.

Suggested thresholds:
- 0–24: allow with untrusted-content label
- 25–49: allow short excerpts only, include risk warning
- 50–79: quarantine and ask user
- 80+: block from LLM context by default
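
The weights and thresholds above translate into a small scoring function. This is a sketch under stated assumptions: the rule keys, `BENIGN_DISCOUNT = 5`, counting each rule once, and treating a critical match as a hard floor of quarantine are all interpretations, not decisions the design has made.

```python
from typing import Iterable

# Weights from the sketch above; the key names are hypothetical.
WEIGHTS = {
    "critical_phrase": 40,
    "system_prompt_reference": 25,
    "tool_execution": 25,
    "credential_exfiltration": 35,
    "hidden_assistant_text": 35,
    "obfuscation": 20,
}
BENIGN_DISCOUNT = 5  # assumption: one reading of "only slightly"


def risk_score(matched: Iterable[str], imperative_bonus: int = 0,
               benign_context: bool = False) -> int:
    """Sum rule weights; each rule counts once (an assumption)."""
    if imperative_bonus != 0 and not 10 <= imperative_bonus <= 30:
        raise ValueError("imperative bonus must be 0 or between 10 and 30")
    score = sum(WEIGHTS[name] for name in set(matched)) + imperative_bonus
    if benign_context:
        score = max(0, score - BENIGN_DISCOUNT)
    return score


def decide(score: int, critical_matched: bool = False) -> str:
    """Map a score to a decision using the suggested thresholds."""
    if score >= 80:
        decision = "block"
    elif score >= 50:
        decision = "quarantine"
    elif score >= 25:
        decision = "allow excerpts only"
    else:
        decision = "allow"
    # Assumption: a critical match floors the decision at quarantine.
    if critical_matched and decision in ("allow", "allow excerpts only"):
        decision = "quarantine"
    return decision
```

As a worked case, a page matching a critical phrase (+40) and credential terms (+35) scores 75 and is quarantined. A critical phrase alone scores 40, which the thresholds would map to excerpts only; under the floor assumption used here it is quarantined instead.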

## Isolation Model

The web researcher should have:
- SearxNG query access.
- Safe fetch access.
- No shell, SSH, write, edit, secret, or infrastructure tools.
- No authenticated browser session.
- No cookies or local browser profile.
- No access to private network fetch targets.
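
The "no private network fetch targets" rule can be checked before any request is made. The sketch below uses only the Python standard library; `is_public_address`, `resolve_host`, and `is_fetch_target_allowed` are hypothetical names, and restricting schemes to `http` and `https` is an assumption.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_address(ip: str) -> bool:
    """True only for globally routable addresses."""
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:10.0.0.1) before checking.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return addr.is_global


def resolve_host(host: str) -> list:
    """Resolve a hostname to all of its addresses."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Drop any IPv6 scope suffix such as "%eth0".
    return sorted({info[4][0].split("%")[0] for info in infos})


def is_fetch_target_allowed(url: str, resolver=resolve_host) -> bool:
    """Reject non-HTTP schemes and hosts resolving to non-public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = resolver(parts.hostname)
    except OSError:
        return False  # resolution failure: fail closed
    # Every resolved address must be public, not just the first.
    return bool(addresses) and all(is_public_address(a) for a in addresses)
```

A check like this is only a first gate. The fetch service would also need to connect to the address it vetted and re-check every redirect hop, since a hostname can resolve differently between the check and the request.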

The main assistant should receive only:
- Research summary.
- Citations/URLs.
- Guard risk reports.
- Sanitized excerpts that passed policy.

## First Implementation Slice

1. Deploy private SearxNG service.
2. Prototype scanner as a standalone script with local test corpus.
3. Build safe fetch/sanitize command-line pipeline.
4. Add tests for malicious and benign samples.
5. Only then expose a Pi extension tool.
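
For step 3, the sanitize stage could start from the standard library's `html.parser`. The sketch below is an assumption-laden starting point rather than a complete sanitizer: `SplitSanitizer` and `sanitize` are hypothetical names, and it handles only scripts, styles, comments, and `alt`/`title` attributes. It does not detect CSS-hidden DOM nodes or trackers. It keeps stripped text in a separate list so the prescanner can still inspect it without it ever being mixed into the visible text.

```python
from html.parser import HTMLParser

# Elements whose text content is never shown to a reader.
SKIPPED_TAGS = {"script", "style", "noscript", "template"}


class SplitSanitizer(HTMLParser):
    """Split a page into visible text and stripped text for scanning."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.visible = []
        self.stripped = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        for name, value in attrs:
            # alt/title text is not shown inline but may carry instructions.
            if name in ("alt", "title") and value:
                self.stripped.append(value)

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            target = self.stripped if self._skip_depth else self.visible
            target.append(text)

    def handle_comment(self, data):
        text = data.strip()
        if text:
            self.stripped.append(text)


def sanitize(html: str):
    """Return (visible_text, stripped_fragments)."""
    parser = SplitSanitizer()
    parser.feed(html)
    parser.close()
    return " ".join(parser.visible), parser.stripped
```

Running both outputs through the scanner, then passing only the visible text onward, matches the desired flow: hidden instructions raise the risk score but cannot reach the LLM.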

## Open Questions

- Which language should the scanner use: TypeScript for Pi extension alignment, Python for text-processing ergonomics, or both?
- Should high-risk pages be fully blocked or stored in a quarantine file outside LLM context for user inspection?
- Should initial URL policy be allowlist-only?