> Archived status: this project is on hold for now and retained for reference.

# Spec: YouTube-to-Obsidian knowledge capture

## Metadata
- Type: Spec
- Status: Draft
- Project: YouTube-to-Obsidian Knowledge Capture
- Created: 2026-05-14
- Updated: 2026-05-15

## Objective

Build a workflow that lets the user capture useful ideas from YouTube videos watched at work, often offline through YouTube Premium, then later turn those captures into organized notes in the user's Obsidian vault.

## Background

The user watches many YouTube videos at work on an Android tablet, commonly using YouTube Premium offline downloads. They sometimes want to preserve concepts, subjects, timestamps, and video references for later review. At work, the user can listen/watch but has limited opportunity to interact with the device and may have no reception. The assistant currently does not have access to the user's Obsidian vault.

Ideal user experience:
- While watching, quickly record a short local voice memo around the time the video played, saying something like: “this video was interesting; I want to consider/watch/expand on it later.”
- If possible, mention the video title, channel/author, or subject in the voice memo.
- Prefer a local Android voice recorder that saves files offline and automatically uploads/syncs to Nextcloud once connection returns. Nextcloud Talk was tested and is not suitable because it cannot send messages or leave self-notes without a Nextcloud connection.
- Later, synchronize the capture time with the watched video/time position where possible.
- Save selected, user-facing video metadata, concepts, and subjects into Obsidian.
- Keep most raw logs, processing records, audio files, transcripts, matching state, and debug output outside the Obsidian vault.
- Create a daily YouTube note only for days where YouTube videos are watched/processed, plus one note per especially interesting video when warranted, and connect/integrate concepts with existing vault notes when vault access exists.

Important uncertainty: YouTube may not expose reliable viewing data, watch progress, or offline watch history in a way that can be used programmatically. The design should not depend on unavailable/private YouTube offline data unless verified.

Findings so far:
- Initial user test: offline-watched downloaded videos on the Android tablet did later appear in YouTube history on the PC after reconnect/sync, but the YouTube website did not show watch timestamps.
- Google My Activity does show a single timestamp for watched videos, probably the video start time, but it still needs testing to confirm whether this is the actual offline watch time or the later tablet resync time.

Access constraints:
- A YouTube API key alone only accesses public API data; private watch/history data, if available at all, would require OAuth/account authorization.
- Google Takeout itself does not normally provide a simple API-key workflow; it is primarily a user-initiated export.
- Google Data Portability APIs may exist for some export categories, but would require OAuth/app setup and must be verified for YouTube history before relying on them.

For this project, investigate manual Takeout first.

## Requirements

Must have:
- Capture a quick note while or shortly after watching a video from an Android tablet.
- Minimize interaction required while the user is at work.
- Preserve enough context to identify the video later.
- Support offline capture or degraded offline operation.
- Create or stage Markdown notes suitable for Obsidian.
- Keep operational/process data outside the Obsidian vault by default.
- Avoid requiring assistant access to the Obsidian vault until the user explicitly grants and documents it.
- Keep the workflow simple enough to use at work without interrupting the user.

Nice to have:
- Automatically infer video title/channel/URL from the voice-memo transcript, YouTube history, share links, browser history, or exported data.
- Timestamp alignment between capture time and video playback time.
- Use local Android voice recordings as the preferred capture inbox, automatically uploaded to Nextcloud when connection returns. Nextcloud Talk/Notes are not suitable as the primary offline capture surface if they require live server connectivity.
- Extract transcript/chapters when available.
- Summarize concepts and create backlinks/tags for Obsidian.
- Maintain a queue of unprocessed captures.
- Produce a daily YouTube note for days with watched videos, including videos the user did not explicitly mention.
- Support later “chat/revisit this topic/video” sessions based on watched-video logs and generated notes.
- Maintain an Obsidian-facing watched-video index/log only where useful; detailed processing status should remain in server-side staging/logs outside the vault.

Out of scope for the first version:
- Circumventing YouTube Premium DRM or offline storage protections.
- Building a custom app to play YouTube Premium offline downloads outside the official YouTube app.
- Downloading or redistributing YouTube video/audio.
- Assuming access to employer devices or accounts without permission.
- Processing YouTube Shorts by default. Shorts should usually be ignored unless the user specifically discusses/flags one in a capture.
- Fully automatic capture from YouTube offline viewing until feasibility is verified.

## Users / Stakeholders

- Primary user: Deeso
- Assistant roles: knowledge-management assistant, workflow designer, light automation engineer

## Proposed Design

Use a phased approach.

### Phase 0: Clarify and validate

Document:
- Devices used for YouTube watching.
- Whether captures must be made on the same device.
- Whether Nextcloud mobile apps support the needed offline capture behavior.
- Whether YouTube watch history includes offline Premium viewing and approximate timestamps.
- Where the Obsidian vault lives and how notes should be imported.

### Phase 1: Manual reliable capture

Create a low-friction capture format that works even if YouTube data is unavailable and requires minimal work interaction.

Example full capture message when interaction is possible:

```text
ytcap
video: <title, URL, or rough description>
time: <video timestamp if known, e.g. 12:34>
note: <idea to save>
tags: <optional subjects>
```

Example minimal capture message when the user is busy at work:

```text
ytcap interesting
```

The system should use the message timestamp as the primary clue and reconcile it later with YouTube history/offline sync/manual review.

If the video URL is available via YouTube share, the user can include it. If not, the title/channel/search terms are enough to reconcile later.
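
A minimal parsing sketch for the two message shapes above, assuming the keys are exactly `video`, `time`, `note`, and `tags`, and that tags are comma-separated (the delimiter is an assumption; the spec does not fix one). The `Capture` class and `parse_ytcap` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Capture:
    video: Optional[str] = None
    time: Optional[str] = None
    note: Optional[str] = None
    tags: list = field(default_factory=list)
    flag: Optional[str] = None  # free text after "ytcap", e.g. "interesting"


def parse_ytcap(message: str) -> Optional[Capture]:
    """Parse a ytcap message; return None if it does not start with 'ytcap'."""
    lines = [ln.strip() for ln in message.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].lower().startswith("ytcap"):
        return None
    cap = Capture(flag=lines[0][len("ytcap"):].strip() or None)
    for line in lines[1:]:
        # Split on the first colon only, so "time: 12:34" and URLs stay intact.
        key, sep, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if not sep or not value:
            continue
        if key == "tags":
            cap.tags = [t.strip() for t in value.split(",") if t.strip()]
        elif key in ("video", "time", "note"):
            setattr(cap, key, value)
    return cap
```

Unknown keys are skipped rather than rejected, so a hurried or partial message still yields a usable record whose timestamp can be reconciled later.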

### Phase 2: Nextcloud capture inbox

Evaluate Nextcloud options:
- Local Android voice recorder memo, saved with device timestamp, then uploaded/synced to a dedicated Nextcloud capture folder.
- Nextcloud app auto-upload, FolderSync/WebDAV, Syncthing, or another folder sync tool as the upload mechanism.
- Deck card or task.
- Form submission if mobile/offline behavior is acceptable.

The capture inbox should produce a processable record with:
- capture timestamp
- user-entered text
- source application/channel
- sync status
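
As a rough sketch, the record could be built from a synced audio file as below. The field names, the `build_record` helper, and the default `source` value are hypothetical. Using the file's modified time as the capture timestamp is an assumption that depends on the recorder app and sync tool preserving it, which is still an open question.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)  # frozen: the record cannot be mutated after creation
class CaptureRecord:
    captured_at: str   # ISO 8601, taken from the file modified time (assumption)
    text: str          # user-entered text or transcript; empty until transcribed
    source: str        # source application/channel
    sync_status: str
    filename: str
    sha256: str        # checksum of the file contents


def build_record(path: Path, source: str = "android_voice_recorder") -> CaptureRecord:
    """Build a capture record for one audio file that has arrived in the inbox."""
    data = path.read_bytes()
    mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return CaptureRecord(
        captured_at=mtime.isoformat(),
        text="",
        source=source,
        sync_status="synced",
        filename=path.name,
        sha256=hashlib.sha256(data).hexdigest(),
    )
```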

### Phase 2.5: Automated YouTube activity import

Investigate Google Data Portability as the preferred official source for automated/background YouTube My Activity/watch-history import. The target output is normalized watched-video records in server-side staging, for example:

```json
{
  "watched_at": "2026-05-15T04:18:00Z",
  "video_id": "...",
  "url": "https://www.youtube.com/watch?v=...",
  "title": "...",
  "channel": "...",
  "source": "google_data_portability",
  "is_short": false
}
```

The system should use this source, if feasible, to populate daily YouTube notes and match voice captures. Manual Google Takeout should remain a fallback/debug option rather than the normal workflow.
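
To illustrate how such records might be matched to a voice capture, here is a sketch that selects non-Short videos watched shortly before the capture time. It assumes `watched_at` reflects the real watch start time, which this spec has not yet verified. The 30-minute window and the `candidate_videos` function are illustrative assumptions, not settled design.

```python
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def candidate_videos(capture_at: str, watched: list, window_minutes: int = 30) -> list:
    """Return non-Short records watched within the window before the capture, nearest first."""
    cap = parse_ts(capture_at)
    window = timedelta(minutes=window_minutes)
    hits = []
    for rec in watched:
        if rec.get("is_short"):
            continue  # Shorts are ignored by default
        delta = cap - parse_ts(rec["watched_at"])
        if timedelta(0) <= delta <= window:
            hits.append((delta, rec))
    hits.sort(key=lambda pair: pair[0])
    return [rec for _, rec in hits]
```

Returning a ranked list rather than a single match lets low-confidence cases go to the user as candidates instead of being written as fact.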

### Phase 3: Processing pipeline

A script or assistant workflow reads new captures and creates staged Markdown files, for example:

```text
obsidian-staging/youtube/YYYY-MM-DD-video-title.md
```
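
One way to derive that path from a video title is sketched below. The slug rules (ASCII-only, lowercase, hyphen-separated, 60-character cap) and the function names are assumptions, since the spec only fixes the `YYYY-MM-DD-video-title.md` shape.

```python
import re
import unicodedata
from datetime import date
from pathlib import Path


def slugify(title: str, max_len: int = 60) -> str:
    """Reduce a title to lowercase ASCII words joined by hyphens."""
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return text[:max_len].rstrip("-") or "untitled"


def staged_note_path(day: date, title: str, root: str = "obsidian-staging/youtube") -> Path:
    """Build the staging path for a per-video note."""
    return Path(root) / f"{day.isoformat()}-{slugify(title)}.md"
```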

Each video note can include:
- video title/channel/URL if known
- capture timestamp
- video timestamp if known
- captured thought
- inferred concepts/subjects
- processing status
- links/backlinks once vault structure is known
- source daily capture link

Daily YouTube notes are created only for days when YouTube videos were watched/processed. Each daily note can include:
- each non-Short video watched that day, from My Activity/Takeout/manual export where available
- author/channel
- video title and URL
- thumbnail/image link or embedded thumbnail where Obsidian policy allows
- caption/transcript availability and selected caption-derived summary
- short assistant summary for each video
- highlighted videos with associated user voice captures/transcripts
- unmentioned videos with brief metadata and optional summaries
- prompts/questions for later discussion

Detailed raw data that should stay outside Obsidian:
- raw audio files
- raw Whisper transcripts, except selected/verbatim `## User capture` excerpts
- raw YouTube captions/transcripts, except selected source excerpts or summaries
- JSON processing records
- matching/debug logs
- checksums, retry state, and daemon logs

### Phase 4: Obsidian integration

After the user grants access or chooses a sync method, import staged notes into the vault using an agreed folder and template. The vault is currently on the user's PC and should be synced with Nextcloud as soon as practical. The preferred note model is one note per interesting video plus a daily capture note, with concept integration/backlinks where appropriate.


## Obsidian Output Policy

The Obsidian vault should receive curated, user-facing knowledge notes rather than becoming the processing database. Most machine logs and raw artifacts should remain in server-side staging.

### Keep outside Obsidian by default

- raw Android audio recordings
- full raw Whisper transcripts
- full raw YouTube captions/transcripts
- JSON records, checksums, queue state, retry state, and daemon logs
- confidence/debug traces for matching
- failed-file diagnostics

### Allowed in Obsidian

- daily YouTube notes for days where videos were watched/processed
- short summaries of each relevant non-Short video
- author/channel, title, URL, duration, and thumbnail reference
- selected caption excerpts when useful and clearly marked as source material
- user voice-capture excerpts/transcripts when clearly marked as `User capture`
- assistant-generated summaries/analysis when clearly marked
- links to external/staged raw artifacts if useful, not bulk raw data

### Shorts policy

YouTube Shorts should usually be ignored by the system to reduce noise. A Short should be processed only when:

- the user explicitly records a voice capture about it,
- the user asks to include Shorts for a specific day/topic, or
- a later review process identifies it as unusually relevant and asks for confirmation.
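
The policy above reduces to a small predicate. This sketch assumes each watched-video record carries an `is_short` flag as in the staged JSON example; the function and parameter names are hypothetical.

```python
def should_process(record: dict, has_voice_capture: bool = False,
                   shorts_requested: bool = False, user_confirmed: bool = False) -> bool:
    """Apply the Shorts policy: regular videos always pass, Shorts need a reason."""
    if not record.get("is_short", False):
        return True
    # A Short passes only if the user captured it, asked for Shorts,
    # or confirmed a suggestion from a later review.
    return has_voice_capture or shorts_requested or user_confirmed
```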

## Server-Side Processing Architecture

Proposed pipeline:

1. **Capture upload**
   - Android records local voice memo offline.
   - Recorder folder syncs to Nextcloud when connectivity returns.
   - Target folder example: `Inbox/YouTube Voice Captures/`.

2. **Ingestion watcher**
   - A server-side watcher or scheduled job scans the Nextcloud capture folder for new audio files.
   - New files are copied to a processing/staging area.
   - Each file gets an immutable capture record containing filename, upload time, file modified time, checksum, and processing status.

3. **Transcription**
   - Pass audio to Whisper on the Nextcloud host.
   - Store raw transcript separately from any assistant interpretation.
   - Keep link back to original audio.

4. **YouTube activity reconciliation**
   - Gather candidate videos from Google My Activity / Takeout / manual export for the same day and nearby time window.
   - Match using: capture time, transcript mentions of title/channel/topic, watch order, and later user confirmation if confidence is low.

5. **Video metadata enrichment**
   - Once a likely video URL/ID is known, fetch public metadata using YouTube Data API or other safe metadata source.
   - Store title, channel, URL, duration, description, chapters if available, and thumbnail URL.

6. **Transcript/content enrichment**
   - If available and legally/technically appropriate, retrieve YouTube captions/transcript.
   - Keep video transcript/caption text distinct from user voice transcript and assistant notes.

7. **Note generation / staging**
   - Generate Markdown into a staging folder first, not directly into the live vault.
   - Create/update daily watched-video note.
   - Create per-video note for flagged/interesting videos.
   - Add concepts/tags/backlinks only according to agreed vault protocols.

8. **Review / approval**
   - For early versions, require user review before moving notes from staging into the Obsidian vault.
   - Low-confidence video matches should be presented as candidates rather than written as fact.

9. **Obsidian import**
   - After backup/sync protocol is established, import approved Markdown into agreed vault folders.
   - Maintain an index of processed captures and source files to avoid accidental duplicate processing, while preserving legitimate repeated watches as separate watch events.

10. **Audit and rollback**
    - Keep raw audio, raw transcript, generated notes, and processing logs.
    - Rollback means deleting generated/imported Markdown while preserving raw captures and transcripts.
    - Repeated watches of the same video are valid data and should not be collapsed. The system should deduplicate processing artifacts, not human experience.
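
The index of processed source files mentioned in step 9 could be as simple as a set of checksums persisted to JSON. This is a sketch under that assumption; `ProcessedIndex` and its file layout are hypothetical. It keys on file checksums only, so it prevents reprocessing the same audio file without touching repeated watch events.

```python
import json
from pathlib import Path


class ProcessedIndex:
    """Tracks checksums of source files that have already been processed."""

    def __init__(self, path: Path):
        self.path = path
        self.seen = set(json.loads(path.read_text())) if path.exists() else set()

    def claim(self, checksum: str) -> bool:
        """Return True if this checksum is new and now recorded, False if already seen."""
        if checksum in self.seen:
            return False
        self.seen.add(checksum)
        self.path.write_text(json.dumps(sorted(self.seen)))
        return True
```

A watcher would call `claim()` with each file's checksum and skip the file when it returns `False`.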

## Obsidian Authorship / Provenance Protocol

Before automation writes to the vault, use clear boundaries between the user's words, source material, and assistant-generated text. It is acceptable to have vault areas that are explicitly all AI-generated; the highest-risk case is ambiguous mixed notes where user thoughts and assistant thoughts could be confused.

Recommended rules:

- **Never silently rewrite the user's text.** Preserve user-authored notes unless explicitly asked to edit.
- **Repeated watches are first-class events.** The same video may be watched multiple times with different reactions. Preserve each watch/capture event separately rather than treating it as a duplicate of the video.
- **Use explicit sections** in generated notes:
  - `## User capture` — verbatim transcript or user-authored text.
  - `## Source material` — video metadata, captions, quotes, links.
  - `## Assistant summary` — assistant-generated summary.
  - `## Assistant analysis` — assistant-generated interpretation/concepts.
  - `## Watch events` — repeated watches/captures over time, each with date, mood/reaction if known, and links to voice captures.
  - `## Follow-up questions` — assistant-generated prompts.
- **Use YAML frontmatter provenance fields**, for example:

```yaml
created_by: assistant
review_status: unreviewed
source_type: youtube_voice_capture
user_audio: path-or-link-to-audio
user_transcript: path-or-link-to-transcript
assistant_generated: true
contains_user_words: true
contains_assistant_words: true
last_reviewed_by_user: null
```

- **Mark uncertain claims** with `TODO: verify` or `confidence: low` rather than presenting them as facts.
- **Prefer additive changes**: create new notes or append under assistant-marked sections instead of modifying existing vault notes.
- **Require approval before editing existing notes**, especially concept notes written by the user.
- **Use generated-note folders initially**, for example `Inbox/AI Generated/YouTube/`, until trust and review habits are established. In this folder, notes may be assumed AI-generated unless specific `## User capture` sections are marked as verbatim user transcript.
- **For ambiguous/mixed notes, label every section by author/source** rather than relying only on folder location.
- **Keep source links** from every generated note back to the original audio, transcript, and video.
- **Separate video identity from watch events**: one video can have many watch events, and one watch event can have zero or more voice captures.
- **Use git/backup/versioning** for the vault before enabling automated writes.
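
The section and frontmatter rules above can be sketched as a small renderer. This is an illustration only: `render_note` and its parameters are hypothetical, and path values are written as JSON strings on the assumption that they are then valid double-quoted YAML scalars.

```python
import json


def render_note(title: str, user_capture: str, source_lines: list,
                summary: str, audio_path: str, transcript_path: str) -> str:
    """Render a staged note with provenance frontmatter and author-labelled sections."""
    front = [
        "---",
        "created_by: assistant",
        "review_status: unreviewed",
        "source_type: youtube_voice_capture",
        f"user_audio: {json.dumps(audio_path)}",
        f"user_transcript: {json.dumps(transcript_path)}",
        "assistant_generated: true",
        f"contains_user_words: {'true' if user_capture.strip() else 'false'}",
        "contains_assistant_words: true",
        "last_reviewed_by_user: null",
        "---",
        "",
    ]
    body = [
        f"# {title}", "",
        # User words stay verbatim and separate from assistant text.
        "## User capture", "", user_capture.strip() or "_No voice capture._", "",
        "## Source material", "", *[f"- {line}" for line in source_lines], "",
        "## Assistant summary", "", summary.strip(), "",
    ]
    return "\n".join(front + body)
```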


### Duplicate / Repeat-Watch Policy

Do not treat repeated viewing as an error. The project should distinguish:

- **Video identity**: the YouTube video itself, keyed by video ID/URL.
- **Watch event**: a specific time the user watched the video.
- **Capture event**: a specific voice memo or note the user made near a watch event.
- **Generated artifact**: transcript, summary, daily note entry, or video note generated by the system.

Deduplication should only prevent accidental repeated processing of the same source file/export row. It should not erase repeated watch events. A user may watch the same video twice and have different reactions, including liking it once and disliking it later. Those reactions are meaningful and should be preserved as separate dated events.

Recommended model:

```yaml
video_id: abc123
watch_events:
  - watched_at: 2026-05-14T10:15:00
    source: google_my_activity
    captures:
      - audio: captures/2026-05-14-1017.m4a
        user_reaction: interested
        transcript: "This one seems worth revisiting..."
  - watched_at: 2026-06-02T15:40:00
    source: google_my_activity
    captures:
      - audio: captures/2026-06-02-1543.m4a
        user_reaction: skeptical
        transcript: "Actually I disagree with this now..."
```
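
The same model can be expressed in code. In this sketch the class names are hypothetical, and the pair `(watched_at, source)` stands in for "the same export row", which is an assumption. Re-importing a row is a no-op, while a later watch of the same video becomes a new event.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CaptureEvent:
    audio: str
    transcript: str = ""
    user_reaction: Optional[str] = None


@dataclass
class WatchEvent:
    watched_at: str
    source: str
    captures: list = field(default_factory=list)


@dataclass
class Video:
    video_id: str
    watch_events: list = field(default_factory=list)

    def add_watch(self, watched_at: str, source: str) -> WatchEvent:
        """Add a watch event; importing the same row again returns the existing event."""
        for ev in self.watch_events:
            if (ev.watched_at, ev.source) == (watched_at, source):
                return ev
        ev = WatchEvent(watched_at, source)
        self.watch_events.append(ev)
        return ev
```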

## Infrastructure / Dependencies

Potential dependencies:
- Nextcloud server and chosen app/API.
- Mobile Nextcloud app behavior for offline use.
- Obsidian vault location/access method.
- Google Data Portability API investigation for automated YouTube My Activity/watch-history import.
- Optional YouTube Data API key/OAuth for metadata enrichment after video IDs are known.
- Manual Google Takeout export only as a fallback/debug path, not the target operating model.
- Browser history export or manual share links only as fallback sources.
- Optional transcript tooling for public videos where transcripts are available.

## Security / Safety

- Do not request or store YouTube/Google credentials unnecessarily.
- Start with manual Google Takeout/export testing and API documentation review before creating OAuth apps or storing tokens.
- Prefer export/API methods over scraping private account data.
- Treat watch history and notes as private personal data.
- Do not access the Obsidian vault until permission, path, and backup/sync model are documented.
- Avoid employer-device automation unless explicitly authorized by the user and allowed by policy.

## Backup / Rollback Plan

- Keep raw captures unchanged in the capture inbox.
- Write generated Obsidian notes first to a staging folder.
- If importing into the vault, ensure the vault has a backup or versioning first.
- Rollback means deleting staged/imported generated notes while keeping raw captures.
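
A cautious rollback sketch, assuming the pipeline keeps a manifest (a list of paths) of every Markdown file it generated. The manifest and the `rollback` function are hypothetical. The function deletes only `.md` files under explicitly allowed roots, so raw captures cannot be removed by mistake.

```python
from pathlib import Path


def rollback(manifest: list, allowed_roots: list) -> list:
    """Delete generated Markdown listed in the manifest; return the paths removed."""
    roots = [Path(r).resolve() for r in allowed_roots]
    removed = []
    for entry in manifest:
        p = Path(entry).resolve()
        if p.suffix != ".md":
            continue  # never touch audio, transcripts, or JSON records
        if not any(p.is_relative_to(r) for r in roots):
            continue  # outside the staging/generated folders (needs Python 3.9+)
        if p.is_file():
            p.unlink()
            removed.append(p)
    return removed
```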

## Open Questions

- Can YouTube on Android expose enough useful context while offline, such as title, channel, queue/history, or share URL?
- Done initial test: YouTube Premium offline viewing appears in normal YouTube history after sync, but the website view does not show watch timestamps.
- Done initial check: Google My Activity provides a single timestamp for watched videos.
- Confirm whether the My Activity timestamp is actual offline watch start time or later resync time.
- Does manual Google Takeout include the same timestamps?
- Confirm Google Data Portability support for YouTube My Activity/watch-history, including exact resources/scopes, OAuth behavior, background refresh feasibility, and whether it includes timestamps/video IDs/URLs.
- Does the YouTube Data API expose any usable watch-history data, or only public metadata/playlists/subscriptions?
- Done: Nextcloud Talk on Android was tested offline and cannot create/send messages or self-notes without a Nextcloud connection, so it is not used as the capture surface.
- Which Android voice recorder app is fastest and saves files to an accessible local folder with reliable timestamps?
- Can Nextcloud Android auto-upload monitor that recordings folder, or is FolderSync/WebDAV/Syncthing needed?
- What exact folder/template should be used once the Obsidian vault is synced via Nextcloud?
- Are transcripts/summaries desired for all watched videos, or only videos the user explicitly flags?
- How much should the system log unmentioned watched videos in Obsidian: every video, daily summaries only, or only selected/interesting items?

## Acceptance Criteria

This spec is satisfied when all of the following are complete:
- [x] Verify whether YouTube Premium offline watches appear in normal YouTube history after sync.
- [x] Check whether Google My Activity includes timestamps for offline-synced watches.
- [ ] Confirm whether My Activity timestamps are actual watch-start times or sync times.
- [ ] Check whether Google Takeout includes equivalent timestamps.
- [x] Nextcloud Talk offline/queued capture behavior is tested on Android and found unsuitable: messages/self-notes cannot be created without Nextcloud connection.
- [ ] Test local Android voice recorder plus automatic upload/sync after reconnect.
- [ ] A capture method is chosen and tested offline/online.
- [ ] A standard minimal capture message/template exists.
- [ ] A staging Markdown format for video notes and daily capture notes exists.
- [ ] At least three test video captures are processed into Markdown notes.
- [ ] Obsidian import/access is documented and backed up before automation writes into the vault.

## Decisions

- 2026-05-14: Start as a new project with a draft spec; do not assume programmatic access to YouTube offline watch data.
- 2026-05-14: Primary viewing device is an Android tablet.
- 2026-05-14: Prefer voice capture that may mention title, author/channel, or subject to assist matching. Nextcloud Talk was preferred initially, but offline testing showed it cannot create/send messages or self-notes without connection, so use local voice recording plus later upload/sync instead.
- 2026-05-14: Obsidian vault is currently on the user's PC and should be synced with Nextcloud soon.
- 2026-05-14: Preferred output is one note per interesting video plus a daily capture note, with concept integration into the existing vault later. Also consider a watched-video index/log in Obsidian, including unmentioned videos where useful.
- 2026-05-14: Investigate YouTube API/Google Takeout feasibility, but do not assume an API key alone can access private watch history.
- 2026-05-14: Google Takeout should be treated as manual export fallback/debug path; no simple Takeout API key path is assumed.
- 2026-05-14: User tested Android tablet offline YouTube Premium viewing; videos appeared in YouTube history on PC after reconnect, but website history did not show watch timestamps.
- 2026-05-14: Google My Activity shows a single timestamp for watched videos; need to verify whether it represents actual watch start time or history resync time.
- 2026-05-14: Do not pursue a custom YouTube Premium offline player app; official offline downloads are not practically/legally accessible for this workflow due to YouTube app/DRM/terms constraints.
- 2026-05-15: Keep most processing/logging/raw artifacts outside the Obsidian vault. Obsidian should receive curated daily YouTube notes and selected per-video/concept notes.
- 2026-05-15: Daily YouTube notes should include summaries of each relevant non-Short video, author/channel, thumbnail reference, and caption-derived information where available.
- 2026-05-15: YouTube Shorts should be ignored by default unless specifically discussed/flagged by the user.
- 2026-05-15: Investigate Google Data Portability as the preferred route for automated YouTube My Activity/watch-history import, since manual Takeout is not the desired operating model.
