# Google Data Portability research notes

## Metadata
- Type: Note
- Project: YouTube-to-Obsidian Knowledge Capture
- Created: 2026-05-15
- Updated: 2026-05-15
- Status: Draft

## Official docs checked

- Data Portability API home: <https://developers.google.com/data-portability>
- Introduction / developer workflow: <https://developers.google.com/data-portability/user-guide/introduction>
- Available OAuth scopes: <https://developers.google.com/data-portability/user-guide/scopes>
- REST reference: <https://developers.google.com/data-portability/reference/rest>
- Initiate archive: <https://developers.google.com/data-portability/reference/rest/v1/portabilityArchive/initiate>
- Check access type: <https://developers.google.com/data-portability/reference/rest/v1/accessType/check>
- Get archive state: <https://developers.google.com/data-portability/reference/rest/v1/archiveJobs/getPortabilityArchiveState>
- Authorization reset: <https://developers.google.com/data-portability/reference/rest/v1/authorization/reset>
- My Activity schema: <https://developers.google.com/data-portability/schema-reference/my_activity>

## Findings

Google Data Portability appears to be a plausible official route for automated YouTube activity import.

Relevant resource/scope:

```text
resource: myactivity.youtube
scope: https://www.googleapis.com/auth/dataportability.myactivity.youtube
classification: Restricted
```
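
As a rough illustration, an authorization URL requesting only this scope could be assembled as below. This is a sketch, not the documented Data Portability setup: it uses Google's general OAuth 2.0 authorization endpoint, and `build_auth_url`, the client ID, redirect URI, and `state` value are hypothetical placeholders. Any Data Portability specific parameters or consent-screen requirements still need to be checked against the docs listed above.

```python
from urllib.parse import urlencode

# Scope copied from the note above.
YOUTUBE_ACTIVITY_SCOPE = (
    "https://www.googleapis.com/auth/dataportability.myactivity.youtube"
)
# Google's general OAuth 2.0 authorization endpoint.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_auth_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Return an authorization-code URL that requests only the YouTube activity scope."""
    params = {
        "client_id": client_id,        # placeholder: OAuth client from Google Cloud
        "redirect_uri": redirect_uri,  # placeholder: must match the client's config
        "response_type": "code",
        "scope": YOUTUBE_ACTIVITY_SCOPE,
        "state": state,                # opaque anti-CSRF value chosen by the app
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"
```

Usage would look like `build_auth_url("my-client-id", "http://localhost:8080/callback", "random-state")`, with the user opening the returned URL in a browser.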

The My Activity schema says activity records include timestamped user activity across Google products, including YouTube. Relevant fields include:

- `header` — typically app/product name, e.g. YouTube
- `title` — high-level summary, e.g. watched/searched activity
- `titleUrl` — URL associated with the activity, likely YouTube URL where available
- `subtitles` — details such as channel information
- `time` — time/date the user did the activity
- `products`
- `activityControls` — e.g. YouTube watch history/search history

This looks sufficient for the core watch-event data we need: watch time (`time`), URL (`titleUrl`), title (`title`), and channel details (`subtitles`), pending the reliability checks listed under Open questions.
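
To make the field list concrete, here is a minimal sketch of picking watch events out of parsed activity records. It rests on assumptions that are not yet verified against a real archive: that `header` is exactly `"YouTube"`, that watch events have an English `title` starting with `"Watched "`, and that `subtitles` is a list of objects with a `name` key. The function names are hypothetical.

```python
from typing import Any, Optional


def is_youtube_watch(record: dict[str, Any]) -> bool:
    """Heuristic filter: YouTube record whose title starts with 'Watched '.

    Assumption: English-locale titles; other locales would need other prefixes.
    """
    return (
        record.get("header") == "YouTube"
        and str(record.get("title", "")).startswith("Watched ")
    )


def channel_name(record: dict[str, Any]) -> Optional[str]:
    """Return the first subtitle's name, assumed to be the channel, if present."""
    subtitles = record.get("subtitles") or []
    if subtitles and isinstance(subtitles[0], dict):
        return subtitles[0].get("name")
    return None


def watch_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the records that look like watched-video events."""
    return [record for record in records if is_youtube_watch(record)]
```

Search activity and records without `subtitles` pass through without errors: the former is filtered out, the latter simply gets `None` as its channel.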

## API flow

The documented flow is archive-based, not a simple list endpoint:

1. User authorizes the OAuth scope `https://www.googleapis.com/auth/dataportability.myactivity.youtube`.
2. User chooses access duration:
   - one-time access, or
   - time-based access for 30 or 180 days.
3. App calls:

```http
POST https://dataportability.googleapis.com/v1/portabilityArchive:initiate
```

with body like:

```json
{
  "resources": ["myactivity.youtube"],
  "startTime": "2026-05-15T00:00:00Z",
  "endTime": "2026-05-16T00:00:00Z"
}
```

4. API returns an archive job ID and access type.
5. App polls:

```http
GET https://dataportability.googleapis.com/v1/archiveJobs/{job}/portabilityArchiveState
```

6. When the job completes, the state response includes signed Cloud Storage URLs for downloading the archive.
7. App downloads the archive, parses the My Activity JSON/HTML, and normalizes watched-video records.
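
The initiate-and-poll part of the flow above could be sketched roughly as follows. The two endpoints and the request body come from this note; the response field names (`archiveJobId`, `state`, `urls`) and the state values (`COMPLETE`, `FAILED`, `CANCELLED`) are assumptions to confirm against the REST reference. The helper names, polling interval, and poll limit are hypothetical. The HTTP call is injectable so the polling logic can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Optional

API_BASE = "https://dataportability.googleapis.com/v1"


def build_initiate_body(start_time: str, end_time: str) -> dict[str, Any]:
    """Request body for portabilityArchive:initiate, as shown in the note."""
    return {
        "resources": ["myactivity.youtube"],
        "startTime": start_time,
        "endTime": end_time,
    }


def call_api(method: str, url: str, token: str,
             body: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Send one authorized JSON request and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def start_archive(token: str, start_time: str, end_time: str,
                  send: Callable[..., dict[str, Any]] = call_api) -> str:
    """Start an archive job and return its ID (field name is an assumption)."""
    url = f"{API_BASE}/portabilityArchive:initiate"
    response = send("POST", url, token, build_initiate_body(start_time, end_time))
    return response["archiveJobId"]


def wait_for_archive(job_id: str, token: str,
                     send: Callable[..., dict[str, Any]] = call_api,
                     interval_s: float = 60.0, max_polls: int = 120,
                     sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Poll the job state until it completes, then return the download URLs."""
    url = f"{API_BASE}/archiveJobs/{job_id}/portabilityArchiveState"
    for _ in range(max_polls):
        state = send("GET", url, token)
        status = state.get("state")  # assumed field and values, see lead-in
        if status == "COMPLETE":
            return list(state.get("urls", []))
        if status in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"archive job {job_id} ended in state {status}")
        sleep(interval_s)
    raise TimeoutError(f"archive job {job_id} not complete after {max_polls} polls")
```

Keeping `send` and `sleep` as parameters is a design choice for testability: a fake `send` can replay canned state responses, which matters while the real API is unavailable in the user's region.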

## Background feasibility

The docs say time-based access allows exports every 24 hours until consent expires. The consent duration can be 30 or 180 days, and the user can renew near expiry.

This is not indefinite background access, but it may be good enough:

- run daily archive jobs for the previous day
- renew authorization periodically
- avoid manual Takeout downloads
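
A daily job needs a time window for the previous day and some way to notice that consent is about to lapse. The sketch below assumes UTC day boundaries and assumes consent expires exactly `duration_days` after the grant date; both are simplifications to verify, and the function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone


def previous_day_window(today: date) -> tuple[str, str]:
    """Return (startTime, endTime) strings covering the UTC day before `today`."""
    end = datetime.combine(today, time.min, tzinfo=timezone.utc)
    start = end - timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


def consent_days_left(granted_on: date, duration_days: int, today: date) -> int:
    """Days remaining before time-based consent (30 or 180 days) runs out."""
    return (granted_on + timedelta(days=duration_days) - today).days
```

For example, `previous_day_window(date(2026, 5, 16))` yields the same `startTime`/`endTime` pair used in the request body under API flow, and a scheduler could prompt for renewal once `consent_days_left` drops below some threshold.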

Important caveat: the `myactivity.youtube` scope is restricted. Production use requires Google app verification and possibly a security assessment. For personal/test use, we need to determine what Google Cloud OAuth testing mode permits.

## Data handling design

Imported archives should be stored outside Obsidian:

```text
/var/lib/youtube-activity-import/
  archives/
  extracted/
  normalized/
  state/
  logs/
```
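
Creating this layout idempotently is straightforward; a minimal sketch follows, where `ensure_layout` is a hypothetical helper and the root path is passed in so it can be pointed at a scratch directory during testing.

```python
from pathlib import Path

# Subdirectories from the layout above.
SUBDIRS = ("archives", "extracted", "normalized", "state", "logs")


def ensure_layout(root: Path) -> dict[str, Path]:
    """Create the import directory tree if missing and return its paths by name."""
    paths = {name: root / name for name in SUBDIRS}
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

In production this would be called as `ensure_layout(Path("/var/lib/youtube-activity-import"))`, which requires write permission on `/var/lib`.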

Normalized event shape target:

```json
{
  "source": "google_data_portability",
  "resource": "myactivity.youtube",
  "watched_at": "2026-05-15T04:18:00Z",
  "title": "Watched Example Video",
  "url": "https://www.youtube.com/watch?v=...",
  "video_id": "...",
  "channel": "...",
  "activity_controls": ["YouTube watch history"],
  "is_short": false,
  "raw_record_path": "/var/lib/youtube-activity-import/extracted/..."
}
```
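
A normalizer producing this shape from one raw My Activity record might look like the sketch below. It assumes `time` is an ISO 8601 timestamp, that `subtitles[0].name` holds the channel, and that a `/shorts/` URL path marks a Short; the last heuristic will miss Shorts recorded with ordinary `watch?v=` URLs. None of this has been checked against a real archive, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Pull the video ID out of common YouTube URL forms, else return None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v")
            return ids[0] if ids else None
        if parsed.path.startswith("/shorts/"):
            return parsed.path.split("/")[2] or None
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    return None


def normalize_record(record: dict[str, Any], raw_record_path: str) -> dict[str, Any]:
    """Map one raw activity record onto the normalized event shape."""
    url = record.get("titleUrl", "")
    subtitles = record.get("subtitles") or []
    # Replace a trailing "Z" so older Python versions can parse the timestamp.
    watched = datetime.fromisoformat(record["time"].replace("Z", "+00:00"))
    return {
        "source": "google_data_portability",
        "resource": "myactivity.youtube",
        "watched_at": watched.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "title": record.get("title", ""),
        "url": url,
        "video_id": extract_video_id(url),
        "channel": subtitles[0].get("name") if subtitles else None,
        "activity_controls": list(record.get("activityControls", [])),
        "is_short": urlparse(url).path.startswith("/shorts/"),  # heuristic only
        "raw_record_path": raw_record_path,
    }
```

Records with no `titleUrl` or `subtitles` still normalize, with `video_id` and `channel` set to `None`, so gaps in the source data stay visible downstream rather than raising errors.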

## Probe result

Initial OAuth link worked, but Google reported that the Data Portability feature is not available in the user's country/region. This blocks Google Data Portability as the primary automated source unless availability changes, the user's account/region changes, or another permitted Google access route exists.

## Open questions

- Is the block caused by a fixable account/region setting, or is Data Portability simply unavailable in the user's country/region?
- Can we use OAuth testing mode for a personal app with the restricted `myactivity.youtube` scope?
- Does the downloaded archive provide JSON by default, or must the format be requested explicitly?
- Does `titleUrl` reliably include the YouTube video URL for watched videos?
- Does `subtitles` reliably include channel/author?
- Does offline Premium viewing appear with actual watch time or sync time?
- Are Shorts distinguishable from URL/title/duration alone, or do we need YouTube metadata enrichment?
- Does the API issue refresh tokens in the usual way, or does time-based access use access tokens with expiry/renewal semantics specific to Data Portability?

## Initial decision

Do not proceed with Data Portability as the primary implementation while it is unavailable in the user's country/region. Keep the probe code for future use. Investigate fallback sources for automated watched-video data.
