YouTube to notes

How to convert YouTube videos into structured notes without watching them twice

Long-form video is the worst format for retention. A good YouTube-to-notes pipeline turns 90 minutes of watching into 5 minutes of skimmable, searchable notes — without losing the gold.

April 1, 2026 · 8 min read · 6 sections

Why you keep re-watching the same video

Long-form YouTube video is one of the worst formats ever invented for information retention. Lectures average 8% recall after a week. Conversational podcasts average less. The reason you keep going back to the same Andrej Karpathy video, the same Lex Fridman interview, is not that the material was unmemorable — it is that the format makes retention almost impossible without external scaffolding.

The fix is to treat the video as a source and turn it into something you can actually skim. That something is a structured notes document: an outline, key claims, named speakers, and timestamps for the moments you will want to revisit. The tooling to produce this is now cheap and fast. The bottleneck has shifted from producing notes to trusting the notes your tool produces.

[Stat cards: lecture recall after 7 days with no notes; skim speed, notes vs. video; average long-form video length, 90 min]

The three layers of a good notes workflow

Flat transcript

90 minutes as a wall of words
No structure, no outline
Speakers labeled "Speaker 1, 2"
Easier to search, hard to study

Structured notes

Outline with skimmable headings
Bullet-pointed key claims
Named speakers + timestamps
Linked back to source for verification

A flat transcript is not notes — it is a search target

Notes that work for retention have three layers underneath the surface: a clean transcript, an outline that compresses each ten-minute chunk into a headline, and a key-claims pass that surfaces the 8-15 statements you will actually want to remember. The first two are now generated automatically by most tools. The third is where quality varies wildly.
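To make the outline layer concrete, here is a minimal Python sketch of the chunking step that comes before any headline is written. It assumes your tool gives you transcript segments with a start time in seconds. The `Segment` shape, the function names, and the sample data are illustrative assumptions, not any particular tool's format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical transcript segment; real tools use their own field names.
    start: float  # seconds from the start of the video
    speaker: str
    text: str


def chunk_segments(segments, window_seconds=600):
    """Group segments into fixed windows (ten minutes by default).

    Returns a dict that maps each window's start time in seconds to the
    segments that begin inside that window, in chronological order.
    """
    chunks = {}
    for seg in sorted(segments, key=lambda s: s.start):
        window_start = int(seg.start // window_seconds) * window_seconds
        chunks.setdefault(window_start, []).append(seg)
    return chunks


def format_timestamp(seconds):
    """Render seconds as H:MM:SS, or M:SS for times under an hour."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


# Made-up sample data for illustration only.
segments = [
    Segment(12.0, "Host", "Welcome to the show."),
    Segment(95.5, "Guest", "Thanks for having me."),
    Segment(640.0, "Guest", "The second idea is about evaluation."),
    Segment(1810.2, "Host", "Let's talk about tooling."),
]

for window_start, group in chunk_segments(segments).items():
    print(format_timestamp(window_start), f"{len(group)} segment(s)")
```

Each chunk is what you would hand to a summarizer, or read yourself, to write one headline. Fixed windows are the simplest choice; splitting on topic shifts is better but needs more than a few lines.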

Step-by-step: from URL to outline

01. Paste the YouTube URL into your tool, or download the video and audio if your tool ingests files.
02. Run transcription with diarization on. Confirm speaker labels in the first 90 seconds, since that is when speakers tend to introduce themselves.
03. Generate an outline pass: 5-9 chapter headings with timestamps. Edit headings ruthlessly: the goal is skimmable, not exhaustive.
04. Run a key-claims pass: 8-15 bullet points across the whole video. Each one should be a statement, not a topic.
05. Tag any "I need to come back to this" moments with timestamps. These are the 5-second loops you will revisit weeks later.
06. Export to your notes app of choice with a link back to the YouTube URL and timestamps preserved.
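The steps above can be sketched as a small data model plus a sanity check. This is a hedged Python sketch, not how any specific tool works: the `Notes` class, `lint_notes`, and the placeholder video ID are all assumptions made for illustration. The URL shapes handled are the common ones (`watch?v=`, `youtu.be/`, `/embed/`, `/shorts/`, `/live/`), and the numeric ranges come straight from the outline and key-claims steps.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("embed", "shorts", "live"):
            return parts[1]
    return None


@dataclass
class Notes:
    # Hypothetical container for one video's notes.
    url: str
    headings: list = field(default_factory=list)   # (seconds, title) pairs
    claims: list = field(default_factory=list)     # (seconds, statement) pairs
    bookmarks: list = field(default_factory=list)  # seconds to revisit


def lint_notes(notes):
    """Return a list of problems measured against the ranges in the steps."""
    problems = []
    if extract_video_id(notes.url) is None:
        problems.append("URL does not look like a YouTube video link")
    if not 5 <= len(notes.headings) <= 9:
        problems.append(f"{len(notes.headings)} headings, expected 5-9")
    if not 8 <= len(notes.claims) <= 15:
        problems.append(f"{len(notes.claims)} key claims, expected 8-15")
    return problems


# Placeholder ID and made-up content for illustration only.
draft = Notes(
    url="https://www.youtube.com/watch?v=VIDEO_ID_123",
    headings=[(i * 600, f"Part {i + 1}") for i in range(6)],
    claims=[(i * 300, f"Claim number {i + 1}.") for i in range(4)],
)
print(lint_notes(draft))
```

A check like this will not tell you whether a claim is a statement or just a topic. That part still needs a human read.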

Speaker labels for multi-host videos

For interview podcasts, panel discussions, and shows with recurring guests, speaker labels are not optional. Half the value of an interview transcript is "what did the guest say versus what did the host say" — collapse that distinction and you have a wall of opinions with no source attribution.

Tools that maintain persistent voice IDs across videos are dramatically more useful here. Once you have transcribed two episodes of "Acquired" or three Lex Fridman interviews, the recurring host is already named. New guests get tagged on first appearance and stay named for every future episode they appear in. That compounds — your notes archive gets more useful the more videos you process.
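One common way to get persistent voice IDs is to store a reference voice embedding per named speaker and match new voices against it. The Python sketch below illustrates that idea only. It is not a description of how any particular tool does it, the three-number vectors stand in for real embeddings from a speaker-embedding model (not shown), and the `VoiceRegistry` class and its 0.8 threshold are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VoiceRegistry:
    """Hypothetical store of named voices that persists across videos."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cutoff; tune on real data
        self.known = {}             # name -> reference embedding

    def enroll(self, name, embedding):
        self.known[name] = list(embedding)

    def identify(self, embedding):
        """Return the best-matching known name, or None below the threshold."""
        best_name, best_score = None, self.threshold
        for name, reference in self.known.items():
            score = cosine_similarity(embedding, reference)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name


registry = VoiceRegistry()
registry.enroll("Host", [0.9, 0.1, 0.0])  # named once, in episode 1

# Episode 2: the host is recognised, the guest is new.
host_match = registry.identify([0.88, 0.12, 0.01])
guest_match = registry.identify([0.1, 0.2, 0.95])
print(host_match, guest_match)

# Tag the guest on first appearance so future episodes recognise them.
registry.enroll("Guest A", [0.1, 0.2, 0.95])
```

The compounding effect described above falls out of the design: every enrolled voice is one fewer "Speaker 2" to fix by hand in the next video.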

Exporting to Notion, Obsidian, and Apple Notes

Notes app	Best format	Speaker tags?	Timestamps as links?
Notion	Markdown or direct API	Yes (callouts)	Yes (web links)
Obsidian	Markdown with YAML	Yes (formatted)	Yes
Apple Notes	Rich text	Limited	Limited
Roam / Logseq	Markdown blocks	Yes	Yes
Google Docs	.docx	Yes	Yes (hyperlinks)

Export formats by destination
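As an example of the "Markdown with YAML" row, here is a minimal Python sketch that renders a note with YAML front matter of the kind Obsidian reads. The property names (`title`, `source`, `speakers`), the function name, and the sample content are assumptions chosen for illustration. Values are written with `json.dumps` because a JSON string is also a valid YAML double-quoted scalar, which avoids pulling in a YAML library.

```python
import json


def render_obsidian_note(title, source_url, speakers, claims):
    """Render Markdown with YAML front matter.

    claims is a list of (seconds, speaker, statement) tuples.
    """
    def quoted(value):
        return json.dumps(value, ensure_ascii=False)

    lines = ["---"]
    lines.append(f"title: {quoted(title)}")
    lines.append(f"source: {quoted(source_url)}")
    lines.append("speakers:")
    for name in speakers:
        lines.append(f"  - {quoted(name)}")
    lines.append("---")
    lines.append("")
    lines.append(f"# {title}")
    lines.append("")
    for seconds, speaker, statement in claims:
        minutes, secs = divmod(int(seconds), 60)
        lines.append(f"- **{speaker}** ({minutes}:{secs:02d}): {statement}")
    return "\n".join(lines) + "\n"


# Made-up content and placeholder URL for illustration only.
note = render_obsidian_note(
    title='Demo talk: "notes that age well"',
    source_url="https://www.youtube.com/watch?v=VIDEO_ID_123",
    speakers=["Host", "Guest A"],
    claims=[(754, "Guest A", "Outlines beat transcripts for review.")],
)
print(note)
```

The same body, minus the front matter block, pastes cleanly into most Markdown-based apps in the table.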

One export practice that pays off forever: include the original YouTube URL with the timestamp in every key claim. When future-you wants to verify a quote or re-listen to a 30-second segment, you click the link and you are at the exact moment. This single pattern is the difference between notes that age well and notes that become unverifiable folklore.
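A minimal Python sketch of that pattern, using the `t` query parameter that YouTube watch URLs accept for a start time. The function names and the placeholder video ID are assumptions for illustration.

```python
from urllib.parse import urlencode


def timestamp_link(video_id, seconds):
    """Build a watch URL that starts playback at the given second."""
    query = urlencode({"v": video_id, "t": f"{int(seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


def claim_with_link(statement, video_id, seconds):
    """Format one key claim as a Markdown bullet with a timestamped link.

    The label uses total minutes, so 1h15m30s is shown as 75:30.
    """
    minutes, secs = divmod(int(seconds), 60)
    label = f"{minutes}:{secs:02d}"
    return f"- {statement} [{label}]({timestamp_link(video_id, seconds)})"


# Placeholder video ID and made-up claim for illustration only.
print(claim_with_link("Outlines beat transcripts for review.", "VIDEO_ID_123", 754.6))
```

Storing the raw seconds alongside the rendered link is worth the extra field: if you later change the link format, you can regenerate every bullet.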

Studying lectures vs. consuming explainers

Two different goals call for two different tunings of the same workflow. Studying a lecture means you are going to come back, you want full claim-level extraction, and you want OCR on slides. Consuming an explainer means you want the takeaways, you are not going back, and a tight 8-bullet summary is enough. Same tool, different settings.

Study mode: full transcript + outline + claims + slide OCR + linked timestamps.
Consume mode: outline + 8-15 bullet takeaways + a single "save for later" timestamp.
Reference mode (for the videos you cite in writing): full transcript with speaker tags, ready to drop a quote into a draft.
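"Same tool, different settings" can be as simple as a table of presets. Here is a hedged Python sketch. The preset names mirror the three modes above, but the feature keys and the `steps_for` helper are assumptions, not settings from any real product.

```python
# Hypothetical feature names; map them to whatever your tool actually exposes.
PRESETS = {
    "study": {"transcript", "outline", "claims", "slide_ocr", "linked_timestamps"},
    "consume": {"outline", "takeaways", "save_for_later_timestamp"},
    "reference": {"transcript", "speaker_tags"},
}


def steps_for(mode):
    """Return the sorted list of features to run for a given mode."""
    try:
        return sorted(PRESETS[mode])
    except KeyError:
        known = ", ".join(sorted(PRESETS))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {known}") from None


for mode in PRESETS:
    print(mode, "->", ", ".join(steps_for(mode)))
```

Picking the mode before you hit "transcribe" is the point: it stops you from running slide OCR on a ten-minute explainer you will never open again.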
