TigerScribe


How to convert YouTube videos into structured notes without watching them twice

Long-form video is the worst format for retention. A good YouTube-to-notes pipeline turns 90 minutes of watching into 5 minutes of skimmable, searchable notes — without losing the gold.

April 1, 2026 · 8 min read · 6 sections

Why you keep re-watching the same video

Long-form YouTube video is one of the worst formats ever invented for information retention. Lectures average 8% recall after a week without notes, and conversational podcasts fare worse. The reason you keep going back to the same Andrej Karpathy video or the same Lex Fridman interview is not that the material was unmemorable; it is that the format makes retention almost impossible without external scaffolding.

The fix is to treat the video as a source and turn it into something you can actually skim. That something is a structured notes document — outline, key claims, named speakers, timestamps for the moments you will want to revisit. The tooling to produce this is now cheap and fast. The bottleneck has shifted from production to whether your tool produces notes you trust.

  • 8%: lecture recall after 7 days with no notes
  • 5x: skim speed of notes vs. video
  • 90 min: average length of long-form video

The three layers of a good notes workflow

Flat transcript

  • 90 minutes as a wall of words
  • No structure, no outline
  • Speakers labeled "Speaker 1, 2"
  • Easy to search, hard to study

Structured notes

  • Outline with skimmable headings
  • Bullet-pointed key claims
  • Named speakers + timestamps
  • Linked back to source for verification

A flat transcript is not notes: it is a search target

Notes that work for retention have three layers underneath the surface: a clean transcript, an outline that compresses each ten-minute chunk into a headline, and a key-claims pass that surfaces the 5-15 statements you will actually want to remember. The first two are now generated automatically by most tools. The third is where quality varies wildly.
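A minimal sketch of the compression step behind the outline layer, assuming your transcription tool hands you diarized segments with start and end times in seconds. The `Segment` type and `chunk_by_window` helper are hypothetical, not any particular tool's API. The code only groups the transcript into the ten-minute chunks that each become one headline; writing the headline is still up to you or a model.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized transcript segment; times are in seconds (assumed format)."""
    start: float
    end: float
    speaker: str
    text: str


def chunk_by_window(segments: list[Segment], window_s: float = 600.0) -> list[list[Segment]]:
    """Group segments into consecutive fixed windows (default 10 minutes) by start time.

    Each returned chunk is the raw material for one outline headline.
    Windows with no speech are skipped rather than returned empty.
    """
    chunks: dict[int, list[Segment]] = {}
    for seg in sorted(segments, key=lambda s: s.start):
        # A segment belongs to the window its start time falls in, so it is never split.
        chunks.setdefault(int(seg.start // window_s), []).append(seg)
    return [chunks[i] for i in sorted(chunks)]


# Toy input: three segments spanning roughly the first 20 minutes.
segments = [
    Segment(0, 42, "Host", "Welcome back to the show."),
    Segment(42, 610, "Guest", "Let's start with tokenization."),
    Segment(610, 1250, "Guest", "Now on to attention."),
]
chunks = chunk_by_window(segments)
print([len(chunk) for chunk in chunks])  # [2, 1]
```

Assigning each segment by its start time keeps quotes intact at the cost of slightly uneven chunk lengths, which is usually the right trade for notes you will verify against the source.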

Step-by-step: from URL to outline

  1. Paste the YouTube URL into your tool, or download the video (or just its audio) if your tool only ingests files.
  2. Run transcription with diarization (speaker separation) turned on. Confirm speaker labels in the first 90 seconds, which is when speakers tend to introduce themselves.
  3. Generate an outline pass: 5-9 chapter headings with timestamps. Edit headings ruthlessly; the goal is skimmable, not exhaustive.
  4. Run a key-claims pass: 8-15 bullet points across the whole video. Each one should be a statement, not a topic.
  5. Tag any "I need to come back to this" moments with timestamps. These are the 5-second loops you will revisit weeks later.
  6. Export to your notes app of choice with a link back to the YouTube URL and timestamps preserved.
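Steps 3 and 4 come down to formatting timestamped headings and claims while staying inside the target ranges. A minimal sketch, assuming chapters and claims arrive as hypothetical `(seconds, text)` pairs from your tool or your own edits; the helper names are illustrative, and the 5-9 and 8-15 limits are the ones from the steps above.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as H:MM:SS, or M:SS for positions under an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def render_outline(chapters: list[tuple[float, str]], claims: list[tuple[float, str]]) -> str:
    """Render chapter headings and key claims as timestamped plain-text bullets.

    Raises ValueError when the counts fall outside 5-9 chapters or 8-15 claims.
    """
    if not 5 <= len(chapters) <= 9:
        raise ValueError(f"expected 5-9 chapters, got {len(chapters)}")
    if not 8 <= len(claims) <= 15:
        raise ValueError(f"expected 8-15 claims, got {len(claims)}")
    lines = ["Outline"]
    lines += [f"- [{format_timestamp(t)}] {title}" for t, title in chapters]
    lines += ["", "Key claims"]
    lines += [f"- [{format_timestamp(t)}] {claim}" for t, claim in claims]
    return "\n".join(lines)


# Placeholder content: six chapters ten minutes apart, eight claims.
chapters = [(i * 600.0, f"Chapter {i + 1}") for i in range(6)]
claims = [(i * 400.0 + 30, f"Claim {i + 1}") for i in range(8)]
print(render_outline(chapters, claims))
```

Failing loudly on a 20-heading outline is deliberate: it forces the ruthless editing pass rather than letting an exhaustive outline slip into your notes.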

Speaker labels for multi-host videos

For interview podcasts, panel discussions, and shows with recurring guests, speaker labels are not optional. Half the value of an interview transcript is "what did the guest say versus what did the host say" — collapse that distinction and you have a wall of opinions with no source attribution.

Tools that maintain persistent voice IDs across videos are dramatically more useful here. Once you have transcribed two episodes of "Acquired" or three Lex Fridman interviews, the recurring host is already named. New guests get tagged on first appearance and stay named for every future episode they appear in. That compounds — your notes archive gets more useful the more videos you process.
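Persistent voice IDs usually work by comparing a voice embedding for each diarized speaker against a registry of voices you have already named. A minimal sketch of that matching step, assuming some speaker-embedding model (not shown) produces fixed-length vectors; the toy 3-dimensional vectors, the 0.75 cosine threshold, and the helper names are all hypothetical.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_speaker(
    embedding: list[float],
    registry: dict[str, list[float]],
    threshold: float = 0.75,  # assumed cutoff; tune against your own audio
) -> Optional[str]:
    """Return the registered name whose voiceprint best matches, or None.

    None means "new voice": tag it once by hand, then add it to the
    registry so later episodes resolve it automatically.
    """
    best_name: Optional[str] = None
    best_score = threshold
    for name, reference in registry.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy 3-dimensional "voiceprints"; real speaker embeddings are much longer.
registry = {"Lex Fridman": [0.9, 0.1, 0.0]}
print(assign_speaker([0.88, 0.12, 0.01], registry))  # Lex Fridman

new_voice = [0.0, 0.2, 0.95]
if assign_speaker(new_voice, registry) is None:
    registry["Guest (tag me)"] = new_voice  # named once, reused in later episodes
```

The registry is where the compounding happens: every voice you name once becomes a match candidate for every future video.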

Exporting to Notion, Obsidian, and Apple Notes

| Notes app | Best format | Speaker tags? | Timestamps as links? |
|---|---|---|---|
| Notion | Markdown or direct API | Yes (callouts) | Yes (web links) |
| Obsidian | Markdown with YAML | Yes (formatted) | Yes |
| Apple Notes | Rich text | Limited | Limited |
| Roam / Logseq | Markdown blocks | Yes | Yes |
| Google Docs | .docx | Yes | Yes (hyperlinks) |

Export formats by destination
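For Markdown destinations such as Obsidian, the note is plain Markdown with a YAML front matter block between `---` lines at the top, which Obsidian reads as note properties. A minimal sketch, assuming hypothetical property names (`title`, `source`, `speakers`) and a hypothetical `to_obsidian_markdown` helper:

```python
import json


def to_obsidian_markdown(title: str, url: str, speakers: list[str], claims: list[str]) -> str:
    """Build a Markdown note with YAML front matter.

    json.dumps emits double-quoted strings and [..] lists, both valid YAML,
    so titles containing colons or quotes do not break the front matter.
    """
    front_matter = [
        "---",
        f"title: {json.dumps(title, ensure_ascii=False)}",
        f"source: {json.dumps(url)}",
        f"speakers: {json.dumps(speakers, ensure_ascii=False)}",
        "---",
    ]
    body = [f"# {title}", "", "## Key claims"] + [f"- {claim}" for claim in claims]
    return "\n".join(front_matter + [""] + body) + "\n"


note = to_obsidian_markdown(
    "Intro to LLMs: a recap",
    "https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder video ID
    ["Host", "Guest"],
    ["Claim one.", "Claim two."],
)
print(note)
```

Reusing JSON quoting avoids hand-escaping YAML, which is the usual way generated front matter breaks.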

One export practice that pays off forever: include the original YouTube URL with the timestamp in every key claim. When future-you wants to verify a quote or re-listen to a 30-second segment, you click the link and you are at the exact moment. This single pattern is the difference between notes that age well and notes that become unverifiable folklore.
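A minimal sketch of building those links, assuming standard `watch?v=` or `youtu.be` share URLs: the `t` query parameter, given in seconds, makes YouTube start playback at that moment. The helper name is hypothetical. Other URL shapes (Shorts, embeds) raise an error here rather than being guessed at.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def timestamp_link(video_url: str, seconds: float) -> str:
    """Return a canonical watch URL that starts playback at `seconds`.

    Accepts youtube.com/watch?v=... and youtu.be/... links; extra query
    parameters such as playlists or share trackers are dropped.
    """
    parsed = urlparse(video_url)
    if parsed.netloc.lower().endswith("youtu.be"):
        video_id = parsed.path.lstrip("/").split("/")[0]
    else:
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    if not video_id:
        raise ValueError(f"no video id found in {video_url!r}")
    query = urlencode({"v": video_id, "t": f"{int(seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


print(timestamp_link("https://youtu.be/VIDEO_ID?si=share", 754))
# https://www.youtube.com/watch?v=VIDEO_ID&t=754s
```

Normalizing every link to one canonical form keeps claims from different share links comparable and easy to search later.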

Studying lectures vs. consuming explainers

Two different goals call for two different tunings of the same workflow. Studying a lecture means you are going to come back, you want full claim-level extraction, and you want OCR on slides. Consuming an explainer means you want the takeaways, you are not going back, and a tight 8-bullet summary is enough. Same tool, different settings.

  • Study mode: full transcript + outline + claims + slide OCR + linked timestamps.
  • Consume mode: outline + 8-15 bullet takeaways + a single "save for later" timestamp.
  • Reference mode (for the videos you cite in writing): full transcript with speaker tags, ready to drop a quote into a draft.
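The three modes are presets over the same set of passes. A minimal sketch of them as configuration, assuming a tool whose passes you can toggle individually; `NotesMode`, its field names, and `enabled_passes` are hypothetical, and each preset switches on only what the list above names for that mode.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class NotesMode:
    """Passes to run for one workflow mode. Field names are hypothetical."""
    full_transcript: bool = False
    speaker_tags: bool = False
    outline: bool = False
    claims: bool = False
    slide_ocr: bool = False
    linked_timestamps: bool = False
    takeaway_bullets: Optional[tuple[int, int]] = None  # (min, max) bullet count
    save_for_later_marks: int = 0  # timestamps flagged for a later revisit


# Each preset mirrors the mode list above; anything not listed stays off.
MODES = {
    "study": NotesMode(full_transcript=True, outline=True, claims=True,
                       slide_ocr=True, linked_timestamps=True),
    "consume": NotesMode(outline=True, takeaway_bullets=(8, 15),
                         save_for_later_marks=1),
    "reference": NotesMode(full_transcript=True, speaker_tags=True),
}


def enabled_passes(mode: str) -> list[str]:
    """Names of the on/off passes enabled for `mode`, in declaration order."""
    settings = MODES[mode]
    return [f.name for f in fields(settings) if getattr(settings, f.name) is True]


print(enabled_passes("study"))
```

Freezing the presets keeps one video's ad hoc tweak from silently changing the defaults for the next.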
