Transcription privacy deep dive — HIPAA, BAA, data residency, and the questions vendors avoid
The privacy question set most vendors avoid
For sensitive recordings — therapy sessions, legal depositions, internal HR conversations, medical consultations, attorney-client communications — "free" and "fast" stop being the priorities. The priorities become: where does the audio go, who has access, is it used to train models, when is it deleted, and what regulatory frameworks apply. These are the questions most consumer transcription marketing pages avoid; you have to read the privacy policy and the security/trust page to find out.
This deep-dive walks through each privacy question, what the answer should look like, and what red flags should disqualify a vendor for sensitive use. The goal is not to be paranoid — it is to make procurement decisions on the same axes a security or legal review would.
Where does the audio actually live?
The first question: which cloud provider, which region, and which servers hold the audio file at rest. The acceptable answers vary by jurisdiction:
- EU residents: data must stay in the EU (or move only under an approved transfer mechanism) per GDPR for some categories. AWS Frankfurt (eu-central-1), GCP europe-west regions, and Azure West Europe are acceptable.
- Healthcare (US): data should sit in HIPAA-eligible services under a signed BAA. AWS us-east-1, GCP us-central1, and Azure West US 2 are typical.
- Government / regulated: may require FedRAMP-certified or sovereign-cloud regions.
- Consumer / non-regulated: any region is fine, but disclosure should be specific.
A vendor that does not state where its data lives is a red flag — even for non-regulated use, geographic disclosure is a basic transparency expectation.
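When you control the storage side yourself, for example when calling a cloud speech-to-text API directly and keeping the audio in your own bucket, residency is a configuration choice you can enforce and verify rather than take on trust. A minimal sketch using boto3, with a placeholder bucket name and the Frankfurt region mentioned above (both assumptions, not recommendations):

```python
# Minimal sketch: pin audio storage to a single region so residency is
# enforced and verifiable, not assumed. Bucket name is a placeholder.
import boto3

REGION = "eu-central-1"       # AWS Frankfurt, one of the EU regions above
BUCKET = "example-audio-eu"   # hypothetical bucket name

s3 = boto3.client("s3", region_name=REGION)

# Buckets outside us-east-1 need an explicit location constraint.
s3.create_bucket(
    Bucket=BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)

# Confirm where the bucket actually lives before uploading anything sensitive.
location = s3.get_bucket_location(Bucket=BUCKET)["LocationConstraint"]
assert location == REGION, f"Bucket created in {location}, expected {REGION}"
```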
Who has access — humans, models, or both?
Audio uploaded to a transcription service can be accessed by: the engineering team for debugging, the support team for ticket resolution, the model team for training improvements, contracted human transcribers (for human-quality services like Rev), and the underlying ASR provider (if the SaaS wraps another API). A complete answer names each path or commits to none.
Acceptable
- "Engineers may access for debugging only with user consent"
- "Audio is not used to train models" (clear)
- "Human transcribers see audio for human-tier orders only"
- "Subprocessor list available on request"
Red flags
- Silence on training
- "We may use data to improve our service" (ambiguous)
- No subprocessor disclosure
- No way to opt out of training
Training on user audio — the explicit yes/no
Modern transcription quality depends on model improvements, and model improvements depend on training data. Some vendors use customer audio (with or without consent) to improve their models. This is the single most important question for sensitive use:
- Otter: opt-in for paid plans; opt-out available; verify in account settings.
- TigerScribe: explicit "we do not train on your audio" commitment.
- OpenAI Whisper API: per OpenAI policy, API data is not used for training by default (a minimal call sketch follows this list).
- Google Cloud Speech-to-Text: per GCP terms, customer data not used for training.
- Free / freemium consumer tools: varies; read the privacy policy carefully.
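The distinction between API traffic and consumer apps matters here: the no-training defaults above apply to the API route, and a SaaS tool wrapping the same model may have a different policy. A minimal sketch of the direct API route, assuming the official openai Python client and a placeholder file name; verify the vendor's current data-usage terms before sending anything sensitive:

```python
# Minimal sketch: transcribe via the API route, where (per OpenAI's stated
# policy) submitted data is not used for training by default. The file name
# is a placeholder; re-check the current policy before uploading sensitive audio.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("session.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```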
Retention — how long does the audio live?
Even if the vendor does not train on data, retention matters for breach exposure. A 30-day auto-delete window is dramatically better than indefinite retention; a 7-day window is better still. Acceptable retention disclosures (a sketch of enforcing an auto-delete window follows the list):
- "Audio auto-deletes after N days" — explicit number.
- "Audio retained for the duration of the user account" — acceptable if delete-on-account-close is honoured.
- "Audio retained per user-configured retention setting" — best.
- "Audio retained until manually deleted by user" — acceptable but weaker.
- Silence on retention — disqualifying for sensitive use.
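An explicit auto-delete window is also cheap to implement, which is why silence on retention is hard to excuse. If you store the audio yourself, a single lifecycle rule does it; a minimal sketch with boto3, reusing the placeholder bucket from the residency example:

```python
# Minimal sketch: enforce a 30-day auto-delete window on stored audio with
# an S3 lifecycle rule. Bucket name and key prefix are placeholders.
import boto3

s3 = boto3.client("s3", region_name="eu-central-1")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-audio-eu",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-audio-30-days",
                "Filter": {"Prefix": "audio/"},   # only objects under audio/
                "Status": "Enabled",
                "Expiration": {"Days": 30},       # delete 30 days after upload
            }
        ]
    },
)
```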
HIPAA and BAAs — the healthcare path
For healthcare use in the US — therapy session notes, medical consultations, anything covered by HIPAA — the vendor must sign a Business Associate Agreement (BAA). Without a signed BAA, uploading PHI is a HIPAA violation regardless of how good the encryption is.
- Vendor offers BAA on request (or by default for enterprise plans).
- BAA covers encryption at rest and in transit, breach notification, data destruction.
- Vendor passes HIPAA security audit (SOC 2 Type II is adjacent but not equivalent).
- Subprocessors (cloud providers, ASR APIs) are also BAA-covered.
For "HIPAA-compliant transcription," the major vendors that offer BAAs include: Otter (Business plan), Sonix (Enterprise), Rev (Healthcare plan), AWS Transcribe (with BAA), Google Cloud STT (with BAA). Free / consumer tools typically do not offer BAAs — do not upload PHI to consumer free tiers.
GDPR and the EU data path
For EU-resident users or recordings of EU data subjects, GDPR applies. Key requirements: (1) lawful basis for processing (usually user consent for transcription), (2) data residency within EU or under an approved transfer mechanism, (3) right to deletion (user can demand removal), (4) breach notification within 72 hours, (5) DPA (Data Processing Agreement) with the vendor.
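Of these, the right to deletion is the easiest to promise and the hardest to honour without plumbing built for it. A minimal sketch of the storage-side half, assuming boto3, a per-user key prefix, and the placeholder bucket from earlier; real deletion also has to reach backups, logs, and any subprocessor copies:

```python
# Minimal sketch: honour a deletion request by removing every audio object
# stored under a per-user prefix. Bucket and key layout are placeholders;
# backups, logs, and subprocessor copies need their own deletion paths.
import boto3

def delete_user_audio(bucket: str, user_id: str) -> int:
    """Delete all stored audio for one user; return the number of objects removed."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=f"audio/{user_id}/"):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted

# Usage (hypothetical identifiers): delete_user_audio("example-audio-eu", "user-1234")
```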