Is your transcription tool training on your audio? A 2026 privacy audit
BIPA lawsuits, training-data class actions, and quietly retained voiceprints have made transcription privacy a serious procurement question. Here is the audit checklist.
Why this matters now
Two ongoing class actions reframed transcription privacy in 2024-2025. The first alleged that Otter used customer meeting transcripts to train AI models without explicit, informed consent. The second, against Fireflies, alleged voiceprint collection without consent, in violation of Illinois's Biometric Information Privacy Act. Both are still working through the courts. Both have already changed how serious procurement reviews approach this category.
The shift is mostly about default postures. Five years ago, "we use your data to improve the product" was uncontroversial language. In 2026 it reads as a red flag, and the burden of proof has shifted: vendors now have to demonstrate that they do not train on customer data. If your tool's privacy policy is vague, your procurement team will assume the worst and recommend a different tool.
By the numbers
- 2 major class actions: Otter and Fireflies, 2024-2025
- 12 states with biometric privacy laws: IL, TX, WA, and 9 more
- ~$5,000 maximum BIPA damages per intentional violation
The five questions a clean privacy posture answers
1. Is customer audio used to train any model, whether vendor-internal or partner-shared?
2. Are voiceprints collected, and if so, with what consent and retention?
3. What is the default audio retention period, and is it documented?
4. Is a BAA available for HIPAA-covered customers, with named subprocessors?
5. What is the deletion guarantee, and how is deletion verified?
A clean posture answers all five in plain language, in a single document, without pointing you to a maze of references. A weak posture buries the answers in nested links, replaces "no" with "we may," and treats deletion as best-effort rather than guaranteed. The difference is usually visible within ten minutes of reading the privacy page.
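If you are running this audit across more than one vendor, it helps to capture the answers as structured data rather than scattered notes. Below is a minimal sketch in Python; the class, field names, and red-flag rules are illustrative, not a standard, and `None` deliberately models "the documentation does not say":

```python
from dataclasses import dataclass

@dataclass
class PrivacyAudit:
    """One vendor's answers to the five audit questions.

    None means the vendor's documentation does not answer the
    question at all -- which is itself a finding.
    """
    vendor: str
    trains_on_customer_audio: bool | None = None   # Q1
    collects_voiceprints: bool | None = None       # Q2
    retention_days: int | None = None              # Q3: a number, not "as long as necessary"
    baa_available: bool | None = None              # Q4
    deletion_verified: bool | None = None          # Q5: covers backups, and is auditable

    def red_flags(self) -> list[str]:
        flags = []
        if self.trains_on_customer_audio is not False:
            flags.append("no explicit 'we do not train' statement")
        if self.collects_voiceprints and not self.deletion_verified:
            flags.append("voiceprints collected without verified deletion")
        if self.retention_days is None:
            flags.append("no documented retention period")
        if self.baa_available is not True:
            flags.append("no BAA for HIPAA-covered work")
        return flags

audit = PrivacyAudit(vendor="ExampleCo", trains_on_customer_audio=False,
                     collects_voiceprints=True, retention_days=30,
                     baa_available=True, deletion_verified=True)
print(audit.red_flags())  # [] for a clean posture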
Default retention — the silent leak
Retention is the privacy decision most users never see. Tools that retain audio "until the user deletes it" set up an inevitable accumulation: by year three you have hundreds of recordings sitting in a vendor's storage, most of which you have forgotten exist, all of which sit inside the blast radius of any breach. The default needs to be short and explicit.
| | Indefinite by default | Short and explicit |
|---|---|---|
| Typical of | Most consumer transcription products | Privacy-first tools, BAA-focused vendors |
| Audio retention | Kept until manual deletion | 7-30 days, then auto-deleted |
| Voiceprints | Often not user-deletable | User-deletable, retention published |
| Breach exposure | Cumulative, grows over time | Bounded and predictable |
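For teams that run their own pipeline rather than buying one, "short and explicit" is enforceable in storage configuration, not just in policy text. A sketch assuming audio lands in an S3 bucket; the bucket name and `audio/` prefix are placeholders for your own layout:

```python
import boto3

s3 = boto3.client("s3")

# Auto-delete raw audio 30 days after upload, so the retention
# policy is enforced by the bucket rather than by remembering.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-transcription-audio",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-raw-audio-30d",
                "Filter": {"Prefix": "audio/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
                # Also expire abandoned multipart uploads so partial
                # recordings do not linger past the policy window.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```

The same question applies when you are the customer: ask whether the vendor's stated retention is enforced by a rule like this or by a manual process.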
Voiceprints and biometric law
A voiceprint is biometric data. In Illinois, Texas, Washington, and a growing number of other states, collecting biometric data without informed consent is a statutory violation with damages calculated per occurrence. The Fireflies case is the prominent example, but it is not the only one in flight.
If your tool collects voiceprints, the right consent flow is: explicit opt-in by each enrolled speaker, an explanation of what the voiceprint is used for, a clear retention period, and a one-click delete that actually deletes (and is auditable). Tools that collect silently, or that bury consent in a generic terms-of-service click, are accumulating legal risk that will eventually surface.
“Voiceprints should be opt-in, scoped, and forgettable. If your tool cannot deliver on all three, your tool is not safe for regulated work.”
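If you build or evaluate an enrollment flow, "opt-in, scoped, and forgettable" maps onto a consent record with a named speaker, a stated purpose, and a hard expiry. A sketch with illustrative field names, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class VoiceprintConsent:
    """Consent record for one enrolled speaker.

    Captures the three properties named above: explicit opt-in
    (who consented and when), a scoped purpose, and a retention
    deadline after which the voiceprint must be deleted.
    """
    speaker_id: str
    purpose: str                 # e.g. "speaker labeling in team meetings"
    consented_at: datetime
    retention_days: int

    def expires_at(self) -> datetime:
        return self.consented_at + timedelta(days=self.retention_days)

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at()

consent = VoiceprintConsent(
    speaker_id="spk_042",
    purpose="speaker labeling in team meetings",
    consented_at=datetime(2026, 1, 15, tzinfo=timezone.utc),
    retention_days=365,
)
print(consent.is_expired(datetime.now(timezone.utc)))
```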
HIPAA, GDPR, and where most tools fall short
HIPAA-covered work — therapy, clinical research, healthcare operations — requires a Business Associate Agreement (BAA) with every vendor that touches PHI. GDPR adds data-residency, right-to-deletion, and lawful-basis requirements that most US-only consumer tools do not meet. The set of tools that satisfies both is narrow, and marketing copy for the rest tends to gloss over which protections actually apply.
| Vendor type | BAA available? | GDPR DPA? | EU data residency? |
|---|---|---|---|
| Consumer transcription apps | Rare | Sometimes | No |
| Enterprise transcription suites | Yes (Team/Enterprise) | Yes | Sometimes |
| Privacy-first vendors | Yes (often Pro+) | Yes | Yes |
| Local-only / on-device tools | N/A — never leaves device | N/A | Native |
| Generalist API platforms | Enterprise contract | Yes | Configurable |
The clauses to read in your tool's terms today
- Section on "use of customer data" — look for explicit "we do not train models" language.
- Subprocessor list — every party with access to your audio should be named.
- Retention schedule — should be a number of days, not "as long as necessary."
- Deletion guarantees — verify they include backups, not just primary storage.
- Voiceprint policy — opt-in, named retention, one-click delete.
- Incident notification — within how many hours of discovery?
- Jurisdiction and data residency — where the audio actually sits.
A good rule: if any of those clauses are missing, vague, or buried, downgrade your trust accordingly. Privacy posture is one of those things vendors only get serious about when customers ask. Be the customer who asks.
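Being the customer who asks starts with that ten-minute read, and even the read can be triaged mechanically. A crude first-pass scan for the phrases that separate clean postures from weak ones; the phrase lists are illustrative, not exhaustive:

```python
import re

# Illustrative phrase lists; extend with what your reviews keep finding.
RED_FLAGS = [
    r"as long as necessary",
    r"we may use .{0,40}(improve|train)",
    r"best[- ]effort",
]
GREEN_FLAGS = [
    r"we do not (use|train)",
    r"deleted? (within|after) \d+ days",
    r"business associate agreement",
]

def triage(policy_text: str) -> dict[str, list[str]]:
    """Return the red- and green-flag phrases found in a policy."""
    text = policy_text.lower()
    return {
        "red": [p for p in RED_FLAGS if re.search(p, text)],
        "green": [p for p in GREEN_FLAGS if re.search(p, text)],
    }

sample = "We retain recordings as long as necessary to provide the service."
print(triage(sample))  # {'red': ['as long as necessary'], 'green': []}
```

A scan like this does not replace reading the policy; it tells you which policies deserve the careful read first.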