Is your transcription tool training on your audio? A 2026 privacy audit
BIPA lawsuits, training-data class actions, and quietly retained voiceprints have made transcription privacy a serious procurement question. Here is the audit checklist.
Why this matters now
Two ongoing class actions reframed transcription privacy in 2024-2025. The first alleged that Otter used customer meeting transcripts to train AI models without explicit, informed consent. The second, against Fireflies, alleged voiceprint collection without consent, in violation of Illinois's Biometric Information Privacy Act. Both are still working through the courts. Both have already changed how serious procurement reviews approach this category.
The shift is mostly about default postures. Five years ago, "we use your data to improve the product" was uncontroversial language. In 2026 it reads as a red flag, and the burden of proof has shifted: vendors now have to demonstrate that they do not train on customer data. If your tool's privacy policy is vague, your procurement team will assume the worst and recommend a different tool.
By the numbers
- 2 major class actions: Otter and Fireflies, 2024-2025
- 12 states with biometric privacy laws: IL, TX, WA, and 9 more
- ~$5,000 maximum BIPA damages per intentional violation
The five questions a clean privacy posture answers
1. Is customer audio used to train any model, whether vendor-internal or partner-shared?
2. Are voiceprints collected, and if so, with what consent and retention?
3. What is the default audio retention period, and is it documented?
4. Is a BAA available for HIPAA-covered customers, with named subprocessors?
5. What is the deletion guarantee, and how is deletion verified?
A clean posture answers all five in plain language, in a single document, without pointing you to a maze of references. A weak posture buries the answers in nested links, replaces "no" with "we may," and treats deletion as best-effort rather than guaranteed. The difference is usually visible within ten minutes of reading the privacy page.
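If you are running this audit across more than one vendor, it helps to capture the answers as structured data rather than scattered notes. Below is a minimal sketch in Python; the class, field names, and red-flag rules are illustrative, not a standard, and `None` deliberately models "the documentation does not say":

```python
from dataclasses import dataclass

@dataclass
class PrivacyAudit:
    """One vendor's answers to the five audit questions.

    None means the vendor's documentation does not answer the
    question at all -- which is itself a finding.
    """
    vendor: str
    trains_on_customer_audio: bool | None = None   # Q1
    collects_voiceprints: bool | None = None       # Q2
    retention_days: int | None = None              # Q3: a number, not "as long as necessary"
    baa_available: bool | None = None              # Q4
    deletion_verified: bool | None = None          # Q5: covers backups, and is auditable

    def red_flags(self) -> list[str]:
        flags = []
        if self.trains_on_customer_audio is not False:
            flags.append("no explicit 'we do not train' statement")
        if self.collects_voiceprints and not self.deletion_verified:
            flags.append("voiceprints collected without verified deletion")
        if self.retention_days is None:
            flags.append("no documented retention period")
        if self.baa_available is not True:
            flags.append("no BAA for HIPAA-covered work")
        return flags

audit = PrivacyAudit(vendor="ExampleCo", trains_on_customer_audio=False,
                     collects_voiceprints=True, retention_days=30,
                     baa_available=True, deletion_verified=True)
print(audit.red_flags())  # [] for a clean posture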
Default retention — the silent leak
Retention is the privacy decision most users never see. Tools that retain audio "until the user deletes it" set up an inevitable accumulation: by year three you have hundreds of recordings sitting in a vendor's storage, most of which you have forgotten exist, all of which sit inside the blast radius of any breach. The default needs to be short and explicit.
| | Indefinite by default | Short and explicit |
|---|---|---|
| Typical of | Most consumer transcription products | Privacy-first tools, BAA-focused vendors |
| Audio retention | Kept until manual deletion | 7-30 days, then auto-deleted |
| Voiceprints | Often not user-deletable | User-deletable, retention published |
| Breach exposure | Cumulative, grows over time | Bounded and predictable |
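For teams that run their own pipeline rather than buying one, "short and explicit" is enforceable in storage configuration, not just in policy text. A sketch assuming audio lands in an S3 bucket; the bucket name and `audio/` prefix are placeholders for your own layout:

```python
import boto3

s3 = boto3.client("s3")

# Auto-delete raw audio 30 days after upload, so the retention
# policy is enforced by the bucket rather than by remembering.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-transcription-audio",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-raw-audio-30d",
                "Filter": {"Prefix": "audio/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
                # Also expire abandoned multipart uploads so partial
                # recordings do not linger past the policy window.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```

The same question applies when you are the customer: ask whether the vendor's stated retention is enforced by a rule like this or by a manual process.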
Voiceprints and biometric law
A voiceprint is biometric data. In Illinois, Texas, Washington, and a growing number of other states, collecting biometric data without informed consent is a statutory violation with damages calculated per occurrence. The Fireflies case is the prominent example, but it is not the only one in flight.
If your tool collects voiceprints, the right consent flow is: explicit opt-in by each enrolled speaker, an explanation of what the voiceprint is used for, a clear retention period, and a one-click delete that actually deletes (and is auditable). Tools that collect silently, or that bury consent in a generic terms-of-service click, are accumulating legal risk that will eventually surface.
“Voiceprints should be opt-in, scoped, and forgettable. If your tool cannot deliver on all three, your tool is not safe for regulated work.”
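If you build or evaluate an enrollment flow, "opt-in, scoped, and forgettable" maps onto a consent record with a named speaker, a stated purpose, and a hard expiry. A sketch with illustrative field names, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class VoiceprintConsent:
    """Consent record for one enrolled speaker.

    Captures the three properties named above: explicit opt-in
    (who consented and when), a scoped purpose, and a retention
    deadline after which the voiceprint must be deleted.
    """
    speaker_id: str
    purpose: str                 # e.g. "speaker labeling in team meetings"
    consented_at: datetime
    retention_days: int

    def expires_at(self) -> datetime:
        return self.consented_at + timedelta(days=self.retention_days)

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at()

consent = VoiceprintConsent(
    speaker_id="spk_042",
    purpose="speaker labeling in team meetings",
    consented_at=datetime(2026, 1, 15, tzinfo=timezone.utc),
    retention_days=365,
)
print(consent.is_expired(datetime.now(timezone.utc)))
```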
HIPAA, GDPR, and where most tools fall short
HIPAA-covered work — therapy, clinical research, healthcare operations — requires a Business Associate Agreement (BAA) with every vendor that touches PHI. GDPR adds data-residency, right-to-deletion, and lawful-basis requirements that most US-only consumer tools do not meet. The set of tools that satisfies both is narrow, and marketing copy for the rest tends to gloss over which protections actually apply.
| Vendor type | BAA available? | GDPR DPA? | EU data residency? |
|---|---|---|---|
| Consumer transcription apps | Rare | Sometimes | No |
| Enterprise transcription suites | Yes (Team/Enterprise) | Yes | Sometimes |
| Privacy-first vendors | Yes (often Pro+) | Yes | Yes |
| Local-only / on-device tools | N/A — never leaves device | N/A | Native |
| Generalist API platforms | Enterprise contract | Yes | Configurable |
The clauses to read in your tool's terms today
- Section on "use of customer data" — look for explicit "we do not train models" language.
- Subprocessor list — every party with access to your audio should be named.
- Retention schedule — should be a number of days, not "as long as necessary."
- Deletion guarantees — verify they include backups, not just primary storage.
- Voiceprint policy — opt-in, named retention, one-click delete.
- Incident notification — within how many hours of discovery?
- Jurisdiction and data residency — where the audio actually sits.
A good rule: if any of those clauses are missing, vague, or buried, downgrade your trust accordingly. Privacy posture is one of those things vendors only get serious about when customers ask. Be the customer who asks.
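Being the customer who asks starts with that ten-minute read, and even the read can be triaged mechanically. A crude first-pass scan for the phrases that separate clean postures from weak ones; the phrase lists are illustrative, not exhaustive:

```python
import re

# Illustrative phrase lists; extend with what your reviews keep finding.
RED_FLAGS = [
    r"as long as necessary",
    r"we may use .{0,40}(improve|train)",
    r"best[- ]effort",
]
GREEN_FLAGS = [
    r"we do not (use|train)",
    r"deleted? (within|after) \d+ days",
    r"business associate agreement",
]

def triage(policy_text: str) -> dict[str, list[str]]:
    """Return the red- and green-flag phrases found in a policy."""
    text = policy_text.lower()
    return {
        "red": [p for p in RED_FLAGS if re.search(p, text)],
        "green": [p for p in GREEN_FLAGS if re.search(p, text)],
    }

sample = "We retain recordings as long as necessary to provide the service."
print(triage(sample))  # {'red': ['as long as necessary'], 'green': []}
```

A scan like this does not replace reading the policy; it tells you which policies deserve the careful read first.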