Every day, millions of people paste emails, financial records, medical summaries, and proprietary source code into AI chatbot windows — without pausing to ask where that data goes. The convenience of AI assistants has outpaced most users' awareness of the privacy trade-offs involved. This guide sets the record straight on the actual risks in 2026.

The Numbers Are Worse Than You Think

Awareness of AI privacy risks has grown, but behaviour hasn't kept pace. Independent analysis of enterprise AI usage data published in late 2024 found that 27.4% of the data employees paste into AI tools is sensitive — including regulated PII, internal strategy documents, authentication credentials, and proprietary source code. In other words, more than a quarter of what employees paste shouldn't leave the organisation, let alone travel to a third-party server.

  • 27.4% of corporate data sent to AI tools is sensitive (2024 enterprise analysis)
  • 29M+ secrets leaked on GitHub in 2025, many via AI-assisted commits (GitGuardian)
  • 3 in 5 knowledge workers report pasting confidential information into AI tools at least monthly

The credential leak problem has a specific AI dimension. GitGuardian's 2025 State of Secrets Sprawl report documented over 29 million secrets leaked on GitHub — API keys, OAuth tokens, database passwords — and flagged that AI-assisted commits leak secrets at approximately twice the rate of non-AI commits. Developers are using AI to write boilerplate faster and, in the process, shipping credentials that were included in their prompts.

Corporate Bans: What Large Organisations Already Know

The risks are not theoretical. Some of the world's most security-conscious organisations have acted decisively.

Samsung's Source Code Leak

In early 2023, Samsung engineers pasted proprietary semiconductor source code and internal meeting notes into ChatGPT to ask for debugging help and meeting summaries. The data was transmitted to OpenAI's servers. Once uploaded, Samsung had no mechanism to retrieve or delete it. Samsung subsequently banned all use of generative AI tools for corporate work. The incident became the defining early example of AI data leakage in an enterprise context.

Apple's Restricted Use Policy

Apple restricted employee use of ChatGPT and GitHub Copilot over concerns that proprietary design and engineering information could be inadvertently shared with AI providers. Apple is developing its own internal AI tools partly to sidestep this risk.

JPMorgan's Compliance Position

JPMorgan Chase restricted ChatGPT access for employees in early 2023, citing regulatory compliance concerns. In the financial sector, sharing client data with third-party systems without explicit data processing agreements can violate GDPR, CCPA, and financial services regulations. The bank's position reflects a broader truth: sending client PII to an AI provider can be a regulatory violation, not just a privacy risk.

The Urban VPN Cautionary Tale

December 2025: Urban VPN — a service explicitly marketed as a privacy tool — was exposed for harvesting users' browsing history and selling it to third-party data brokers. The lesson: a "privacy" label is not a guarantee. Tools that process your data in the cloud are only as trustworthy as their internal policies, which you cannot independently verify.

The Urban VPN scandal crystallised a principle that applies equally to AI tools: cloud-dependent privacy promises are not auditable by users. When a vendor says "we don't sell your data," you're taking their word for it. When your data never leaves your device — as with local-only tools — no such trust is required.

AI chatbot providers are not villains, but their business models create structural incentives that don't always align with user privacy. Training better models requires more data. Retaining conversation history improves the user experience but creates a large stored dataset. Even providers with strong stated policies operate under jurisdictions where government data requests may compel disclosure.

What Types of Data Are Most at Risk

Personally Identifiable Information (PII)

Names, addresses, Social Security numbers, dates of birth, and passport numbers are common in AI prompts — often pasted as part of a document the user wants summarised or a form they want help completing. Once submitted, this data may be logged, stored, and potentially used for model training.

Authentication Credentials and API Keys

Developers routinely ask AI assistants to debug code, and it's easy to forget to remove API keys, database connection strings, or OAuth tokens before pasting a code snippet. A single leaked key can provide attackers with access to cloud infrastructure, payment systems, or user databases. AWS, GitHub, Stripe, and OpenAI API keys are among the most commonly leaked credential types.
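As a rough illustration of what pre-paste sanitisation involves, here is a minimal Python sketch that scans a snippet for a few well-known key formats and masks anything it finds. The patterns and the redact_secrets helper are illustrative assumptions, not a description of any particular product; real secret scanners use far larger rule sets.

```python
import re

# Illustrative patterns for a few well-known credential formats (assumed, not exhaustive).
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "Stripe live key": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b"),
    "generic sk- API key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
}

def redact_secrets(snippet: str) -> tuple[str, list[str]]:
    """Mask anything that looks like a credential and report what was found."""
    findings = []
    for label, pattern in SECRET_PATTERNS.items():
        if pattern.search(snippet):
            findings.append(label)
            snippet = pattern.sub(f"<REDACTED {label.upper()}>", snippet)
    return snippet, findings

clean, found = redact_secrets(
    'client = boto3.client("s3", aws_access_key_id="AKIAIOSFODNN7EXAMPLE")'
)
print(found)  # ['AWS access key']
print(clean)  # the key is replaced with a placeholder before the snippet is pasted
```

Running a check like this locally, before a snippet ever reaches a chat window, is the cheapest way to make the sanitise-first habit automatic.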

Medical and Financial Records

People ask AI tools to summarise medical reports, explain insurance claims, and help draft appeals to insurers. These documents often contain diagnoses, prescription histories, and policy numbers — data that is specifically regulated under HIPAA (US), GDPR (EU), and equivalent frameworks elsewhere. For healthcare organisations and their business associates, sharing this data with an AI provider without a signed Business Associate Agreement is a HIPAA violation.

Proprietary Business Information

Strategy decks, client lists, M&A discussions, and internal financial projections are frequently pasted into AI tools for summarisation, formatting, or analysis. This creates significant legal exposure — both in terms of trade secret law and in terms of confidentiality obligations to clients and investors.

The Provider Landscape in 2026

All major AI providers have made commitments about data handling, but the details matter.

  • ChatGPT (OpenAI): Free and Plus accounts: conversations may be used for model training unless opted out. Enterprise and Team plans: data is not used for training by default.
  • Claude (Anthropic): Free tier conversations may be reviewed by Anthropic. API customers and Claude for Work users have stronger isolation guarantees.
  • Gemini (Google): Consumer accounts: conversations processed by Google and potentially used to improve services. Workspace accounts have different data handling under their enterprise agreement.
  • DeepSeek: Based in China; conversations are processed on servers subject to Chinese data laws, which include broad government access provisions.

Key point: "Not using data for training" does not mean "not storing data." Even providers that opt you out of training still typically log conversations for safety monitoring, abuse prevention, and debugging. Storage creates risk even when training opt-outs are honoured.

Practical Implications

The safest rule is simple: treat every AI chatbot conversation as if it will be stored indefinitely and potentially read by the provider's staff. That framing leads to clear behaviour:

  • Never paste documents containing real SSNs, credit card numbers, or passport details.
  • Sanitise code before asking for debugging help — remove all real API keys and connection strings.
  • Avoid naming real clients or individuals in prompts.
  • Use enterprise-grade accounts with explicit data handling agreements for any regulated data.
  • Consider a local-first tool that intercepts and anonymises sensitive data before it reaches the provider (a minimal sketch of that interception step follows this list).
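To make the last two points concrete, here is a minimal sketch of the kind of local interception step such a tool performs, assuming a simple placeholder-substitution approach. The patterns, placeholders, and the send_to_assistant stub are hypothetical illustrations, not any vendor's actual implementation.

```python
import re

# Hypothetical placeholder rules; a production tool would also handle names,
# addresses, policy numbers, and credentials, not just these three patterns.
PII_RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),
    (re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"), "<CARD_NUMBER>"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "<EMAIL>"),
]

def anonymise(prompt: str) -> str:
    """Substitute placeholders for obvious PII before the prompt leaves the machine."""
    for pattern, placeholder in PII_RULES:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

def send_to_assistant(prompt: str) -> None:
    """Hypothetical stub standing in for whatever chat window or API call you actually use."""
    print("would send:", anonymise(prompt))

send_to_assistant(
    "Summarise this claim for John Doe, SSN 123-45-6789, card 4242 4242 4242 4242, "
    "contact j.doe@example.com."
)
# would send: Summarise this claim for John Doe, SSN <SSN>, card <CARD_NUMBER>, contact <EMAIL>.
```

Note that "John Doe" survives intact: regex rules alone miss free-text identifiers, which is why a dedicated local detection layer is worth considering for regulated data.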

Protect your prompts automatically

PromptGnome detects PII, API keys, and credentials in your messages before they leave your browser — entirely locally, with zero data collection.

Get PromptGnome Free

Frequently Asked Questions

Is ChatGPT safe for personal data?

ChatGPT stores your conversations by default and may use them to improve its models unless you opt out. Sensitive personal data — Social Security numbers, financial records, medical details — should never be included in prompts without anonymisation.

What percentage of corporate data sent to AI tools is sensitive?

According to a 2024 analysis, 27.4% of data employees paste into AI tools is sensitive — including source code, regulated PII, financial records, and confidential strategy documents.

Why did Samsung ban ChatGPT?

Samsung banned ChatGPT after employees accidentally leaked proprietary semiconductor source code and internal meeting notes by pasting them into ChatGPT prompts. The data was transmitted to OpenAI's servers and could not be recalled.

What happened in the Urban VPN scandal?

In December 2025, Urban VPN was found to be harvesting users' browsing data and selling it to third-party data brokers — despite being marketed as a privacy tool. The case underscores why local-only, verifiable tools are safer than cloud-dependent alternatives.

Can AI providers use my prompts to train their models?

It depends on the provider and your account tier. Free consumer accounts at OpenAI, Anthropic, and Google are the most exposed. Enterprise plans typically exclude data from training, but still involve server-side storage and processing.