Prompt Data Is the New Shadow Data Layer

Written by Alex Vakulov | Thu | Jul 2, 2026 | 1:43 PM Z

The DLP alert your proxy catches is usually a clear outbound event: a file uploaded to an unsanctioned app or a spreadsheet emailed outside the company. What it may miss is the paragraph of legal language an associate pasted into an AI tool to clean up the wording. That is a data transfer too. It just does not look like the kind of transfer most controls were built to catch.

Prompt data has become a shadow data channel operating within sanctioned workflows, on corporate devices, and often via approved network paths, which is exactly why traditional DLP and CASB rules may miss it.

A 2025 LayerX Security report found that approximately 18% of users paste data into GenAI tools, and about half of that pasted content is company information. For many security teams, most of this activity remains outside practical prompt-level visibility. In practice, it only takes a few careless or untrained users to create a serious data exposure problem for the entire company.

Prompt data should therefore be treated as a governed data channel, rather than left as a blind spot within otherwise approved workflows.

Start with a tier map

Before classification frameworks or DLP rules, the security team needs an accurate picture of which AI tools are actually in use and which data-handling regime each employee operates under. That map has three established tiers and an emerging fourth, and conflating them produces policies that either miss real risk or block legitimate work.

Approved enterprise AI

Approved enterprise AI is a tool with a signed data processing agreement, contractual guarantees against training on customer inputs, SOC 2 Type II coverage, and administrative controls that the organization can actually configure. ChatGPT Enterprise with zero data retention enabled, Microsoft Copilot bound to an M365 tenant, and Google Workspace AI under an enterprise agreement all qualify. Data entered into these tools stays under the organization's contractual control. This does not mean every use is automatically safe. It means the company has a place to configure controls, assign ownership, and define which data can be used.

Unmanaged SaaS AI

Unmanaged SaaS AI includes tools that employees use before security, legal, or IT has reviewed them. This may include niche coding tools, browser research tools, design tools, note-taking tools, and any existing SaaS platform that quietly adds AI features after purchase.

This is where visibility breaks down. A tool may look harmless, but still allow file uploads, prompt history, third-party processing, or access to workspace data. The risk is not limited to what employees type into the prompt box. Many AI tools are still applications, and they can collect data through app permissions, integrations, uploaded files, browser access, connected workspaces, and usage telemetry. Security teams should review unmanaged AI tools as software with data access, not only as chat interfaces.

Personal AI accounts

This is the employee using a personal AI subscription on a corporate device or using a free AI tool with a personal email address. The employer has no contractual relationship with the vendor governing that account, no visibility into conversation history, and no ability to enforce data retention settings. The underlying tool may be identical to the enterprise version, but the data handling is completely different.

Locally hosted AI

A fourth tier is emerging: locally hosted AI tools running on an employee’s machine or other on-premises hardware. These reduce the vendor data-handling problem because prompts may stay inside the local environment. They introduce a different set of considerations: model storage on the device SSD, endpoint performance, access control, and what happens to conversation data when a device is reassigned or decommissioned.

Detecting which tier employees are actually in requires proxy or CASB visibility with session-level context. It is not enough to know that traffic is going to openai.com; the team needs to know whether the session is authenticated against a corporate workspace or a personal account.
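As a rough sketch of that session-level check, the snippet below sorts one observed session into a tier using two fields: the destination host and the signed-in account. Everything named here is an assumption made for illustration: the `classify_session` function, the `.example` hosts, and the `corp.example` email domain. How the account identity is obtained depends on what the proxy, CASB, or browser platform actually exposes.

```python
from typing import Optional

# Hypothetical values for illustration; replace with the organization's inventory.
CORPORATE_EMAIL_DOMAIN = "corp.example"
APPROVED_AI_HOSTS = {"ai.approved-vendor.example"}
KNOWN_AI_HOSTS = APPROVED_AI_HOSTS | {"notes-ai.example.net"}


def classify_session(host: str, account_email: Optional[str]) -> str:
    """Return a tier label for one observed session."""
    if host.lower() not in KNOWN_AI_HOSTS:
        return "not_ai"
    if account_email is None:
        # Traffic alone cannot tell a corporate login from a personal one.
        return "unknown_account"
    is_corporate = account_email.lower().endswith("@" + CORPORATE_EMAIL_DOMAIN)
    if not is_corporate:
        return "personal_account"
    if host.lower() in APPROVED_AI_HOSTS:
        return "approved_enterprise"
    return "unmanaged_saas"
```

The `unknown_account` result is deliberate: when the control can see the domain but not the login, the honest answer is that the tier is not yet known.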

Classify the data, not only the tool

A prompt governance model should classify the content employees enter into AI systems. Instructions like “do not share confidential data” are too vague for real work, where everything can feel confidential and nothing feels clearly classified. The policy needs to name the data types and, where possible, show concrete examples of risky use. At the same time, the model should be simple enough for employees to understand and precise enough for DLP, proxy rules, vendor review, and incident response.

Credentials and secrets have no legitimate reason to appear in any external AI tool. This includes API keys, OAuth tokens, session cookies, or private keys. A developer debugging a build failure does not need to paste the .env file. They need to paste the error. Replacing secrets with placeholders before asking for help is the prompt hygiene practice with the highest ROI.
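A minimal sketch of that placeholder habit is shown below. It assumes three common shapes of secret: `KEY=value` lines like those in a .env file, bearer tokens, and PEM private key blocks. The patterns and the `redact_secrets` name are illustrative, not a complete detector; real secret formats vary by vendor, and the first pattern intentionally errs toward over-redacting.

```python
import re

# Illustrative patterns only; each pair is (compiled regex, replacement).
SECRET_PATTERNS = [
    # KEY=value lines whose key name suggests a secret, e.g. in a .env file.
    (
        re.compile(
            r"(?im)^([ \t]*\w*(?:KEY|TOKEN|SECRET|PASSWORD)\w*[ \t]*=[ \t]*).+$"
        ),
        r"\1<REDACTED>",
    ),
    # Bearer tokens in headers or logs.
    (re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/=-]{8,}"), r"\1<REDACTED>"),
    # PEM-encoded private key blocks.
    (
        re.compile(
            r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
            re.S,
        ),
        "<REDACTED PRIVATE KEY>",
    ),
]


def redact_secrets(text: str) -> str:
    """Replace likely secrets with placeholders, keeping the rest of the text."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Run over a pasted build log, this keeps the error lines a developer actually needs help with and replaces only the values that look like credentials.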

Source code carries different risks depending on what it reveals. A small generic function is different from a proprietary fraud model, a trading algorithm, or an unreleased feature. Risk may increase further when code includes internal design comments or private endpoints.

Customer and employee data should be treated as sensitive prompt content even when a single prompt looks harmless. Emails, health details, payroll numbers, and account histories all count. Partial details can still identify a person. Rewriting a customer response with AI can be valid, but it should happen in an approved tool with matching data handling terms, not a personal account.

Legal, financial, and board material is high risk because it is writing-heavy. Employees paste parts of contracts, acquisition plans, audit findings, and pricing strategy into AI tools because AI is useful for dense editing work. The policy should clearly state that summarizing a sensitive document with AI still constitutes sharing that document with the tool.

Security incident data needs separate handling. Logs, malware samples, endpoint telemetry, vulnerability details, and incident timelines can expose infrastructure weaknesses. Security teams can use AI, but the approved tool and workflow should be defined before an incident, not improvised during one.

Map prompt risk

Once risky data types are defined, employees still need a decision model they can use during real work. The simplest model is to classify prompt content by where it is allowed to go.

Restricted data should not enter external AI systems unless the organization has approved a specific controlled environment and accepted the risk. This includes credentials, secrets, payment card data, highly sensitive personal information, material from active litigation, pending transaction details, unreleased financial results, and source code containing secrets or critical business logic. The issue is immediacy: exposure can create legal, security, or business harm before the company has any practical way to recover.

Sensitive data may be used in approved enterprise AI when retention controls, access controls, and logging match the use case. It should stay out of unmanaged SaaS AI and personal accounts. This tier covers confidential business communications, customer context, internal architecture, unreleased product plans, operational reports, HR material, and private code without secrets. The risk is often competitive, contractual, reputational, or operational.

Internal data can be used in approved enterprise AI and sometimes in unmanaged tools when identifiers and strategic details are removed. Draft policies, sanitized meeting summaries, generic training material, and general code examples may fit here.

Public data should remain low-friction. Employees need freedom to use AI for public research, documentation, learning, and generic writing tasks, or the policy will be ignored.
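The four classes above can be written down as a small decision function, which is one way to keep the employee-facing policy and the DLP or proxy rules in agreement. The sketch below is an interpretation, not a standard. The names `DataClass`, `ToolTier`, and `may_send` are hypothetical. It also assumes that internal data never goes to personal accounts, which the policy text does not state outright, and that restricted-data exceptions are approved outside this table.

```python
from enum import Enum


class DataClass(Enum):
    RESTRICTED = "restricted"
    SENSITIVE = "sensitive"
    INTERNAL = "internal"
    PUBLIC = "public"


class ToolTier(Enum):
    APPROVED_ENTERPRISE = "approved_enterprise"
    UNMANAGED_SAAS = "unmanaged_saas"
    PERSONAL_ACCOUNT = "personal_account"


def may_send(data: DataClass, tier: ToolTier, sanitized: bool = False) -> bool:
    """Return True if this class of content may go to this tier of tool."""
    if data is DataClass.PUBLIC:
        return True  # public data stays low-friction
    if data is DataClass.RESTRICTED:
        return False  # exceptions need explicit approval, handled elsewhere
    if tier is ToolTier.APPROVED_ENTERPRISE:
        # Assumes retention, access, and logging controls match the use case.
        return True
    if data is DataClass.INTERNAL and tier is ToolTier.UNMANAGED_SAAS:
        return sanitized  # only once identifiers and strategic details are removed
    return False
```

For example, `may_send(DataClass.INTERNAL, ToolTier.UNMANAGED_SAAS, sanitized=True)` returns `True`, while the same call with `DataClass.SENSITIVE` returns `False`.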

Build the policy around the approved path

The most common failure mode in AI governance is a policy that tells employees what they cannot do but gives them no workable alternative. A blanket ban often pushes the same behavior onto personal devices or personal accounts, where the organization has even less visibility.

A useful policy answers the question employees actually have: “How can I do this safely?” Code assistance may require an enterprise coding assistant or a review tool with repository controls. Document drafting may require an enterprise AI workspace with clear classification rules. Public research can stay lower-friction as long as employees do not upload files or paste internal content.

The policy should also explain which account type to use, what content to remove first, what to do after an accidental paste, and how to request a new AI tool or use case. Good prompt governance reduces unsafe work by making safe work easier.

Detection: five signal sources

Classification only works if something enforces it. In practice, many employees will not check a policy before pasting text into an AI tool. They are rushing to finish a ticket or to summarize a meeting. Automated detection has to assume speed, pressure, and mistakes.

The first signal source is browser and session visibility. Many AI tools run through ordinary browser workflows, so security teams should use browser security platforms, secure web gateways, proxy/DNS logs, and CASB data to understand actual use. The goal is not only to see traffic to an AI domain. Security teams need session context: which account is signed in, and whether it belongs to a corporate workspace or a personal one. That context determines whether the same prompt is acceptable or risky.

The second source is browser-based DLP. A managed browser profile or extension can inspect clipboard content at the moment of paste, before data leaves the endpoint. This is useful when TLS inspection is incomplete or when the risk happens inside an encrypted browser session.
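To make the content check concrete, here is a sketch of one high-confidence rule a paste inspector might run: finding payment card numbers. It uses the Luhn checksum to discard digit runs that only look like card numbers. The function names and the candidate pattern are assumptions for illustration, and the hook that captures the paste event is product-specific and not shown.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in the text that look like valid card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step matters for usability: order IDs and ticket numbers are long digit strings too, and a rule that warns on all of them teaches employees to click through.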

The third source is proxy and CASB inspection. When TLS inspection is properly configured, these controls can apply content rules to AI requests and enforce tier-level routing. For example, they can allow enterprise AI tenants, warn on unmanaged tools, block consumer accounts, or stop risky file uploads to unapproved services.

The fourth source is endpoint and developer tool visibility. Browser controls do not cover IDE extensions, terminal tools, local agents, or plugins that can read files directly. These tools should be reviewed like any developer tool with access to repositories, configuration files, and environment data.

The fifth source is behavioral logging. Logs do not block risky prompts in real time, but they reveal adoption patterns, unusual upload behavior, personal account use, and the introduction of new AI tools into the environment.
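A small sketch of that kind of log review follows. It assumes events have already been normalized into a simple record; the `AiLogEvent` fields and the `summarize` function are hypothetical and do not match any particular product's log schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AiLogEvent:
    user: str
    host: str
    tier: str  # e.g. "approved_enterprise", "unmanaged_saas", "personal_account"
    is_upload: bool


def summarize(events: Iterable[AiLogEvent], known_hosts: Iterable[str]) -> dict:
    """Summarize patterns that individual alerts would not show."""
    events = list(events)
    return {
        # AI hosts seen in traffic but missing from the inventory.
        "new_hosts": sorted({e.host for e in events} - set(known_hosts)),
        # Users observed on personal accounts.
        "personal_account_users": sorted(
            {e.user for e in events if e.tier == "personal_account"}
        ),
        # Upload counts per user, to spot unusual volume.
        "uploads_by_user": Counter(e.user for e in events if e.is_upload),
    }
```

A weekly run of something like this is usually enough to show which new tools are appearing and where follow-up training or an intake request is needed.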

Detection should therefore work as a layered system. Browser controls catch risky paste events early. DLP flags likely sensitive content. Proxy and CASB controls enforce approved paths. Endpoint and developer telemetry cover AI tools outside the browser. Logs show patterns that single alerts miss. Together, these signals turn prompt governance from a policy document into an operating control.

Connect prompt governance to existing security programs

Prompt data governance should not become a separate security island. It should extend the security programs the company already runs.

Security awareness training should use job-specific examples. Engineers need to see how a build log can expose a token. Legal teams need to understand why rewriting a contract in a personal AI account still constitutes external processing. The task may be legitimate. The risk often lies in the data included.

Incident response should also cover prompt leaks. The playbook should establish what was shared, which tool and account were used, whether deletion is possible, whether secrets need rotation, and whether legal or privacy review is required. The goal is fast damage reduction.

Vendor review should treat AI tools like SaaS tools with extra questions. Security and legal teams need to know whether prompts are used for training, how long data is retained, whether deletion is possible, which subprocessors are involved, and how enterprise account terms differ from consumer terms.

AI governance should keep an inventory of approved tools, business owners, allowed use cases, exceptions, and reassessment dates. The governance group should not approve every prompt. It should define when a review is needed and what evidence teams must provide before sensitive data can be used.
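The inventory does not need special tooling to start. As a sketch, the record below carries the fields listed above, with a helper that flags tools whose reassessment date has passed. The `ApprovedTool` and `overdue_for_reassessment` names are hypothetical, and a spreadsheet with the same columns would serve equally well.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ApprovedTool:
    name: str
    business_owner: str
    allowed_use_cases: list[str]
    reassess_by: date
    exceptions: list[str] = field(default_factory=list)


def overdue_for_reassessment(
    inventory: list[ApprovedTool], today: date
) -> list[str]:
    """Return names of tools whose reassessment date is in the past."""
    return [tool.name for tool in inventory if tool.reassess_by < today]
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the check easy to test and to run for a future date.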

Conclusion: what good looks like after 90 days

A realistic prompt governance program should reduce the largest blind spots first. In the first 30 days, identify AI tools in use, separate enterprise tools from unmanaged and personal accounts, and publish a short policy for restricted data and approved alternatives.

By day 60, tune browser, proxy, and DLP controls for high-confidence risks such as secrets, regulated data, sensitive source code, and uploads to unmanaged tools. Add a simple intake path for new use cases.

By day 90, connect the program to vendor review, AI governance, training, and incident response. Track adoption, risky prompts, unmanaged use, exceptions, and incidents.
