
Baseline Your Data Protection Platform for Copilot


Deploying Copilot for Microsoft 365 without first establishing a solid data protection baseline is one of the fastest ways to create unintentional exposure. Copilot is powerful, fast, and capable of pulling together insights from across your Microsoft 365 estate. But it does this strictly through the permissions and governance you already have in place. If you have inconsistent labels, weak data classification, limited auditing, or a messy information architecture, Copilot will immediately reveal those weaknesses. Baseline data protection is the controlling force that keeps Copilot aligned with your compliance requirements and your organizational risk posture. This step ensures that before AI begins summarizing, correlating, or generating content, you’ve tightened the rules for how your data should be accessed, protected, and monitored.


Establish a Unified, Trustworthy Sensitivity Label Taxonomy

Sensitivity labels are the core mechanism that informs Copilot what content can be viewed, summarized, extracted, or restricted. If your taxonomy is bloated, unclear, or inconsistently applied, Copilot will reflect that confusion. A strong baseline begins with a clean, intentional label hierarchy used universally across your tenant. Rather than piling on dozens of labels, the goal is to create a small, predictable set that can be easily understood by humans and honored faithfully by the AI.

A typical Copilot-ready taxonomy might include labels such as “Public,” “Internal,” “Confidential,” “Highly Confidential,” and “Restricted.” The practical difference is not the name; it’s the permissions behind the label, especially the distinction between VIEW rights and EXTRACT rights. Copilot relies on EXTRACT rights to generate summaries or reinterpretations of content. When EXTRACT rights are removed, the user may still open the file, but Copilot cannot interact with it. This becomes crucial for content you never want to pass through the AI pipeline, such as legal hold material, executive board documents, certain financial reports, or private HR records.

Label policies must be applied intentionally. Some departments may require more flexible data interaction capabilities; Finance, for example, might allow Copilot to summarize financial models internally, while others, such as Legal or HR, may require non-negotiable restrictions. What matters most is that your taxonomy is predictable, consistently enforced, and structured around actual business needs rather than hypothetical scenarios. A stable sensitivity label hierarchy is one of the most important prerequisites for Copilot adoption.

To support this structure, the following table provides a valid, Microsoft-supported baseline set of sensitivity labels, each using only real controls available in Microsoft Purview today. This table outlines encryption behavior, permissions, and the resulting effect on Copilot’s ability to read or summarize content.

| Label Name | Purpose / Description | Encryption | Allowed Permissions | Effect on Copilot |
|---|---|---|---|---|
| Public | Content intended for public or widely visible sharing. | Off | N/A (no encryption) | Fully accessible; Copilot can read and summarize. |
| Internal | Default internal business content. | Off (or On without restrictions) | View, Edit, Copy, Export, Print | Copilot can read and summarize normally. |
| Confidential | Sensitive organizational content requiring protection. | On | View, Edit, Copy, Export, Print | Copilot can view and summarize content securely. |
| Highly Confidential | Critical information requiring strict access limitations. | On | View only (No Copy, No Export; Print optional) | Copilot cannot summarize or extract content because Copy/Export are disabled. |
| Restricted | High-risk or regulated data with the most stringent controls. | On (assigned to specific groups) | View only (No Edit, No Copy, No Export, No Print) | Copilot cannot read, reference, or summarize. Full AI restriction. |
| Finance Confidential | Financial statements, forecasting, budgeting. | On (scoped to Finance group) | View, Edit, Copy, Export | Copilot fully available to Finance users only. |
| Legal Privileged | Attorney-client privileged documents. | On | View only (No Copy, No Export) | Copilot is blocked; summarization prevented by permissions. |
| HR Sensitive | Employee data, performance, compensation. | On (HR group only) | View, Edit (No Copy, No Export) | Copilot can help with drafting but cannot summarize or extract. |
| Project Sensitive | R&D, M&A, confidential product work. | On (dynamic group) | View, Edit, Copy, Export (project members only) | Copilot available to authorized project members. |
| Executive Board Confidential | Board packets, strategy discussions, critical reviews. | On (Exec group only) | View only (No Copy, No Export, No Print) | Copilot fully blocked, protecting executive material. |

Disclaimer: The sensitivity label behaviors and Copilot access outcomes described in the table are based on Microsoft’s documented enforcement model for Microsoft Information Protection (MIP), including encrypted content handling, usage rights (such as View, Copy, Export, and Print), and the principle that Copilot operates strictly within the user’s existing permissions. While Microsoft does not explicitly state Copilot behavior for every individual permission scenario, such as “Copy/Export restrictions directly prevent summarization,” the examples presented here are based on the way the MIP SDK, encryption policies, usage rights, and Graph API content access controls function together in the Microsoft 365 ecosystem. Organizations should validate these configurations in their own environments, noting that Copilot’s behavior aligns with the underlying permissions and protection rules applied through Microsoft Purview, rather than through Copilot-specific policy settings.
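
If you manage labels through Security & Compliance PowerShell, the skeleton below sketches how two of the labels above might be created and published. The group address, label names, and rights strings are illustrative assumptions rather than settings from any particular tenant; the detail that matters is that the second label grants VIEW rights but omits EXTRACT, which is what prevents Copilot from summarizing or reusing the content.

```powershell
# Illustrative sketch only - label names, group addresses, and rights strings are
# placeholders; validate parameters against your tenant and module version.
# Requires the ExchangeOnlineManagement module and a Purview admin role.
Connect-IPPSSession

# "Confidential": encrypted, but users (and therefore Copilot) keep full usage rights.
New-Label -Name "Confidential" -DisplayName "Confidential" `
    -Tooltip "Sensitive organizational content requiring protection" `
    -EncryptionEnabled $true -EncryptionProtectionType Template `
    -EncryptionRightsDefinitions "allstaff@contoso.com:VIEW,VIEWRIGHTSDATA,DOCEDIT,EDIT,PRINT,EXTRACT"

# "Highly Confidential": view-only rights; omitting EXTRACT is what keeps Copilot
# from summarizing the content, per the table above.
New-Label -Name "Highly Confidential" -DisplayName "Highly Confidential" `
    -Tooltip "Critical information requiring strict access limitations" `
    -EncryptionEnabled $true -EncryptionProtectionType Template `
    -EncryptionRightsDefinitions "allstaff@contoso.com:VIEW,VIEWRIGHTSDATA"

# Publish the labels to the tenant.
New-LabelPolicy -Name "Baseline Label Policy" `
    -Labels "Confidential","Highly Confidential" `
    -ExchangeLocation All
```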


Use Auto-Labeling to Guarantee Full Coverage Across the Tenant

Manual classification is never enough. Users forget, don’t understand labels, or misclassify content. Auto-labeling ensures your governance applies universally, even across years of legacy data. The goal is not to replace human decisions but to protect the organization from gaps created by human error.

Auto-labeling should be configured to detect easily identifiable sensitive information such as financial data, customer identifiers, personally identifiable information, regulated industry terms, and other key patterns. When these patterns appear, Purview can automatically elevate or assign the correct label. This is especially important for older SharePoint libraries and OneDrive folders where unlabeled or incorrectly labeled files would otherwise slip through Copilot’s visibility filters.

Simulation mode is a practical starting point. It reveals labeling patterns without changing any content, allowing you to tune detection before enforcement. Once refined, enabling auto-labeling across SharePoint and OneDrive ensures Copilot interacts with a dataset that reflects your intended protection strategy.
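
A minimal sketch of that simulation-first approach, assuming the ExchangeOnlineManagement module and placeholder label and sensitive information type names, might look like this:

```powershell
# Illustrative only - label and sensitive info type names are placeholders;
# confirm them against your tenant before enforcing.
Connect-IPPSSession

# Create the auto-labeling policy in simulation (no content is changed yet).
New-AutoSensitivityLabelPolicy -Name "Auto-Label-Confidential-PII" `
    -ApplySensitivityLabel "Confidential" `
    -SharePointLocation All -OneDriveLocation All `
    -Mode TestWithoutNotifications

# Define what the policy looks for: common financial and identity patterns.
New-AutoSensitivityLabelRule -Policy "Auto-Label-Confidential-PII" `
    -Name "Detect-Financial-And-Identity-Data" `
    -ContentContainsSensitiveInformation @{Name="Credit Card Number"; minCount="1"},@{Name="U.S. Social Security Number (SSN)"; minCount="1"}

# After reviewing the simulation results, switch the policy to enforcement.
Set-AutoSensitivityLabelPolicy -Identity "Auto-Label-Confidential-PII" -Mode Enable
```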


Configure Encryption Correctly to Control AI Interaction

Encryption, when tied to sensitivity labels, becomes the most precise tool for controlling whether Copilot can interpret content. The interplay is straightforward: if a user has EXTRACT rights under an encrypted label, Copilot can summarize that content. If EXTRACT is denied, Copilot cannot operate—even if the user can view the file.

A well-designed encryption strategy lets you allow Copilot to work with most organizational content while protecting your highest-risk material. Typically, “Confidential” content remains accessible, while “Highly Confidential” or “Restricted” content becomes off-limits by removing EXTRACT permissions.

Below is a recommended table of encryption configurations aligned with Copilot behavior:

| Label Name | Encryption Setting | Usage Rights (User / Copilot) | Copilot Behavior | Intended Outcome |
|---|---|---|---|---|
| Internal | No encryption | VIEW + EXTRACT allowed | Copilot fully accessible | Normal business workflows |
| Confidential | Encryption enabled | VIEW + EXTRACT allowed | Copilot can summarize securely | Balanced productivity and security |
| Highly Confidential | Encryption enabled | VIEW allowed, EXTRACT denied | Copilot cannot summarize | Protects sensitive operations |
| Restricted | Encryption with strict access | VIEW restricted; EXTRACT removed entirely | Copilot fully blocked | Ensures regulatory or legal data stays out of AI workflows |
| Department-Confidential | Encrypted, scoped to department | Limited EXTRACT for department members only | Copilot works only for authorized users | Supports controlled AI within departments |
| Project-Sensitive | Encryption with dynamic groups | EXTRACT only for project participants | Copilot aids project teams securely | Enables AI for time-limited initiatives |

Disclaimer: The encryption configurations and Copilot interaction outcomes described here are based on Microsoft’s documented behavior for Microsoft Information Protection (MIP) sensitivity labels, encryption enforcement, and usage rights such as View, Copy, Export, and Print. Microsoft does not explicitly document Copilot-specific responses to each usage right; however, Copilot relies on the exact underlying access mechanisms and MIP enforcement controls as other Microsoft 365 applications. When encryption prevents applications from extracting or exporting protected content, Copilot is likewise unable to read or summarize it. The described outcomes—including scenarios where removing Copy/Export rights prevents AI summarization—are therefore inferred from the established MIP encryption model rather than stated as Copilot-specific rules. Organizations should validate these configurations in their own tenant to confirm that AI interactions align with their intended sensitivity label design and encryption enforcement strategy.
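
In PowerShell terms, the scoped-EXTRACT pattern from the table (for example, the "Department-Confidential" row) comes down to how the rights string on the label is written. The sketch below is an assumption-laden example: the group addresses and label name are placeholders, and the rights listed are standard usage rights, with EXTRACT granted only to the owning department.

```powershell
# Illustrative only - group addresses and the label name are placeholders.
# Finance keeps EXTRACT (so Copilot can summarize for them); everyone else is view-only.
Set-Label -Identity "Finance Confidential" `
    -EncryptionEnabled $true -EncryptionProtectionType Template `
    -EncryptionRightsDefinitions "finance@contoso.com:VIEW,VIEWRIGHTSDATA,DOCEDIT,EDIT,PRINT,EXTRACT;allstaff@contoso.com:VIEW,VIEWRIGHTSDATA"
```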


Enable and Configure Audit Logging for Transparency and Accountability

Copilot introduces new forms of data access, including summaries, correlations, generated insights, and cross-workload retrieval, all of which require a robust auditing foundation. Microsoft Purview Audit, whether on the standard or premium tier, provides visibility into how Copilot interacts with your content. Without it, Copilot’s activities become opaque, leaving security teams blind to how AI-assisted workflows influence data access and movement across the tenant.

Audit logs capture and surface key events such as:

- Copilot interaction events, including which user prompted Copilot, from which app (such as Word, Teams, or Outlook), and when
- The files, sites, and messages Copilot referenced when grounding a response
- Surrounding activity such as file access, sharing changes, and sensitivity label modifications

This visibility becomes essential for investigations. If a user claims that Copilot surfaced content they did not expect, or if sensitive information appears in a generated output, the audit trail provides the historical record needed to understand what happened and why. This level of transparency is fundamental in regulated sectors, where AI-assisted content handling may be reviewed by internal compliance teams, external auditors, or legal entities.

Retention policies must align with your operational and regulatory requirements. Many organizations function effectively with:

- The default 180-day retention provided by Audit (Standard)
- One-year retention of Copilot and other user activity under Audit (Premium)
- Extended retention, up to ten years with add-on licensing, for data subject to long-term regulatory obligations

The goal is simple: maintain an audit trail that ensures every Copilot interaction involving sensitive data remains discoverable long after the event. A well-configured audit environment doesn’t just support Copilot; it reinforces trust, accountability, and responsible AI adoption across the entire Microsoft 365 ecosystem.
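
Once auditing is enabled, Copilot activity can be pulled from the unified audit log. The sketch below assumes the CopilotInteraction record type currently surfaced for Copilot events and the ExchangeOnlineManagement module; verify the record type name and your retention window in your own tenant before building reports on it.

```powershell
# Illustrative query - validate the record type and retention window in your tenant.
Connect-ExchangeOnline

$copilotEvents = Search-UnifiedAuditLog `
    -StartDate (Get-Date).AddDays(-30) -EndDate (Get-Date) `
    -RecordType CopilotInteraction -ResultSize 5000

# Summarize who used Copilot and when; the AuditData JSON on each record
# carries the detail, including the resources Copilot referenced.
$copilotEvents |
    Select-Object CreationDate, UserIds, Operations |
    Sort-Object CreationDate |
    Format-Table -AutoSize
```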


Prepare eDiscovery and Communication Compliance for AI Workflows

Copilot introduces a new category of content into your Microsoft 365 environment: AI-generated text, AI-assisted edits, rewritten summaries, suggested responses, and contextual references to existing documents and conversations. While Copilot does not store prompts or responses as separate artifacts in the tenant, the actions it performs and the content it interacts with can become subject to legal discovery, internal investigations, compliance reviews, or regulatory audits. This makes it essential that your eDiscovery environment is configured to identify, preserve, and export the materials that Copilot interacts with.

Your eDiscovery (Standard or Premium) configuration should be capable of locating:

- The documents, files, and sites Copilot accessed or summarized on a user’s behalf
- Emails, chats, and documents containing AI-assisted edits, rewritten summaries, or suggested responses
- The audit records that tie those items back to specific Copilot activity

Although Copilot does not create new “AI objects” inside the tenant, it influences how users interact with existing content. This means discovery must be able to reconstruct which content was accessed, when it was accessed, and who accessed it. Purview Audit Premium is especially important because it captures the detailed events required to rebuild the sequence of AI-driven activity. If an investigation requires proof that Copilot was used to produce or retrieve specific content, the audit logs function as the authoritative source.
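
When an investigation does arise, a compliance case and content search can scope the material Copilot touched. The following is a rough sketch using Security & Compliance PowerShell; the case name, mailbox, site URL, and query are all placeholders.

```powershell
# Illustrative only - case name, locations, and the search query are placeholders.
Connect-IPPSSession

New-ComplianceCase -Name "Copilot Content Review 001"

New-ComplianceSearch -Name "Copilot-Referenced-Content" `
    -Case "Copilot Content Review 001" `
    -ExchangeLocation "user@contoso.com" `
    -SharePointLocation "https://contoso.sharepoint.com/sites/Finance" `
    -ContentMatchQuery '"Q3 forecast" OR "board summary"'

Start-ComplianceSearch -Identity "Copilot-Referenced-Content"
```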

Beyond discovery, organizations must also strengthen Communication Compliance policies to monitor the behavioral risks introduced by generative AI. Copilot enables employees to retrieve and transform large volumes of data quickly, which may expose patterns of misuse that did not exist before. Communication Compliance helps detect:

- Sensitive or regulated information being shared in messages and AI-assisted drafts
- Inappropriate or non-compliant language in content generated or refined with Copilot
- Interaction patterns that suggest attempts to use Copilot to retrieve or redistribute protected data

These policies are not designed to stop Copilot. They ensure that the way users engage with Copilot aligns with regulatory requirements, ethical expectations, and internal governance standards. As Copilot accelerates communication and content creation, it also accelerates the need for oversight, monitoring, and accountability.


Strengthen Your DLP Framework to Control Copilot Interactions

Data Loss Prevention (DLP) becomes significantly more important once AI enters the environment. Without DLP, Copilot may legitimately summarize content that is labeled correctly but still too sensitive to be shared or discussed broadly.

This is where targeted, Copilot-specific DLP becomes essential. Policies should protect your most sensitive classifications, particularly “Highly Confidential” and “Restricted”, by restricting or auditing AI interactions with those labels. DLP can surface warnings to users, block certain AI actions, or require justification before a sensitive summarization occurs. When combined with sensitivity labels and contextual conditions such as location, device, and user risk, DLP becomes a layered security model that ensures sensitive material remains under strict control even when used inside productivity workflows.

Below is a suggested set of DLP rules tailored specifically for Copilot. These are by no means all you need; they are simply examples:

| DLP Rule Name | Purpose / Scenario | Trigger Conditions | Action / Enforcement | Outcome for Copilot |
|---|---|---|---|---|
| DLP-Copilot-Block-Restricted-Data-Usage | Prevent AI from interacting with Restricted/Highly Confidential data | Label = Restricted or Highly Confidential | Block + high-severity alert | Copilot cannot access or summarize data |
| DLP-Copilot-Warn-High-Risk-Confidential-Access | Warn users interacting with regulated data | Confidential + sensitive info types | Warning + alert | Allows use but monitored |
| DLP-Copilot-Audit-Sensitive-Summaries | Track sensitive summaries | Confidential only | Audit | Visibility without blocking |
| DLP-Copilot-Block-Sharing-Outside-Department | Prevent departmental IP leakage | SharePoint/Teams department sites | Block cross-department share | AI cannot leak content across groups |
| DLP-Copilot-Block-External-Sharing | Prevent AI-generated content leaving the tenant | Any external share attempt | Block + notify | Eliminates external exposure |
| DLP-Copilot-Monitor-Bulk-Data-Access | Detect AI-triggered mass data aggregation | Bulk summarization pattern | Alert + monitor | Identifies compromised accounts |
| DLP-Copilot-Block-Unlabeled-Sensitive-Patterns | Protect unlabeled sensitive legacy data | Sensitive info types with no label | Block Copilot access | Forces proper labeling |

Disclaimer: The Data Loss Prevention (DLP) rules and enforcement outcomes described reflect Microsoft Purview’s documented capabilities, including auditing, policy tips, blocking actions, and automatic labeling. Microsoft does not provide Copilot-specific DLP actions; instead, DLP governs the underlying content access within SharePoint, OneDrive, Exchange, and Teams. The Copilot behaviors referenced, such as being unable to summarize or access restricted content, are inferred from the way DLP policies prevent users and applications from accessing or transmitting protected data. Because Copilot operates strictly within the user’s permissions and the platform’s data access controls, blocking an activity via DLP prevents Copilot from performing AI-driven actions on that content. Organizations should validate these rules within their own Purview environment to ensure they align with internal governance standards and real-world Copilot usage patterns.
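
As a concrete starting point, one of the rules above ("DLP-Copilot-Block-External-Sharing") might be sketched as a standard Purview DLP policy like the following. The sensitive information type, locations, and names are assumptions; the rule blocks external access to matching content, which removes the external-exposure path described in the table.

```powershell
# Illustrative sketch of one rule from the table - names, SIT choices, and scope are
# assumptions; start in simulation mode and switch to Enable once tuned.
Connect-IPPSSession

New-DlpCompliancePolicy -Name "DLP-Copilot-Block-External-Sharing" `
    -SharePointLocation All -OneDriveLocation All `
    -Mode TestWithoutNotifications

New-DlpComplianceRule -Policy "DLP-Copilot-Block-External-Sharing" `
    -Name "Block-External-Share-Of-Sensitive-Content" `
    -ContentContainsSensitiveInformation @{Name="Credit Card Number"; minCount="1"} `
    -AccessScope NotInOrganization `
    -BlockAccess $true `
    -NotifyUser Owner
```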


Stabilize Your Data Governance Before Scaling Copilot

All the technical controls in the world won’t matter if your underlying data environment is chaotic. Copilot has no special override; it merely reflects your existing permissions. If users can see something today, Copilot can help them find it faster tomorrow. Many organizations carry years of oversharing, outdated content, abandoned sites, and misconfigured permissions accumulated from legacy collaboration patterns. Before introducing AI into this environment, you must establish a clean governance baseline.

A stable baseline includes ensuring:

- Every site, team, and library has an identified, accountable owner
- Permissions follow least privilege rather than legacy, inherited oversharing
- Stale, abandoned, or duplicate content is archived or removed
- Sharing links and guest access are reviewed and scoped intentionally

This step is where governance meets practicality. It is not about achieving perfection across every site, library, or document. It’s about ensuring the data Copilot touches is structured, protected, intentional, and overseen by accountable owners. Establishing this baseline dramatically reduces the risk of accidental exposure when the AI begins connecting information across the tenant. It also gives you confidence that Copilot is amplifying the correct data, not the forgotten, misconfigured, or overshared data hiding in the shadows of your environment.
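
A practical first pass at that baseline is simply knowing where broad sharing exists. The sketch below uses the SharePoint Online Management Shell to inventory sites whose sharing is not locked down; the admin URL and output path are placeholders, and the resulting report is a starting point for ownership and permission review, not a fix in itself.

```powershell
# Illustrative inventory - the admin URL and output path are placeholders.
Connect-SPOService -Url "https://contoso-admin.sharepoint.com"

Get-SPOSite -Limit All |
    Where-Object { $_.SharingCapability -ne "Disabled" } |
    Select-Object Url, Owner, SharingCapability, LastContentModifiedDate |
    Sort-Object LastContentModifiedDate |
    Export-Csv -Path ".\sharing-baseline.csv" -NoTypeInformation
```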


Closing Thoughts

A strong data protection baseline ensures that Copilot operates within the boundaries you intentionally define, not the accidental ones your environment has inherited over time. Copilot is not an elevated identity or a privileged system; it is an accelerator that amplifies whatever access your users and your underlying governance model already provide. This makes your data protection posture the single most important factor in determining whether Copilot becomes a controlled, enterprise-grade asset or an uncontrolled accelerant of preexisting risks.

Sensitivity labels, auto-labeling rules, encryption enforcement, audit logging, DLP policies, and strong governance collectively shape the perimeter that Copilot must operate within. Each component plays a distinct role in controlling how data flows, how it’s classified, how it’s protected, and ultimately how AI is permitted to interpret it.

When these controls function together, they create a layered protection framework that prevents Copilot from interacting with inappropriate content, strengthens your Zero Trust posture, and reduces the likelihood of accidental data exposure. More importantly, they allow Copilot to operate confidently within well-defined risk parameters, enabling your organization to harness AI’s value without compromising compliance, privacy, or security. This alignment between AI capability and data protection discipline is the foundation of safe, scalable Copilot adoption.

In the next post, we will take a deeper step into that foundation by addressing one of the most overlooked areas of Copilot readiness: Fixing Oversharing in SharePoint and OneDrive Before Copilot Deployment. This step is essential because Copilot will surface, correlate, and summarize data based on user access, not based on your intentions. If broad or unintended access exists today, Copilot will faithfully amplify it at machine speed. By cleaning up oversharing, restructuring permissions, and enforcing least-privilege principles, you eliminate latent data exposure risks before Copilot begins interpreting your content at scale.

The work ahead is not simply technical; it is transformational. But done correctly, it enables your organization to deploy Copilot with confidence, clarity, and control.
