This document describes how to configure Sensitive Data Protection to detect specific metadata labels in your content.
In this document, metadata labels are file labels that are embedded in supported files or metadata provided by your client application in the inspection request. If Sensitive Data Protection finds content that matches your metadata criteria, it generates a finding.
To scan for metadata labels, create a custom metadata label infoType. Then, configure your inspection or discovery scan to search for that infoType.
Benefits and use cases
This feature lets you use your existing classification taxonomies for content inspection and policy enforcement. If you use a custom or third-party classification system that applies metadata labels to your documents, you can configure Sensitive Data Protection to detect these metadata labels during your inspection or discovery operations.
Example use cases include the following:
- Scan files for the presence of Google Drive labels or Microsoft sensitivity labels that contain specific metadata.
- Combine metadata label detection with standard infoType detection for a multi-layered approach.
- Scan metadata that your client application passes alongside the content, even if the metadata isn't embedded in the file. In this document, this type of metadata is called client-provided metadata.
- Sanitize documents using Model Armor based on specific metadata labels. To use this feature with Model Armor—or services that use Model Armor like Gemini Enterprise—you must create an advanced Sensitive Data Protection configuration in Model Armor that references this custom metadata label detector.
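The multi-layered approach mentioned in the list above can be sketched as one inspectConfig that lists a standard infoType next to a custom metadata label detector. The following Python snippet only builds the JSON request body and doesn't call any service. The detector name CUSTOM_LABEL_CONFIDENTIAL and the label ID are hypothetical placeholders, and EMAIL_ADDRESS stands in for whichever built-in infoTypes you need.

```python
import json

# Hypothetical names for illustration; substitute your own detector name and label ID.
inspect_config = {
    # Standard (built-in) infoType detection.
    "infoTypes": [{"name": "EMAIL_ADDRESS"}],
    # Custom metadata label detection, evaluated in the same scan.
    "customInfoTypes": [
        {
            "infoType": {"name": "CUSTOM_LABEL_CONFIDENTIAL"},
            "fileLabelInfoType": {
                "googleDriveLabel": {"labelId": "mydrive-label-id"}
            },
        }
    ],
}

request_body = json.dumps({"inspectConfig": inspect_config}, indent=2)
print(request_body)
```

With a configuration like this, findings for the built-in infoType and for the metadata label are expected to appear in the same result set, so a policy can act on either signal.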
Supported metadata formats
This feature can detect the following types of metadata:
- Google Drive labels
- Microsoft sensitivity labels on the following file types:
  - DOCX
  - PPTX
  - XLSX
- Client-provided metadata
Unsupported configurations
Custom metadata label detectors aren't supported in the following:
Create a metadata label detector
To create a metadata label detector, define a
CustomInfoType within an
InspectConfig object. Depending on the type
of metadata that you want to scan, define one of the following:
- To scan for Google Drive labels and Microsoft sensitivity labels, define a FileLabelInfoType. For more information, see Create a metadata label detector to scan for file labels in this document.
- To use regular expressions to scan for custom key-value pairs, define a MetadataKeyValueExpression. This type is appropriate for client-provided metadata. For more information, see Create a metadata label detector to scan for custom key-value pairs in this document.
Create a metadata label detector to scan for file labels
To scan for Google Drive labels or Microsoft sensitivity labels, define a
CustomInfoType that has a
FileLabelInfoType:
{
  "inspectConfig": {
    "customInfoTypes": [
      {
        "infoType": {
          "name": "CUSTOM_METADATA_LABEL_NAME"
        },
        "likelihood": "LIKELIHOOD",
        "sensitivityScore": {
          "score": "SENSITIVITY_SCORE"
        },
        "fileLabelInfoType": {
          "googleDriveLabel": {
            "labelId": "LABEL_ID",
            "labelFieldsToMatch": [
              {
                "id": "FIELD_ID",
                "value": "FIELD_VALUE"
              }
            ]
          },
          "sensitivityLabel": {
            "guid": "GUID"
          }
        }
      }
    ]
  }
}
Replace the following:
- CUSTOM_METADATA_LABEL_NAME: The name to assign to the custom infoType detector.
- LIKELIHOOD: Optional. The Likelihood value to assign to all findings that match this custom infoType. If you omit this field, the default likelihood level is VERY_LIKELY.
- SENSITIVITY_SCORE: Optional. The SensitivityScore to assign to all findings that match this custom infoType. If you omit this field, the default sensitivity score is HIGH.
  Sensitivity scores are used in data profiles. When profiling your data, Sensitive Data Protection uses the sensitivity scores of the infoTypes to calculate the sensitivity level.
- LABEL_ID: Optional. The ID of a Google Drive label to match. If you specify only a LABEL_ID, the detector matches any file that has that Google Drive label applied, regardless of its field values. To match specific field values within a label, include a labelFieldsToMatch object.
- FIELD_ID: Required if you specify a labelFieldsToMatch object. The ID of a Google Drive label field to match.
- FIELD_VALUE: Required if you specify a labelFieldsToMatch object. The value of a Google Drive label field to match.
- GUID: Optional. The GUID of a Microsoft sensitivity label to match.
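The optional and conditionally required fields described above can be sketched as a small builder that emits only the fields you supply. This is a minimal Python sketch, not part of any client library: build_file_label_detector is a hypothetical helper, and the check that at least one of the label ID or GUID is present is an assumption made here for illustration rather than a documented rule.

```python
import json
from typing import Optional


def build_file_label_detector(
    name: str,
    label_id: Optional[str] = None,
    fields: Optional[dict] = None,
    guid: Optional[str] = None,
    likelihood: Optional[str] = None,
) -> dict:
    """Build one customInfoTypes entry (hypothetical helper, not a library API)."""
    # Assumption: a detector with neither a label ID nor a GUID has nothing to match.
    if label_id is None and guid is None:
        raise ValueError("specify a Google Drive label ID, a sensitivity label GUID, or both")
    if fields and label_id is None:
        raise ValueError("label fields can only be matched within a Google Drive label")

    file_label = {}
    if label_id is not None:
        drive_label = {"labelId": label_id}
        if fields:
            # Each entry needs both an ID and a value, as described above.
            drive_label["labelFieldsToMatch"] = [
                {"id": field_id, "value": value} for field_id, value in fields.items()
            ]
        file_label["googleDriveLabel"] = drive_label
    if guid is not None:
        file_label["sensitivityLabel"] = {"guid": guid}

    detector = {"infoType": {"name": name}, "fileLabelInfoType": file_label}
    if likelihood is not None:
        # Omitted fields fall back to the defaults described above.
        detector["likelihood"] = likelihood
    return detector


label_only = build_file_label_detector("CUSTOM_DRIVE_LABEL_ANY", label_id="mydrive-label-id")
with_field = build_file_label_detector(
    "CUSTOM_DRIVE_LABEL_CONFIDENTIAL",
    label_id="mydrive-label-id",
    fields={"sensitivity-field-id": "confidential"},
)
print(json.dumps({"inspectConfig": {"customInfoTypes": [label_only, with_field]}}, indent=2))
```

The first detector omits labelFieldsToMatch, so it corresponds to the "any file that has that label" case. The second narrows the match to one field value.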
Example detector for a Google Drive label
This inspectConfig example defines a custom infoType named
CUSTOM_GOOGLE_DRIVE_LABEL_CONFIDENTIAL. This custom infoType detects a
Google Drive label that has the following:
- The label ID mydrive-label-id
- Within that label, a field with the ID sensitivity-field-id and the value confidential
If you don't specify a field ID and value in labelFieldsToMatch, this custom
infoType detects the files that have the mydrive-label-id label, regardless of
the presence of any label fields.
{
  "inspectConfig": {
    "customInfoTypes": [
      {
        "infoType": {
          "name": "CUSTOM_GOOGLE_DRIVE_LABEL_CONFIDENTIAL"
        },
        "likelihood": "VERY_LIKELY",
        "fileLabelInfoType": {
          "googleDriveLabel": {
            "labelId": "mydrive-label-id",
            "labelFieldsToMatch": [
              {
                "id": "sensitivity-field-id",
                "value": "confidential"
              }
            ]
          }
        }
      }
    ],
    "minLikelihood": "POSSIBLE"
  }
}
When you use this configuration in an inspection job,
Sensitive Data Protection generates a
CUSTOM_GOOGLE_DRIVE_LABEL_CONFIDENTIAL finding if it finds a match.
Example detector for a Microsoft sensitivity label
This inspectConfig example defines a custom infoType named
CUSTOM_SENSITIVITY_LABEL_CONFIDENTIAL. This custom infoType detects a
Microsoft sensitivity label that contains the GUID
12345678-9012-3456-7890-123456789012:
{
  "inspectConfig": {
    "customInfoTypes": [
      {
        "infoType": {
          "name": "CUSTOM_SENSITIVITY_LABEL_CONFIDENTIAL"
        },
        "likelihood": "VERY_LIKELY",
        "fileLabelInfoType": {
          "sensitivityLabel": {
            "guid": "12345678-9012-3456-7890-123456789012"
          }
        }
      }
    ],
    "minLikelihood": "POSSIBLE"
  }
}
When you use this configuration in an inspection job,
Sensitive Data Protection generates a CUSTOM_SENSITIVITY_LABEL_CONFIDENTIAL
finding if it finds a match.
Create a metadata label detector to scan for custom key-value pairs
To use regular expressions to scan for custom key-value pairs in client-provided metadata, follow these steps:
1. Define a CustomInfoType that has a MetadataKeyValueExpression:
   {
     "inspectConfig": {
       "customInfoTypes": [
         {
           "infoType": {
             "name": "CUSTOM_METADATA_LABEL_NAME"
           },
           "likelihood": "LIKELIHOOD",
           "sensitivityScore": {
             "score": "SENSITIVITY_SCORE"
           },
           "metadataKeyValueExpression": {
             "keyRegex": "KEY_REGULAR_EXPRESSION",
             "valueRegex": "VALUE_REGULAR_EXPRESSION"
           }
         }
       ]
     }
   }
   Replace the following:
   - CUSTOM_METADATA_LABEL_NAME: The name to assign to the custom infoType detector.
   - LIKELIHOOD: Optional. The Likelihood value to assign to all findings that match this custom infoType. If you omit this field, the default likelihood level is VERY_LIKELY.
   - SENSITIVITY_SCORE: Optional. The SensitivityScore to assign to all findings that match this custom infoType. If you omit this field, the default sensitivity score is HIGH.
     Sensitivity scores are used in data profiles. When profiling your data, Sensitive Data Protection uses the sensitivity scores of the infoTypes to calculate the sensitivity level.
   - KEY_REGULAR_EXPRESSION: A regular expression to search for in the keys of metadata labels.
   - VALUE_REGULAR_EXPRESSION: A regular expression to search for in the values of metadata labels.
2. In your content.inspect request, specify the client-provided metadata in the ContentMetadata field of the ContentItem.
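Before sending a request, it can help to try a key and value expression against sample metadata locally. The sketch below is only an approximation under stated assumptions: it uses Python's re module rather than the service's regular expression engine, it assumes an unanchored search, and it assumes that the key and the value must match within the same key-value pair. The helper name matches_metadata is hypothetical.

```python
import re


def matches_metadata(key_regex: str, value_regex: str, properties: list) -> list:
    """Return the properties whose key and value both match (local approximation only)."""
    key_pattern = re.compile(key_regex)
    value_pattern = re.compile(value_regex)
    return [
        prop
        for prop in properties
        # Assumption: unanchored search on the key and the value of the same pair.
        if key_pattern.search(prop["key"]) and value_pattern.search(prop["value"])
    ]


properties = [
    {"key": "classification", "value": "Confidential"},
    {"key": "classification", "value": "Public"},
    {"key": "owner", "value": "Confidential"},
]
matched = matches_metadata("classification", "Confidential|Internal Use", properties)
print(matched)
```

Under these assumptions only the first property matches. If an unanchored search is too broad for your keys, anchoring the expressions (for example, ^classification$) narrows the match in this local check.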
Example request for scanning client-provided metadata
The following example shows a content.inspect request that includes both a PDF
file and client-provided metadata. The request uses a custom infoType named
CUSTOM_METADATA_CLASSIFICATION to scan for files that are marked as
"Confidential" or "Internal Use".
{
  "inspectConfig": {
    "customInfoTypes": [
      {
        "infoType": {
          "name": "CUSTOM_METADATA_CLASSIFICATION"
        },
        "likelihood": "VERY_LIKELY",
        "metadataKeyValueExpression": {
          "keyRegex": "classification",
          "valueRegex": "Confidential|Internal Use"
        }
      }
    ]
  },
  "item": {
    "byteItem": {
      "type": "PDF",
      "data": "BASE64_ENCODED_PDF"
    },
    "contentMetadata": {
      "properties": [
        {
          "key": "classification",
          "value": "Confidential"
        }
      ]
    }
  }
}
Replace BASE64_ENCODED_PDF with a base64-encoded
file to scan.
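One way to produce the base64-encoded value is with Python's standard base64 module. The sketch below assembles the same request body as the example above. The helper name build_inspect_request is hypothetical, and the placeholder bytes stand in for a real PDF that you would read from disk.

```python
import base64
import json


def build_inspect_request(pdf_bytes: bytes, classification: str) -> dict:
    """Assemble the example request body (hypothetical helper, not a library API)."""
    return {
        "inspectConfig": {
            "customInfoTypes": [
                {
                    "infoType": {"name": "CUSTOM_METADATA_CLASSIFICATION"},
                    "likelihood": "VERY_LIKELY",
                    "metadataKeyValueExpression": {
                        "keyRegex": "classification",
                        "valueRegex": "Confidential|Internal Use",
                    },
                }
            ]
        },
        "item": {
            "byteItem": {
                "type": "PDF",
                # JSON carries text, so the raw bytes are base64-encoded and decoded to str.
                "data": base64.b64encode(pdf_bytes).decode("ascii"),
            },
            "contentMetadata": {
                "properties": [{"key": "classification", "value": classification}]
            },
        },
    }


# Placeholder bytes; in practice, read the file with open(path, "rb").read().
request = build_inspect_request(b"%PDF-1.4 placeholder", "Confidential")
print(json.dumps(request, indent=2))
```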
Sensitive Data Protection generates a finding if it finds a match. In the
finding, the MetadataType of the MetadataLocation indicates whether the
matched metadata is embedded in the file (CONTENT_METADATA) or provided by
the client (CLIENT_PROVIDED_METADATA).
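To act differently on embedded and client-provided matches, you can group findings by their metadata type. The sketch below assumes that the JSON response nests the value under result.findings[].location.contentLocations[].metadataLocation.type. Verify that path against an actual response before relying on it. The sample response is hand-written for illustration and is not real service output, and findings_by_metadata_type is a hypothetical helper.

```python
from collections import defaultdict


def findings_by_metadata_type(response: dict) -> dict:
    """Group infoType names by metadata type (assumed response shape, see lead-in)."""
    grouped = defaultdict(list)
    for finding in response.get("result", {}).get("findings", []):
        locations = finding.get("location", {}).get("contentLocations", [])
        for content_location in locations:
            metadata_location = content_location.get("metadataLocation")
            if metadata_location:
                metadata_type = metadata_location.get("type", "METADATATYPE_UNSPECIFIED")
                grouped[metadata_type].append(finding.get("infoType", {}).get("name"))
    return dict(grouped)


# Hand-written sample for illustration only.
sample_response = {
    "result": {
        "findings": [
            {
                "infoType": {"name": "CUSTOM_METADATA_CLASSIFICATION"},
                "location": {
                    "contentLocations": [
                        {"metadataLocation": {"type": "CLIENT_PROVIDED_METADATA"}}
                    ]
                },
            },
            {
                "infoType": {"name": "CUSTOM_SENSITIVITY_LABEL_CONFIDENTIAL"},
                "location": {
                    "contentLocations": [
                        {"metadataLocation": {"type": "CONTENT_METADATA"}}
                    ]
                },
            },
        ]
    }
}
print(findings_by_metadata_type(sample_response))
```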