Create a custom metadata label detector

This document describes how to configure Sensitive Data Protection to detect specific metadata labels in your content.

In this document, metadata labels are either file labels that are embedded in supported files or metadata that your client application provides in the inspection request. If Sensitive Data Protection finds content that matches your metadata criteria, it generates a finding.

To scan for metadata labels, create a custom metadata label infoType. Then, configure your inspection or discovery scan to search for that infoType.

Benefits and use cases

This feature lets you use your existing classification taxonomies for content inspection and policy enforcement. If you use a custom or third-party classification system that applies metadata labels to your documents, you can configure Sensitive Data Protection to detect these metadata labels during your inspection or discovery operations.

Example use cases include the following:

  • Scan files for the presence of Google Drive labels or Microsoft sensitivity labels that contain specific metadata.
  • Combine metadata label detection with standard infoType detection for a multi-layered approach.
  • Scan metadata that your client application passes alongside the content, even if the metadata isn't embedded in the file. In this document, this type of metadata is called client-provided metadata.
  • Sanitize documents using Model Armor based on specific metadata labels. To use this feature with Model Armor, or with services that use Model Armor such as Gemini Enterprise, you must create an advanced Sensitive Data Protection configuration in Model Armor that references this custom metadata label detector.

Supported metadata formats

This feature can detect the following types of metadata:

  • Google Drive labels
  • Microsoft sensitivity labels on the following file types:

    • DOCX
    • PDF
    • PPTX
    • XLSX
  • Client-provided metadata

Unsupported configurations

Custom metadata label detectors aren't supported in the following:

Create a metadata label detector

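To create a metadata label detector, define a CustomInfoType within an InspectConfig object. Depending on the type of metadata that you want to scan, define one of the following:

  • FileLabelInfoType: Scans for Google Drive labels or Microsoft sensitivity labels.
  • MetadataKeyValueExpression: Scans for custom key-value pairs in client-provided metadata.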

Create a metadata label detector to scan for file labels

To scan for Google Drive labels or Microsoft sensitivity labels, define a CustomInfoType that has a FileLabelInfoType:

{
  "inspectConfig": {
    "customInfoTypes": [
      {
        "infoType": {
          "name": "CUSTOM_METADATA_LABEL_NAME"
        },
        "likelihood": "LIKELIHOOD",
        "sensitivityScore": {
          "score": "SENSITIVITY_SCORE"
        },
        "fileLabelInfoType": {
          "googleDriveLabel": {
            "labelId": "LABEL_ID",
            "labelFieldsToMatch": [
              {
                "id": "FIELD_ID",
                "value": "FIELD_VALUE"
              }
            ]
          },
          "sensitivityLabel": {
            "guid": "GUID"
          }
        }
      }
    ]
  }
}

Replace the following:

  • CUSTOM_METADATA_LABEL_NAME: The name to assign to the custom infoType detector.
  • LIKELIHOOD: Optional. The Likelihood value to assign to all findings that match this custom infoType. If you omit this field, the default likelihood level is VERY_LIKELY.
  • SENSITIVITY_SCORE: Optional. The SensitivityScore to assign to all findings that match this custom infoType. If you omit this field, the default sensitivity score is HIGH.

    Sensitivity scores are used in data profiles. When profiling your data, Sensitive Data Protection uses the sensitivity scores of the infoTypes to calculate the sensitivity level.

  • LABEL_ID: Optional. The ID of a Google Drive label to match. If you specify only a LABEL_ID, the detector matches any file that has that Google Drive label applied, regardless of its field values. To match specific field values within a label, include a labelFieldsToMatch object.

  • FIELD_ID: Required if you specify a labelFieldsToMatch object. The ID of a Google Drive label field to match.

  • FIELD_VALUE: Required if you specify a labelFieldsToMatch object. The value of a Google Drive label field to match.

  • GUID: Optional. The GUID of a Microsoft sensitivity label to match.
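As a hedged illustration of the placeholders above, the following Python sketch assembles a customInfoTypes entry and leaves out any optional field that you don't supply. The helper name build_file_label_detector and its parameters are hypothetical and not part of any client library. The JSON field names are taken from the template above.

```python
import json


def build_file_label_detector(name, label_id=None, fields=None, guid=None,
                              likelihood=None, sensitivity_score=None):
    """Return a customInfoTypes entry that contains a fileLabelInfoType.

    Hypothetical helper: `fields` is a list of (field_id, field_value) pairs,
    so each labelFieldsToMatch entry always carries both an id and a value.
    """
    drive_label = {}
    if label_id is not None:
        drive_label["labelId"] = label_id
    if fields:
        drive_label["labelFieldsToMatch"] = [
            {"id": field_id, "value": field_value}
            for field_id, field_value in fields
        ]

    file_label = {}
    if drive_label:
        file_label["googleDriveLabel"] = drive_label
    if guid is not None:
        file_label["sensitivityLabel"] = {"guid": guid}

    detector = {"infoType": {"name": name}, "fileLabelInfoType": file_label}
    # Optional fields are omitted so that the documented defaults apply.
    if likelihood is not None:
        detector["likelihood"] = likelihood
    if sensitivity_score is not None:
        detector["sensitivityScore"] = {"score": sensitivity_score}
    return detector


detector = build_file_label_detector(
    "CUSTOM_DRIVE_LABEL",
    label_id="LABEL_ID",
    fields=[("FIELD_ID", "FIELD_VALUE")],
)
print(json.dumps({"inspectConfig": {"customInfoTypes": [detector]}}, indent=2))
```

Passing only label_id produces a detector without labelFieldsToMatch, which corresponds to matching the label regardless of its field values.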

Example detector for a Google Drive label

This inspectConfig example defines a custom infoType named CUSTOM_GOOGLE_DRIVE_LABEL_CONFIDENTIAL. This custom infoType detects a Google Drive label that has the following:

  • The label ID mydrive-label-id
  • Within that label, a field with the ID sensitivity-field-id and the value confidential

If you omit labelFieldsToMatch, this custom infoType detects any file that has the mydrive-label-id label applied, regardless of its field values.

{
  "inspectConfig": {
    "customInfoTypes": [
      {
        "infoType": {
          "name": "CUSTOM_GOOGLE_DRIVE_LABEL_CONFIDENTIAL"
        },
        "likelihood": "VERY_LIKELY",
        "fileLabelInfoType": {
          "googleDriveLabel": {
            "labelId": "mydrive-label-id",
            "labelFieldsToMatch": [
              {
                "id": "sensitivity-field-id",
                "value": "confidential"
              }
            ]
          }
        }
      }
    ],
    "minLikelihood": "POSSIBLE"
  }
}

When you use this configuration in an inspection job, Sensitive Data Protection generates a CUSTOM_GOOGLE_DRIVE_LABEL_CONFIDENTIAL finding if it finds a match.

Example detector for a Microsoft sensitivity label

This inspectConfig example defines a custom infoType named CUSTOM_SENSITIVITY_LABEL_CONFIDENTIAL. This custom infoType detects a Microsoft sensitivity label that contains the GUID 12345678-9012-3456-7890-123456789012:

{
  "inspectConfig": {
    "customInfoTypes": [
      {
        "infoType": {
          "name": "CUSTOM_SENSITIVITY_LABEL_CONFIDENTIAL"
        },
        "likelihood": "VERY_LIKELY",
        "fileLabelInfoType": {
          "sensitivityLabel": {
            "guid": "12345678-9012-3456-7890-123456789012"
          }
        }
      }
    ],
    "minLikelihood": "POSSIBLE"
  }
}

When you use this configuration in an inspection job, Sensitive Data Protection generates a CUSTOM_SENSITIVITY_LABEL_CONFIDENTIAL finding if it finds a match.
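A mistyped GUID is easy to overlook, so it can help to check the string locally before you build the configuration. The following Python sketch is one possible approach that uses the standard uuid module. The function names are hypothetical, and the requirement that the GUID be in hyphenated 8-4-4-4-12 form is an assumption based on the example above, not a documented rule.

```python
import uuid


def is_canonical_guid(value):
    """Return True if value is in hyphenated 8-4-4-4-12 hexadecimal form."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, a urn prefix, and missing hyphens,
    # so compare against the canonical rendering to reject those forms.
    return str(parsed) == value.lower()


def build_sensitivity_label_detector(name, guid):
    """Hypothetical helper: build a detector for one sensitivity label GUID."""
    if not is_canonical_guid(guid):
        raise ValueError(f"not a hyphenated GUID: {guid!r}")
    return {
        "infoType": {"name": name},
        "fileLabelInfoType": {"sensitivityLabel": {"guid": guid}},
    }


detector = build_sensitivity_label_detector(
    "CUSTOM_SENSITIVITY_LABEL_CONFIDENTIAL",
    "12345678-9012-3456-7890-123456789012",
)
```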

Create a metadata label detector to scan for custom key-value pairs

To use regular expressions to scan for custom key-value pairs in client-provided metadata, follow these steps:

  1. Define a CustomInfoType that has a MetadataKeyValueExpression:

    {
      "inspectConfig": {
        "customInfoTypes": [
          {
            "infoType": {
              "name": "CUSTOM_METADATA_LABEL_NAME"
            },
            "likelihood": "LIKELIHOOD",
            "sensitivityScore": {
              "score": "SENSITIVITY_SCORE"
            },
            "metadataKeyValueExpression": {
              "keyRegex": "KEY_REGULAR_EXPRESSION",
              "valueRegex": "VALUE_REGULAR_EXPRESSION"
            }
          }
        ]
      }
    }
    

    Replace the following:

    • CUSTOM_METADATA_LABEL_NAME: The name to assign to the custom infoType detector.
    • LIKELIHOOD: Optional. The Likelihood value to assign to all findings that match this custom infoType. If you omit this field, the default likelihood level is VERY_LIKELY.
    • SENSITIVITY_SCORE: Optional. The SensitivityScore to assign to all findings that match this custom infoType. If you omit this field, the default sensitivity score is HIGH.

      Sensitivity scores are used in data profiles. When profiling your data, Sensitive Data Protection uses the sensitivity scores of the infoTypes to calculate the sensitivity level.

    • KEY_REGULAR_EXPRESSION: A regular expression to search for in the keys of metadata labels.

    • VALUE_REGULAR_EXPRESSION: A regular expression to search for in the values of metadata labels.

  2. In your content.inspect request, specify the client-provided metadata in the ContentMetadata field of the ContentItem.
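The detector from step 1 can be assembled programmatically. The following Python sketch builds the customInfoTypes entry and compiles both patterns first to catch obvious typos. The helper name is hypothetical. Python's re syntax might differ from the regular expression syntax that the service accepts, so treat the compile step as a rough sanity check only.

```python
import re


def build_key_value_detector(name, key_regex, value_regex,
                             likelihood=None, sensitivity_score=None):
    """Return a customInfoTypes entry with a metadataKeyValueExpression."""
    for label, pattern in (("keyRegex", key_regex), ("valueRegex", value_regex)):
        try:
            # Local sanity check only; the service does its own validation.
            re.compile(pattern)
        except re.error as err:
            raise ValueError(f"{label} is not a valid pattern: {err}") from err

    detector = {
        "infoType": {"name": name},
        "metadataKeyValueExpression": {
            "keyRegex": key_regex,
            "valueRegex": value_regex,
        },
    }
    # Optional fields are omitted so that the documented defaults apply.
    if likelihood is not None:
        detector["likelihood"] = likelihood
    if sensitivity_score is not None:
        detector["sensitivityScore"] = {"score": sensitivity_score}
    return detector


detector = build_key_value_detector(
    "CUSTOM_METADATA_LABEL_NAME", "classification", "Confidential|Internal Use"
)
```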

Example request for scanning client-provided metadata

The following example shows a content.inspect request that includes both a PDF file and client-provided metadata. The request uses a custom infoType named CUSTOM_METADATA_CLASSIFICATION to find metadata whose classification key has the value "Confidential" or "Internal Use".

{
  "inspectConfig": {
    "customInfoTypes": [
      {
        "infoType": {
          "name": "CUSTOM_METADATA_CLASSIFICATION"
        },
        "likelihood": "VERY_LIKELY",
        "metadataKeyValueExpression": {
          "keyRegex": "classification",
          "valueRegex": "Confidential|Internal Use"
        }
      }
    ]
  },
  "item": {
    "byteItem": {
      "type": "PDF",
      "data": "BASE64_ENCODED_PDF"
    },
    "contentMetadata": {
      "properties": [
        {
          "key": "classification",
          "value": "Confidential"
        }
      ]
    }
  }
}

Replace BASE64_ENCODED_PDF with a base64-encoded file to scan.
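To show how the pieces of the request fit together, the following Python sketch base64-encodes the file bytes and converts a dictionary of client-provided metadata into the properties list. The function name and the placeholder bytes are hypothetical. The sketch only builds the JSON body and doesn't send it.

```python
import base64
import json


def build_inspect_request(inspect_config, pdf_bytes, metadata):
    """Build a content.inspect request body for a PDF plus client metadata."""
    return {
        "inspectConfig": inspect_config,
        "item": {
            "byteItem": {
                "type": "PDF",
                # JSON can't carry raw bytes, so the file is base64-encoded.
                "data": base64.b64encode(pdf_bytes).decode("ascii"),
            },
            "contentMetadata": {
                "properties": [
                    {"key": key, "value": value}
                    for key, value in metadata.items()
                ]
            },
        },
    }


inspect_config = {
    "customInfoTypes": [
        {
            "infoType": {"name": "CUSTOM_METADATA_CLASSIFICATION"},
            "likelihood": "VERY_LIKELY",
            "metadataKeyValueExpression": {
                "keyRegex": "classification",
                "valueRegex": "Confidential|Internal Use",
            },
        }
    ]
}

# Placeholder bytes; in practice, read the PDF with open(path, "rb").read().
request_body = build_inspect_request(
    inspect_config, b"%PDF-1.4 placeholder", {"classification": "Confidential"}
)
print(json.dumps(request_body, indent=2))
```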

Sensitive Data Protection generates a finding if it finds a match. In the finding, the MetadataType of the MetadataLocation indicates whether the matched metadata is embedded in the file (CONTENT_METADATA) or was provided by the client (CLIENT_PROVIDED_METADATA).
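When you process the results, you might want to separate embedded matches from client-provided ones. The following Python sketch counts findings by MetadataType. The response shape that it walks (result.findings, location.contentLocations, metadataLocation.type) and the sample response are assumptions made for illustration, so verify them against an actual response before relying on this.

```python
from collections import Counter


def count_metadata_types(response):
    """Count metadata findings by type in a parsed content.inspect response.

    Assumes each finding lists its locations under
    location.contentLocations[].metadataLocation.type.
    """
    counts = Counter()
    for finding in response.get("result", {}).get("findings", []):
        locations = finding.get("location", {}).get("contentLocations", [])
        for content_location in locations:
            metadata_location = content_location.get("metadataLocation")
            if metadata_location:
                counts[metadata_location.get("type", "UNSPECIFIED")] += 1
    return counts


# Hypothetical, hand-written response used only to exercise the function.
sample_response = {
    "result": {
        "findings": [
            {
                "infoType": {"name": "CUSTOM_METADATA_CLASSIFICATION"},
                "location": {
                    "contentLocations": [
                        {"metadataLocation": {"type": "CLIENT_PROVIDED_METADATA"}}
                    ]
                },
            }
        ]
    }
}

print(count_metadata_types(sample_response))
```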

What's next