Content Scanning Partial Text Extraction

Partial Text Extraction is an additional capability in Content Scanning that allows the Agent to scan only part of the text in a file. Because less content is scanned, fewer resources (memory and CPU) are used and large files take less time to scan. In addition, fewer scans fail because of large file size, which improves the overall experience for the end user.

This option is turned on per Realm. It is an Advanced Option that is available when Enable Content Scanning is turned on. When Enable Partial Extraction from files is turned on, text is extracted from the beginning of the document up to the configured limit. Only text is extracted; images are ignored.

  • When partial extraction is enabled: The scanner extracts text up to the configured partial extraction threshold (for example, 150 MB) and continues policy evaluation on the truncated content instead of failing automatically.

  • When partial extraction is disabled: The scanner attempts full extraction. If the extracted text exceeds the internal extraction threshold, the process stops with a write-limit (109) and triggers the configured fail-close threshold action.
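
The two behaviors above can be illustrated with a minimal sketch. This is not the product's implementation: the function name, the parameter names, and the ScanOutcome type are hypothetical and exist only for illustration. Only the write-limit code (109) and the enabled/disabled behavior come from the description above, and limits are counted in characters here to keep the example small.

```python
from dataclasses import dataclass
from typing import Optional

WRITE_LIMIT_CODE = 109  # write-limit code mentioned in the text above


@dataclass
class ScanOutcome:
    """Hypothetical result type used only in this sketch."""
    text: str                   # text handed to policy evaluation
    truncated: bool             # True if only the beginning of the text was kept
    failed: bool                # True if extraction stopped with a write-limit
    code: Optional[int] = None  # set to 109 on a write-limit failure


def extract_for_scan(text: str, partial_enabled: bool,
                     partial_limit: int, internal_limit: int) -> ScanOutcome:
    """Model the enabled/disabled behavior described in the bullets above."""
    if partial_enabled:
        # Keep text from the beginning of the document up to the limit,
        # then continue policy evaluation on whatever was kept.
        kept = text[:partial_limit]
        return ScanOutcome(kept, truncated=len(text) > partial_limit, failed=False)

    # Partial extraction disabled: attempt full extraction.
    if len(text) > internal_limit:
        # Extraction stops; the caller would then apply the configured
        # fail-close threshold action.
        return ScanOutcome("", truncated=False, failed=True, code=WRITE_LIMIT_CODE)
    return ScanOutcome(text, truncated=False, failed=False)
```

For example, a 20-character document with both limits set to 10 is truncated to its first 10 characters when partial extraction is enabled, and fails with code 109 when it is disabled.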

Related Topic:

Content Scanning