Supported File Types

DSPM supports a broad set of file types for content extraction, classification, and scanning. This document lists all supported formats, configuration-file coverage, and unsupported source-code types.

Standard File Types Supported

Documents & Text Files

File Types	Notes
`.txt`, `.html`, `.xhtml`, `.xml`, `.pdf`, `.doc`, `.docx`, `.rtf`, `.odt`, `.epub`, `.pages`, `.wpt`, `.abw`, `.sxw`, `.wpd`	Fully supported for content extraction

Spreadsheets

File Types	Notes
`.xls`, `.xlsx`, `.csv`, `.tsv`, `.ods`, `.numbers`, `.xlsm`, `.xlt`, `.xltx`, `.sxc`	Fully supported

Presentations

File Types	Notes
`.ppt`, `.pptx`, `.odp`, `.key`, `.pps`, `.ppsx`, `.sxi`	Fully supported

Email Files

File Types	Notes
`.msg`, `.eml`, `.mbox`, `.pst`, `.ost`	Metadata + message body extraction supported
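To show what "metadata + message body" extraction produces for an `.eml` file, the sketch below uses Python's standard `email` package. It is not DSPM's implementation; DSPM's file support is based on Apache Tika. The function name `extract_email` and the chosen headers are assumptions for this example. Only the RFC 5322 `.eml` case is covered. `.msg`, `.pst`, and `.ost` are Outlook binary formats that need dedicated parsers.

```python
from email import policy
from email.parser import BytesParser


def extract_email(raw: bytes) -> dict:
    """Parse an RFC 5322 message into header metadata and a text body (hypothetical helper)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    # Metadata: a few common headers, None when a header is absent.
    metadata = {}
    for name in ("From", "To", "Subject", "Date"):
        value = msg[name]
        metadata[name] = str(value) if value is not None else None

    # Body: prefer the text/plain part, fall back to text/html.
    part = msg.get_body(preferencelist=("plain", "html"))
    body = part.get_content() if part is not None else ""
    return {"metadata": metadata, "body": body}


sample = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Q3 payroll\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Employee SSN: 123-45-6789\r\n"
)
result = extract_email(sample)
print(result["metadata"]["Subject"], "|", result["body"].strip())
```

Both parts matter for classification. In this sample, the sensitive value appears only in the body, while the headers identify who sent and received it.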

Structured & Markup Data

File Types	Notes
`.json`, `.yaml`, `.yml`, `.xml`, `.html`, `.xhtml`, `.rss`, `.atom`, `.svg`	Fully supported; `.json`, `.yaml`, and `.yml` are common configuration formats

Images

File Types	Notes
`.jpg`, `.png`, `.gif`, `.tiff`, `.tif`, `.bmp`, `.webp`, `.ico`, `.psd`, `.svg`	OCR/extraction supported where applicable

Archives / Compressed Files

File Types	Notes
`.zip`, `.tar`, `.gz`, `.bz2`, `.7z`, `.jar`, `.rar`, `.xz`, `.lzma`, `.z`	DSPM recursively scans supported file types inside archives
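Recursive scanning can be pictured as a depth-limited walk. The walk opens each archive, descends into any member that is itself an archive, and reports members with a scannable extension. This is a sketch under assumptions. It handles only zip-based (`.zip`, `.jar`) and tar-family archives through the Python standard library, so `.7z`, `.rar`, `.lzma`, and `.z` are not handled. `SCANNABLE` is a small hypothetical subset of the supported types. The `MAX_DEPTH` limit and the `outer!inner` path notation are illustrative, not documented DSPM behavior.

```python
import io
import tarfile
import zipfile
from pathlib import PurePosixPath

ARCHIVE_EXTS = {".zip", ".jar", ".tar", ".gz", ".bz2", ".xz"}
SCANNABLE = {".txt", ".csv", ".json", ".pdf", ".docx"}  # hypothetical subset
MAX_DEPTH = 5  # assumed guard against deeply nested archives


def _members(data: bytes):
    """Yield (member_name, member_bytes) for a zip or tar archive; nothing otherwise."""
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield info.filename, zf.read(info)
        return
    buf.seek(0)
    try:
        # "r:*" lets tarfile detect plain, gzip, bzip2, or xz compression.
        with tarfile.open(fileobj=buf, mode="r:*") as tf:
            for member in tf.getmembers():
                if member.isfile():
                    fh = tf.extractfile(member)
                    if fh is not None:
                        yield member.name, fh.read()
    except tarfile.TarError:
        return  # not a readable tar archive (e.g. a bare .gz file)


def walk(data: bytes, name: str, depth: int = 0):
    """Recursively yield 'outer!inner' paths of scannable files."""
    suffix = PurePosixPath(name).suffix.lower()
    if suffix in ARCHIVE_EXTS:
        if depth < MAX_DEPTH:
            for inner_name, inner_data in _members(data):
                for path in walk(inner_data, inner_name, depth + 1):
                    yield f"{name}!{path}"
    elif suffix in SCANNABLE:
        yield name


def make_zip(files: dict) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for member_name, payload in files.items():
            zf.writestr(member_name, payload)
    return buf.getvalue()


inner = make_zip({"data.csv": b"a,b\n1,2\n", "blob.bin": b"\x00\x01"})
outer = make_zip({"readme.txt": b"hello", "nested/inner.zip": inner})
print(sorted(walk(outer, "outer.zip")))
```

A real scanner would also cap total extracted size, not just nesting depth. A small, highly compressed archive can otherwise expand to an unmanageable size.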

Other / Dynamic Extensions

Description	Notes
Log files and other text-based files with dynamic (non-standard) extensions	Supported when text can be extracted
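The tables above can be read as an extension lookup with a fallback for the "Other / Dynamic Extensions" row. The sketch below is a minimal Python version under stated assumptions. `classify` and `looks_like_text` are hypothetical helpers. The text test (no NUL bytes, valid UTF-8) is an assumed heuristic. DSPM's own detection is based on Apache Tika and may differ. Extensions such as `.xml` and `.svg` appear in more than one table, so the lookup returns every matching category.

```python
import codecs
from pathlib import PurePath

# Extension sets copied from the tables above; one extension may appear in several.
CATEGORIES = {
    "documents": {".txt", ".html", ".xhtml", ".xml", ".pdf", ".doc", ".docx", ".rtf",
                  ".odt", ".epub", ".pages", ".wpt", ".abw", ".sxw", ".wpd"},
    "spreadsheets": {".xls", ".xlsx", ".csv", ".tsv", ".ods", ".numbers", ".xlsm",
                     ".xlt", ".xltx", ".sxc"},
    "presentations": {".ppt", ".pptx", ".odp", ".key", ".pps", ".ppsx", ".sxi"},
    "email": {".msg", ".eml", ".mbox", ".pst", ".ost"},
    "structured": {".json", ".yaml", ".yml", ".xml", ".html", ".xhtml", ".rss",
                   ".atom", ".svg"},
    "images": {".jpg", ".png", ".gif", ".tiff", ".tif", ".bmp", ".webp", ".ico",
               ".psd", ".svg"},
    "archives": {".zip", ".tar", ".gz", ".bz2", ".7z", ".jar", ".rar", ".xz",
                 ".lzma", ".z"},
}


def looks_like_text(head: bytes) -> bool:
    """Assumed heuristic: no NUL bytes and the sample decodes as UTF-8."""
    if b"\x00" in head:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the end of the sample
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return True


def classify(filename: str, head: bytes = b"") -> list[str]:
    """Return matching categories, the dynamic-text fallback, or [] (not analyzed)."""
    ext = PurePath(filename).suffix.lower()
    matches = sorted(cat for cat, exts in CATEGORIES.items() if ext in exts)
    if matches:
        return matches
    # "Other / Dynamic Extensions": in scope only when text can be extracted.
    if head and looks_like_text(head):
        return ["text (dynamic extension)"]
    return []


print(classify("feed.xml"))  # ['documents', 'structured']
print(classify("app.log", b"2024-05-01 INFO service started\n"))  # ['text (dynamic extension)']
```

Sniffing only the first bytes of a file keeps the fallback cheap. The cost is that a file with a text header but binary content later on would be misjudged.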

Source Code File Support

DSPM scans configuration-oriented file types for secrets, credentials, and sensitive values. It does not perform full static code analysis.

Supported (Configuration-Oriented) Source Code Formats

File Type	Description
`.xml`	XML configuration/markup
`.yaml`, `.yml`	YAML configuration files
`.json`	JSON configuration and policy files
`.html`	HTML files
`.env`	Environment variable files
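DSPM's secret detectors are not described here. The sketch below shows, under assumed rules, what scanning a `.env` file for credentials involves: parse `KEY=value` lines, skip comments and blanks, then flag entries by key name or by a known value pattern. The key-name list and the function `scan_env` are hypothetical. The `AKIA` prefix followed by 16 characters is the documented shape of an AWS access key ID, but real detectors use far broader rule sets.

```python
import re

# Illustrative rules only; not DSPM's actual detectors.
SECRET_KEY_NAME = re.compile(r"(SECRET|PASSWORD|PASSWD|TOKEN|API_?KEY|PRIVATE_KEY)", re.I)
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def scan_env(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, key, reason) for lines that look like they hold secrets."""
    findings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines without an assignment
        key, _, value = line.partition("=")
        key = key.removeprefix("export ").strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes
        if not value:
            continue  # an empty value cannot leak anything
        if SECRET_KEY_NAME.search(key):
            findings.append((lineno, key, "sensitive key name"))
        elif AWS_ACCESS_KEY_ID.search(value):
            findings.append((lineno, key, "AWS access key ID pattern"))
    return findings


sample = (
    "# local config\n"
    "DB_HOST=localhost\n"
    "DB_PASSWORD='hunter2'\n"
    "export STRIPE_API_KEY=sk_test_123\n"
    "UPLOAD_ID=AKIAIOSFODNN7EXAMPLE\n"
    "EMPTY_TOKEN=\n"
)
for finding in scan_env(sample):
    print(finding)
```

Checking the value as well as the key name catches credentials stored under innocuous names, such as `UPLOAD_ID` above.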

Source Code Formats Not Supported

These programming-language file types are not analyzed:

Language	File Types
Python	`.py`
JavaScript / TypeScript	`.js`, `.ts`
Java	`.java`
C / C++	`.c`, `.cpp`
Go	`.go`
Ruby	`.rb`
PHP	`.php`
React (JSX / TSX)	`.jsx`, `.tsx`
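Environment files are usually named just `.env`, so a naive extension check can miss them: Python's `pathlib` reports no suffix for a dotfile. The sketch below sorts a path into the two source-code tables above and treats a bare dotfile name as its own extension. `source_status` and its return strings are hypothetical. Variants such as `.env.local` are not matched by this sketch.

```python
from pathlib import PurePath

# Extensions from the two source-code tables above.
CONFIG_SCANNED = {".xml", ".yaml", ".yml", ".json", ".html", ".env"}
NOT_ANALYZED = {".py", ".js", ".ts", ".java", ".c", ".cpp", ".go", ".rb",
                ".php", ".jsx", ".tsx"}


def source_status(path: str) -> str:
    """Classify a path against the source-code tables (hypothetical helper)."""
    name = PurePath(path).name.lower()
    ext = PurePath(name).suffix
    if not ext and name.startswith("."):
        # pathlib treats ".env" as a stem with no suffix, so use the whole name.
        ext = name
    if ext in CONFIG_SCANNED:
        return "scanned for secrets"
    if ext in NOT_ANALYZED:
        return "not analyzed"
    return "not listed"


for p in [".env", "deploy/prod.env", "src/App.tsx", "Makefile"]:
    print(p, "->", source_status(p))
```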

Additional Notes

File support is based on Apache Tika extraction capabilities.
Archives are scanned recursively.
Unsupported binary formats may be ingested but not analyzed.