Data privacy and legal considerations

Semgrep Multimodal uses API permissions to access code in your selected GitHub or GitLab repositories. To provide AI-powered functionality, portions of the source code are processed by Semgrep's AI model vendors.

The data privacy and legal considerations on this page apply to the following AI-assisted Semgrep Multimodal features, which build on one another:

- Triage and remediation of findings
- Memories: Adds reusable Memories to enhance triage and remediation.
- AI-powered detection scans: Adds AI-driven vulnerability detection on top of triage, remediation, and Memories.

Overview of data flow

When you use Semgrep AI features, data flows as follows:

- Semgrep accesses repository code on a file-by-file basis. In limited cases, broader repository access may be required.
- Relevant data, including portions of source code, is sent outside your repository to AI subprocessors for analysis. Semgrep supports AI subprocessors from the following model vendors:
  - OpenAI (default)
  - Amazon Bedrock (default)
  - Anthropic (BYOK; not available for AI-powered detection scans)
  - Azure OpenAI (BYOK; not available for AI-powered detection scans)
  - Google Gemini (BYOK; not available for AI-powered detection scans)
  - xAI Grok (BYOK; not available for AI-powered detection scans)
- AI subprocessors return results to Semgrep.
- Semgrep stores limited data to support product functionality, in accordance with its data retention policies.

Data sent to AI subprocessors

The type and amount of data sent to AI subprocessors depend on the feature being used. The following table summarizes the data transmitted to AI subprocessors:

Feature	Data sent to AI
Triage and remediation	Code associated with a finding and the minimal surrounding context
Memories	Code associated with a finding, minimal surrounding context, and user-provided Memory content, including code snippets
AI-powered detection scans	Full file contents, uploaded context documentation, and scan-related metadata
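
For an internal privacy review, the table above can be captured as a small lookup. The following Python sketch is illustrative only: the dictionary keys, the category strings, and the `data_sent` helper are hypothetical names derived from the table, not part of any Semgrep API.

```python
# Hypothetical lookup transcribed from the table above; not a Semgrep API.
DATA_SENT_TO_AI = {
    "triage_and_remediation": {
        "code associated with a finding",
        "minimal surrounding context",
    },
    "memories": {
        "code associated with a finding",
        "minimal surrounding context",
        "user-provided Memory content",
    },
    "ai_powered_detection_scans": {
        "full file contents",
        "uploaded context documentation",
        "scan-related metadata",
    },
}


def data_sent(enabled_features):
    """Return the union of data categories sent for the enabled features."""
    sent = set()
    for feature in enabled_features:
        sent |= DATA_SENT_TO_AI[feature]
    return sent
```

Calling `data_sent(["memories"])`, for example, returns the three categories listed in the Memories row. A reviewer could extend the lookup to flag any feature set that includes full file contents.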

Semgrep does not intentionally send personal data to AI subprocessors.

Data retention and storage by Semgrep's AI vendors

All Semgrep AI subprocessors operate under zero data retention agreements by default. Under these agreements, AI subprocessors do not retain your data and do not use it to train their models.

Semgrep maintains Data Protection Agreements (DPAs) with all AI subprocessors. To request copies of these DPAs, use the Semgrep trust portal or contact your Semgrep account manager.

Data retention and storage by Semgrep

Semgrep stores a limited amount of your data to support product functionality and performance evaluation. The type of data stored depends on the feature being used.

Triage and remediation

Data stored may include:

- AI prompts and responses
- Code snippets associated with findings
- Minimal surrounding context required for accurate results

Memories

Data stored includes everything listed for triage and remediation, plus:

- User-defined Memory content, stored as text
- Code snippets included in Memories

Semgrep does not retrieve or access external data sources referenced in Memories.

AI-powered detection scans

Data stored may include:

- AI prompts and responses
- Code snippets and, where required, full file contents
- Uploaded context documentation
- Scan reports, including metadata such as file names and, in some cases, code snippets

Uploaded context documentation and scan reports are persistently stored in a Semgrep-managed Amazon S3 bucket. Context documentation may be reused across scans.

Retention period

Stored data is retained for up to 6 months, unless otherwise noted. Semgrep will provide at least 30 days’ notice before making changes to retention policies.

Purpose of storage

Stored data is used to:

- Provide Semgrep Multimodal functionality
- Enable access to prior results, for example, to provide remediation guidance
- Support internal performance evaluation
- Support troubleshooting and debugging

Data handling and protections

- Customer data is logically isolated and never commingled across tenants
- Semgrep does not intentionally send personal data to AI subprocessors
- Semgrep and its subprocessors do not obtain ownership rights to your source code
- Data sent to AI subprocessors is deleted after processing in accordance with zero data retention agreements

Minimal data retention policy (optional)

To further limit data retention for Semgrep Multimodal, contact support to enroll in the minimal data retention policy.

Under this policy, AI responses are still stored as required to provide functionality.

When should I use this?
Use this policy if your organization requires stricter data handling controls and reduced persistence of code and prompts within Semgrep systems, where possible.

Key differences from default behavior

When the minimal data retention policy is enabled:

- AI prompts and code are not logged or captured by observability tools
- Data is not stored in external storage systems, such as Amazon S3, except for AI-powered detection context documentation and scan reports
- Stored data is limited to what is strictly required to provide functionality

The table below compares default data handling with the minimal data retention policy. All stored data is handled by Semgrep or Semgrep-managed systems, not by its AI vendors.

Category	Default behavior	Minimal data retention policy
AI prompts and code	May be logged and stored to support functionality and performance evaluation	Not logged or captured by observability tools; not stored in external systems; persistence limited to what is required for functionality
AI responses	Stored for up to 6 months to provide functionality and access to prior results	Stored only as required to provide functionality, with reduced persistence
Memories	Stored within Semgrep-managed systems and may include user-provided content and small code snippets	No change
AI-powered detection data	Full file analysis, context documentation, and scan reports may be stored and reused	No change
External storage (Semgrep-managed S3)	Used for certain data	Not used, except for AI-powered detection context documentation and scan reports
Data deletion	Available upon request	Available upon request

Exceptions for AI-powered detection scans

Under the minimal data retention policy, the following behavior remains unchanged for AI-powered detection scans:

- If you upload context documentation to enhance AI-powered detection scans, these files are persistently stored in a Semgrep-managed Amazon S3 bucket to enable reuse across future AI-powered detection scans.
- AI-powered detection scan reports are stored in a Semgrep-managed Amazon S3 bucket. These reports may contain metadata such as file names and, in some cases, code snippets included in issue descriptions.

Responses from Semgrep's AI model vendors are stored in the Semgrep database solely for providing Multimodal functionality. For instance, AI-generated remediation advice is stored so users can access it in the Semgrep AppSec Platform. However, code snippets are never retained to improve future prompts.

Any stored data can be deleted upon customer request. Semgrep, Inc. will provide at least 30 days' notice before making any changes to the retention policy.

