Cleanup & Data Management
Automated data lifecycle management with cleanup, archival, and retention policies to optimize storage and maintain compliance
Overview
Cleanup & Data Management controls the lifecycle of data stored within a workspace: sessions, source documents, and filled forms. Retention periods are configurable per workspace. When a user or admin deletes a source document or filled form, the item enters a 30-day soft-delete (trash) period before permanent deletion. After the trash period elapses, the file is permanently deleted from Azure Blob Storage via utils/azure.py in the Python service.
Stateless mode is an option for workspaces that require source documents to be deleted immediately on session completion rather than retained. When stateless mode is active, source documents are removed from Azure Blob Storage as soon as the fill session ends, with no 30-day window. Text-based source content is encrypted at rest using handle_text_encryption(), implemented with PyCryptodome 3.19.0, with encryption keys stored in Azure Key Vault. A GDPR right-to-erasure request triggers deletion of the user's data (sessions, sources, filled forms, and profile) within the requesting workspaceId. All cleanup and deletion operations are scoped to workspaceId and enforced by the JWT authentication middleware in both the .NET and Python service layers.
Key Capabilities
- Configurable Retention Policies: Set retention periods per workspace for sessions, source documents, and filled forms independently
- Soft Delete with 30-Day Trash: Sources and filled forms enter a trash period before permanent deletion; items can be restored within 30 days
- Stateless Mode: Source documents deleted immediately on session completion; no post-session retention window
- Azure Blob Storage Deletion: Permanent deletion executed via utils/azure.py in the Python service
- Text Source Encryption: handle_text_encryption() using PyCryptodome 3.19.0; keys stored in Azure Key Vault
- GDPR Right-to-Erasure: Deletion request propagates across sessions, sources, filled forms, and profile for the requesting workspaceId
- Workspace Isolation: All cleanup operations scoped to workspaceId; JWT middleware enforced in both service layers
- Audit Trail: Deletion activities logged for compliance reference
How It Works
Retention Policy Configuration
Retention periods are configured at the workspace level. Separate policies apply to each data category:
Form Fill Sessions:
- Configurable retention period (e.g., 90 days, 1 year, 7 years, or custom)
- After the retention period elapses, sessions are permanently deleted
Source Documents:
- Standard mode: documents enter a 30-day trash period on deletion, then are permanently removed from Azure Blob Storage via utils/azure.py
- Stateless mode: documents are deleted from Azure Blob Storage immediately when the fill session completes, with no trash period
Filled Forms:
- Configurable retention period matching organizational or regulatory requirements
- Deleted forms enter the 30-day trash period before permanent deletion
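The per-category policies above can be pictured as a small configuration object. The following is a minimal sketch, not the platform's actual schema: the RetentionPolicy class, its field names, and the retention_expired() helper are all hypothetical and exist only to illustrate how independent retention periods per data category might be evaluated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical per-workspace policy; field names are illustrative."""
    workspace_id: str
    session_days: int          # retention for form fill sessions
    source_document_days: int  # retention for source documents (standard mode)
    filled_form_days: int      # retention for filled forms
    stateless_mode: bool = False  # if True, sources are removed at session end


def retention_expired(created_at: datetime, retention_days: int, now: datetime) -> bool:
    """Return True once the configured retention period has fully elapsed."""
    return now >= created_at + timedelta(days=retention_days)
```

Keeping one field per category mirrors the description above: each category is checked against its own period, and the stateless flag is a separate workspace-level switch rather than a retention value.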
Soft Delete and Trash Period
When a user or admin deletes a source document or filled form, the item is marked as deleted in the database but not immediately removed from Azure Blob Storage. The item is visible in the Trash view and can be restored at any time within the 30-day window. After 30 days, the permanent deletion job runs: the Python service calls utils/azure.py to remove the blob from Azure Blob Storage, and the database record is purged.
Items permanently deleted from Trash by an admin skip the remaining trash window and are deleted from Azure Blob Storage immediately.
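The soft-delete lifecycle described above can be sketched as a deletion timestamp plus a fixed 30-day window. This is an illustrative model only: TrashableItem, soft_delete(), restore(), and due_for_purge() are hypothetical names, and the real service's database schema and purge job are not shown in this document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

TRASH_PERIOD = timedelta(days=30)


@dataclass
class TrashableItem:
    """Hypothetical database record for a source document or filled form."""
    item_id: str
    workspace_id: str
    deleted_at: Optional[datetime] = None  # None means the item is live


def soft_delete(item: TrashableItem, now: datetime) -> None:
    """Mark the item as deleted; the blob itself is left in place."""
    item.deleted_at = now


def restore(item: TrashableItem, now: datetime) -> bool:
    """Restore the item if it is still inside the trash window."""
    if item.deleted_at is None or now >= item.deleted_at + TRASH_PERIOD:
        return False
    item.deleted_at = None
    return True


def due_for_purge(items: List[TrashableItem], now: datetime) -> List[TrashableItem]:
    """Items whose trash window has elapsed and whose blobs can be removed."""
    return [
        i for i in items
        if i.deleted_at is not None and now >= i.deleted_at + TRASH_PERIOD
    ]
```

In this sketch, an admin's permanent deletion from Trash would simply bypass due_for_purge() and remove the blob and record directly.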
Stateless Mode
When stateless mode is enabled for a workspace (as described in the autofill-from-sources feature), source documents are not retained after the fill session ends. On session completion, the Python service deletes the source document from Azure Blob Storage immediately via utils/azure.py. This mode is appropriate for workspaces with strict data minimization requirements where retaining uploaded source documents beyond the session is not acceptable.
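As a rough illustration of the session-completion behavior, the sketch below deletes every source blob for a session when the workspace is in stateless mode and does nothing otherwise. The function name on_session_complete() is hypothetical, and delete_blob is a stand-in callable for whatever deletion helper utils/azure.py actually exposes, whose signature is not documented here.

```python
from typing import Callable, Iterable, List


def on_session_complete(
    stateless_mode: bool,
    source_blob_names: Iterable[str],
    delete_blob: Callable[[str], None],
) -> List[str]:
    """Delete session sources immediately when the workspace is stateless.

    Returns the names of the blobs that were deleted. In standard mode
    nothing is deleted here; sources follow the normal trash and
    retention path instead.
    """
    if not stateless_mode:
        return []
    deleted: List[str] = []
    for name in source_blob_names:
        delete_blob(name)  # assumed to raise on failure
        deleted.append(name)
    return deleted
```

Passing the deletion function in as a parameter keeps the sketch testable without a storage account and avoids guessing at the real helper's interface.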
Text Source Encryption
Text-based source content (text pasted or submitted as a source rather than uploaded as a file) is encrypted at rest using handle_text_encryption() in the Python service. The implementation uses PyCryptodome 3.19.0. Encryption keys are stored in Azure Key Vault and are not embedded in the application code or database. Decryption occurs at read time within the Python service when the content is needed for a fill session.
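The document does not state which cipher or storage format handle_text_encryption() uses, so the following is only a sketch of one common approach with PyCryptodome: AES in GCM mode, with the nonce, authentication tag, and ciphertext stored together as a base64 JSON envelope. The function names, the envelope layout, and the choice of AES-GCM are assumptions. The key is passed in as bytes (16, 24, or 32 bytes for AES); retrieving it from Azure Key Vault is out of scope here.

```python
import base64
import json
from typing import Tuple

NONCE_LEN = 12  # bytes; a commonly used nonce size for AES-GCM


def pack_envelope(nonce: bytes, tag: bytes, ciphertext: bytes) -> str:
    """Serialize everything needed for decryption into one storable string."""
    parts = {"nonce": nonce, "tag": tag, "ciphertext": ciphertext}
    return json.dumps({k: base64.b64encode(v).decode("ascii") for k, v in parts.items()})


def unpack_envelope(envelope: str) -> Tuple[bytes, bytes, bytes]:
    """Inverse of pack_envelope: returns (nonce, tag, ciphertext)."""
    data = json.loads(envelope)
    return (
        base64.b64decode(data["nonce"]),
        base64.b64decode(data["tag"]),
        base64.b64decode(data["ciphertext"]),
    )


def encrypt_text(key: bytes, plaintext: str) -> str:
    """Encrypt pasted text and return the envelope to store at rest."""
    # Imported inside the function so the envelope helpers above remain
    # usable in environments where PyCryptodome is not installed.
    from Crypto.Cipher import AES
    from Crypto.Random import get_random_bytes

    cipher = AES.new(key, AES.MODE_GCM, nonce=get_random_bytes(NONCE_LEN))
    ciphertext, tag = cipher.encrypt_and_digest(plaintext.encode("utf-8"))
    return pack_envelope(cipher.nonce, tag, ciphertext)


def decrypt_text(key: bytes, envelope: str) -> str:
    """Decrypt at read time; fails if the stored data was modified."""
    from Crypto.Cipher import AES

    nonce, tag, ciphertext = unpack_envelope(envelope)
    cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
    # decrypt_and_verify raises ValueError when the tag does not match.
    return cipher.decrypt_and_verify(ciphertext, tag).decode("utf-8")
```

An authenticated mode such as GCM is used in this sketch because it detects tampering with stored ciphertext in addition to keeping it confidential; a fresh random nonce is generated for every encryption so the same key can be reused safely.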
GDPR Right-to-Erasure
When a GDPR right-to-erasure request is received for a user, the deletion is propagated across all data associated with that user's workspaceId: fill sessions, source documents (both files in Azure Blob Storage and database records), filled forms, and the user's profile. Deletion is executed via the same utils/azure.py path used for standard cleanup, scoped strictly to the requesting workspaceId. Data belonging to other workspaces is not affected. Exceptions apply where legal retention obligations (e.g., tax records) require the data to be retained; in those cases, the user is informed and the data is retained for the legally required period only.
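One way to picture the erasure flow is as a planning step that splits the user's records into those to delete and those retained under a legal obligation, while ignoring anything outside the requesting workspace. The sketch below is illustrative: UserRecord, its legal_hold flag, and plan_erasure() are hypothetical and do not reflect the service's actual data model.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class UserRecord:
    """Hypothetical index entry for one piece of a user's data."""
    record_id: str
    workspace_id: str
    category: str             # "session", "source", "filled_form", or "profile"
    legal_hold: bool = False  # True when a statutory retention duty applies


def plan_erasure(
    records: Iterable[UserRecord], workspace_id: str
) -> Tuple[List[UserRecord], List[UserRecord]]:
    """Split a user's records into (to_delete, retained_under_legal_hold)."""
    to_delete: List[UserRecord] = []
    retained: List[UserRecord] = []
    for record in records:
        if record.workspace_id != workspace_id:
            continue  # never touch data outside the requesting workspace
        (retained if record.legal_hold else to_delete).append(record)
    return to_delete, retained
```

Returning the retained list explicitly makes it straightforward to tell the user which items are being kept and why, as the exception handling above requires.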
Use Cases
Data lifecycle policies are used to meet regulatory requirements and manage workspace storage. Healthcare workspaces configure stateless mode to ensure uploaded patient documents are not retained beyond the session, satisfying data minimization requirements. Organizations subject to GDPR use the right-to-erasure flow to delete a departed user's data across sessions, sources, and profiles within the one-month response window that GDPR sets for such requests. Storage-constrained workspaces set short retention periods for sessions and source documents to reclaim quota. Law firms or financial services workspaces that must retain records for 7 years configure the session and filled-form retention policies accordingly and rely on audit logs to demonstrate compliance.
Benefits
- Retention Is Per-Workspace: Policies for sessions, source documents, and filled forms are configured independently, allowing different retention periods for different data categories within the same platform
- Stateless Mode Eliminates Post-Session Source Retention: Source documents are deleted from Azure Blob Storage at session completion, with no 30-day window — appropriate for strict data minimization requirements
- 30-Day Trash Prevents Accidental Loss: Soft-delete gives users and admins a recovery window before blobs are permanently removed from Azure Blob Storage
- Encryption at Rest for Text Sources: handle_text_encryption() with PyCryptodome 3.19.0 and Azure Key Vault key management protects text source content without application-layer key exposure
- GDPR Erasure Is Scoped and Propagated: A single erasure request removes the user's data across all relevant collections for their workspaceId, with no impact on other workspaces
- Workspace Isolation: JWT middleware in both service layers ensures cleanup operations cannot cross workspace boundaries
Security & Privacy
Data is scoped to workspaceId and protected via the shared JWT authentication middleware running in both the .NET and Python service layers. Files in Azure Blob Storage are deleted via utils/azure.py — no direct blob access is available to end users. Text source content is encrypted at rest using PyCryptodome 3.19.0 via handle_text_encryption(); keys are stored in Azure Key Vault. GDPR erasure requests propagate deletion across sessions, sources, filled forms, and profiles for the requesting workspaceId only — other workspaces are not affected. Audit logs record all deletion activities. The 30-day soft-delete period provides a recovery window but does not retain data beyond the configured workspace retention policy.
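The workspace scoping rule can be illustrated with a small guard that runs before any cleanup operation. This is a sketch under assumptions: it takes claims from a JWT that has already been verified by the middleware, and the claim name "workspaceId", the function require_workspace_scope(), and the WorkspaceScopeError exception are hypothetical.

```python
from typing import Any, Mapping


class WorkspaceScopeError(PermissionError):
    """Raised when a request targets data outside the caller's workspace."""


def require_workspace_scope(claims: Mapping[str, Any], resource_workspace_id: str) -> None:
    """Reject the operation unless the token's workspace matches the resource."""
    token_workspace = claims.get("workspaceId")  # claim name is an assumption
    if not token_workspace or token_workspace != resource_workspace_id:
        raise WorkspaceScopeError(
            f"token workspace {token_workspace!r} cannot act on {resource_workspace_id!r}"
        )
```

A missing claim is treated the same as a mismatch, so a token without workspace information can never authorize a deletion.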
Common Questions
What happens if I accidentally delete something?
Standard deletions (sources, filled forms) enter a 30-day soft-delete (trash) period. During this window, the item remains in Azure Blob Storage and can be restored via the Trash view. After 30 days, the Python service calls utils/azure.py to permanently remove the blob and purge the database record — at that point, the item is unrecoverable via the application.
If an admin permanently deletes an item directly from Trash, the blob is removed from Azure Blob Storage immediately without waiting for the 30-day window.
Stateless mode does not apply a trash period to source documents — they are deleted from Azure Blob Storage immediately on session completion and cannot be restored.
Can I customize retention policies per form type?
Retention policies are configured at the workspace level for each data category (sessions, source documents, filled forms). Separate policies for different form types within the same workspace are not currently supported as a native configuration option — the workspace-level policy applies to all items in that category. Organizations that require different retention periods for different form types typically use separate workspaces with separate retention configurations. The stateless mode flag is a workspace-level setting that applies uniformly to all source documents in that workspace.
How does stateless mode differ from the standard 30-day trash?
In standard mode, a source document deleted by a user enters a 30-day trash period. The blob remains in Azure Blob Storage during this window and can be restored. After 30 days, utils/azure.py permanently removes the blob.
In stateless mode, source documents are deleted from Azure Blob Storage via utils/azure.py immediately when the fill session completes — not when a user manually deletes them, and not after a 30-day window. The document is gone as soon as the session ends. This mode is intended for workspaces where retaining uploaded source documents beyond the session would violate data minimization policies.
What about GDPR "Right to Erasure"?
When a GDPR right-to-erasure request is processed for a user, deletion propagates across all data associated with that user's workspaceId: fill sessions, source documents (blobs deleted via utils/azure.py, database records purged), filled forms, and the user's profile. The deletion is scoped strictly to the requesting workspaceId — data belonging to other users or workspaces is not affected.
Where legal retention obligations apply (e.g., tax records that must be retained for a statutory period), the data is retained for that period only and the user is informed. After the legal retention period expires, the data is deleted as part of the workspace's standard cleanup cycle.
Audit logs of the erasure action are retained to demonstrate compliance.
How is text source content protected at rest?
Text pasted or submitted as a source (rather than uploaded as a file) is encrypted at rest via handle_text_encryption() in the Python service. The implementation uses PyCryptodome 3.19.0. Encryption keys are stored in Azure Key Vault and are not embedded in application code or the database. Decryption occurs within the Python service at read time when the content is needed for a fill session. File-based source documents (PDFs, images) are stored in Azure Blob Storage, which encrypts data at rest using Azure-managed keys.