Source Management

Organize and reuse documents as data sources for efficient form filling

Overview

Source Management is the document repository for Instafill.ai form-filling sessions. Instead of re-uploading the same resume, client contract, or insurance card for each session, you add it to the source library once and reference it from any number of sessions. When a session starts, library sources and profile files are copied asynchronously into the session's source list before autofill_db_fields() runs, so the concurrent field group extraction pipeline can read them without repeated uploads.
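
The ordering described above (copy every source into the session first, then run autofill) can be sketched with asyncio. This is a minimal illustration, not Instafill.ai's implementation: copy_source and start_session are hypothetical names, and the copy itself is replaced by a stand-in that only yields to the event loop.

```python
import asyncio


async def copy_source(source_id: str, session_sources: list[str]) -> None:
    # Hypothetical stand-in for the real copy; sleep(0) only yields control.
    await asyncio.sleep(0)
    session_sources.append(source_id)


async def start_session(library_ids: list[str], profile_ids: list[str]) -> list[str]:
    session_sources: list[str] = []
    # Copy all library sources and profile files concurrently.
    await asyncio.gather(
        *(copy_source(s, session_sources) for s in library_ids + profile_ids)
    )
    # Only after every copy has finished would autofill be allowed to start;
    # here that hand-off is represented by returning the populated list.
    return session_sources


sources = asyncio.run(start_session(["resume.pdf", "contract.pdf"], ["profile.json"]))
```

Awaiting asyncio.gather before the hand-off is what guarantees that no field group starts extracting against a partially populated source list.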

Each source gets its own vector embeddings on upload, which enables page-scoped retrieval: the system maps source text to specific form page numbers, so when it fills a field group on page 3 of a credentialing packet, only text mapped to the pages relevant to that section reaches the AI. This page-level scoping maintains accuracy on long, multi-page forms (1003 mortgage applications, government multi-part filings, and hospital credentialing packets), where passing the full source document to every field group would dilute the context with irrelevant data.
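
Page-scoped retrieval can be pictured as a filter followed by a similarity ranking. The sketch below is a toy illustration under stated assumptions: the Chunk structure, the two-dimensional vectors, and the retrieve function are all hypothetical, and the actual embedding model and ranking used by Instafill.ai are not described in this document.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Chunk:
    text: str
    form_pages: set[int]    # form pages this source text was mapped to
    embedding: list[float]  # toy vector; real embeddings are far larger


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(chunks: list[Chunk], form_page: int,
             query_embedding: list[float], top_k: int = 2) -> list[Chunk]:
    # Step 1: keep only source text mapped to the page being filled.
    scoped = [c for c in chunks if form_page in c.form_pages]
    # Step 2: rank the remaining chunks by similarity to the field group query.
    scoped.sort(key=lambda c: cosine(c.embedding, query_embedding), reverse=True)
    return scoped[:top_k]


chunks = [
    Chunk("License number: A123", {3}, [1.0, 0.0]),
    Chunk("Home address: 1 Main St", {1}, [0.9, 0.1]),
    Chunk("Board certification: 2019", {3, 4}, [0.0, 1.0]),
]
hits = retrieve(chunks, form_page=3, query_embedding=[1.0, 0.1])
```

In this toy data the home address is similar to the query but is never considered, because it is not mapped to page 3. That is the point of scoping before ranking.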

The library supports tagging, folder organization, version history, and usage tracking across sessions. Workspace-level sharing gives teams a single, maintained document set rather than individual uploads duplicated across accounts. Processing status for any session's sources is available at GET /api/sessions/{session_id}/process-sources-status.

Real-World Example: A law firm automated its handling of email attachments by creating a source library of litigation documents. When attorneys receive discovery requests by email, the system adds the attachments to the source library and fills the response forms automatically.

Key Capabilities

  • Source Library: Centralized repository for frequently-used documents, scoped to workspace
  • Multiple Format Support: PDF, Word, Excel, images, text files, and pasted plain text
  • Source Organization: Tag and folder organization for easy discovery
  • Vector Embeddings: Per-source embeddings enable page-scoped semantic retrieval during fill sessions
  • Version History: Track changes when updating source documents
  • Usage Tracking: See which sources are used in which sessions
  • Quick Access: Recently-used sources appear first for easy selection
  • Bulk Upload: Upload multiple source documents at once for batch processing
  • Source Sharing: Share sources with workspace members for team collaboration
  • Source Templates: Pre-configure common sources for new users or projects
  • OCR for Images: Google Cloud Vision API extracts text from scanned documents and photographed IDs
  • Source Metadata: Add descriptions, tags, and notes to sources

How It Works

  1. Add Sources to Library:

    • Upload documents from computer
    • Import from cloud storage (Google Drive, Dropbox)
    • Paste text directly
    • Capture via email forwarding
    • Add programmatically via RESTful API: POST /api/sources
  2. Processing on Upload:

    • Text extracted from PDF (PyMuPDF), Word, Excel, and images (Google Cloud Vision API with PIL/Pillow and OpenCV 4.9.0 pre-processing)
    • Vector embeddings created per source for semantic search and page-scoped retrieval
    • Source text mapped to specific form page numbers to scope context during fill
    • Text content encrypted via handle_text_encryption() (PyCryptodome 3.19.0) with workspace-scoped keys before storage
  3. Organize Sources:

    • Create folders: "Client Documents", "Reference Materials", "Templates"
    • Tag sources: "resume", "financial-info", "medical-history"
    • Add descriptions for team members
  4. Use in Form Sessions:

    • Start form filling session
    • Select "Add Source" → "From Library"
    • Sources are async-copied into the session source list before autofill begins
    • Poll GET /api/sessions/{session_id}/process-sources-status to confirm sources are ready
    • AI extracts data automatically using page-scoped vector retrieval for each field group
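
The polling step in the list above might look like the following sketch. Only the endpoint path comes from this document. The Bearer header format, the "status" field, and the "completed" value are assumptions about the request and response shape, and fetch_status and wait_for_sources are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status(base_url: str, session_id: str, token: str) -> dict:
    # Endpoint path is from the documentation; the auth header is an assumption.
    url = f"{base_url}/api/sessions/{session_id}/process-sources-status"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def wait_for_sources(fetch: Callable[[], dict],
                     interval: float = 2.0, timeout: float = 60.0) -> dict:
    deadline = time.monotonic() + timeout
    while True:
        body = fetch()
        # Assumed response shape: {"status": "completed"} once sources are ready.
        if body.get("status") == "completed":
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError("sources were not ready before the timeout")
        time.sleep(interval)
```

A caller would pass a closure such as lambda: fetch_status(base_url, session_id, token). Taking the fetch function as a parameter keeps the loop testable without network access.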

Real-World Example: Teams using the n8n integration automatically pull source documents from Google Drive or CRMs when forms are triggered, eliminating manual uploads entirely.

  5. Update Sources (when information changes):

    • Upload updated version
    • System creates new version, regenerates vector embeddings for the new content
    • Form sessions use latest version automatically
    • In-progress sessions continue with the version they started with
  6. Share with Team:

    • Mark sources as "Shared" for workspace visibility
    • Team members reference common sources (company info, standard templates)
    • No duplicate uploads needed

Use Cases

Source management benefits teams that repeatedly fill similar forms from overlapping sets of documents. HR departments upload candidate CVs once and reuse them across multiple roles or interview rounds, medical practices maintain patient record sets applicable to any new form without re-uploading, and legal firms keep a central document library that any team member can draw on for filings and discovery responses.

Benefits

  • Time Savings: Upload documents once, use many times
  • Consistency: Same source used across multiple forms ensures data consistency
  • Version Control: Update sources without losing historical versions
  • Team Collaboration: Share sources across workspace for efficient teamwork
  • Reduced Errors: Referencing a single source avoids the transcription errors that come with re-entering data
  • Easy Updates: Update source once, all future sessions use new data
  • Better Organization: Never lose important documents in email or folders

Security & Privacy

  • Access Control: All service-layer queries include a workspaceId filter — source data is isolated per workspace and protected via the shared JWT authentication middleware running in both the .NET and Python service layers.
  • Encryption: Text source content encrypted via handle_text_encryption() (PyCryptodome 3.19.0) with workspace-scoped keys stored in Azure Key Vault. Files stored in Azure Blob Storage via utils/azure.py.
  • Scope Restriction: Encryption includes scope metadata that prevents decryption outside the originating workspace, even for internal service calls.
  • Retention Policies: Configure automatic deletion after specified period
  • Audit Trail: Track who accessed or modified each source
  • Folder Permissions: Restrict sensitive sources to specific team members
  • Version Recovery: Restore previous versions if needed
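
The Access Control item above says every service-layer query includes a workspaceId filter. One way to make such a filter hard to forget is to route all queries through a single helper. This is a sketch of that idea only: scoped_query is a hypothetical name, and the real service layer is not shown in this document.

```python
def scoped_query(workspace_id: str, **filters) -> dict:
    """Build a query filter that always carries the caller's workspaceId."""
    if not workspace_id:
        raise ValueError("workspace_id is required for every source query")
    # workspaceId is written last, so a caller-supplied value cannot override it.
    return {**filters, "workspaceId": workspace_id}


query = scoped_query("ws-1", tag="resume")
```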

Common Questions

How many sources can I store?

Storage limits depend on subscription plan:

  • Free Plan: 10 sources, 100 MB total storage
  • Starter Plan: 100 sources, 1 GB storage
  • Professional Plan: 1,000 sources, 10 GB storage
  • Enterprise Plan: Unlimited sources, custom storage quota
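
As a rough illustration, the limits above can be expressed as a lookup table with a pre-upload check. The sketch assumes 1 GB = 1024 MB, which the plan descriptions do not specify, and can_add_source is a hypothetical helper. The Enterprise plan is left out because its storage quota is custom.

```python
MB = 1024 * 1024
GB = 1024 * MB  # assumption: binary units

# (max sources, max total bytes) per plan, taken from the list above.
PLAN_LIMITS = {
    "free": (10, 100 * MB),
    "starter": (100, 1 * GB),
    "professional": (1000, 10 * GB),
}


def can_add_source(plan: str, source_count: int, used_bytes: int, new_bytes: int) -> bool:
    max_sources, max_bytes = PLAN_LIMITS[plan]
    # Both the source count and the total storage must stay within the plan.
    return source_count + 1 <= max_sources and used_bytes + new_bytes <= max_bytes
```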

Tips for Managing Storage:

  • Delete unused sources periodically
  • Archive old sources (moved to cold storage, still accessible)
  • Use text paste instead of file upload when possible (much smaller)
  • Compress large PDFs before uploading

Can I share sources with people outside my workspace?

Source sharing options:

Within Workspace:

  • All workspace members see shared sources
  • Individual sources can be private (only you see them)

Within Organization (Different Workspaces):

  • Enterprise customers can enable cross-workspace sharing
  • Useful for organization-wide templates or company information

External Sharing:

  • Not directly supported (security risk)
  • Workaround: Export source, send to external party, they upload to their workspace
  • For clients: Create separate workspace and invite them as members

Public Sharing:

  • Not supported - sources are always private to workspace/organization
  • Protects sensitive information from accidental exposure

What happens to form sessions if I delete a source?

Source deletion handling:

Completed Sessions:

  • Already-filled forms retain their data
  • No impact on existing filled forms
  • Source data was extracted into field values during the session, not stored as a live reference

In-Progress Sessions:

  • If the session hasn't extracted data yet: the deleted source is no longer available to that session
  • If session already extracted data: No impact
  • Recommendation: Complete sessions before deleting sources

Version History:

  • Deleting the current version: the most recent previous version becomes current
  • Deleting all versions: Source permanently removed after 30-day trash period

Soft Delete:

  • Deleted sources move to Trash
  • 30-day recovery period
  • Permanent deletion after 30 days (or manual purge)
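
The 30-day window is simple date arithmetic. The sketch below shows how a recovery deadline could be computed. The function names are hypothetical, and the assumption that the window is measured from the deletion timestamp in UTC is not stated in this document.

```python
from datetime import datetime, timedelta, timezone

TRASH_RETENTION = timedelta(days=30)  # recovery period stated above


def purge_at(deleted_at: datetime) -> datetime:
    # Moment at which a trashed source becomes permanently deleted.
    return deleted_at + TRASH_RETENTION


def is_recoverable(deleted_at: datetime, now: datetime) -> bool:
    return now < purge_at(deleted_at)


deleted = datetime(2024, 1, 1, tzinfo=timezone.utc)
deadline = purge_at(deleted)
```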

Best Practice: Archive instead of delete for sources that might be needed for audit purposes.

Can I automatically pull sources from external systems?

Yes, through integrations and automation:

Cloud Storage Integration:

  • Connect Google Drive, Dropbox, OneDrive
  • Sources sync automatically when files are added/updated
  • Bidirectional sync available

Email Integration:

  • Forward documents to unique workspace email address
  • System adds emailed documents to source library
  • Auto-tagging based on sender or subject

API Integration:

  • Programmatically add sources via RESTful API: POST /api/sources
  • Webhooks fire when sources are added or updated
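
A request to POST /api/sources could be assembled as in the sketch below. Only the method and path come from this document. The JSON field names (name, text, tags), the Bearer header, and the helper name build_add_source_request are assumptions for illustration, so check the API reference for the actual schema.

```python
import json
import urllib.request


def build_add_source_request(base_url: str, token: str, name: str,
                             text: str, tags: list[str]) -> urllib.request.Request:
    # Assumed payload shape; the real field names may differ.
    payload = {"name": name, "text": text, "tags": tags}
    return urllib.request.Request(
        f"{base_url}/api/sources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_add_source_request(
    "https://example.invalid", "TOKEN", "Resume", "Jane Doe, RN", ["resume"]
)
```

Building the request separately from sending it makes the payload easy to inspect. Passing req to urllib.request.urlopen would perform the actual call.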

Example Workflow:

  • New employee record created in HR system
  • HR system API pushes employee document package to Instafill source library
  • HR fills onboarding forms using auto-populated source library
  • No manual document upload needed

Contact support to discuss automation for your workflow.

How do I update a source without creating a new one?

Update existing sources to maintain consistency:

Replace Version:

  1. Open source in library
  2. Click "Upload New Version"
  3. Select updated file
  4. System creates new version while preserving previous versions
  5. Vector embeddings regenerated automatically for the new version

Edit Metadata:

  • Update source name, description, or tags without uploading new file
  • Useful for organizational changes

Version Comparison:

  • View differences between versions (text-based sources)
  • Restore previous version if needed

Impact on Sessions:

  • Future Sessions: Use new version automatically
  • In-Progress Sessions: Continue using the version they started with (prevents mid-session confusion)
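
The session behavior above amounts to pinning a version number when a session starts. The following sketch models that with two small classes. Source and Session here are hypothetical illustrations, not the product's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    versions: list[str] = field(default_factory=list)

    def upload(self, content: str) -> int:
        # Each upload appends a new version and keeps the earlier ones.
        self.versions.append(content)
        return len(self.versions)  # 1-based version number

    @property
    def latest(self) -> int:
        return len(self.versions)


@dataclass
class Session:
    source: Source
    pinned_version: int

    @classmethod
    def start(cls, source: Source) -> "Session":
        # A new session pins whatever version is latest at start time.
        return cls(source, source.latest)

    def content(self) -> str:
        return self.source.versions[self.pinned_version - 1]


resume = Source()
resume.upload("resume v1")
in_progress = Session.start(resume)
resume.upload("resume v2 with certification")
future = Session.start(resume)
```

Here in_progress keeps reading "resume v1" even after the second upload, while future reads the updated resume.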

Use Case: Employee updates resume with new certification. HR uploads new version to source library. Future job applications use updated resume, while in-progress application completes with original resume for consistency.

Version control ensures you always have access to historical data while benefiting from the latest information.
