How Instafill Uses AI Models & Handles Your Data

Your documents power the fill — they don't train the model

Overview

Instafill.ai uses large language models (LLMs) to extract data from your source documents and map it to form fields. This means some of your document content is sent to AI providers as part of every form filling session. This page explains precisely what is sent, to which providers, under what terms, and what happens to it afterward.

The short answer to the most common concern: AI providers do not train on your data when accessed via API. Instafill uses API access — not the consumer products — which means your documents are processed for your request and not retained by the AI provider for training purposes, per the standard API terms of OpenAI and Microsoft.

What Data Is Sent to AI Providers

When you run autofill on a form, the AI receives:

Form structure data:

Field names and labels (e.g., "Applicant Full Name", "Date of Birth")
Field descriptions and instructions configured on the form
Field type metadata (text, checkbox, date, number)
The form page as a rendered screenshot (for vision-based tasks like field position detection and filled-example extraction)

Source document content:

Text extracted from your uploaded source documents (PDFs, Word files, images, CSV/Excel)
Your typed text input from the session's text field
For image/scanned sources: the image itself is sent to a vision-capable model for OCR and extraction

What is NOT sent:

Raw uploaded files in their original form (files are processed locally; extracted text is what reaches the AI)
Payment information, authentication credentials, or API keys
Data from other workspaces or other users' sessions
Historical sessions or stored field examples from other organizations

Which AI Models Are Used

Instafill uses OpenAI as its sole LLM provider, accessed both directly via the OpenAI API and through Microsoft Azure OpenAI Service.

OpenAI API — OpenAI's current-generation models are used for all text understanding, data extraction, and field mapping tasks. Instafill does not pin to specific model versions; instead, the platform is continuously updated to use the most capable models available as OpenAI releases new versions.

Microsoft Azure OpenAI Service — The same OpenAI models, hosted within Microsoft Azure infrastructure. Azure deployment is used for data residency and enterprise compliance requirements.

Infrastructure location: Instafill's primary servers are located in Texas, USA. For enterprise customers with regional data requirements, Instafill can deploy to any Azure region — including EU, Australia, Canada, and others — on request.

The specific model used for each task (filling, extraction, field grouping, etc.) is managed by Instafill's engineering team and updated as better models become available. This abstraction means users always benefit from the most capable current model without requiring any configuration.

No Training on Your Data

API access, not consumer products. Instafill calls OpenAI via their API (not ChatGPT or similar consumer interfaces). API-mode access operates under different data handling terms than consumer products.

OpenAI API terms (as of Instafill's agreement): Data submitted via the API is not used to train or improve OpenAI models. This is governed by OpenAI's API data usage policy — see OpenAI: Your data and privacy for full details. Enterprise customers can additionally sign a Data Processing Agreement (DPA) with OpenAI for additional guarantees.

Microsoft Azure OpenAI terms: Azure-hosted OpenAI models include additional enterprise data protection — Microsoft does not use customer data to train models, and data residency is within the Azure region configured for the deployment (Texas by default; other regions available on request).

Instafill's own systems also do not use your document content to train models. The AI models used are not fine-tuned on user data — the only per-form learning that occurs is the examples system (stored per-field in your own workspace, never shared across workspaces or used externally).

How Data Flows Through the AI Pipeline

Source upload: You upload files or type text. Files are stored encrypted in Azure Blob Storage. Text input is encrypted in MongoDB.
Text extraction: Before any AI call, source text is extracted from files locally (PDF parsing, OCR processing, Word parsing). The extracted text is stored encrypted.
Decryption for AI call: Immediately before an AI request, the relevant source text is decrypted in memory. The decrypted text exists only in the application's working memory — it is not written to disk or logged in decrypted form.
AI request: The decrypted text (source content) + form field context are sent to the AI provider via HTTPS. The request is made from Instafill's servers — your browser does not connect to OpenAI directly.
AI response: The AI returns structured JSON with field ID → value mappings. This response is processed, validated, and written to the session document as field values.
Post-request: No source document content is retained by the AI provider beyond the scope of the request (per API terms). Instafill stores the field values in your session; the source text that was decrypted for the AI call is not re-stored in decrypted form.

LangSmith Observability

Instafill uses LangSmith (by LangChain) for AI prompt tracing and observability. When LangSmith tracing is enabled, the following metadata is sent to LangSmith for each AI call:

User ID, organization ID, workspace ID (anonymized identifiers)
The prompt template name and version
Token counts and latency metrics
Model name and provider

LangSmith does not receive your document content in tracing metadata. The actual source text and field values sent in prompts may appear in LangSmith traces if full trace logging is enabled — this is used for debugging and prompt quality analysis by Instafill's engineering team, not exposed to third parties.

Vision Tasks and Image Data

Some AI tasks require sending images rather than text:

Form page screenshots: When the system needs to visually inspect a form page (for field detection, flat-to-fillable conversion, or filled example extraction), a screenshot of the form page is rendered and sent to a vision-capable model. These screenshots contain the form's field layout and any values already filled — no source document content.

Scanned document sources: When you upload a scanned image (PNG, JPG) or an image-based PDF, the image is sent to a vision model for OCR and text extraction. The extracted text is then used for filling — the image itself is not stored by the AI provider beyond the scope of that request.

Security & Privacy

Encrypted in transit: All AI API calls are made over HTTPS/TLS. No source content travels over unencrypted connections.
Workspace isolation: AI requests are made in the context of a specific workspace. The request contains only data from that workspace's current session — no cross-workspace data leaks into AI calls.
No persistent AI storage: AI providers do not retain API request data beyond the scope of serving the response, per their API terms.
Stateless mode: For maximum privacy, Stateless Mode deletes all source content immediately after the session completes — the AI processed it, the fill is done, and nothing persists.

Common Questions

Does OpenAI see my patient records / legal documents / financial data?

When those documents are uploaded as sources and autofill runs, yes — the text extracted from them is sent to the AI model to identify which values should go in which form fields. This is how the filling works.

What OpenAI does not do: retain that text after serving the response, use it to train models, or make it available to other users. API-mode data handling is governed by OpenAI's API data usage policy, not the consumer ChatGPT privacy policy.

For healthcare (HIPAA) use cases specifically: Instafill's architecture supports BAA-eligible deployments. Contact sales to discuss your specific compliance requirements. You can also use Stateless Mode to ensure zero post-session persistence.

Can I choose which AI provider processes my data?

Enterprise workspaces can configure which deployment is used for processing. Contact sales if you have a requirement to use Azure OpenAI specifically (for data residency reasons) or to process data within a particular Azure region.

By default, Instafill selects the model per task based on capability and performance. The model is managed internally and updated continuously — this is not exposed as a user-facing configuration in the standard interface.

What happens to the AI's output — the filled values?

Field values returned by the AI are stored in your session document in MongoDB, encrypted at rest. They are not sent back to the AI provider. They are not shared with other workspaces. They remain accessible to your workspace until the session is deleted per your retention policy.

If you download the filled form and delete the session, the field values are removed from Instafill's systems per your cleanup settings.

Is my data used to improve Instafill's product?

Instafill does not use your document content or filled field values to train AI models. The platform uses aggregated, anonymized usage metrics (via Amplitude) to understand product usage patterns — these are behavioral signals (session counts, feature usage frequency) not document content.

LangSmith traces used by Instafill's engineering team to monitor AI prompt quality contain metadata and token counts, not your document content.

How Instafill Uses AI Models & Handles Your Data

Overview

What Data Is Sent to AI Providers

Which AI Models Are Used

No Training on Your Data

How Data Flows Through the AI Pipeline

LangSmith Observability

Vision Tasks and Image Data

Security & Privacy

Common Questions

Related Features

Ready to get started?

How Instafill Uses AI Models & Handles Your Data

Overview

What Data Is Sent to AI Providers

Which AI Models Are Used

No Training on Your Data

How Data Flows Through the AI Pipeline

LangSmith Observability

Vision Tasks and Image Data

Security & Privacy

Common Questions

Related Features

Workspace Data Isolation & Multi-Tenant Security

Third-Party Subprocessors & Vendors

HIPAA, GDPR & SOC 2 Compliance

Data Encryption & Security

Autofill from Multiple Sources

Ready to get started?