How your data is handled

Where your file goes during a run: dedupe runs server-side and in-process, the clean export is built in your browser, and only duplicate-group values are sent to the AI agent.

A fair question before you paste a real customer list into any tool: where does my file go? Here's the honest answer, end to end.

What happens during a run

Upload

Your CSV is sent to the backend over HTTPS as a normal file upload. It's parsed in memory, and the matching runs in-process and synchronously: there's no background job queue, and the dedupe path doesn't write your file to any database table.

Match

The engine produces duplicate clusters from the rows it parsed. The result handed back to your browser is small: cluster row-index groups, a summary, and a capped sample — not a stored copy of your file.
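To make "small" concrete, here is a minimal sketch of what such a result could look like. Every name in it (DedupeResult, buildResult, sampleCap, the summary fields) is a hypothetical chosen for illustration, and taking the first few rows as the sample is an assumption; the actual response shape may differ.

```typescript
// Hypothetical result shape: all field names here are illustrative assumptions.
interface DedupeResult {
  // Each inner array lists the row indexes that belong to one duplicate group.
  clusters: number[][];
  summary: { totalRows: number; duplicateGroups: number; rowsInGroups: number };
  // A capped sample of rows, never the whole file.
  sample: string[][];
}

// Assemble the result from parsed rows and the engine's index groups.
// Assumption: the sample is simply the first `sampleCap` rows.
function buildResult(rows: string[][], clusters: number[][], sampleCap: number): DedupeResult {
  const rowsInGroups = clusters.reduce((count, group) => count + group.length, 0);
  return {
    clusters,
    summary: { totalRows: rows.length, duplicateGroups: clusters.length, rowsInGroups },
    sample: rows.slice(0, sampleCap),
  };
}
```

The point of the shape is that clusters carry row indexes rather than row contents, so the browser can resolve them against the rows it already holds.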

Review

If you let the AI review run, the field values of the duplicate groups (and only those groups) are sent to the AI provider so it can judge each merge. See "What the AI agent sees" below.

Export

The cleaned file is rebuilt entirely in your browser from the rows you already have plus the cluster groups. The clean download never round-trips through a server export endpoint — it's assembled on your machine and saved locally.

The clean file stays in your hands

This is the part worth emphasizing: the deduped CSV is built client-side (buildCleanedCsv in the page), so the clean download never requires the server to hold or return your data. Both the survivor selection (one row kept per group) and the CSV assembly happen in the browser.
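As a rough illustration of client-side assembly, the sketch below keeps one survivor per group and serializes the remaining rows to CSV. It is not the page's actual buildCleanedCsv: the name buildCleanedCsvSketch, its signature, and the "keep the lowest row index" survivor rule are all assumptions made for the example.

```typescript
// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes (the usual CSV convention).
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Hypothetical stand-in for the page's builder, not the real implementation.
function buildCleanedCsvSketch(header: string[], rows: string[][], clusters: number[][]): string {
  const dropped = new Set<number>();
  for (const group of clusters) {
    // Assumed survivor rule: keep the lowest row index in each group.
    const survivor = group.reduce((lowest, index) => Math.min(lowest, index), Infinity);
    for (const index of group) {
      if (index !== survivor) dropped.add(index);
    }
  }
  // Unique rows and survivors stay, in their original order.
  const kept = rows.filter((_row, index) => !dropped.has(index));
  return [header, ...kept].map((row) => row.map(escapeCsvField).join(",")).join("\r\n");
}
```

In a browser, the returned string could be wrapped in a Blob and offered through an object URL, which is why no server export endpoint is needed for the download.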

What the AI agent sees

The review agent needs to read the actual values to judge a merge, so the field values of the duplicate groups are included in the prompt sent to the AI provider. Two things bound this:

Only rows that landed in a duplicate group are sent — unique rows are never part of the review payload.
Values are truncated and the number of groups and members per group are capped before the prompt is built.
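The two bounds above can be sketched as a single pass over the cluster groups. The names ReviewLimits and buildReviewPayload and the limit values in the usage note are hypothetical; the real caps and prompt format are not specified on this page.

```typescript
// Hypothetical limits: the actual caps used by the tool are not documented here.
interface ReviewLimits {
  maxGroups: number;
  maxMembersPerGroup: number;
  maxValueLength: number;
}

// Build the review payload from duplicate groups only.
// Rows that appear in no cluster are never looked up, so they cannot leak in.
function buildReviewPayload(rows: string[][], clusters: number[][], limits: ReviewLimits): string[][][] {
  const payload: string[][][] = [];
  for (const group of clusters.slice(0, limits.maxGroups)) {
    const members: string[][] = [];
    for (const index of group.slice(0, limits.maxMembersPerGroup)) {
      const row = rows[index] ?? [];
      // Truncate each value before it can reach the prompt.
      members.push(row.map((value) => value.slice(0, limits.maxValueLength)));
    }
    payload.push(members);
  }
  return payload;
}
```

For example, with limits of one group, two members per group, and five characters per value, a three-member group contributes only its first two rows, each value cut to five characters.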

If you'd rather not send any values to the model, you can still use the engine's clusters and download a clean file; the agent review is an enhancement on top, not a requirement for the dedupe itself.

No overclaiming

This is a hosted tool. Your file is transmitted to and processed on the server for the matching step, and duplicate-group values are sent to an AI provider for the review step. We don't persist your file in the dedupe path, but "processed server-side" is not the same as "never leaves your machine." If your data can't leave your environment at all, this hosted demo isn't the right fit.

For account data deletion and the formal policy, see the privacy policy and data deletion pages.
