Cleaning a spreadsheet
A practical walkthrough for the recurring real task: deduping a campaign list, an event-attendee export, or a board-deck account list before you ship it.
You've got a list due today and it's full of duplicates. This walkthrough covers a clean run on the lists people actually hand-dedupe every cycle: campaign sends, event-attendee merges, and QBR or board-deck account lists.
Before you start
Get your data into a CSV. Every spreadsheet tool exports one:
- Excel: File → Save As → CSV UTF-8.
- Google Sheets: File → Download → Comma-separated values (.csv).
- Your CRM: export the view or report you normally work from.
Keep the header row: the column names are how the engine works out which fields matter (names, emails, companies) and which to ignore (record IDs, a source label). A file of up to 1,000 rows runs on the free tier.
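If you want to sanity-check an export before uploading it, a short script can confirm that the header row is intact and the file fits the free tier. This is a minimal sketch and not part of the tool: the `preflight` function, the warning messages, and the "column name contains `email`" test are illustrative assumptions.

```python
import csv
import io

FREE_TIER_ROWS = 1000  # free-tier cap mentioned above


def preflight(csv_text):
    """Return (header, data_row_count, warnings) for the text of an exported CSV."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    header = [h.strip() for h in rows[0]] if rows else []
    # Count only rows that contain at least one non-blank cell.
    data_rows = [r for r in rows[1:] if any(cell.strip() for cell in r)]

    warnings = []
    if not header or any(h == "" for h in header):
        warnings.append("missing or blank column names in header row")
    # Assumption: an email column has "email" somewhere in its name.
    if not any("email" in h.lower() for h in header):
        warnings.append("no column with 'email' in its name")
    if len(data_rows) > FREE_TIER_ROWS:
        warnings.append("more than 1,000 rows")
    return header, len(data_rows), warnings
```

When reading the file from disk, opening it with `encoding="utf-8-sig"` strips the byte-order mark that Excel's CSV UTF-8 export adds, so the first column name comes through clean.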
Walkthrough
Open the dedupe tool and drag your CSV in. The engine reads your columns and runs in a few seconds — no rules to configure.
Don't read all the groups. Go straight to the Needs review section at the top: the groups the agent rejected (it thinks the engine merged two different things) and the ones it marked uncertain. These are the only decisions that need your eyes.
Click a rejected group to expand its rows. Read the agent's one-line reason — for example, "two different people who share a surname and city." If you agree, the records are correctly left separate. If the agent got it wrong, you've found a real edge case in your data worth noting.
The confirmed groups under All groups are the bulk of the cleanup: the Acme Inc / Acme, Inc. / ACME collapses. Skim a couple to build trust, then move on.
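To see why variants like Acme Inc / Acme, Inc. / ACME belong in one group, it helps to look at a simple normalization key. The sketch below is only an illustration of the idea; the engine's actual matching rules are not described here, and `company_key` and the suffix list are hypothetical.

```python
import re

# Hypothetical list of legal suffixes, for illustration only.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "corp", "co"}


def company_key(name):
    """Lowercase the name, drop punctuation, and strip trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Keep at least one word so a company literally named "Co" is not erased.
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)
```

All three spellings reduce to the same key, `acme`, while a genuinely different name such as Acme Labs keeps its own key.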
Click Download clean CSV. One survivor row per group is kept (the first in original order), plus every non-duplicate row, with your original columns intact. Import that file into your campaign tool, event platform, or deck.
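The survivor rule described above (keep the first row of each group in original order, plus every row that is in no group) can be sketched as follows. This is an illustration under assumptions, not the tool's code: `clean_rows` is a hypothetical name, and groups are assumed to be sets of zero-based row positions.

```python
def clean_rows(rows, groups):
    """Return the rows to keep, in original order.

    rows:   list of row dicts in original file order
    groups: iterable of sets of row indexes confirmed as duplicates
    """
    drop = set()
    for group in groups:
        if not group:
            continue
        survivor = min(group)  # first row of the group in original order
        drop.update(i for i in group if i != survivor)
    # Rows outside every group are never in `drop`, so they pass through untouched.
    return [row for i, row in enumerate(rows) if i not in drop]
```

Because rows are filtered rather than rebuilt, each surviving row keeps its original columns and values.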
Tips for specific lists
On a campaign send, the duplicate that gets you in trouble is the same person emailed twice. Make sure an email column is present; it's a strong signal that both the matcher and the agent lean on.
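As a last check before a send, you can scan the list for addresses that still appear more than once. This is a minimal sketch, separate from the tool: `repeated_emails` and the default column name `email` are assumptions, and comparing addresses after trimming and lowercasing is a practical convention rather than a guarantee.

```python
from collections import defaultdict


def repeated_emails(rows, email_column="email"):
    """Map each normalized address that appears more than once to its row indexes."""
    seen = defaultdict(list)
    for i, row in enumerate(rows):
        # Assumption: addresses are compared after trimming and lowercasing.
        email = (row.get(email_column) or "").strip().lower()
        if email:
            seen[email].append(i)
    return {email: idx for email, idx in seen.items() if len(idx) > 1}
```

An empty result means no address would be emailed twice under that comparison.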
Registrations often merge two exports (the platform and a manual add). Names drift (Bob vs Robert); fuzzy matching catches those where exact-match dedupe won't.
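The gap between exact and fuzzy matching is easy to show in a few lines. The sketch below is one simple way to approximate fuzzy name matching, not a description of how the tool does it: the nickname table, the `names_match` function, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Tiny hypothetical nickname table; a real one would be much larger.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}


def normalize_name(full_name):
    """Lowercase the name and expand a known nickname in the first word."""
    first, _, rest = full_name.strip().lower().partition(" ")
    first = NICKNAMES.get(first, first)
    return f"{first} {rest}".strip()


def names_match(a, b, threshold=0.85):
    """True when the normalized names are similar enough (threshold is an assumption)."""
    ratio = SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()
    return ratio >= threshold
```

An exact comparison treats "Bob Smith" and "Robert Smith" as different people, while `names_match` treats them as the same, and it also tolerates a small typo such as "Robert Smyth".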
On a QBR or board-deck account list, a double-counted account distorts the numbers. Spot-check the rejected groups so two genuinely separate accounts don't get collapsed before the number is reported.
Long-lived lists accumulate the same member entered years apart with different formatting. A clean pass each cycle keeps the count honest.
If it's the same list each time, Pro lets you save the config and re-run it next cycle instead of starting from scratch, and raises the cap to 100,000 rows per run.