Cleaning a spreadsheet

A practical walkthrough for a recurring task: deduping a campaign list, an event-attendee export, or a board-deck account list before you ship it.

You've got a list due today and it's full of duplicates. This guide walks through a clean run for the lists people otherwise dedupe by hand every cycle: campaign sends, event-attendee merges, and QBR or board-deck account lists.

Before you start

Get your data into a CSV. Every spreadsheet tool exports one:

  • Excel: File → Save As → CSV UTF-8.
  • Google Sheets: File → Download → Comma-separated values (.csv).
  • Your CRM: export the view or report you normally work from.

Keep the header row. The column names are how the engine works out which fields matter (names, emails, companies) and which to ignore (record IDs, a source label). Files of up to 1,000 rows run on the free tier.
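To catch a missing header or an oversized file before you upload, you could run a quick pre-flight check. The Python below is a minimal sketch and not part of the tool: preflight is a hypothetical helper, the header test is a rough heuristic assumed for illustration, and it assumes the 1,000-row limit counts data rows only, not the header.

```python
import csv
import io

# Free-tier cap quoted on this page; assumed here to count data rows, not the header.
FREE_TIER_MAX_ROWS = 1_000


def preflight(stream):
    """Return (header, data_row_count) for a CSV text stream, or raise ValueError.

    preflight is a hypothetical helper for illustration, not part of the dedupe tool.
    """
    reader = csv.reader(stream)
    header = next(reader, None)
    if not header:
        raise ValueError("file is empty")
    # Rough heuristic (an assumption): a header row has no blank cells and
    # no '@', which would suggest the first row is data such as an email.
    if any(not cell.strip() or "@" in cell for cell in header):
        raise ValueError("first row does not look like a header row")
    # Count only data rows that contain at least one non-blank cell.
    rows = sum(1 for row in reader if any(cell.strip() for cell in row))
    if rows > FREE_TIER_MAX_ROWS:
        raise ValueError(f"{rows} rows exceeds the free-tier cap of {FREE_TIER_MAX_ROWS}")
    return header, rows


sample = "name,email,company\nAda Lovelace,ada@example.com,Acme Inc\n"
print(preflight(io.StringIO(sample)))
```

For a file on disk, open it with open(path, newline="", encoding="utf-8-sig") so the byte-order mark that Excel's CSV UTF-8 export writes doesn't end up glued to the first column name.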

Walkthrough

Drop the file

Open the dedupe tool and drag your CSV in. The engine reads your columns and runs in a few seconds — no rules to configure.

Start with Needs review

Don't read all the groups. Go straight to the Needs review section at the top: the groups the agent rejected (it thinks the engine merged two different records) and the ones it marked uncertain. These are the only decisions that need your eyes.

Spot-check a rejection

Click a rejected group to expand its rows. Read the agent's one-line reason — for example, "two different people who share a surname and city." If you agree, the records are correctly left separate. If the agent got it wrong, you've found a real edge case in your data worth noting.

Glance at the confirmed groups

The confirmed groups under All groups are the bulk of the cleanup — the Acme Inc / Acme, Inc. / ACME collapses. Skim a couple to build trust, then move on.
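Collapses like these come from treating formatting differences as noise. As a rough illustration only, here is an exact-key normalization in Python. The suffix list and the company_key and group_by_company helpers are hypothetical, and the engine's own matching is fuzzy rather than this simple.

```python
import re
from collections import defaultdict

# Hypothetical list of legal-form suffixes to ignore; extend it for your data.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "limited", "corp", "corporation", "co", "gmbh"}


def company_key(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_by_company(names):
    """Map each normalized key to the original spellings that share it."""
    groups = defaultdict(list)
    for name in names:
        groups[company_key(name)].append(name)
    return dict(groups)


print(group_by_company(["Acme Inc", "Globex", "Acme, Inc.", "ACME"]))
```

Stripping suffixes is a trade-off: it would also put an Acme Corp and an unrelated Acme LLC under the same key, which is the kind of over-merge a human review step exists to catch.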

Download and ship

Click Download clean CSV. One survivor row per group is kept (the first in original order), plus every non-duplicate row, with your original columns intact. Import that file into your campaign tool, event platform, or deck.
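The survivor rule described above fits in a few lines. This Python sketch assumes you already have the duplicate groups as lists of row positions (a hypothetical input shape chosen for illustration) and writes the result back out with the original columns and header.

```python
import csv
import io


def survivors(rows, groups):
    """Keep the first row of each duplicate group, in original order, plus every ungrouped row.

    rows:   list of dicts in original file order.
    groups: list of lists of row positions judged to be duplicates (hypothetical input shape).
    """
    drop = set()
    for group in groups:
        keep = min(group)  # the earliest row in the original file survives
        drop.update(i for i in group if i != keep)
    return [row for i, row in enumerate(rows) if i not in drop]


def write_clean_csv(rows, fieldnames, stream):
    """Write rows back out with the original header and column order."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"company": "Acme Inc", "owner": "Dana"},
    {"company": "Globex", "owner": "Lee"},
    {"company": "ACME", "owner": "Sam"},
]
out = io.StringIO()
write_clean_csv(survivors(rows, [[0, 2]]), ["company", "owner"], out)
print(out.getvalue())
```

In this sketch the survivor keeps only its own values; nothing from the dropped rows is copied into it.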

Tips for specific lists

Campaign / email lists

The duplicate that gets you in trouble is the same person emailed twice. Make sure an email column is present — it's a strong signal the matcher and agent both lean on.
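Email works well as a signal partly because it normalizes easily. Here is a minimal Python sketch of flagging repeated addresses. It assumes a column named email (hypothetical; use whatever your export calls it) and treats addresses case-insensitively after trimming whitespace.

```python
from collections import defaultdict


def email_key(email: str) -> str:
    """Trim whitespace and lowercase the address.

    Assumption: RFC 5321 technically allows a case-sensitive local part,
    but treating addresses case-insensitively is the usual practical choice.
    """
    return email.strip().lower()


def duplicate_emails(rows, column="email"):
    """Return {normalized_email: [row positions]} for addresses that appear more than once."""
    seen = defaultdict(list)
    for i, row in enumerate(rows):
        key = email_key(row.get(column, ""))
        if key:  # skip rows with a blank email
            seen[key].append(i)
    return {key: positions for key, positions in seen.items() if len(positions) > 1}


rows = [
    {"email": "Ada@Example.com "},
    {"email": "bob@example.com"},
    {"email": "ada@example.com"},
    {"email": ""},
]
print(duplicate_emails(rows))
```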

Event attendees

Attendee lists often combine two exports (the event platform's and a list of manual adds). Names drift (Bob vs Robert), and fuzzy matching catches these where exact-match dedupe won't.
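To see why fuzzy matching helps, here is a small Python sketch that maps a few common nicknames to a canonical first name and then scores the whole name with the standard library's difflib.SequenceMatcher. The nickname table, the helper names, and the 0.85 threshold are illustrative assumptions; this is not the engine's matcher.

```python
from difflib import SequenceMatcher

# Tiny illustrative nickname table; a real one would be much larger.
NICKNAMES = {
    "bob": "robert", "bobby": "robert", "rob": "robert",
    "bill": "william", "will": "william",
    "liz": "elizabeth", "beth": "elizabeth",
}


def normalize_name(full_name: str) -> str:
    """Lowercase, collapse whitespace, and replace a nickname first name with its canonical form."""
    tokens = full_name.lower().split()
    if tokens:
        tokens[0] = NICKNAMES.get(tokens[0], tokens[0])
    return " ".join(tokens)


def likely_same_person(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when the normalized names are at least `threshold` similar (0.0 to 1.0)."""
    score = SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()
    return score >= threshold


print(likely_same_person("Bob Smith", "Robert Smith"))
print(likely_same_person("Robert Smith", "Robert Smyth"))
print(likely_same_person("Bob Smith", "Alice Jones"))
```

The threshold is the trade-off: lowering it catches more drift, like Smith vs Smyth, but also produces more false matches, such as two different people who share a surname.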

Board / QBR account decks

A double-counted account distorts the numbers. Spot-check the rejected groups so two genuinely separate accounts don't get collapsed before the number is reported.

Membership rolls

Long-lived lists accumulate the same member entered years apart with different formatting. A clean pass each cycle keeps the count honest.
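Much of that drift is formatting rather than content. Here is a minimal Python sketch of normalizing two typical fields before comparing them. The helper names are hypothetical, and which columns you apply them to is up to you.

```python
import re
import unicodedata


def clean_text(value: str) -> str:
    """Fold case and whitespace so 'JANE  DOE' and 'Jane Doe' compare equal."""
    # NFKC maps compatibility characters, such as non-breaking spaces and
    # full-width letters, to their plain equivalents.
    value = unicodedata.normalize("NFKC", value)
    return " ".join(value.casefold().split())


def clean_phone(value: str) -> str:
    """Keep digits only, so '(555) 010-2000' and '555.010.2000' compare equal."""
    return re.sub(r"\D", "", value)


print(clean_text("JANE\u00a0 DOE "), clean_phone("(555) 010-2000"))
```

Digits-only phone keys ignore country codes, so +1 555 010 2000 and 555 010 2000 would still differ; handling that properly needs a dedicated phone-number library.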

Doing this every cycle?

If it's the same list each time, Pro lets you save the config and re-run it next cycle instead of starting from scratch, and raises the cap to 100,000 rows per run.
