Explain endpoint

POST /api/dedupe/explain — have the AI agent review duplicate clusters and return a per-cluster verdict, confidence, and plain-language explanation. Never 5xx on model failure.

The explain endpoint runs the AI review agent over a set of duplicate clusters and returns, per cluster, whether the merge holds up — a verdict, a confidence, and a one-sentence reason. It's stateless: you send the clusters' field values, it returns the verdicts.

POST /api/dedupe/explain (optional auth)

Review duplicate clusters; get per-cluster verdicts back.

Authentication

Optional, same as the dedupe endpoint. Send a Clerk bearer token if you have one; anonymous calls are allowed (it's part of the public demo).
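A minimal sketch of how a client might attach the token only when it has one. The base URL and the function name `build_explain_request` are placeholders for illustration, not part of the API; the request is built but not sent.

```python
import json
import urllib.request

# Hypothetical host; substitute the base URL of your deployment.
BASE_URL = "https://example.com"


def build_explain_request(clusters, token=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/dedupe/explain."""
    body = json.dumps({"clusters": clusters}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Auth is optional: attach the bearer token only when one is available.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url}/api/dedupe/explain",
        data=body,
        headers=headers,
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; an anonymous call simply omits `token`.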

Request

A JSON body with a clusters array. Each cluster carries an integer cluster_id and its members — the actual field-value dicts for the rows in the group (not row indices).

{
  "clusters": [
    {
      "cluster_id": 0,
      "members": [
        { "name": "Acme Inc", "email": "info@acme.com" },
        { "name": "Acme, Inc.", "email": "Info@Acme.com" }
      ]
    }
  ]
}

| Field | Type | Notes |
| --- | --- | --- |
| clusters | array | The duplicate groups to review. |
| clusters[].cluster_id | int | The id to echo back on the verdict. |
| clusters[].members | object[] | Each member is a {column: value} dict of that row's values. |

Limits.

  • At most 500 clusters per request; a request with more returns 413.
  • Each cluster's members are truncated to the first 100 before review.
  • Only clusters with 2 or more members are reviewable; singletons are ignored.
  • At most EXPLAIN_MAX_CLUSTERS reviewable clusters (default 50) are sent to the model; any beyond that are returned marked not reviewed.
  • Rate limit: 30 requests per hour.
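The limits above can be mirrored on the client to keep payloads small and avoid a 413. The sketch below is one possible approach, not something the endpoint requires: the server already ignores singletons and truncates members itself. The helper name `prepare_batches` and its `batch_size` parameter are hypothetical.

```python
from typing import Any, Iterator

MAX_CLUSTERS_PER_REQUEST = 500  # a request with more than this returns 413
MAX_MEMBERS_PER_CLUSTER = 100   # the server truncates beyond this anyway

Cluster = dict[str, Any]


def prepare_batches(
    clusters: list[Cluster], batch_size: int = MAX_CLUSTERS_PER_REQUEST
) -> Iterator[list[Cluster]]:
    """Drop singletons, trim oversized clusters, and yield request-sized batches."""
    if not 1 <= batch_size <= MAX_CLUSTERS_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 500")
    reviewable = [
        {
            "cluster_id": c["cluster_id"],
            "members": c["members"][:MAX_MEMBERS_PER_CLUSTER],
        }
        for c in clusters
        if len(c["members"]) >= 2  # singletons are not reviewable
    ]
    for start in range(0, len(reviewable), batch_size):
        yield reviewable[start:start + batch_size]
```

With the default review cap of 50, a batch of 500 would come back with most clusters marked not reviewed, so a smaller `batch_size` may be preferable; the trade-off is that each batch counts against the 30-requests-per-hour rate limit.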

Response — 200

{
  "verdicts": [
    {
      "cluster_id": 0,
      "verdict": "confirmed",
      "confidence": "high",
      "explanation": "Same company — only punctuation and casing differ.",
      "reviewed": true
    }
  ],
  "reviewed_count": 1,
  "total_clusters": 1,
  "model_available": true
}

| Field | Type | Meaning |
| --- | --- | --- |
| verdicts[].cluster_id | int | The cluster this verdict is for. |
| verdicts[].verdict | string | confirmed, uncertain, or rejected. |
| verdicts[].confidence | string | high, medium, or low. |
| verdicts[].explanation | string | One plain-language sentence on why the rows are (or aren't) the same entity. |
| verdicts[].reviewed | bool | true if the agent actually assessed this group; false if it was over the cap or degraded. |
| reviewed_count | int | How many clusters were sent to the model. |
| total_clusters | int | How many reviewable (≥2-member) clusters were supplied. |
| model_available | bool | false when the AI provider was unavailable and verdicts are degraded. |
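As an illustration of the response shape, here is a sketch that converts the 200 body into typed records and rejects values outside the documented sets. The `Verdict` dataclass and `parse_response` function are illustrative names, not part of any official client.

```python
from dataclasses import dataclass
from typing import Any

VERDICTS = {"confirmed", "uncertain", "rejected"}
CONFIDENCES = {"high", "medium", "low"}


@dataclass(frozen=True)
class Verdict:
    cluster_id: int
    verdict: str
    confidence: str
    explanation: str
    reviewed: bool


def parse_response(payload: dict[str, Any]) -> list[Verdict]:
    """Turn the response body into Verdict records, failing on unknown values."""
    parsed = []
    for raw in payload["verdicts"]:
        if raw["verdict"] not in VERDICTS:
            raise ValueError(f"unexpected verdict: {raw['verdict']!r}")
        if raw["confidence"] not in CONFIDENCES:
            raise ValueError(f"unexpected confidence: {raw['confidence']!r}")
        parsed.append(
            Verdict(
                cluster_id=int(raw["cluster_id"]),
                verdict=raw["verdict"],
                confidence=raw["confidence"],
                explanation=raw["explanation"],
                reviewed=bool(raw["reviewed"]),
            )
        )
    return parsed
```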

Verdict values

| Verdict | Meaning |
| --- | --- |
| confirmed | The rows are the same real-world entity; the merge holds. |
| uncertain | Evidence is thin or mixed; flagged for a human. |
| rejected | The engine over-merged distinct entities — the agent overrides it. |
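One way a caller might act on these values is sketched below. The three route names (`merge`, `human_review`, `split`) are assumptions about a downstream workflow, not something the endpoint prescribes; the sketch also sends any cluster with `reviewed: false` to a human, since the agent did not actually assess it.

```python
def route_verdicts(verdicts):
    """Group cluster ids by the follow-up action each verdict suggests."""
    routes = {"merge": [], "human_review": [], "split": []}
    for v in verdicts:
        if not v.get("reviewed", False):
            # Not assessed by the agent: treat as needing a human look.
            routes["human_review"].append(v["cluster_id"])
        elif v["verdict"] == "confirmed":
            routes["merge"].append(v["cluster_id"])
        elif v["verdict"] == "rejected":
            routes["split"].append(v["cluster_id"])
        else:
            # "uncertain" (or anything unrecognized) goes to a human.
            routes["human_review"].append(v["cluster_id"])
    return routes
```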

Errors and degradation

| Status | When |
| --- | --- |
| 413 | More than 500 clusters in the request. |

This endpoint never returns 5xx for a model failure. If the AI provider is missing, errors, or returns unparseable output, it sets model_available: false and degrades each affected cluster to a neutral uncertain / low verdict with an "explanation unavailable" message, rather than failing the request. Clusters over the review cap come back as uncertain / low with reviewed: false and a "not reviewed" message.
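Because degradation arrives inside a 200 response, a client has to inspect the body rather than the status code. A minimal sketch, assuming the response has already been decoded into a dict; `summarize_degradation` is a hypothetical helper name.

```python
def summarize_degradation(payload):
    """Report whether the model was available and which clusters went unreviewed."""
    unreviewed_ids = [
        v["cluster_id"] for v in payload["verdicts"] if not v["reviewed"]
    ]
    return {
        "model_available": payload["model_available"],
        # Over-cap and degraded clusters both arrive with reviewed: false.
        "unreviewed_ids": unreviewed_ids,
        "fully_reviewed": payload["model_available"] and not unreviewed_ids,
    }
```

The ids in `unreviewed_ids` are candidates for resubmission in a later request, subject to the rate limit.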
