Explain endpoint

POST /api/dedupe/explain: have the AI agent review duplicate clusters and return a per-cluster verdict, confidence, and plain-language explanation. A model failure never produces a 5xx response.

The explain endpoint runs the AI review agent over a set of duplicate clusters and returns, per cluster, whether the merge holds up — a verdict, a confidence, and a one-sentence reason. It's stateless: you send the clusters' field values, it returns the verdicts.


Authentication

Optional, same as the dedupe endpoint. Send a Clerk bearer token if you have one; anonymous calls are allowed (it's part of the public demo).
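
As a rough illustration of the optional authentication, the sketch below builds the POST request with and without a token. The base URL and the `make_explain_request` helper are hypothetical, and it assumes the Clerk token is sent as a standard `Authorization: Bearer` header, which this page does not spell out.

```python
import json
import urllib.request
from typing import Any, Optional

# Hypothetical base URL; substitute your own deployment's host.
BASE_URL = "https://example.invalid"


def make_explain_request(
    payload: dict[str, Any], token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the explain endpoint."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the bearer token goes in the Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        BASE_URL + "/api/dedupe/explain",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Anonymous call: no Authorization header is attached.
anonymous = make_explain_request({"clusters": []})
# Authenticated call: the placeholder token is attached as a Bearer header.
signed = make_explain_request({"clusters": []}, token="example-token")
```

Passing either request to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be read without a live server.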

Request

A JSON body with a clusters array. Each cluster carries an integer cluster_id and its members — the actual field-value dicts for the rows in the group (not row indices).

{
  "clusters": [
    {
      "cluster_id": 0,
      "members": [
        { "name": "Acme Inc", "email": "info@acme.com" },
        { "name": "Acme, Inc.", "email": "Info@Acme.com" }
      ]
    }
  ]
}

Field	Type	Notes
`clusters`	array	The duplicate groups to review.
`clusters[].cluster_id`	int	The id to echo back on the verdict.
`clusters[].members`	object[]	Each member is a `{column: value}` dict of that row's values.
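
Because members must be the rows' field values rather than row indices, a client that tracks clusters by index has to resolve them before calling the endpoint. A minimal sketch of that step, assuming a hypothetical `rows` list and an `index_clusters` mapping of cluster id to row indices:

```python
import json
from typing import Any


def build_explain_payload(
    rows: list[dict[str, Any]], index_clusters: dict[int, list[int]]
) -> dict[str, Any]:
    """Resolve row indices into field-value dicts and wrap them in the request body."""
    return {
        "clusters": [
            {"cluster_id": cluster_id, "members": [rows[i] for i in indices]}
            for cluster_id, indices in index_clusters.items()
        ]
    }


rows = [
    {"name": "Acme Inc", "email": "info@acme.com"},
    {"name": "Globex", "email": "hello@globex.com"},
    {"name": "Acme, Inc.", "email": "Info@Acme.com"},
]
# Cluster 0 groups rows 0 and 2; row 1 is not part of any cluster.
payload = build_explain_payload(rows, {0: [0, 2]})
body = json.dumps(payload)
```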

Limits.

At most 500 clusters per request; a larger request returns 413.
Each cluster's members are truncated to the first 100 before review.
Only clusters with 2 or more members are reviewable; singletons are ignored.
At most EXPLAIN_MAX_CLUSTERS reviewable clusters (default 50) are sent to the model; any beyond that are returned marked not reviewed.
Rate limit: 30 requests per hour.
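
The limits above can be mirrored on the client so that a request is never rejected with 413. The following is a sketch only: `prepare_batches` is a hypothetical helper, and dropping singletons and truncating members locally simply anticipates what the server is documented to do.

```python
from typing import Any, Iterator

MAX_CLUSTERS_PER_REQUEST = 500  # documented: more than this returns 413
MAX_MEMBERS_PER_CLUSTER = 100   # documented: members beyond this are truncated


def prepare_batches(
    clusters: list[dict[str, Any]], batch_size: int = MAX_CLUSTERS_PER_REQUEST
) -> Iterator[list[dict[str, Any]]]:
    """Drop singletons, cap members per cluster, and split into request-sized batches."""
    reviewable = [
        # Copy each cluster so the caller's data is left unmodified.
        {**cluster, "members": cluster["members"][:MAX_MEMBERS_PER_CLUSTER]}
        for cluster in clusters
        if len(cluster["members"]) >= 2
    ]
    for start in range(0, len(reviewable), batch_size):
        yield reviewable[start:start + batch_size]


example = [
    {"cluster_id": 0, "members": [{"name": "Acme Inc"}, {"name": "Acme, Inc."}]},
    {"cluster_id": 1, "members": [{"name": "Globex"}]},  # singleton, skipped
]
batches = list(prepare_batches(example))
```

Passing `batch_size=50` would keep each request within the default review cap, at the cost of more requests: at 30 requests per hour, that is at most 1,500 reviewed clusters per hour.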

Response — 200

{
  "verdicts": [
    {
      "cluster_id": 0,
      "verdict": "confirmed",
      "confidence": "high",
      "explanation": "Same company — only punctuation and casing differ.",
      "reviewed": true
    }
  ],
  "reviewed_count": 1,
  "total_clusters": 1,
  "model_available": true
}

Field	Type	Meaning
`verdicts[].cluster_id`	int	The cluster this verdict is for.
`verdicts[].verdict`	string	`confirmed`, `uncertain`, or `rejected`.
`verdicts[].confidence`	string	`high`, `medium`, or `low`.
`verdicts[].explanation`	string	One plain-language sentence on why the rows are (or aren't) the same entity.
`verdicts[].reviewed`	bool	`true` if the agent actually assessed this group; `false` if it was over the cap or degraded.
`reviewed_count`	int	How many clusters were sent to the model.
`total_clusters`	int	How many reviewable (≥2-member) clusters were supplied.
`model_available`	bool	`false` when the AI provider was unavailable and verdicts are degraded.
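
For clients that prefer typed access over raw dicts, the response fields above map directly onto a pair of dataclasses. This is an illustrative sketch; the class names and `parse_explain_response` are not part of the API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Verdict:
    cluster_id: int
    verdict: str      # "confirmed", "uncertain", or "rejected"
    confidence: str   # "high", "medium", or "low"
    explanation: str
    reviewed: bool


@dataclass(frozen=True)
class ExplainResponse:
    verdicts: list[Verdict]
    reviewed_count: int
    total_clusters: int
    model_available: bool


def parse_explain_response(data: dict[str, Any]) -> ExplainResponse:
    """Convert the decoded JSON body of a 200 response into typed objects."""
    return ExplainResponse(
        verdicts=[Verdict(**item) for item in data["verdicts"]],
        reviewed_count=data["reviewed_count"],
        total_clusters=data["total_clusters"],
        model_available=data["model_available"],
    )


# The sample response from this page, decoded from JSON.
sample = {
    "verdicts": [
        {
            "cluster_id": 0,
            "verdict": "confirmed",
            "confidence": "high",
            "explanation": "Same company; only punctuation and casing differ.",
            "reviewed": True,
        }
    ],
    "reviewed_count": 1,
    "total_clusters": 1,
    "model_available": True,
}
parsed = parse_explain_response(sample)
```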

Verdict values

Verdict	Meaning
`confirmed`	The rows are the same real-world entity; the merge holds.
`uncertain`	Evidence is thin or mixed; flagged for a human.
`rejected`	The engine over-merged distinct entities — the agent overrides it.
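
One way a client might act on these values is sketched below. The action names (`keep_merge`, `split`, `needs_review`) and the policy of sending unreviewed clusters to a human are assumptions made for illustration, not behavior defined by the endpoint.

```python
from typing import Any


def route_verdict(verdict: dict[str, Any]) -> str:
    """Map one verdict object to a hypothetical downstream action."""
    if not verdict.get("reviewed", False):
        # The agent never assessed this cluster, so do not trust the verdict.
        return "needs_review"
    if verdict["verdict"] == "confirmed":
        return "keep_merge"
    if verdict["verdict"] == "rejected":
        return "split"
    # "uncertain" (or any unexpected value) falls through to a human.
    return "needs_review"


verdicts = [
    {"cluster_id": 0, "verdict": "confirmed", "confidence": "high", "reviewed": True},
    {"cluster_id": 1, "verdict": "rejected", "confidence": "medium", "reviewed": True},
    {"cluster_id": 2, "verdict": "uncertain", "confidence": "low", "reviewed": False},
]
actions = {v["cluster_id"]: route_verdict(v) for v in verdicts}
```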

Errors and degradation

Status	When
`413`	More than 500 clusters in the request.

This endpoint never returns 5xx for a model failure. If the AI provider is missing, errors, or returns unparseable output, it sets model_available: false and degrades each affected cluster to a neutral uncertain / low verdict with an "explanation unavailable" message, rather than failing the request. Clusters over the review cap come back as uncertain / low with reviewed: false and a "not reviewed" message.
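
Since failures arrive inside a 200 response, a client has to inspect the body rather than the status code. The sketch below shows one possible check; `summarize_degradation` is a hypothetical helper, and the sample response is made up to exercise it (the real message strings may differ, so the code keys off `reviewed` and `model_available` only).

```python
from typing import Any


def summarize_degradation(response: dict[str, Any]) -> dict[str, Any]:
    """Report whether the model was unavailable and which clusters went unreviewed."""
    return {
        "degraded": not response["model_available"],
        "unreviewed_ids": [
            v["cluster_id"] for v in response["verdicts"] if not v["reviewed"]
        ],
    }


# Made-up example: the provider was unavailable, so both clusters were degraded.
degraded_response = {
    "verdicts": [
        {"cluster_id": 0, "verdict": "uncertain", "confidence": "low",
         "explanation": "(placeholder message)", "reviewed": False},
        {"cluster_id": 1, "verdict": "uncertain", "confidence": "low",
         "explanation": "(placeholder message)", "reviewed": False},
    ],
    "reviewed_count": 0,
    "total_clusters": 2,
    "model_available": False,
}
summary = summarize_degradation(degraded_response)
```

The ids in `unreviewed_ids` are candidates for resubmission in a later request, keeping the 30-requests-per-hour limit in mind.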
