Explain endpoint
POST /api/dedupe/explain — have the AI agent review duplicate clusters and return a per-cluster verdict, confidence, and plain-language explanation. Never 5xx on model failure.
The explain endpoint runs the AI review agent over a set of duplicate clusters and returns, per cluster, whether the merge holds up — a verdict, a confidence, and a one-sentence reason. It's stateless: you send the clusters' field values, it returns the verdicts.
Authentication
Optional, same as the dedupe endpoint. Send a Clerk bearer token if you have one; anonymous calls are allowed (it's part of the public demo).
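As a rough sketch of the optional authentication described above, the helper below builds the POST request and attaches a bearer token only when one is supplied. The base URL, the `build_explain_request` name, and the `Content-Type: application/json` header are assumptions for illustration; only the path and the optional bearer token come from this page.

```python
import json
import urllib.request
from typing import Optional

EXPLAIN_PATH = "/api/dedupe/explain"


def build_explain_request(
    base_url: str, clusters: list, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the explain endpoint.

    base_url is a placeholder for wherever the API is hosted (hypothetical).
    """
    # Assumption: the server expects a JSON content type for the JSON body.
    headers = {"Content-Type": "application/json"}
    if token:
        # Anonymous calls are allowed, so the header is added only with a token.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"clusters": clusters}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + EXPLAIN_PATH,
        data=body,
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without network access.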
Request
A JSON body with a clusters array. Each cluster carries an integer cluster_id
and its members — the actual field-value dicts for the rows in the group (not row
indices).
{
  "clusters": [
    {
      "cluster_id": 0,
      "members": [
        { "name": "Acme Inc", "email": "info@acme.com" },
        { "name": "Acme, Inc.", "email": "Info@Acme.com" }
      ]
    }
  ]
}
| Field | Type | Notes |
|---|---|---|
| clusters | array | The duplicate groups to review. |
| clusters[].cluster_id | int | The id to echo back on the verdict. |
| clusters[].members | object[] | Each member is a {column: value} dict of that row's values. |
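Because members must be the rows' actual field values rather than row indices, a caller that holds index-based groups has to expand them first. The sketch below shows one way to do that; the `clusters_from_row_groups` name and the choice to number clusters by position are illustrative assumptions, not part of the API.

```python
from typing import Dict, List


def clusters_from_row_groups(
    rows: List[Dict[str, object]], groups: List[List[int]]
) -> List[dict]:
    """Expand groups of row indices into the `clusters` array shape.

    rows   -- the dataset, one {column: value} dict per row
    groups -- one list of row indices per duplicate group
    """
    return [
        {
            # Assumption: the group's position is used as its cluster_id.
            "cluster_id": cluster_id,
            # Copy each row so the payload does not alias the caller's data.
            "members": [dict(rows[i]) for i in indices],
        }
        for cluster_id, indices in enumerate(groups)
    ]
```

For example, `clusters_from_row_groups(rows, [[0, 1]])` produces a single cluster with id 0 whose members are the first two rows.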
Limits.
- At most 500 clusters per request; more returns 413.
- Each cluster's members are truncated to the first 100 before review.
- Only clusters with 2 or more members are reviewable; singletons are ignored.
- At most EXPLAIN_MAX_CLUSTERS reviewable clusters (default 50) are sent to the model; any beyond that are returned marked not reviewed.
- Rate limit: 30 requests per hour.
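A client can mirror these limits before sending, so that no request is rejected and no cluster comes back unreviewed because of the cap. The following is a sketch under one stated assumption: that the server runs with the default review cap of 50. The helper name and constants are illustrative; the server's actual EXPLAIN_MAX_CLUSTERS value may differ.

```python
from typing import List

MAX_MEMBERS = 100                # members beyond this are truncated server-side
MAX_CLUSTERS_PER_REQUEST = 500   # more than this returns 413
ASSUMED_REVIEW_CAP = 50          # assumption: server uses the documented default


def prepare_batches(
    clusters: List[dict], batch_size: int = ASSUMED_REVIEW_CAP
) -> List[List[dict]]:
    """Drop singletons, trim members, and split into request-sized batches."""
    if not 1 <= batch_size <= MAX_CLUSTERS_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 500")
    reviewable = [
        # Build a new dict so the caller's clusters are left untouched.
        {**cluster, "members": cluster["members"][:MAX_MEMBERS]}
        for cluster in clusters
        if len(cluster["members"]) >= 2  # singletons are ignored by the endpoint
    ]
    return [
        reviewable[start:start + batch_size]
        for start in range(0, len(reviewable), batch_size)
    ]
```

With batches of 50 and a limit of 30 requests per hour, this approach covers at most 1,500 clusters per hour, so larger jobs need to be spread out accordingly.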
Response — 200
{
  "verdicts": [
    {
      "cluster_id": 0,
      "verdict": "confirmed",
      "confidence": "high",
      "explanation": "Same company — only punctuation and casing differ.",
      "reviewed": true
    }
  ],
  "reviewed_count": 1,
  "total_clusters": 1,
  "model_available": true
}
| Field | Type | Meaning |
|---|---|---|
| verdicts[].cluster_id | int | The cluster this verdict is for. |
| verdicts[].verdict | string | confirmed, uncertain, or rejected. |
| verdicts[].confidence | string | high, medium, or low. |
| verdicts[].explanation | string | One plain-language sentence on why the rows are (or aren't) the same entity. |
| verdicts[].reviewed | bool | true if the agent actually assessed this group; false if it was over the cap or degraded. |
| reviewed_count | int | How many clusters were sent to the model. |
| total_clusters | int | How many reviewable (≥2-member) clusters were supplied. |
| model_available | bool | false when the AI provider was unavailable and verdicts are degraded. |
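The response fields above map naturally onto a small typed record. The sketch below parses the `verdicts` array and rejects values outside the documented sets; the `Verdict` class and `parse_verdicts` function are hypothetical client-side names, not something the API provides.

```python
from dataclasses import dataclass
from typing import List

VERDICTS = {"confirmed", "uncertain", "rejected"}
CONFIDENCES = {"high", "medium", "low"}


@dataclass(frozen=True)
class Verdict:
    cluster_id: int
    verdict: str
    confidence: str
    explanation: str
    reviewed: bool


def parse_verdicts(response: dict) -> List[Verdict]:
    """Turn the decoded JSON response into validated Verdict records."""
    parsed = []
    for item in response["verdicts"]:
        # Fail loudly on values this page does not document.
        if item["verdict"] not in VERDICTS:
            raise ValueError(f"unexpected verdict: {item['verdict']!r}")
        if item["confidence"] not in CONFIDENCES:
            raise ValueError(f"unexpected confidence: {item['confidence']!r}")
        parsed.append(
            Verdict(
                cluster_id=int(item["cluster_id"]),
                verdict=item["verdict"],
                confidence=item["confidence"],
                explanation=item["explanation"],
                reviewed=bool(item["reviewed"]),
            )
        )
    return parsed
```

Strict validation is a design choice here: it surfaces a schema change immediately instead of letting an unknown verdict flow into merge logic.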
Verdict values
| Verdict | Meaning |
|---|---|
| confirmed | The rows are the same real-world entity; the merge holds. |
| uncertain | Evidence is thin or mixed; flagged for a human. |
| rejected | The engine over-merged distinct entities; the agent overrides it. |
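One possible way to act on the three verdict values is to route each cluster into a merge, split, or human-review bucket. This is an illustrative policy, not behavior the endpoint prescribes: the bucket names are made up, and sending every unreviewed cluster to a human is an assumption about what a cautious caller would want.

```python
from typing import Dict, List


def route_clusters(verdicts: List[dict]) -> Dict[str, List[int]]:
    """Group cluster ids by the action a caller might take on each verdict."""
    routed: Dict[str, List[int]] = {"merge": [], "split": [], "human_review": []}
    for item in verdicts:
        if item["reviewed"] and item["verdict"] == "confirmed":
            routed["merge"].append(item["cluster_id"])
        elif item["reviewed"] and item["verdict"] == "rejected":
            routed["split"].append(item["cluster_id"])
        else:
            # uncertain verdicts, plus anything the agent did not actually assess
            routed["human_review"].append(item["cluster_id"])
    return routed
```

A stricter caller could also require `confidence` to be `high` before merging automatically.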
Errors and degradation
| Status | When |
|---|---|
| 413 | More than 500 clusters in the request. |
This endpoint never returns 5xx for a model failure. If the AI provider is
missing, errors, or returns unparseable output, it sets model_available: false
and degrades each affected cluster to a neutral uncertain / low verdict with an
"explanation unavailable" message, rather than failing the request. Clusters over
the review cap come back as uncertain / low with reviewed: false and a "not
reviewed" message.
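Since a degraded response still arrives as a 200, callers need to inspect the body rather than the status code. The sketch below collects the clusters that were not actually assessed so they can be resubmitted later; the function name and the returned dict shape are assumptions. Both degraded and over-cap clusters come back with `reviewed: false`, so `model_available` is the only signal used here to tell the two cases apart.

```python
from typing import List


def summarize_degradation(response: dict) -> dict:
    """Report whether the model was available and which clusters to retry."""
    unreviewed: List[int] = [
        item["cluster_id"]
        for item in response["verdicts"]
        if not item["reviewed"]
    ]
    return {
        # False means the provider was unavailable and verdicts are degraded.
        "model_available": response["model_available"],
        # Degraded clusters and clusters over the review cap both land here.
        "retry_cluster_ids": unreviewed,
    }
```

If `model_available` is true and `retry_cluster_ids` is non-empty, the likely cause is the review cap, and resubmitting those clusters in a smaller request is a reasonable next step.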