
When multiple content policies can flag the same item, it's hard to tell whether a violation is minor (e.g., profanity) or critical (e.g., illicit behavior) without digging through each policy.
Severity score solves this: a top-level model outputs a single, opinionated score across all your policies, weighting violations by impact so you can see urgency at a glance.
For example, illicit content ranks above hate, which ranks above swearing.
Use severity score to:
- Sort review queues by severity.
- Set thresholds to automate actions (e.g., block above 90%, review above 50%).
- Understand historical impact on your dashboard, including how many items would have been blocked. Note: historical data covers the past 30 days and is available only if you've used an API endpoint that supports severity score.
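The threshold-based automation above can be sketched as follows. This is a minimal illustration, not part of the API: the `triage` function, the 0–100 score range, and the example queue are all assumptions; only the 90/50 thresholds come from the text.

```python
def triage(severity_score: float) -> str:
    """Map a severity score (assumed 0-100) to a moderation action.

    Thresholds follow the example above: block above 90, review above 50.
    """
    if severity_score > 90:
        return "block"
    if severity_score > 50:
        return "review"
    return "allow"

# Sorting a review queue so the most severe items surface first.
queue = [
    {"id": "a", "severity_score": 55.0},
    {"id": "b", "severity_score": 95.0},
    {"id": "c", "severity_score": 10.0},
]
queue.sort(key=lambda item: item["severity_score"], reverse=True)
```

With these thresholds, item "b" would be blocked automatically, "a" routed to human review, and "c" allowed.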
Implementation:
- Read the score at `evaluation.severity_score`.
- Read the recommended action at `recommendation.action`.
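Reading both fields from a response might look like this. The response shape shown is a hypothetical sketch; only the `evaluation.severity_score` and `recommendation.action` paths are taken from the documentation above.

```python
# Hypothetical response body; field values are illustrative.
response = {
    "evaluation": {"severity_score": 92.5},
    "recommendation": {"action": "block"},
}

# The two documented paths: the severity score and the recommended action.
score = response["evaluation"]["severity_score"]
action = response["recommendation"]["action"]
```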