
When multiple content policies can flag the same item, it's hard to tell whether a violation is minor (e.g., profanity) or critical (e.g., illicit behavior) without digging through each policy.
Severity score solves this: a top-level model outputs a single, opinionated score across all your policies, weighting violations by impact so you can see urgency at a glance.
For example, illicit content ranks above hate, which ranks above swearing.
Use severity score to:
- Sort review queues by severity.
- Set thresholds to automate actions (e.g., block above 90%, review above 50%).
- Understand historical impact on your dashboard, including how many items would have been blocked. Note: historical data covers the past 30 days and is available only if you've used an API endpoint that supports severity score.
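The threshold-based automation above can be sketched as follows. This is a minimal illustration, not part of the API: the `triage` function, the 0–100 score range, and the example queue are all assumptions; only the 90/50 thresholds come from the text.

```python
def triage(severity_score: float) -> str:
    """Map a severity score (assumed 0-100) to a moderation action.

    Thresholds follow the example above: block above 90, review above 50.
    """
    if severity_score > 90:
        return "block"
    if severity_score > 50:
        return "review"
    return "allow"

# Sorting a review queue so the most severe items surface first.
queue = [
    {"id": "a", "severity_score": 55.0},
    {"id": "b", "severity_score": 95.0},
    {"id": "c", "severity_score": 10.0},
]
queue.sort(key=lambda item: item["severity_score"], reverse=True)
```

With these thresholds, item "b" would be blocked automatically, "a" routed to human review, and "c" allowed.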
Implementation:
- Read the score at `evaluation.severity_score`.
- Read the recommended action at `recommendation.action`.
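Reading both fields from a response might look like this. The response shape shown is a hypothetical sketch; only the `evaluation.severity_score` and `recommendation.action` paths are taken from the documentation above.

```python
# Hypothetical response body; field values are illustrative.
response = {
    "evaluation": {"severity_score": 92.5},
    "recommendation": {"action": "block"},
}

# The two documented paths: the severity score and the recommended action.
score = response["evaluation"]["severity_score"]
action = response["recommendation"]["action"]
```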