Safety Check Operator
The Safety Check operator analyzes text-based content for potentially harmful or inappropriate material. It integrates toxicity detection, profanity filtering, and PII detection to help ensure that generated or user-submitted text adheres to safety guidelines. The operator is useful wherever content moderation matters, such as chatbots, social media platforms, or any system that handles user-generated text.
Requirements
- Python Packages:
  - detoxify
  - better_profanity
  - transformers (optional, for transformer-based toxicity detection)

  These can be installed via the ChatTD operator's Python manager.
- ChatTD Operator: Required and must be configured.
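To confirm the packages are visible to TouchDesigner's Python interpreter, a quick check like the following can be run from a Text DAT. This is a minimal sketch; the package names are simply the ones listed above.

```python
# Sketch: verify that the required packages can be imported from
# TouchDesigner's Python environment.
for pkg in ('detoxify', 'better_profanity', 'transformers'):
    try:
        __import__(pkg)
        print(f'{pkg}: available')
    except ImportError:
        print(f'{pkg}: missing (install via the ChatTD Python manager)')
```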
Input/Output
Inputs

- Input Table (DAT): Table containing the conversation/text to analyze. Required columns: id, role, message, timestamp.

Outputs

- Toxicity Table (DAT): Toxicity scores and details. Columns: toxicity_score, severe_toxicity, obscene, threat, insult, identity_hate, message_id, role, message, timestamp.
- Profanity Table (DAT): Profanity detection results. Columns: contains_profanity, profanity_probability, flagged_words, message_id, role, message, timestamp.
- PII Table (DAT): Personally Identifiable Information (PII) detection results. Columns: contains_pii, pii_types, confidence, message_id, role, message, timestamp.
- Summary Table (DAT): Overall safety analysis summary. Columns: metric, value.
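As an illustration, an input table with the required columns could be built in Python like this. This is a sketch: 'conversation' is a hypothetical DAT name and the rows are sample data.

```python
# Sketch: build an input Table DAT with the required columns.
# 'conversation' is a hypothetical name; adjust paths for your network.
table = parent().create(tableDAT, 'conversation')
table.clear()
table.appendRow(['id', 'role', 'message', 'timestamp'])
table.appendRow(['1', 'user', 'Hello there!', '2024-01-01T12:00:00'])
table.appendRow(['2', 'assistant', 'Hi! How can I help?', '2024-01-01T12:00:05'])
```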
Parameters
Safety Page

Start Safety Checks (Check)
op('safety_check').par.Check - Pulse - Default: None

Status (Status)
op('safety_check').par.Status - String - Default: None

Toxicity Threshold (Toxicitythreshold)
op('safety_check').par.Toxicitythreshold - Float - Default: 0 - Range: 0 to 1

Profanity Threshold (Profanitythreshold)
op('safety_check').par.Profanitythreshold - Float - Default: 0 - Range: 0 to 1

Clear Results (Clear)
op('safety_check').par.Clear - Pulse - Default: None
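For scripted control, the Safety page parameters can be set and pulsed from Python. A minimal sketch; the threshold values here are arbitrary examples.

```python
# Sketch: configure thresholds and trigger a run from Python.
sc = op('safety_check')
sc.par.Toxicitythreshold = 0.7   # flag messages scoring above 0.7
sc.par.Profanitythreshold = 0.5
sc.par.Check.pulse()             # same as pressing Start Safety Checks
# sc.par.Clear.pulse()           # clear previous results when rerunning
```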
Callbacks Page
Callback DAT (Callbackdat)
op('safety_check').par.Callbackdat - DAT - Default: ChatTD_callbacks

Edit Callbacks (Editcallbacksscript)
op('safety_check').par.Editcallbacksscript - Pulse - Default: None

Create Callbacks (Createpulse)
op('safety_check').par.Createpulse - Pulse - Default: None

onViolation (Onviolation)
op('safety_check').par.Onviolation - Toggle - Default: Off
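Wiring these parameters from Python might look like the following. This is a sketch: safety_callbacks is assumed to be a Text DAT containing an onViolation function, as described in the Callbacks section below.

```python
# Sketch: point the operator at a custom callback DAT and enable onViolation.
sc = op('safety_check')
sc.par.Callbackdat = op('safety_callbacks')
sc.par.Onviolation = True
```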
About Page
Bypass (Bypass)
op('safety_check').par.Bypass - Toggle - Default: Off

Show Built-in Parameters (Showbuiltin)
op('safety_check').par.Showbuiltin - Toggle - Default: Off

Version (Version)
op('safety_check').par.Version - String - Default: None

Last Updated (Lastupdated)
op('safety_check').par.Lastupdated - String - Default: None

Creator (Creator)
op('safety_check').par.Creator - String - Default: None

Website (Website)
op('safety_check').par.Website - String - Default: None

ChatTD Operator (Chattd)
op('safety_check').par.Chattd - OP - Default: None
Callbacks
Available Callbacks:

- onViolation

Example Callback Structure:
```python
def onViolation(info):
    # Called when a safety check fails (e.g., a toxicity/profanity threshold is exceeded).
    # The info dictionary contains details such as:
    # - op: the Safety Check operator
    # - checkType: 'toxicity' or 'profanity'
    # - messageId: ID of the violating message
    # - message: content of the violating message
    # - role: role associated with the message
    # - score: the calculated score (toxicity or profanity probability)
    # - threshold: the threshold that was exceeded
    print(f"Safety violation detected: {info.get('checkType')}")
    # Example: op('path/to/notifier').par.Sendmessage.pulse()
```
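As a variation, the callback could append each violation to a log table for later review. A sketch: violation_log is a hypothetical Table DAT whose header row matches the appended columns.

```python
def onViolation(info):
    # Append each violation to a hypothetical 'violation_log' Table DAT
    # with header: checkType | messageId | role | score | threshold
    log = op('violation_log')
    log.appendRow([
        info.get('checkType'),
        info.get('messageId'),
        info.get('role'),
        info.get('score'),
        info.get('threshold'),
    ])
```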
Performance Considerations
- Performance depends on input text size and the checks enabled.
- Transformer-based toxicity detection can be resource-intensive.
- Analyze only the necessary parts of a conversation (e.g., last_message) for better performance; batchupdate mode may be faster for large inputs (see the sketch below).
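Assuming the Analyze Mode parameter follows the same internal naming convention as the parameters listed above (Analyzemode, which this page does not confirm), restricting analysis to the latest message could look like:

```python
# Sketch: analyze only the most recent message for performance.
# 'Analyzemode' and the 'last_message' menu value are assumptions based on
# this page; verify the actual names on your operator.
sc = op('safety_check')
sc.par.Analyzemode = 'last_message'
sc.par.Check.pulse()
```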
Usage Examples
Analyzing a Full Conversation

- Create a safety_check LOP.
- Create a Table DAT with your conversation data (columns: id, role, message, timestamp) and connect it to the input of the safety_check LOP.
- Set Analyze Mode to Full Conversation.
- Select the desired Safety Checks (e.g., Toxicity Detection and Profanity Filtering).
- Adjust Toxicity Threshold and Profanity Threshold as needed.
- Pulse Start Safety Checks.
- View the results in the Toxicity Table and Profanity Table output DATs.
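The same walkthrough can be scripted end to end. A sketch, assuming the operator lives at safety_check in the current network and the Analyzemode internal name (an assumption, as noted under Performance Considerations):

```python
# Sketch: scripted version of the walkthrough above.
sc = op('safety_check')

# Build the conversation table and connect it to the operator's input.
table = parent().create(tableDAT, 'conversation')
table.clear()
table.appendRow(['id', 'role', 'message', 'timestamp'])
table.appendRow(['1', 'user', 'This is a test message.', '2024-01-01T12:00:00'])
sc.inputConnectors[0].connect(table)

# Configure the analysis (the internal name and menu value are assumptions).
sc.par.Analyzemode = 'Full Conversation'
sc.par.Toxicitythreshold = 0.7
sc.par.Profanitythreshold = 0.5

# Run the checks; results appear in the output DATs.
sc.par.Check.pulse()
```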
Using Callbacks for Violations
- Create a safety_check LOP.
- Create a Text DAT (e.g., safety_callbacks) and add the onViolation function (see the Callbacks section above).
- On the safety_check LOP, set the Callback DAT parameter to your safety_callbacks DAT.
- Enable the onViolation toggle.
- Run safety checks as usual. The onViolation function in your safety_callbacks DAT will execute whenever a threshold is exceeded.
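These steps can also be done in script. A sketch, assuming everything lives in the same network:

```python
# Sketch: create the callback DAT, register it, and run the checks.
callbacks = parent().create(textDAT, 'safety_callbacks')
callbacks.text = (
    "def onViolation(info):\n"
    "    print('violation:', info.get('checkType'), info.get('score'))\n"
)

sc = op('safety_check')
sc.par.Callbackdat = callbacks
sc.par.Onviolation = True
sc.par.Check.pulse()
```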
Common Use Cases
- Moderating chatbots and virtual assistants.
- Filtering user-generated content (comments, posts).
- Ensuring safety in text-based games or virtual worlds.
- Flagging inappropriate language in online communities.