Safety Check Operator
The Safety Check operator analyzes text-based content for potentially harmful or inappropriate material. It integrates toxicity detection, profanity filtering, and PII detection to help ensure that generated or user-submitted text adheres to safety guidelines. The operator is useful wherever content moderation matters, such as chatbots, social media platforms, or any system that handles user-generated text.
Requirements
- Python Packages:
  - detoxify
  - better_profanity
  - transformers (optional, for transformer-based toxicity detection)

  These can be installed via the ChatTD operator's Python manager.
- ChatTD Operator: Required and must be configured.
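To confirm the packages are visible to TouchDesigner's Python interpreter, a quick check like the following can be run from a Text DAT. This is a minimal sketch; the package names are simply the ones listed above.

```python
# Sketch: verify that the required packages can be imported from
# TouchDesigner's Python environment.
for pkg in ('detoxify', 'better_profanity', 'transformers'):
    try:
        __import__(pkg)
        print(f'{pkg}: available')
    except ImportError:
        print(f'{pkg}: missing (install via the ChatTD Python manager)')
```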
Input/Output
Inputs

- Input Table (DAT): Table containing the conversation/text to analyze. Required columns: id, role, message, timestamp.

Outputs

- Toxicity Table (DAT): Toxicity scores and details. Columns: toxicity_score, severe_toxicity, obscene, threat, insult, identity_hate, message_id, role, message, timestamp.
- Profanity Table (DAT): Profanity detection results. Columns: contains_profanity, profanity_probability, flagged_words, message_id, role, message, timestamp.
- PII Table (DAT): Personally Identifiable Information (PII) detection results. Columns: contains_pii, pii_types, confidence, message_id, role, message, timestamp.
- Summary Table (DAT): Overall safety analysis summary. Columns: metric, value.
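As an illustration, an input table with the required columns could be built in Python like this. This is a sketch: 'conversation' is a hypothetical DAT name and the rows are sample data.

```python
# Sketch: build an input Table DAT with the required columns.
# 'conversation' is a hypothetical name; adjust paths for your network.
table = parent().create(tableDAT, 'conversation')
table.clear()
table.appendRow(['id', 'role', 'message', 'timestamp'])
table.appendRow(['1', 'user', 'Hello there!', '2024-01-01T12:00:00'])
table.appendRow(['2', 'assistant', 'Hi! How can I help?', '2024-01-01T12:00:05'])
```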
Parameters
Safety Page

Start Safety Checks (Check)
op('safety_check').par.Check - Pulse - Default: None

Status (Status)
op('safety_check').par.Status - String - Default: None

Toxicity Threshold (Toxicitythreshold)
op('safety_check').par.Toxicitythreshold - Float - Default: 0 - Range: 0 to 1

Profanity Threshold (Profanitythreshold)
op('safety_check').par.Profanitythreshold - Float - Default: 0 - Range: 0 to 1

Clear Results (Clear)
op('safety_check').par.Clear - Pulse - Default: None
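For scripted control, the Safety page parameters can be set and pulsed from Python. A minimal sketch; the threshold values here are arbitrary examples.

```python
# Sketch: configure thresholds and trigger a run from Python.
sc = op('safety_check')
sc.par.Toxicitythreshold = 0.7   # flag messages scoring above 0.7
sc.par.Profanitythreshold = 0.5
sc.par.Check.pulse()             # same as pressing Start Safety Checks
# sc.par.Clear.pulse()           # clear previous results when rerunning
```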
Callbacks Page
Callback DAT (Callbackdat)
op('safety_check').par.Callbackdat - DAT - Default: ChatTD_callbacks

Edit Callbacks (Editcallbacksscript)
op('safety_check').par.Editcallbacksscript - Pulse - Default: None

Create Callbacks (Createpulse)
op('safety_check').par.Createpulse - Pulse - Default: None

onViolation (Onviolation)
op('safety_check').par.Onviolation - Toggle - Default: Off
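Wiring these parameters from Python might look like the following. This is a sketch: safety_callbacks is assumed to be a Text DAT containing an onViolation function, as described in the Callbacks section below.

```python
# Sketch: point the operator at a custom callback DAT and enable onViolation.
sc = op('safety_check')
sc.par.Callbackdat = op('safety_callbacks')
sc.par.Onviolation = True
```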
About Page
Bypass (Bypass)
op('safety_check').par.Bypass - Toggle - Default: Off

Show Built-in Parameters (Showbuiltin)
op('safety_check').par.Showbuiltin - Toggle - Default: Off

Version (Version)
op('safety_check').par.Version - String - Default: None

Last Updated (Lastupdated)
op('safety_check').par.Lastupdated - String - Default: None

Creator (Creator)
op('safety_check').par.Creator - String - Default: None

Website (Website)
op('safety_check').par.Website - String - Default: None

ChatTD Operator (Chattd)
op('safety_check').par.Chattd - OP - Default: None
Callbacks
Available Callbacks:

- onViolation

Example Callback Structure:
```python
def onViolation(info):
    # Called when a safety check fails (e.g., a toxicity/profanity threshold is exceeded).
    # The info dictionary contains details such as:
    # - op: the Safety Check operator
    # - checkType: 'toxicity' or 'profanity'
    # - messageId: ID of the violating message
    # - message: content of the violating message
    # - role: role associated with the message
    # - score: the calculated score (toxicity or profanity probability)
    # - threshold: the threshold that was exceeded
    print(f"Safety violation detected: {info.get('checkType')}")
    # Example: op('path/to/notifier').par.Sendmessage.pulse()
```
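As a variation, the callback could append each violation to a log table for later review. A sketch: violation_log is a hypothetical Table DAT whose header row matches the appended columns.

```python
def onViolation(info):
    # Append each violation to a hypothetical 'violation_log' Table DAT
    # with header: checkType | messageId | role | score | threshold
    log = op('violation_log')
    log.appendRow([
        info.get('checkType'),
        info.get('messageId'),
        info.get('role'),
        info.get('score'),
        info.get('threshold'),
    ])
```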
Performance Considerations
- Performance depends on input text size and the checks enabled.
- Transformer-based toxicity detection can be resource-intensive.
- Analyze only the necessary parts of a conversation (e.g., last_message) for better performance; batchupdate mode may be faster for large inputs (see the sketch below).
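Assuming the Analyze Mode parameter follows the same internal naming convention as the parameters listed above (Analyzemode, which this page does not confirm), restricting analysis to the latest message could look like:

```python
# Sketch: analyze only the most recent message for performance.
# 'Analyzemode' and the 'last_message' menu value are assumptions based on
# this page; verify the actual names on your operator.
sc = op('safety_check')
sc.par.Analyzemode = 'last_message'
sc.par.Check.pulse()
```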
Usage Examples
Analyzing a Full Conversation

- Create a safety_check LOP.
- Create a Table DAT with your conversation data (columns: id, role, message, timestamp) and connect it to the input of the safety_check LOP.
- Set Analyze Mode to Full Conversation.
- Select the desired Safety Checks (e.g., Toxicity Detection and Profanity Filtering).
- Adjust Toxicity Threshold and Profanity Threshold as needed.
- Pulse Start Safety Checks.
- View the results in the Toxicity Table and Profanity Table output DATs.
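The same walkthrough can be scripted end to end. A sketch, assuming the operator lives at safety_check in the current network and the Analyzemode internal name (an assumption, as noted under Performance Considerations):

```python
# Sketch: scripted version of the walkthrough above.
sc = op('safety_check')

# Build the conversation table and connect it to the operator's input.
table = parent().create(tableDAT, 'conversation')
table.clear()
table.appendRow(['id', 'role', 'message', 'timestamp'])
table.appendRow(['1', 'user', 'This is a test message.', '2024-01-01T12:00:00'])
sc.inputConnectors[0].connect(table)

# Configure the analysis (the internal name and menu value are assumptions).
sc.par.Analyzemode = 'Full Conversation'
sc.par.Toxicitythreshold = 0.7
sc.par.Profanitythreshold = 0.5

# Run the checks; results appear in the output DATs.
sc.par.Check.pulse()
```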
Using Callbacks for Violations
- Create a safety_check LOP.
- Create a Text DAT (e.g., safety_callbacks) and add the onViolation function (see the Callbacks section above).
- On the safety_check LOP, set the Callback DAT parameter to your safety_callbacks DAT.
- Enable the onViolation toggle.
- Run safety checks as usual. The onViolation function in your safety_callbacks DAT will execute whenever a threshold is exceeded.
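These steps can also be done in script. A sketch, assuming everything lives in the same network:

```python
# Sketch: create the callback DAT, register it, and run the checks.
callbacks = parent().create(textDAT, 'safety_callbacks')
callbacks.text = (
    "def onViolation(info):\n"
    "    print('violation:', info.get('checkType'), info.get('score'))\n"
)

sc = op('safety_check')
sc.par.Callbackdat = callbacks
sc.par.Onviolation = True
sc.par.Check.pulse()
```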
Common Use Cases
- Moderating chatbots and virtual assistants.
- Filtering user-generated content (comments, posts).
- Ensuring safety in text-based games or virtual worlds.
- Flagging inappropriate language in online communities.