AI Chat Moderation

Overview

XBans includes a built-in AI-powered chat moderation system based on a Naive Bayes classifier. It analyzes every chat message in real time, classifies it as toxic or clean, and automatically takes action on toxic messages.

The classifier comes pre-trained with datasets for five languages:

English (EN)
French (FR)
Spanish (ES)
Portuguese (PT)
German (DE)

This means it works out of the box without any manual training.

How It Works

The Bayesian classifier estimates how likely individual words and word combinations are to appear in toxic vs. clean messages. When a player sends a chat message:

The message is normalized (lowercased, special characters stripped, leet-speak decoded)
The message is tokenized into individual words and bigrams (word pairs)
Each token's probability of appearing in toxic vs. clean messages is looked up from the trained model
The probabilities are combined using Bayes' theorem to produce a toxicity score (0.0 to 1.0)
If the score exceeds the configured threshold, the message is flagged as toxic
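The pipeline above can be sketched in Python as a small two-class Naive Bayes model. This is a from-scratch illustration of the general technique, not XBans's code. The names tokenize, NaiveBayesToxicity and is_flagged are hypothetical, as are the Laplace smoothing and the threshold used in the demo. The sketch expects text that has already been normalized.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split a normalized message into words plus adjacent word pairs (bigrams)."""
    words = re.findall(r"\w+", text)
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


class NaiveBayesToxicity:
    """Hypothetical two-class (toxic/clean) multinomial Naive Bayes with Laplace smoothing."""

    LABELS = ("toxic", "clean")

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, message: str, label: str) -> None:
        """Count the tokens of one labeled training message."""
        self.token_counts[label].update(tokenize(message))
        self.doc_counts[label] += 1

    def score(self, message: str) -> float:
        """Return P(toxic | message) as a value between 0.0 and 1.0."""
        tokens = tokenize(message)
        vocab_size = len(set(self.token_counts["toxic"]) | set(self.token_counts["clean"]))
        total_docs = sum(self.doc_counts.values())
        log_joint = {}
        for label in self.LABELS:
            # Smoothed class prior, so an empty class never yields log(0)
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            # Laplace-smoothed token likelihoods; the extra slot reserves mass for unseen tokens
            denom = sum(self.token_counts[label].values()) + self.alpha * (vocab_size + 1)
            for tok in tokens:
                log_p += math.log((self.token_counts[label][tok] + self.alpha) / denom)
            log_joint[label] = log_p
        # Bayes' theorem: P(toxic | msg) = P(toxic, msg) / (P(toxic, msg) + P(clean, msg)),
        # evaluated as a logistic function of the log-odds to avoid overflow
        log_odds = log_joint["toxic"] - log_joint["clean"]
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1.0 + e)


def is_flagged(model: NaiveBayesToxicity, message: str, threshold: float) -> bool:
    """Flag the message when its toxicity score exceeds the threshold."""
    return model.score(message) > threshold


if __name__ == "__main__":
    model = NaiveBayesToxicity()
    for msg in ("you are an idiot", "shut up idiot noob"):
        model.train(msg, "toxic")
    for msg in ("good game everyone", "thanks for the help"):
        model.train(msg, "clean")
    print(round(model.score("you idiot"), 2), is_flagged(model, "you idiot", threshold=0.5))
```

Working in log space keeps the products of many small token probabilities from underflowing to zero on longer messages.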

The system also handles common evasion techniques such as character substitution (e.g., "h3ll0" for "hello"), spacing tricks, and Unicode lookalike characters.
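The normalization step and the evasion handling described above might look like the following Python sketch. The substitution tables are illustrative assumptions, since XBans's actual mappings are not documented here, and the function name normalize is hypothetical. Note the trade-off in any leet table: mapping digits to letters also rewrites genuine numbers, so "10" becomes "io".

```python
import re
import unicodedata

# Illustrative leet-speak table (an assumption, not XBans's actual mapping)
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"})

# A few Cyrillic letters that look like Latin ones; NFKC does not fold these
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0445": "x",  # Cyrillic small ha
})

# Three or more single letters separated by spaces, e.g. "h e l l o"
SPACED_OUT = re.compile(r"\b(?:[^\W\d_] ){2,}[^\W\d_]\b")


def normalize(message: str) -> str:
    # NFKC folds compatibility forms such as fullwidth letters into plain ones
    text = unicodedata.normalize("NFKC", message).lower()
    # Map lookalike letters and leet-speak symbols before punctuation is stripped
    text = text.translate(HOMOGLYPHS).translate(LEET)
    # Replace punctuation, symbols and underscores with spaces (accented letters are kept)
    text = re.sub(r"[^\w ]+|_", " ", text)
    # Re-join words spelled out one letter at a time: "h e l l o" -> "hello"
    text = SPACED_OUT.sub(lambda m: m.group(0).replace(" ", ""), text)
    # Collapse repeated whitespace
    return " ".join(text.split())
```

The sketch uses Unicode-aware `\w` instead of `[a-z]` so that accented letters in the French, Spanish, Portuguese and German datasets are not stripped.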

Escalating Actions

When a message is flagged, XBans does more than block it: it applies escalating actions based on how many violations the player has accumulated. Actions are configured per category (toxicity, spam, scam):

chat-ai:
  enabled: true
  min-confidence: 0.4
  monitored-categories:
    toxicity:
      reset-after: "1d"
      actions:
        1: "warn {player} {reason}"
        2: "mute {player} 10m {reason}"
        3: "mute {player} 1h {reason}"
        5: "mute {player} 1d {reason}"
        10: "ban {player} 7d {reason}"
    spam:
      reset-after: "6h"
      actions:
        1: "warn {player} {reason}"
        2: "mute {player} 5m {reason}"
        3: "mute {player} 30m {reason}"
        5: "mute {player} 3h {reason}"
    scam:
      reset-after: "0"
      actions:
        1: "warn {player} {reason}"
        2: "mute {player} 1h {reason}"
        3: "ban {player} 7d {reason}"

Actions are full XBans command strings that accept the {player} and {reason} placeholders. The {reason} placeholder is resolved from lang.yml using the key chat-ai-reason-<category>-<threshold>, so reason messages can be customized and translated. The violation counter is tracked per player and per category, and it resets after the configured reset-after duration (use "0" to never reset).
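The escalation rules can be modeled with a per-player, per-category counter. This Python sketch is hypothetical: Category, ViolationTracker and parse_duration are not XBans APIs, and the accepted duration units (s, m, h, d) are assumed. It also makes two assumptions that the documentation does not settle. First, when the count falls between configured thresholds (for example 4), the highest threshold already reached applies. Second, the reset-after window is measured from the player's most recent violation in that category.

```python
from __future__ import annotations

import re
import time
from dataclasses import dataclass, field

# Seconds per unit; which units XBans accepts is an assumption here
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(spec: str) -> float | None:
    """Convert a reset-after value such as "6h" to seconds; "0" means never reset (None)."""
    spec = spec.strip()
    if spec == "0":
        return None
    match = re.fullmatch(r"(\d+)([smhd])", spec)
    if match is None:
        raise ValueError(f"unrecognized duration: {spec!r}")
    return float(int(match.group(1)) * UNITS[match.group(2)])


@dataclass
class Category:
    reset_after: float | None      # seconds, or None to never reset
    actions: dict[int, str]        # violation count -> command template


@dataclass
class ViolationTracker:
    categories: dict[str, Category]
    lang: dict[str, str]           # stand-in for the lang.yml entries
    # (player, category) -> (violation count, time of most recent violation)
    state: dict[tuple[str, str], tuple[int, float]] = field(default_factory=dict)

    def record(self, player: str, category: str, now: float | None = None) -> str | None:
        """Register one violation; return the command to run, or None if no threshold is reached."""
        now = time.time() if now is None else now
        cat = self.categories[category]
        count, last = self.state.get((player, category), (0, now))
        # Assumption: the reset window is measured from the most recent violation
        if cat.reset_after is not None and now - last >= cat.reset_after:
            count = 0
        count += 1
        self.state[(player, category)] = (count, now)
        # Assumption: the highest configured threshold not above the count applies
        reached = [t for t in cat.actions if t <= count]
        if not reached:
            return None
        threshold = max(reached)
        reason = self.lang.get(f"chat-ai-reason-{category}-{threshold}", "")
        return cat.actions[threshold].replace("{player}", player).replace("{reason}", reason)
```

The sketch substitutes placeholders with plain string replacement rather than str.format. Command strings may contain other braces, and str.format would reject those.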

Players with the xbans.bypass.chatai permission are exempt from AI moderation. Grant this to trusted staff or VIP players.

Custom Training Data

While the pre-trained models work well for general moderation, you can improve accuracy for your server's specific community by adding custom training data.

Adding training data

Training data files are located in plugins/XCore/addons/XBans/ai/. There are two files per language (shown here for English):

toxic_en.txt — Toxic messages (one per line)
clean_en.txt — Clean messages (one per line)

To add custom training data:

Open the appropriate toxic/clean file for your language.
Add your custom messages, one per line. Include real examples from your server's chat logs.
Run /xbans reload to retrain the model with the updated data.
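If you collect examples from chat logs, a small script can append them without breaking the one-message-per-line format. In this Python sketch, the helper name append_training_examples and the UTF-8 encoding are assumptions. So is the idea that other languages follow the same toxic_<lang>.txt / clean_<lang>.txt naming as the English files.

```python
from __future__ import annotations

from pathlib import Path


def append_training_examples(ai_dir: Path, kind: str, lang: str, messages: list[str]) -> int:
    """Append new examples to <kind>_<lang>.txt, one per line; return how many were added."""
    if kind not in ("toxic", "clean"):
        raise ValueError("kind must be 'toxic' or 'clean'")
    path = ai_dir / f"{kind}_{lang}.txt"
    content = path.read_text(encoding="utf-8") if path.exists() else ""
    seen = {line.strip() for line in content.splitlines()}
    new_lines = []
    for msg in messages:
        # Collapse newlines, tabs and repeated spaces so each example stays on one line
        line = " ".join(msg.split())
        # Skip empty messages and exact duplicates of existing or earlier examples
        if line and line not in seen:
            new_lines.append(line)
            seen.add(line)
    if new_lines:
        # Start on a fresh line if the existing file lacks a trailing newline
        prefix = "\n" if content and not content.endswith("\n") else ""
        with path.open("a", encoding="utf-8") as f:
            f.write(prefix + "\n".join(new_lines) + "\n")
    return len(new_lines)
```

For example, call append_training_examples(Path("plugins/XCore/addons/XBans/ai"), "toxic", "en", [...]) with your collected messages, then run /xbans reload as described above.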

More training data generally makes the classifier more accurate, as long as the examples are labeled correctly. Aim for a balanced dataset with roughly equal numbers of toxic and clean examples. Including server-specific slang and phrases common in your community can noticeably improve detection.

Avoid training on very short messages (1-2 words), as they lack context. Focus on full sentences and phrases that clearly represent toxic or clean speech.
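Before reloading, you might check a dataset against these guidelines. This hypothetical Python helper reports line counts, a balance ratio, and lines shorter than a configurable word count. A balance of 1.0 means equal numbers of toxic and clean examples. The three-word default follows the 1-2 word guidance above, and the file naming is assumed to match the English files.

```python
from __future__ import annotations

from pathlib import Path


def audit_dataset(ai_dir: Path, lang: str, min_words: int = 3) -> dict:
    """Summarize toxic/clean example counts, their balance, and overly short examples."""
    report: dict = {}
    for kind in ("toxic", "clean"):
        path = ai_dir / f"{kind}_{lang}.txt"
        # Ignore blank lines; each remaining line is one training example
        lines = [ln.strip() for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
        report[kind] = len(lines)
        report[f"short_{kind}"] = [ln for ln in lines if len(ln.split()) < min_words]
    larger = max(report["toxic"], report["clean"])
    # 1.0 means equal counts; values near 0 mean one class dominates
    report["balance"] = min(report["toxic"], report["clean"]) / larger if larger else 1.0
    return report
```

A low balance value or a long list of short examples suggests which file needs more (or fuller) entries before you run /xbans reload.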