API Documentation

Toxly Content Safety API

Classify user-generated content with structured decisions, risk scores, policy matches, category scores, and privacy-conscious moderation logs.

Base URL for API calls: https://dashboard.toxly.net

Core concepts

Project

A workspace for one product, environment, or customer. Projects own API keys, policies, logs, usage, and limits.

API key

A secret token that authenticates API requests. Keys are stored hashed and can be revoked without deleting the project.

Policy

A list of category thresholds and actions. Policies decide whether content is allowed, warned, reviewed, blocked, masked, or escalated.

Provider

The classifier behind moderation. Toxly currently uses fast rule-based checks and is prepared for Ollama, vLLM, and other model providers.

Quickstart

  1. Create an account in the dashboard.
  2. Create a project for your product or environment.
  3. Create an API key and copy it once.
  4. Send text to POST /v1/moderate/text.
  5. Use decision, allowed, and risk_score in your app logic.

curl -X POST https://dashboard.toxly.net/v1/moderate/text \
  -H "Content-Type: application/json" \
  -H "X-Toxly-Key: txly_xxxxx" \
  -d '{"text":"message from your app"}'

Authentication

All public API requests require the X-Toxly-Key header. Never expose this key in frontend code. Call Toxly from your backend, serverless function, worker, or trusted infrastructure.

X-Toxly-Key: txly_xxxxxxxxx

Header: X-Toxly-Key
Format: txly_...
Storage: Hashed at rest. The full key is shown only during creation.
Scope: One key belongs to one project.

POST /v1/moderate/text

Moderates one text input and returns a structured policy decision. This is the primary endpoint for comments, chat messages, prompts, bios, support tickets, and other user-generated text.

POST https://dashboard.toxly.net/v1/moderate/text

Minimal request

{
  "text": "hello world"
}

Request with policy and metadata

{
  "text": "message from a user",
  "policy_id": 12,
  "metadata": {
    "source": "comment",
    "user_id": "usr_123",
    "locale": "en"
  }
}

Request fields

text: Required string. The content to classify. Keep it concise for best latency.
policy_id: Optional integer. Uses the project's default policy when omitted.
metadata: Optional object. Useful for your own context. Metadata is not required for scoring.

Toxly is designed for moderation, not text storage. Do not send secrets, tokens, passwords, or unnecessary personal data.
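The field rules above can be checked locally before a request is sent. The following is a minimal sketch, not part of any Toxly SDK: build_moderation_request is a hypothetical helper name, and rejecting empty text is an assumption, since the documentation only states that text is a required string.

```python
from typing import Any, Optional


def build_moderation_request(
    text: str,
    policy_id: Optional[int] = None,
    metadata: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a JSON-serializable body for POST /v1/moderate/text."""
    # Assumption: empty or whitespace-only text is treated as invalid.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be a non-empty string")
    body: dict[str, Any] = {"text": text}

    # Optional fields are left out entirely when not provided, so the
    # project's default policy applies when policy_id is omitted.
    if policy_id is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(policy_id, bool) or not isinstance(policy_id, int):
            raise ValueError("policy_id must be an integer")
        body["policy_id"] = policy_id
    if metadata is not None:
        if not isinstance(metadata, dict):
            raise ValueError("metadata must be an object")
        body["metadata"] = metadata
    return body
```

The returned dictionary can be passed as the json argument of an HTTP client call.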

Response format

Every successful moderation response follows the same shape, so you can route decisions consistently inside your product.

{
  "request_id": "mod_xxx",
  "allowed": false,
  "decision": "block",
  "risk_score": 0.82,
  "categories": {
    "toxicity": 0.74,
    "hate": 0.82,
    "harassment": 0.12,
    "violence": 0.0,
    "self_harm": 0.0,
    "sexual": 0.0,
    "minor_safety": 0.0,
    "spam": 0.0,
    "scam": 0.0,
    "pii": 0.02,
    "jailbreak": 0.0,
    "prompt_injection": 0.0
  },
  "matched_rules": [
    {"category": "hate", "threshold": 0.70, "decision": "block"}
  ],
  "reason": "Policy threshold matched.",
  "latency_ms": 220
}

request_id: Unique moderation request ID for logs and support.
allowed: Boolean convenience field. Usually false for block, mask, and escalate.
decision: Final policy action: allow, warn, review, block, mask, or escalate.
risk_score: Overall risk score from 0.0 to 1.0.
categories: Per-category scores from 0.0 to 1.0.
matched_rules: Policy rules that triggered the final decision.
reason: Short explanation suitable for internal dashboards.
latency_ms: Server-side moderation latency in milliseconds.
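Because the response shape is stable, it can be convenient to load it into a typed object on the client side. This is a sketch only: ModerationResult, parse_moderation_result, and highest_category are hypothetical names, and treating matched_rules and reason as optional when parsing is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ModerationResult:
    request_id: str
    allowed: bool
    decision: str
    risk_score: float
    categories: dict[str, float]
    matched_rules: list[dict[str, Any]]
    reason: str
    latency_ms: int


def parse_moderation_result(payload: dict[str, Any]) -> ModerationResult:
    """Convert a decoded JSON response body into a typed object."""
    return ModerationResult(
        request_id=payload["request_id"],
        allowed=bool(payload["allowed"]),
        decision=payload["decision"],
        risk_score=float(payload["risk_score"]),
        categories={k: float(v) for k, v in payload["categories"].items()},
        # Assumption: these two fields are tolerated as missing.
        matched_rules=list(payload.get("matched_rules", [])),
        reason=payload.get("reason", ""),
        latency_ms=int(payload["latency_ms"]),
    )


def highest_category(result: ModerationResult) -> tuple[str, float]:
    """Return the (category, score) pair with the largest score."""
    return max(result.categories.items(), key=lambda item: item[1])
```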

Decisions

allow: Low risk. Content can continue without interruption.
warn: Mild risk. You may show a warning or soft friction.
review: Uncertain or borderline. Queue for human review or apply delayed publishing.
block: Clear policy violation. Prevent publishing or sending.
mask: Potential sensitive data. Redact or avoid displaying the content.
escalate: High-severity safety case, such as high self-harm risk. Escalate to a trained workflow.

Categories

Scores are floats between 0.0 and 1.0. Higher values mean higher confidence or higher risk for that category.

toxicity, hate, harassment, violence, self_harm, sexual, minor_safety, spam, scam, pii, jailbreak, prompt_injection

toxicity: General abuse, insults, aggressive language, or hostile tone.
hate: Attacks or slurs targeting protected classes or identity groups.
harassment: Bullying, threats, repeated targeting, intimidation, or degrading behavior.
violence: Violent threats, instructions, celebration, or graphic violent content.
self_harm: Self-harm intent, encouragement, crisis signals, or unsafe instructions.
sexual: Sexual content, explicit requests, or adult material.
minor_safety: Any sexual or exploitative context involving minors. This should be treated with very low tolerance.
spam: Mass posting, repetitive promotion, low-quality abuse, or bot-like content.
scam: Fraud, phishing, impersonation, suspicious offers, or manipulation.
pii: Personal data such as emails, phone numbers, addresses, IDs, or secrets.
jailbreak: Attempts to bypass model safety rules or override system instructions.
prompt_injection: Attempts to hijack an AI app, reveal hidden prompts, or manipulate tool behavior.

Policies

Policies convert classifier scores into product decisions. A policy rule compares one category score against a threshold and applies an action when the threshold is exceeded.

{
  "rules": [
    {"category": "hate", "threshold": 0.70, "action": "block"},
    {"category": "self_harm", "threshold": 0.65, "action": "escalate"},
    {"category": "pii", "threshold": 0.80, "action": "mask"}
  ]
}

If multiple rules match, Toxly chooses the strongest relevant action based on the policy evaluation order.
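To make the rule semantics concrete, here is a rough local sketch of how scores could be turned into a decision. It is not Toxly's actual evaluator: evaluate_policy is a hypothetical function, and the severity ranking used to pick the "strongest" action is an assumption, because the exact ordering is not documented here.

```python
from typing import Any

# Assumed ranking from weakest to strongest action. The real evaluation
# order is not specified in this documentation.
ASSUMED_SEVERITY = ["allow", "warn", "review", "mask", "block", "escalate"]


def evaluate_policy(
    categories: dict[str, float],
    rules: list[dict[str, Any]],
) -> tuple[str, list[dict[str, Any]]]:
    """Return (decision, matched_rules) for the given category scores."""
    # A rule matches when its category score strictly exceeds the threshold.
    matched = [
        rule for rule in rules
        if categories.get(rule["category"], 0.0) > rule["threshold"]
    ]
    if not matched:
        return "allow", []
    # Pick the matched rule whose action ranks highest in the assumed order.
    strongest = max(matched, key=lambda rule: ASSUMED_SEVERITY.index(rule["action"]))
    return strongest["action"], matched


rules = [
    {"category": "hate", "threshold": 0.70, "action": "block"},
    {"category": "self_harm", "threshold": 0.65, "action": "escalate"},
    {"category": "pii", "threshold": 0.80, "action": "mask"},
]
decision, matched = evaluate_policy({"hate": 0.82, "pii": 0.90}, rules)
```

With the scores above, both the hate and pii rules match, and under the assumed ranking the block action wins over mask.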

Default policy

New projects start with a default policy designed for a strict MVP moderation flow.

toxicity > 0.75: block
hate > 0.70: block
harassment > 0.75: block
violence > 0.80: block
self_harm > 0.65: escalate
sexual > 0.75: block
minor_safety > 0.20: block
spam > 0.80: block
scam > 0.70: block
pii > 0.80: mask
jailbreak > 0.70: block
prompt_injection > 0.70: block

Integration examples

JavaScript fetch

const response = await fetch("https://dashboard.toxly.net/v1/moderate/text", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-Toxly-Key": process.env.TOXLY_API_KEY
  },
  body: JSON.stringify({ text: "message from your app" })
});

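if (!response.ok) {
  // 4xx and 5xx responses carry an error, not a moderation result
  throw new Error(`Toxly request failed with status ${response.status}`);
}

const result = await response.json();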
if (!result.allowed) {
  // block, review, mask, or escalate in your product flow
}

Python requests

import os
import requests

response = requests.post(
    "https://dashboard.toxly.net/v1/moderate/text",
    headers={
        "Content-Type": "application/json",
        "X-Toxly-Key": os.environ["TOXLY_API_KEY"],
    },
    json={"text": "message from your app"},
    timeout=5,
)

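# Raise on 4xx/5xx so error bodies are not treated as moderation results
response.raise_for_status()
result = response.json()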
print(result["decision"], result["risk_score"])

Backend routing example

switch (result.decision) {
  case "allow":
    publishContent();
    break;
  case "warn":
    showUserWarning();
    break;
  case "review":
    sendToModerationQueue();
    break;
  case "block":
    rejectContent();
    break;
  case "mask":
    redactContent();
    break;
  case "escalate":
    triggerSafetyWorkflow();
    break;
}

Rate limits

Rate limits are enforced per API key and project plan. If the limit is exceeded, Toxly returns 429.

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1760000000

Free: 1,000 requests/month
Starter: 50,000 requests/month
Pro: 250,000 requests/month
Scale: 1,000,000 requests/month
Business: Custom volume by agreement
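A client can use the rate limit headers to decide how long to wait before retrying after a 429. The sketch below assumes that X-RateLimit-Reset is a Unix timestamp in seconds, which the example value suggests but the documentation does not state; seconds_until_reset and is_exhausted are hypothetical helper names.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(
    headers: Mapping[str, str],
    now: Optional[float] = None,
) -> float:
    """Return how many seconds remain until the rate limit window resets."""
    reset = headers.get("X-RateLimit-Reset")
    if reset is None:
        return 0.0
    current = time.time() if now is None else now
    # Assumption: the header holds a Unix timestamp in seconds.
    return max(0.0, float(reset) - current)


def is_exhausted(headers: Mapping[str, str]) -> bool:
    """Return True when the current window has no requests left."""
    remaining = headers.get("X-RateLimit-Remaining")
    return remaining is not None and int(remaining) <= 0
```

Note that a plain dict lookup is case-sensitive, while the headers object of most HTTP clients is not. Waiting for the reset will not help when the 429 is caused by the monthly plan limit rather than the short-term window.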

Logs & privacy

Toxly stores moderation results for debugging and usage visibility, but avoids retaining the complete original text in moderation logs.

Stored: Request ID, project ID, decision, allowed, risk score, category scores, matched rules, reason, latency, text hash, and short preview.
Not stored: Full original text in moderation logs.
Preview: Short text preview for operational debugging.
Hash: SHA-256 hash of the full text for deduplication and traceability without storing full content.
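The hash-plus-preview idea can be reproduced in your own logs so that entries can be correlated without keeping the full text. This is an illustrative sketch: text_fingerprint is a hypothetical helper, and both the UTF-8 encoding and the 40-character preview length are assumptions, since the exact encoding and preview size used by Toxly are not documented here.

```python
import hashlib


def text_fingerprint(text: str, preview_chars: int = 40) -> dict[str, str]:
    """Return a SHA-256 hex digest and a short preview of the text."""
    # Assumption: the text is hashed as UTF-8 bytes.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Assumption: the preview is simply the first preview_chars characters.
    preview = text[:preview_chars]
    return {"text_hash": digest, "preview": preview}
```

Identical inputs always produce the same hash, which is what makes deduplication possible without storing the content itself.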

AI providers

Toxly uses a provider interface so moderation can move between fast rules, local models, and OpenAI-compatible endpoints.

Dummy provider

Fast rule-based checks for immediate local testing and predictable fallback behavior.

Ollama provider

Prepared for local model inference through Ollama. Useful for self-hosted or private deployments.

vLLM provider

Prepared for OpenAI-compatible chat completions served by vLLM or similar infrastructure.

OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama-guard3:1b

VLLM_BASE_URL=http://localhost:8000/v1
VLLM_MODEL=toxly-guard
VLLM_API_KEY=local-key
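For a self-hosted setup, the environment variables above could be read into a small configuration object. The sketch below is purely illustrative: ProviderConfig and select_provider are hypothetical names, and the precedence (vLLM first, then Ollama, then the dummy provider) is an assumption, not documented Toxly behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    name: str
    base_url: Optional[str] = None
    model: Optional[str] = None
    api_key: Optional[str] = None


def select_provider(env: Mapping[str, str]) -> ProviderConfig:
    """Pick a provider based on which environment variables are set."""
    # Assumed precedence: vLLM, then Ollama, then the rule-based fallback.
    if env.get("VLLM_BASE_URL"):
        return ProviderConfig(
            name="vllm",
            base_url=env["VLLM_BASE_URL"],
            model=env.get("VLLM_MODEL"),
            api_key=env.get("VLLM_API_KEY"),
        )
    if env.get("OLLAMA_BASE_URL"):
        return ProviderConfig(
            name="ollama",
            base_url=env["OLLAMA_BASE_URL"],
            model=env.get("OLLAMA_MODEL"),
        )
    return ProviderConfig(name="dummy")


config = select_provider(os.environ)
```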

Dashboard

The dashboard is available at dashboard.toxly.net. It is designed for project management and operational moderation workflows.

Projects, API keys, Policies, Logs, Usage, Test console, Project settings, Account settings, Plans

Errors

400: Invalid JSON body or invalid field value.
401: Missing, malformed, inactive, or invalid API key.
404: Policy or project resource not found, or the caller does not own it.
413: Request body is too large.
429: Rate limit or monthly project limit exceeded.
500: Unexpected server error. Retry with backoff and inspect dashboard logs.
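One way to act on these status codes is to map each one to a coarse handling strategy and to compute retry delays for the 500 case. This is a sketch under stated assumptions: classify_status, backoff_delays, and the strategy labels are hypothetical, and the exponential schedule (0.5 s base, 8 s cap) is an illustrative choice rather than a Toxly recommendation.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a coarse client-side handling strategy."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 401:
        return "fix_credentials"   # missing, malformed, or inactive key
    if status_code in (400, 404, 413):
        return "fix_request"       # retrying the same request will not help
    if status_code == 429:
        return "rate_limited"      # wait for the window or check plan usage
    if status_code == 500:
        return "retry"             # retry with backoff
    return "unexpected"


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Return exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]
```

For example, backoff_delays(5) yields delays of 0.5, 1, 2, 4, and 8 seconds between successive retries.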

Best practices

  • Call Toxly from your backend, not directly from a browser.
  • Use separate projects for production, staging, and experiments.
  • Start strict for high-risk surfaces like public chat and profiles.
  • Route review to a moderation queue instead of silently allowing it.
  • Escalate high self-harm risk into a human-reviewed safety workflow.
  • Do not log API keys, full user text, passwords, or secrets.
  • Use timeouts and fallback behavior in your own application.
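The last point, timeouts with fallback behavior, might look like the sketch below. It is one possible approach, not a prescribed one: moderate_with_fallback is a hypothetical wrapper, the "fallback" key is not part of the Toxly response, and defaulting to review (failing closed) is an assumption that suits high-risk surfaces but may be too strict for others.

```python
from typing import Any, Callable


def moderate_with_fallback(
    call: Callable[[], dict[str, Any]],
    fallback_decision: str = "review",
) -> dict[str, Any]:
    """Run a moderation call and return a local decision if it fails.

    `call` is any zero-argument function that performs the HTTP request
    (with its own timeout) and returns the decoded JSON response.
    """
    try:
        return call()
    except Exception:
        # Broad catch on purpose: timeouts, connection errors, and HTTP
        # errors all mean no trustworthy moderation result is available.
        return {
            "allowed": False,
            "decision": fallback_decision,
            "reason": "Moderation unavailable; local fallback applied.",
            "fallback": True,  # hypothetical marker for your own logs
        }
```

The wrapper keeps the routing logic unchanged, because the fallback result has the same decision and allowed fields as a normal response.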

Roadmap

The current API focuses on text moderation. Image moderation is planned with the same response shape so product logic can stay consistent.

POST /v1/moderate/image

multipart/form-data:
  image: file
  text: optional string

Future image providers may include Llama Guard vision models, Llama 3.2 Vision, Qwen2.5-VL, or other local multimodal safety models.