Toxly API Documentation

Core concepts

Project

A workspace for one product, environment, or customer. Projects own API keys, policies, logs, usage, and limits.

API key

A secret token that authenticates API requests. Keys are stored hashed and can be revoked without deleting the project.

Policy

A list of category thresholds and actions. Policies decide whether content is allowed, warned, reviewed, blocked, masked, or escalated.

Provider

The classifier behind moderation. Toxly currently uses fast rule-based checks and is prepared for Ollama, vLLM, and other model providers.

Quickstart

1. Create an account in the dashboard.
2. Create a project for your product or environment.
3. Create an API key and copy it right away; the full key is shown only once.
4. Send text to POST /v1/moderate/text.
5. Use decision, allowed, and risk_score in your app logic.

curl -X POST https://dashboard.toxly.net/v1/moderate/text \
  -H "Content-Type: application/json" \
  -H "X-Toxly-Key: txly_xxxxx" \
  -d '{"text":"message from your app"}'

Authentication

All public API requests require the X-Toxly-Key header. Never expose this key in frontend code. Call Toxly from your backend, serverless function, worker, or trusted infrastructure.

X-Toxly-Key: txly_xxxxxxxxx

Header: X-Toxly-Key

Format: txly_...

Storage: Hashed at rest. The full key is shown only during creation.

Scope: One key belongs to one project.
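The rules above can be sketched as a small server-side helper. This is an illustration, not part of Toxly: the function name build_headers is hypothetical, and it assumes the key lives in an environment variable named TOXLY_API_KEY, as in the integration examples. The prefix check relies only on the documented txly_ format.

```python
import os


def build_headers(env=None):
    """Build Toxly request headers from a server-side environment mapping."""
    env = os.environ if env is None else env
    key = env.get("TOXLY_API_KEY", "")
    # The documented key format starts with "txly_"; fail early on anything else.
    if not key.startswith("txly_"):
        raise ValueError("TOXLY_API_KEY is missing or does not start with 'txly_'")
    return {
        "Content-Type": "application/json",
        "X-Toxly-Key": key,
    }
```

Failing at startup when the key is absent keeps a misconfigured deployment from sending unauthenticated requests that would only come back as 401.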

POST /v1/moderate/text

Moderates one text input and returns a structured policy decision. This is the primary endpoint for comments, chat messages, prompts, bios, support tickets, and other user-generated text.

POST https://dashboard.toxly.net/v1/moderate/text

Minimal request

{
  "text": "hello world"
}

Request with policy and metadata

{
  "text": "message from a user",
  "policy_id": 12,
  "metadata": {
    "source": "comment",
    "user_id": "usr_123",
    "locale": "en"
  }
}

Request fields

text: Required string. The content to classify. Keep it concise for best latency.

policy_id: Optional integer. Uses the project's default policy when omitted.

metadata: Optional object. Useful for your own context. Metadata is not required for scoring.

Toxly is designed for moderation, not text storage. Do not send secrets, tokens, passwords, or unnecessary personal data.
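As a rough sketch, the field rules above can be checked on the client before a request is sent. The helper name build_payload is hypothetical, and rejecting an empty string is an assumption; the documentation only states that text is a required string.

```python
import json


def build_payload(text, policy_id=None, metadata=None):
    """Validate the documented request fields and return a JSON-ready dict."""
    # Assumption: an empty string is treated as missing.
    if not isinstance(text, str) or not text:
        raise ValueError("text is required and must be a non-empty string")
    payload = {"text": text}
    if policy_id is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(policy_id, bool) or not isinstance(policy_id, int):
            raise ValueError("policy_id must be an integer")
        payload["policy_id"] = policy_id
    if metadata is not None:
        if not isinstance(metadata, dict):
            raise ValueError("metadata must be an object (dict)")
        payload["metadata"] = metadata
    return payload


# Serialized body for the request with policy and metadata shown earlier.
body = json.dumps(
    build_payload("message from a user", policy_id=12, metadata={"source": "comment"})
)
```

Omitted optional fields are left out of the payload entirely, so the project's default policy applies when policy_id is not given.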

Response format

Every successful moderation response follows the same shape, so you can route decisions consistently inside your product.

{
  "request_id": "mod_xxx",
  "allowed": false,
  "decision": "block",
  "risk_score": 0.82,
  "categories": {
    "toxicity": 0.74,
    "hate": 0.82,
    "harassment": 0.12,
    "violence": 0.0,
    "self_harm": 0.0,
    "sexual": 0.0,
    "minor_safety": 0.0,
    "spam": 0.0,
    "scam": 0.0,
    "pii": 0.02,
    "jailbreak": 0.0,
    "prompt_injection": 0.0
  },
  "matched_rules": [
    {"category": "hate", "threshold": 0.70, "decision": "block"}
  ],
  "reason": "Policy threshold matched.",
  "latency_ms": 220
}

request_id: Unique moderation request ID for logs and support.

allowed: Boolean convenience field. Usually false for block, mask, and escalate.

decision: Final policy action: allow, warn, review, block, mask, or escalate.

risk_score: Overall risk score from 0.0 to 1.0.

categories: Per-category scores from 0.0 to 1.0.

matched_rules: Policy rules that triggered the final decision.

reason: Short explanation suitable for internal dashboards.

latency_ms: Server-side moderation latency in milliseconds.
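One way to work with this shape is to parse it into a typed object once and pass that around. The sketch below is illustrative only: ModerationResult, parse_result, and top_category are hypothetical names, and treating every field except request_id, allowed, decision, and risk_score as optional is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ModerationResult:
    request_id: str
    allowed: bool
    decision: str
    risk_score: float
    categories: dict = field(default_factory=dict)
    matched_rules: list = field(default_factory=list)
    reason: str = ""
    latency_ms: int = 0


# The six decisions listed in this documentation.
KNOWN_DECISIONS = {"allow", "warn", "review", "block", "mask", "escalate"}


def parse_result(data):
    """Convert a decoded JSON response into a ModerationResult."""
    if data["decision"] not in KNOWN_DECISIONS:
        raise ValueError(f"unexpected decision: {data['decision']!r}")
    return ModerationResult(
        request_id=data["request_id"],
        allowed=bool(data["allowed"]),
        decision=data["decision"],
        risk_score=float(data["risk_score"]),
        categories=dict(data.get("categories", {})),
        matched_rules=list(data.get("matched_rules", [])),
        reason=data.get("reason", ""),
        latency_ms=int(data.get("latency_ms", 0)),
    )


def top_category(result):
    """Return the (name, score) pair with the highest score, or None if empty."""
    if not result.categories:
        return None
    return max(result.categories.items(), key=lambda item: item[1])
```

Rejecting unknown decision values at the parsing step means routing code never has to guess what to do with a value it was not written for.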

Decisions

allow: Low risk. Content can continue without interruption.

warn: Mild risk. You may show a warning or soft friction.

review: Uncertain or borderline. Queue for human review or apply delayed publishing.

block: Clear policy violation. Prevent publishing or sending.

mask: Potential sensitive data. Redact or avoid displaying the content.

escalate: High-severity safety case, such as high self-harm risk. Escalate to a trained workflow.

Policies

Policies convert classifier scores into product decisions. A policy rule compares one category score against a threshold and applies an action when the threshold is exceeded.

{
  "rules": [
    {"category": "hate", "threshold": 0.70, "action": "block"},
    {"category": "self_harm", "threshold": 0.65, "action": "escalate"},
    {"category": "pii", "threshold": 0.80, "action": "mask"}
  ]
}

If multiple rules match, Toxly chooses the strongest relevant action based on the policy evaluation order.
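To make the rule semantics concrete, here is a minimal local sketch of threshold evaluation. It is not Toxly's implementation: the SEVERITY ranking used to pick the strongest action and the choice of allow when nothing matches are both assumptions made for illustration, and the function name evaluate is hypothetical.

```python
# Assumed ranking from weakest to strongest action; not taken from Toxly.
SEVERITY = ["allow", "warn", "review", "mask", "block", "escalate"]


def evaluate(rules, scores):
    """Return (action, matched_rules) for category scores under a rule list."""
    matched = [
        rule
        for rule in rules
        # A rule matches when its category score exceeds the threshold.
        if scores.get(rule["category"], 0.0) > rule["threshold"]
    ]
    if not matched:
        # Assumption: no matching rule means the content is allowed.
        return "allow", []
    strongest = max(matched, key=lambda rule: SEVERITY.index(rule["action"]))
    return strongest["action"], matched


rules = [
    {"category": "hate", "threshold": 0.70, "action": "block"},
    {"category": "self_harm", "threshold": 0.65, "action": "escalate"},
    {"category": "pii", "threshold": 0.80, "action": "mask"},
]

# Only the hate rule matches here, so the action is "block".
action, matched = evaluate(rules, {"hate": 0.82, "pii": 0.02})
```

Because the comparison is strict, a score exactly equal to its threshold does not trigger the rule in this sketch.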

Default policy

New projects start with a default policy designed for a strict MVP moderation flow.

toxicity > 0.75: block

hate > 0.70: block

harassment > 0.75: block

violence > 0.80: block

self_harm > 0.65: escalate

sexual > 0.75: block

minor_safety > 0.20: block

spam > 0.80: block

scam > 0.70: block

pii > 0.80: mask

jailbreak > 0.70: block

prompt_injection > 0.70: block

Integration examples

JavaScript fetch

const response = await fetch("https://dashboard.toxly.net/v1/moderate/text", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-Toxly-Key": process.env.TOXLY_API_KEY
  },
  body: JSON.stringify({ text: "message from your app" })
});

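// Surface HTTP errors (401, 429, 500, ...) instead of parsing an error body.
if (!response.ok) {
  throw new Error(`Toxly request failed with status ${response.status}`);
}

const result = await response.json();
if (!result.allowed) {
  // block, review, mask, or escalate in your product flow
}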

Python requests

import os
import requests

response = requests.post(
    "https://dashboard.toxly.net/v1/moderate/text",
    headers={
        "Content-Type": "application/json",
        "X-Toxly-Key": os.environ["TOXLY_API_KEY"],
    },
    json={"text": "message from your app"},
    timeout=5,
)

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()
print(result["decision"], result["risk_score"])

Backend routing example

switch (result.decision) {
  case "allow":
    publishContent();
    break;
  case "warn":
    showUserWarning();
    break;
  case "review":
    sendToModerationQueue();
    break;
  case "block":
    rejectContent();
    break;
  case "mask":
    redactContent();
    break;
  case "escalate":
    triggerSafetyWorkflow();
    break;
  default:
    // Unknown decision: fail safe by queueing for review.
    sendToModerationQueue();
}

Rate limits

Rate limits are enforced per API key and project plan. If the limit is exceeded, Toxly returns HTTP 429. Rate-limit state is reported in response headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1760000000

Free: 1,000 requests/month

Starter: 50,000 requests/month

Pro: 250,000 requests/month

Scale: 1,000,000 requests/month

Business: Custom volume by agreement
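A client that receives a 429 can use the rate-limit headers to decide how long to wait. The sketch below assumes that X-RateLimit-Reset is a Unix timestamp in seconds, which the sample value suggests but the documentation does not state. The function name and the one-second default are hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds to wait before retrying, based on X-RateLimit-Reset."""
    now = time.time() if now is None else now
    # Header lookup is case-sensitive on a plain dict; HTTP clients such as
    # requests expose a case-insensitive mapping instead.
    reset = headers.get("X-RateLimit-Reset")
    if reset is None:
        return 1.0  # hypothetical default when the header is absent
    # Assumption: the value is a Unix timestamp in seconds.
    return max(0.0, float(reset) - now)
```

Waiting does not help when the 429 comes from an exhausted monthly project limit, so treat a long or repeated wait as a signal to check plan usage in the dashboard.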

Logs & privacy

Toxly stores moderation results for debugging and usage visibility, but avoids retaining the complete original text in moderation logs.

Stored: Request ID, project ID, decision, allowed, risk score, category scores, matched rules, reason, latency, text hash, and short preview.

Not stored: Full original text in moderation logs.

Preview: Short text preview for operational debugging.

Hash: SHA-256 hash of the full text for deduplication and traceability without storing full content.
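The same hash-plus-preview idea works in your own application logs. This is a sketch under stated assumptions rather than a description of Toxly's internals: the UTF-8 encoding, the hex digest format, the 40-character preview length, and the name text_fingerprint are all illustrative choices.

```python
import hashlib


def text_fingerprint(text, preview_chars=40):
    """Return a SHA-256 hex digest and a short preview of the text."""
    # Assumption: the text is UTF-8 encoded before hashing.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Assumption: the preview is a simple prefix of the original text.
    preview = text[:preview_chars]
    return {"text_hash": digest, "preview": preview}
```

Identical inputs always produce the same digest, which is what makes the hash usable for deduplication without keeping the full content.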

AI providers

Toxly uses a provider interface so moderation can move between fast rules, local models, and OpenAI-compatible endpoints.

Dummy provider

Fast rule-based checks for immediate local testing and predictable fallback behavior.

Ollama provider

Prepared for local model inference through Ollama. Useful for self-hosted or private deployments.

vLLM provider

Prepared for OpenAI-compatible chat completions served by vLLM or similar infrastructure.

OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama-guard3:1b

VLLM_BASE_URL=http://localhost:8000/v1
VLLM_MODEL=toxly-guard
VLLM_API_KEY=local-key
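Toxly's actual provider interface is not shown in this documentation. As a purely hypothetical sketch of the idea, a provider can be modeled as anything with a classify method that returns per-category scores. The names Provider and KeywordProvider and the keyword-matching logic are invented for illustration.

```python
from typing import Protocol


class Provider(Protocol):
    """Hypothetical interface: anything that can score a piece of text."""

    def classify(self, text: str) -> dict:
        """Return per-category scores between 0.0 and 1.0."""
        ...


class KeywordProvider:
    """Toy rule-based provider that flags text containing listed keywords."""

    def __init__(self, keywords):
        # Maps a lowercase keyword to the (category, score) it triggers.
        self.keywords = keywords

    def classify(self, text):
        scores = {}
        lowered = text.lower()
        for word, (category, score) in self.keywords.items():
            if word in lowered:
                # Keep the highest score seen for each category.
                scores[category] = max(scores.get(category, 0.0), score)
        return scores


def moderate(provider: Provider, text: str) -> dict:
    """Score text with whichever provider is configured."""
    return provider.classify(text)
```

With a shared interface like this, swapping rule-based checks for a model served by Ollama or vLLM would only change which object is passed to moderate.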

Dashboard

The dashboard is available at dashboard.toxly.net. It is designed for project management and operational moderation workflows.

Sections: Projects, API keys, Policies, Logs, Usage, Test console, Project settings, Account settings, Plans

Errors

400: Invalid JSON body or invalid field value.

401: Missing, malformed, inactive, or invalid API key.

404: Policy or project resource not found, or the caller does not own it.

413: Request body is too large.

429: Rate limit or monthly project limit exceeded.

500: Unexpected server error. Retry with backoff and inspect dashboard logs.
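The retry advice for 500 can be sketched as capped exponential backoff. This is one possible approach, not a Toxly requirement: the delay values, the attempt count, and the function names are assumptions, and send stands for any callable of yours that performs the request and returns a (status_code, body) pair.

```python
import time

# Only 500 is retried here, following the error table; 4xx errors need a fix
# on the caller's side rather than a retry.
RETRYABLE = {500}


def backoff_delays(retries=3, base=0.5, cap=8.0):
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]


def call_with_retry(send, sleep=time.sleep, attempts=4):
    """Call send() until it returns a non-retryable status or attempts run out."""
    status, body = send()
    for delay in backoff_delays(attempts - 1):
        if status not in RETRYABLE:
            break
        sleep(delay)
        status, body = send()
    return status, body
```

Passing sleep as a parameter keeps the helper easy to test, and adding random jitter to each delay is a common refinement when many clients retry at once.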

Best practices

Call Toxly from your backend, not directly from a browser.
Use separate projects for production, staging, and experiments.
Start strict for high-risk surfaces like public chat and profiles.
Route review to a moderation queue instead of silently allowing it.
Escalate high self-harm risk into a human-reviewed safety workflow.
Do not log API keys, full user text, passwords, or secrets.
Use timeouts and fallback behavior in your own application.
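The last practice, fallback behavior, might look like the sketch below. It assumes you wrap your own request function, here called call, which should already set a timeout. Falling back to review instead of allow is a design choice made for this example, not documented Toxly behavior, and the function name is hypothetical.

```python
def moderate_with_fallback(call, text, fallback_decision="review"):
    """Run call(text); on any failure return a conservative fallback result."""
    try:
        return call(text)
    except Exception as exc:  # timeouts, connection errors, bad responses
        # Assumption: failing closed (not allowed) is safer than failing open.
        return {
            "allowed": False,
            "decision": fallback_decision,
            "reason": f"fallback: {type(exc).__name__}",
        }
```

Routing failures to review keeps content out of public view during an outage without rejecting it outright; low-risk surfaces might reasonably choose a more permissive fallback.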

Roadmap

The current API focuses on text moderation. Image moderation is planned with the same response shape so product logic can stay consistent.

POST /v1/moderate/image

multipart/form-data:
  image: file
  text: optional string

Future image providers may include Llama Guard vision models, Llama 3.2 Vision, Qwen2.5-VL, or other local multimodal safety models.
