A production API tuned for APT analysis, reverse engineering, vulnerability triage, and Web3 audits. Domain-fine-tuned on top of an unrestricted base model with a security-grade RAG corpus.
https://api.ai0day.com
· Anthropic Messages API compatible
· OpenAI Chat Completions compatible
Each mode invokes a dedicated retrieval pipeline (BGE-M3 + BM25 hybrid over a curated security corpus) before generation. The model is a domain-tuned LoRA on top of GLM-5.1, served on H200 GPUs with FP8 quantization and 16K context.
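The dense-plus-lexical fusion step can be sketched as a weighted combination of normalized scores. The 0.7/0.3 weights and min-max normalization below are illustrative assumptions, not the gateway's actual configuration:

```python
def normalize(scores):
    """Min-max normalize a list of scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(dense_scores, bm25_scores, dense_weight=0.7):
    """Fuse per-document dense (e.g. BGE-M3) and BM25 scores, best first.

    The weight and normalization scheme are assumptions for illustration.
    """
    d = normalize(dense_scores)
    b = normalize(bm25_scores)
    fused = [dense_weight * x + (1 - dense_weight) * y for x, y in zip(d, b)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)

# Example: doc 1 wins on the dense score, doc 0 on BM25.
print(hybrid_rank([0.2, 0.9, 0.4], [12.0, 3.0, 8.0]))  # [1, 2, 0]
```

Real hybrid retrievers often use reciprocal-rank fusion instead of score interpolation; either way, the point is that both retrieval signals contribute before generation starts.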
mode: apt_detection
Threat-actor TTPs, MITRE ATT&CK mapping, kill-chain reconstruction, lateral-movement detection, LOLBin hunting, C2 traffic patterns.
mode: reverse_analysis
Disassembly walkthroughs, packing identification, anti-debug bypasses, decompilation guidance, malware family attribution, custom obfuscator analysis.
mode: vuln_triage
CVE analysis, CVSS breakdown, exploit conditions, PoC outlines, kernel and userspace memory-corruption review, mitigation chains.
mode: web3_audit
Solidity audits (reentrancy, access control, oracle manipulation, flash-loan vectors), proxy upgrade safety, signature malleability, MEV/front-running surface.
Access is invite-only. Both trial and paid keys are issued manually after a short conversation about your use case. We do this to keep the platform fast for actual security researchers and free of generic abuse.
Keys are delivered as sk-ai0day-… via email. Store yours securely: keys are SHA-256 hashed at rest and cannot be recovered if lost.
curl https://api.ai0day.com/v1/chat \
-H "Authorization: Bearer sk-ai0day-…" \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Audit reentrancy in withdraw() function."}],
"mode": "web3_audit",
"max_tokens": 800,
"temperature": 0.0
}'
GET https://api.ai0day.com/health returns {"gateway":"ok","vllm":"ok","rag":"ok"} when everything is up. Wire it into your monitoring before the integration goes live.
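A monitoring probe only needs to check that every component reports ok. A minimal sketch; the three component keys are taken from the response shape above:

```python
import json

HEALTH_COMPONENTS = ("gateway", "vllm", "rag")

def is_healthy(body: str) -> bool:
    """Return True only if every component in the /health payload is 'ok'."""
    try:
        status = json.loads(body)
    except json.JSONDecodeError:
        return False
    return all(status.get(c) == "ok" for c in HEALTH_COMPONENTS)

print(is_healthy('{"gateway":"ok","vllm":"ok","rag":"ok"}'))    # True
print(is_healthy('{"gateway":"ok","vllm":"down","rag":"ok"}'))  # False
```

Treating a malformed body as unhealthy (rather than raising) keeps the probe safe to run in a tight monitoring loop.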
One product, two simple tiers. We do not meter tokens. We meter access duration. This means you can deploy long-context RAG queries (up to 16K) without watching a per-token counter.
For evaluation. One key per organization, single device.
For production research workflows.
Billing cycle is per key. We do not auto-charge; renewal is opt-in each month. If your key expires, requests return HTTP 401 key_expired until you renew.
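Because an expired key surfaces as HTTP 401 with a key_expired code, clients can distinguish "renew the key" from other auth failures. A sketch; the surrounding JSON error shape is an assumption beyond the status code and code string documented above:

```python
import json

def classify_auth_failure(status_code: int, body: str) -> str:
    """Map a response to an action: 'renew' for key_expired, 'rotate_key'
    for any other 401, 'ok' otherwise.

    The {"error": ...} body shape is an assumption for illustration;
    only the 401 status and the key_expired code are documented.
    """
    if status_code != 401:
        return "ok"
    try:
        err = json.loads(body)
    except json.JSONDecodeError:
        return "rotate_key"
    return "renew" if "key_expired" in str(err.get("error", "")) else "rotate_key"

print(classify_auth_failure(401, '{"error":"key_expired"}'))  # renew
```

Since renewal is opt-in, a CI job that hits "renew" should fail loudly rather than retry, so a human sees it.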
Claude Code, Anthropic's CLI, can be redirected to AI0Day with two environment variables. Your existing Claude Code workflow (slash commands, MCP servers, hooks) continues to work — only the model and API endpoint change.
Set two environment variables. Claude Code sends every request to AI0Day instead of api.anthropic.com. The model is selected automatically by the gateway — you don’t have to choose.
# In ~/.zshrc / ~/.bashrc
export ANTHROPIC_BASE_URL="https://api.ai0day.com"
export ANTHROPIC_AUTH_TOKEN="sk-ai0day-…"
# Restart your shell, then verify:
claude --version
claude -p "Detect APT41 lateral movement signatures in a Linux env."
The gateway auto-detects which security mode to use (apt / reverse / vuln / web3) from the query content and routes to the LoRA-tuned model. Multi-turn conversation history is supported. Tool use (function calling) and token-by-token SSE streaming are on the roadmap.
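Client-side, you can pre-classify a query the same way before deciding whether to pin a mode explicitly. A keyword-matching sketch; the keyword lists are illustrative assumptions, not the gateway's actual routing logic:

```python
# Illustrative client-side mode pre-classification. The keyword lists are
# assumptions; the gateway's real routing logic is not documented here.
MODE_KEYWORDS = {
    "apt_detection": ["apt", "ttp", "lateral movement", "c2", "mitre"],
    "reverse_analysis": ["disassemble", "unpack", "obfuscat", "decompil"],
    "vuln_triage": ["cve", "cvss", "exploit", "memory corruption"],
    "web3_audit": ["solidity", "reentrancy", "flash loan", "oracle"],
}

def guess_mode(query: str, default: str = "vuln_triage") -> str:
    """Pick the mode whose keywords best match the query; fall back to default."""
    q = query.lower()
    scores = {m: sum(k in q for k in kws) for m, kws in MODE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

print(guess_mode("Audit reentrancy in withdraw()"))  # web3_audit
```

If your query spans domains (say, a CVE in a Solidity toolchain), explicit mode selection beats any auto-detector, client-side or gateway-side.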
If you prefer OpenAI client conventions (works with openai Python SDK, litellm, continue.dev, etc.):
export OPENAI_BASE_URL="https://api.ai0day.com/v1"
export OPENAI_API_KEY="sk-ai0day-…"
# Python
from openai import OpenAI

cli = OpenAI()
r = cli.chat.completions.create(
    model="ai0day",  # any non-empty string; gateway picks the right model
    messages=[{"role": "user", "content": "Triage CVE-2024-3400."}],
)
print(r.choices[0].message.content)
Mode auto-detected from query. To force a specific mode, append a suffix: model="ai0day-vuln_triage" (or -apt_detection / -reverse_analysis / -web3_audit).
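The suffix convention is easy to typo, so it is worth wrapping in a tiny helper; the function name is ours, the "ai0day-&lt;mode&gt;" format is from the convention above:

```python
VALID_MODES = {"apt_detection", "reverse_analysis", "vuln_triage", "web3_audit"}

def model_for(mode=None):
    """Build the model string: "ai0day" for auto-detection,
    "ai0day-<mode>" to force one of the four documented modes."""
    if mode is None:
        return "ai0day"
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"ai0day-{mode}"

print(model_for())               # ai0day
print(model_for("vuln_triage"))  # ai0day-vuln_triage
```

Failing fast on an unknown mode turns a silent mis-route into an immediate error in your own code.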
For shell scripting, CI pipelines, or non-SDK environments:
curl https://api.ai0day.com/v1/chat \
-H "Authorization: Bearer sk-ai0day-…" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Reverse engineer 0x55 0x48 0x89 0xe5 in x86_64 context."}
],
"mode": "reverse_analysis",
"max_tokens": 1024,
"temperature": 0.0
}'
The mode parameter is what triggers the security RAG retrieval pipeline. Without it, you get a generic LLM response. Always set mode to one of apt_detection / reverse_analysis / vuln_triage / web3_audit.
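In scripts, a small payload builder that refuses to serialize a request without a valid mode keeps this rule from regressing; the function name is ours, the body fields mirror the curl example above:

```python
import json

VALID_MODES = ("apt_detection", "reverse_analysis", "vuln_triage", "web3_audit")

def build_request(prompt: str, mode: str, max_tokens: int = 800) -> str:
    """Serialize a /v1/chat body; raises if mode is missing or invalid,
    so no request ever goes out without the RAG pipeline engaged."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "mode": mode,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    })

body = build_request("Triage CVE-2024-3400.", "vuln_triage")
```

json.dumps also handles quoting, so prompts containing quotes or newlines cannot break the request body.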
Concrete workflows our users run in production. All examples use Claude Code with ANTHROPIC_BASE_URL pointing at AI0Day.
$ claude
> /apt @./samples/loader.bin
> Disassemble the entry point and identify the unpacking routine.
> If you find a custom XOR loop, decode the embedded config block
> and tell me the C2 generation algorithm.
$ claude -p "Given these process-tree events from a Linux endpoint, \
which APT group's TTPs match best? Map to MITRE ATT&CK and \
suggest detection rules I can deploy in Falco."
$ claude
> Audit contracts/Vault.sol and contracts/Strategy.sol.
> Flag every reentrancy path, oracle dependency, and access-control
> bypass. Output a severity-sorted finding list with line refs.
#!/bin/bash
# .github/workflows/triage.sh
# Build the JSON body with jq so quotes/newlines in $CVE_DESC are escaped safely.
jq -n --arg desc "$CVE_DESC" \
  '{messages:[{role:"user",content:$desc}], mode:"vuln_triage", max_tokens:1500}' \
| curl -s https://api.ai0day.com/v1/chat \
  -H "Authorization: Bearer $AI0DAY_KEY" \
  -H "Content-Type: application/json" \
  -d @- \
| jq -r '.content' > triage_report.md
A LoRA-tuned variant of GLM-5.1 (abliterated base, FP8-quantized), running on 8×H200 with vLLM. The LoRA layer is rank-32, alpha-64, trained on a curated security corpus. The base model has the standard refusal layer disabled; the LoRA does not reintroduce it. Adversarial-set refusal rate is 96.7% (vs 93.3% for the base alone) — domain training nudges it slightly safer than the bare abliterated base.
Requests are processed in-memory only. The gateway logs request metadata (timestamp, key id, mode, latency, status) for billing and abuse detection. Request bodies and response content are not persisted. The RAG corpus is read-only public security data.
P50 around 12s, P95 around 30s for typical RAG queries (rag_k=3, max_tokens=512). Long-context queries (rag_k=8, max_tokens=1024) are ~14–22s. The latency budget is dominated by H200 generation time, not network.
The default rate limit is 5 concurrent globally, 2 per key. For higher concurrency contact us — we provision a dedicated GPU pool starting at $50K/month.
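With a 2-concurrent-per-key limit, a client-side semaphore keeps batch jobs from tripping rate-limit errors. A sketch using stdlib threading; the limit value comes from the docs above, the wrapper is ours:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

PER_KEY_LIMIT = 2  # documented per-key concurrency limit
_slots = threading.Semaphore(PER_KEY_LIMIT)

def with_limit(fn, *args, **kwargs):
    """Run fn while holding one of the per-key concurrency slots."""
    with _slots:
        return fn(*args, **kwargs)

# Example: fan out 5 queries but never exceed 2 in flight.
# fake_query stands in for a real API call.
def fake_query(i):
    return f"result-{i}"

with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(lambda i: with_limit(fake_query, i), range(5)))
print(results)
```

The semaphore sits inside the worker rather than capping the pool size, so the same pattern works if one process manages multiple keys with one semaphore per key.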
Not yet. The Anthropic and OpenAI compatible endpoints accept tools in the request body but ignore them in this release. Native tool use is on the roadmap.
Email us; we disable the key within minutes and reissue a new one. Device fingerprints are tracked per key: if more than 3 distinct devices use the same key within 24 hours, it is automatically flagged.