The Standard for
Open Cyber LLM Arena

Stop guessing if your LLM is secure or just stubborn.
OCLA is a crowdsourced, privacy-first platform where anyone can contribute to evaluating LLMs on uncensored offensive and defensive cybersecurity capabilities.

./START_BENCHMARK VIEW_LEADERBOARD

No Logs Stored Anon Ops

Live Evals

The Alignment Trap

Most LLM benchmarks (MMLU, GSM8K) measure general reasoning, not security utility. When you ask a model to help secure a network, does it act as a helpful Red Teamer or does it refuse with a generic safety lecture?

// Typical Model Response

"I cannot assist with checking for vulnerabilities as it may be unethical..."

OCLA exists to quantify the fine line between Helpful Security Assistant and Over-Refusal.

MetricScore

Cybersec Knowledge92.4%

Refusal Rate (False Pos)12.1%

Code Safety98.9%

How It Works

A frictionless, privacy-preserving workflow for security researchers.

1. Connect

Point OCLA to your local inference server (Ollama, LM Studio) or enter a provider API key. Keys are stored locally in your browser.

2. Benchmark

Run our curated suite of Red Team & Blue Team prompts. We test for SQLi, XSS, Privilege Escalation knowledge, and defensive coding.

3. Analyze & Share

Get instant scoring. View detailed breakdowns of refusals vs. compliance. Optionally, upload anonymous scores to the global leaderboard.

Privacy is Non-Negotiable

Client-Side Only

We do not proxy your requests. All benchmark traffic goes directly from your browser to your model provider (OpenAI, Anthropic, or Localhost).

Zero Data Retention

We never store your API keys, prompts, or model outputs on our servers. The only data we receive is the final numerical score if you choose to submit it.

The Standard for Open Cyber LLM Arena