Train AI to Stop Hallucinating
The production-grade RL environment for training and evaluating LLMs on hallucination avoidance. Built on 1M+ real-world examples across 38 benchmark datasets.
Why HallucinationGuard?
Research-grade evaluation for grounded AI systems
Factual Grounding
Rewards answers derived strictly from provided context
9-Component Reward
Factual correctness, grounding, calibration, NLI, BERTScore...
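As a rough illustration, a composite reward of this kind is typically a weighted sum of per-component scores. A minimal sketch in Python follows; the component names, weights, and scores are assumptions for illustration, not the environment's actual configuration:

```python
# Illustrative sketch of combining a multi-component reward.
# Component names and weights are assumptions, not the
# environment's documented configuration.
from typing import Dict

def combine_reward(scores: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Weighted sum of per-component scores, each assumed to lie in [0, 1]."""
    return sum(weights[name] * scores[name] for name in weights)

weights = {
    "factual_correctness": 0.25,
    "grounding": 0.25,
    "calibration": 0.15,
    "nli_entailment": 0.20,
    "bertscore": 0.15,
}
scores = {
    "factual_correctness": 0.9,
    "grounding": 1.0,
    "calibration": 0.7,
    "nli_entailment": 0.8,
    "bertscore": 0.85,
}
print(combine_reward(scores, weights))  # ~0.87
```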
Real-World Datasets
SQuAD, HotpotQA, HaluEval, TruthfulQA, FEVER, and 33 more
Fast API
RESTful endpoints with OpenEnv compliance
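A hypothetical client sketch is shown below; the base URL and the /reset and /step routes are assumptions modeled on typical RL-environment REST interfaces, so consult the endpoint reference for the actual paths and payloads:

```python
# Hypothetical client for an OpenEnv-style REST environment.
# BASE, /reset, and /step are assumed, not documented routes.
import requests

BASE = "http://localhost:8000"  # assumed local deployment

# Start an episode and read the observation (context + question).
obs = requests.post(f"{BASE}/reset", json={"task": 1}).json()

# Submit the model's answer as the action and collect the reward.
answer = "Paris"  # your model's answer to the question in obs
result = requests.post(f"{BASE}/step", json={"action": answer}).json()
print(result.get("reward"), result.get("done"))
```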
NLI-Powered
Detects entailment and contradiction semantically
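Entailment checks of this kind can be reproduced with any off-the-shelf NLI model. The sketch below uses roberta-large-mnli from Hugging Face as an example; it is not necessarily the model this environment uses:

```python
# Sketch of an NLI check between context (premise) and answer
# (hypothesis); roberta-large-mnli is an example model choice.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "The Eiffel Tower is located in Paris."
hypothesis = "The Eiffel Tower is in Berlin."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

# Label order for this model: contradiction, neutral, entailment.
for label, p in zip(["contradiction", "neutral", "entailment"], probs):
    print(f"{label}: {p:.3f}")
```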
Leaderboard
Compare model performance across tasks
Three Difficulty Levels
Progressive curriculum from basic to adversarial
Task 1: Factual Grounding (Beginner)
Answer straightforward factual questions from a short context passage. Single-hop retrieval with unambiguous ground truth. Perfect for initial training.
Task 2: Multi-Hop Synthesis (Intermediate)
Synthesize evidence from multiple sentences. Connect disparate facts without fabricating bridging information. Requires reasoning chains.
Task 3: Adversarial Resistance (Advanced)
Resist adversarial prompts designed to elicit hallucinations. Many questions are unanswerable — confident refusals are rewarded.
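To make the refusal incentive concrete, here is a toy scoring rule in the spirit of the Task 3 description; the marker list and reward values are illustrative, not the environment's actual logic:

```python
# Toy scoring rule for unanswerable questions (illustrative only;
# the real reward combines many more components).
REFUSAL_MARKERS = ("cannot be answered", "not enough information", "i don't know")

def refusal_reward(answer: str, answerable: bool) -> float:
    refused = any(marker in answer.lower() for marker in REFUSAL_MARKERS)
    if not answerable:
        # Admitting ignorance is rewarded; fabricating an answer is penalized.
        return 1.0 if refused else -1.0
    # Answerable questions: refusing earns nothing; the placeholder 1.0
    # stands in for the correctness/grounding scoring of real answers.
    return 0.0 if refused else 1.0

print(refusal_reward("That cannot be answered from the context.", answerable=False))  # 1.0
```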
Interactive Playground
Test the API directly in your browser
All Endpoints
Complete API reference at a glance