
Backdoors survive
alignment.
We remove adversarial backdoors from AI models. Independent third-party model cleaning for defense and compliance.
Request AccessAlignment doesn't mean clean.
Adversaries inject poisoned samples into training data. These samples embed hidden triggers that survive standard alignment and activate under specific conditions in production.
The attack chain

Poisoning
Backdoor triggers injected into training data at scale.

Survival
Triggers persist through RLHF and alignment procedures.

Deployment
Compromised model passes all standard safety evaluations.

Activation
Hidden behavior fires when trigger condition is met.
Continued training
removes backdoors.
Our defense uses continued training on verified, clean data to neutralize triggers without needing prior knowledge of the attack mechanism.
Request AccessCleanup Effort by Model Size
Steps to remove backdoor (normalized)
Based on Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (Hubinger et al., 2024)
Validated on pretrained language models up to 6.9B parameters. Fine-tuned model support in development.
Model cleaning
service.
We don't host your models. We clean them. You provide a model, we apply continued training on verified clean data, you receive a cleaned model.
You provide the model
Any architecture, up to 6.9B+ parameters.
We apply continued training
Verified clean data neutralizes hidden trigger mechanisms.
You receive a cleaned model
Backdoor behaviors removed. No hosting, no data retention.
Compliance & Audit
Independent verification for regulatory and audit requirements.
Trusted Third Party
Third-party in the loop instead of relying solely on model providers.
Research-Backed
Built on rigorous research, tested on models up to 6.9B parameters.
Request access.
We work with organizations that need independent verification of AI model safety for compliance and audit purposes.