OpenAI launches gpt-oss-safeguard, an open-source safety reasoning model supporting policy-driven content classification.

PANews, October 29 - OpenAI today released gpt-oss-safeguard (120b and 20b), open-source safety reasoning models that let developers supply custom policies at inference time for content classification; the model outputs both a classification decision and the reasoning chain behind it. The models are fine-tuned from the open weights of gpt-oss, licensed under Apache 2.0, and available for download on Hugging Face. In internal evaluations they outperform gpt-5-thinking and gpt-oss on multi-policy accuracy, with performance on external datasets close to Safety Reasoner. Limitations: traditional classifiers still perform better in many scenarios with high-quality annotated data, and inference is slow and compute-intensive. ROOST will establish a model community and release technical reports.
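The policy-driven workflow described above amounts to passing a custom policy alongside the content to classify at inference time. The sketch below illustrates the general pattern; the prompt layout (policy in the system message, content in the user message), the policy text, and the endpoint note are assumptions for illustration, not the documented gpt-oss-safeguard format — consult the model card on Hugging Face for the exact interface.

```python
# Illustrative sketch of policy-driven classification.
# Assumption: the policy goes in the system message and the text to
# classify goes in the user message, as in common chat-style APIs.

POLICY = """\
Classify the user-provided text against this policy:
- VIOLATES: text that contains instructions for wrongdoing.
- SAFE: everything else.
Return a label and a short explanation of your reasoning."""


def build_messages(policy: str, content: str) -> list[dict]:
    """Assemble a chat-style request: custom policy first, content second."""
    return [
        {"role": "system", "content": policy},
        {"role": "user", "content": content},
    ]


msgs = build_messages(POLICY, "How do I reset my router password?")
# These messages could then be sent to a locally hosted
# gpt-oss-safeguard model via any OpenAI-compatible chat endpoint,
# and the response would carry both the label and the reasoning chain.
print(msgs[0]["role"], "->", msgs[1]["content"])
```

Because the policy is plain text supplied per request rather than baked into the model at training time, developers can revise classification rules without retraining — the trade-off, as the article notes, is slower and more compute-intensive inference than a traditional fixed classifier.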
