What to know

  • OpenAI’s new o3 and o4-mini AI models feature a dedicated safeguard to block prompts related to biological and chemical threats.
  • The safety system uses a reasoning monitor trained to detect and refuse high-risk requests, achieving a 98.7% block rate in tests.
  • Red teamers spent about 1,000 hours flagging unsafe conversations to help train the system.
  • OpenAI acknowledges the need for continued human oversight as some risks remain unaddressed by automated systems alone.

OpenAI has introduced a new safety system in its latest AI models, o3 and o4-mini, to address growing concern that AI could be misused to aid biological and chemical threats. Both models now ship with a safety-focused reasoning monitor designed to detect prompts that could elicit harmful instructions, such as guidance on creating biological weapons, and to block the models from responding.
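
OpenAI has not published the monitor's interface, but the general pattern it describes, screening each request before the underlying model answers, can be sketched with the public Moderation API standing in for the dedicated biorisk monitor. The wrapper below is an illustrative assumption, not OpenAI's actual implementation.

```python
# Illustrative sketch only: OpenAI's safety-focused reasoning monitor is not
# publicly exposed, so the public Moderation API stands in for it here.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

REFUSAL_MESSAGE = "Sorry, I can't help with that request."


def answer_with_prefilter(prompt: str, model: str = "o4-mini") -> str:
    """Screen a prompt with a safety classifier before the model answers it."""
    screening = client.moderations.create(
        model="omni-moderation-latest",
        input=prompt,
    )
    if screening.results[0].flagged:
        # Blocked: return a refusal instead of forwarding the prompt.
        return REFUSAL_MESSAGE

    # Not flagged: pass the prompt through to the reasoning model.
    response = client.responses.create(model=model, input=prompt)
    return response.output_text
```

According to OpenAI, the real monitor is itself trained against the company's content policies, which a generic off-the-shelf classifier like the stand-in above does not replicate.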

The safeguard was developed in response to the enhanced capabilities of o3 and o4-mini, which internal assessments showed were more adept than previous models at answering questions about developing certain biological threats. To mitigate this, OpenAI trained the monitor to align with its content policies, ensuring that the models refuse to provide guidance on high-risk topics.

To evaluate the system’s effectiveness, OpenAI’s red teamers spent approximately 1,000 hours flagging unsafe, biorisk-related conversations. In simulated evaluations of the monitor’s blocking behavior, the models declined to respond to risky prompts 98.7% of the time. However, OpenAI notes that these tests did not account for users who might try different prompts after an initial block, highlighting the ongoing need for human oversight and intervention.
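
The 98.7% figure is a simple single-turn block rate: the fraction of flagged prompts the models refused on the first attempt. A hypothetical illustration with placeholder data, not OpenAI's evaluation set:

```python
# Hypothetical outcomes for a handful of risky test prompts (True = refused).
# Placeholder data for illustration; not OpenAI's red-team results.
refusals = [True, True, True, False, True, True, True, True]

block_rate = sum(refusals) / len(refusals)
print(f"Single-turn block rate: {block_rate:.1%}")  # 87.5% for this toy sample
```

Because each prompt is scored only once, a metric like this does not capture users who rephrase after a refusal, which is the limitation OpenAI calls out.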

While o3 and o4-mini do not meet OpenAI’s “high risk” threshold for biorisks, they have demonstrated a greater propensity to assist with inquiries about biological weapons compared to earlier models like o1 and GPT-4. This underscores the importance of proactive safety measures as AI capabilities advance. OpenAI’s updated Preparedness Framework reflects a broader commitment to monitoring how its models could potentially facilitate the development of chemical and biological threats.

Despite these efforts, some researchers have expressed concerns about OpenAI’s safety prioritization, citing limited time for thorough testing and the need for more comprehensive safety reports. OpenAI continues to rely on a combination of automated systems and human monitoring to address these evolving risks.

Via: TechCrunch