How Do We Keep AI From Enabling Bioweapons?

I found the existing material on how threat modeling, evals, and safeguards connect in AI biosecurity surprisingly scattered. In this post, I hope to tie these together. Put simply: threat models define the scenarios we're worried about, safety evals measure whether AI enables those scenarios, and safeguards are what we deploy when a dangerous capability has been reached.

Background

The central risk of AI in biology is that it will lower the bar for people to make bioweapons. Specifically, most people, including frontier AI labs (OpenAI 2025), are concerned about two scenarios: 1) a novice being able to create an existing bioweapon, and 2) an expert being able to create a novel bioweapon.

The two types of AI systems that matter are general-purpose large language models (LLMs) and biological design tools (BDTs). LLMs can potentially provide guidance across the bioweapon creation process, from identifying candidate agents to troubleshooting wet-lab protocols, so they could help with both existing and novel bioweapons. BDTs, which are specialized ML models built for tasks like protein design, sequence generation, or assay prediction, are particularly worrying for novel bioweapons.

The danger¹ of these two AI systems can be defined more precisely by analyzing the uplift they give humans: "the improvement in capabilities or outcomes that can be achieved with a given set of resources, or the realisation of a given capability or outcome with fewer resources" (Rose et al. 2024). Uplift is usually measured against a baseline of someone who has only internet access for the same task.
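To make the two halves of that definition concrete, here is a minimal Python sketch of how an uplift study might score them. It assumes a hypothetical controlled study with an AI-assisted arm and an internet-only control arm attempting the same proxy task. The names `ArmResult`, `absolute_uplift`, and `resource_uplift`, and the numbers in the usage lines, are illustrative and not taken from Rose et al. (2024).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmResult:
    """One arm of a hypothetical uplift study: how many participants completed the proxy task."""
    successes: int
    participants: int

    @property
    def success_rate(self) -> float:
        if self.participants <= 0:
            raise ValueError("an arm needs at least one participant")
        return self.successes / self.participants


def absolute_uplift(ai_arm: ArmResult, internet_arm: ArmResult) -> float:
    """'Improvement in outcomes with a given set of resources': AI arm minus internet-only baseline."""
    return ai_arm.success_rate - internet_arm.success_rate


def resource_uplift(internet_cost: float, ai_cost: float) -> float:
    """'Same outcome with fewer resources': fraction of cost (e.g. hours) saved by using AI."""
    if internet_cost <= 0:
        raise ValueError("baseline cost must be positive")
    return (internet_cost - ai_cost) / internet_cost


# Hypothetical numbers: 12 of 40 succeed with AI vs 8 of 40 with internet only.
print(absolute_uplift(ArmResult(12, 40), ArmResult(8, 40)))  # roughly 0.1 (floating point)
print(resource_uplift(internet_cost=10.0, ai_cost=4.0))  # 0.6
```

A real study would also report uncertainty, such as confidence intervals on the difference in success rates, which this sketch omits.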

Threat modeling: figuring out what to worry about

To focus efforts on the most important risks, biosecurity (along with many other fields like cybersecurity and AI safety) uses threat models: structured scenarios that identify key risks in a system.

A useful starting point is the bioweapon creation pipeline, since most threat scenarios involve someone deliberately creating one. The Centre for Long-Term Resilience (CLTR) framework (Nelson and Rose 2023) breaks this pipeline into stages: intention, ideation, design, weaponization, and release.

The pipeline tells you how a bioweapon gets made, but threat modeling also needs to capture who, what, and under what circumstances. Righetti (2025) lays out a useful framework, building on recommendations from the National Institute of Standards and Technology (NIST 2025), that organizes these broader considerations into four axes:

  1. the threat actor (ranging from a non-expert individual with little money to highly capable state-sponsored groups, as categorized by CLTR (Rose et al. 2024))
  2. the biological agent (from epidemic-capable pathogens to novel global catastrophic risks)
  3. the method of acquisition (self-manufacture, existing suppliers, natural sources, or theft)
  4. the route to harm (deliberate misuse, accidental release, or coercive threat)
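One way to see how the four axes combine is to treat each axis as an enumeration and a single threat model as one choice along every axis. The sketch below is a hypothetical Python encoding, not Righetti's or NIST's. It keeps only the endpoints of the actor and agent ranges described above, whereas the underlying frameworks distinguish more tiers.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import List


class Actor(Enum):
    # Only the two endpoints of the range in the text; CLTR's categorization has more tiers.
    NOVICE_LOW_RESOURCE = "non-expert individual with little money"
    STATE_SPONSORED = "highly capable state-sponsored group"


class Agent(Enum):
    # Endpoints of the agent range described in the text.
    EPIDEMIC_CAPABLE = "epidemic-capable pathogen"
    NOVEL_GCR = "novel global catastrophic risk"


class Acquisition(Enum):
    SELF_MANUFACTURE = "self-manufacture"
    EXISTING_SUPPLIER = "existing supplier"
    NATURAL_SOURCE = "natural source"
    THEFT = "theft"


class RouteToHarm(Enum):
    DELIBERATE_MISUSE = "deliberate misuse"
    ACCIDENTAL_RELEASE = "accidental release"
    COERCIVE_THREAT = "coercive threat"


@dataclass(frozen=True)
class ThreatModel:
    """One point in the four-axis space: who, what, how acquired, and how harm occurs."""
    actor: Actor
    agent: Agent
    acquisition: Acquisition
    route: RouteToHarm


def enumerate_threat_models() -> List[ThreatModel]:
    """Every combination of the four axes; a team would then prune and prioritize."""
    return [ThreatModel(*combo) for combo in product(Actor, Agent, Acquisition, RouteToHarm)]


# Example: scope a project to deliberate misuse by low-resource novices.
novice_misuse = [
    tm for tm in enumerate_threat_models()
    if tm.actor is Actor.NOVICE_LOW_RESOURCE and tm.route is RouteToHarm.DELIBERATE_MISUSE
]
print(len(novice_misuse))  # 8
```

Enumerating the full grid makes it explicit which combinations a project has chosen to ignore, rather than leaving that choice implicit.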

Evals: measuring what we're worried about

Given a set of threat models, we can measure whether an AI system enables those scenarios by building evals.

A practical approach to designing evals is to map existing evals onto the bioweapon pipeline, identify which stages are still uncovered given your threat models, and build tests that fill those gaps. A strong example is the Virology Capabilities Test (VCT), which evaluates whether an LLM can troubleshoot complex virology lab protocols (Götting et al. 2025). This is useful because many catastrophic threat scenarios involve viruses, and actors in those scenarios would most likely need to troubleshoot virus-related wet-lab procedures.
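The gap-finding step can be sketched as a set difference: collect the pipeline stages your threat models care about, subtract the stages existing evals already cover, and what remains is where new evals are needed. This is a minimal Python illustration under assumed inputs. The eval names `eval_a` and `eval_b` and their stage assignments are hypothetical placeholders, and real coverage judgments are rarely this clean-cut.

```python
from typing import Dict, List, Set

# Stage names follow the CLTR pipeline described above (Nelson and Rose 2023).
PIPELINE_STAGES = ("intention", "ideation", "design", "weaponization", "release")


def uncovered_stages(eval_coverage: Dict[str, Set[str]], relevant_stages: Set[str]) -> List[str]:
    """Return stages the threat models care about that no existing eval covers, in pipeline order."""
    covered: Set[str] = set().union(*eval_coverage.values())
    unknown = (covered | relevant_stages) - set(PIPELINE_STAGES)
    if unknown:
        raise ValueError(f"Unknown stage(s): {sorted(unknown)}")
    return [s for s in PIPELINE_STAGES if s in relevant_stages and s not in covered]


# Hypothetical inputs: two placeholder evals and the stages our threat models emphasize.
coverage = {"eval_a": {"ideation"}, "eval_b": {"ideation", "design"}}
print(uncovered_stages(coverage, {"ideation", "design", "weaponization"}))  # ['weaponization']
```

Returning the gaps in pipeline order, rather than as an unordered set, keeps the output readable when it is used to prioritize new eval work.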

Although this design process sounds straightforward, you'll notice there aren't that many biorisk evals.

Making biorisk evals is hard because we want to avoid having people or AIs actually make bioweapons, which would also spread dangerous knowledge (known as information hazards). Instead, eval designers must find proxy tasks that measure the underlying capability we care about, and even then, establishing the ground-truth answers for these tasks often requires domain experts. These proxy tasks must be specific enough to be informative about the real capability, but not so structured that the benchmark ends up hand-holding the model. Striking this balance is difficult, and existing public evals have been criticized on these grounds (Ho and Berg 2025).

Luckily, these information hazard concerns can be largely mitigated by keeping evals private (a practice followed by most frontier labs).

Safeguards: what we do when a capability is reached

Once evals (supplemented by manual red-teaming and expert review) indicate that a model has crossed a dangerous capability threshold, safeguards are deployed.

The Frontier Model Forum offers a useful taxonomy of safeguards, breaking them down by mode of application: model-level, system-level, and societal-level (FMF 2025).

Model-level safeguards are techniques applied during training, such as refusal fine-tuning or value alignment, that shape what the model will and won't do.

System-level safeguards are applied during deployment: input/output classifiers, usage monitoring, and trust-based access controls that govern how users interact with the model.
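To show how these deployment-time layers can stack, here is a minimal Python sketch of a wrapper combining an input classifier, an output classifier, a log for usage monitoring, and a trust tier that loosens the blocking threshold for vetted users. Everything here is hypothetical: `GuardedModel` and `Decision` are illustrative names, the classifiers are stand-in callables returning a hazard score in [0, 1], and the thresholds are arbitrary, not values from the FMF taxonomy or any lab's deployment.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical interface: a classifier maps text to an estimated probability that it is hazardous.
Classifier = Callable[[str], float]


@dataclass
class Decision:
    allowed: bool
    reason: str
    response: Optional[str] = None


@dataclass
class GuardedModel:
    model: Callable[[str], str]
    input_clf: Classifier
    output_clf: Classifier
    default_threshold: float = 0.5
    vetted_threshold: float = 0.8  # trust-based access: vetted users get a looser threshold
    flagged_log: List[str] = field(default_factory=list)  # usage monitoring: record blocked prompts

    def generate(self, prompt: str, vetted: bool = False) -> Decision:
        threshold = self.vetted_threshold if vetted else self.default_threshold
        # Input classifier: block before the model ever sees the prompt.
        if self.input_clf(prompt) >= threshold:
            self.flagged_log.append(prompt)
            return Decision(False, "input blocked")
        response = self.model(prompt)
        # Output classifier: a benign-looking prompt can still elicit a hazardous response.
        if self.output_clf(response) >= threshold:
            self.flagged_log.append(prompt)
            return Decision(False, "output blocked")
        return Decision(True, "ok", response)


# Usage with stub components standing in for a real model and real classifiers.
guarded = GuardedModel(
    model=lambda p: "echo: " + p,
    input_clf=lambda t: 0.6 if "borderline" in t else 0.0,
    output_clf=lambda t: 0.0,
)
print(guarded.generate("borderline").reason)  # input blocked
print(guarded.generate("borderline", vetted=True).allowed)  # True
```

Layering independent checks means a single classifier miss does not automatically let a hazardous exchange through, and the log gives humans something to review for patterns across requests.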

Societal-level safeguards operate entirely outside the AI system; examples include nucleic acid synthesis screening and export controls.

Next steps

Here's what I'd recommend reading next.

References

  1. OpenAI. "Preparedness Framework V2". (2025).
  2. Rose, Sophie and Moulange, Richard and Smith, James and Nelson, Cassidy. "The Near-Term Impact of AI on Biological Misuse". (2024).
  3. Nelson, Cassidy and Rose, Sophie. "Understanding AI-Facilitated Biological Weapon Development". (2023).
  4. Righetti, Luca. "Dual-Use AI Capabilities and the Risk of Bioterrorism". (2025).
  5. NIST (National Institute of Standards and Technology). "Managing Misuse Risk for Dual-Use Foundation Models". (2025).
  6. Götting, Jasper and Medeiros, Pedro and Sanders, Jon G. and Li, Nathaniel and Phan, Long and Elabd, Karam and Justen, Lennart and Hendrycks, Dan and Donoughe, Seth. "Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark". (2025).
  7. Ho, Anson and Berg, Arden. "Do the Biorisk Evaluations of AI Labs Actually Measure the Risk of Developing Bioweapons?". (2025).
  8. FMF. "Preliminary Taxonomy of AI-Bio Misuse Mitigations". (2025).
  9. Hong, Shen Zhou and Kleinman, Alex and Mathiowetz, Alyssa and Howes, Adam and Cohen, Julian and Ganta, Suveer and Letizia, Alex and Liao, Dora and Pahari, Deepika and Roberts-Gaal, Xavier and Righetti, Luca and Torres, Joe. "Measuring Mid-2025 LLM-Assistance on Novice Performance in Biology". (2026).
  10. Brady, Kyle and Lee, Jeffrey and Maciorowski, Dawid and Worland, Alyssa and Despanie, Jordan and Persaud, Bria and Del Castello, Barbara and Bradley, Henry Alexander and Ellison, Grant and Teague, Charles and Gebauer, Sarah L. and McKelvey, Greg and Guerra, Steph and Guest, Ella. "Bridging the Digital to Physical Divide: Evaluating LLM Agents on Benchtop DNA Acquisition". (2026).
  11. Liu, Andrew Bo and Nedungadi, Samira and Cai, Bryce and Kleinman, Alex and Bhasin, Harmon and Donoughe, Seth. "ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity". (2025).
  12. Wang, Dianzhuo and Huot, Marian and Zhang, Zechen and Jiang, Kaiyi and Shakhnovich, Eugene I and Esvelt, Kevin M. "Without Safeguards, AI-Biology Integration Risks Accelerating Future Pandemics". (2025).

Footnotes

  1. For LLMs, there is also the concern of model misalignment, where the AI itself pursues harmful goals. However, most biosecurity literature, including frontier lab frameworks like OpenAI's Preparedness Framework (OpenAI 2025), focuses on model misuse, where a human actor deliberately uses AI to cause harm.