Training Specialist Models: Automating Malware Development

This post complements the presentation I gave at Black Hat USA 2025.

Can a small, self-hosted LLM outperform state-of-the-art models at evasive malware development?
In this technical deep dive, we explore how reinforcement learning with verifiable rewards (RLVR) enables training compact specialist models that rival large generalists in domain-specific tasks.

In the first half of this post, we’ll break down the LLM training process and recent opportunities created by RLVR. The second half details our training methodology for Dante-7B, a 7-billion-parameter model that generates functional, evasive Cobalt Strike shellcode loaders capable of bypassing Microsoft Defender for Endpoint (MDE). We’ve released Dante-7B on Hugging Face, complete with a demo app so anyone can experiment with the model.

Accelerating Offensive R&D with Large Language Models

At Outflank, we continually seek ways to accelerate our research and development efforts without compromising quality. In this pursuit, we’ve begun integrating large language models (LLMs) into our internal research workflows. While we’re still exploring the full potential of AI-powered offensive tooling, this post highlights how we’ve already used AI to speed up the delivery of traditional offensive capabilities.

By leveraging AI as a research accelerator, we can dedicate more time to refining, testing, and hardening the techniques that ultimately make it into our OST offering. This post is a case study of our AI-assisted exploration of the “trapped COM object” bug class.
