Training Specialist Models: Automating Malware Development
This post complements the presentation I gave at Black Hat USA 2025.
Can a small, self-hosted LLM outperform state-of-the-art models at evasive malware development?
In this technical deep dive, we explore how reinforcement learning with verifiable rewards (RLVR) enables training compact specialist models that rival large generalists in domain-specific tasks.
In the first half of this post, we’ll break down the LLM training process and the recent opportunities created by RLVR. The second half details our training methodology for Dante-7B, a 7-billion-parameter model that generates functional, evasive Cobalt Strike shellcode loaders capable of bypassing Microsoft Defender for Endpoint (MDE). We’ve released Dante-7B on Hugging Face, complete with a demo app, so anyone can experiment with the model.
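If you’d rather experiment locally than through the demo app, a minimal sketch with the Hugging Face transformers library looks roughly like the snippet below. Note that the repository path, prompt, and generation settings are placeholders, not the exact values from the release page.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# NOTE: placeholder repository path -- substitute the actual Dante-7B repo
# name from the Hugging Face release linked in this post.
model_id = "your-org/Dante-7B"

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Placeholder prompt and generation settings for a quick local test.
prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```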
Tags: AI