This presentation discusses reinforcement learning (RL) fine-tuning of large language models (LLMs) for specialized tasks, using evasive malware development as a case study. A new 7-billion-parameter model, which demonstrates significant performance improvements over state-of-the-art generalist models on AV/EDR evasion tasks, was released at the time of the Briefing.