Instructor: Malachi Jones
Dates: June 15 to 18, 2026
Capacity: 25
This course teaches how to automate reverse engineering (RE) for malware, firmware, and vulnerability analysis using AI/ML, graphs, large language models (LLMs), and agents. Students begin with Blackfyre, an open-source framework that extracts binaries into a Protocol Buffers (protobuf) format for downstream analysis. Hands-on labs guide students in building a lightweight, BinQL-inspired graph analysis system using Blackfyre and Neo4j to support workflows such as malware family clustering, firmware analysis, and vulnerability tracing. A full open-source BinQL reference implementation will be released after the training cycle later this year. Additional hands-on labs cover NL2GQL, which translates natural-language RE questions into graph queries so students can focus on analysis rather than query syntax, along with transformer-based embeddings, LLM techniques including RAG and KnowledgeRAG, and the Model Context Protocol (MCP). The course culminates in applied labs on agentic workflows using fine-tuned LLaMA models and frameworks such as Autogen for adaptive and automated reverse engineering.
This course teaches how to automate reverse engineering (RE) for malware, firmware, and vulnerability analysis using AI/ML, graph analysis, large language models (LLMs), and agents. Students begin with Blackfyre, a framework developed for this course and released as open source, which structures binaries into Protocol Buffers (protobuf) for downstream analysis. They also use the Blackfyre Ghidra plugin, supporting both interactive and headless execution for integration into RE pipelines. Building on this foundation, hands-on labs guide students in implementing a lightweight, BinQL-inspired graph analysis system that integrates with Neo4j to represent binaries as graphs of functions, basic blocks, imports, and strings, enabling workflows such as malware clustering, firmware ecosystem analysis, and vulnerability tracing. A full open-source BinQL reference implementation will be released after the training cycle later this year. To reduce complexity, students are introduced to NL2GQL, which translates natural-language RE questions into graph queries, allowing them to focus on analysis rather than query syntax.
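As a rough illustration of the graph representation described above, the following sketch models a binary as nodes (functions, imports, strings) joined by typed edges, then asks which functions touch a given artifact. It is a toy in-memory stand-in, not Blackfyre, BinQL, or Neo4j code: the class `BinaryGraph`, the labels `Function`, `Import`, and `String`, the edge types, and the sample names are all hypothetical choices made for this example.

```python
from collections import defaultdict


class BinaryGraph:
    """Toy in-memory property graph; labels and edge types are illustrative only."""

    def __init__(self):
        self.nodes = {}                # node_id -> {"label": ..., "name": ...}
        self.edges = defaultdict(set)  # (src_id, edge_type) -> {dst_id, ...}

    def add_node(self, node_id, label, name):
        self.nodes[node_id] = {"label": label, "name": name}

    def add_edge(self, src, edge_type, dst):
        self.edges[(src, edge_type)].add(dst)

    def functions_using(self, label, name):
        """Return sorted names of Function nodes with an edge to a matching node."""
        targets = {
            node_id
            for node_id, node in self.nodes.items()
            if node["label"] == label and node["name"] == name
        }
        hits = set()
        for (src, _edge_type), dsts in self.edges.items():
            if self.nodes[src]["label"] == "Function" and dsts & targets:
                hits.add(self.nodes[src]["name"])
        return sorted(hits)


# Hypothetical sample data standing in for an extracted binary.
g = BinaryGraph()
g.add_node("f1", "Function", "parse_config")
g.add_node("f2", "Function", "send_beacon")
g.add_node("i1", "Import", "strcpy")
g.add_node("i2", "Import", "connect")
g.add_node("s1", "String", "/etc/passwd")
g.add_edge("f1", "CALLS_IMPORT", "i1")
g.add_edge("f2", "CALLS_IMPORT", "i1")
g.add_edge("f2", "CALLS_IMPORT", "i2")
g.add_edge("f1", "REFERENCES_STRING", "s1")

# Under this assumed schema, a comparable Cypher query would be:
#   MATCH (f:Function)-[:CALLS_IMPORT]->(i:Import {name: 'strcpy'}) RETURN f.name
strcpy_callers = g.functions_using("Import", "strcpy")
print(strcpy_callers)
```

A question such as "which functions call strcpy?" is the kind of request a natural-language-to-graph-query layer would translate into the query shown in the comment, under whatever schema the real system uses.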
The second half of the course focuses on embeddings, transformers, and LLM-driven automation. Students learn to convert binary artifacts (functions, strings, imports, and basic blocks) into vector embeddings for similarity detection, clustering, function name prediction, and vulnerability analysis. A central technique is BasicBlockRank (BBR), which uses control-flow and call graphs to rank basic blocks, with referenced artifacts inheriting their importance, improving embedding quality for downstream tasks. These embeddings also serve as the foundation for RAG, KnowledgeRAG, and agent workflows, where they ground retrieval, reasoning, and decision-making. Building on this, the course introduces transformers for function prediction and binary similarity, and agent pipelines using Autogen and the Model Context Protocol (MCP). It concludes with fine-tuning LLaMA models via LLaMAFactory to improve RE-specific applications such as function labeling, reporting, and NL2GQL accuracy.
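The idea of ranking basic blocks by graph structure and letting referenced artifacts inherit that importance can be sketched with a generic PageRank-style iteration. This is not the actual BasicBlockRank algorithm, whose details are not given here. It is a minimal stand-in that assumes a single control-flow graph, a damping factor of 0.85, and inheritance by summing the scores of referencing blocks. The block names, strings, and helper `pagerank` are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over {node: [successor, ...]}.

    Every successor must also appear as a key in `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, successors in graph.items():
            if successors:
                share = damping * rank[node] / len(successors)
                for succ in successors:
                    new_rank[succ] += share
            else:
                # Dangling block (e.g. a return): spread its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical control-flow graph: block -> successor blocks.
cfg = {
    "entry": ["check"],
    "check": ["loop", "exit"],
    "loop": ["check"],
    "exit": [],
}

# Hypothetical string references: block -> strings it uses.
string_refs = {
    "entry": ["usage: %s"],
    "check": ["auth failed"],
    "loop": ["retry"],
}

block_rank = pagerank(cfg)

# Assumed inheritance rule: a string's score is the sum of the ranks
# of all blocks that reference it.
string_score = {}
for block, strings in string_refs.items():
    for s in strings:
        string_score[s] = string_score.get(s, 0.0) + block_rank[block]

for s, score in sorted(string_score.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {s}")
```

In this toy graph the loop-header block `check` receives the highest rank, so the string it references outranks the one used only in `entry`. Scores like these could then weight artifacts when building an embedding, although how the course does so is not specified here.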
Students should bring a laptop with at least 32 GB of RAM, 250 GB of free disk space, and a processor with at least 4 cores (Intel i7 equivalent or higher). The processor must be x86_64 to ensure compatibility with the course-provided virtual machine (VM) and to run VirtualBox version 7.1 or later. The processor must also support AVX (Advanced Vector Extensions), which is required to run machine learning frameworks such as TensorFlow and PyTorch. Network connectivity is also required for accessing the external services used in the large language model (LLM) components of the course. VirtualBox should be installed before the course begins to enable participation in the hands-on labs and exercises.
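Some of the requirements above can be checked ahead of time with a short script. The sketch below is not provided by the course. It assumes a Linux host for the AVX check (it reads `/proc/cpuinfo`), treats 250 GB as 250 × 10^9 bytes, and uses hypothetical helper names `has_avx` and `preflight`. It does not check RAM or the VirtualBox version.

```python
import os
import platform
import shutil


def has_avx(cpuinfo_text):
    """Return True if the first 'flags' line of /proc/cpuinfo-style text lists 'avx'."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Flags are whitespace-separated after the colon; match the exact token.
            return "avx" in line.partition(":")[2].split()
    return False


def preflight(path="."):
    """Collect a few pass/fail checks; a value of None means 'could not determine'."""
    report = {
        "x86_64": platform.machine().lower() in ("x86_64", "amd64"),
        # os.cpu_count() reports logical CPUs, which may exceed physical cores.
        "cores_ok": (os.cpu_count() or 0) >= 4,
        "disk_ok": shutil.disk_usage(path).free >= 250 * 10**9,
    }
    try:
        with open("/proc/cpuinfo") as f:
            report["avx"] = has_avx(f.read())
    except OSError:
        report["avx"] = None  # Not Linux, or the file is unreadable.
    return report


report = preflight()
for name, ok in report.items():
    print(f"{name}: {ok}")
```

On Windows or macOS hosts the `avx` entry comes back as `None`, and the CPU vendor's specification page or a system information tool would be needed instead.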
Students should have a solid foundation in reverse engineering and be comfortable with Python object-oriented development. Familiarity with basic ML concepts (e.g., vectors, supervised learning, precision/recall) is helpful but not required; these topics will be introduced and covered at the start of the course to bring all participants to a common baseline.
This course is not intended for participants with no prior reverse engineering experience; it assumes familiarity with RE concepts and tools.
This year's course expands beyond earlier versions by introducing graph-driven workflows, advanced LLM methods, and agentic automation for reverse engineering.
Dr. Malachi Jones is a Principal Cybersecurity AI/LLM Researcher and Manager at Microsoft, where he currently leads a team advancing red team agent autonomy within Microsoft Security AI (MSECAI). His present focus is on building autonomous red team agents, while his earlier work centered on fine-tuning large language models (LLMs) for security tasks and developing reverse engineering capabilities in Security Copilot.
With over 15 years in security research, Dr. Jones has contributed to both academia and industry. At MITRE, he advanced ML- and IR-based approaches for automated reverse engineering, and at Booz Allen Dark Labs, he specialized in embedded security and co-authored US Patent 10,133,871.
In addition to his work at Microsoft, Dr. Jones is the founder of Jones Cyber-AI, an organization dedicated to independent research and teaching initiatives. Through Jones Cyber-AI, he has developed and taught his specialized course, Automating Reverse Engineering Processes with AI/ML, NLP, and LLMs, at premier conferences including Black Hat USA (2019, 2021, 2023–2025) and RECON Montreal (2023–2025). His independent research in AI/ML, graphs, and LLM agents keeps his courses aligned with the latest advances in cybersecurity and reverse engineering.
He previously served as an Adjunct Professor at the University of Maryland, College Park, and holds a B.S. in Computer Engineering from the University of Florida, as well as an M.S. and Ph.D. from Georgia Tech, where his research applied game theory to cybersecurity. His expertise continues to drive innovation in AI-driven cybersecurity and automated reverse engineering.