Software Deobfuscation Techniques

Instructor: Tim Blazytko
Dates: June 24 to 27, 2024
Capacity: 20

Code obfuscation has become a vital tool to protect, for example, intellectual property against competitors. In general, it attempts to impede program understanding by making the to-be-protected program more complex. As a consequence, a human analyst reasoning about the obfuscated code has to overcome this barrier by transforming it into a representation that is easier to understand.

In this training, we get to know state-of-the-art code obfuscation techniques and look at how they complicate reverse engineering. Afterwards, we gradually familiarize ourselves with different deobfuscation techniques and use them to break obfuscation schemes in hands-on sessions. In the process, participants deepen their knowledge of program analysis and learn when and how (not) to use different techniques.

First, we have a look at important code obfuscation techniques and discuss how to attack them. Afterwards, we analyze a virtual machine-based (VM-based) obfuscation scheme, learn about VM hardening techniques and how to tackle them.

In the second part, we cover SMT-based program analysis. Specifically, students learn how to solve program analysis problems with SMT solvers, how to prove properties of code, how to deobfuscate mixed Boolean-Arithmetic expressions and how to break weak cryptography.
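To give a flavor of what proving the equivalence of a mixed Boolean-Arithmetic expression means, here is a minimal sketch. It assumes 8-bit words and replaces the SMT solver query with an exhaustive check over all inputs, which is only feasible at such a small width; the function names are hypothetical and chosen for illustration.

```python
BITS = 8                  # assumption: tiny word size so brute force stays cheap
MASK = (1 << BITS) - 1

def obfuscated_add(x, y):
    # Mixed Boolean-Arithmetic encoding of an addition
    return ((x ^ y) + 2 * (x & y)) & MASK

def plain_add(x, y):
    return (x + y) & MASK

def equivalent(f, g):
    """Return True if f and g agree on every pair of BITS-wide inputs."""
    limit = 1 << BITS
    return all(f(x, y) == g(x, y) for x in range(limit) for y in range(limit))

print(equivalent(obfuscated_add, plain_add))  # True
```

For 32-bit or 64-bit operands the input space is far too large to enumerate, which is where an SMT solver comes in: it is asked whether any input exists on which the two expressions differ.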

Before we use symbolic execution to automate large parts of code deobfuscation, we first introduce intermediate languages and compiler optimizations to simplify industrial-grade obfuscation schemes. Next, we use symbolic execution to automate SMT-based program analysis and break opaque predicates. Finally, we learn how to write disassemblers for virtualization-based obfuscators and how to reconstruct the original code.

The last part covers program synthesis, an approach to simplify code based on its semantic behavior. After collecting input-output pairs from binary code, we not only learn how to simplify large expression trees, but also how we can verify the correctness of simplifications. Then, we use program synthesis to deobfuscate mixed Boolean-Arithmetic and learn the semantics of VM instruction handlers.
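The idea of learning semantics from input-output behavior can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not a practical synthesizer: the obfuscated function, the candidate grammar and the sample count are all hypothetical, and the search is a plain enumeration over six candidates.

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit registers

def obfuscated(x, y):
    # Black box under analysis: an MBA expression whose meaning is not obvious
    return ((x | y) - (x & y)) & MASK

# Hypothetical candidate grammar: a handful of simple binary operations
CANDIDATES = {
    "x + y": lambda x, y: (x + y) & MASK,
    "x - y": lambda x, y: (x - y) & MASK,
    "x * y": lambda x, y: (x * y) & MASK,
    "x & y": lambda x, y: x & y,
    "x | y": lambda x, y: x | y,
    "x ^ y": lambda x, y: x ^ y,
}

def collect_samples(func, count=20, seed=0):
    """Query the black box on two corner cases plus `count` random inputs."""
    rng = random.Random(seed)
    inputs = [(0, 1), (MASK, MASK)]
    inputs += [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(count)]
    return [((x, y), func(x, y)) for x, y in inputs]

def synthesize(samples):
    """Return the names of all candidates consistent with every sample."""
    return [name for name, cand in CANDIDATES.items()
            if all(cand(x, y) == out for (x, y), out in samples)]

samples = collect_samples(obfuscated)
print(synthesize(samples))  # ['x ^ y']
```

Agreement on the samples only suggests that the obfuscated code computes x ^ y. Verifying the simplification for all inputs is a separate step, for example with an SMT-based equivalence check.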

TEACHING

Note that the training focuses on hands-on sessions. While some lecture parts provide an understanding of when to use which method, various hands-on sessions teach how to use them to build custom-purpose tools for one-off problems. The trainer actively supports students in solving the given tasks. After a task is completed, we discuss different solutions in class. Furthermore, students receive detailed reference solutions that can be used during and after the course.

While the hands-on sessions use x86 assembly, all tools and techniques can also be applied to other architectures such as MIPS, PPC or ARM.

KEY LEARNING OBJECTIVES

Get to know the state of the art in code obfuscation and deobfuscation techniques
Learn compiler optimizations, SMT-based program analysis, symbolic execution and program synthesis
Apply all techniques to break obfuscation schemes in various hands-on sessions
Write disassemblers for VM-based obfuscators and simplify complex arithmetic expressions

CLASS OUTLINE

The training follows this outline:

Introduction to Code (De)obfuscation

Motivation
Application scenarios
Program analysis techniques

Code Obfuscation Techniques

Opaque predicates
Control-flow flattening
Mixed Boolean-Arithmetic
Virtual machines
Virtual machine hardening
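As a small illustration of one technique from this list, the sketch below shows control-flow flattening applied by hand to a greatest-common-divisor routine. The state numbering is arbitrary, and real obfuscators flatten at the level of basic blocks in machine code or an intermediate representation rather than in Python source.

```python
def gcd_plain(a, b):
    # Original structured control flow: one loop, one exit
    while b != 0:
        a, b = b, a % b
    return a

def gcd_flattened(a, b):
    # Every basic block becomes a case of one dispatcher loop;
    # a state variable decides which block runs next.
    state = 0
    while True:
        if state == 0:        # loop header: evaluate the loop condition
            state = 1 if b != 0 else 2
        elif state == 1:      # loop body
            a, b = b, a % b
            state = 0
        elif state == 2:      # exit block
            return a

print(gcd_plain(48, 18), gcd_flattened(48, 18))  # 6 6
```

Both functions compute the same result, but in the flattened version the original loop structure is no longer visible in the control-flow graph: all blocks hang off the dispatcher, and the order of execution is hidden in the updates to the state variable.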

Code Deobfuscation Techniques

Compiler optimizations
Reconstructing control flow
SMT-based program analysis
Taint analysis
Symbolic execution
Program synthesis

Compiler Optimizations

Dead code elimination
Constant propagation/folding
Static single assignment (SSA)
Optimizing obfuscated code
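To illustrate how these classic optimizations strip away obfuscation, here is a minimal sketch of constant propagation, constant folding and dead code elimination. It assumes a hypothetical three-address format (dest, op, arg1, arg2) for straight-line code in which every variable is assigned once; the instruction names are made up for this example.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul, "xor": operator.xor}

def propagate_and_fold(code):
    """Replace operands with known constants and fold constant operations."""
    consts = {}  # variable name -> known constant value
    out = []
    for dest, op, a, b in code:
        a = consts.get(a, a)
        b = consts.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            consts[dest] = OPS[op](a, b)
            out.append((dest, "const", consts[dest], None))
        else:
            consts.pop(dest, None)
            out.append((dest, op, a, b))
    return out

def eliminate_dead_code(code, live_out):
    """Backward pass that keeps only instructions whose result is needed."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(code):
        if dest not in live:
            continue  # result is never used: drop the instruction
        live.discard(dest)
        live.update(v for v in (a, b) if isinstance(v, str))
        kept.append((dest, op, a, b))
    kept.reverse()
    return kept

obfuscated = [
    ("t0", "add", 2, 3),          # constant computation
    ("t1", "mul", "t0", 4),       # depends only on constants
    ("junk", "xor", "x", 1337),   # result is never used
    ("t2", "add", "x", "t1"),     # the only live output
]
simplified = eliminate_dead_code(propagate_and_fold(obfuscated), ["t2"])
print(simplified)  # [('t2', 'add', 'x', 20)]
```

Four instructions collapse into one because the constant computations are evaluated ahead of time and the junk instruction has no effect on the output.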

SMT-based Program Analysis

SAT and SMT solvers
Encoding program analysis problems for SMT solvers
Proving semantic equivalence
Proving properties of a piece of code
Solving complex program constraints
Deobfuscating mixed Boolean-Arithmetic
Breaking weak cryptography

Symbolic Execution

Intermediate languages for reverse engineering
Symbolic and semantic simplification of obfuscated code
Automation in reverse engineering
Identifying virtual machine components
Interaction with SMT solvers
Breaking opaque predicates
Writing disassemblers for virtualization-based obfuscators
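Once the instruction format of a virtual machine is understood, a disassembler for its bytecode can be quite compact. The sketch below assumes an entirely hypothetical stack-based instruction set with four opcodes; real VM-based obfuscators use custom, often randomized encodings that have to be recovered from the instruction handlers first.

```python
# Hypothetical instruction set: opcode -> (mnemonic, operand size in bytes)
OPCODES = {
    0x10: ("push", 1),
    0x20: ("add", 0),
    0x30: ("xor", 0),
    0xFF: ("ret", 0),
}

def disassemble(bytecode):
    """Linear-sweep disassembly of the hypothetical VM bytecode."""
    pc = 0
    listing = []
    while pc < len(bytecode):
        opcode = bytecode[pc]
        if opcode not in OPCODES:
            # Unknown byte: emit it as data and keep going
            listing.append(f"{pc:04x}: .byte {opcode:#04x}")
            pc += 1
            continue
        mnemonic, size = OPCODES[opcode]
        operand = bytecode[pc + 1 : pc + 1 + size]
        if size:
            text = f"{mnemonic} {int.from_bytes(operand, 'little'):#x}"
        else:
            text = mnemonic
        listing.append(f"{pc:04x}: {text}")
        pc += 1 + size
    return listing

program = bytes([0x10, 0x2A, 0x10, 0x01, 0x20, 0xFF])
print("\n".join(disassemble(program)))
# 0000: push 0x2a
# 0002: push 0x1
# 0004: add
# 0005: ret
```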

Program Synthesis

Concept of program synthesis
Learning code semantics based on its input/output behavior
Obtaining input/output pairs from code
Methods to simplify large expression trees
Proving the correctness of simplifications
Deobfuscating mixed Boolean-Arithmetic
Learning semantics of VM instruction handlers

Prerequisites

The participants should have basic reverse engineering skills. Furthermore, they should be familiar with x86 assembly and Python.

Software Requirements

Students should have access to a computer with at least 4 GB of RAM and 20 GB of free disk space. Furthermore, they should install a disassembler of their choice (e.g., IDA or Ghidra) as well as virtualization software such as VirtualBox or VMware. Students will be provided with a Linux VM containing all necessary tools and setups.

Bio

Tim Blazytko is a well-known binary security researcher and co-founder of emproof. After working on novel methods for code deobfuscation, fuzzing and root cause analysis during his PhD, Tim now builds code obfuscation schemes tailored to embedded devices. Moreover, he teaches training courses on reverse engineering and code deobfuscation, analyzes malware and performs security audits.

Homepage

Twitter

LinkedIn

To Register

Click here to register.