Recon2012 - PREVIEW

Recon 2012

Joan Calvet
Day Day 3 - 2012-06-16
Room Grand Salon
Start time 13:00
Duration 01:00
ID 208

Cryptographic Function Identification in Obfuscated Binary Programs

Cryptographic function recognition constitutes an important problem for binary program analysis because such functions usually contain crucial parts of the program internal logic. Moreover, cryptographic tasks are often implemented with well known functions, whose reference implementations are publicly available. Identifying automatically these functions with their arguments allows the analyst to understand the code without actually studying it.

Current techniques to identify well known functions in binary form are usually syntactic-centric and mainly rely on characteristics preserved in reasonable implementations, like particular constants or specific machine instructions. Nevertheless, these features are no more reliable in obfuscated binary programs, because they can easily disappear due to the obfuscation process.

In contrast, the solution we will discuss in this talk tends to be syntactic-independent by leveraging the predefined input-output relationship of cryptographic functions. An interesting fact with cryptographic functions is that they utilize very particular input values - typically a key and an input text - to produce deterministically an also very particular output value - an output text. Hence a pair of input-output values (A,B) such that a certain cryptographic function F verifies F(A)=B constitutes a signature for this function (A being typically the pair (key, input text), sometimes with an initialization vector, whereas B is the output text). Indeed the false positive probability is remarkably low, as it is very unlikely that another function - cryptographic or not - produces B when taking A as input. Moreover, all F implementations, even obfuscated ones, have to respect this relationship by definition.

Implementing this relatively simple idea to identify cryptographic functions in obfuscated programs is far from trivial. Indeed such environment lacks natural abstractions that would allow to easily consider candidate parts of the code for identification. In particular the common function "definition" in binary programs - based on calling convention and prologue-epilogue code - is not reliable in obfuscated code. Secondly, the numerous implementation-dependent data manipulated by machine instructions (control registers, return address...), combined with the absence of clear data structures, make the retrieval of cryptographic input-output parameters complicated.

Therefore we will discuss in this talk the way we implemented a cryptographic function identification technique based on the input-output relationship comparison for obfuscated binary programs. We will insist on the building process leading to the final tool, as we believe it is a generic way of tackling such identification problems, whereas the tool itself is suitable for some hard-to-detect cryptographic functions in some obfuscated binary programs. Among several examples we will show how we automatically identified algorithms such as RC4 - very often missed by existing tools - and XTEA in heavily obfuscated binary programs, with the appreciable side-effect of knowing precisely their arguments. Finally we will show that our technique allows the recognition of modified versions of well-known cryptographic algorithms.