Twitter: @s4tan
GitHub code: https://github.com/enkomio/Sojobo/tree/master/Src/Tools/ADVDeobfuscator
A new obfuscation tool named ADVobfuscator was presented at Black Hat Europe 2014 in Amsterdam. It is based on C++11 metaprogramming. The paper describes in depth how strings and function calls are obfuscated.
ADVobfuscator demonstrates how to use the C++11/14 language to generate obfuscated code at compile time, without using any external tool or a custom compiler.
Compile-time obfuscators (like this one) are quite annoying to analyze, since it is not easy to write a generic deobfuscator based on code pattern recognition. In fact, the resulting binary code depends on the compiler, the flags used and so on. This results in a series of corner cases that must be handled correctly in order to deobfuscate the code. The worst part is that the handling of these corner cases might not be reusable for a different sample that was compiled with a different compiler or with different flags. My idea to solve this problem is to write a deobfuscator based on flags extracted by running generic heuristics. In this way, I can abstract the analysis from the code details.
Another interesting aspect of ADVobfuscator is that it was recently used to protect a malware sample that was analyzed in this very interesting blog post. In particular, in section "3. Latest variant of Team9 loader", it is possible to see a reference to the string deobfuscation process.
In this blog post, I'll focus on the string obfuscation part, by writing a utility that is able to decode the obfuscated strings. The deobfuscation utility uses the B2R2 binary analysis framework to statically analyze the binary, and Sojobo to emulate the code.
The sample that I'll consider has SHA256 hash value: aaa9268b4a80f75eeb58b61cbd745523b1823d5adf54c615ad9ddf6b8fa0e806. It was used in a demo during my talk at HackInBo Safe Edition and can be downloaded from my GitHub repository.
Identify obfuscated strings
This is probably the most annoying part. We can't rely on specific code patterns, since the code might change depending on the compiler used. My idea was to abstract this concept and try to identify interesting points by using a series of heuristics. ADVobfuscator uses various methods to obfuscate the strings; some of them are reported below:
1400012AA movdqa xmm0, cs:xmmword_140023520 ; load obfuscated buffer
1400012B2 movdqu [rbp+57h+var_90], xmm0
1400012B7 mov rcx, r14
1400012BA
1400012BA loc_1400012BA:; CODE XREF: sub_1400011F4+D4↓j
1400012BA mov al, byte ptr [rbp+57h+var_90]
1400012BD xor byte ptr [rbp+rcx+57h+var_90+1], al ; deobfuscation
1400012C1 add rcx, r15 ; Increase counter
1400012C4 cmp rcx, 0Eh ; check size
1400012C8 jb short loc_1400012BA
1400012CA mov byte ptr [rbp+57h+var_90+0Fh], r14b ; set null byte
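The loop above boils down to a single-byte XOR in which the first byte of the buffer acts as the key. A minimal Python sketch of that logic follows. The function name `xor_decode` and the sample buffer are illustrative assumptions, not values taken from the sample or from the deobfuscator source.

```python
def xor_decode(buffer: bytes, size: int) -> bytes:
    """Decode `size` bytes that follow the one-byte XOR key stored at buffer[0]."""
    key = buffer[0]
    # Bytes 1..size hold the obfuscated characters, as in the loop above.
    return bytes(b ^ key for b in buffer[1:1 + size])


# Hypothetical buffer built for illustration: key 0x5A followed by the XORed text.
plain = b"Hello, world!!"  # 14 bytes, matching the 0Eh bound in the listing
obfuscated = bytes([0x5A]) + bytes(c ^ 0x5A for c in plain)
decoded = xor_decode(obfuscated, 0x0E)
```

The size is passed explicitly here only for convenience; in the binary it is a constant baked into the generated code.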
Unfortunately, not all deobfuscation tasks are implemented as inline code. In some cases a function is invoked, as reported below.
140006161 movdqa xmm0, cs:xmmword_140023800 ; load obfuscated buffer
140006169 lea rcx, [rbp+var_30] ; pointer to the obfuscated buffer
14000616D xor eax, eax
14000616F mov [rbp+var_20], 627A6844h
140006176 movdqu [rbp+var_30], xmm0
14000617B mov byte ptr [rbp+var_1C], al
14000617E call sub_140003684 ; call deobfuscation function
140006183 mov r9d, r15d
..............
140003684 sub_140003684 proc near ; CODE XREF: sub_140005EB8+2C6↓p
140003684 lea rax, [rcx+1] ; skip first byte, which is used as key
140003688 mov r9d, 13h ; set string size
14000368E mov r8, rax ; pointer to buffer to decode
140003691
140003691 loc_140003691:; CODE XREF: sub_140003684+19↓j
140003691 mov dl, [rcx] ; read XOR key
140003693 xor [r8], dl ; deobfuscation
140003696 inc r8 ; increment buffer pointer
140003699 sub r9, 1 ; decrement counter and check for termination
14000369D jnz short loc_140003691
14000369F mov [rcx+14h], r9b
1400036A3 retn
1400036A3 sub_140003684 endp
In the latter case, the string size is hardcoded inside the function body rather than passed as an input parameter. This means that the binary contains many functions like this one, which differ only in minor details (like the string size).
Heuristics definition
As said, my main idea is to analyze all the functions that the B2R2 framework is able to identify, and to extract flags based on the heuristics that I created. You can find all the defined heuristics in the associated source code. An excerpt from that list is presented below:
- XOR with a single byte register
- XOR with a value on the stack
- Add a stack value with an immediate
- Call a function that does deobfuscation operations
- MOV immediate to stack
- JUMP to a lower address
The extracted flags are then used to carry out the following tasks:
- Identify all functions that deobfuscate a string. This task is useful to cover the case where the deobfuscation process is deferred to another function.
- Identify the start of the code in charge of the deobfuscation.
- Identify the address of the buffer that will be deobfuscated. This is done by identifying the deobfuscation operation.
- Identify the end of the code in charge of the deobfuscation.
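To make the flag idea more concrete, here is a minimal sketch of how a few of the heuristics listed above could be evaluated over a sequence of decoded instructions. The `Operand` and `Instr` types, the flag names and the `extract_flags` helper are all hypothetical stand-ins for what a lifter would provide. This is neither the B2R2 API nor the actual implementation of the tool.

```python
from typing import NamedTuple, Optional

BYTE_REGS = {"al", "bl", "cl", "dl", "r8b", "r9b", "r14b", "r15b"}
STACK_REGS = {"rbp", "rsp"}


class Operand(NamedTuple):
    kind: str        # "reg", "mem" or "imm"
    value: object    # register name, base register of a memory access, or integer


class Instr(NamedTuple):
    addr: int
    mnemonic: str
    operands: tuple                 # destination first, then source
    target: Optional[int] = None    # branch target, when the instruction is a jump


def extract_flags(instrs):
    """Collect the heuristic flags raised by a sequence of instructions."""
    flags = set()
    for ins in instrs:
        dst = ins.operands[0] if len(ins.operands) > 0 else None
        src = ins.operands[1] if len(ins.operands) > 1 else None
        if ins.mnemonic == "xor" and src and src.kind == "reg" and src.value in BYTE_REGS:
            flags.add("XorWithByteRegister")
        if (ins.mnemonic == "mov" and dst and src and dst.kind == "mem"
                and dst.value in STACK_REGS and src.kind == "imm"):
            flags.add("MovImmediateToStack")
        if ins.mnemonic.startswith("j") and ins.target is not None and ins.target < ins.addr:
            flags.add("JumpToLowerAddress")
    return flags


# The inline decoding loop shown earlier, transcribed by hand.
loop = [
    Instr(0x1400012BA, "mov", (Operand("reg", "al"), Operand("mem", "rbp"))),
    Instr(0x1400012BD, "xor", (Operand("mem", "rbp"), Operand("reg", "al"))),
    Instr(0x1400012C1, "add", (Operand("reg", "rcx"), Operand("reg", "r15"))),
    Instr(0x1400012C4, "cmp", (Operand("reg", "rcx"), Operand("imm", 0x0E))),
    Instr(0x1400012C8, "jb", (), target=0x1400012BA),
]
flags = extract_flags(loop)
```

For this loop the sketch raises the byte-register XOR flag and the backward-jump flag. Because the checks look at operand kinds rather than exact byte patterns, the same flags would be raised if a different compiler picked other registers.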
Emulation
At this point I have the following information: the functions that run a deobfuscation task and the related chunk of code in charge of this task. The final step is to emulate this code and read the deobfuscated string from memory. Before running the emulation, one more step is necessary. The heuristics might miss some important information, like the register that is used to increment the counter (in one of the examples above, r15 is used to increment the counter). To cover this problem I used two strategies. In the first strategy, I analyze the code backwards, starting from the identified start address, and verify whether the preceding instruction is safe to emulate. If so, I move the start address to that instruction. The second strategy analyzes the instructions that should be emulated and, if it finds an operation that adds two registers, sets the value of the source register inside the emulator to 1.
These strategies seem to be good enough to catch possible registers initialization code.
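The second strategy can be sketched as follows. The instruction tuples, the `REGISTERS` set and the `preset_add_sources` helper are hypothetical, and a plain dictionary stands in for the emulator register state. The real tool works on lifted instructions and on the Sojobo emulator, whose API is not reproduced here.

```python
REGISTERS = {"rax", "rbx", "rcx", "rdx", "r8", "r9", "r14", "r15"}


def preset_add_sources(instrs, state):
    """Set to 1 the source register of every register-to-register ADD."""
    for mnemonic, dst, src in instrs:
        if mnemonic == "add" and dst in REGISTERS and src in REGISTERS:
            state[src] = 1
    return state


# Hand-transcribed slice of the inline loop; operands are plain strings.
chunk = [
    ("mov", "al", "[rbp+57h+var_90]"),
    ("xor", "[rbp+rcx+57h+var_90+1]", "al"),
    ("add", "rcx", "r15"),
    ("cmp", "rcx", "0Eh"),
]
state = preset_add_sources(chunk, {"rcx": 0, "r15": 0})
```

With r15 forced to 1, the counter in rcx advances on every iteration even though the instruction that originally initialized r15 lies outside the emulated chunk.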
We can now run the emulator and read the decrypted string from memory.