Friday, June 12, 2020

Deobfuscating C++ ADVobfuscator with Sojobo and the B2R2 binary analysis framework


Twitter: @s4tan
GitHub code: https://github.com/enkomio/Sojobo/tree/master/Src/Tools/ADVDeobfuscator

ADVobfuscator, a new obfuscation tool based on C++11 metaprogramming, was presented at Black Hat Europe 2014 in Amsterdam. The paper describes in depth how strings and function calls are obfuscated.

ADVobfuscator demonstrates how to use the C++11/14 language to generate obfuscated code at compile time, without any external tool or custom compiler.
Compile-time obfuscators like this one are quite annoying to analyze, since it is not easy to write a generic deobfuscator based on code pattern recognition. The resulting binary code depends on the compiler, the flags used and so on, which produces a series of corner cases that must be handled to correctly deobfuscate the code. Worse, the handling of these corner cases might not be reusable for a different sample compiled with a different compiler or different flags. My idea to solve this problem is to write a deobfuscator based on flags extracted by running generic heuristics. In this way, I can abstract the analysis from the code details.

Another interesting aspect of ADVobfuscator is that it was recently used to protect a malware sample analyzed in this very interesting blog post. In particular, section "3. Latest variant of Team9 loader" contains a reference to the string deobfuscation process.

In this blog post, I'll focus on the string obfuscation part by writing a utility that decodes the obfuscated strings. The deobfuscation utility uses the B2R2 binary analysis framework to statically analyze the binary, and Sojobo to emulate the code.

The sample that I'll consider has SHA256 hash value: aaa9268b4a80f75eeb58b61cbd745523b1823d5adf54c615ad9ddf6b8fa0e806. It was used in a demo during my talk at HackInBo Safe Edition and can be downloaded from my GitHub repository.

Identify obfuscated strings

This is probably the most annoying part. We can't rely on specific code patterns, since the code might change depending on the compiler used. My idea was to abstract this concept and try to identify interesting points by using a series of heuristics. ADVobfuscator uses various methods to obfuscate strings; some of them are reported below:

1400012AA    movdqa  xmm0, cs:xmmword_140023520          ; load obfuscated buffer
1400012B2    movdqu  [rbp+57h+var_90], xmm0
1400012B7    mov     rcx, r14
1400012BA
1400012BA loc_1400012BA:; CODE XREF: sub_1400011F4+D4↓j
1400012BA    mov     al, byte ptr [rbp+57h+var_90]
1400012BD    xor     byte ptr [rbp+rcx+57h+var_90+1], al ; deobfuscation
1400012C1    add     rcx, r15                            ; Increase counter
1400012C4    cmp     rcx, 0Eh                            ; check size
1400012C8    jb      short loc_1400012BA
1400012CA    mov     byte ptr [rbp+57h+var_90+0Fh], r14b ; set null byte
Unfortunately, not all deobfuscation tasks are implemented as inline code. In some cases a function is invoked, as reported below.

140006161    movdqa  xmm0, cs:xmmword_140023800          ; load obfuscated buffer
140006169    lea     rcx, [rbp+var_30]                   ; pointer to the obfuscated buffer
14000616D    xor     eax, eax
14000616F    mov     [rbp+var_20], 627A6844h
140006176    movdqu  [rbp+var_30], xmm0
14000617B    mov     byte ptr [rbp+var_1C], al
14000617E    call    sub_140003684                       ; call deobfuscation function
140006183    mov     r9d, r15d
..............
140003684 sub_140003684   proc near  ; CODE XREF: sub_140005EB8+2C6↓p
140003684    lea     rax, [rcx+1]                        ; skip first byte, which is used as key
140003688    mov     r9d, 13h                            ; set string size
14000368E    mov     r8, rax                             ; pointer to buffer to decode
140003691
140003691 loc_140003691:; CODE XREF: sub_140003684+19↓j
140003691    mov     dl, [rcx]                           ; read XOR key
140003693    xor     [r8], dl                            ; deobfuscation
140003696    inc     r8                                  ; increment buffer pointer
140003699    sub     r9, 1                               ; decrement counter and check for termination
14000369D    jnz     short loc_140003691
14000369F    mov     [rcx+14h], r9b
1400036A3    retn
1400036A3 sub_140003684   endp
In the latter case, the string size is hardcoded inside the function body rather than passed as an input parameter. This means that the binary contains many such functions, which differ only in minor details (like the string size).
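Both listings follow the same scheme: the first byte of the buffer is the XOR key, and a hardcoded number of following bytes is decoded in place. A minimal Python sketch of that scheme is shown here. The function names, the sample string and the key value are illustrative assumptions, not taken from the sample or from ADVobfuscator itself.

```python
def decode_adv_string(buffer: bytes, size: int) -> bytes:
    """XOR-decode `size` bytes that follow the one-byte key stored at buffer[0]."""
    key = buffer[0]
    return bytes(b ^ key for b in buffer[1:1 + size])


def encode_adv_string(plain: bytes, key: int) -> bytes:
    """Build a test buffer with the same layout: key byte, then XORed payload."""
    return bytes([key]) + bytes(b ^ key for b in plain)


# Hypothetical input: the string and the key are made up for illustration.
plain_text = b"kernel32.dll"
obfuscated = encode_adv_string(plain_text, 0x5A)
decoded = decode_adv_string(obfuscated, len(plain_text))
print(decoded)
```

Since the size is not stored in the buffer, a static decoder like this one needs it from somewhere else, which is one reason why emulating the original code is convenient.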

Heuristics definition

As said, my main idea is to analyze all the functions that the B2R2 framework is able to identify, and to extract flags based on the heuristics that I created. You can find all the defined heuristics in the associated source code. An excerpt from that list is presented below:
The heuristics above are used for the following tasks:
  • Identify all functions that deobfuscate a string. This task is useful to cover the case where the deobfuscation process is deferred to another function.
  • Identify the start of the code in charge of the deobfuscation.
  • Identify the address of the buffer that will be deobfuscated. This is done by identifying the deobfuscation operation.
  • Identify the end of the code in charge of the deobfuscation.
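To give an idea of what such a heuristic might look like, here is a small Python sketch. It is not the heuristic set used by the tool (that lives in the linked source code and works on B2R2 output). It assumes a simplified instruction model (`Insn`, a hypothetical type) and flags a backward conditional jump whose body XORs a memory operand with a register, reporting the loop start, the XOR address and the loop end.

```python
from typing import List, NamedTuple, Optional


class Insn(NamedTuple):
    addr: int
    mnemonic: str
    operands: tuple  # operand strings, destination first


class Candidate(NamedTuple):
    start: int     # first instruction of the loop
    xor_addr: int  # in-place XOR that decodes the buffer
    end: int       # conditional jump that closes the loop


def _is_memory(op: str) -> bool:
    return op.startswith("[")


def _is_register(op: str) -> bool:
    return op.isalnum() and op[0].isalpha()


def find_xor_loop(insns: List[Insn]) -> Optional[Candidate]:
    """Return the first loop whose body XORs a memory operand with a register."""
    for jump in insns:
        if not jump.mnemonic.startswith("j") or jump.mnemonic == "jmp":
            continue
        target = int(jump.operands[0], 16)
        if target > jump.addr:
            continue  # forward jump, not a loop
        for insn in insns:
            if not (target <= insn.addr < jump.addr) or insn.mnemonic != "xor":
                continue
            dst, src = insn.operands
            if _is_memory(dst) and _is_register(src):
                return Candidate(target, insn.addr, jump.addr)
    return None


# The deobfuscation function shown earlier, transcribed into the simplified model.
listing = [
    Insn(0x140003684, "lea", ("rax", "[rcx+1]")),
    Insn(0x140003688, "mov", ("r9d", "13h")),
    Insn(0x14000368E, "mov", ("r8", "rax")),
    Insn(0x140003691, "mov", ("dl", "[rcx]")),
    Insn(0x140003693, "xor", ("[r8]", "dl")),
    Insn(0x140003696, "inc", ("r8",)),
    Insn(0x140003699, "sub", ("r9", "1")),
    Insn(0x14000369D, "jnz", ("140003691",)),
    Insn(0x14000369F, "mov", ("[rcx+14h]", "r9b")),
    Insn(0x1400036A3, "retn", ()),
]
candidate = find_xor_loop(listing)
print(candidate)
```

The check depends only on the shape of the loop, not on specific registers or offsets, which is the kind of abstraction from compiler details the heuristics aim for.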

Emulation

At this point I have the following information: the functions that run a deobfuscation task and the related chunk of code in charge of this task. The final step is to emulate this code and read the deobfuscated string from memory. Before running the emulation, one more step is necessary. The heuristics might miss some important information, like the register that is used to increment the counter (in one of the examples above, r15 is used to increment the counter).

To address this problem I used two strategies. In the first, I perform a backward analysis starting from the identified start address and verify whether the preceding instruction is safe to emulate. If so, I move the start address back to it. The second strategy analyzes the instructions to be emulated and, if it finds an operation that adds two registers, sets the value of the source register inside the emulator to 1.
These strategies seem to be good enough to catch possible register initialization code.
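The two strategies can be sketched in Python as follows. This is only an illustration under stated assumptions. The set of "safe" mnemonics is my guess at a reasonable criterion, the instruction model is a plain `(mnemonic, operands)` tuple, and the leading `call` is a hypothetical instruction added to show where the backward walk stops. The real tool works on B2R2 instructions and the Sojobo emulator state.

```python
# Assumption: instructions considered safe to emulate before the loop.
SAFE_MNEMONICS = {"mov", "lea", "xor", "movdqa", "movdqu"}


def _is_register(op):
    return op.isalnum() and op[0].isalpha()


def extend_start_backwards(insns, start_index):
    """Strategy 1: move the start back while the preceding instruction looks safe."""
    index = start_index
    while index > 0 and insns[index - 1][0] in SAFE_MNEMONICS:
        index -= 1
    return index


def preset_increment_registers(insns, registers):
    """Strategy 2: for every `add reg, reg`, set the source register to 1."""
    for mnemonic, operands in insns:
        if mnemonic == "add" and len(operands) == 2:
            dst, src = operands
            if _is_register(dst) and _is_register(src):
                registers[src] = 1
    return registers


# The inline loop shown earlier, preceded by a hypothetical call.
listing = [
    ("call", ("hypothetical_function",)),
    ("movdqa", ("xmm0", "xmmword_140023520")),
    ("movdqu", ("[rbp+57h+var_90]", "xmm0")),
    ("mov", ("rcx", "r14")),
    ("mov", ("al", "[rbp+57h+var_90]")),       # loop start found by the heuristics
    ("xor", ("[rbp+rcx+57h+var_90+1]", "al")),
    ("add", ("rcx", "r15")),
    ("cmp", ("rcx", "0Eh")),
    ("jb", ("1400012BA",)),
]
new_start = extend_start_backwards(listing, 4)
registers = preset_increment_registers(listing[new_start:], {})
print(new_start, registers)
```

In this example the backward walk pulls in the instructions that load the obfuscated buffer, and `r15` is preset to 1 so that the counter actually advances during emulation.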

We can now run the emulator and read the decrypted string from memory.

Result

A short video of the deobfuscation tool running on the considered sample is reported below. With this information it shouldn't be too difficult to patch the original file and NOP out the deobfuscation operations. I tested it on various samples and it seems to work properly. If you find any errors, just send me a message on Twitter.