Sunday, 19 May 2019

Sojobo - Yet another binary analysis framework

Twitter: @s4tan

Sojobo GitHub project:

Sojobo is a new binary analysis framework written in .NET and based on B2R2. I created this project for learning purposes and to make my work easier during malware analysis.


A couple of months ago a new binary analysis framework named B2R2 was released ([01, 02]); it also won the "BAR 2019 Best Paper Award" ([03]). It immediately caught my attention since it is fully developed in F# on .NET Core and doesn't need any external libraries. This was a big plus for me since I love F# and I have always had issues with the most common binary analysis frameworks (for example, they need a specific library version, the Python bindings don't work with the latest release, or they are meant to run only on Linux).

B2R2 is a framework with an academic origin (a very rare case, since academics are often reluctant to release working source code) and the developer is very responsive (and kind) on GitHub. It supports various CPU architectures and implements a new IR (LowUIR) which is very simple to understand. It all sounds very promising :)

Unfortunately, as the B2R2 main developer wrote ([04]), it is a frontend framework and at the moment no backend implementation is provided. They are also considering building a business around a backend framework, and for now they are unsure when they will release it.

While waiting for that code to be released, I decided to write a backend of my own :)

Using Sojobo

Sojobo lets you emulate a 32-bit PE binary and interact with the emulation. It implements a Sandbox class that can be used to emulate a given binary. In the following section we will see how to write a simple generic unpacker.

Implementing a generic unpacker

As a first example I tried to write a tool that dumps a dynamically allocated memory region which is then executed. My purpose was to write a generic unpacker (as a POC, of course) by following the principles described in the paper "Automatic Static Unpacking of Malware Binaries" ([05]). Tools of this kind are pretty common among malware analysts; a new one was released recently ([06]).

You can find the source code of this sample in the GitHub repository; I'll paste it here for convenience:

#include <stdint.h>
#include <Windows.h>

// Copies the routine between the "code" and "start" labels into buffer.
void copy_code(void *buffer)
{
  __asm
  {
    jmp start
  code:
    push ebp
    mov ebp, esp
    xor eax, eax
    mov edx, 1
    mov ecx, DWORD PTR [ebp+8]
  l:
    xadd eax, edx
    loop l
    mov esp, ebp
    pop ebp
    ret
  start:
    mov esi, code;
    mov edi, buffer;
    mov ecx, start;
    sub ecx, code;
    rep movsb
  }
}

int main()
{
 uint32_t ret_val = 0;
 // allocate an executable region, copy the routine into it and call it
 void *fibonacci = VirtualAlloc(NULL, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
 copy_code(fibonacci);
 ret_val = ((uint32_t (*)(uint32_t))fibonacci)(6);
 VirtualFree(fibonacci, 0x1000, MEM_RELEASE);
 return 0;
}

As you can see, the code allocates a new memory region, invokes a function to copy some code into it and then executes it. I tried to mimic malware that unpacks its real payload in memory and then executes it. My goal is to dump that code.

To do that I'll follow a simple principle (described in the paper referenced above): if a memory region that was previously written to is executed, I dump it to disk. With Sojobo I subscribed to an event handler that is invoked each time memory is accessed. I can then step through the process and check whether a region that was previously written is now being executed.
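The write-then-execute rule can be illustrated independently of Sojobo. The following is a minimal C sketch under stated assumptions, not Sojobo's implementation: the names region_tracker, tracker_mark_written and tracker_should_dump are hypothetical, the table has a fixed size, and it tracks whole regions rather than individual bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 64

/* A memory range [base, base + size) that the emulated program wrote to. */
typedef struct {
    uint32_t base;
    uint32_t size;
    bool dumped;   /* set once the region has been reported */
} written_region;

/* Hypothetical fixed-size table of written regions. */
typedef struct {
    written_region regions[MAX_REGIONS];
    size_t count;
} region_tracker;

/* Record a write. Returns false if the table is full or the write is empty. */
bool tracker_mark_written(region_tracker *t, uint32_t base, uint32_t size)
{
    assert(t != NULL);
    if (t->count == MAX_REGIONS || size == 0)
        return false;
    t->regions[t->count++] = (written_region){ base, size, false };
    return true;
}

/* Call once per executed instruction. Returns the written region that
   contains pc the first time it is executed, otherwise NULL. */
written_region *tracker_should_dump(region_tracker *t, uint32_t pc)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        written_region *r = &t->regions[i];
        /* "pc - base < size" avoids overflow in "base + size" */
        if (pc >= r->base && pc - r->base < r->size && !r->dumped) {
            r->dumped = true;
            return r;
        }
    }
    return NULL;
}
```

A caller would invoke tracker_mark_written from its memory-write hook and tracker_should_dump from its per-instruction hook, writing the region to disk whenever the latter returns a non-NULL pointer.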

One of the first issues was emulating the invocation of external functions (like VirtualAlloc). With Sojobo you can easily emulate such calls by following a given coding convention (I'm a fan of the convention over configuration paradigm [07]). But don't worry: Sojobo already implements emulation for some functions and I plan to support many more.

That said, the solution to our problem is the following (the code is also on GitHub):

namespace ES.EndToEndTests

open System
open System.IO
open System.Collections.Generic
open B2R2
open ES.Sojobo.Model
open ES.Sojobo

module DumpDynamicMemory =
    let private _memoryRegions = new List<MemoryRegion>()
    let mutable private _memoryDumped = false

    let private memoryAccessedHandler(operation: MemoryAccessOperation) =
        match operation with
        | Read address -> ()
        | Write(address, value) -> ()
        | Allocate memRegion -> _memoryRegions.Add(memRegion)
        | Free memRegion -> ()

    let private writeDisassembly(activeProcess: IProcessContainer) =
        let text = Utility.formatCurrentInstruction(activeProcess)
        Console.WriteLine(text)

    let private identifyUnpackedCode(activeProcess: IProcessContainer) =
        if not _memoryDumped then
            let pc = activeProcess.GetProgramCounter().Value |> BitVector.toUInt32
            _memoryRegions
            |> Seq.tryFind(fun memRegion -> 
                pc >= uint32 memRegion.BaseAddress &&
                pc < uint32 memRegion.BaseAddress + uint32 memRegion.Content.Length
            )
            |> Option.iter(fun memRegion ->
                // a previously allocated region is now being executed, maybe unpacked code!
                let filename = String.Format("mem_{0}.bin", memRegion.BaseAddress)
                File.WriteAllBytes(filename, memRegion.Content)
                Console.WriteLine("[+] Dynamic code dumped to: {0}!", filename)
                _memoryDumped <- true
            )

    let private step(activeProcess: IProcessContainer) =
        writeDisassembly(activeProcess)
        identifyUnpackedCode(activeProcess)

    let private getTestFile() =
        ["Release"; "Debug"]
        |> Seq.map(fun dir -> Path.Combine("..", "..", "..", dir, "RunShellcodeWithVirtualAlloc.exe"))
        |> Seq.tryFind(File.Exists)

    let ``dump dynamically executed memory``() =
        let sandbox = new Win32Sandbox() 
        let exe = 
            match getTestFile() with
            | Some exe -> exe
            | None ->
                Console.WriteLine("RunShellcodeWithVirtualAlloc.exe not found, please compile it first!")


        // setup handlers
        let proc = sandbox.GetRunningProcess()
        // print imported function
        |> Seq.iter(fun symbol ->
                "Import: [0x{0}] {1} ({2}) from {3}", 
        // run the sample

The code is quite simple: each time a memory region is allocated I add it to a list. For each executed instruction I check whether EIP falls inside one of the previously allocated regions and, if so, I dump the region content to disk. If we run the code, a new file is written to disk which contains the following disassembled code:

L_00000000:   push ebp
L_00000001:   mov ebp, esp
L_00000003:   xor eax, eax
L_00000005:   mov edx, 0x1
L_0000000A:   mov ecx, [ebp+0x8]
L_0000000D:   xadd eax, edx
L_00000010:   loop 0xd
L_00000012:   pop ebp
L_00000013:   ret 
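To make it easier to see what the dumped routine computes, here is a rough C rendering of the same loop. It is only an illustrative sketch (the function name fib_xadd is made up): xadd eax, edx leaves the sum in EAX and the old EAX in EDX, and loop repeats the body ECX times.

```c
#include <assert.h>
#include <stdint.h>

/* C rendering of the dumped routine; n plays the role of [ebp+8]. */
uint32_t fib_xadd(uint32_t n)
{
    /* "loop" decrements ECX before testing it, so n == 0 would wrap
       around in the original code; this sketch rejects that case. */
    assert(n > 0);

    uint32_t eax = 0;   /* xor eax, eax       */
    uint32_t edx = 1;   /* mov edx, 1         */
    uint32_t ecx = n;   /* mov ecx, [ebp+0x8] */

    do {
        uint32_t sum = eax + edx;  /* xadd eax, edx:      */
        edx = eax;                 /*   edx <- old eax    */
        eax = sum;                 /*   eax <- eax + edx  */
    } while (--ecx != 0);          /* loop                */

    return eax;
}
```

With the argument 6 that the sample's main passes to the copied code, fib_xadd returns 8, the sixth Fibonacci number.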

A real-world sample: emulating KPOT v2.0 and dumping the deobfuscated strings

Let's try to use Sojobo on a real-world case. Recently, Proofpoint published an article about a new KPOT version ([08]). We will consider the sample with SHA256: 67f8302a2fd28d15f62d6d20d748bfe350334e5353cbdef112bd1f8231b5599d.

In the GitHub repository I included the KPOT sample too. I took precautions to make sure it is not executed by mistake (it is XORed, base64 encoded and has a corrupt PE header).

Our goal is to dump the strings once they are decrypted. The function in charge of the decryption is at address 0x0040C8F5; when it returns, EAX holds the length of the string and EDI points to the decrypted buffer. We can then read the memory content and print it.

Sojobo tries to emulate the most common functions; in particular, it emulates GetLastError by returning 0 (success). If we take a look at the KPOT code we spot the following:

.text:004103BB                 call    ds:LoadUserProfileW
.text:004103C1                 test    eax, eax
.text:004103C3                 jnz     short loc_4103D0
.text:004103C5                 call    ds:GetLastError
.text:004103CB                 cmp     eax, 57h ; 'W'
.text:004103CE                 jz      short loc_4103D5
.text:004103D0                 jmp     near ptr loc_4103D0+1 ; Jump to garbage

Basically, if the GetLastError code is different from 0x57 (ERROR_INVALID_PARAMETER) the process crashes (it jumps to garbage data). So we have to override the default GetLastError definition to force it to return 0x57. This is done by creating a class named Kernel32 with a function named GetLastError that accepts an ISandbox object as its first parameter. Take a look at this file for the implementation details. Then we add our assembly to the Sandbox so that our implementation is picked up. Finally, as before, we set up a process step handler, which contains the following code:

private static void ProcessStep(Object sender, IProcessContainer process)
{
 var ip = process.GetProgramCounter().ToInt32();
 if (ip == _retAddresDecryptString)
 {
  // read registers value
  var decryptedBufferAddress = process.GetRegister("EDI").ToUInt64();
  var bufferLength = process.GetRegister("EAX").ToInt32();
  // read decrypted string
  var decryptedBuffer = process.Memory.ReadMemory(decryptedBufferAddress, bufferLength);
  var decryptedString = Encoding.UTF8.GetString(decryptedBuffer);
  Console.WriteLine("[+] {0}", decryptedString);
 }
}

By reversing the sample we know that the decrypt function ends at address 0x0040C928, so when this point is reached we can dump the decrypted string by reading the EAX and EDI register values and then the process memory. Below is an example run:

-=[ Start Emulation ]=-
[+] wininet.dll
[+] winhttp.dll
[+] ws2_32.dll
[+] user32.dll
[+] shell32.dll
[+] advapi32.dll
[+] dnsapi.dll
[+] netapi32.dll
[+] gdi32.dll
[+] gdiplus.dll
[+] oleaut32.dll
[+] ole32.dll
[+] shlwapi.dll
[+] userenv.dll
[+] urlmon.dll
[+] crypt32.dll
[+] mpr.dll
-=[ Emulation Completed ]=-

Of course that list is by no means exhaustive. We will see why in the next sections.

Is it really that simple and smooth?

I would love to say yes, but there are still some limitations (which I already plan to address). The output above was obtained by emulating the KPOT function that is in charge of loading the DLLs it actually uses. Before that code we find the following:

.text:00406966 64 A1 30 00 00 00             mov     eax, large fs:30h ; read PEB
.text:0040696C 8B 40 18                      mov     eax, [eax+18h]    ; read Heap
.text:0040696F C3                            retn

Basically, it reads the Heap base address from the PEB. One solution would be to place some fake values there, but that is not a good solution in the long term (KPOT resolves function addresses by walking the EAT). So I defined PEB and TEB structures and wrote them to the process memory (I also correctly initialized the FS register). I have also implemented a serialization algorithm that allows us to "read" a typed object from memory (instead of just a bunch of raw bytes). This is very handy if we want to customize some complex structure (like the PEB in this case). In the next section we will take advantage of this feature.

The second problem is that KPOT tries to resolve function addresses by walking the Ldr field. It also uses the Ldr field to find the base address of Kernel32, which is done by the following code:

.text:00406936                               get_Kernel32_base_via_Ldr proc near
.text:00406936 64 A1 30 00 00 00             mov     eax, large fs:30h ; read PEB
.text:0040693C 8B 40 0C                      mov     eax, [eax+0Ch]    ; read Ldr
.text:0040693F 8B 40 0C                      mov     eax, [eax+0Ch]    ; read InLoadOrderModuleList
.text:00406942 8B 00                         mov     eax, [eax]        ; read first entry (ntdll)
.text:00406944 8B 00                         mov     eax, [eax]        ; read second entry (kernel32)
.text:00406946 8B 40 18                      mov     eax, [eax+18h]    ; read DllBase
.text:00406949 C3                            retn 
.text:00406949                               get_Kernel32_base_via_Ldr endp

Even in this case you can just fake this value and write the LDR_DATA_TABLE_ENTRY structure back to memory, but very soon you will discover that this strategy fails (in fact, in our test the emulation raises an exception).
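To show which fields the routine above actually touches, here is a small C sketch that builds a fake 32-bit PEB and loader list inside a flat byte array and then repeats the same pointer chase. Everything in it is an assumption made for illustration: the names (fake_memory, build_fake_ldr, kernel32_base_via_ldr), the addresses 0x100 to 0x400 and the load order (executable, ntdll, kernel32) are made up, and only the fields read by the listing are filled in. It is not how Sojobo lays out its PEB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 0x1000u

/* Tiny flat "emulated" memory; addresses are offsets into this array. */
typedef struct {
    uint8_t bytes[MEM_SIZE];
} fake_memory;

/* Little-endian 32-bit read, like a "mov eax, [addr]" in the listing. */
uint32_t mem_read32(const fake_memory *m, uint32_t addr)
{
    assert(addr <= MEM_SIZE - 4);
    return (uint32_t)m->bytes[addr]
         | (uint32_t)m->bytes[addr + 1] << 8
         | (uint32_t)m->bytes[addr + 2] << 16
         | (uint32_t)m->bytes[addr + 3] << 24;
}

/* Little-endian 32-bit write. */
void mem_write32(fake_memory *m, uint32_t addr, uint32_t value)
{
    assert(addr <= MEM_SIZE - 4);
    for (int i = 0; i < 4; i++)
        m->bytes[addr + i] = (uint8_t)(value >> (8 * i));
}

/* Lay out a fake PEB, loader data and three module entries linked
   through their first field (the forward link of the load-order list). */
void build_fake_ldr(fake_memory *m, uint32_t peb, uint32_t kernel32_base)
{
    /* arbitrary, non-overlapping addresses chosen for this sketch */
    const uint32_t ldr = 0x100, exe = 0x200, ntdll = 0x300, k32 = 0x400;

    memset(m->bytes, 0, sizeof m->bytes);
    mem_write32(m, peb + 0x0C, ldr);            /* PEB + 0Ch: Ldr             */
    mem_write32(m, ldr + 0x0C, exe);            /* Ldr + 0Ch: first entry     */
    mem_write32(m, exe, ntdll);                 /* exe      -> ntdll          */
    mem_write32(m, ntdll, k32);                 /* ntdll    -> kernel32       */
    mem_write32(m, k32, ldr + 0x0C);            /* kernel32 -> list head      */
    mem_write32(m, k32 + 0x18, kernel32_base);  /* entry + 18h: DllBase       */
}

/* Same pointer chase as get_Kernel32_base_via_Ldr; peb stands in for
   the value the sample reads from fs:30h. */
uint32_t kernel32_base_via_ldr(const fake_memory *m, uint32_t peb)
{
    uint32_t eax = peb;
    eax = mem_read32(m, eax + 0x0C);  /* read Ldr                   */
    eax = mem_read32(m, eax + 0x0C);  /* read InLoadOrderModuleList */
    eax = mem_read32(m, eax);         /* follow link (ntdll)        */
    eax = mem_read32(m, eax);         /* follow link (kernel32)     */
    return mem_read32(m, eax + 0x18); /* read DllBase               */
}
```

Calling build_fake_ldr with a PEB address of 0 and then kernel32_base_via_ldr with the same address returns whatever base was stored, which is all this routine needs. Code that goes on to parse the module at that base needs real data there as well.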

Dumping all strings from KPOT v2.0 (for real)

The previous section introduced a feature that allows us to read objects from the process memory. In this section we will see how to dump all encrypted strings in a very easy way. As Proofpoint notes, all strings are encrypted with a very simple algorithm and stored in a struct that has the following layout:

public class EncryptedString
{
 public UInt16 EncryptionKey;
 public UInt16 StringLength;
 public UInt32 Buffer;

 public String Decrypt(IProcessContainer process)
 {
  var buffer = process.Memory.ReadMemory(this.Buffer, this.StringLength);
  var stringContent = new StringBuilder();
  foreach(var b in buffer)
   stringContent.Append((Char)(b ^ this.EncryptionKey));

  return stringContent.ToString();
 }
}

It would be very useful if we could read an EncryptedString object from the process memory instead of a raw byte array (as done by the Proofpoint Python script). With Sojobo you can do that, and the code to print all the decrypted strings is as simple as this:

private static void DecryptStrings(IProcessContainer process)
{
 Console.WriteLine("-=[ Start Dump All Strings ]=-");
 // encrypted strings
 var encryptedStringsStartAddress = 0x00401288UL;
 var encryptedStringsEndAddress = 0x00401838UL;

 var currentOffset = encryptedStringsStartAddress;
 while (currentOffset < encryptedStringsEndAddress)
 {
  var encryptedString = process.Memory.ReadMemory<EncryptedString>(currentOffset);
  var decryptedString = encryptedString.Decrypt(process);
  Console.WriteLine("[+] {0}", decryptedString);

  // go to the next string
  currentOffset += 8UL; 
 }

 Console.WriteLine("-=[ Dump All Strings Completed ]=-");
}

In the GitHub repository you can find the full source code (to dump all strings pass --strings as the first argument). The result is the same as the one provided by Proofpoint (but with cleaner code :P).
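For readers who want to see the 8-byte record layout and the XOR step without the Sojobo API, here is a small C sketch. It rests on a few assumptions made purely for illustration: a flat byte buffer stands in for the process memory, the Buffer field is treated as an offset into that buffer, only the low byte of the key is applied, and the names read_record and decrypt_string are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the 8-byte EncryptedString record shown above. */
typedef struct {
    uint16_t key;      /* EncryptionKey */
    uint16_t length;   /* StringLength  */
    uint32_t buffer;   /* Buffer: here an offset into mem (assumption) */
} encrypted_string;

/* Parse one little-endian record located at offset. */
encrypted_string read_record(const uint8_t *mem, size_t mem_size, size_t offset)
{
    assert(mem != NULL && offset <= mem_size && mem_size - offset >= 8);
    const uint8_t *p = mem + offset;
    encrypted_string s;
    s.key    = (uint16_t)(p[0] | p[1] << 8);
    s.length = (uint16_t)(p[2] | p[3] << 8);
    s.buffer = (uint32_t)p[4] | (uint32_t)p[5] << 8
             | (uint32_t)p[6] << 16 | (uint32_t)p[7] << 24;
    return s;
}

/* XOR-decrypt the string into out and NUL-terminate it.
   Returns the number of characters written, or -1 if the record points
   outside mem or out is too small. */
int decrypt_string(const uint8_t *mem, size_t mem_size,
                   const encrypted_string *s, char *out, size_t out_size)
{
    assert(mem != NULL && s != NULL && out != NULL);
    if (s->buffer > mem_size || s->length > mem_size - s->buffer)
        return -1;
    if ((size_t)s->length + 1 > out_size)
        return -1;
    for (size_t i = 0; i < s->length; i++)
        out[i] = (char)(mem[s->buffer + i] ^ (uint8_t)s->key); /* low key byte only (assumption) */
    out[s->length] = '\0';
    return (int)s->length;
}
```

Dumping a whole table then amounts to calling read_record at offsets that advance by 8 bytes, the same stride used by the loop above.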

Conclusion and future development

Sojobo is still in its infancy but it can already be used for some initial analysis. In future releases I'm going to add more emulated functions and the possibility to map other files into the process address space. By mapping external files (like Kernel32 or Ntdll) we can overcome problems related to indirect references (like in the case above) while still keeping control over how each function is emulated.


[01] B2R2: Building an Efficient Front-End for Binary Analysis
[02] B2R2: Building an Efficient Front-End for Binary Analysis (PDF)
[03] NDSS Workshop on Binary Analysis Research (BAR) 2019
[04] Symbolic Execution component #question
[05] Automatic Static Unpacking of Malware Binaries
[06] MwEmu: Malware analysis emulator written in Python 3 (based on Unicorn) - ALPHA version
[07] Convention over configuration
[08] New KPOT v2.0 stealer brings zero persistence and in-memory features to silently steal credentials
