Download: https://github.com/enkomio/thematrix
In this post I'll describe a project that I created to easier the malware analysis process. The goal of the project is to run a target binary in a controlled environment and logging the Win32 function calls. I wanted to create something that is easy to extend and robust. I'm aware that other similar tools exists, but my intent was to have fun in doing Assembly programming and learning stuff that I only reversed but never implemented :)
How it works
TheMatrix is a program mostly written in assembly (x86/x64) that implements the following features:- It creates a PE loader (also referenced as an activator) that loads a user input binary (also know as target binary).
- A multi-arch hook engine that monitors the Win32 API function calls.
Create an activator
The first task consists in creating an activator. This is a binary that once executed loads the embedded PE file (the target binary) and runs its entry-point. The activator will be a DLL if the targety binary is a DLL or an EXE otherwise. The activator exports an additional function which is DllRegisterServer. This function is commonly used by malware to start the main code.Activator execution
When executed, the activator extracts the embedded binary and loads it in memory. Before executing the target binary entry-point, various Win32 function hooks are placed. This ensures that the malware execution is monitored. By default, TheMatrix implements various Windows hooks that log the input data to the folder: ./Desktop/thematrix/<PID>/<API_name>.log. During the PE loading step, the PEB.Ldr field is updated by including the target binary. This field contains a double linked list of all the currently loaded DLL and it is used by various Win32 API such as GetProcAddress. I still wonder why of the many PE loader projects available online, no one modifies the Ldr field.TheMatrix Under the hood
The core of TheMatrix is implemented in assembly. This gave me the possibility to improve my x64 assembly programming skills and at the same time to implement features that I only reveresed. The x86 and x64 version have quite a few differences which are detailed below.x86 Version
The 32-bit version of TheMatrix uses Microsoft Hot Patching mechanism to place the function hooks (see file x86_hook_engine.inc). The inserted JMP instruction will jump to a trampoline (a concept described later) that is placed in a code cave. The code cave is found by searching in the DLL sections. At execution time, when the API function is called by the target binary, the trampoline will execute and a jump to the user defined hook function is performed.x64 Version
I started to implement the project in x86 assembly. As soon as I finished the initial version, the malware that I was interested in analysing switched to x64. This forced me to re-implement all the code in x64 assembly too (here is my reaction when I discovered this fact: https://twitter.com/s4tan/status/1516488723294298116).When I decided to implement the x64 version too, I find myself in trouble since the x64 Win32 APIs do not support hot patching in the same way as the x86 version. This forced me to choose a different approach to place my hooks. In the end, I decided to use Export Address Table (EAT) hooking. As for the x86 version, a trampoline is used that will call the user defined hook function (see file x64_hook_engine.inc).
An additional aspect that is often ignored during the binary reversing process, it is that MS uses a different x64 function call convention when compared to x86 code (see this doc for more details). In addition, the stack needs to be 16 bytes aligned. In theory the concept is simple, but as often happens, the evil is in the details :) Luckily I found a useful 300 loc file that help me with this task (see https://twitter.com/s4tan/status/1522150733839273986).
Trampoline and hook function
The trampoline contains part of the magic that allowed me to create a clean design. Below you can see the x64 version of the trampoline code before being written to the identified code cave.@trampoline_code_start:
mov rax, 011223344aabbccddh ; store the address of the original function
mov qword ptr gs:[28h], rax ; TIB.ArbitraryUserPointer, see: https://codemachine.com/articles/arbitraryuserpointer_usage.html
mov rax, 011223344aabbccddh ; hook function address
jmp rax
Two places needs to be patched at runtime. The first is the address of the user defined function hook, and the second one is the original address of the hooked function. This latest information is necessary in order to easily call the original function as show in the section below. To store this value I choosed the TIB.ArbitraryUserPointer field which is part of the Thread Environment Block (or TIB in this case). This field is rarely used and is a good place to store our information. The only requirement is that the original function must be called in the same thread of the function hook.
Usage
As mentioned, the first step is to create the activator. This is achieved by using the -add command and specifying the target binary. TheMatrix will create a copy of itself containing the target binary. If the target binary is a DLL, TheMatrix will modify the activator file in order to result as a DLL and not as an EXE file. Once the activator is created, it can be executed in the same way as the target binary.One of the main goal of my project was to create something that was really easy to update. Adding a new function hook must be a deadly easy operation. In the end I come up with a design where you can extend the project in a simple way, you just need a bit of Win32 API programming skill (you can implement your code in C, no Assembly programming required ^^). To place an hook you just need to use the hook_add function, by specifying the DLL name, the API function name and the user defined hook function. An example of call is the following one:
hook_add("Bcrypt.dll", "BCryptImportKeyPair", hook_BCryptImportKeyPair);
Then, you have to implements your function hook. To call the original function it is enough to use the call_original function by passing the input parameters of the original function. This kind of design is possible thanks to the freedom provided by programming in assembly. An example of usage is shown below.
LPVOID __stdcall hook_BCryptImportKeyPair(BCRYPT_ALG_HANDLE hAlgorithm, BCRYPT_KEY_HANDLE hImportKey, LPCWSTR pszBlobType, BCRYPT_KEY_HANDLE* phKey, PUCHAR pbInput, ULONG cbInput, ULONG dwFlags)
{
// save imported key bytes
char name[MAX_PATH] = { 0 };
snprintf(name, sizeof(name), "BCryptImportKeyPair_%llx_%d", (uint64_t)pbInput, cbInput);
log_data(cbInput, pbInput, name);
LPVOID ret = call_original(
hAlgorithm,
hImportKey,
pszBlobType,
phKey,
pbInput,
cbInput,
dwFlags
);
return ret;
}
In the example above, the hook function logs the imported key before calling the original function. The final step is to inform TheMatrix of the available hooks before to run the target binary. This action is performed in the function hooks_init, whose definition is the following:
bool hooks_init(uint8_t* hMod)
The file hooks.c contains the function call, and can be customized by the user.