Sunday, June 26, 2022

TheMatrix - A process inspection tool aimed at easing the malware analysis task

Twitter: @s4tan
Download: https://github.com/enkomio/thematrix

In this post I'll describe a project that I created to ease the malware analysis process. The goal of the project is to run a target binary in a controlled environment and log its Win32 function calls. I wanted to create something that is robust and easy to extend. I'm aware that other similar tools exist, but my intent was to have fun programming in assembly and to learn techniques that I had only reversed but never implemented :)

How it works

TheMatrix is a program mostly written in assembly (x86/x64) that implements the following features:
  • A PE loader (also referred to as an activator) that loads a user-supplied binary (also known as the target binary).
  • A multi-arch hook engine that monitors the Win32 API function calls.

Create an activator

The first task consists of creating an activator. This is a binary that, once executed, loads the embedded PE file (the target binary) and runs its entry point. The activator will be a DLL if the target binary is a DLL, or an EXE otherwise. The activator exports an additional function, DllRegisterServer. This function is commonly used by malware to start the main code.

Activator execution

When executed, the activator extracts the embedded binary and loads it in memory. Before the entry point of the target binary is executed, various Win32 function hooks are placed. This ensures that the malware execution is monitored. By default, TheMatrix implements various hooks that log the input data to: ./Desktop/thematrix/<PID>/<API_name>.log. During the PE loading step, the PEB.Ldr field is updated to include the target binary. This field contains a doubly linked list of all the currently loaded DLLs and it is used by various Win32 APIs such as GetProcAddress. I still wonder why, of the many PE loader projects available online, none modifies the Ldr field.

TheMatrix Under the hood

The core of TheMatrix is implemented in assembly. This gave me the opportunity to improve my x64 assembly programming skills and, at the same time, to implement features that I had only reversed. The x86 and x64 versions have quite a few differences, which are detailed below.

x86 Version

The 32-bit version of TheMatrix uses the Microsoft hot patching mechanism to place the function hooks (see file x86_hook_engine.inc). The inserted JMP instruction jumps to a trampoline (a concept described later) that is placed in a code cave. The code cave is found by searching the DLL sections. At execution time, when the API function is called by the target binary, the trampoline executes and jumps to the user-defined hook function.

x64 Version

I started to implement the project in x86 assembly. As soon as I finished the initial version, the malware that I was interested in analysing switched to x64. This forced me to re-implement all the code in x64 assembly too (here is my reaction when I discovered this fact: https://twitter.com/s4tan/status/1516488723294298116).

When I decided to implement the x64 version too, I found myself in trouble, since the x64 Win32 APIs do not support hot patching in the same way as their x86 counterparts. This forced me to choose a different approach to place my hooks. In the end, I decided to use Export Address Table (EAT) hooking. As in the x86 version, a trampoline is used to call the user-defined hook function (see file x64_hook_engine.inc).
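
To make the idea of EAT hooking more concrete, here is a small JavaScript model (the language is used only for illustration; the project itself is written in assembly). It relies on one property of the PE format: the export table stores 32-bit RVAs relative to the module base, so redirecting an export means writing the RVA of the trampoline in place of the RVA of the function. The module object, the eatHook helper and all the addresses are hypothetical, and the real export directory (name, ordinal and address tables) is reduced to a simple name-to-RVA map.

```javascript
// Simplified model of a loaded module: a base address plus a name -> RVA map
// standing in for the export directory (assumption made for illustration).
const bcrypt = {
  base: 0x7ffb10000000n,
  eat: { BCryptImportKeyPair: 0x00012340 },
};

// What a GetProcAddress-like lookup computes: module base + export RVA.
function resolveExport(mod, name) {
  return mod.base + BigInt(mod.eat[name]);
}

// Redirect an export to a trampoline and return the original address.
function eatHook(mod, name, trampolineAddr) {
  const rva = trampolineAddr - mod.base;
  // The table entry is a 32-bit RVA, so the trampoline must live within
  // 4 GB above the module base (for example in a code cave of the same DLL).
  if (rva < 0n || rva > 0xffffffffn) {
    throw new RangeError("trampoline is not reachable through a 32-bit RVA");
  }
  const original = resolveExport(mod, name);
  mod.eat[name] = Number(rva);
  return original;
}

const originalAddr = eatHook(bcrypt, "BCryptImportKeyPair", bcrypt.base + 0x9f000n);
console.log(originalAddr.toString(16)); // address of the real function
console.log(resolveExport(bcrypt, "BCryptImportKeyPair").toString(16)); // now the trampoline
```

Note that, with this technique, only lookups performed after the table has been modified see the hook; an address resolved earlier still points to the original function.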

An additional aspect that is often overlooked when reversing binaries is that Microsoft uses a different function calling convention for x64 code than for x86 code (see this doc for more details). In addition, the stack needs to be 16-byte aligned. In theory the concept is simple but, as often happens, the devil is in the details :) Luckily I found a useful 300-LOC file that helped me with this task (see https://twitter.com/s4tan/status/1522150733839273986).

Trampoline and hook function

The trampoline contains part of the magic that allowed me to create a clean design. Below you can see the x64 version of the trampoline code before being written to the identified code cave.
@trampoline_code_start:
	mov rax, 011223344aabbccddh ; store the address of the original function
	mov qword ptr gs:[28h], rax ; TIB.ArbitraryUserPointer, see: https://codemachine.com/articles/arbitraryuserpointer_usage.html
	mov rax, 011223344aabbccddh ; hook function address
	jmp rax
Two places need to be patched at runtime. The first is the address of the user-defined hook function, and the second is the original address of the hooked function. The latter is necessary in order to easily call the original function, as shown in the section below. To store this value I chose the ArbitraryUserPointer field of the Thread Information Block (TIB), the structure at the start of the Thread Environment Block (TEB). This field is rarely used and is a good place to store our information. The only requirement is that the original function must be called in the same thread as the hook function.
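
The runtime patching described above can be sketched as follows. This is a JavaScript model of the idea, not the assembly code of the project: the byte encoding of the four instructions is written out by hand, and the patchTrampoline helper and the sample addresses are hypothetical.

```javascript
// Placeholder stored in both 64-bit immediates until they are patched.
const PLACEHOLDER = 0x11223344aabbccddn;

// Byte template of the x64 trampoline shown above.
function buildTemplate() {
  const imm = [0xdd, 0xcc, 0xbb, 0xaa, 0x44, 0x33, 0x22, 0x11]; // little-endian placeholder
  return Uint8Array.from([
    0x48, 0xb8, ...imm,                                   // mov rax, imm64
    0x65, 0x48, 0x89, 0x04, 0x25, 0x28, 0x00, 0x00, 0x00, // mov qword ptr gs:[28h], rax
    0x48, 0xb8, ...imm,                                   // mov rax, imm64
    0xff, 0xe0,                                           // jmp rax
  ]);
}

// Replace the two placeholders, in order, with the address of the original
// function and the address of the hook function (hypothetical helper).
function patchTrampoline(template, originalAddr, hookAddr) {
  const code = Uint8Array.from(template); // work on a copy
  const view = new DataView(code.buffer);
  const values = [originalAddr, hookAddr];
  let patched = 0;
  for (let off = 0; off + 8 <= code.length && patched < values.length; off++) {
    if (view.getBigUint64(off, true) === PLACEHOLDER) {
      view.setBigUint64(off, values[patched++], true);
      off += 7; // skip the immediate that was just written
    }
  }
  if (patched !== values.length) throw new Error("placeholder not found");
  return code;
}

const trampoline = patchTrampoline(buildTemplate(), 0x7ffb12345678n, 0x7ff6aabbccddn);
console.log(Array.from(trampoline, (b) => b.toString(16).padStart(2, "0")).join(" "));
```

Searching for the placeholder instead of hard-coding the offsets keeps the patcher valid if instructions are later added to the template.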

Usage

As mentioned, the first step is to create the activator. This is achieved by using the -add command and specifying the target binary. TheMatrix will create a copy of itself containing the target binary. If the target binary is a DLL, TheMatrix will modify the activator file so that it is a DLL rather than an EXE. Once the activator is created, it can be executed in the same way as the target binary.

One of the main goals of my project was to create something that is really easy to update. Adding a new function hook must be a dead-easy operation. In the end I came up with a design where you can extend the project in a simple way: you just need a bit of Win32 API programming skill (you can implement your code in C, no assembly programming required ^^). To place a hook you just need to call the hook_add function, specifying the DLL name, the API function name and the user-defined hook function. An example call is the following:

hook_add("Bcrypt.dll", "BCryptImportKeyPair", hook_BCryptImportKeyPair);
Then, you have to implement your hook function. To call the original function, it is enough to call call_original, passing the input parameters of the original function. This kind of design is possible thanks to the freedom provided by programming in assembly. An example of usage is shown below.
LPVOID __stdcall hook_BCryptImportKeyPair(BCRYPT_ALG_HANDLE hAlgorithm, BCRYPT_KEY_HANDLE hImportKey, LPCWSTR pszBlobType, BCRYPT_KEY_HANDLE* phKey, PUCHAR pbInput, ULONG cbInput, ULONG dwFlags)
{
	// save imported key bytes
	char name[MAX_PATH] = { 0 };
	snprintf(name, sizeof(name), "BCryptImportKeyPair_%llx_%lu", (unsigned long long)(uintptr_t)pbInput, cbInput);
	log_data(cbInput, pbInput, name);

	LPVOID ret = call_original(
		hAlgorithm,
		hImportKey,
		pszBlobType,
		phKey,
		pbInput,
		cbInput,
		dwFlags
	);
	return ret;
}
In the example above, the hook function logs the imported key before calling the original function. The final step is to inform TheMatrix of the available hooks before running the target binary. This action is performed in the function hooks_init, whose definition is the following:
bool hooks_init(uint8_t* hMod)
The file hooks.c contains the function call, and can be customized by the user.

Demo

The following video shows an example of TheMatrix usage. The video shows the execution of a malware and demonstrates how TheMatrix is able to trace the execution of a new process and the extraction of relevant information. The malware is a famous one and it is not difficult to recognize it if you are into malware analysis ;)

Friday, May 20, 2022

Alan c2 Framework v7.0: Hyper-Pivoting


Twitter: @s4tan
Download: https://github.com/enkomio/AlanFramework/releases/latest
Documentation: https://github.com/enkomio/AlanFramework/tree/main/doc

A new version of the Alan C2 Framework was released, codename: Hyper-Pivoting. This new version includes some cool features, such as a proxy that allows the operator to easily pivot through networks.

SOCKS5 Proxy

Network pivoting is an essential part of every red-team activity and a must-have feature for every C2 framework. Alan v7.0 implements a proxy feature to ease network pivoting. By using the proxy command, the operator can create a SOCKS5-compliant proxy on the machine where the agent is running, or interact with an already running proxy.

Proxy chaining is another useful feature that allows the operator to chain multiple proxies together. Creating a proxy chain is very simple, just use the command: proxy chain [proxy ID source] [proxy ID dest]. Some network segments can communicate only with specific addresses, which means that reaching the C2 server is not an easy task. By using a chain of proxies, the agent can establish a path to the Alan server, making it possible to compromise highly segmented networks too.

The executed proxies are protected by a username and password. If the operator does not specify them, a randomly generated username and password are used (the operator can see them by running the proxy command). As mentioned, the proxies are SOCKS5 proxies and can be used by any other program that accepts a SOCKS5 proxy.

One of the main Alan pillars is the in-memory execution of all its components, and the proxy is no exception. When a proxy is executed, its code runs inside the host process without touching the disk.

Misc features

Alan 7.0 includes other relevant features. The info command was improved to show the Machine ID and whether the agent is using a proxy. All Alan logs are now saved to the alan.log file. In addition, all the output generated by the Alan server and the commands entered by the operator are saved to an evidence file. This allows the operator to include the evidence file as part of the red-team activity report.

Demo

The video below shows an example of proxy usage. After creating a proxy, the Alan agent is instructed to use it. The video demonstrates that the running proxies are compliant with the SOCKS5 specification by using one of the created proxies with the curl utility. Next, a proxy chain is created and the network traffic is displayed to show that the chain of proxies is traversed before reaching the Alan server.

Sunday, February 20, 2022

Alan c2 Framework v6.0: Alan + JavaScript = ♡


Twitter: @s4tan
Download: https://github.com/enkomio/AlanFramework/releases/latest
Documentation: https://github.com/enkomio/AlanFramework/tree/main/doc

Alan v6.0 was released with a cool new feature: JavaScript execution. The scripts are executed in memory and do not depend on any third-party program. The source code of the scripts can be downloaded from the Alan GitHub repository.

Being able to extend the framework is a mandatory feature in today's red-team tools. Each team has its own methodology for performing a red-team activity, so being able to customize or extend the capabilities of the tool is essential. One of the main goals of Alan was to provide a framework that can be easily adapted to various modi operandi. Alan v6.0 adds a new feature that makes extension easy: it allows the operator to execute JavaScript files directly in memory. This feature is implemented inside an Alan core module and does not depend on any third-party program.

In other tools, this kind of feature requires the operator to compile C code by following a specific process. This might be overwhelming and unnecessarily complex. JavaScript is an easy language, and even novices can become proficient in a short time.

However, being able to execute JavaScript code is not enough, since in most cases the operator needs to interact with native Windows functions to perform a given action. Alan provides an interface to call native Windows functions by using the handy JavaScript syntax. This blog post explores the details of this feature and how to use it to extend the Alan capabilities.

Getting Started

Executing a JavaScript file in Alan is extremely easy: just use the run command and specify a file with the .js extension. In order to call a Windows function, Alan implements the Win32 module, which exposes two methods: GetProcAddress and LoadLibrary. These are the basic methods needed to call virtually any Windows function. Let's try to write a simple script that prints the process ID.

import * as win32 from 'Win32';

var kernel32 = win32.LoadLibrary("kernel32.dll");
var GetCurrentProcessId = win32.GetProcAddress(kernel32, "GetCurrentProcessId");
var IsWow64Process = win32.GetProcAddress(kernel32, "IsWow64Process");
var GetCurrentProcess = win32.GetProcAddress(kernel32, "GetCurrentProcess");


var my_pid = GetCurrentProcessId();
var is_wow64 = new Array(4);
IsWow64Process(GetCurrentProcess(), is_wow64);

var msg = "Hello world from Javascript executed in process: " + my_pid;
if (is_wow64[0] == 1)
	msg += " - I'm running under Wow64 :)";
print(msg);
The script imports the Win32 module in order to load the Kernel32 DLL by calling the LoadLibrary function. Using the obtained handle, the address of the GetCurrentProcessId function is resolved with the GetProcAddress function. The other functions are resolved in the same way. You can now call the resolved functions as if they were standard JavaScript functions. As a final step, the script prints a message containing the information obtained from the Windows APIs.

A fundamental step of the entire process is being able to easily test the script during the development stage. In this new Alan version, a new folder named tools was added to the Alan package. It contains the files cqjsx86.exe and cqjsx64.exe, which are the x86 and x64 builds of a JavaScript interpreter. Let's try to run our script with both files to see what result is produced (the --file option is used to specify the file path).

C:\Alan.v6.0.511.24\tools>cqjsx64.exe --file test.js
Hello world from Javascript executed in process: 15532

C:\Alan.v6.0.511.24\tools>
If we use the cqjsx86.exe program, we obtain the following result (I'm running my test on an x64 OS).
C:\Alan.v6.0.511.24\tools>cqjsx86.exe --file test.js
Hello world from Javascript executed in process: 30844 - I'm running under Wow64 :)

C:\Alan.v6.0.511.24\tools>
As can be seen, the result differs according to the version used.
Once the script works as expected, we can run it in the Alan agent by simply using the run command and specifying the full path of the script.

Windows API Data Structure Interoperation

The GetProcAddress and LoadLibrary methods provide the basic functionality to call any Win32 API. However, interacting with a native API might require further information. A typical example is a parameter that is used as a buffer (for input, output or both). When this is the case, the following rules apply:
  • Each JavaScript Array is considered an array of bytes when passed to a Win32 function. Each element is cast to uint8_t (this truncates larger values and can corrupt data). If the array contains complex data types (such as a String), their value is converted to NULL.
  • Boolean values are converted to 1 if true and 0 if false.
  • Each number is converted to a 32-bit integer in an x86 process, and to a 64-bit integer in an x64 process.
  • Each JavaScript String is converted to an ASCII string when passed to a Win32 function.
  • You cannot call functions with more than 20 parameters.


The rules above imply that:
  • Each parameter passed by address to a Win32 function needs to be converted to an array (e.g. to pass an LPDWORD you have to create an Array(4) parameter if running in 32-bit or an Array(8) if running in 64-bit).
  • If a Win32 function accepts a structure, it needs to be converted to an Array too. For example, a PROCESSENTRY32 structure must be represented as an Array and then parsed by referencing the fields by their offset (an example using this structure is presented later, with some helper functions to simplify the job).
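
As an illustration of these rules, the sketch below shows how a value written by a Win32 function into a byte array could be read back as a DWORD. It is plain JavaScript with no Win32 call involved: the newBuffer, readDword and writeDword helpers are hypothetical names, and writeDword only simulates what the native function would store in the buffer.

```javascript
// Create a zero-filled byte array to pass where the API expects a pointer.
function newBuffer(size) {
  return new Array(size).fill(0);
}

// Read an unsigned 32-bit little-endian value starting at `offset`.
function readDword(bytes, offset) {
  return (
    ((bytes[offset] & 0xff) |
      ((bytes[offset + 1] & 0xff) << 8) |
      ((bytes[offset + 2] & 0xff) << 16) |
      ((bytes[offset + 3] & 0xff) << 24)) >>> 0
  );
}

// Store a 32-bit value as four little-endian bytes starting at `offset`.
function writeDword(bytes, offset, value) {
  for (let i = 0; i < 4; i++) {
    bytes[offset + i] = (value >>> (8 * i)) & 0xff;
  }
}

// Simulate a Win32 function writing the DWORD 0x12345678 through an LPDWORD.
const outParam = newBuffer(4);
writeDword(outParam, 0, 0x12345678);
console.log(readDword(outParam, 0).toString(16)); // prints "12345678"

// What survives of an array element larger than a byte after the uint8_t cast.
console.log(0x1ff & 0xff); // prints 255
```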


All these rules might be quite annoying during the development of a non-trivial script. In the next section I'll show how to ease the development task by implementing an lsass process memory dumper.

Implementing a simple lsass.exe process memory dumper

This is a perfect case to explore this new feature in more depth. Being able to dump the memory of the lsass process is very important to further compromise a host. There are various techniques to achieve this goal but, for the sake of simplicity, I'll go for the simplest one: using the MiniDumpWriteDump function. I'll put the script on GitHub so you can have a look at its full source code.

Let's suppose that our Agent is running as Administrator, then the following points have to be considered to write the dumper:
  • Enable SE_DEBUG_NAME privilege.
  • Scan all processes to identify the lsass.exe process.
  • Create a mini dump of the lsass.exe process.


As a first step we have to load all the needed functions. This is a trivial task, already demonstrated in the previous example. Enabling SE_DEBUG_NAME is the next step. To perform this action we have to use a TOKEN_PRIVILEGES structure. This structure is quite simple, so for this task we will just create an array of 0x10 bytes and reference sTP.PrivilegeCount, sTP.Privileges[0].Luid and sTP.Privileges[0].Attributes by their array offsets. After calling the AdjustTokenPrivileges function, we are ready to proceed with the next and probably most complex step.
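
The offsets mentioned above can be made explicit with a short sketch. It only builds the 0x10-byte array; no Win32 function is called. The buildTokenPrivileges and putDword helpers are hypothetical (they are not taken from the script on GitHub), and the LUID passed in the example is a sample value, since the real one is normally obtained through LookupPrivilegeValue.

```javascript
const SE_PRIVILEGE_ENABLED = 0x00000002; // value defined in winnt.h

// Write a 32-bit value into a byte array, little-endian.
function putDword(bytes, offset, value) {
  for (let i = 0; i < 4; i++) {
    bytes[offset + i] = (value >>> (8 * i)) & 0xff;
  }
}

// Build a TOKEN_PRIVILEGES with a single entry as a 0x10-byte array:
//   0x00 PrivilegeCount           (DWORD)
//   0x04 Privileges[0].Luid       (LowPart DWORD + HighPart LONG)
//   0x0C Privileges[0].Attributes (DWORD)
function buildTokenPrivileges(luidLow, luidHigh) {
  const sTP = new Array(0x10).fill(0);
  putDword(sTP, 0x00, 1); // one privilege in the array
  putDword(sTP, 0x04, luidLow);
  putDword(sTP, 0x08, luidHigh);
  putDword(sTP, 0x0c, SE_PRIVILEGE_ENABLED);
  return sTP;
}

// Sample LUID only (assumption): look the real value up at runtime.
const tokenPrivileges = buildTokenPrivileges(20, 0);
console.log(tokenPrivileges.join(" ")); // prints "1 0 0 0 20 0 0 0 0 0 0 0 2 0 0 0"
```

The resulting array is what would be passed as the new-state argument of AdjustTokenPrivileges.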

We have to identify the lsass.exe process. To achieve this goal we use the CreateToolhelp32Snapshot function to obtain a snapshot and loop through all processes until we find a process whose name is lsass.exe. This implies the usage of a PROCESSENTRY32 structure, which is not that simple. To ease the task I created various JavaScript helper functions that serialize an object to a JavaScript array. The serialization function inspects the prefix of each field name and performs a specific serialization action according to its value. For example, field names that start with dw_ are serialized as a DWORD. Field names that start with p_ are serialized to a four-byte or eight-byte array, according to the value of a global variable that I defined at the start of the script (this step can be made more dynamic by using the IsWow64Process function). Thanks to these functions, working with structures is now a lot easier (see the script source code for full details).
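
A minimal sketch of this prefix-driven serialization is shown below. It is not the code of the actual script: the serialize and toLittleEndian names, the pointerSize parameter (standing in for the global variable) and the sample object are assumptions, only the two prefixes mentioned above are handled, and no alignment padding is inserted, which real structures such as PROCESSENTRY32 may require in a 64-bit process.

```javascript
// Convert an integer to `size` little-endian bytes.
function toLittleEndian(value, size) {
  const out = [];
  let v = BigInt(value);
  for (let i = 0; i < size; i++) {
    out.push(Number(v & 0xffn));
    v >>= 8n;
  }
  return out;
}

// Serialize an object to a byte array, choosing the encoding of each field
// from the prefix of its name. Fields are emitted in declaration order.
function serialize(obj, pointerSize) {
  let bytes = [];
  for (const [name, value] of Object.entries(obj)) {
    if (name.startsWith("dw_")) {
      bytes = bytes.concat(toLittleEndian(value, 4)); // DWORD
    } else if (name.startsWith("p_")) {
      bytes = bytes.concat(toLittleEndian(value, pointerSize)); // pointer-sized
    } else {
      throw new Error("unknown field prefix: " + name);
    }
  }
  return bytes;
}

// Hypothetical structure with one DWORD and one pointer, 32-bit layout.
const sample = { dw_size: 16, p_buffer: 0x1000 };
console.log(serialize(sample, 4).join(" ")); // prints "16 0 0 0 0 16 0 0"
console.log(serialize(sample, 8).length);    // prints 12
```

Keeping the type information in the field name means the object literal doubles as the structure definition, at the cost of relying on the property order of the object.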

The final step is to create a file and call the MiniDumpWriteDump function to create a file dump that you can now download to your machine for post-processing.

Demo

Now that we have created our script to dump the lsass.exe process memory, let's use it. The video below demonstrates how to dump the lsass.exe process memory by running our JavaScript script in the agent.

Thursday, January 20, 2022

Analyzing an IDA Pro anti-decompilation code


Twitter: @s4tan
GitHub: https://github.com/enkomio/

In this post I'll analyze a piece of code that induces IDA Pro to decompile the assembly in a wrong way. I'll propose a fix, but I'm open to more elegant solutions :)

The function that we want to decompile has the following assembly code (I'm using IDA Pro v7.6):

.text:1001BC95 56                  push    esi
.text:1001BC96 FF 74 24 10         push    [esp+4+arg_8]     
.text:1001BC9A 8B 74 24 10         mov     esi, [esp+8+arg_4] 
.text:1001BC9E 56                  push    esi
.text:1001BC9F FF 74 24 10         push    [esp+0Ch+arg_0]
.text:1001BCA3 52                  push    edx
.text:1001BCA4 51                  push    ecx
.text:1001BCA5 E8 57 20 FF FF      call    nullsub_1
.text:1001BCAA 8B 0A               mov     ecx, [edx]      
.text:1001BCAC 83 C4 14            add     esp, 14h
.text:1001BCAF 89 4E 0C            mov     [esi+0Ch], ecx
.text:1001BCB2 8B 42 04            mov     eax, [edx+4]
.text:1001BCB5 03 C1               add     eax, ecx
.text:1001BCB7 89 46 04            mov     [esi+4], eax
.text:1001BCBA 5E                  pop     esi
.text:1001BCBB C3                  retn


The function uses two arguments with an unconventional calling convention. If we decompile the code, we obtain:

int __cdecl sub_1001BC95(int a1, int a2)
{
  int *v2; // edx
  int v3; // ecx
  int result; // eax

  nullsub_1();
  v3 = *v2;
  *(a2 + 12) = *v2;
  result = v3 + v2[1];
  *(a2 + 4) = result;
  return result;
}
In IDA Pro the v2 variable (corresponding to the line at address 0x1001BCAA) is colored in red, since its value might be undefined.

A custom calling convention might cause problems for the decompilation process (see this) but, in general, there is an easy fix: it is enough to inform IDA Pro that the function uses a custom calling convention. By editing the function, we can set the new type with the following definition:

int __usercall sub_1001BC95@<eax>(PUCHAR arg0@<edx>, int garbage, PUCHAR arg1)
With this new definition, the decompiled code now looks like the following:
int __usercall sub_1001BC95@<eax>(PUCHAR arg0@<edx>, int garbage, PUCHAR arg1)
{
  int *v1; // edx
  int v2; // ecx
  int result; // eax
  int v4; // [esp+Ch] [ebp+8h]

  nullsub_1();
  v2 = *v1;
  *(v4 + 12) = *v1;
  result = v2 + v1[1];
  *(v4 + 4) = result;
  return result;
}
We haven't made any progress at all. The only place we haven't checked is the nullsub_1 function, so the problem must be in its call. If we analyze this function, we notice that it has an empty body, as shown below.

.text:1000DD01 C3                  retn
Why is this function causing problems? The answer is in the software convention used by the compiler. During the compilation, the compiler considers some registers as volatile. This means that the value of these registers, after a function call, should not be considered preserved ([1]). Among the volatile registers, there is EDX, which is exactly one of the registers used to pass a function parameter in the custom calling convention.

This code causes problems for the decompiler, which (correctly) considers the EDX register to have an undefined value after the function call.

I'm not aware of any particular IDA Pro command to inform the decompiler not to consider EDX as volatile, so the simplest solution that I found is to just remove the call instruction (I patched the bytes E8 57 20 FF FF with 90 90 90 90 90). The result is much cleaner decompiled code, as shown below.

int __usercall sub_1001BC95@<eax>(PUCHAR arg0@<edx>, int garbage, PUCHAR arg1)
{
  PUCHAR v3; // ecx
  int result; // eax
  
  v3 = *arg0;
  *(arg1 + 3) = *arg0;
  result = &arg0[1][v3];
  *(arg1 + 1) = result;
  return result;
}
Now that the decompiled code represents the real intent of the assembly code, we can proceed to improve it further (we can clearly see the usage of a struct in the code).

Update:

I received messages on Twitter and Reddit suggesting that I have a look at the __spoils keyword mentioned in this Igor’s tip of the week post [2] (shame on me for not having found it).

Its meaning is exactly what we need to solve the problem in a more elegant and generic way. It is enough to change the nullsub_1 function definition by adding the __spoils keyword, as shown below:

void __spoils<> nullsub_1(void)
The decompilation result of the function sub_1001BC95 is the same as before with the exception that the call to the nullsub_1 function is still there (it is not necessary to patch the bytes anymore).

Links:

[1] Register volatility and preservation
[2] Igor’s tip of the week #51: Custom calling conventions