Saturday, June 6, 2026

shrun, apiwatcher, and argus: three malware analysis tools built with Claude

It has been a while since my last blog post. In this one I'll present my experiment with using AI-assisted coding to build three Rust-based tools that ease my malware analysis workflow. I built them by using a Claude Pro account.

The Malware: Lorem Ipsum

I'll present three tools I built with Claude to analyze the Lorem Ipsum malware. I'm not going into detail about the malware functionality, since it was already covered in the BlueVoyant blog post.

The malware, especially in its latest version, is extremely obfuscated, making static analysis very hard: since IDA is not able to decompile the code, you have to work at the assembly level. I'll show how I was able to extract interesting IOCs from it by using a few dynamic analysis tools.

You can download the sample I used for this blog post from MalwareBazaar. To run it, you need the Node.js runtime. Once the JavaScript code executes, the malware binary is installed in the C:\ProgramData\Microsoft Edge Updates Helper qZWpLKQXEGaa directory. To analyze it, you have to kill the process and then restart it under the tools described below.

Disclaimer: My tools are very similar to other well-known alternatives. I chose to build them because I wanted to experiment with AI-assisted coding while also implementing features that were missing but that I consider useful for my workflow.

Shellcode Analysis — shrun

The first tool I built is shrun, a shellcode-to-executable converter that wraps a raw buffer into a PE file that can be loaded in a debugger. There are many such tools, but the ones I tried either did not work properly or were overly complicated. shrun creates a PE file whose .text section is RWX and passes the shellcode allocation address as the first parameter via rcx, a common requirement in the shellcodes I analyze. To create a PE file I run the following command:

C:\Users\User\Desktop>shrun.exe
Usage: shrun.exe <shellcode.bin | hex_string> [32|64]

C:\Users\User\Desktop>shrun.exe shellcode.bin 64
[*] input:   \\?\C:\Users\User\Desktop\shellcode.bin
[*] mode:    PE64 (64-bit)
[*] payload: 147150 bytes
[*] output:  \\?\C:\Users\User\Desktop\shellcode_sh.exe
[*] shellcode: 0x0000000180001000  (= BASE_ADDRESS -> rcx)
[+] done  -  entry/stub: 0x0000000180024ece

C:\Users\User\Desktop>

Now I can run the shellcode in x64dbg or analyze the generated PE in IDA (you can load the raw shellcode in IDA too, but good luck applying FLIRT signatures 😛).
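One detail in the output above is easy to check by hand: the entry stub address equals the shellcode base plus the payload size. This suggests that, at least in this run, the stub is placed right after the payload. That layout is an inference from this single output, not something stated by the tool. A minimal sketch of the arithmetic, using only the numbers printed by shrun (the function name is hypothetical):

```python
# Values copied from the shrun output above.
SHELLCODE_BASE = 0x0000000180001000  # "shellcode:" line
PAYLOAD_SIZE = 147150                # "payload:" line, in bytes
ENTRY_STUB = 0x0000000180024ECE      # "entry/stub:" line


def end_of_payload(base: int, size: int) -> int:
    """Return the address of the first byte after the payload."""
    return base + size


if __name__ == "__main__":
    end = end_of_payload(SHELLCODE_BASE, PAYLOAD_SIZE)
    print(f"payload ends at {end:#x}, stub at {ENTRY_STUB:#x}")
```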


API Tracing — apiwatcher

There are plenty of programs that do API tracing, such as tiny_tracer or API Monitor. These tools are very useful when the malware is heavily obfuscated and data-flow analysis is the quickest way to understand what it does. The feature I wanted was a CLI program that provides extensive information on the called APIs. So I decided to implement a tool that parses C header files in order to reconstruct each API's function signature. This way, during tracing, parameter names and their values can be captured with additional context. For this reason I built apiwatcher.

It supports many features, such as functions to exclude, functions to include, tracing targets, and so on. For our case, I know that the malware logic is implemented in msvcp140.dll, so I'll trace only calls originating from that DLL, filtering to IAT imports and dynamically resolved functions. Below you can see an example of output:

C:\Users\User\Desktop\apiwatcher-1.0.0>apiwatcher.exe --trace-iat --dll msvcp140.dll -- "c:\ProgramData\Microsoft Edge Updates Helper qZWpLKQXEGaa\Microsoft Edge Updates Helper.exe"

  __ _ _ __ (_)__      __  __ _  _      ___  _       ___  _ __
 / _` | '_ \(_)\ \ /\ / / / _` || |_   / __|| |_    / _ \| '__|
| (_| || |_) || | \ V  V / | (_| || __|  (__| '_ \|  __/| |
 \__,_|| .__/ |_|  \_/\_/   \__,_||_|   \___|_| |_| \___|_|
        |_|                                    |_| |_|
 v1.0.0  |  Windows API call tracer

[*] Using default exclusion file 'exclusions.txt'
[*] 141 exclusion pattern(s) loaded
[*] Using default inclusion file 'inclusions.txt'
[*] 7 inclusion pattern(s) loaded (override exclusions)
[defs] fileapi.h:           92 function(s),   5 typedef(s),  31 macro(s) expanded
[defs] libloaderapi.h:      32 function(s),  10 typedef(s),  15 macro(s) expanded
[defs] memoryapi.h:         58 function(s),   7 typedef(s),   7 macro(s) expanded
[defs] ntifs.h:            709 function(s), 617 typedef(s), 276 macro(s) expanded
[defs] processthreadsapi.h: 82 function(s),  15 typedef(s),   3 macro(s) expanded
[defs] string.h:            40 function(s),   0 typedef(s),   1 macro(s) expanded
[defs] synchapi.h:          61 function(s),  10 typedef(s),  22 macro(s) expanded
[defs] winhttp.h:           40 function(s),  28 typedef(s),  33 macro(s) expanded
[defs] WinInet.h:          171 function(s),  79 typedef(s), 125 macro(s) expanded
[defs] winsock.h:           44 function(s),  33 typedef(s),  78 macro(s) expanded
[defs] winsock2.h:         105 function(s), 171 typedef(s), 133 macro(s) expanded
[*] Loaded 1394 function definition(s) from 'defs'
[*] IAT-trace mode: hooking EXE imports + GetProcAddress-resolved functions
[+] PID 6516 - 102 IAT hook(s) from Microsoft Edge Updates Helper.exe
[+] Attached to PID 6516 - waiting for initial breakpoint...
[+] PID 6516 - hooking GetProcAddress @ 0x7ff89565b1d0 (kernel32.dll)
[+] PID 6516 - hooking GetProcAddress @ 0x7ff89462abc0 (KernelBase.dll)
[+] GetProcAddress(ntdll.dll.RtlDisownModuleHeapAllocation) -> 0x7ff896ecfa30 - hooking
[+] GetProcAddress(KernelBase.dll.InitializeCriticalSectionEx) -> 0x7ff89465be60 - hooking
[+] GetProcAddress(KernelBase.dll.FlsAlloc) -> 0x7ff89466ac60 - hooking
...
[+] GetProcAddress(winhttp.dll.WinHttpSetOption) -> 0x7ff88d2f7ce0 - hooking
[+] GetProcAddress(winhttp.dll.WinHttpSetTimeouts) -> 0x7ff88d2f8420 - hooking

By default the trace is saved in the CSV file apiwatcher.csv. Below you can see an extract of the generated file:

timestamp,pid,tid,retaddr,caller_image,bp_addr,target_image,target_routine,params,retval
...
1780735593.201378,6516,7520,0x7ff87d6e2d68,msvcp140.dll,0x7ff89565b1d0,kernel32.dll,GetProcAddress,hModule=0x00007ff896e50000 lpProcName=0x00007ff87d6e2d59:"strcmp",0x00007ff896ee0ed0
1780735594.747193,6516,7520,0x7ff87d6e2dc5,msvcp140.dll,0x7ff89565b1d0,kernel32.dll,GetProcAddress,hModule=0x00007ff896e50000 lpProcName=0x00007ff87d6e2db6:"strlen",0x00007ff896ee1050
1780735596.315369,6516,7520,0x7ff87d6e2e61,msvcp140.dll,0x7ff89565b1d0,kernel32.dll,GetProcAddress,hModule=0x00007ff895640000 lpProcName=0x00007ff87d6e2e4c:"LoadLibraryA",0x00007ff895660800
1780735597.852891,6516,7520,0x7ff87d6e2faa,msvcp140.dll,0x7ff896ee0ed0,ntdll.dll,strcmp,_Str1=0x00007ff87d71f0bc:"gentle" _Str2=0x00007ff87d755132:"though42",0xffffffff
1780735597.853478,6516,7520,0x7ff87d6e2faa,msvcp140.dll,0x7ff896ee0ed0,ntdll.dll,strcmp,_Str1=0x00007ff87d71f0c3:"hush" _Str2=0x00007ff87d755132:"though42",0xffffffff
...
1780738174.437364,6516,7520,0x7ff87d7563f0,msvcp140.dll,0x7ff895664c80,kernel32.dll,CreateMutexA,lpMutexAttributes=0x0000000000000000 bInitialOwner=0x00 lpName=0x00007ff87d75a524:"bf428ad4-cb18-44b1-87f7-7047da02c592",0x00000170
1780738174.438104,6516,7520,0x7ff87d756481,msvcp140.dll,0x7ff896ed1750,kernel32.dll,AddVectoredExceptionHandler,arg0=0x0000000000000001 arg1=0x00007ff87d757132 arg2=0x00000000ffffffff arg3=0x0000000000000001,0x0000021a94e3af50
...
1780738185.621205,6516,7520,0x7ff87d758822,msvcp140.dll,0x7ff895664e60,kernel32.dll,CreateFileA,lpFileName=0x00007ff87d75a525:"f428ad4-cb18-44b1-87f7-7047da02c592" dwDesiredAccess=0x80000000 dwShareMode=0x00000001 ...,0x00000054
1780738185.687583,6516,7520,0x7ff87d758898,msvcp140.dll,0x7ff895665090,kernel32.dll,GetFileSize,hFile=0x00000054 lpFileSizeHigh=0x0000000000000000,0x00014aab
1780738185.688160,6516,7520,0x7ff87d7589b0,msvcp140.dll,0x7ff8956651f0,kernel32.dll,ReadFile,hFile=0x00000054 lpBuffer=0x0000021a96a10080 nNumberOfBytesToRead=0x00014aab ...,0x01
1780738185.824290,6516,7520,0x7ff87d7589d9,msvcp140.dll,0x7ff895664bf0,kernel32.dll,CloseHandle,arg0=0x0000000000000054 ...,0x0000000000000001
...
1780740347.222177,6516,7520,0x7ff87d757ca4,msvcp140.dll,0x7ff88d2f1e20,winhttp.dll,WinHttpOpen,pszAgentW=L"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:143.0) Gecko/201001" dwAccessType=0x00000000 ...,0x0000021a94e50120
1780740347.224994,6516,7520,0x7ff87d757d48,msvcp140.dll,0x7ff88d2dfac0,winhttp.dll,WinHttpConnect,pswzServerName=L"loginrestforest.com" nServerPort=0x01bb ...,0x0000021a94e5bbf0
1780740347.227423,6516,7520,0x7ff87d757e26,msvcp140.dll,0x7ff88d2fa2d0,winhttp.dll,WinHttpOpenRequest,pwszVerb=L"POST" pwszObjectName=L"/api/init/bf428ad4-cb18-44b1-87f7-7047da02c592" ...,0x0000021a94e776e0
1780740347.231475,6516,7520,0x7ff87d757ef4,msvcp140.dll,0x7ff88d2f8950,winhttp.dll,WinHttpAddRequestHeaders,lpszHeaders=L"Content-Type: image/jpeg" ...,0x01
1780740347.232000,6516,7520,0x7ff87d757f79,msvcp140.dll,0x7ff88d2f9040,winhttp.dll,WinHttpSendRequest,...,0x00
1780740369.157199,6516,7520,0x7ff87d758075,msvcp140.dll,0x7ff88d2f9930,winhttp.dll,WinHttpReceiveResponse,...,0x00
...

Because apiwatcher lets you drop arbitrary .h files into the defs directory to enrich the output, the trace alone yields very useful information: the mutex name, the C2 hostname (loginrestforest[.]com), the API endpoint pattern (/api/init/<UUID>), and the fake Content-Type: image/jpeg header.
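Pulling these strings out of the CSV can be scripted. The sketch below is not part of apiwatcher: it relies only on the column header shown above and assumes that the params column is the only one that may contain commas. The helper names parse_row and quoted_strings are hypothetical.

```python
import re

# Column names taken from the CSV header shown above.
FIELDS = ["timestamp", "pid", "tid", "retaddr", "caller_image", "bp_addr",
          "target_image", "target_routine", "params", "retval"]

# One row copied from the trace extract above.
SAMPLE = ('1780740347.224994,6516,7520,0x7ff87d757d48,msvcp140.dll,'
          '0x7ff88d2dfac0,winhttp.dll,WinHttpConnect,'
          'pswzServerName=L"loginrestforest.com" nServerPort=0x01bb ...,'
          '0x0000021a94e5bbf0')

# name=0xADDR:"ascii" or name=L"wide"; the address prefix is optional.
STRING_PARAM = re.compile(r'(\w+)=(?:0x[0-9a-fA-F]+:)?L?"([^"]*)"')


def parse_row(line: str) -> dict:
    """Split one trace row into named columns.

    Assumption: only 'params' may contain commas, so the first eight
    columns are split from the left and 'retval' from the right.
    """
    head = line.rstrip("\r\n").split(",", 8)
    params, _, retval = head[8].rpartition(",")
    return dict(zip(FIELDS, head[:8] + [params, retval]))


def quoted_strings(params: str) -> dict:
    """Return {parameter name: string value} for string parameters."""
    return dict(STRING_PARAM.findall(params))


if __name__ == "__main__":
    row = parse_row(SAMPLE)
    print(row["target_routine"], quoted_strings(row["params"]))
```

Filtering the rows on target_routine (for example WinHttpConnect or CreateMutexA) and printing quoted_strings for each one is enough to list the hostname and the mutex name seen in this trace.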


Network Traffic Monitoring — argus

The last tool I want to describe is very similar to FakeNet, but with enhanced capabilities: argus. I decided to implement this tool to ease malware traffic analysis. I wanted something that logged all requests and responses, including HTTPS, with the ability to modify them on the fly. Argus uses the same WinDivert-based interception mechanism as FakeNet but also supports forwarding to the real server.

Its configuration is quite extensive, so please refer to the official documentation for a complete description. In order to effectively use argus, you have to install in your sandbox the certificate used to intercept HTTPS traffic. A default certificate is included in the package, but if you delete it, argus will create a new one when started. Below you can see a few screenshots showing how to install the certificate in your Trusted Root store.

In my case I want to forward the requests to the C2 server in order to capture both requests and responses. I modify configs/default.ini and add to the [HTTPSListener] section the following line:

Passthrough: *

This instructs argus to forward all requests to the real server. Then I run argus followed by the malware:

PS C:\Users\User\Desktop\argus-1.1.0> .\argus.exe

    _    ____   ____  _   _ ____
   / \  |  _ \ / ___|| | | / ___|
  / _ \ | |_) || |  _| | | \___ \
 / ___ \|  _ < | |_| | |_| |___) |
/_/   \_\_| \_\ \____|\___/ |____/

  Network traffic interception tool for malware analysis  |  v1.1.0

2026-06-06 03:26:24  INFO  Loading configuration from: configs/default.ini
2026-06-06 03:26:24  INFO  Starting Argus on 0.0.0.0
2026-06-06 03:26:24  INFO  Request logging -> capture
2026-06-06 03:26:24  INFO  Starting all listeners...

Starting listeners:
  * HTTPListener  on 0.0.0.0:18080 (intercepts :80)  [TCP]
  * HTTPSListener on 0.0.0.0:18443 (intercepts :443) [TCP/SSL]
[Argus HTTPS] Using CA certificate from configs\argus-ca.crt
2026-06-06 03:26:24  INFO  All listeners started (2 total)
2026-06-06 03:26:24  INFO  Diverter: active - full bidirectional NAT enabled

Argus is running. Press Ctrl+C to stop.

2026-06-06 03:26:45  INFO  Diverter: [SearchApp.exe (PID 8176)] 192.168.92.130:50266 -> 2.22.248.140:443 intercepted
2026-06-06 03:26:47  INFO  Diverter: [svchost.exe (PID 3388)] 192.168.92.130:50283 -> 23.206.246.162:80 intercepted
2026-06-06 03:26:47  INFO  Diverter: [svchost.exe (PID 3388)] 192.168.92.130:50282 -> 184.25.52.64:443 intercepted
2026-06-06 03:26:52  INFO  Diverter: [Microsoft Edge Updates Helper.exe (PID 1196)] 192.168.92.130:50304 -> 92.118.126.178:443 intercepted
[HTTPSListener] 127.0.0.1:50304 -> 92.118.126.178:443  PASSTHROUGH  [Microsoft Edge Updates Helper.exe (PID 1196)]
...
2026-06-06 03:27:13  INFO  Diverter: [Microsoft Edge Updates Helper.exe (PID 1196)] 192.168.92.130:50320 -> 146.19.49.91:443 intercepted
[HTTPSListener] 127.0.0.1:50320 -> 146.19.49.91:443   PASSTHROUGH  [Microsoft Edge Updates Helper.exe (PID 1196)]
2026-06-06 03:27:34  INFO  Diverter: [Microsoft Edge Updates Helper.exe (PID 1196)] 192.168.92.130:50326 -> 144.217.220.118:443 intercepted
[HTTPSListener] 127.0.0.1:50326 -> 144.217.220.118:443 PASSTHROUGH  [Microsoft Edge Updates Helper.exe (PID 1196)]
...

All traffic is captured in the capture folder, following the pattern capture/<PROCESS NAME>/<PID>/<HANDLER NAME>/<LOG FILE>.log. From the captured data I can see that the malware starts communication with a few C2 servers using the URL pattern /api/init/<UUID>, followed by a request to another server likely used as a dead-drop, before cycling back to the same pattern. This analysis is consistent with the apiwatcher trace.


Conclusion

My experience with AI-assisted coding can be summarized in a few points:

  • Domain expertise is still essential. You need to guide Claude during development — it won't figure out your specific requirements on its own.
  • Resist feature creep. LLMs tend to be very verbose and to suggest a lot of cool features. Resist saying yes, as they tend to overestimate their usefulness 🙂
  • Say goodbye to code paternity. As mrexodia pointed out, you lose ownership in a way that's hard to describe. You care about the code, but it's not entirely yours anymore 🙂

Monday, January 2, 2023

The Segmented Memory Model and How It Works in Windows x64

I wrote this post as part of my journey to get better acquainted with the Intel architecture. Segmentation is a very important topic in the Intel architecture, so here is my contribution. For my experiments I'll use an x64 Windows 10 system running in a VM with a kernel debugger attached.

Mode of Operations

The first step is to identify the processor's mode of operation. x64 supports various modes and memory models, so let's identify the current one. This information is stored in the CR0 control register ([1]), in the PE flag at bit position 0 (the least significant, right-most bit). If this bit is set, we are running in protected mode; otherwise we are running in real-address mode. Let's use the kernel debugger to perform this check, as shown in Figure 1.

kd> .formats cr0
Evaluate expression:
  Hex:     00000000`80050031
  Decimal: 2147811377
  Decimal (unsigned) : 2147811377
  Octal:   0000000000020001200061
  Binary:  00000000 00000000 00000000 00000000 10000000 00000101 00000000 00110001
  Chars:   .......1
  Time:    ***** Invalid
  Float:   low -4.59246e-040 high 0
  Double:  1.06116e-314
Figure 1. Operation Mode Identification
The CR0.PE bit is set to 1, so we are running in protected mode using a segmented memory model (you might also notice that the CR0.PG bit, at position 31, is set, indicating that paging is enabled too). We can also check the operating sub-mode by reading the IA32_EFER Model-Specific Register (MSR) (0xC0000080) ([2]) and checking the LME (bit position 8) and LMA (bit position 10) flags. You can see the result in Figure 2.

kd> rdmsr 0xC0000080
msr[c0000080] = 00000000`00000d01
kd> .formats 00000000`00000d01
Evaluate expression:
  Hex:     00000000`00000d01
  Decimal: 3329
  Decimal (unsigned) : 3329
  Octal:   0000000000000000006401
  Binary:  00000000 00000000 00000000 00000000 00000000 00000000 00001101 00000001
  Chars:   ........
  Time:    Thu Jan  1 01:55:29 1970
  Float:   low 4.66492e-042 high 0
  Double:  1.64474e-320
Figure 2. Operation Sub-Mode Identification
The IA32_EFER.LMA and IA32_EFER.LME bits are set, so we are running in IA-32e sub-mode (64-bit). This information will be used later in the text.
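The two checks above can be reproduced outside the debugger. The following sketch only re-reads the CR0 and IA32_EFER values shown in Figures 1 and 2; the helper name bit is an illustrative choice.

```python
def bit(value: int, position: int) -> int:
    """Return the bit of `value` at `position` (0 is the least significant)."""
    return (value >> position) & 1


# Register values copied from Figure 1 and Figure 2.
CR0 = 0x80050031
IA32_EFER = 0x00000D01

FLAGS = {
    "CR0.PE": bit(CR0, 0),          # protected mode enable
    "CR0.PG": bit(CR0, 31),         # paging enable
    "EFER.LME": bit(IA32_EFER, 8),  # long mode enable
    "EFER.LMA": bit(IA32_EFER, 10), # long mode active
}

if __name__ == "__main__":
    for name, value in FLAGS.items():
        print(f"{name} = {value}")
```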

Segmented Memory Model

The segmented memory model accesses memory through segments. A segment provides information on how to translate a given address. Which segment is involved depends on the executed instruction: for example, a call instruction uses the code segment, while the push and pop instructions use the stack segment. The Intel architecture defines six segment registers: CS, DS, ES, SS, FS, and GS. Let's see how this works with a practical example by considering the instruction in Figure 3.

00007FFD42C7D5C1 | E8 1A000000  | call kernelbase.7FFD42C7D5E0
Figure 3. How Segmentation Works
The call instruction encodes the address of the function to execute as the bytes 1A 00 00 00, that is, the 32-bit displacement 0x0000001A stored in little-endian order. The displacement is RIP-relative, which explains why the function address in the disassembly is 0x7FFD42C7D5E0 (0x7FFD42C7D5C1 (address of the call) + 0x05 (instruction size) + 0x1A (displacement)). In addition to this value, the value of the CS segment register is also used. The combination of CS with the function address is called the logical address. The segment value is then used to translate the logical address into what is known as the virtual address (this process is described in the next section). Since our system is using paging, an additional translation step is performed to translate the virtual address into the physical address (this topic is not covered in this post). All the translation steps are represented in Figure 4.
Figure 4. Logical to Physical Address Translation
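The target address computation for the call in Figure 3 can be checked with a few lines of code. This is a minimal sketch for the E8 rel32 encoding only; the function name is hypothetical.

```python
def call_rel32_target(instr_addr: int, encoding: bytes) -> int:
    """Compute the destination of a near relative call (opcode E8 + rel32).

    The displacement is a signed 32-bit little-endian value that is added
    to the address of the next instruction.
    """
    if len(encoding) != 5 or encoding[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    displacement = int.from_bytes(encoding[1:5], "little", signed=True)
    next_instr = instr_addr + len(encoding)
    return (next_instr + displacement) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    # Address and bytes copied from Figure 3.
    target = call_rel32_target(0x7FFD42C7D5C1, bytes.fromhex("E81A000000"))
    print(hex(target))
```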


How Segmentation Works

The segment registers are 16-bit registers whose structure is reported in Figure 5.
Figure 5. Segment Selector Format


The Index field is used as an index into a table that contains information on all the available segments. The TI flag indicates which table must be used, and the Requested Privilege Level (RPL) field specifies the protection level of the code requesting access to a specific segment. The possible protection levels are 0, 1, 2, and 3, and they are often represented as protection rings, where ring 0 is the most privileged (where kernel-mode code is executed) and ring 3 is the least privileged (where user-mode code is executed).
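As a quick illustration of the selector layout (RPL in bits 0-1, TI in bit 2, Index in bits 3-15), the sketch below splits a 16-bit selector into its three fields. The type and function names are illustrative; the selector values are ones that appear in the register dumps in this post.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # bits 3-15: entry number in the descriptor table
    ti: int     # bit 2: 0 = GDT, 1 = LDT
    rpl: int    # bits 0-1: requested privilege level


def decode_selector(value: int) -> Selector:
    """Split a 16-bit segment selector into Index, TI, and RPL."""
    return Selector(index=(value >> 3) & 0x1FFF,
                    ti=(value >> 2) & 1,
                    rpl=value & 3)


def descriptor_offset(value: int) -> int:
    """Byte offset of the selected entry from the table base (8 bytes per slot)."""
    return decode_selector(value).index * 8


if __name__ == "__main__":
    for sel in (0x10, 0x23, 0x2B, 0x33):
        print(hex(sel), decode_selector(sel), hex(descriptor_offset(sel)))
```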

The two tables that contain information on the segments are the Global Descriptor Table (GDT) and the Local Descriptor Table (LDT). The GDTR and LDTR registers contain the base address of the respective table. In the latest Windows versions the LDT is no longer used, so the TI flag is always 0. The GDT is an array of segment descriptors, where each segment descriptor is typically represented by the 64-bit structure reported in Figure 6.
Figure 6. Segment Descriptor Format


Given the segment descriptor definition, we can now explain how the logical address to virtual address translation is performed. The Base field is added to the logical address in order to obtain the virtual address. This process is described in Figure 7.
Figure 7. Segment Descriptor Usage in Address Translation


A very important field is DPL. It indicates the privilege level of the segment: for example, code running in a segment with DPL 0 can execute privileged instructions such as CLI. Another relevant field is L. For a code segment, it indicates whether the code runs in long mode (L set to 1) or in compatibility mode (L set to 0). Figure 8 shows how to inspect the GDT and all the defined segments.
kd> rgdtr
gdtr=fffff804382f3fb0
kd> db fffff804382f3fb0 
fffff804`382f3fb0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
fffff804`382f3fc0  00 00 00 00 00 9b 20 00-00 00 00 00 00 93 40 00  ...... .......@.
fffff804`382f3fd0  ff ff 00 00 00 fb cf 00-ff ff 00 00 00 f3 cf 00  ................
fffff804`382f3fe0  00 00 00 00 00 fb 20 00-00 00 00 00 00 00 00 00  ...... .........
fffff804`382f3ff0  67 00 00 20 2f 8b 00 38-04 f8 ff ff 00 00 00 00  g.. /..8........
fffff804`382f4000  00 3c 00 00 00 f3 40 00-00 00 00 00 00 00 00 00  .<....@.........
fffff804`382f4010  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
fffff804`382f4020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
kd> dg 10 50
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0010 00000000`00000000 00000000`00000000 Code RE Ac 0 Nb By P  Lo 0000029b
0018 00000000`00000000 00000000`00000000 Data RW Ac 0 Bg By P  Nl 00000493
0020 00000000`00000000 00000000`ffffffff Code RE Ac 3 Bg Pg P  Nl 00000cfb
0028 00000000`00000000 00000000`ffffffff Data RW Ac 3 Bg Pg P  Nl 00000cf3
0030 00000000`00000000 00000000`00000000 Code RE Ac 3 Nb By P  Lo 000002fb
0038 00000000`00000000 00000000`00000000  0 Nb By Np Nl 00000000
0040 00000000`382f2000 00000000`00000067 TSS32 Busy 0 Nb By P  Nl 0000008b
0048 00000000`0000ffff 00000000`0000f804  0 Nb By Np Nl 00000000
0050 00000000`00000000 00000000`00003c00 Data RW Ac 3 Bg By P  Nl 000004f3
Figure 8. Dumping All Segments
The first two commands read the GDT base address from the GDTR register and dump the memory at that address. The first non-null entry is at offset 0x10 from the GDT base address (the first entry in the GDT is always null). For a more readable view we can use the dg command, which dumps all the segments in the given selector range and shows the relevant information. There are various Code and Data segments, with privilege level 0 (kernel mode) or 3 (user mode).

In particular, there is a user-mode code segment that runs in 32-bit compatibility mode (Long=0); its segment selector is 0x20. Similarly, there is a user-mode code segment that runs in long mode (Long=1); its segment selector is 0x30.

Windows and the Flat Memory Model

You might have heard that Windows uses a flat memory model, yet we stated above that we are running in a segmented memory model. What does this mean? By now you know how a segment descriptor is used to compute the virtual address, and we have dumped all the segment descriptors defined in the system. You might have noticed that all the Code and Data segments have the Base field set to 0. This implies that Windows is not taking advantage of segmentation: with a Base that is always 0, the logical address is equal to the virtual address. In other words, we are using a segmented memory model without really using segments. This configuration is known as the flat memory model. The official Intel documentation states the same:

In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. The processor treats the segment base of CS, DS, ES, SS as zero, creating a linear address that is equal to the effective address. The FS and GS segments are exceptions. These segment registers (which hold the segment base) can be used as additional base registers in linear address calculations. They facilitate addressing local data and certain operating system data structures. Note that the processor does not perform segment limit checks at runtime in 64-bit mode.

Decoding a Segment Register

Let's try decoding the value stored in a segment register. Consider the CS register with value 0x33, which in binary is 00110011b. As described in Figure 5, bits 3-15 hold the index into the GDT, which in this case has decimal value 6 (110b). To obtain the offset of the segment descriptor inside the GDT, we multiply the index by the size of a segment descriptor, which is 8 bytes. Figure 9 shows this operation in the kernel debugger.

kd> .formats 0x33
Evaluate expression:
  Hex:     00000000`00000033
  Decimal: 51
  Decimal (unsigned) : 51
  Octal:   0000000000000000000063
  Binary:  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00110011
  Chars:   .......3
  Time:    Thu Jan  1 01:00:51 1970
  Float:   low 7.14662e-044 high 0
  Double:  2.51973e-322
kd> dq gdtr + (6 * 8) L1
fffff804`382f3fe0  0020fb00`00000000
Figure 9. Obtaining the Segment Descriptor
The segment descriptor value is 0020fb00`00000000. Now let's use the dg and dt commands to display the segment descriptor associated with index 6: dg takes the selector 6 * 8 = 48 (0x30), while dt takes the descriptor address obtained above. The result is reported in Figure 10.

kd> dg 30
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0030 00000000`00000000 00000000`00000000 Code RE Ac 3 Nb By P  Lo 000002fb
kd> dt nt!_KGDTENTRY64 fffff804`382f3fe0 -b
   +0x000 LimitLow         : 0
   +0x002 BaseLow          : 0
   +0x004 Bytes            : 
      +0x000 BaseMiddle       : 0 ''
      +0x001 Flags1           : 0xfb ''
      +0x002 Flags2           : 0x20 ' '
      +0x003 BaseHigh         : 0 ''
   +0x004 Bits             : 
      +0x000 BaseMiddle       : 0y00000000 (0)
      +0x000 Type             : 0y11011 (0x1b)
      +0x000 Dpl              : 0y11
      +0x000 Present          : 0y1
      +0x000 LimitHigh        : 0y0000
      +0x000 System           : 0y0
      +0x000 LongMode         : 0y1
      +0x000 DefaultBig       : 0y0
      +0x000 Granularity      : 0y0
      +0x000 BaseHigh         : 0y00000000 (0)
   +0x008 BaseUpper        : 0
   +0x00c MustBeZero       : 0
   +0x000 DataLow          : 0n9283176673312768
   +0x008 DataHigh         : 0n0
Figure 10. Dump of a Segment Descriptor
As you can see, the result is the same in both cases.
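The same fields can also be decoded by hand from the raw 64-bit value. The sketch below follows the descriptor layout from the Intel manual (limit in bits 0-15 and 48-51, base in bits 16-39 and 56-63, Type in bits 40-43, S in bit 44, DPL in bits 45-46, P in bit 47, L in bit 53, D/B in bit 54, G in bit 55). The function name and dictionary keys are illustrative, and the input values are the GDT entries dumped in Figures 8 and 9.

```python
def decode_descriptor(raw: int) -> dict:
    """Decode an 8-byte code/data segment descriptor given as a 64-bit integer."""
    return {
        "base": ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24),
        "limit": (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16),
        "type": (raw >> 40) & 0xF,        # segment type
        "s": (raw >> 44) & 1,             # 1 = code/data, 0 = system
        "dpl": (raw >> 45) & 3,           # descriptor privilege level
        "present": (raw >> 47) & 1,
        "long": (raw >> 53) & 1,          # L flag
        "default_big": (raw >> 54) & 1,   # D/B flag
        "granularity": (raw >> 55) & 1,   # G flag
    }


# GDT entries copied from the memory dump (little-endian qwords).
USER_CODE64 = 0x0020FB0000000000    # selector 0x30
USER_CODE32 = 0x00CFFB000000FFFF    # selector 0x20
KERNEL_CODE64 = 0x00209B0000000000  # selector 0x10

if __name__ == "__main__":
    for name, raw in (("0x30", USER_CODE64), ("0x20", USER_CODE32),
                      ("0x10", KERNEL_CODE64)):
        print(name, decode_descriptor(raw))
```

The decoded DPL and L values agree with the Pl and Long columns printed by dg for the same selectors.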

Experimenting With Kernel Mode and User Mode Code

Let's use windbg to inspect the segments of a piece of code running in kernel mode (Figure 11).

kd> r
rax=0000000000000003 rbx=fffff804382fde60 rcx=fffff804382fde60
rdx=fffff804382fde10 rsi=fffff80433b731a0 rdi=fffff80433b73190
rip=fffff80435414be5 rsp=fffff804382fdde8 rbp=0000000000000000
 r8=0000000000000003  r9=fffff804382fddf8 r10=0000000000000000
r11=fffff804382fddd0 r12=fffff80433b73100 r13=0000000000000000
r14=0000000000000100 r15=00000000ffffffff
iopl=0         nv up di ng nz na po nc
cs=0010  ss=0000  ds=002b  es=002b  fs=0053  gs=002b             efl=00040086
nt!DebugService2+0x5:
fffff804`35414be5 cc              int     3
Figure 11. 64-bit Kernel Mode Process Registers
As you can see, RIP points to a kernel address, and the CS value is 0x10, which according to Figure 8 corresponds to a segment of type Code with privilege level 0 (the most privileged) and Long mode enabled. Now let's repeat the experiment with a 64-bit user-mode process (Figure 12).
Figure 12. 64-bit User Mode Process Registers


The image shows a CS segment value of 0x33, that corresponds to a segment of type Code, with privilege 3 (the lowest privilege) and Long mode enabled. Finally, let's see an example of a 32-bit user-mode process running on a 64-bit OS (Figure 13).
Figure 13. 32-bit User Mode Process Registers


The image shows a CS with value 0x23, that corresponds to a segment of type Code, with privilege 3 and Long mode disabled. Since Long mode is disabled, this implies that the process is running in compatibility-mode (32-bit).

Segment Transition and Syscall

We mentioned that code running in kernel mode uses a different CS value, whose descriptor has DPL 0. How is the segment transition performed? There are various ways to change the code segment. One is to use instructions that load the CS register, such as retf, which reads the new CS value from the stack. However, user-mode code runs at privilege level 3, and a far return cannot transfer control to a more privileged segment (one with a numerically lower DPL), so this mechanism cannot be used to enter kernel mode.

An alternative method is to use a call gate segment descriptor ([3]). However, this mechanism is not used by modern Windows versions, which prefer the syscall instruction. Among the various actions performed by this instruction is the change of the segment selector. But how is the correct segment chosen? This information is obtained from the IA32_STAR (0xC0000081) MSR. Bits 32-47 are extracted and used as the value of the new segment selector (which is 0x10 for the transition to kernel mode). Let's use WinDbg to verify this (Figure 14).

kd> rdmsr 0xC0000081
msr[c0000081] = 00230010`00000000
kd> .formats 00230010`00000000
Evaluate expression:
  Hex:     00230010`00000000
  Decimal: 9851692904349696
  Decimal (unsigned) : 9851692904349696
  Octal:   0000430001000000000000
  Binary:  00000000 00100011 00000000 00010000 00000000 00000000 00000000 00000000
  Chars:   .#......
  Time:    Sun Mar 21 11:08:10.434 1632 (UTC + 1:00)
  Float:   low 0 high 3.21426e-039
  Double:  5.28462e-308
kd> .formats 0y0000000000010000
Evaluate expression:
  Hex:     00000000`00000010
  Decimal: 16
  Decimal (unsigned) : 16
  Octal:   0000000000000000000020
  Binary:  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00010000
  Chars:   ........
  Time:    Thu Jan  1 01:00:16 1970
  Float:   low 2.24208e-044 high 0
  Double:  7.90505e-323
kd> dg 10
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0010 00000000`00000000 00000000`00000000 Code RE Ac 0 Nb By P  Lo 0000029b
Figure 14. Transition to DPL 0 Via Syscall Instruction
We first read the IA32_STAR MSR and extract the bits related to the new CS, whose value is 00000000 00010000. Converting this value to hex results in 0x10, which is exactly the same value that we obtained when we inspected the CS register in kernel mode in the previous section.
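The extraction can be scripted as well. The sketch below decodes the IA32_STAR value from Figure 14. The SYSCALL part (bits 32-47) is what this post describes. The SYSRET part (bits 48-63, with +16 for the 64-bit CS and +8 for SS) is not discussed here and comes from the SYSRET description in the Intel manual, so treat it as additional context. The function name and dictionary keys are hypothetical.

```python
def star_selectors(star: int) -> dict:
    """Derive the selectors loaded by SYSCALL and by 64-bit SYSRET from IA32_STAR."""
    syscall_field = (star >> 32) & 0xFFFF  # bits 32-47
    sysret_field = (star >> 48) & 0xFFFF   # bits 48-63
    return {
        # SYSCALL: CS = field with RPL cleared, SS = field + 8
        "syscall_cs": syscall_field & 0xFFFC,
        "syscall_ss": (syscall_field + 8) & 0xFFFF,
        # SYSRET to 64-bit code: CS = field + 16, SS = field + 8, RPL forced to 3
        "sysret_cs64": ((sysret_field + 16) | 3) & 0xFFFF,
        "sysret_ss": ((sysret_field + 8) | 3) & 0xFFFF,
    }


if __name__ == "__main__":
    # MSR value copied from Figure 14.
    for name, value in star_selectors(0x0023001000000000).items():
        print(f"{name} = {value:#06x}")
```

With the value from Figure 14, the SYSCALL code selector is 0x10 and the 64-bit SYSRET code selector is 0x33, matching the CS values seen in kernel mode and in 64-bit user mode.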

Heaven's Gates Consideration

If you reached this point, you have all the information needed to understand the concept behind the Heaven's Gate mechanism, which is used to transition between x64 and x86 code in order to run 32-bit binaries. Microsoft created a specific segment descriptor for this purpose, with selector 0x20 (0x23 once the RPL bits are included, as seen in Figure 13). The two code segment descriptors have the same privilege level, so the transition can be performed with one of the many instructions that load the CS register, such as retf or a far call. A lot has been written on this topic, and Microsoft refers to the 32-bit compatibility layer as Windows-on-Windows (WoW64).

Conclusion

Modern operating systems run in protected mode under a flat segmented memory model. In this post we analyzed how this model works and how segments are involved in privilege level changes. If you want to know more, I invite you to read the references.

References

[1] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A): System Programming Guide - Chapter 2.5 CONTROL REGISTERS
[2] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 4: Model-Specific Registers - IA32_EFER
[3] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A): System Programming Guide - Chapter 5.8.3 Call Gates
[4] - Call Gates' Ring Transitioning in IA-32 Mode
[5] - Bringing Call Gates Back
[6] - Windows Internals, Part 2, 7th Edition
[7] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1
[8] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture

Sunday, June 26, 2022

TheMatrix - A process inspection tool aimed at easing the malware analysis task

Twitter: @s4tan
Download: https://github.com/enkomio/thematrix

In this post I'll describe a project that I created to ease the malware analysis process. The goal of the project is to run a target binary in a controlled environment and log the Win32 function calls. I wanted to create something robust and easy to extend. I'm aware that other similar tools exist, but my intent was to have fun doing assembly programming and to learn things that I had only reversed but never implemented :)

How it works

TheMatrix is a program mostly written in assembly (x86/x64) that implements the following features:
  • A PE loader (also referred to as an activator) that loads a user-supplied binary (also known as the target binary).
  • A multi-arch hook engine that monitors the Win32 API function calls.

Create an activator

The first task consists of creating an activator. This is a binary that, once executed, loads the embedded PE file (the target binary) and runs its entry point. The activator will be a DLL if the target binary is a DLL, or an EXE otherwise. The activator exports an additional function, DllRegisterServer. This function is commonly used by malware to start the main code.

Activator execution

When executed, the activator extracts the embedded binary and loads it in memory. Before the target binary entry point is executed, various Win32 function hooks are placed. This ensures that the malware execution is monitored. By default, TheMatrix implements various Windows hooks that log the input data to the folder: ./Desktop/thematrix/<PID>/<API_name>.log. During the PE loading step, the PEB.Ldr field is updated to include the target binary. This field contains a doubly linked list of all the currently loaded DLLs and is used by various Win32 APIs such as GetProcAddress. I still wonder why, of the many PE loader projects available online, none modifies the Ldr field.

TheMatrix Under the hood

The core of TheMatrix is implemented in assembly. This gave me the chance to improve my x64 assembly programming skills and at the same time to implement features that I had only reversed. The x86 and x64 versions have quite a few differences, which are detailed below.

x86 Version

The 32-bit version of TheMatrix uses the Microsoft hot patching mechanism to place the function hooks (see file x86_hook_engine.inc). The inserted JMP instruction jumps to a trampoline (a concept described later) that is placed in a code cave. The code cave is found by searching the DLL sections. At execution time, when the API function is called by the target binary, the trampoline executes and jumps to the user defined hook function.
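As a rough illustration of the hot patching layout, the sketch below builds the two byte sequences usually involved: a long jump written in the padding before the function, and a short jump that replaces the function's first instruction. It assumes the commonly described layout (a 2-byte mov edi, edi prologue preceded by at least 5 padding bytes); it is not taken from x86_hook_engine.inc, and the function name and addresses are hypothetical.

```javascript
// Build the two byte sequences of a classic x86 hot patch.
// Assumption: the function starts with `mov edi, edi` (8B FF) and is
// preceded by at least 5 bytes of padding.
function buildHotPatch(functionAddr, trampolineAddr) {
  // Long jump written at functionAddr - 5. rel32 is relative to the end of
  // the jump instruction, which is functionAddr itself.
  const rel32 = (trampolineAddr - functionAddr) | 0;
  const longJmp = [0xE9];
  for (let i = 0; i < 4; i++) longJmp.push((rel32 >>> (8 * i)) & 0xFF);

  // Short jump that overwrites `mov edi, edi`: EB F9 jumps 7 bytes back,
  // from functionAddr + 2 to functionAddr - 5.
  const shortJmp = [0xEB, 0xF9];
  return { longJmp, shortJmp };
}

// Hypothetical addresses, for illustration only.
const patch = buildHotPatch(0x76F01000, 0x76F0F000);
console.log(patch.longJmp.map((b) => b.toString(16).padStart(2, "0")).join(" "));
```

Because the short jump is only two bytes long, it can replace the prologue in place, while the original bytes after it stay untouched.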

x64 Version

I started to implement the project in x86 assembly. As soon as I finished the initial version, the malware that I was interested in analysing switched to x64. This forced me to re-implement all the code in x64 assembly too (here is my reaction when I discovered this fact: https://twitter.com/s4tan/status/1516488723294298116).

When I decided to implement the x64 version too, I found myself in trouble, since the x64 Win32 APIs do not support hot patching in the same way as the x86 version. This forced me to choose a different approach to place my hooks. In the end, I decided to use Export Address Table (EAT) hooking. As in the x86 version, a trampoline is used to call the user defined hook function (see file x64_hook_engine.inc).
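One detail of EAT hooking worth spelling out is that the entries of the export address table are 32-bit RVAs relative to the module base, so the replacement address has to stay within 4 GB above the base. The sketch below computes the value that would be stored in an entry; the function name and the addresses are hypothetical, and BigInt is used because x64 addresses do not fit safely in regular JavaScript bitwise arithmetic.

```javascript
// Compute the value to store in an Export Address Table entry so that it
// resolves to a trampoline. EAT entries are 32-bit RVAs from the module base.
function eatEntryFor(moduleBase, trampolineAddr) {
  const rva = trampolineAddr - moduleBase; // BigInt arithmetic
  if (rva < 0n || rva > 0xFFFFFFFFn) {
    throw new RangeError("trampoline is not reachable with a 32-bit RVA");
  }
  return Number(rva);
}

// Hypothetical module base and code cave address, for illustration only.
const newRva = eatEntryFor(0x7FFB12340000n, 0x7FFB1239A100n);
console.log("new EAT entry: 0x" + newRva.toString(16));
```

This constraint is presumably one reason why placing the trampoline in a code cave inside the DLL itself is convenient: the cave is always reachable with a valid RVA.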

An additional aspect that is often ignored during the binary reversing process is that MS uses a different function calling convention for x64 code than for x86 code (see this doc for more details). In addition, the stack needs to be 16-byte aligned. In theory the concept is simple, but as often happens, the devil is in the details :) Luckily I found a useful 300 loc file that helped me with this task (see https://twitter.com/s4tan/status/1522150733839273986).
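To give an idea of the arithmetic involved, here is a small sketch that computes how many bytes a function has to reserve on the stack before calling another function under the Microsoft x64 convention (first four arguments in RCX, RDX, R8 and R9, 32 bytes of shadow space, RSP 16-byte aligned at the call). It assumes a function that pushes no registers of its own; the helper name is hypothetical.

```javascript
// Bytes a non-leaf x64 function must subtract from RSP before calling a
// callee that takes `argCount` integer/pointer arguments.
// Assumption: the function pushes no registers, so RSP is 8 modulo 16 on
// entry (the CALL that reached it pushed an 8-byte return address).
function stackReserve(argCount) {
  const slots = Math.max(4, argCount); // shadow space always covers 4 slots
  let bytes = slots * 8;
  if (bytes % 16 !== 8) bytes += 8;    // realign RSP to 16 bytes at the CALL
  return bytes;
}

console.log("0x" + stackReserve(4).toString(16)); // 0x28
console.log("0x" + stackReserve(7).toString(16)); // 0x38
```

The first result is the familiar sub rsp, 28h seen at the start of many x64 functions; the second corresponds to a call with seven arguments, such as BCryptImportKeyPair.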

Trampoline and hook function

The trampoline contains part of the magic that allowed me to create a clean design. Below you can see the x64 version of the trampoline code before being written to the identified code cave.
@trampoline_code_start:
	mov rax, 011223344aabbccddh ; store the address of the original function
	mov qword ptr gs:[28h], rax ; TIB.ArbitraryUserPointer, see: https://codemachine.com/articles/arbitraryuserpointer_usage.html
	mov rax, 011223344aabbccddh ; hook function address
	jmp rax
Two places need to be patched at runtime. The first is the original address of the hooked function, and the second is the address of the user defined hook function. The former is necessary in order to easily call the original function, as shown in the section below. To store this value I chose the TIB.ArbitraryUserPointer field, which is part of the Thread Environment Block (or TIB in this case). This field is rarely used and is a good place to store our information. The only requirement is that the original function must be called in the same thread as the hook function.
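The runtime patching step can be sketched as follows. The byte encoding in the comments is an assumption (it is what an assembler would typically emit for the four instructions of the trampoline shown above), and the helper names and addresses are hypothetical; the real implementation lives in the assembly sources.

```javascript
// Assumed machine code for the trampoline shown above:
//   48 B8 <imm64>                 mov rax, imm64
//   65 48 89 04 25 28 00 00 00    mov qword ptr gs:[28h], rax
//   48 B8 <imm64>                 mov rax, imm64
//   FF E0                         jmp rax
// 011223344aabbccddh stored in little-endian order:
const PLACEHOLDER = [0xDD, 0xCC, 0xBB, 0xAA, 0x44, 0x33, 0x22, 0x11];
const TEMPLATE = [
  0x48, 0xB8, ...PLACEHOLDER,
  0x65, 0x48, 0x89, 0x04, 0x25, 0x28, 0x00, 0x00, 0x00,
  0x48, 0xB8, ...PLACEHOLDER,
  0xFF, 0xE0,
];

// Convert a BigInt address to 8 little-endian bytes.
function le64(value) {
  const out = [];
  for (let i = 0n; i < 8n; i++) out.push(Number((value >> (8n * i)) & 0xFFn));
  return out;
}

// Return the index of the next placeholder at or after `from`, or -1.
function findPlaceholder(code, from) {
  for (let i = from; i + 8 <= code.length; i++) {
    if (PLACEHOLDER.every((b, j) => code[i + j] === b)) return i;
  }
  return -1;
}

// Produce a patched copy of the trampoline: the first placeholder receives
// the original function address, the second one the hook function address.
function patchTrampoline(originalAddr, hookAddr) {
  const code = TEMPLATE.slice();
  const first = findPlaceholder(code, 0);
  const second = findPlaceholder(code, first + 8);
  code.splice(first, 8, ...le64(originalAddr));
  code.splice(second, 8, ...le64(hookAddr));
  return code;
}

// Hypothetical addresses, for illustration only.
const patched = patchTrampoline(0x7FFB12345670n, 0x7FF6AABB1000n);
console.log(patched.map((b) => b.toString(16).padStart(2, "0")).join(" "));
```

Searching for the placeholder value instead of hardcoding offsets keeps the patcher valid even if instructions are later added to the trampoline.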

Usage

As mentioned, the first step is to create the activator. This is achieved by using the -add command and specifying the target binary. TheMatrix will create a copy of itself containing the target binary. If the target binary is a DLL, TheMatrix will modify the activator file so that it is a DLL rather than an EXE file. Once the activator is created, it can be executed in the same way as the target binary.

One of the main goals of my project was to create something that was really easy to update. Adding a new function hook must be a dead easy operation. In the end I came up with a design where you can extend the project in a simple way: you just need a bit of Win32 API programming skill (you can implement your code in C, no assembly programming required ^^). To place a hook you just need to call the hook_add function, specifying the DLL name, the API function name and the user defined hook function. An example call is the following one:

hook_add("Bcrypt.dll", "BCryptImportKeyPair", hook_BCryptImportKeyPair);
Then, you have to implement your hook function. To call the original function it is enough to use the call_original function, passing the input parameters of the original function. This kind of design is possible thanks to the freedom provided by programming in assembly. An example of usage is shown below.
LPVOID __stdcall hook_BCryptImportKeyPair(BCRYPT_ALG_HANDLE hAlgorithm, BCRYPT_KEY_HANDLE hImportKey, LPCWSTR pszBlobType, BCRYPT_KEY_HANDLE* phKey, PUCHAR pbInput, ULONG cbInput, ULONG dwFlags)
{
	// save imported key bytes
	char name[MAX_PATH] = { 0 };
	snprintf(name, sizeof(name), "BCryptImportKeyPair_%llx_%lu", (uint64_t)pbInput, cbInput);
	log_data(cbInput, pbInput, name);

	LPVOID ret = call_original(
		hAlgorithm,
		hImportKey,
		pszBlobType,
		phKey,
		pbInput,
		cbInput,
		dwFlags
	);
	return ret;
}
In the example above, the hook function logs the imported key before calling the original function. The final step is to inform TheMatrix of the available hooks before running the target binary. This action is performed in the function hooks_init, whose definition is the following:
bool hooks_init(uint8_t* hMod)
The file hooks.c contains this function and can be customized by the user.

Demo

The following video shows an example of TheMatrix usage. The video shows the execution of a malware and demonstrates how TheMatrix is able to trace the execution of a new process and the extraction of relevant information. The malware is a famous one and it is not difficult to recognize it if you are into malware analysis ;)

venerdì 20 maggio 2022

Alan c2 Framework v7.0: Hyper-Pivoting


Twitter: @s4tan
Download: https://github.com/enkomio/AlanFramework/releases/latest
Documentation: https://github.com/enkomio/AlanFramework/tree/main/doc

A new Alan C2 Framework version was released, codename: Hyper-Pivoting. This new version includes some cool features, such as a proxy that allows the operator to easily pivot through networks.

SOCKS5 Proxy

Network pivoting is an essential part of every red-team activity and a must-have feature for every C2 framework. Alan v7.0 implements a proxy feature to ease network pivoting. By using the proxy command the operator can create a SOCKS5 compliant proxy on the machine where the agent is running, or interact with an already running proxy.

Proxy chaining is another useful feature that allows the operator to chain multiple proxies together. Creating a proxy chain is very simple, just use the command: proxy chain [proxy ID source] [proxy ID dest]. Some network segments can communicate only with specific addresses, which means that reaching the C2 server is not an easy task. By using a chain of proxies the agent can establish a path to the Alan server, making it possible to compromise highly segmented networks too.

The executed proxies are protected by a username and password. If the operator does not specify them, a randomly generated username and password are used (the operator can see them by running the proxy command). As mentioned, the proxies are SOCKS5 proxies and can be used by any other program that accepts a SOCKS5 proxy.
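For readers who want to see what a client has to send to such a proxy, here is a minimal sketch of the first two messages of a SOCKS5 session with username/password authentication, following RFC 1928 and RFC 1929. It only builds the bytes and does not open any connection; the function names and the credentials are hypothetical and unrelated to the Alan code base.

```javascript
// Client greeting offering only username/password authentication (RFC 1928).
function socks5Greeting() {
  return [0x05, 0x01, 0x02]; // VER = 5, NMETHODS = 1, METHOD = 0x02
}

// Username/password sub-negotiation request (RFC 1929):
// VER = 1, ULEN, UNAME, PLEN, PASSWD.
function socks5AuthRequest(username, password) {
  const encoder = new TextEncoder();
  const user = encoder.encode(username);
  const pass = encoder.encode(password);
  for (const field of [user, pass]) {
    if (field.length < 1 || field.length > 255) {
      throw new RangeError("username and password must be 1 to 255 bytes long");
    }
  }
  return [0x01, user.length, ...user, pass.length, ...pass];
}

// Hypothetical credentials, for illustration only.
console.log(socks5Greeting(), socks5AuthRequest("operator", "s3cret"));
```

Any client that implements these two messages, as curl does, should be able to authenticate against a standard SOCKS5 proxy.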

One of the main Alan pillars is the in-memory execution of all its components, and the proxy is no exception. When a proxy is executed, its code runs inside the host process without touching the disk.

Misc features

Alan 7.0 includes other relevant features. The info command was improved to show the Machine ID and whether the agent is using a proxy. All Alan logs are now saved to the alan.log file. In addition, all the output generated by the Alan server and the commands entered by the operator are saved to an evidence file. This allows the operator to include the evidence file as part of the red-team activity report.

Demo

The video below shows an example of proxy usage. After creating a proxy, the Alan agent is instructed to use it. The video demonstrates that the running proxies are compliant with the SOCKS5 specification by using one of the created proxies with the curl utility. Next, a proxy chain is created and the network traffic is displayed to show that the chain of proxies is traversed before reaching the Alan server.

domenica 20 febbraio 2022

Alan c2 Framework v6.0: Alan + JavaScript = ♡


Twitter: @s4tan
Download: https://github.com/enkomio/AlanFramework/releases/latest
Documentation: https://github.com/enkomio/AlanFramework/tree/main/doc

Alan v6.0 was released with a cool new feature: JavaScript execution. The scripts are executed in memory and do not depend on any third party program. The scripts' source code can be downloaded from the Alan GitHub repository.

Being able to extend the framework is a mandatory feature in today's red-team tools. Each team has its own methodology to perform a red-team activity, so being able to customize or extend the tool capabilities is essential. One of the main goals with Alan was to provide a framework that can be easily adapted to various modus operandi. Alan v6.0 adds a new feature to support easy extension: it allows the operator to execute JavaScript files directly in memory. This feature is implemented inside an Alan core module and does not depend on any third party program.

In other tools, this kind of feature requires the operator to compile C code by following a specific process. This might be overwhelming and unnecessarily complex. JavaScript is an easy language and even novices can become proficient in a short time.

However, being able to execute JavaScript code is not enough, since in most cases the operator needs to interact with native Windows functions to perform a given action. Alan provides an interface to call native Windows functions by using the handy JavaScript syntax. This blog post explores the details of this feature and how to use it to extend the Alan capabilities.

Getting Started

Executing a JavaScript file in Alan is extremely easy, just use the run command and specify a file with the .js extension. In order to call a Windows function, Alan implements the Win32 module, which exposes two methods: GetProcAddress and LoadLibrary. These are the basic methods needed to call virtually any Windows function. Let's try to write a simple file that prints the process ID.

import * as win32 from 'Win32';

var kernel32 = win32.LoadLibrary("kernel32.dll");
var GetCurrentProcessId = win32.GetProcAddress(kernel32, "GetCurrentProcessId");
var IsWow64Process = win32.GetProcAddress(kernel32, "IsWow64Process");
var GetCurrentProcess = win32.GetProcAddress(kernel32, "GetCurrentProcess");


var my_pid = GetCurrentProcessId();
var is_wow64 = new Array(4);
IsWow64Process(GetCurrentProcess(), is_wow64);

var msg = "Hello world from Javascript executed in process: " + my_pid;
if (is_wow64[0] == 1)
	msg += " - I'm running under Wow64 :)";
print(msg);
The script imports the Win32 module in order to load the Kernel32 DLL by calling the LoadLibrary function. Using the obtained handle, the GetCurrentProcessId function address is resolved by using the GetProcAddress function. The other functions are resolved in the same way. You can now use the resolved functions by calling them as standard JavaScript functions. As a final step, the script prints a message containing the information extracted from the Windows APIs.

A fundamental step of the entire process is being able to easily test the script during the development stage. In this new Alan version, a new folder named tools was added to the Alan package. It contains the files cqjsx86.exe and cqjsx64.exe. These files are the x86 and x64 versions of the JavaScript interpreter. Let's try to run our script with both files to see what result is produced (the --file option is used to specify the file path).

C:\Alan.v6.0.511.24\tools>cqjsx64.exe --file test.js
Hello world from Javascript executed in process: 15532

C:\Alan.v6.0.511.24\tools>
If we use the cqjsx86.exe program, we obtain the following result (I'm running my test on an x64 OS).
C:\Alan.v6.0.511.24\tools>cqjsx86.exe --file test.js
Hello world from Javascript executed in process: 30844 - I'm running under Wow64 :)

C:\Alan.v6.0.511.24\tools>
As can be noticed, the result differs according to the version used.
Once the script works as expected, we can run it in the Alan agent by simply using the run command and specifying the full path of the script.

Windows API Data Structure Interoperation

The GetProcAddress and LoadLibrary functions provide the basic functionality to call every Win32 API. However, interacting with a native API might require further information. A typical example is a parameter that is used as a buffer (either in input or in output). When this is the case, the following rules apply:
  • Each JavaScript Array is considered as an array of bytes when passed to a Win32 function. Each element is cast to uint8_t (this causes data truncation and potential data corruption). If the array contains other complex data types (such as a String), their value is converted to NULL.
  • Boolean values are converted to 1 if true and 0 if false.
  • Each number is converted to a 32-bit integer in an x86 process, and to a 64-bit integer in an x64 process.
  • Each JavaScript String is converted to an ASCII string when passed to a Win32 function.
  • You cannot call functions with more than 20 parameters.


The rules above imply that:
  • Each parameter passed by address to a Win32 function needs to be converted to an array (e.g. to pass an LPDWORD you have to create an Array(4) parameter if running in 32-bit or an Array(8) if running in 64-bit).
  • If a Win32 function accepts a structure, it needs to be converted to an Array too. For example, a PROCESSENTRY32 structure must be represented as an Array and then parsed by referencing the fields by their offset (an example using this structure is presented later, with some helper functions to simplify the job).
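As a minimal sketch of what these rules mean in practice, the helpers below read and write a little-endian DWORD inside a byte array of the kind passed to a Win32 function. They are plain JavaScript and do not use the Win32 module; the function names are hypothetical and are not part of Alan.

```javascript
// Allocate a zero-filled byte array to pass where the API expects a pointer.
function allocBuffer(size) {
  return new Array(size).fill(0);
}

// Store a 32-bit value at `offset`, least significant byte first.
function writeDword(buf, offset, value) {
  for (let i = 0; i < 4; i++) buf[offset + i] = (value >>> (8 * i)) & 0xFF;
}

// Read back an unsigned 32-bit value stored at `offset`.
function readDword(buf, offset) {
  let value = 0;
  for (let i = 3; i >= 0; i--) value = value * 256 + (buf[offset + i] & 0xFF);
  return value;
}

// Example: pretend a Win32 function wrote the DWORD 0x00003CAC into the buffer.
const pidBuffer = allocBuffer(4);
writeDword(pidBuffer, 0, 0x3CAC);
console.log(readDword(pidBuffer, 0)); // 15532
```

The read helper uses multiplication instead of bit shifts so that values above 0x7FFFFFFF are returned as positive numbers.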


All these rules might be quite annoying during the development of a non-trivial script. In the next section I'll show how to ease the development task by implementing an lsass process memory dumper.

Implementing a simple lsass.exe process memory dumper

This is a perfect case to explore this new feature in more depth. Being able to dump the memory of the lsass process is very important to further compromise a host. There are various techniques to achieve this goal, but for the sake of simplicity I'll go for the simplest one, which uses the MiniDumpWriteDump function. I'll put the script on GitHub so you can have a look at its full source code.

Let's suppose that our agent is running as Administrator. The following steps are then needed to write the dumper:
  • Enable SE_DEBUG_NAME privilege.
  • Scan all processes to identify the lsass.exe process.
  • Create a mini dump of the lsass.exe process.


As a first step we have to load all the needed functions. This is a trivial task, already demonstrated in the previous example. Enabling SE_DEBUG_NAME is the next step. To perform this action we have to use a TOKEN_PRIVILEGES structure. This structure is quite simple, so for this task we will just create an array of 0x10 bytes and reference the sTP.Privileges[0].Luid, the sTP.PrivilegeCount and the sTP.Privileges[0].Attributes fields by their array offset. After calling the AdjustTokenPrivileges function we are ready to proceed with the next and probably most complex step.
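The 0x10-byte layout mentioned above can be sketched as follows. The field offsets follow the TOKEN_PRIVILEGES definition from the Windows headers with a single LUID_AND_ATTRIBUTES entry; the helper names are hypothetical, and the LUID value used in the example is an assumption (in a real script it would be obtained at runtime, for example with LookupPrivilegeValue).

```javascript
const SE_PRIVILEGE_ENABLED = 0x00000002;

// Store a 32-bit value at `offset`, least significant byte first.
function putDword(buf, offset, value) {
  for (let i = 0; i < 4; i++) buf[offset + i] = (value >>> (8 * i)) & 0xFF;
}

// Build a TOKEN_PRIVILEGES structure holding one enabled privilege.
function buildTokenPrivileges(luidLow, luidHigh) {
  const tp = new Array(0x10).fill(0);
  putDword(tp, 0x0, 1);                    // PrivilegeCount
  putDword(tp, 0x4, luidLow);              // Privileges[0].Luid.LowPart
  putDword(tp, 0x8, luidHigh);             // Privileges[0].Luid.HighPart
  putDword(tp, 0xC, SE_PRIVILEGE_ENABLED); // Privileges[0].Attributes
  return tp;
}

// Assumed LUID: 20 is the value usually associated with SeDebugPrivilege.
const sTP = buildTokenPrivileges(20, 0);
console.log(sTP);
```

The resulting array can then be passed where AdjustTokenPrivileges expects a pointer to the new state.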

We have to identify the lsass.exe process. To achieve this goal we use the CreateToolhelp32Snapshot function to obtain a snapshot and loop through all processes until we find a process whose name is lsass.exe. This implies the usage of a PROCESSENTRY32 structure, which is not that simple. To ease the task I created various JavaScript helper functions that serialize an object to a JavaScript array. The serialization function inspects the prefix of each field name and performs a specific serialization action according to its value. For example, field names that start with dw_ are serialized as a DWORD. Field names that start with p_ are serialized to a four-byte or eight-byte array according to the value of a global variable that I defined at the start of the script (this step can be made more dynamic by using the IsWow64Process function). Thanks to these functions, working with structures is now a lot easier (see the script source code for full details).
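The prefix-based serialization idea can be sketched like this. It is an independent, simplified illustration and not the helper code from the actual script: only the dw_ and p_ prefixes mentioned above are handled, no alignment padding is inserted, and all names are hypothetical.

```javascript
// Assumed global switch: 4 for an x86 agent, 8 for an x64 agent.
const POINTER_SIZE = 8;

// Serialize an object to a byte array, choosing the size of each field
// from the prefix of its name. Fields are emitted in declaration order
// and no alignment padding is added.
function serializeStruct(obj, pointerSize = POINTER_SIZE) {
  const out = [];
  const pushInt = (value, size) => {
    let v = BigInt(value);
    for (let i = 0; i < size; i++) {
      out.push(Number(v & 0xFFn)); // little-endian, one byte at a time
      v >>= 8n;
    }
  };
  for (const [name, value] of Object.entries(obj)) {
    if (name.startsWith("dw_")) pushInt(value, 4);
    else if (name.startsWith("p_")) pushInt(value, pointerSize);
    else throw new Error("unknown field prefix: " + name);
  }
  return out;
}

// Hypothetical two-field structure, for illustration only.
console.log(serializeStruct({ dw_size: 0x128, p_buffer: 0 }));
```

Encoding the type in the field name keeps the structure definitions short, at the price of having to respect the naming convention everywhere.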

The final step is to create a file and call the MiniDumpWriteDump function to create a file dump that you can now download to your machine for post-processing.

Demo

Now that we have created our script to dump the lsass.exe process memory, let's use it. The video below demonstrates how to dump the lsass.exe process memory by running our JavaScript script in the agent.

giovedì 20 gennaio 2022

Analyzing an IDA Pro anti-decompilation code


Twitter: @s4tan
GitHub: https://github.com/enkomio/

In this post I'll analyze a piece of code that induces IDA Pro to decompile the assembly in a wrong way. I'll propose a fix, but I'm open to more elegant solutions :)

The function that we want to decompile has the following assembly code (I'm using IDA Pro v7.6):

.text:1001BC95 56                  push    esi
.text:1001BC96 FF 74 24 10         push    [esp+4+arg_8]     
.text:1001BC9A 8B 74 24 10         mov     esi, [esp+8+arg_4] 
.text:1001BC9E 56                  push    esi
.text:1001BC9F FF 74 24 10         push    [esp+0Ch+arg_0]
.text:1001BCA3 52                  push    edx
.text:1001BCA4 51                  push    ecx
.text:1001BCA5 E8 57 20 FF FF      call    nullsub_1
.text:1001BCAA 8B 0A               mov     ecx, [edx]      
.text:1001BCAC 83 C4 14            add     esp, 14h
.text:1001BCAF 89 4E 0C            mov     [esi+0Ch], ecx
.text:1001BCB2 8B 42 04            mov     eax, [edx+4]
.text:1001BCB5 03 C1               add     eax, ecx
.text:1001BCB7 89 46 04            mov     [esi+4], eax
.text:1001BCBA 5E                  pop     esi
.text:1001BCBB C3                  retn


The function uses two arguments with an unconventional calling convention. If we decompile the code, we obtain:

int __cdecl sub_1001BC95(int a1, int a2)
{
  int *v2; // edx
  int v3; // ecx
  int result; // eax

  nullsub_1();
  v3 = *v2;
  *(a2 + 12) = *v2;
  result = v3 + v2[1];
  *(a2 + 4) = result;
  return result;
}
In IDA Pro the v2 variable (corresponding to the line at address 0x1001BCAA) is colored in red, since its value might be undefined.

Custom calling conventions might cause some problems to the decompilation process (see this), but, in general, there is an easy fix: it is enough to inform IDA Pro that the function uses a custom calling convention. By editing the function, we can set the new type with the following definition:

int __usercall sub_1001BC95@<eax>(PUCHAR arg0@<edx>, int garbage, PUCHAR arg1)
With this new definition, the decompiled code now looks like the following:
int __usercall sub_1001BC95@<eax>(PUCHAR arg0@<edx>, int garbage, PUCHAR arg1)
{
  int *v1; // edx
  int v2; // ecx
  int result; // eax
  int v4; // [esp+Ch] [ebp+8h]

  nullsub_1();
  v2 = *v1;
  *(v4 + 12) = *v1;
  result = v2 + v1[1];
  *(v4 + 4) = result;
  return result;
}
We haven't made any progress at all. The only place that we haven't checked is the nullsub_1 function, so the problem must be in its call. If we analyze this function, we notice that it has an empty body, as shown below.

.text:1000DD01 C3                  retn
Why is this function causing problems? The answer is in the software convention used by the compiler. During the compilation, the compiler considers some registers as volatile. This means that the value of these registers, after a function call, should not be considered preserved ([1]). Among the volatile registers, there is EDX, which is exactly one of the registers used to pass a function parameter in the custom calling convention.

This code causes problems for the decompiler, which (correctly) considers the EDX register to have an undefined value after the function call.

I'm not aware of any particular IDA Pro command to tell the decompiler not to consider EDX as volatile, so the simplest solution that I found is to just remove the call instruction (I patched the bytes E8 57 20 FF FF with 90 90 90 90 90). The result is a much cleaner decompiled code, as shown below.

int __usercall sub_1001BC95@<eax>(PUCHAR arg0@<edx>, int garbage, PUCHAR arg1)
{
  PUCHAR v3; // ecx
  int result; // eax
  
  v3 = *arg0;
  *(arg1 + 3) = *arg0;
  result = &arg0[1][v3];
  *(arg1 + 1) = result;
  return result;
}
Now that the decompiled code represents the real intent of the assembly code, we can proceed to improve it further (we can clearly see the usage of a struct in the code).
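As a side note, the byte patch described above can be checked outside IDA Pro with a few lines of code. The sketch below decodes the relative call from the listing and produces the NOP replacement; the addresses and bytes come from the disassembly shown earlier, while the helper names are hypothetical.

```javascript
// The call instruction from the listing: .text:1001BCA5 E8 57 20 FF FF
const CALL_ADDR = 0x1001BCA5;
const CALL_BYTES = [0xE8, 0x57, 0x20, 0xFF, 0xFF];

// Resolve the destination of an `E8 rel32` call located at `addr`.
function callTarget(addr, bytes) {
  // rel32 is little-endian and signed; `| 0` keeps it as a signed 32-bit value.
  const rel32 = (bytes[1] | (bytes[2] << 8) | (bytes[3] << 16) | (bytes[4] << 24)) | 0;
  // The displacement is relative to the address of the next instruction.
  return (addr + bytes.length + rel32) >>> 0;
}

// Replace every byte of the instruction with a NOP (0x90).
function nopOut(bytes) {
  return bytes.map(() => 0x90);
}

console.log(callTarget(CALL_ADDR, CALL_BYTES).toString(16)); // 1000dd01
console.log(nopOut(CALL_BYTES));
```

The decoded destination matches the address of the empty nullsub_1 function, which confirms that removing the call does not change the behavior of the code.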

Update:

I received messages on Twitter and Reddit suggesting that I have a look at the __spoils keyword mentioned in this Igor’s tip of the week post [2] (shame on me for not having found it).

Its meaning is exactly what we need to solve the problem in a more elegant and generic way. It is enough to change the nullsub_1 function definition by adding the __spoils keyword, as shown below:

void __spoils<> nullsub_1(void)
The decompilation result of the function sub_1001BC95 is the same as before with the exception that the call to the nullsub_1 function is still there (it is not necessary to patch the bytes anymore).

Links:

[1] Register volatility and preservation
[2] Igor’s tip of the week #51: Custom calling conventions

sabato 18 dicembre 2021

Alan c2 Framework v5.0 - All you can in-memory edition


Twitter: @s4tan
Download: https://github.com/enkomio/AlanFramework
Documentation: https://github.com/enkomio/AlanFramework/tree/main/doc

I just released version 5.0 of my C2 post-exploitation framework Alan. You can download the binaries and read the release notes at: https://github.com/enkomio/AlanFramework/releases/latest

My goal with the Alan project is to provide a post-exploitation framework that can help red-team operators to further compromise their targets. Typically, each team has its preferred tools to exploit the target; an example is the plethora of tools that can perform the memory dump of the lsass process. Alan does not enforce any particular tool; instead, it provides the ground to run whatever tools the operator likes. All tools are executed in memory in the address space of a pre-configured host process, or injected into another process.

This feature is achieved by the introduction of the new run command. This command accepts a file path on the operator machine and executes the file on the compromised host without touching the disk. It is possible to specify command-line arguments that are passed to the executed program (this feature is not so common in other C2 frameworks ;)). For this reason I decided to name this version "All you can in-memory" :)

Other commands were also implemented that allow the operator to execute a program on the compromised host. In particular the command exec was added to execute a new process and the shell command was modified to accept an argument that is the command to execute (if no argument is specified, a command shell is presented to the operator).

Find below the video that shows the following features:

  • Creation of an x64 PowerShell agent.
  • In-memory execution of the nanodump utility by using the configured host program (raserver.exe in this case) and passing a command-line argument. The Process Hacker window will display the execution of the raserver.exe process.
  • Execution of the program notepad.exe in background.
  • In-memory execution of the dumper utility by injecting the binary in the just created notepad process. In this case the raserver.exe is not executed.