Twitter: @s4tan
Sacara VM GitHub project: https://github.com/enkomio/sacara
In this blog post I want to describe my latest side project and provide some data on how effective protections based on software virtualization are.

State of the art
If you have ever read an academic paper, you will have noticed that it is imperative to describe the current state of the art of the topic being discussed. I find this section very helpful, so I decided to list here the articles that I have read, grouped by what is, in my opinion, their technical level. Of course this is not a complete list, and it is very likely that I have missed some good resources.

Level beginner
As often happens, there are a lot of good resources to start with, and this is also true for the VM protection concept. At this level I think the only skills needed are being able to read assembly and to use a debugger. If you are looking for some code to read, I suggest you take a look at Pasticciotto ([01]). It also has a nice write-up about how the VM works and which opcodes are implemented. Another very interesting challenge is the one created by MalwareTechBlog, where you have to reverse a binary in order to obtain the flag. You can find a good write-up at [02].

Level intermediate
Let's raise the difficulty bar and look at some projects that were created with the real purpose of protecting code. The required skill is being able to write some simple scripts to ease your task, but nothing too advanced. Considering only projects created for fun, the two most renowned are hyperunpackme2 by thehyper ([03]) and the ReWolf x86 Virtualizer ([04]). Maximus wrote a good (and lengthy) write-up about the first challenge at [05]. Rolf Rolles also wrote a post in which he created an IDA processor module to analyze the code ([06]). Before you ask: I don't consider writing a full IDA processor module to be basic IDA scripting :)

Level Advanced
To tackle advanced reverse engineering problems it is not enough to have a very good understanding of the theoretical concepts; it is also necessary to be proficient with the available tools. At this level, the amount of work needed to understand what a program is doing can no longer be handled by just looking at the assembly code (at least not without an enormous amount of pain). There are three cases in particular that I consider pretty difficult to analyze.

The first one is a crackme challenge implemented by Solar Designer in 1996 (yes, you read that correctly, more than 22 years ago) [07]. In his project the author implemented what is known as a "one instruction set computer" (OISC); in particular, he based all his work on the NOR instruction.

The second one is challenge number 12 of the 2018 Flare-On challenge (Suspicious Floppy Disk, by Nick Harbour). In this case the author went one step further and implemented two nested OISCs: the first one is SUbtract and Branch if Less than or EQual, aka "subleq", and the second one is Reverse Subtract and Skip if Borrow, aka "RSSB". You can read solutions for this challenge at [08, 09].

The last example, straight from academia, is the Tigress challenge [10], which is based on the obfuscation of various hash functions using state-of-the-art protections (VM, JITting, etc.). A solution to part of the challenge was provided by Jonathan Salwan in [11].

As you can see by reading the solutions to these challenges, the authors used advanced techniques that involve the creation of a custom CPU processor or emulation via symbolic execution. Without proficient knowledge of the tools, solving this kind of challenge would be a very complicated (almost impossible) task.

Introducing Sacara VM
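A brief aside, since NOR shows up both in the crackme from [07] and in Sacara: a single NOR instruction can be enough because NOR is functionally complete, so the other bitwise operations (and, with the help of a shift, addition) can be derived from it. The Python sketch below only illustrates that idea and is not code taken from either project; the 32-bit word size, the function names and the use of a native left shift for the carry are assumptions made for this example.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit machine words


def nor(a, b):
    """The only 'real' logic primitive: bitwise NOT (a OR b)."""
    return ~(a | b) & MASK


def not_(a):
    # NOT a == a NOR a
    return nor(a, a)


def or_(a, b):
    # a OR b == NOT (a NOR b)
    return not_(nor(a, b))


def and_(a, b):
    # De Morgan: a AND b == (NOT a) NOR (NOT b)
    return nor(not_(a), not_(b))


def xor_(a, b):
    # a XOR b == (a OR b) AND NOT (a AND b) == (a NOR b) NOR (a AND b)
    return nor(nor(a, b), and_(a, b))


def add(a, b):
    # Ripple-carry addition modulo 2**32. The left shift is NOT derived
    # from NOR; it is used here as a separate primitive for brevity.
    while b:
        carry = (and_(a, b) << 1) & MASK
        a = xor_(a, b)
        b = carry
    return a
```

For a reverser, the practical consequence is that a one-line operation such as an addition becomes a long chain of identical-looking NOR steps, which is much harder to recognize at a glance.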
Sacara is another project that implements a custom low-level language that can be used to obfuscate parts of your code. It is not a tool that translates a PE binary into an obfuscated one; you have to write your own program :) It tries to protect the code by using features that make the reverse engineering process harder (such as opcode encryption based on location, multiple representations of the same opcode, use of the NOR instruction to implement various arithmetic functions, anti-debugging, and so on). I created the project because I wanted to experiment a bit in this area. In the GitHub repository you can find the assembler (written in F#) and the VM that executes the code (written in x86 assembly). I'm not going to describe in detail how it works; it is open source, so read the code if you are curious :) Instead, I want to show you how effective this kind of protection can be at hiding the real meaning of a program when the binary is analyzed by an antivirus. Before proceeding, I want to make clear that this is not yet another rant about how the AV industry sucks. Too often people forget how difficult it is to implement this kind of program. If you really want to write a rant about it, please be sure to also present an effective solution to the problems you identify.

Protecting a .NET binary
For my test I created a sample application that reads a blob from its resources and loads it via the Assembly.Load method. You can find the source code of this program in the GitHub project, under the Example\LoadEncryptedAssembly directory. The program lets you specify a .NET binary and a password in order to create a copy of itself with the specified file "encrypted" and embedded in its resources. The encryption is very simple; here is the code:

    public static void ManagedEncrypt(Byte[] buffer, String password)
    {
        // Repeating-key XOR: running the same routine again decrypts the buffer.
        var key = Encoding.Default.GetBytes(password);
        for (var i = 0; i < buffer.Length; i++)
        {
            buffer[i] = (byte)(buffer[i] ^ key[i % key.Length]);
        }
    }

Once that is done, you can invoke the newly created program, which just loads the resource, decrypts it and runs it. The important point is that I used the Sacara VM to decrypt the data. To do this I created a simple script that you can find here; the source is shown below:

In order to have a realistic test I chose a malware sample from VirusTotal with a very high detection rate. After searching for the Assembly keyword I found this file: 3dd7ae0bca5e8e817581646c0e77885ffd3a60333a5bd24df9ccbe90b9938293, which has a detection rate of 65/68, as you can see in the following image:

Then, I ran the following command:
    LoadEncryptedAssembly.exe -b 3dd7ae0bca5e8e817581646c0e77885ffd3a60333a5bd24df9ccbe90b9938293 -p sacara
    -=[ Dynamically load encrypted Assembly SacaraVm sample ]=-
    For more information pass -h as argument
    New file 'LoadEncryptedAssembly.build.exe' generated. Run it to execute the program.

As I said before, the command takes the file, encrypts it using sacara as the password, and embeds it in the resources. It generates a new file named LoadEncryptedAssembly.build.exe; if you run it, you will see that after a while the original malware binary is executed. The question is: how effective is this kind of protection? I uploaded the new file to VT (2e46664c52373b9ec14c64496cf1d18661e745fb83f1cdaaf73970d4fca59bbe) in order to analyze it and, as you can see from the following image, the detection rate dropped drastically to 3/64: