sabato 16 settembre 2017

Using a Mealy automata for string obfuscation

Obfuscating string is a very important aspect if you want to protect sensitive information. In the following post I'll present an alternative method to obfuscate strings by using a Mealy automata.

You can find the full source code of the project in this Github project

Introduction

From Wikipedia ([1]) we can read that a Mealy machine is a finite-state machine whose output values are determined both by its current state and the current inputs. There are plenty of information on the internet about this concept, so I will not go into further details. What is interesting to us is that by having an automata, we can give it a specific input and have the computed string as output.

The idea of using a Mealy machine to obfuscate strings is not new and it was already presented in [2]. Unfortunately the book doesn't explains how to create a Mealy automata in an automatic way. As the task of creating an automata manually, for each of the strings, is very time consuming, I felt that a very important part was missing.
In the following paragraphs I'll try to fill this hole by providing you with a possible implementation.
In the final section we will see an example that uses the code presented in the book.

Why using a new method?

If you have ever reversed a binary which use obfuscation you might have noticed that most of the obfuscation strategies are based on using some kind of know cryptographic algorithm or by using a custom encoder based on ADD/XOR/ROL instructions. Both cases have advantages and disadvantage (using a XOR obfuscation is a weak method, see [3]), but both are based on the assumption that they have the data encoded/encrypted in some way in the binary. In our case we convert the data in "code" that generates the decoded value at runtime.

For our purpose we will use a Mealy machine which has 0/1 as input. This choice will allow us to encode each letter with a bit, greatly reducing the input representation.

Implementation

Let's consider a simple automata (created with [4]):


On each arrow there is the input and the associated output. If we consider the state 0 as the initial one and pass the input string: 0 1 1 0 1 0 0 we will receive as output the string ANTONIA.

In order to automatically generates the automata I found that starting by considering the input was a pretty challenging task. So I changed strategy and considered the output in order to create the automata. For each character, the choice is between creating a new state, or connecting to one of the already existing states. You can find the full F# source code implementation with a test example here.

Testing

Now that we have an algorithm to generate the automata, let's try to obfuscate some strings. Since the implementation was done in F# I'll create an helper method to print the automata in an handy way in order to import the result in a C program. Let's consider the string supersecretpassword. The automata and the input generated from this string are (the result may be different on your machine):
Input text: supersecretpassword

Input: 1,1,0,0,0,1,0,0,0,0,1,0,1,0,1,1,1,1,0, Int: 250915

Output: {{'e', 's'}, {'e', 'u'}, {'p', 'e'}, {'e', 'a'}, 
        {'r', 'r'}, {'c', 'w'}, {'d', 't'}, {'s', 'd'}, 
        {'u', 's'}, {'p', 'w'}, {'a', 'o'}, {'d', 's'}, 
        {'s', 'p'}}

Automata: {{6, 1}, {5, 2}, {3, 2}, {4, 7}, {0, 11}, {4, 12}, 
          {12, 2}, {8, 4}, {12, 9}, {0, 10}, {6, 4}, {12, 7}, 
          {11, 10}}


Since the length of the string is less than 32 characters, we can convert the binary input into a DWORD. Now let's write a C program that reconstruct the given string:

#include "stdafx.h"

void deobfuscate(char* text, int length, int key, char automata[][2], char output[][2])
{
 int v = 0, state = 0, i = 0;
 for (i = 0; i < length; i++)
 {
  v = key & 1;
  key >>= 1;
  text[i] = output[state][v];
  state = automata[state][v];
 }
 text[i] = '\0';
}


int main()
{
 char text[20];
 int key = 250915;

 char automata[][2] =
 { { 6, 1 },{ 5, 2 },{ 3, 2 },{ 4, 7 },{ 0, 11 },
 { 4, 12 },{ 12, 2 },{ 8, 4 },{ 12, 9 },{ 0, 10 },
 { 6, 4 },{ 12, 7 },{ 11, 10 } };

 char output[][2] =
 { { 'e', 's' },{ 'e', 'u' },{ 'p', 'e' },{ 'e', 'a' },
 { 'r', 'r' },{ 'c', 'w' },{ 'd', 't' },{ 's', 'd' },
 { 'u', 's' },{ 'p', 'w' },{ 'a', 'o' },{ 'd', 's' },
 { 's', 'p' } };

 deobfuscate(text, sizeof(text)-1, key, automata, output);

 printf("Output: %s", text);
 return 0;
}


If we run this code we can see that the string "supersecretpassword" will be displayed in the console :)

Conclusion

I hope that you enjoyed this post as much as I enjoyed to write the code. If you find any errors or you know a better algorithm to implement the Mealy automata just leave a comment or drop me an email ;)

References

[1] Mealy machine
[2] Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection
[3] XORSearch & XORStrings
[4] Finite State Machine Designer