
Breaking Cerber strings obfuscation with Python and radare2

Written by aaSSfxxx - 16 April 2016

After almost two years of inactivity, I finally motivated myself to write a new article on this blog! In this article, I'll show how to use radare2, and especially its Python bindings, to write a little tool that can decode a large part of the Cerber ransomware's encrypted strings. So, let's get to the fun stuff! :D


You'll obviously need radare2 installed, along with its Python bindings. Also keep in mind that I'm using a mixture of radare2's internal API and radare2 commands (to work around missing features and bugs in the bindings, mostly the lack of support for unsigned char* parameters from Python), so this internal API can break at any time. At the time of writing, I couldn't find any Python 3 bindings, so I'm using Python 2 for this article.

After analyzing the binary and locating the string obfuscation function, we find that Cerber uses this function for almost every string in the binary, which makes analysis tedious. To get around this, I wanted to write a little tool that finds every place where the function is called, walks back through the calling code to find the arguments pushed on the stack for the function, and decodes the strings itself. Obviously, finding cross-references and decoding instructions by hand is a daunting task, so we'll let the radare2 engine do this "easily".

Loading the binary and finding cross-references

Loading radare2 inside our Python script is as easy as importing the RCore class from r2.r_core and creating a new instance of it. After this, we just need to open the file and ask the RBin plugin to map its sections inside radare2. This can be achieved simply with:

  from r2.r_core import RCore

  core = RCore()
  core.file_open("your_malware_here.exe", 0, 0)
  # set base address manually, I didn't find how to do this automatically yet
  core.bin_load(None, 0x400000)
Then, we want to run an analysis to identify functions and, above all, a cross-reference analysis to spot every call to our decryption function. To achieve this, we just send the radare2 commands "af" and "aar" (which stand for "analyze functions" and "analyze references") with:
  core.cmd0("af")
  core.cmd0("aar")
Finally, we get all the xrefs to our function through core.anal.xrefs_get(address), so we can iterate over them and get the address of the calling code. To sum up, to get the cross-references, we can write code like this:
  from r2.r_core import RCore

  PROGRAM = "sample.exe"
  ADDRESS = 0xdeadbeef  # address of the decryption function

  def process_xref(core, addr):
      # do something awesome :D
      pass

  core = RCore()
  core.file_open(PROGRAM, 0, 0)
  core.bin_load(None, 0x400000)  # set base address manually
  core.cmd0("af")
  core.cmd0("aar")

  # xrefs_get lives on the RAnal object, not on RCore
  for xref in core.anal.xrefs_get(ADDRESS):
      process_xref(core, xref.addr)
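For intuition about what radare2 hands back here: in the simplest case, a code cross-reference to the decryption function is a direct x86 call rel32 whose destination is the function's address. The sketch below finds such calls by brute-force scanning a raw code buffer in pure Python 3. It is only an illustration, assuming 32-bit x86 and direct calls only: find_call_xrefs is a hypothetical helper, not part of the r2 bindings, and unlike radare2's analysis it doesn't follow code flow, so it misses indirect calls and can match data bytes that merely look like a call.

```python
import struct


def find_call_xrefs(code, base, target):
    """Hypothetical helper: naively scan a raw x86 code buffer (mapped at
    `base`) for `call rel32` (opcode 0xE8) instructions that land on `target`."""
    xrefs = []
    for off in range(len(code) - 4):
        if code[off] != 0xE8:
            continue
        rel = struct.unpack_from("<i", code, off + 1)[0]
        # The displacement is relative to the next instruction (call is 5 bytes).
        dest = (base + off + 5 + rel) & 0xFFFFFFFF
        if dest == target:
            xrefs.append(base + off)
    return xrefs


if __name__ == "__main__":
    base, target = 0x401000, 0x408545
    call_at = base + 0x10
    # 16 NOPs, then `call target`, then a few more NOPs
    code = (b"\x90" * 0x10 + b"\xe8"
            + struct.pack("<i", target - (call_at + 5)) + b"\x90" * 4)
    print([hex(a) for a in find_call_xrefs(code, base, target)])  # ['0x401010']
```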

More craziness: walk through the calling code

Once we've located a place where the function is called, we want to recover the arguments passed to it: the string to be decrypted, its length and the decryption key. We can then call PyCrypto's RC4 decryption routine (yes, string obfuscation in Cerber is just plain, lame RC4). To achieve this, we need to parse the code backwards, and unluckily for us, the bindings don't expose any method to do this easily. So I chose to implement the dumbest method: go one byte backwards, try to decode an instruction there, and loop until we encounter a "push" opcode.

So here is the code performing this, returning an RAnalOp object (which saves us the burden of parsing assembly code):

  # Returns an RAnalOp describing the instruction at addr
  def disassemble_instr(inst, addr):
      return inst.op_anal(addr)


  # Hack because radare2 doesn't expose an API to find the previous instruction in the bindings
  def get_prev_instr(inst, addr):
      return addr - 1


  def find_prev_push(inst, addr):
      cur = get_prev_instr(inst, addr)
      found = False
      instr = None
      while not found:
          instr = disassemble_instr(inst, cur)
          found = (instr.type == 13)  # 13 is radare2's "push" op type
          if not found:
              cur = get_prev_instr(inst, cur)
      return (cur, instr)
As the code above shows, a "push" is an instruction of type 13 for the radare2 analyzer (R_ANAL_OP_TYPE_PUSH in radare2's C headers). We loop until we find a push instruction, then return the instruction decoded by the analyzer along with its address, which lets us keep walking the code backwards.
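To see exactly what this byte-by-byte backward walk does, and why it can misfire, here is a radare2-free sketch in Python 3 that recognizes only three x86 push encodings by hand: push imm32 (0x68), push imm8 (0x6A, sign-extended) and push r32 (0x50 to 0x57). find_prev_push_raw and get_call_args are hypothetical helpers written for illustration; they assume 32-bit x86 code where arguments are pushed right to left, so the push closest to the call is the first argument.

```python
import struct


def find_prev_push_raw(code, base, addr, max_back=64):
    """Step back one byte at a time from addr and return (address, value) of
    the first bytes that decode as a push ending at or before addr.
    value is None for `push r32`, whose operand is not an immediate."""
    off = addr - base
    for start in range(off - 1, max(off - 1 - max_back, -1), -1):
        op = code[start]
        if op == 0x68 and start + 5 <= off:  # push imm32
            return base + start, struct.unpack_from("<I", code, start + 1)[0]
        if op == 0x6A and start + 2 <= off:  # push imm8, sign-extended to 32 bits
            imm = struct.unpack_from("<b", code, start + 1)[0]
            return base + start, imm & 0xFFFFFFFF
        if 0x50 <= op <= 0x57:               # push r32: value unknown here
            return base + start, None
    return None


def get_call_args(code, base, call_addr, count=3):
    """Collect up to `count` pushed values, closest push (first argument) first."""
    args = []
    cur = call_addr
    for _ in range(count):
        found = find_prev_push_raw(code, base, cur)
        if found is None:
            break
        cur, value = found
        args.append(value)
    return args


if __name__ == "__main__":
    # push 0x11223344 ; push 0x10 ; push 0x401000 ; call ...
    code = bytes.fromhex("6844332211" "6a10" "6800104000" "e800000000")
    base = 0x401000
    print([hex(v) for v in get_call_args(code, base, base + 12)])
    # ['0x401000', '0x10', '0x11223344']
```

Because the walk never knows where instructions really start, it can also "find" a push inside another instruction's operand: on the bytes b8 6a 10 00 00 (mov eax, 0x106a), scanning back from the end reports a push imm8 of 0x10 at offset 1, which is exactly the kind of junk match this naive approach is prone to.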

So, to get our call parameters, we call our new find_prev_push function three times; it returns the three "push" instructions, and we can fetch their operand values without hassle. We can now decode our strings with the code below:

  from r2 import r_core
  from Crypto.Cipher.ARC4 import ARC4Cipher
  from struct import pack


  # Returns an RAnalOp describing the instruction at addr
  def disassemble_instr(inst, addr):
      return inst.op_anal(addr)


  # Hack because radare2 doesn't expose an API to find the previous instruction in the bindings
  def get_prev_instr(inst, addr):
      return addr - 1


  def find_prev_push(inst, addr):
      cur = get_prev_instr(inst, addr)
      found = False
      instr = None
      while not found:
          instr = disassemble_instr(inst, cur)
          found = (instr.type == 13)  # 13 is radare2's "push" op type
          if not found:
              cur = get_prev_instr(inst, cur)
      return (cur, instr)


  def process_xref(inst, addr):
      # Assuming a cdecl/stdcall-style call, arguments are pushed right to left,
      # so the push closest to the call is the first argument
      old, push = find_prev_push(inst, addr)
      str_addr = push.val
      old, push = find_prev_push(inst, old)
      str_len = push.val
      old, push = find_prev_push(inst, old)
      key = push.val

      # Guard against pushes whose value isn't a plain 32-bit immediate (e.g. push reg)
      if str_addr >= 0xffffffff or str_len >= 0xffffffff or key >= 0xffffffff:
          print("Manual decoding required at 0x%x" % addr)
          return
      if str_len > 1000:
          print("Strange strlen found %d at 0x%x" % (str_len, addr))
          return
      # Print the ciphertext as hex with an r2 command, since the bindings
      # can't hand raw bytes (unsigned char*) back to Python
      ciph = inst.cmd_str("p8 %d @0x%x" % (str_len, str_addr)).rstrip()
      ciph = ciph.decode("hex")  # Python 2 only
      deciph = ARC4Cipher(pack("<I", key)).decrypt(ciph)
      print("%08x\t| %s" % (addr, deciph))

  inst = r_core.RCore()
  inst.file_open("macbook_tutorial.unpacked.exe", 0, 0)
  inst.bin_load(None, 0x400000)

  # Launch function analysis
  inst.cmd0("af")
  inst.cmd0("aar")

  # Get the xrefs to the decryption function
  anal = inst.anal
  xrefs = anal.xrefs_get(0x408545)  # decryption func
  print(len(xrefs))
  for xref in xrefs:
      process_xref(inst, xref.addr)
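PyCrypto, which provides ARC4Cipher above, is no longer maintained, and str.decode("hex") only exists in Python 2. As a sketch of the same decryption step in Python 3 without external dependencies, the block below implements textbook RC4 (key scheduling, then keystream XOR) and applies it the way process_xref does: parse the hex dump printed by p8, then decrypt it with the 32-bit pushed key packed little-endian. rc4 and decrypt_cerber_string are hypothetical helper names, not part of any library; the sanity check uses the widely published RC4 test vector (key "Key", plaintext "Plaintext").

```python
import struct


def rc4(key, data):
    """Textbook RC4: key-scheduling algorithm, then keystream generation XORed
    with the data. RC4 is symmetric, so this both encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def decrypt_cerber_string(p8_output, key):
    """Decode the hex dump printed by r2's `p8` command and RC4-decrypt it with
    the pushed 32-bit key, packed little-endian as in process_xref."""
    ciphertext = bytes.fromhex(p8_output.strip())
    return rc4(struct.pack("<I", key), ciphertext)


if __name__ == "__main__":
    print(rc4(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
```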

Limitations / Conclusion

As you may have noticed in the code above, I added some checks because of weird behaviour observed on the real sample: sometimes the interesting values are stored in registers and the register is pushed, which gives inconsistent values and makes decoding fail. Also, because of my dumb approach to finding the previous instruction, some junk bytes get decoded as a push, which also leads to invalid decoding. Learning how to use radare2's emulation engine (ESIL?) should solve most of the issues faced here.
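Short of ESIL emulation, the pushed-register case can sometimes be patched up with a simple pattern match. The sketch below is a hypothetical, radare2-free Python 3 helper (resolve_push_reg) that, given the address of a push r32 (0x50 + register number), scans backwards for the nearest mov r32, imm32 (0xB8 + the same register number) and returns its immediate. It assumes 32-bit x86 and is deliberately naive: it doesn't check that the register is left untouched between the mov and the push, and it inherits the same junk-byte risk as the backward walk used in the article.

```python
import struct


def resolve_push_reg(code, base, push_addr, max_back=32):
    """Hypothetical workaround for `push r32` arguments: find the nearest earlier
    `mov r32, imm32` on the same register and return its immediate, else None.
    Does NOT verify that the register is unchanged between the mov and the push."""
    off = push_addr - base
    op = code[off]
    if not 0x50 <= op <= 0x57:
        return None  # not a push r32
    reg = op - 0x50
    # mov r32, imm32 is 5 bytes long, so it must start at least 5 bytes earlier.
    for start in range(off - 5, max(off - 5 - max_back, -1), -1):
        if code[start] == 0xB8 + reg:
            return struct.unpack_from("<I", code, start + 1)[0]
    return None


if __name__ == "__main__":
    # mov ecx, 0x10 ; push ecx
    code = bytes.fromhex("b910000000" "51")
    print(hex(resolve_push_reg(code, 0x401000, 0x401005)))  # 0x10
```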

Still, this shows that we can drive radare2 programmatically to do interesting things (binary unpacking, shellcode recognition, whatever, or maybe obfuscation cleanup!) and automate boring tasks.

Classified in: Hacking & Programming, Malwares - Tags: radare2, cerber, ransomware
