Written by aaSSfxxx -
Finally, after almost two years of inactivity, I motivated myself to write a new article on this blog! In this article, I'll show how to use radare2, and especially its Python bindings, to write a little tool able to decode a large part of the Cerber ransomware's encrypted strings. So, let's begin with the fun stuff! :D
You'll obviously need radare2 installed, along with its Python bindings. Also keep in mind that I'm using a mixture of radare2's internal API and radare2 commands (to work around missing features and bugs in the bindings, mostly related to the lack of support for unsigned char* parameters from Python), so this internal API can break at any time. At the time of writing, I didn't find any Python 3 bindings, so I'm using Python 2 for this article.
After analyzing the binary and locating the string obfuscation function, we find that Cerber uses this function for almost every string in the binary, which makes the analysis tedious. To circumvent this, I wanted to write a little tool that finds every place where the function is called, walks backwards through the code to find the arguments given to the function, and decodes the strings itself. Obviously, finding cross-references and decoding instructions by hand is a daunting task, so we'll use the radare2 engine to do it "easily".
Loading radare2 inside our Python script is as easy as importing the RCore class from r2.r_core and creating a new instance of it. After this, we just need to open the file and ask the RBin plugin to map the sections inside radare2. This can be achieved simply by doing:
```python
from r2.r_core import RCore

core = RCore()
core.file_open("your_malware_here.exe", 0, 0)
# set base address manually, I didn't find how to do this automatically yet
core.bin_load(None, 0x400000)
```
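As a possible workaround for the hardcoded base address, here is a minimal sketch that reads the preferred ImageBase straight from the PE headers using only Python's struct module. It assumes a 32-bit PE32 image (optional header magic 0x10b), which is what a 32-bit sample like this one would normally be, and it is written for Python 3 rather than against the r2 bindings. make_fake_pe is a hypothetical helper that only builds a synthetic header for demonstration.

```python
import struct


def pe32_image_base(data):
    """Return the preferred ImageBase of a PE32 (32-bit) image.

    Sketch only: PE32+ (64-bit) images use a different layout and are rejected.
    """
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset of the "PE\0\0" signature) lives at 0x3C in the DOS header
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and the 20-byte COFF header
    opt = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic != 0x10B:
        raise ValueError("not a PE32 image (magic 0x%x)" % magic)
    # In PE32, ImageBase is a 32-bit field at offset 28 of the optional header
    (image_base,) = struct.unpack_from("<I", data, opt + 28)
    return image_base


def make_fake_pe(image_base=0x400000, e_lfanew=0x80):
    """Hypothetical helper: build just enough of a PE32 header for the parser."""
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, e_lfanew)
    buf[e_lfanew:e_lfanew + 4] = b"PE\x00\x00"
    opt = e_lfanew + 24
    struct.pack_into("<H", buf, opt, 0x10B)
    struct.pack_into("<I", buf, opt + 28, image_base)
    return bytes(buf)


if __name__ == "__main__":
    print(hex(pe32_image_base(make_fake_pe())))
```

On a real sample, the bytes read from the file would replace make_fake_pe(), and the returned value could presumably be passed as the second argument of core.bin_load() in place of the hardcoded 0x400000.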
Then, we want to perform analysis to identify called functions and, moreover, a cross-reference analysis to spot all calls to our decryption function. To achieve this, we'll just send the radare2 commands "af" and "aar" (which stand for "analyze functions" and "analyze references") with core.cmd0().
Finally, we get all the xrefs to our function through core.anal.xrefs_get(address), so we can iterate over them and get the address of the calling code. To sum up, the code to get the cross-references looks like this:
```python
from r2.r_core import RCore

PROGRAM = "sample.exe"
ADDRESS = 0xdeadbeef  # address of the decryption function

def process_xref(core, addr):
    # do something awesome :D
    pass

core = RCore()
core.file_open(PROGRAM, 0, 0)
core.bin_load(None, 0x400000)  # set base address manually
core.cmd0("af")
core.cmd0("aar")
for xref in core.anal.xrefs_get(ADDRESS):
    process_xref(core, xref.addr)
```
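To get a feel for what the cross-reference analysis hands back, here is a deliberately naive stand-in in plain Python 3: it scans a byte buffer for E8 rel32 call instructions whose destination is a given address. This only illustrates the idea and is not how radare2 implements "aar"; a linear byte scan also reports false positives wherever an 0xE8 byte happens to sit inside another instruction or data. The name find_call_xrefs is hypothetical.

```python
import struct


def find_call_xrefs(code, base, target):
    """Return addresses of E8 rel32 calls in `code` (mapped at `base`)
    whose destination is `target`.

    Naive sketch: every 0xE8 byte is treated as a potential call opcode.
    """
    xrefs = []
    # Stop early enough that the 4-byte displacement fits in the buffer
    for off in range(len(code) - 4):
        if code[off] != 0xE8:
            continue
        (rel,) = struct.unpack_from("<i", code, off + 1)  # signed displacement
        # rel32 is relative to the next instruction (a call rel32 is 5 bytes long)
        dest = base + off + 5 + rel
        if dest == target:
            xrefs.append(base + off)
    return xrefs
```

For a buffer mapped at 0x401000 containing two calls to 0x401010 (at offsets 0 and 6), it returns both call addresses, 0x401000 and 0x401006, which is the role xref.addr plays in the loop above.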
Once we have located the places where the function is called, we want to recover the arguments pushed for each call, so we can find the string to be decrypted, its length and the decryption key, and feed them to PyCrypto's RC4 decryption routine (yeah, string obfuscation in Cerber is just plain lame RC4). To achieve this, we need to parse the code backwards, and unluckily for us, the bindings don't expose any method to do this easily. So I chose to implement the dumbest method: go one byte backwards, try to decode an instruction there, and loop until we encounter a "push" opcode.
So here is the method performing this, which returns an RAnalOp object (saving us the burden of parsing assembly code):
```python
# Returns an RAnalOp describing the instruction at addr
def disassemble_instr(inst, addr):
    return inst.op_anal(addr)

# Hack because radare2 doesn't expose an API to find the previous instruction in the bindings
def get_prev_instr(inst, addr):
    return addr - 1

def find_prev_push(inst, addr):
    cur = get_prev_instr(inst, addr)
    found = False
    instr = None
    while not found:
        instr = disassemble_instr(inst, cur)
        found = (instr.type == 13)  # 13 == push for the radare2 analyzer
        if not found:
            cur = get_prev_instr(inst, cur)
    return (cur, instr)
```
We can see in the code above that a "push" instruction has type 13 for the radare2 analyzer. We loop until we find a push instruction, then return the instruction decoded by the analyzer along with its address, which lets us keep walking the code backwards.
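The same backward scan can be reproduced without radare2 to see both why it works and why it sometimes misfires. The sketch below is a hypothetical toy in Python 3: decode_push_imm only recognises push imm32 (68 xx xx xx xx) and sign-extended push imm8 (6A xx), whereas the real code relies on radare2's analyzer and its type 13. toy_find_prev_push steps back one byte at a time like find_prev_push, and toy_call_args collects the pushed values; since x86 cdecl/stdcall arguments are pushed right to left, the push nearest the call is the first argument.

```python
def decode_push_imm(code, off):
    """Toy decoder: recognise only push imm32 (68 xx xx xx xx) and
    push imm8 (6A xx, sign-extended to 32 bits).

    Returns the pushed value, or None if no such push starts at `off`.
    """
    if code[off] == 0x68 and off + 5 <= len(code):
        return int.from_bytes(code[off + 1:off + 5], "little")
    if code[off] == 0x6A and off + 2 <= len(code):
        value = code[off + 1]
        if value >= 0x80:  # sign-extend the 8-bit immediate
            value = (value - 0x100) & 0xFFFFFFFF
        return value
    return None


def toy_find_prev_push(code, off):
    """Step back one byte at a time from `off` until a push decodes.

    Returns (offset, value), or None once the start of the buffer is reached.
    """
    cur = off - 1
    while cur >= 0:
        value = decode_push_imm(code, cur)
        if value is not None:
            return cur, value
        cur -= 1
    return None


def toy_call_args(code, call_off, count=3):
    """Collect up to `count` pushed values before the call at `call_off`.

    Arguments are pushed right to left, so the push nearest the call
    is the first argument.
    """
    args = []
    cur = call_off
    for _ in range(count):
        found = toy_find_prev_push(code, cur)
        if found is None:
            break
        cur, value = found
        args.append(value)
    return args
```

For push 0x12345678; push 0x10; push 0x401000; call, toy_call_args returns [0x401000, 0x10, 0x12345678]. Feeding it push 0x10; mov eax, 0x6A; call instead shows the weakness: the scan lands on the 0x6A byte inside the mov immediate and reports a push of 0 rather than 0x10.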
So, to get our call parameters, we need to call our new find_prev_push function three times; it returns three "push" instructions, and we'll be able to read their values without hassle. We can now decode our strings with the code below:
```python
from r2 import r_core
from Crypto.Cipher.ARC4 import ARC4Cipher
from struct import pack

# Returns an RAnalOp describing the instruction at addr
def disassemble_instr(inst, addr):
    return inst.op_anal(addr)

# Hack because radare2 doesn't expose an API to find the previous instruction in the bindings
def get_prev_instr(inst, addr):
    return addr - 1

def find_prev_push(inst, addr):
    cur = get_prev_instr(inst, addr)
    found = False
    instr = None
    while not found:
        instr = disassemble_instr(inst, cur)
        found = (instr.type == 13)  # 13 == push for the radare2 analyzer
        if not found:
            cur = get_prev_instr(inst, cur)
    return (cur, instr)

def process_xref(inst, addr):
    # The push closest to the call is the first argument (string address),
    # followed by the string length, then the RC4 key
    old, push = find_prev_push(inst, addr)
    str_addr = push.val
    old, push = find_prev_push(inst, old)
    str_len = push.val
    old, push = find_prev_push(inst, old)
    key = push.val
    # Sanity checks: pushed registers or junk decoding give unusable values
    if str_addr >= 0xffffffff or str_len >= 0xffffffff or key >= 0xffffffff:
        print("Manual decoding required at 0x%x" % addr)
        return
    if str_len > 1000:
        print("Strange strlen found %d at 0x%x" % (str_len, addr))
        return
    # Issue a command to print our string in hex format, since r2 doesn't
    # allow this programmatically
    ciph = inst.cmd_str("p8 %d @0x%x" % (str_len, str_addr)).rstrip()
    ciph = ciph.decode("hex")
    deciph = ARC4Cipher(pack("<I", key)).decrypt(ciph)
    print("%08x\t| %s" % (addr, deciph))

inst = r_core.RCore()
inst.file_open("macbook_tutorial.unpacked.exe", 0, 0)
f = inst.bin_load(None, 0x400000)
# Launch function analysis
inst.cmd0("af")
inst.cmd0("aar")
# Get our xrefs
anal = inst.anal
xrefs = anal.xrefs_get(0x408545)  # decryption func
print(len(xrefs))
for xref in xrefs:
    process_xref(inst, xref.addr)
```
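The script relies on PyCrypto's ARC4Cipher for the actual decryption. If PyCrypto isn't available, or just to see what that step does, here is a self-contained RC4 sketch in Python 3 (key scheduling followed by keystream generation). decrypt_cerber_string is a hypothetical helper that mirrors process_xref by packing the 32-bit key into 4 little-endian bytes with struct.pack("<I", ...).

```python
import struct


def rc4(key, data):
    """Plain RC4: key-scheduling algorithm (KSA), then keystream generation (PRGA).

    Encryption and decryption are the same operation.
    """
    # KSA: permute the identity table using the key
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # PRGA: XOR each data byte with the next keystream byte
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)


def decrypt_cerber_string(ciphertext, key_dword):
    """Hypothetical helper: use the pushed 32-bit key as a 4-byte little-endian RC4 key."""
    return rc4(struct.pack("<I", key_dword), ciphertext)


if __name__ == "__main__":
    # Classic RC4 test vector: key "Key", plaintext "Plaintext"
    print(rc4(b"Key", b"Plaintext").hex())
```

In Python 3, the hex dump returned by the p8 command would be converted with bytes.fromhex() (after stripping the trailing newline) instead of .decode("hex"), which only exists in Python 2.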
As you may have noticed in process_xref, I added some checks because of weird behaviour that showed up on the real sample: sometimes the interesting values are stored in registers and the register is pushed, so the value we fetch is inconsistent and decoding fails. Also, because of my dumb approach to finding the previous instruction, some junk bytes get decoded as a push, which also leads to invalid decoding. Learning how to use radare2's opcode emulation engine (ESIL?) should solve most of these issues.
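To illustrate why emulation helps with the pushed-register case, here is a toy forward emulator in Python 3. It is not ESIL and not a radare2 API: it linearly executes a tiny x86 subset (mov r32, imm32; push imm32; push imm8; push r32) while keeping a simulated stack and register file, so push eax resolves to whatever was last moved into eax. It assumes you already know where the argument setup starts (for example, the start of the basic block containing the call) and refuses any other opcode; emulate_pushes and call_args_from_stack are hypothetical names.

```python
import struct

# x86 32-bit register numbering used by the B8+r and 50+r opcodes
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def emulate_pushes(code):
    """Linearly execute a tiny x86 subset and return the simulated stack.

    Supported: mov r32, imm32 (B8+r), push imm32 (68), push imm8 (6A)
    and push r32 (50+r). A register with no known value pushes None.
    Any other opcode raises ValueError.
    """
    regs = {}
    stack = []
    off = 0
    while off < len(code):
        op = code[off]
        if 0xB8 <= op <= 0xBF:
            regs[REGS[op - 0xB8]] = struct.unpack_from("<I", code, off + 1)[0]
            off += 5
        elif op == 0x68:
            stack.append(struct.unpack_from("<I", code, off + 1)[0])
            off += 5
        elif op == 0x6A:
            # signed 8-bit immediate, sign-extended to 32 bits
            stack.append(struct.unpack_from("<b", code, off + 1)[0] & 0xFFFFFFFF)
            off += 2
        elif 0x50 <= op <= 0x57:
            stack.append(regs.get(REGS[op - 0x50]))
            off += 1
        else:
            raise ValueError("unsupported opcode 0x%02x at offset %d" % (op, off))
    return stack


def call_args_from_stack(stack, count=3):
    """The last value pushed is the first argument, so read the stack top-down."""
    return list(reversed(stack))[:count]
```

For mov eax, 0x12345678; push eax; push 0x10; push 0x401000, the arguments come out as 0x401000, 0x10 and 0x12345678, including the key that only ever appeared in a register. A real implementation would need far more instructions, which is exactly what an emulation engine like ESIL provides.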
Despite these limitations, this shows that we can programmatically use radare2's power to do interesting things (such as binary unpacking, shellcode recognition, or maybe obfuscation cleanup!) and automate boring tasks.