Making an ELF packer for fun and chocapicz

Written by aaSSfxxx -

I recently decided to write an ELF packer in order to learn some cool stuff about the Linux kernel and the ELF format, so I'll write two or three articles on this blog to explain what I discovered. For this article I use NASM and an x86 Linux kernel (yeah guys, I'm still on an x86 Arch Linux).


In this article (and the ones that follow), I decided to make a "real" packer like the ones we see on Win32, i.e. a packer that replaces its own memory image with the packed binary's image.

First of all, the packer has to unmap its own segments from memory (to make room for the "real" binary). But if we do this naively, we'll be executing code that no longer exists, which is not a good idea. So we have to map memory elsewhere, copy our code into that space and continue execution from there. The code in the new mapping then unmaps the old binary, optionally decompresses or decrypts the real binary, maps it into memory, and jumps to its entry point.

In fact, it's more difficult than this if the binary is dynamically linked against other libraries, because we have to load the dynamic linker manually and do a lot of extra work to get it running (unlike Win32, which loads ntdll.dll by default and does all the init stuff in LdrInitializeThunk). So here I'll only talk about packing "standalone", statically linked binaries.

Some thoughts about the ELF header

As you probably know, Linux binaries start with an ELF header, which describes where the entry point is, how the binary is mapped in memory, what the sections are, and so on. Two components are interesting for us here: the ELF header itself, which gives us the real entry point, and the program header table, which describes how the binary has to be mapped into memory. Here are these structures:

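#include <sys/types.h>   /* u_int16_t, u_int32_t */

#define ELF_NIDENT 16    /* size of e_ident (EI_NIDENT in <elf.h>) */

/* ELF header */
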
typedef struct {
   unsigned char   e_ident[ELF_NIDENT];   /* magic number et al. */
   u_int16_t       e_type;                /* type of file this is */
   u_int16_t       e_machine;             /* processor type file is for */
   u_int32_t       e_version;             /* ELF version */
   u_int32_t       e_entry;           /* address of program entry point */
   u_int32_t       e_phoff;           /* location in file of phdrs */
   u_int32_t       e_shoff;           /* ignore */
   u_int32_t       e_flags;           /* ignore */
   u_int16_t       e_ehsize;          /* actual size of file header */
   u_int16_t       e_phentsize;       /* actual size of phdr */
   u_int16_t       e_phnum;           /* number of phdrs */
   u_int16_t       e_shentsize;       /* ignore */
   u_int16_t       e_shnum;           /* ignore */
   u_int16_t       e_shstrndx;        /* ignore */
} Elf32_Ehdr;

/* Program header */

typedef struct {
   u_int32_t       p_type;      /* Type of segment */
   u_int32_t       p_offset;    /* Location of data within file */
   u_int32_t       p_vaddr;     /* Virtual address */
   u_int32_t       p_paddr;     /* Ignore */
   u_int32_t       p_filesz;    /* Size of data within file */
   u_int32_t       p_memsz;     /* Size of data to be loaded into memory*/
   u_int32_t       p_flags;     /* Flags */
   u_int32_t       p_align;     /* Required alignment - can ignore */
} Elf32_Phdr;

Here we just need the e_entry, e_phoff and e_phnum fields of the Elf32_Ehdr structure to get the entry point and the offset and number of Elf32_Phdr entries. For each Elf32_Phdr, we need to check whether p_type is equal to PT_LOAD (we don't give a fuck about the other segment types, they contain nothing useful for us). If it is, we need the p_offset, p_vaddr, p_filesz and p_memsz fields to know how to map the segment.
And, unlike Win32 and the PE header, nothing forces p_offset and p_vaddr to be page-aligned in ELF, so we'll need to get our hands dirty and align everything ourselves (the program expects the byte at file offset p_offset to be at address p_vaddr in memory).
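
To make the header walk concrete, here is a minimal C sketch of it. The names Ehdr32, Phdr32 and collect_load_segments are mine (hypothetical, not part of the packer), and it assumes the ELF image is already in memory and uses the host's byte order, which is the case for an x86 binary on an x86 box.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_LOAD 1u  /* value from the ELF specification */

/* Same layout as the 32-bit structures above (52 and 32 bytes). */
typedef struct {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} Ehdr32;

typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} Phdr32;

/* Hypothetical helper: copy up to `max` PT_LOAD entries of an in-memory
 * ELF image into `out` and return how many were stored. */
static size_t collect_load_segments(const unsigned char *image, size_t image_len,
                                    Phdr32 *out, size_t max)
{
    Ehdr32 eh;
    size_t found = 0;

    assert(image != NULL && out != NULL);
    if (image_len < sizeof eh)
        return 0;
    memcpy(&eh, image, sizeof eh);          /* memcpy avoids unaligned reads */

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        size_t off = (size_t)eh.e_phoff + (size_t)i * eh.e_phentsize;
        Phdr32 ph;

        /* Stop if the table runs past the end of the image. */
        if (off > image_len || image_len - off < sizeof ph)
            break;
        memcpy(&ph, image + off, sizeof ph);
        if (ph.p_type == PT_LOAD && found < max)
            out[found++] = ph;
    }
    return found;
}
```

Each entry returned this way carries the four fields listed above (p_offset, p_vaddr, p_filesz, p_memsz), which is all the mapping code needs.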

By the way, if you are interested, I wrote a NASM header that makes working with ELF easier; you can download it here.

Let's code, baby!

The first "problem" to solve, as I said in the first part, is copying the code into a freshly mapped region. So we'll use some NASM magic and write code like this:


   ;; mapping stuff (eax contains mapping addr)

   mov ecx, (packer_end - packer_start)
   mov esi, packer_start
   mov edi, eax
   rep movsb

   jmp eax

   ;; some code and data

We'll also need to know where all our data ends up in the mapping. This is done by storing the mapping base (contained in eax) in a local variable and adding each item's offset relative to packer_start to it.
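
The same idea in C, as a rough sketch: copy a blob into an anonymous mapping, then find any item in the copy as "new base + (item - old start)". The helper names relocate_blob and relocated are hypothetical, and the mapping here is only readable and writable because the sketch copies data, not code.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS with strict -std= flags on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy the blob [start, start+len) into a fresh
 * anonymous mapping and return the mapping base, or NULL on failure. */
static unsigned char *relocate_blob(const unsigned char *start, size_t len)
{
    void *base;

    assert(start != NULL && len > 0);
    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    memcpy(base, start, len);               /* same job as rep movsb */
    return base;
}

/* Address of `item` inside the copy: new base + offset from the old start. */
static unsigned char *relocated(unsigned char *new_base,
                                const unsigned char *old_start,
                                const unsigned char *item)
{
    return new_base + (item - old_start);
}
```

In the packer, old_start plays the role of packer_start and new_base is the value mmap left in eax.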

Now, let's go deeper into the packer code. First of all, it saves the offset. Then it grabs its own program header table offset and number of entries, and multiplies the number of entries by the size of a program header to get the number of bytes to allocate. We need to copy the program header table elsewhere because the original one will be unmapped along with the rest of the binary, and we'd get a segfault if we tried to read unmapped memory.

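    push ebp
    mov ebp, esp
    sub esp, 14h
    mov [ebp-offset], eax ; save calculated offset
    xor eax, eax
    mov [ebp-dynamic], eax ; no PT_INTERP seen yet

    ;; Gets offset and header

    mov edx, 0x08048000 ; load address of our own ELF header (default for non-PIE x86)
    mov ebx, [edx+elf32_hdr.e_phoff]
    add ebx, edx
    mov esi, ebx ; esi = start of our program header table
    ; Calculates the number of program headers
    movzx eax, word [edx+elf32_hdr.e_phnum]
    movzx ecx, word [edx+elf32_hdr.e_phentsize]
    push eax ; save number of program headers
    mov edx, ecx ; edx = size of one program header

    imul eax, ecx ; "mul cx" would overwrite dx, which holds the entry size
    push eax ; save number of bytes to copy
    add eax, 1000h
    and eax, 0fffff000h ; page-align the size
    mov [ebp-tempsize], eax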
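
For reference, here is the size computation as a small C sketch (hypothetical function names, 4 KiB pages assumed). Note that the listing adds a full 1000h before masking, so a size that is already page-aligned gets one extra page; the usual round-up adds 0fffh instead. Both give a large enough buffer.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000u   /* assumption: 4 KiB pages, as on 32-bit x86 */

/* Bytes occupied by the program header table. */
static uint32_t phdr_table_bytes(uint16_t e_phnum, uint16_t e_phentsize)
{
    return (uint32_t)e_phnum * e_phentsize;   /* at most 0xffff * 0xffff, fits */
}

/* Usual round-up: already aligned sizes are left unchanged. */
static uint32_t page_round_up(uint32_t n)
{
    assert(n <= UINT32_MAX - (PAGE_SIZE_4K - 1));
    return (n + PAGE_SIZE_4K - 1) & ~(PAGE_SIZE_4K - 1);
}

/* What the listing computes: add a full page, then mask. */
static uint32_t page_round_like_listing(uint32_t n)
{
    return (n + PAGE_SIZE_4K) & ~(PAGE_SIZE_4K - 1);
}
```

With three 32-byte program headers the table is 96 bytes, and both variants ask mmap for a single page.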

Then it maps a new memory region and copies the program header table into the newly allocated memory.

    ;; Maps a region to hold a copy of our own program header table

    push 0  ; offset
    push -1 ; fd
    push MAP_PRIVATE | MAP_ANONYMOUS ; flags
    push PROT_READ | PROT_WRITE ; protections
    push eax ; calculated size
    push 0   ; no address
    mov ebx, esp
    mov eax, SYS_MMAP
    int 80h ; syscall
    add esp, 24
    ; copy program headers
    mov [ebp-tempmap], eax
    mov edi, eax
    pop ecx
    rep movsb

    ; Point to first program header
    pop ecx
    mov ebx, eax

It then unmaps all the segments of the packer binary, and unmaps the temporary mapping it just created once the job is done.

    ;; unmap old ELF segments

    .loop:
        cmp dword[ebx+elf32_phdr.p_type], PT_LOAD
        jnz .next
        push ebx ; push program header offset
        call unmap_stuff
    .next:
        add ebx, edx ; edx = size of one program header
        dec ecx
    jnz .loop

    ;; Cleanup old mapping
    mov ebx, [ebp-tempmap]
    mov ecx, [ebp-tempsize]
    mov eax, SYS_MUNMAP
    int 0x80

Then the packer grabs the original ELF binary (I didn't compress it for the PoC), reads its program header table, and maps each segment whose type is PT_LOAD. I also check whether a PT_INTERP entry exists; if it does, the packer aborts (we don't map the dynamic linker in this article, and without it a dynamically linked binary can't run).

    mov edx, (packedbin-do_work)
    add edx, [ebp-offset] ; get elf in memory
    mov ebx, [edx+elf32_hdr.e_phoff]
    add ebx, edx
    movzx ecx, word[edx+elf32_hdr.e_phnum]

    .loop2:
        ; check if it's a loading information segment
        cmp dword[ebx+elf32_phdr.p_type], PT_LOAD
        jnz .no_load
            push dword [ebx+elf32_phdr.p_flags] ; protections
            push dword [ebx+elf32_phdr.p_memsz] ; virtual size
            push dword [ebx+elf32_phdr.p_vaddr] ; virtual address
            push dword [ebx+elf32_phdr.p_filesz] ; file size
            mov eax, dword [ebx+elf32_phdr.p_offset] ; offset
            add eax, edx
            push eax
            call fake_map
    .no_load:
        ; check if the binary asks for an interpreter (dynamic linker)
        cmp dword[ebx+elf32_phdr.p_type], PT_INTERP
        jnz .no_dynamic
            mov eax, [ebx+elf32_phdr.p_vaddr]
            mov [ebp-dynamic], eax
    .no_dynamic:
        ; switch to next program header entry (32 = size of an Elf32_Phdr)
        add ebx, 32
        dec ecx
    jnz .loop2

    mov eax, [ebp-dynamic]
    test eax, eax
    jnz .load_interp
        ; program doesn't have PT_INTERP, jmp to its entry point
        mov eax, [edx+elf32_hdr.e_entry]
        jmp .gtfo
    .load_interp:
        ; we don't load interpreter for the moment, simply GTFO and abort.
        jmp exit
    .gtfo:
    jmp eax
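
The scan done by this loop can be sketched in C as follows. ScanResult and scan_phdrs are hypothetical names of mine; the sketch only counts the PT_LOAD entries instead of mapping them, and records the p_vaddr of PT_INTERP the same way the listing stores it in its "dynamic" local.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD   1u   /* values from the ELF specification */
#define PT_INTERP 3u

typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} Phdr32;

typedef struct {
    size_t   load_count;    /* PT_LOAD entries that would go to fake_map */
    uint32_t interp_vaddr;  /* p_vaddr of PT_INTERP, 0 if there is none */
} ScanResult;

/* Hypothetical helper mirroring the loop above. */
static ScanResult scan_phdrs(const Phdr32 *ph, size_t n)
{
    ScanResult r = {0, 0};

    assert(ph != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type == PT_LOAD)
            r.load_count++;
        else if (ph[i].p_type == PT_INTERP)
            r.interp_vaddr = ph[i].p_vaddr;
    }
    return r;
}
```

A caller would jump to e_entry when interp_vaddr is 0 and bail out otherwise, which is exactly the choice made at the end of the listing.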

Now, let's have a look at the "fake_map" function.

Normally, the Linux kernel opens the ELF binary and uses that file to map it at different offsets with different sizes (and different permissions, of course), as described in the program header table. But here we don't have any file descriptor, so we have to emulate the mapping by hand, which is what the "fake_map" function does. To simplify the code and avoid the mprotect syscall, I set all the permissions to "rwx" (which is really UGLY, I know). Here is the code, hopefully with enough comments:

%define offset 8
%define size 0ch
%define base 10h
%define map_size 14h
%define elf_flags 18h

fake_map:
    push ebp
    mov ebp, esp
    push ebx
    push esi
    push edi
    push ecx

    ; do the mmap
    push 0  ; offset
    push -1 ; fd
    push MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED ; flags (assumed: MAP_FIXED so the mapping lands exactly at base)
    push PROT_READ | PROT_WRITE | PROT_EXEC ; permissions (elf_flags is ignored, everything is rwx)

    ;; Align mapping size

    mov eax, dword[ebp+map_size] ; mapping size
    ; add padding to ELF
    mov ebx, dword[ebp+base]
    and ebx, 0xfff
    add eax, ebx
    ; align size to a page
    add eax, 1000h
    and eax, 0fffff000h
    ;push new size
    push eax

    ;; Align base

    mov eax, dword[ebp+base]
    and eax, 0fffff000h
    push eax  ; push base

    mov ebx, esp ; ebx = pointer to the six mmap arguments
    mov eax, SYS_MMAP
    int 80h ; syscall
    add esp, 24

    ;; Copy the in-memory file into the mapping

    mov edi, eax
    ; align the offset: start copying (base & 0fffh) bytes before p_offset
    mov esi, [ebp+offset]
    mov eax, [ebp+base]
    and eax, 0fffh
    sub esi, eax
    mov ecx, [ebp+size]
    add ecx, eax
    rep movsb

    pop ecx
    pop edi
    pop esi
    pop ebx
    pop ebp ; restore the caller's frame pointer before returning
    ret 14h
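
If the alignment arithmetic is hard to follow in assembly, here it is as a C sketch. MapPlan and plan_segment are hypothetical names, 4 KiB pages are assumed, and the length uses the same "add 1000h then mask" rounding as the listing.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MASK_4K 0xfffu   /* assumption: 4 KiB pages */

typedef struct {
    uint32_t map_base;   /* page-aligned address handed to mmap */
    uint32_t map_len;    /* length handed to mmap */
    uint32_t pad;        /* bytes between map_base and p_vaddr */
    uint32_t copy_len;   /* bytes copied, starting `pad` bytes before p_offset */
} MapPlan;

/* Hypothetical helper: what fake_map computes for one PT_LOAD segment. */
static MapPlan plan_segment(uint32_t p_vaddr, uint32_t p_filesz, uint32_t p_memsz)
{
    MapPlan m;

    assert(p_filesz <= p_memsz);
    m.pad      = p_vaddr & PAGE_MASK_4K;
    m.map_base = p_vaddr & ~PAGE_MASK_4K;
    m.map_len  = (p_memsz + m.pad + 0x1000u) & ~PAGE_MASK_4K;
    m.copy_len = p_filesz + m.pad;
    return m;
}
```

For example, a segment at p_vaddr 0x08049f14 with p_filesz 0x100 and p_memsz 0x120 gives a base of 0x08049000, a padding of 0xf14, a two-page mapping (0x2000) and 0x1014 bytes to copy, so the byte at p_offset ends up exactly at p_vaddr.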

If you want to test it yourself, you can download the NASM code.