In my last post I wrote about discovering CVE-2017-9669 and CVE-2017-9671. In this post I want to explore how these bugs can be exploited and demonstrate how I was able to achieve remote code execution.

The first step is to reproduce everything that led to the crash in a deployment environment. This sanity check must be done before trying anything with a sophisticated payload. My crash file from the fuzzer was read directly from a file handle in apk_tar_parse, but in a real scenario the file would be downloaded from a server as a tar.gz.

To imitate a man-in-the-middle scenario, I used Docker’s --add-host feature. This flag adds a custom host-to-IP mapping to the container’s hosts file. I also started an nginx container to serve my files from a local directory (mounted with -v).

I gzipped the crashing tar from the fuzzer’s crashes directory and put it inside the served directory:

cat crash | gzip -9 > ~/docker/files/alpine/v3.6/main/x86_64/APKINDEX.tar.gz

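I ran the image, using --add-host to map the repository host to the nginx server (the angle-bracketed parts are placeholders for those values):

docker run -ti --add-host <repository-host>:<nginx-ip> alpine:3.6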

I executed “apk update” and finally received a segmentation fault. This meant the crash was reproducible with an official image (so a DoS attack was achievable), and it was time to debug the execution and figure out how to exploit the crash.

I wrote a small Dockerfile that installs the dependencies and compiles the apk-tools package from source. I added -g and -O0 to CFLAGS to include full debug symbols and make debugging simpler. I built an image from this Dockerfile and ran it with --privileged to allow gdb to disable ASLR.

I ran “gdb --args apk update” and hit a segfault as expected. The program crashed inside free(), meaning this was likely a heap overflow, and after a short debugging session I traced the bug to the first call to blob_realloc. For the rest of this post I recommend keeping the source file (archive.c), or just the code snippets from the previous post, at hand.

I crafted a minimal file that would trigger the bug with a tar block whose size field is negative: 0o77777777777 in octal, which wraps to -1 as a signed 32-bit integer.

I gzipped the file and ran apk with the file on the server. This time the crash occurred when writing the null-terminating zero at the end of the buffer. This was reasonable: the buffer was never allocated, so it pointed to null. Meanwhile entry.size was 0xffffffffffffffff(1), which is an unmapped address when added to a null pointer, so any attempt to write there should result in a segfault.

This was not a problem at all. I crafted another file, this time with two tar blocks: one to properly allocate the buffer with a size I control, followed by another to exploit the vulnerability with the allocated buffer.

I debugged the execution (breaking on the call to blob_realloc) and, as expected, the buffer was first allocated and then used as the copy target while parsing the second block. At this point it is worth noting one behavior that made exploitation easier: the call to gzip_read (through is->read) copies chunks from the source stream to the target buffer and stops once the source stream runs out. So it would only copy as much data as I put in the file!

Understanding and exploiting the heap

So now I had a buffer overflow in which I control both the size and the content. There are many ways to maneuver such a scenario into code execution. In fact, heap exploitation has been widely discussed within the InfoSec community, and many great writeups and proofs of concept have been published(2), most of them focusing on glibc.

Alpine, however, uses musl libc rather than glibc. There are doubtless ways to exploit it, whether similar or new. Yet I decided not to dig into musl’s source: my goal was merely achieving remote execution for the sake of a proof of concept, and I figured I should first try something much simpler.

If I could find anything useful on the heap, I could overwrite it to manipulate the execution flow. For instance, I could overwrite a flag that controls signature checks and then install a package of my choice, or change the data of a package that is about to be installed, and so on. Or even better, maybe there were structs on the heap holding function pointers, and I could overwrite a pointer that was about to be called?

I focused on the latter. It was the simplest scenario and the fastest path to code execution, by calling something like execv or system. But since the overflow would corrupt the heap, I would need to either reconstruct it or simply have the callback invoked before the memory manager touches the overwritten memory (presumably when free() is called).

I proceeded to examine the code in an attempt to find heap variables located after my buffer in memory. After debugging for a while, I figured I could actually overwrite the is struct, which holds the pointer to the function that copies the stream data into the buffer. Its type is apk_istream:

struct apk_istream {
    void (*get_meta)(void *stream, struct apk_file_meta *meta);
    ssize_t (*read)(void *stream, void *ptr, size_t size);
    void (*close)(void *stream);
};

In theory, I would overwrite the read pointer with the address of my target function and later find what to overwrite to control the call’s arguments.

I put a breakpoint on the call to read to calculate the delta between my buffer and the is struct:

I padded my crafted tar file with 0x153a0 bytes accordingly, followed by 16 zero bytes to overwrite the get_meta and read function pointers with zeros (each function pointer is 8 bytes). And there we go: the program crashed on 0x0000000000000000.

It was time to try calling my target function. For simplicity I chose system, which takes a string as a parameter and executes it as a shell command. Its address was 0x7ffff7db0956. As for the arguments, a basic observation showed that the call to is->read always passes is itself as the first argument. That means I could overwrite the first 8 bytes of is with my shell string, as long as the get_meta function is never called.

The end of my tar file now looked like this:

The first 8 bytes are my shell string, followed by the 8-byte address of system. And the execution went just as expected:

Disregarding the fact that the payload length is limited to 8 bytes here, this solution seemed to work. But there is a less obvious issue with it. Since the is->read call writes in chunks, it may write only part of the pointer (say, only 4 bytes) and then call that half-written pointer to read more data. I ran into this problem during my first attempt at building this proof of concept.

So I went on to find another struct I could overwrite after is. I restored the original pointers of the is struct in the file and appended 8 bytes of data (a run of 0xAA bytes) to see whether I would get another crash. This was the result:

It seems I was able to overwrite the gis->bs pointer with my data. You may examine gunzip.c to understand exactly what this bs pointer is (see struct apk_gzip_istream). bs is of type apk_bstream:

struct apk_bstream {
    unsigned int flags;
    void (*get_meta)(void *stream, struct apk_file_meta *meta);
    apk_blob_t (*read)(void *stream, apk_blob_t token);
    void (*close)(void *stream, size_t *size);
};

It is used in the same manner as is (it also passes itself as the first argument), but it gives us 8 more bytes (flags plus its alignment padding) to use for the shell string. It’s time to run code by pointing bs at a fake struct somewhere on the heap.

For convenience, I pointed it 32 bytes before is, at 0x5555559682a0 (is-0x20). As described above, its first 16 bytes would hold my shell string. After that I put the address of system (overwriting the read pointer). For this demo I didn’t really need to care about the close pointer, so I filled it with zero bytes.

The payload then looks like this:

Red: gis->bs->flags and gis->bs->get_meta, overwritten with the shell string.

Yellow: gis->bs->read, overwritten with the address of system.

The rest: gis->bs->close and then the previous is->get_meta, both irrelevant, followed by the original addresses of the is functions.

And it worked!


By overwriting function pointers of structs on the heap, I was able to execute code on the target machine. I simply ran echo in the example here, but the payload can be anything; in one demonstration, for instance, I opened a remote shell with netcat.

It is worth noting that for the sake of this exploit I assumed the attacker has knowledge of the memory layout of the executed program(3). So as a prerequisite to reproducing my exploit, ASLR must be disabled for the execution (unless you also found a memory leak in apk…). gdb does this by default, so there is no particular reason for you to manually disable ASLR when running with it.

Finally, besides writing fixes for the vulnerabilities (commit 1, commit 2), Timo Teräs also committed a change that reduces the use of function pointers stored on the heap, making it harder to exploit apk in the way explained in this article.

  1. One of the problems in the code was that entry.size was implicitly converted (and sign-extended) to 64 bits: the get_octal function returned an int, but entry.size is of type off_t.
  2. How2heap does a great job aggregating known exploitation methods.
  3. That means that for different environments/kernels the offsets used in the exploit should be updated.

What’s Next?