Earlier this year I began researching Intel’s Clear Containers, a project that has since been merged into Kata Containers. The project focuses on deploying lightweight virtual machines and running containers inside them. To achieve this, it uses QEMU as the virtual machine manager, so I ended up researching QEMU itself.

Exploring QEMU

I had a big challenge ahead, as I had no experience with QEMU and was not familiar with its codebase at all. QEMU does not have a high-level design document. Only the source code tells the full story, so I began to scour the internet in order to understand QEMU, its architecture, previous vulnerabilities, and their exploitation methodologies.

This preliminary research led to the understanding that it would be feasible to look for vulnerabilities in the parts of the project that are responsible for hardware emulation. Unfortunately for me, the virtual machines created by Kata Containers do not emulate any devices of interest.

I decided to proceed with exploring the QEMU project anyway, even though a bug found there would not affect Kata Containers.

QEMU can emulate a vast range of devices, and I had to pick a specific area to avoid getting lost in a big codebase. Since I was most interested in a remote attack avenue, I decided to put my effort into QEMU’s network card emulation.

Note: The attack surface becomes remote only if the emulated card is also exposed to the internet. If that is not the case, then the attack is feasible from the hub, for example from a neighboring VM.

I began by looking for an interesting code area, one that might lead to memory corruption. These areas can sometimes be detected quickly by looking for functions that copy or manipulate raw memory. Some of these functions are considered dangerous, so searching for them might shed light on code that could be vulnerable. A quick and dirty way to find them is the grep utility. For example, we can run this command inside the qemu/hw/net folder:

grep -n -E 'memcpy|memmove|memset|memcmp' *.c

This returns every instance of the specified functions, together with the file name and line number where each one appears, like this:

…(truncated for brevity)
pcnet.c:606: int result = (!CSR_DRCVPA(s)) && !memcmp(hdr->ether_dhost, padr, 6);
pcnet.c:622: int result = !CSR_DRCVBC(s) && !memcmp(hdr->ether_dhost, BCAST, 6);
pcnet.c:1003: memcpy(buf1, buf, size);
pcnet.c:1004: memset(buf1 + size, 0, MIN_BUF_SIZE - size);
pcnet.c:1058: memcpy(src, buf, size);
pcnet.c:1727: memcpy(s->prom, s->conf.macaddr.a, 6);
rtl8139.c:852: if (!memcmp(buf,  broadcast_macaddr, 6)) {
rtl8139.c:940: memcpy(buf1, buf, size);
rtl8139.c:941: memset(buf1 + size, 0, MIN_BUF_SIZE + VLAN_HLEN - size);
rtl8139.c:1226: memcpy(s->phys, s->conf.macaddr.a, 6);
rtl8139.c:1776: memcpy(vlan_iov, iov, sizeof(vlan_iov));
rtl8139.c:2160: memcpy(saved_ip_header, eth_payload_data, hlen);
rtl8139.c:2205: memcpy(data_to_checksum, saved_ip_header + 12, 8);
rtl8139.c:2213: memcpy((uint8_t*)p_tcp_hdr + tcp_hlen, (uint8_t*)p_tcp_hdr + tcp_hlen + tcp_send_offset, chunk_size);
rtl8139.c:2237: memcpy(eth_payload_data, saved_ip_header, hlen);
rtl8139.c:2271: memcpy(saved_ip_header, eth_payload_data, hlen);
rtl8139.c:2278: memcpy(data_to_checksum, saved_ip_header + 12, 8);
rtl8139.c:2322: memcpy(eth_payload_data, saved_ip_header, hlen);

…(truncated for brevity)

Now that we know such functions are in fact present in the code, we need to check whether they are used safely. To do so, we can dive into the code and trace back where the parameters come from and what values they might hold.

Let’s take the memcpy function for example:

void *memcpy(void *dest, const void *src, size_t n);

The function simply copies n bytes from src to dest, and it will not fail if it is given the wrong size to copy. In other words, once a buffer has been allocated, there are no built-in safeguards to ensure that the data written to it fits into the allocated space. If a programmer puts ten bytes of data into a buffer that has only been allocated eight bytes, the copy is allowed to proceed, even though it will most likely cause the program to crash. This is known as a buffer overrun or buffer overflow, since the extra two bytes spill out of the allocated memory and overwrite whatever happens to come next. If a critical piece of data is overwritten, the program will crash or behave unpredictably.
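To make the missing safeguard concrete, here is a minimal sketch of a copy that checks the destination capacity before calling memcpy. The helper name bounded_copy and its return convention are my own assumptions for illustration; nothing like it is claimed to exist in QEMU.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of QEMU or libc): copy n bytes into dest
 * only if they fit in dest_cap bytes.
 * Returns 0 on success, -1 if the copy was refused. */
int bounded_copy(void *dest, size_t dest_cap, const void *src, size_t n)
{
    if (dest == NULL || src == NULL || n > dest_cap) {
        return -1;              /* refuse instead of overflowing dest */
    }
    memcpy(dest, src, n);       /* safe: n is known to fit in dest */
    return 0;
}
```

With an eight-byte destination, asking this helper to copy ten bytes is rejected and the destination is left untouched, while a plain memcpy would have written two bytes past the end of the buffer.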

Let’s analyze the memcpy call at pcnet.c:1003.

The call happens on line 1003, with buf1 as the destination, buf as the source, and size as the number of bytes to copy.

We can see that size has to meet a condition for this call to happen: it has to be smaller than MIN_BUF_SIZE, which is 60 bytes.

Going upwards we can spot the following buggy assignment on line 991:

int size = size_;

We will get back to it in a moment. For now, let’s go a little further up. We can see that the size of buf1 is 60 bytes, and we finally arrive at the function prototype:

ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_)

We can see that the size_ parameter accepted by the function is of type size_t. size_t is an unsigned integer type, meaning it can only represent values >= 0. For simplicity, think of it as an unsigned int, whose maximum value is UINT_MAX (4294967295); on 64-bit platforms size_t is typically even wider.

In C, an assignment between integer types performs an implicit conversion. If size_ is larger than the maximum value that int can hold, then when the line int size = size_; is executed the value does not fit, and on typical compilers size wraps around and becomes a negative value.

For example, if size_ (treated here as an unsigned int) is UINT_MAX, then size (which is a signed int) wraps around and becomes -1, and if size_ is 2147483648 (INT_MAX+1), then size becomes -2147483648. On the other hand, if we had the opposite assignment:

unsigned int size_ = size;

And let’s say that size equals -1, then when the assignment is executed, size_ becomes UINT_MAX.
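Both directions of this conversion can be sketched in a few lines. The helper names to_signed and to_unsigned are made up for this example. Note that converting an out-of-range unsigned value to int is implementation-defined in C; the wrap-around described here is what GCC and Clang do on common two's-complement targets, and that behavior is assumed below.

```c
#include <limits.h>

/* Hypothetical helper: unsigned to signed. For values above INT_MAX the
 * result is implementation-defined; GCC and Clang reduce the value modulo
 * UINT_MAX + 1, which yields a negative number. */
int to_signed(unsigned int u)
{
    return (int)u;
}

/* Hypothetical helper: signed to unsigned. This direction is fully defined
 * by the C standard: the value is reduced modulo UINT_MAX + 1, so -1
 * becomes UINT_MAX. */
unsigned int to_unsigned(int i)
{
    return (unsigned int)i;
}
```

Under that assumption, to_signed(UINT_MAX) yields -1, to_signed((unsigned int)INT_MAX + 1u) yields INT_MIN, and to_unsigned(-1) yields UINT_MAX, matching the values discussed in the text.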

With that in mind, we can clearly see that once size becomes negative, it easily satisfies the condition if (size < MIN_BUF_SIZE) and we arrive at our memcpy call.

memcpy takes its size argument as a size_t (unsigned), so when it receives a negative number the value is converted to a huge positive number, which results in memory corruption.
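The whole chain can be modeled in isolation. The sketch below is a simplified stand-in for the receive path, not the actual QEMU code: padded_copy_length only reports the length that memcpy would be handed (it never performs the dangerous copy), and safe_pad shows one possible hardening that keeps the length unsigned. Both function names are hypothetical, and the hardening is my assumption, not necessarily the patch that was applied upstream.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIN_BUF_SIZE 60

/* Model of the buggy pattern: returns the length memcpy would receive,
 * or 0 when the padding branch is not taken. */
size_t padded_copy_length(size_t size_)
{
    /* Same effect as the implicit "int size = size_;". Large values turn
     * negative (implementation-defined; wraps on GCC and Clang). */
    int size = (int)size_;

    if (size < MIN_BUF_SIZE) {
        /* memcpy(buf1, buf, size) converts size back to size_t here. */
        return (size_t)size;
    }
    return 0;
}

/* One possible hardening (an assumption): keep the length as size_t so a
 * huge value can never pass the "smaller than MIN_BUF_SIZE" check.
 * Returns 1 if buf was copied and zero-padded into buf1, 0 otherwise. */
int safe_pad(uint8_t buf1[MIN_BUF_SIZE], const uint8_t *buf, size_t size)
{
    if (size >= MIN_BUF_SIZE) {
        return 0;                               /* no padding needed */
    }
    memcpy(buf1, buf, size);                    /* size < 60, fits in buf1 */
    memset(buf1 + size, 0, MIN_BUF_SIZE - size);
    return 1;
}
```

Calling padded_copy_length with 4294967295 passes the size check and produces SIZE_MAX as the copy length for a 60-byte buffer, while safe_pad rejects the same value before touching memory.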

4 new CVEs

After finding this bug I checked the other network cards in the same area, as I know that programmers tend to copy similar code around, and I found the exact same bug in 3 other network cards. Overall I ended up with 4 different vulnerabilities. I shared the bugs with the QEMU team (employed by Red Hat), who patched the code and assigned CVEs.

The resulting CVEs: CVE-2018-10839, CVE-2018-17958, CVE-2018-17962, and CVE-2018-17963.

After reporting the bugs I began to investigate whether it is feasible to send such a big packet. Preliminary research showed that it might be possible by patching the NIC driver, which requires root access to the VM, but other responsibilities quickly caught up and left no time for active exploit development for those bugs. I may revisit this in the near future to produce a proof of concept.

Responsible disclosure timeline

Here is a summary of our disclosure timeline:

April 29: Twistlock Labs reported the bugs to Red Hat
May 23: Red Hat assigned CVE-2018-10839
May 31: Patches drafted by Red Hat
September 25: Patches pushed
October 8: Assigned CVE-2018-17958, CVE-2018-17962, CVE-2018-17963
October 19: Patches merged upstream