You are working on a big Kubernetes cluster with numerous nodes, running many pods of the same application. All pods are running healthily, going about their business, when you suddenly notice you made a mistake in the Dockerfile of the pod’s image. You set the wrong permissions on some critical files in the image.

Well, it’s not hard to fix. You correct the Dockerfile, rebuild the image, tag the new version, push it to the registry, and delete all the bad pods by their shared label. Since a ReplicaSet is configured for these pods, Kubernetes promptly starts new pods with the corrected image.

But what if for some rare reason you can’t stop the pods?

This was the case we recently had to deal with for a challenge we are hosting at Twistlock Labs. Without going into the challenge details (which we may do in another post), the challenge required running a persistent dedicated machine for every participant. We accomplished this by deploying Kubernetes pods through a ReplicaSet.

Despite our beta testing efforts, we had a file permissions mistake in the original image we deployed, which we detected when there were already more than a hundred participants working on solving the challenge in their pods. Deleting their pods was something we had to avoid at all costs.

We were looking for a quick way to deploy a patch to all running pods. While I was doing my best to find the “right” Kubernetes way to do it, our chief architect Liron suggested a simple solution: executing a command with our changes directly in all the underlying pod containers. We could find all these containers by their label with docker ps, and then exec the patch in them.

It turned out Kubernetes labels do not map one to one to the underlying Docker container labels, but we managed to get all the right containers by matching the label that Kubernetes sets to the image name.

The command we finally ran on each node looked like this:

for i in $(docker ps -a --filter "" -q); do docker exec $i bash -c "chown root:root /important/file && chmod 0500 /important/file"; done

While executing directly in the containers was the easiest solution in our case, this could also have been achieved by getting all pod names with kubectl get pods and running kubectl exec on the right container in each pod.
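For reference, the kubectl route could look roughly like the sketch below. It is not what we ran; the function name hotfix_pods, the label selector and the container name are hypothetical and would need to match your own deployment. It assumes the containers ship a sh binary and that kubectl is already pointed at the right cluster and namespace.

```shell
# Sketch: run a patch command in one container of every running pod
# that matches a label selector. All argument values are placeholders.
hotfix_pods() {
  selector=$1    # label selector, e.g. "app=challenge" (hypothetical)
  container=$2   # name of the container to patch inside each pod
  patch=$3       # shell command to run inside the container
  failed=0
  # "-o name" prints one "pod/<name>" per line, a form kubectl exec accepts.
  for pod in $(kubectl get pods -l "$selector" --field-selector=status.phase=Running -o name); do
    if ! kubectl exec "$pod" -c "$container" -- sh -c "$patch"; then
      echo "patch failed on $pod" >&2
      failed=1
    fi
  done
  # Non-zero if any pod could not be patched.
  return "$failed"
}

# Example call (placeholder selector and container name):
# hotfix_pods "app=challenge" "main" "chown root:root /important/file && chmod 0500 /important/file"
```

Unlike the docker loop, this runs from any machine with cluster access instead of on each node, and it relies only on labels you set yourself rather than on labels Kubernetes attaches to containers internally.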

Anyway, we should stress that this hack is probably bad practice and should only be used in rare edge cases like the one we encountered, where pods must not be deleted. The right way is to have Kubernetes restart the pods. The internal label we used may change too, as it is not a documented API.

Do you know of less hacky ways to hotfix running Kubernetes pods? Write to us at @TwistlockLabs or @TwistlockTeam. And if you are experienced in reverse engineering, you might as well try to solve our challenge. Some of the winners are working on writeups, so we might not keep it up for much longer.