We are happy to announce a new series of blog posts that sheds more light on how Twistlock uses Machine Learning (ML) to provide deeper protection to its customers. ML is a broad topic with many concepts to cover, so each post in this series will focus on a different example of how we use ML in Twistlock.
At Twistlock, we standardize the use of whitelisting and container immutability as security armor for your apps. That is, we actively scan your images, containers, hosts, and cluster configurations and build a highly granular security profile that fits the runtime behavior of each of your apps like a glove. The profile contains a significant amount of information about the expected runtime behavior, including which processes should run, which files your app accesses and modifies, which system calls it uses, and much more. The whitelisted profile is built using an extensive set of generalized business rules, which we continuously update using our intelligence feed, which in turn is constantly updated based on the work of our security researchers. In practice, this security model substantially reduces the attack surface and provides a strong defense against common attacks, including advanced persistent threats (APTs).
Nevertheless, there is a natural limit to the granularity that can be achieved with customized business rules. However tight the whitelisted profile may be, we want to be able to detect attacks even when they stay inside the whitelist boundaries. To illustrate the problem and our solution, let's discuss a real-life scenario. Assume we are running an nginx container. By analyzing the container, its image, its metadata, and so on, we can learn several whitelist features, e.g., which processes are expected to run (nginx), which system calls are used (e.g., socket and read), and how the app interacts with the filesystem. In practice, every deviation from this whitelist model automatically triggers a simple audit event (in future blog posts, we will show how to use more advanced techniques to score, correlate, and combine multiple events). Even if the container is compromised, the whitelisted profile makes it very hard for an attacker to further manipulate the container and move laterally within the cluster. Even then, however, we want to detect and react to the threat as fast as possible, including when the threat uses artifacts from within the whitelist (e.g., it executes an allowed process). To solve this problem, we apply the ML technique described below.
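To make the whitelist idea concrete, here is a minimal sketch in Python. It is not Twistlock's implementation: the `Profile` class, the `audit` function, and the writable paths are hypothetical names and values chosen for illustration, and only the nginx process and the socket and read system calls come from the scenario above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical whitelist profile for one container image."""
    processes: frozenset
    syscalls: frozenset
    writable_paths: tuple  # path prefixes the app is expected to modify


def audit(profile, kind, value):
    """Return an audit message if an observation deviates from the profile, else None."""
    if kind == "process":
        allowed = value in profile.processes
    elif kind == "syscall":
        allowed = value in profile.syscalls
    elif kind == "file_write":
        # str.startswith accepts a tuple of prefixes
        allowed = value.startswith(profile.writable_paths)
    else:
        allowed = False  # unknown observation kinds are treated as deviations
    return None if allowed else f"audit: unexpected {kind} {value!r}"


# Assumed profile for the nginx example; the paths are illustrative only.
nginx_profile = Profile(
    processes=frozenset({"nginx"}),
    syscalls=frozenset({"socket", "read"}),
    writable_paths=("/var/log/nginx/", "/var/cache/nginx/"),
)

print(audit(nginx_profile, "process", "bash"))   # deviation: audit event
print(audit(nginx_profile, "process", "nginx"))  # inside the whitelist: None
```

Note that the second call passes the check no matter why nginx was started, which is exactly the gap the ML technique is meant to close.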
A naive approach to detecting data breaches or attacks might apply standard supervised learning, using app behavior statistics as features and historical or synthetic data for labels, especially for the attack examples. Using historical data is problematic because it is, by definition, built from known malware, and usually only a very small portion of it. As a result, the learned model's ability to detect the behavior of a zero-day exploit is very limited. This shows up in particular as an unacceptably high false negative rate, that is, attacks that go undetected.
Our approach is to build a statistical model that describes the state of the system (the app) before and after an event. Once built, the model lets us decide whether the new state that an event moved the system into is valid. To improve the accuracy and robustness of the learning process, we use cluster and container metadata to learn from events on multiple nodes simultaneously. Because each container encapsulates a single application, we can easily filter out environmental noise from other apps when collecting new events.
Given the data, we learn whether the transition from state Si to state Sj, given event e (e.g., a new process), is valid. To build the classifier from positive data only, we use a method called a one-class support vector machine. Given a set of valid (allowed) states, this technique lets us classify whether a new event moves the system into a valid state or not. Let's use a simple visualization to illustrate the process.
Assume that we map each state in the system to a unique number in the range [-6, 6]. We use the x-axis to plot the current state and the y-axis to plot the next state. Given this training data, we create a one-class classifier that defines the region of valid state transitions. Now, given a new event, we can determine whether it moves the system into a valid state.
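The setup above can be sketched with scikit-learn's `OneClassSVM`. This is an illustration under stated assumptions rather than Twistlock's actual model: the training points are synthetic, the two clusters and the `nu` and `gamma` values are arbitrary choices, and `is_valid_transition` is a hypothetical helper name.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic "valid" transitions: each row is (current_state, next_state),
# with states mapped to numbers in [-6, 6]. The two clusters stand in for
# two groups of transitions observed during normal operation.
cluster_a = rng.normal(loc=(-2.0, 1.0), scale=0.5, size=(100, 2))
cluster_b = rng.normal(loc=(3.0, 3.0), scale=0.5, size=(100, 2))
valid_transitions = np.clip(np.vstack([cluster_a, cluster_b]), -6.0, 6.0)

# Train on positive (valid) data only. nu bounds the fraction of training
# points treated as outliers; gamma sets the RBF kernel width.
# Both values are illustrative, not tuned.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5)
model.fit(valid_transitions)


def is_valid_transition(current_state, next_state):
    """True if the classifier places the transition inside the learned region."""
    return bool(model.predict([[current_state, next_state]])[0] == 1)


# Fraction of the training transitions the model itself accepts.
train_inlier_fraction = float(np.mean(model.predict(valid_transitions) == 1))

# A transition far away from anything seen during training.
far_transition_valid = is_valid_transition(5.5, -5.5)

print(train_inlier_fraction, far_transition_valid)
```

The classifier never sees an attack during training. It only learns the region occupied by valid transitions, so a transition far outside that region is rejected even if every artifact it involves is on the whitelist.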
Thank you for reading this far. As mentioned above, this is one example of the approaches we use to make Twistlock smarter and easier to use. If you find this content interesting, please consider following Twistlock on Twitter via @twistlockteam.