Big News for Big Data in Kubernetes

06/11/2018 - 14:00 to 14:40
Palais Atelier
long talk (40 min)

Session abstract: 

The folk wisdom has always been that when running stateful applications inside containers that the only viable choice was to externalize the state so that the containers themselves are stateless or nearly so. Keeping large amounts of state inside containers is possible, but it is considered a problem because stateful containers generally can’t preserve that state across restarts.

In practice, this complicates the management of large-scale Kubernetes-based infrastructure because these high-performance storage systems require separate management. In terms of overall system management, it would be ideal if we could run a software-defined storage system directly in containers managed by Kubernetes, but that has been hampered by lack of direct device access and difficult questions about what happens to the state on container restarts.

I will describe recent developments that make it possible for Kubernetes to manage both compute and storage tiers in the same cluster. Container restarts can be handled gracefully without loss of data or a requirement to rebuild storage structures and access to storage from compute containers is extremely fast. In some environments it is even possible to implement elastic storage frameworks that can fold data onto just a few containers during quiescent periods or explode it in just a few seconds across a large number of machines when higher speed access is required.

The benefits of systems like this extends beyond management simplicity because applications can be more agile precisely because the storage layer is more stable and can be uniformly accessed from any container host. Even better, it makes it a snap to configure and deploy a full-scale compute and storage infrastructure.