Is K8s Too Complicated?
Joe Beda had a great Twitter thread this morning about the complexity of Kubernetes. You can read it in full, and I suggest you do, but I wanted to quote a few things I found to be particularly thought-provoking:
First off: Kubernetes is a complex system. It does a lot and brings new abstractions. Those abstractions aren't always justified for all problems. I'm sure that there are plenty of people using Kubernetes that could get by with something simpler.
The story of computing is creating abstractions. Things that feel awkward at first become the new norm. Higher layers aren't simpler but rather better suited to different tasks.
And, the pick of the bunch:
That being said, I think that, as engineers, we tend to discount the complexity we build ourselves vs. complexity we need to learn.
These are presented somewhat out of order, because I wanted to put particular emphasis on the last one, which I think is a significant source of cognitive bias in engineering at large.
In the real world, these biases are amplified because "things we build ourselves" is actually "things we already know", and this includes "things we already learned". I suspect that a significant source of programmers flooding into management is the decreasing wavelength of full scale re-education on how to accomplish the exact same thing slightly differently[1] with diminishing returns on tangible benefits.
I've been "learning" Kubernetes for a significant migration we are attempting at $WORK. I've found it tough going. It's a large thing, and it is very difficult to know how uninformed my opinion is from my limited exposure. I was originally going to write purely about simplicity and cognitive bias, but then Kris Nova tweeted:
In a single tweet — can you name a technical benefit you and your team have gained by switching to Kubernetes?
I really love this tweet, because the tone works whether you read it as a question from a point of skepticism or as a search for confirmation. The answers, as it happens, will serve you either way. For me, they highlight a particular problem that I've found over and over again with Kubernetes, and since it's been over a year since I've managed to finish a blog post, I might as well just write about it:
The k8s elevator pitch is straight up dogshit.
I've started to see, via my own usage, just enough of what Kubernetes brings to the table tech-wise to start understanding the value, but it's hard to put into words. You can tell it's hard because you can't find it on Twitter.
Want #Kubernetes Clusters on #AzureStack? This will help you run microservices and container-based workloads using the IaaS capabilities of Azure Stack.
Yesterday the CERN team were playing "public cloud bingo" at #KubeCon, to see how many public clouds they could attach to their Kubernetes federation.
Everyone wants to know about @IstioMesh at #KubeCon. @DanCiruli explains the gaps it is plugging in #Kubernetes ecosystems
Right. So what does it do? What can I do with it that I can't do already? Maybe let's look at the new feature list. Oh nice, looks like we're getting log rotation, I've only had that since 1994.
The marketing buries the tech lede. I'm not currently working on my resumé, so I don't actually give a shit how cloud native I'm not. In my limited experience, you get:
- inventory management and visibility
- common platform for building out tooling related to this (APIs for *)
- decent abstractions for provisioning and deployment
- service discovery and config management
Some of this is covered in their bullet points, but they are pretty abstract, and all high up on the pyramid of needs. You can't build a large system without most of them, but large systems exist, so clearly Kubernetes is not the only way to do any of this. Unfortunately, if you already have a pretty large deployment you probably need multiple clusters.
However, when you think about them a bit more, they are actually a really neat thing to find in a box somewhere. As someone who has written the better part of two custom databases, being able to hire someone who already knows your tools is fantastic. Having well written public documentation is really nice.
It also seems structured in a way that we will be able to really leverage open-sourced work in the burgeoning "infrastructure glue" space. This is a very underrated benefit. The promise of some kind of souk or foodtruck full of Chef recipes[2] that we can all collaborate on and share has proven to be an empty one. Helm and Istio are both harbingers of Kubernetes' complexity and a testament to it working as a platform in ways the previous generation of half-baked[3] devops tools did not.
Of course, they don't come for free, and even at the surface level there are questions that are difficult to answer or have problematic answers:
- How can I even what is all this yaml?
- I'd like to run another process oh I cant unless I sidecar it and intimately describe its every relation with the parent's environment via yaml hokay
- I want to run a database and I don't want you to reschedule me on a computer without any of my data
- How can this all go wrong and what happens when it does?
That first problem is a joke but it also kinda isn't. The second one is a nontrivial annoyance for almost every observability system out there, and making observability harder is a pretty bad first impression for a cloud native management system.
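To make the sidecar complaint concrete, here is a minimal sketch of what "intimately describing every relation" tends to mean in practice. It builds a Pod manifest as a plain Python dict and prints it as JSON (which Kubernetes accepts in place of YAML). The image names, the `LOG_DIR` variable, and the `log-shipper` container are all made up for illustration; the point is only that the shared volume and the shared environment each have to be spelled out once per container.

```python
import json


def pod_with_log_sidecar(app_image, sidecar_image, log_dir="/var/log/app"):
    """Build a Pod manifest in which a log-shipping sidecar reads the app's logs.

    All names here (container names, LOG_DIR, the volume name) are hypothetical.
    """
    # The mount has to be declared in each container that wants the volume.
    shared_logs = {"name": "app-logs", "mountPath": log_dir}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "app-with-sidecar"},
        "spec": {
            # Scratch volume that lives exactly as long as the Pod does.
            "volumes": [{"name": "app-logs", "emptyDir": {}}],
            "containers": [
                {
                    "name": "app",
                    "image": app_image,
                    "env": [{"name": "LOG_DIR", "value": log_dir}],
                    "volumeMounts": [shared_logs],
                },
                {
                    "name": "log-shipper",
                    "image": sidecar_image,
                    # The sidecar inherits nothing: the same setting is repeated.
                    "env": [{"name": "LOG_DIR", "value": log_dir}],
                    "volumeMounts": [dict(shared_logs, readOnly=True)],
                },
            ],
        },
    }


if __name__ == "__main__":
    manifest = pod_with_log_sidecar("example/app:1.0", "example/log-shipper:1.0")
    print(json.dumps(manifest, indent=2))
```

Nothing in it is hard, but every piece of context the second process would have picked up for free on an ordinary host is now a field somebody has to write down and keep in sync.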
Alan Perlis said that a programming language was low level when it requires attention to the irrelevant. The rub is that it's difficult to have a firm grasp on relevancy once you add ignorance to the equation, and we're all ignorant in our own ways.
If you've been around the block developing and deploying large distributed systems, then you know my list of pluses is actually quite compelling. It has an opinion on things that we all agree are important. And for those of us who haven't been around the block, it puts a lot of emphasis on things that will become important as we make our first lap, which is a major contribution to the community.
But if you have already made decisions, things get complicated.
Let's use service discovery as an example. We've used service discovery along with health checks and self diagnostics to be able to do some fairly interesting things during outages and, more importantly, slowdowns. These are hard problems, and we've solved them rather crudely in places, but these solutions still give us a level of proven, necessary sophistication for system stability.
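As a rough illustration only (this is not the system described above, and every name and threshold in it is hypothetical), client-side selection that accounts for slowdowns as well as outages might look something like this. It assumes each instance reports a health-check result plus a latency figure from its own diagnostics, and that the caller makes the choice.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    address: str
    healthy: bool       # result of the most recent health check
    latency_ms: float   # latency the instance reports about itself


def pick_instance(instances, slow_ms=250.0, rng=random):
    """Pick an instance, preferring healthy-and-fast over healthy-and-slow."""
    healthy = [i for i in instances if i.healthy]
    if not healthy:
        raise LookupError("no healthy instances")
    fast = [i for i in healthy if i.latency_ms < slow_ms]
    # In a slowdown every healthy instance may be slow; degrade rather than fail.
    return rng.choice(fast or healthy)
```

The design choice worth noticing is that the decision lives in the client, using information the instances volunteer about themselves, instead of in a central component that decides which endpoints exist.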
Unfortunately, our approach is slightly incompatible with Kubernetes' centralized one. Usually, when this happens, your needs are just too sophisticated, but in this case Kubernetes' approach is already explicitly complex in order to try to deal at the proper level of sophistication. It's a thoughtful and mature approach, but its structure is just inverted from ours. It's complex in incompatible ways, the worst of both worlds.
Its config management fares worse, as it is complex and impoverished. There is already a cottage industry of applications attempting to fix this; we'll see if they succeed.
I've seen this "complex but incompatible" pattern play out a couple times. The attitude towards state in general makes me wonder: do people even store data themselves anymore? Maybe that's what I'm missing.
It all leaves me concerned that the people with the most to gain will also run into the most problems trying to adopt Kubernetes, and while this is natural (large systems tend to be complex), it does make for bad advertising. "Kubernetes doesn't actually scale" is something I expect to start hearing more and more from people outside the bubble, and it will carry the weight of experience.
For our particular project, I think we're still getting started when you take into consideration the likely overall lifetime. Still, I suspect that moving an existing architecture wholesale to Kubernetes is a bad idea in general; if you've not grown up within its particular limitations, they will prove to be an endless annoyance[4], even if there are a host of other tangible benefits over going solo. This may not be limited to Kubernetes; it may just be true for anything, in which case Kubernetes might indeed provide the path of least pain, its room temperature shattered glass fragments an eden compared to red hot glowing obsidian.
Given Kubernetes' critical mass, time will make an accurate appraisal of its complexity difficult. Will it be able to gracefully adapt to the kinds of advancements that it itself will bring?
Like a lot of other tech that has ostensibly come out of Google, it will likely have at least one major source of complexity that 95% of people do not need and will not want. I've not gone looking for a custom implementation of HTTP/2 with a broken congestion window, but maybe one will turn up.
Many of the problems that Kubernetes provides abstractions for, as opposed to solutions for, will age gracelessly as consensus grows on how to approach them. The balkanization of cluster management systems will fade as consensus solves by convention what is currently open to experimentation. Sources of complexity that seem necessary become over-engineering, the equivalent of pluggable server-side template systems in MVC frameworks in the age of one-page React apps talking to JSON endpoints.
So, is it too complicated? Probably. But it'll take a technology generation or so[5] before the simpler core of Kubernetes makes itself known, and what we end up with will probably smell a lot like it.
1. As an example, I learned JavaScript, then Prototype, then jQuery, then Angular, and when React came out I was so relieved because I could finally decide that the whole enterprise was corrupt and give up any notion of being a "full stack engineer" to work on the backend full time.
2. I'm going to create SRE/operations software just so I can call it "tortured metaphor" as that seems to be the only real requirement. Its reimagined vocabulary for a bunch of shit we already have words for will be based on dryness, a reference to wit, sarcasm, and the reservoir of joy that led to its creation. Most people know Chef and Ansible; SaltStack, for its part, contains grains, pillars, mines, and minions.
4. Like OSX's frankly inexcusable cmd-tab/alt-tab behaviour.
5. A normal one, not a JavaScript one, so approximately 3-5 years.