Empowering Devs to do Ops with Declarative Interfaces
How putting application-specific deployment rules under a declarative interface can make everything easier for developers who have to spin up dev, testing, or preview environments.
Deployment tooling is powerful. Think about the configuration space available to operations engineers with classic tools like Ansible, Terraform, Kubernetes, and Pulumi. Messing with the configuration files of these systems for a large enough deployment feels kind of like sitting in the cockpit of an airliner.
Most organizations that need to manage complicated deployments have an operations team, or operations-minded backend engineers, with the wherewithal to harness these tools sanely. The vast array of dials and knobs at their disposal allow them to fine-tune the system and get their production environment exactly the way they need it.
It only gets annoying when you go off the rails and try to do something crazy, like getting a deployed environment the way you, a backend developer rather than an operations engineer, need it. Maybe it’s for a test scenario, or a dev project that depends on a certain state in the backend. Or maybe you’re showing your work to end users and want to get the state of the environment just right for a demo.
Either way, unless you have good internal tooling or a platform engineering team that handles these configurations for you, you probably have to figure out a bunch of stuff: the right command-line arguments for the server processes, the right environment variables or files on disk, the right versions of everything, and so on. You generally can’t just grab whatever was working in production and run it where you want it. The full configuration capacity of the production deployment system is usually what gets in the way here, forcing you to redefine from scratch how things tie together.
Let’s call this set of things specific to your particular backend “application-layer” things, because they have to do with your application. This is distinct from “infrastructure-layer” things, which deal with the specifics of hardware, networking layers, container configurations, etc. To make it simpler: “infrastructure-layer” is what things like Kubernetes, Docker, Ansible, and Terraform do. “Application-layer” things are what the people who configure those tools need to think about. For example: which environment variables need to be set, and what are the valid values?
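To make the distinction concrete, here’s a sketch of how the two kinds of concerns might sit side by side in a config file. Every name and value below is invented for illustration:

```yaml
# Application-layer concerns: what this particular service needs to run.
# Only someone who knows the application can tell you these.
app:
  LOG_LEVEL: debug          # valid values: debug, info, warn, error
  FEATURE_FLAG_X: "true"    # toggles an application feature
  DB_SCHEMA_VERSION: 42     # the migration level this build expects

# Infrastructure-layer concerns: how the service is actually hosted.
# This is the territory of Kubernetes, Docker, Ansible, Terraform, etc.
infra:
  replicas: 3
  cpu_limit: "500m"
  bind_mounts:
    - /var/data:/data
```

The infrastructure tools are great at the bottom half; it’s the top half that they can’t know about without someone teaching them.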
The most blessed devs have a strong platform engineering team that builds internal tooling so that they don’t have to think much about infrastructure-layer or application-layer concerns. In the ideal world, developers can just push a button to get the “environments” (instances of deployed applications) they need to do their jobs.
There are some fancy names for the internal tooling that attempts to achieve this ideal world. Netflix called it the paved road; Spotify called it the golden path. Whether it’s paved or golden, whenever these tools deploy software into various environments, you can think of them as providing a dimension reduction of the configuration space necessary to produce high-level changes in the environment (e.g. do we need high availability or scale? Is feature flag X turned on? Is it running locally, remotely, or in a CI box on a schedule?).
In short, they bridge the gap between devs and “DevOps” - empowering devs to get the environments they need, where they need them, without getting their hands (too) dirty.
What if you don’t have a platform engineering team? I empathize. Not everyone has one, but nowadays almost everyone with a multi-service backend has the above problems. Even if you’re a solo dev, you’re probably using other service components (databases, queues, etc) built by other devs. Effectively your team is more than just you, but those other devs aren’t on an internal Slack where you can just ping them. So why would they spend their time teaching you specifically how to configure their software for your own needs?
Ideally, everyone who develops backend services could solve that “dimension reduction” problem out-of-the-box. I imagine a world where backend components are shipped, by default, with their own set of “golden-path” tooling that abstracts away the low-level container/config/hardware/networking details, so that end-users of the component can get it running the way they want, without having to acquire swaths of ops-specific knowledge (like how to manage bind mounts or service discovery) themselves. That way, instead of tracking down the original maintainer of the software to figure it out, you can just grab the golden-path tooling and run with it.
The problem with dimension reduction in deployment tooling is that it really depends on which dimensions you want to reduce to. Those are specific to the team, company, or developer that’s building the system. Platform engineering orgs set guidelines on which dimensions to reduce to. Individual devs and open-source maintainers can generally do the same; it just requires an understanding of the “golden-path” use cases of the components you’re building.
Because the golden path is so specific to the particular service in question, it’s really hard to build a declarative, statically configured system that enables this kind of dimension reduction. It would never be flexible enough. That’s part of the reason infrastructure-as-code tools like Pulumi and Dagger are gaining popularity. It’s also the reason we see a lot of templatized or modular configuration tooling like Helm charts, Kustomize, or Cue.
I think a great way to solve this problem is to provide a flexible, programmatic interface to the people playing the “platform engineering” role: they can embrace the full power of a programming language to handle the dimension-reduction work. However, the output of their work should be a static, declarative-style configuration interface. That way the people using the “dimension-reduced” artifact don’t have to worry about much besides the configuration schema - they configure, and the software takes care of the deployment logic to get the application-layer results.
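The split can be sketched in a few lines of ordinary Python. This is a hedged illustration, not any particular tool’s API - all the parameter names and config keys below are invented. The developer-facing side is a tiny declarative document; the platform-engineering side is code that expands it into the full low-level configuration:

```python
import json

# Developer-facing interface: a handful of high-level dimensions.
# In practice this would live in a JSON or YAML file the developer edits.
dev_params = json.loads("""
{
  "environment": "preview",
  "high_availability": false,
  "feature_flag_x": true
}
""")

def render_deployment(params: dict) -> dict:
    """Platform-engineering logic: expand a few high-level dimensions
    into the full, low-level spec the deployment actually needs."""
    ha = params["high_availability"]
    return {
        "replicas": 3 if ha else 1,
        "env_vars": {
            "FEATURE_FLAG_X": str(params["feature_flag_x"]).lower(),
            "DEPLOY_ENV": params["environment"],
        },
        # Low-level details the developer never has to see:
        "bind_mounts": ["/var/data:/data"],
        "restart_policy": "always" if ha else "on-failure",
    }

spec = render_deployment(dev_params)
print(json.dumps(spec, indent=2))
```

The developer only ever touches the three keys in `dev_params`; everything below `render_deployment` is the dimension-reduction work, written once by whoever plays the platform-engineering role.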
We built Kurtosis (check it out on GitHub / leave a star!) that way. Developers who need environments just use JSON or YAML to configure the high-level changes they need to their applications. The “dimension-reducers” use Python (technically a dialect of Python: Google’s Starlark) to define their deployment logic from the set of JSON-defined parameters, with resource management and full portability built into the scripting system. I made a 10-minute video to show how it works: