Scaling with Confidence: Postman’s Journey to Infrastructure as Code with Kubernetes and ArgoCD

Grant King

At Postman, we serve millions of developers building, testing, and documenting APIs. As our platform scaled, so did the complexity of managing our infrastructure. What began as a collection of manually provisioned services grew into a sprawling landscape of interdependent components deployed using varying methodologies and processes. This is the story of how we transitioned from a semi-manual, error-prone deployment model to a robust, repeatable, and declarative system powered by Infrastructure as Code (IaC), Kubernetes, and ArgoCD.

The Problem: Scaling Beyond Human Coordination

A few years ago, our deployment pipeline was a mix of manual scripts, some loosely enforced conventions, and tribal knowledge. It worked, until it didn’t.

  • Environment drift was common. What worked in staging wouldn’t always work in production.
  • Rollbacks were painful, with little confidence in the previous “known good state.”
  • Onboarding new engineers was time-consuming, requiring deep understanding of hidden dependencies and undocumented setups.
  • Operational consistency was difficult to guarantee across regions and teams.
  • Audit and change management were difficult; visibility into what changed or why those changes were made was not freely available.

We knew we needed a step-change in how we managed infrastructure: from handcrafted and mutable to automated, reproducible, and declarative.

The Vision: Platform as a Product

Our Cloud Platform team defined a north star: every environment, every service, every deployment should be codified and version-controlled. This would empower engineers to make changes safely and confidently, while platform teams focused on scaling infrastructure, not firefighting. The key goals:

  • Declarative Infrastructure with Git as the source of truth.
  • Standardized Kubernetes Deployments for every service.
  • Self-service delivery with high safety and observability.
  • Automated rollbacks and progressive delivery patterns.

Step 1: Embracing Infrastructure as Code

We began by reimagining our infrastructure as a set of distinct layers. Each layer is fully automated and defined in Git, allowing for clear versioning and traceability. This shift allowed us to bring peer reviews, change tracking, and automated validation into infrastructure management, just like we do for application code. These layers are deliberately decoupled from one another, ensuring a clean handoff of responsibility during deployment and making the overall system more modular and maintainable.

The base layer of our core infrastructure has been codified using Terraform. Its primary responsibility is to provision and configure foundational AWS components, such as networking, Kubernetes clusters, and a pre-installed, pre-configured ArgoCD instance.

 

Step 2: Standardizing on Kubernetes, Helm and ArgoCD

To better manage our growing number of microservices, we migrated our workloads to Kubernetes. It gave us the abstraction and consistency we needed across services and regions, while also enabling a cloud-agnostic approach to service deployment.

However, managing raw Kubernetes manifests at scale quickly becomes unwieldy. To address this, we adopted Helm and created a standard Helm chart that all services use. This allows for consistent service definitions, easier onboarding, and centralized control over common configurations.

Anyone who has deployed complex microservice environments knows the need for a tool that can define and manage the set of Helm charts required to run an application. In large-scale systems like ours, the number of services can easily reach into the hundreds. These aren’t just application services; they also include infrastructure components like observability, authentication, and configuration management.

That’s where ArgoCD comes in. It allows us to define, group, and manage all the Helm charts needed to run Postman as a cohesive unit. These definitions are stored and versioned in Git, aligning with our GitOps approach and enabling repeatable, auditable, and declarative deployments. With all this in place:

  • All environment definitions live in Git
  • Argo continuously syncs cluster state with Git state
  • Rollbacks are as simple as reverting a Git commit
  • Changes are tracked, reviewed, and audited
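The sync-and-rollback behavior in the list above can be pictured with a small sketch. This is not ArgoCD's implementation, only a simplified model of the GitOps idea: the desired state comes from Git, the live state comes from the cluster, and a sync is whatever closes the gap. The service names, versions, and the `plan_sync` helper are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified model: service name -> deployed chart version.
State = Dict[str, str]


def plan_sync(desired: State, live: State) -> List[Tuple[str, str]]:
    """Return the (action, service) steps that would make `live` match `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(live):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Two commits in a hypothetical environment repository.
git_history = [
    {"api-gateway": "1.4.0", "auth": "2.1.0"},  # commit 1: known good
    {"api-gateway": "1.5.0", "auth": "2.1.0"},  # commit 2: problematic release
]

# The cluster currently runs what commit 2 describes.
live = dict(git_history[-1])

# Reverting commit 2 makes commit 1 the desired state again,
# so the next sync plans a single update back to the old version.
desired = git_history[0]
print(plan_sync(desired, live))
```

Because the plan is derived purely from the difference between Git and the cluster, a rollback needs no special tooling in this model: it is just another sync toward an earlier commit.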

Step 3: But what about resources?

It goes without saying that most services require supporting infrastructure: databases, message queues, binary storage, and more. Our goal was to manage these resources the same way we manage services: declaratively, repeatably, and securely.

To achieve this, we use Crossplane, which allows us to define infrastructure resources as Helm charts, deploy them via ArgoCD, and have Crossplane provision them directly in the cloud (e.g., AWS). This ensures that when we spin up a new environment, all required resources are automatically created, fully configured and secure, without manual intervention.

This approach has enabled teams to be self-sufficient in a safe, scalable, and standardized way. The Cloud Platform team owns the delivery and maintenance of the Helm charts available for use, ensuring that secure best practices are baked in by default.
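To make "deploy them via ArgoCD" concrete, here is a minimal sketch that builds an ArgoCD `Application` manifest pointing at a Helm chart. The manifest fields follow the public ArgoCD `Application` resource; everything else is an assumption for illustration: the `argo_application` helper, the chart name `message-queue`, the repository URL, the parameter names, and the `argocd` namespace are all hypothetical and do not describe Postman's actual charts.

```python
import json
from typing import Dict


def argo_application(name: str, repo_url: str, chart: str, version: str,
                     namespace: str, parameters: Dict[str, str]) -> dict:
    """Build an ArgoCD Application manifest for a chart from a Helm repository."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Assumption: ArgoCD is installed in the conventional "argocd" namespace.
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "chart": chart,
                "targetRevision": version,
                "helm": {
                    "parameters": [
                        {"name": key, "value": value}
                        for key, value in sorted(parameters.items())
                    ]
                },
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
            # Keep the cluster in line with Git: prune removed resources
            # and undo manual drift.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


# Hypothetical resource chart; the chart is assumed to render the
# Crossplane resources that provision the queue in the cloud.
app = argo_application(
    name="orders-queue",
    repo_url="https://charts.example.com",
    chart="message-queue",
    version="1.2.0",
    namespace="orders",
    parameters={"queueName": "orders-events", "region": "us-east-1"},
)
print(json.dumps(app, indent=2))
```

The printed JSON could be committed to Git in place of hand-written YAML, since Kubernetes accepts manifests in either format.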

Step 4: We wanted more; enter Parcels

Enabling Developer Velocity Through On-Demand Environments

A key requirement for enabling developer velocity is the ability to easily spin up environments, not just for production, but also for non-production use cases. While provisioning new regions or production environments is important, the ability for engineers to create isolated, self-service environments on demand is critical for reducing bottlenecks and enabling rapid iteration.

The Problem with Scale and YAML

Tools like ArgoCD are powerful, but they rely heavily on raw YAML definitions. While this works fine when managing a handful of environments, it becomes error-prone and difficult to maintain as the number of environments scales. We have high expectations for non-prod environments. Engineers should be able to say: “I want an environment that looks like production, except for my 3 services. Also, I want any production changes to be automatically applied to my environment.” This level of flexibility shouldn’t require manual edits to YAML or custom scripting.

What Developers Really Need

To support real-world development workflows, engineers should be able to:

  • Define metadata for configuration options in a service-centric way.
  • Drive a UI-based experience from that metadata; no need to hand-edit YAML.
  • Have secrets handled automatically.
  • Access only the resources, services, or environments they’ve been explicitly authorized to work with.
  • Access all functionality via APIs, allowing changes to environments to be driven via a CI/CD pipeline.
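The "looks like production, except for my 3 services" request can be modeled as an overlay that is resolved on demand rather than copied once. The sketch below is one possible way to express that idea and is not a description of how Parcels works; the `resolve_environment` helper, the service names, and the version strings are hypothetical.

```python
from typing import Dict, Mapping


def resolve_environment(production: Mapping[str, str],
                        overrides: Mapping[str, str]) -> Dict[str, str]:
    """Start from the production definition and apply per-service overrides."""
    resolved = dict(production)
    resolved.update(overrides)
    return resolved


# Hypothetical production definition: service name -> version.
production = {"api-gateway": "1.4.0", "auth": "2.1.0", "billing": "3.0.2"}

# The engineer only declares what differs from production.
my_overrides = {"billing": "feature-invoice-pdf"}

print(resolve_environment(production, my_overrides))

# Production moves forward; the overlay itself is untouched.
production["auth"] = "2.2.0"

# Resolving again picks up the production change automatically,
# while the overridden service keeps the engineer's version.
print(resolve_environment(production, my_overrides))
```

Storing only the overrides keeps each non-production environment small to define and review, and it avoids the drift that comes from copying a full production definition and editing it by hand.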

Our Solution: Parcels

To address these challenges, we built Parcels, a layer on top of ArgoCD that ties everything together and enables fast, safe, and compliant environment management. Parcels gives teams:

  • The power to spin up full environments on demand
  • A consistent and opinionated structure that aligns with production
  • The ability to move fast without compromising security or maintainability

With Parcels, we’ve turned environment provisioning from a painful, manual process into a powerful lever for developer productivity.

 

Final Thoughts

Moving to Infrastructure as Code with Kubernetes and ArgoCD transformed how we think about operational excellence at Postman. By treating infrastructure as a product and embracing declarative, Git-driven workflows, we’ve unlocked a new level of scale, reliability, and velocity. And we’re just getting started.
