# 10/20 Outage: What Happened and What's Next

At Postman, we deeply value the trust our customers place in us to power their development and business workflows. This week, we fell short of that trust. On October 20, Postman was impacted by a widespread regional AWS outage that affected many cloud-based services, and critical Postman flows were impacted for an extended period. We recognize the disruption this caused to our customers' workflows, and we sincerely apologize. Events like this are a powerful reminder of the shared responsibility we hold as consumers of cloud platforms, and they give us reason to reflect, learn, and strengthen our systems against potential single points of failure.

## **A Commitment to Continuous Strengthening**

As outlined in our earlier blog on [platform maturity](https://blog.postman.com/engineering/scaling-with-confidence-postmans-journey-to-infrastructure-as-code-with-kubernetes-and-argocd/), Postman is deeply invested in evolving our infrastructure into an even more world-class, efficient, and resilient platform. Our long-term vision is to be able to declaratively define our entire platform, not only to enhance internal developer velocity but also to quickly and reliably stand up new cloud environments across regions and providers as customer needs evolve. We’ve already taken steps in this direction with the launch of our European data center, and future investments will extend that foundation into multi-region, active/active high availability.

While the absence of these capabilities was evident in this week’s outage, it’s important to share that these are not just aspirational ideas. They are active engineering goals, and we’re committed to closing these gaps so that Postman can continue to serve customers reliably, regardless of external provider issues.

## **Graceful Degradation and Customer Experience**

As we evolve toward full cross-region high availability, we’re also focused on enabling graceful degradation, ensuring that even during partial outages, the most critical product flows continue to function. We recognize that certain key user actions are essential, and interruptions there leave customers stranded. Our teams are already evaluating these core critical paths to ensure they’re protected and resilient in the face of similar incidents in the future. These will be the first areas of focus in our broader platform strengthening effort.

## **Improving Communication**

Another important learning from this event was around incident communication. Our ability to quickly update customers was delayed because our status page, hosted on AWS, was also affected by the outage. Similarly, internal communication was slowed because the process to automatically create incident coordination channels depended on impacted infrastructure. We recognize that timely, transparent communication is vital during outages. In the short term, we’re investing in redundant communication tooling and standardized internal and external communication practices to ensure updates flow more consistently, even during provider-level disruptions.

## **Timeline of Events (PDT)**

- **Oct 20, 5:39 AM** – Postman observed significant error rates impacting core functionality and confirmed dependency on the AWS outage. The status page was updated to reflect a major incident.
- **Oct 20, 5:52–8:20 AM** – As AWS began recovering, Postman services also showed gradual restoration.
- **Oct 20, 5:56–9:38 AM** – Recovery of EC2 and Network Load Balancer health checks led to significant improvement in Postman services.
- **Oct 20, 5:17 PM** – Intermittent issues persisted due to residual AWS impairments; teams monitored and mitigated the impact.
- **Oct 20, 7:00 PM** – Cross-dependency and search functionality issues were investigated and resolved.
- **Oct 20, 8:48 PM** – All Postman services confirmed fully restored.
- **Oct 20, 8:51 PM** – Incident marked resolved on the Postman status page.

## **Corrective and Preventive Actions**

### **Short-Term**

- [Beta launch](https://community.postman.com/t/offline-support-filesystem-and-native-git-is-coming-to-postman/86160) local filesystem and native Git support to accelerate the path to general availability. This will allow developers to continue working locally even during cloud service disruptions.
 
### **Medium-Term**

- Improve availability and redundancy of critical features during provider outages that degrade broader functionality.
- Strengthen coordination with AWS during regional incidents.
- Eliminate the unnecessary service dependencies identified.
 
### **Long-Term**

- Build multi-region redundancy for critical services to reduce dependency on a single AWS region.
- Enhance cross-service monitoring and dependency mapping to detect cascading failures early.
- Advance toward a multi-cloud, active/active architecture capable of seamless failover and recovery.
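
To illustrate what dependency mapping can offer, here is a minimal Python sketch, not a description of Postman's actual tooling. It assumes a hand-written map of which service depends on which; the service names and the `impacted_by` helper are hypothetical and chosen only to echo the kinds of dependencies described in this post. Given a failed dependency, it walks the map to list everything that would be affected, directly or transitively.

```python
from collections import deque

# Hypothetical dependency map: each service lists the services it depends on.
# These names are illustrative only and do not describe a real architecture.
DEPENDS_ON = {
    "status-page": ["cloud-region"],
    "incident-channel-bot": ["cloud-region"],
    "sync": ["cloud-region"],
    "search": ["sync"],
    "local-editor": [],
}


def impacted_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk outward from the failed component.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in impacted:
                impacted.add(service)
                queue.append(service)
    return impacted


if __name__ == "__main__":
    print(sorted(impacted_by("cloud-region", DEPENDS_ON)))
```

A map like this makes hidden couplings visible ahead of time, for example a status page that shares a failure domain with the services it reports on, and gives responders a first estimate of blast radius when a dependency degrades.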
 
## **Looking Ahead**

This event serves as a valuable moment of reflection and recommitment for us. Postman’s strength lies not just in our technology but in our willingness to learn, adapt, and evolve from every challenge. We’re grateful for the patience of our customers and partners during this event, and we’re using this experience to accelerate our journey toward a more resilient, fault-tolerant, and customer-first platform.