How Postman Agent Mode hacks for me

Code is shipping faster than it ever has. AI assistants are writing the first draft of pull requests on most teams I talk to. Meanwhile the security-review side of the house hasn’t kept up. The tools we have today can mostly detect lexical bugs (SQL injection, hardcoded secrets), but stop short of catching anything that requires deep understanding of what the code is supposed to do.

Full manual security review at this scale is impractical, so we need an automated system that can keep pace. This article describes how I tackled that problem using Postman, the same platform where you already live and breathe APIs, and how a similar approach can slot into your secure development stack.

Why semantic issue detection is the hard one

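If you’ve been doing API security for a while, you already know that the OWASP API Security Top 10 leans hard on authorization bugs. Together, API1 (Broken Object Level Authorization, BOLA), API3 (Broken Object Property Level Authorization, BOPLA), and API5 (Broken Function Level Authorization, BFLA) account for somewhere between half and three-quarters of real-world API breaches, depending on whose telemetry you trust.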

The reason these issues persist isn’t that engineers don’t know about them. It’s that they’re semantic. To detect “non-admins can invite users,” a tool needs to understand:

  • What the endpoint is supposed to do (only admins should be able to invite): the intent
  • What the endpoint actually does (it accepts any team member): the implementation
  • That those two facts disagree: the drift between intent and implementation

That’s not a lexical pattern. There’s no regex for “this handler is missing a role check.” A DAST scanner hitting POST /teams/team-alpha/invites with a viewer token and getting a 201 sees a healthy response. It has no idea the contract expects admin-only. A SAST scanner reading the same handler sees a perfectly normal authentication check; the absence of a role comparison isn’t wrong-looking, it’s just absent.

Intent is the missing input. And intent, for most teams, lives in engineers’ heads, in a spec, or in a wiki page nobody updates. That’s why these bugs keep showing up.

The pipeline

Before we get to the demo, the setup is worth pausing on, because the prerequisites do a lot of the heavy lifting. This pattern works well for API-first teams that maintain an API spec in the repo, with semantic annotations for security intent (x-required-role, x-required-membership, x-rate-limit, or even disciplined description fields).
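To make “semantic annotations” concrete, here is a minimal Python sketch of how a discovery step might pull security intent out of an OpenAPI document. It assumes the spec has already been parsed into a plain dict (for example by a YAML parser) and treats x-required-role, x-required-membership, x-rate-limit, and the description as the only intent signals. The function name extract_intent and the sample spec are hypothetical, not the author’s actual skill code.

```python
from typing import Any

# HTTP methods that may appear as keys under an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}

# Hypothetical set of vendor extensions treated as security intent.
INTENT_KEYS = ("x-required-role", "x-required-membership", "x-rate-limit")


def extract_intent(spec: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map "METHOD /path" to the security-intent annotations on that operation."""
    intent: dict[str, dict[str, Any]] = {}
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            key = f"{method.upper()} {path}"
            intent[key] = {k: operation[k] for k in INTENT_KEYS if k in operation}
            # A disciplined description is intent too, so keep it for the checks.
            if "description" in operation:
                intent[key]["description"] = operation["description"]
    return intent


# Hypothetical fragment shaped like the demo's invites endpoints.
spec = {
    "paths": {
        "/teams/{teamId}/invites": {
            "get": {"x-required-membership": True},
            "post": {
                "x-required-role": "admin",
                "description": "Only admins can issue invites",
            },
        }
    }
}
print(extract_intent(spec)["POST /teams/{teamId}/invites"])
```

Keying intent by “METHOD /path” keeps the GET and POST handlers on the same path distinct, which matters here because only the POST is admin-gated.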

Sometimes there is no API spec at all, but for teams shipping new services with AI assistants, spec-first is increasingly the default, because AI assistants do better work when they have a spec to anchor on.

I should be honest: if your API spec is stale, this pattern surfaces that as drift, which is itself useful information, but it’ll be noisy until you fix the spec.

Here’s what I used

Delivery mechanism
  • Skill #1 (custom built), endpoint-discovery, reads the API spec (for intent), the source code (for implementation, via framework-level parsing), and, optionally, the functional Postman collection (for exercised behavior). For each identified endpoint, the skill captures three views (intent, implementation, and exercised) and then runs drift checks between them. Wherever the intent disagrees with the implementation, the skill adds an entry to the discrepancies array (structured drift findings tagged with OWASP categories and severities). The output is a single endpoints.json.
  • Skill #2 (custom built), security-threat-model, consumes that inventory and its drift findings (endpoints.json) and writes two artifacts:
    1. security-test-plan.yaml: a declarative test plan with one suite per endpoint. Each case names an actor, a request, an expected outcome, and (where applicable) the specific drift finding it was derived from.
    2. postman-agent-brief.md: the natural-language prompt that tells Postman Agent Mode how to generate a security test collection from that test plan.
  • Postman Agent Mode, using native Git, to read those skills and artifacts directly from the local code repo and generate security test collections
  • Postman collections as the runnable, versioned “security contract”: used first for live security testing, and later as a regression check for continuous security monitoring with Postman Monitors
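The intent-versus-implementation check at the heart of Skill #1 can be illustrated with a short Python sketch. It assumes each endpoint has already been reduced to an intent view (the required role from the spec) and an implementation view (whether the handler compares a role at all); the exercised view from the Postman collection is left out. The EndpointViews dataclass and check_intent_vs_impl are hypothetical names, and the finding’s fields are modeled on the sample finding from endpoints.json rather than on the skill’s real code.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class EndpointViews:
    """Hypothetical per-endpoint record holding the intent and implementation views."""
    method: str
    path: str
    required_role: Optional[str]  # intent: from x-required-role in the spec
    checks_role: bool             # implementation: does the handler compare a role?
    discrepancies: list = field(default_factory=list)


def check_intent_vs_impl(ep: EndpointViews) -> None:
    """Append an INTENT_VS_IMPL finding when the spec requires a role the code never checks."""
    if ep.required_role and not ep.checks_role:
        ep.discrepancies.append({
            "type": "INTENT_VS_IMPL",
            "field": "checksRole",
            "expected": ep.required_role,
            "actual": False,
            "owaspCandidate": "API5_BFLA",
            "severity": "high",
        })


post_invites = EndpointViews("POST", "/teams/{teamId}/invites",
                             required_role="admin", checks_role=False)
check_intent_vs_impl(post_invites)
print(asdict(post_invites)["discrepancies"][0]["owaspCandidate"])  # API5_BFLA
```

Keeping findings as plain dicts on the endpoint record means the whole inventory serializes straight to JSON, which is what a single endpoints.json output needs.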

The demo

For the demo, I developed an application with /health, /auth/login, /me, and GET /teams/{teamId}/invites already on main. On a feature branch (feat/team-invites), I added a new POST /teams/{teamId}/invites. The spec is annotated honestly (x-required-role: admin), and its description says “Only admins can issue invites.” The implementation, in classic Friday-afternoon style, checks team membership but forgets the role check, and it accepts a role field in the request body. That’s API5 BFLA and API3 BOPLA in the same endpoint: a viewer can create an invite, and can even make it an admin-level invite.
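Stripped of any web framework, the bug looks roughly like the Python sketch below. The in-memory MEMBERSHIPS table, the function names, and the (status, body) return tuples are hypothetical stand-ins for the demo app’s real code, which isn’t shown in this post. The point is that the vulnerable version doesn’t look wrong; it is just missing a comparison.

```python
# Hypothetical in-memory store: (user_id, team_id) -> role within that team.
MEMBERSHIPS = {
    ("u_admin", "team-alpha"): "admin",
    ("u_viewer", "team-alpha"): "viewer",
}


def create_invite_vulnerable(user_id: str, team_id: str, body: dict) -> tuple[int, dict]:
    role = MEMBERSHIPS.get((user_id, team_id))
    if role is None:
        return 403, {"error": "not a team member"}  # membership check: present
    # BUG (API5 BFLA): nothing checks that role == "admin".
    # BUG (API3 BOPLA): the caller-controlled "role" is copied straight into the invite.
    return 201, {"email": body["email"], "role": body.get("role", "viewer")}


def create_invite_fixed(user_id: str, team_id: str, body: dict) -> tuple[int, dict]:
    role = MEMBERSHIPS.get((user_id, team_id))
    if role is None:
        return 403, {"error": "not a team member"}
    if role != "admin":
        return 403, {"error": "admin role required"}  # enforce x-required-role: admin
    invite_role = body.get("role", "viewer")
    if invite_role not in {"viewer", "admin"}:
        return 400, {"error": "invalid role"}  # allow-list the privileged field
    return 201, {"email": body["email"], "role": invite_role}


payload = {"email": "new.user@example.com", "role": "admin"}
print(create_invite_vulnerable("u_viewer", "team-alpha", payload))  # viewer gets a 201
print(create_invite_fixed("u_viewer", "team-alpha", payload))       # viewer gets a 403
```

Once the role check is in place, an admin choosing the invite’s role is legitimate, so this sketch validates the field against an allow-list instead of dropping it; whether that matches the real product’s rules is an assumption.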

(Figure: the difference between intent and implementation)

The Postman desktop app has Agent Mode pointed at the repo via native Git. With this, Agent Mode reads SKILL.md files, source code, the spec, and the generated artifacts directly from the code repository (local files, in this demo).

#1 Starting state on main

The working tree is clean. security/artifacts/ doesn’t exist yet (its contents are generated, not committed).

#2 Create the feature branch and introduce the bug

On the new feature branch, the new POST handler performs the membership check but skips the role check.

#3 Prompt Postman Agent Mode to audit the branch

Switch to the Postman desktop app and open the Agent Mode panel. Native Git is already configured against this workspace path, and Agent Mode uses an LLM to orchestrate the work. Use a prompt like: “Audit my changes for security drift, then author the security contract collection per the brief.”

#4 Agent Mode invokes the endpoint-discovery skill

Agent Mode reads .claude/skills/endpoint-discovery/SKILL.md, matches the prompt to its description, and runs the helper script via its shell tool.
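Skill files in this .claude/skills layout start with YAML frontmatter whose name and description fields are what an agent matches a prompt against. The Python sketch below only reads those two fields with the standard library; it handles simple single-line key: value pairs and is not a full YAML parser. The matching itself is done by the LLM, not by code like this, and the description text in the sample is hypothetical.

```python
def read_skill_frontmatter(text: str) -> dict[str, str]:
    """Return the simple key: value pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return meta


# Hypothetical SKILL.md content for the endpoint-discovery skill.
skill_md = """---
name: endpoint-discovery
description: Inventory API endpoints and flag drift between spec intent and implementation.
---
# Endpoint discovery
Instructions for the agent go here.
"""
meta = read_skill_frontmatter(skill_md)
print(meta["name"])  # endpoint-discovery
```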

#5 First artifact appears: endpoints.json

The skill emits security/artifacts/endpoints.json: five endpoints inventoried and four drift findings, two of them high-severity on the new POST endpoint. The next skill validates these findings further.

The headline finding inside that file:

```json
{
  "type": "INTENT_VS_IMPL",
  "field": "checksRole",
  "expected": "admin",
  "actual": false,
  "owaspCandidate": "API5_BFLA",
  "severity": "high",
  "rationale": "Spec declares x-required-role: admin but the handler never compares membership.role."
}
```

A second finding flags the role enum in the request body as an API3 BOPLA mass-assignment risk. Together, those two findings form the full privilege-escalation path.
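Why the pair matters more than either finding alone can be written as a simple correlation rule. This hypothetical Python sketch groups findings by endpoint and flags any endpoint that carries both an API5_BFLA and an API3_BOPLA candidate. The endpoint key on each finding is an assumption, not the documented endpoints.json schema, and the third finding is made up purely to show a non-match.

```python
from collections import defaultdict


def escalation_paths(findings: list[dict]) -> list[str]:
    """Return endpoints where a missing role check meets a privileged body field."""
    by_endpoint: dict[str, set[str]] = defaultdict(set)
    for finding in findings:
        by_endpoint[finding["endpoint"]].add(finding["owaspCandidate"])
    # An endpoint qualifies only if it has BOTH categories.
    return sorted(ep for ep, cats in by_endpoint.items()
                  if {"API5_BFLA", "API3_BOPLA"} <= cats)


findings = [
    {"endpoint": "POST /teams/{teamId}/invites", "type": "INTENT_VS_IMPL",
     "field": "checksRole", "owaspCandidate": "API5_BFLA", "severity": "high"},
    {"endpoint": "POST /teams/{teamId}/invites", "type": "BODY_ACCEPTS_PRIV_FIELD",
     "field": "role", "owaspCandidate": "API3_BOPLA", "severity": "high"},
    # Illustrative only: a lone finding on a hypothetical endpoint.
    {"endpoint": "POST /example", "type": "INTENT_VS_IMPL",
     "field": "checksRole", "owaspCandidate": "API5_BFLA", "severity": "high"},
]
print(escalation_paths(findings))  # ['POST /teams/{teamId}/invites']
```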

#6 Agent Mode chains to the security-threat-model skill next

This happens without re-prompting: Agent Mode sees in the first skill’s SKILL.md that security-threat-model is the natural next step, reads that skill’s SKILL.md, and runs the required steps.

security/test-accounts.yaml is a metadata file that the security team can maintain inside the code repository. The security-threat-model skill reads it alongside security/artifacts/endpoints.json when generating its output. It describes the test accounts for every role: for example, an admin in team-alpha called admin_team_alpha, a viewer in team-alpha called viewer_team_alpha, and an admin in team-beta called admin_team_beta. Each entry has an id, username, tenant, and role. It contains no passwords; those live in the Postman Vault.

For each endpoint, the skill considers several case categories and emits the ones that apply. Here is the actor-selection rule for one of them:

non_admin_blocked (the BFLA case) is emitted whenever intent.requiredRole is set on a state-mutating endpoint. Its actor comes from findLowPrivInTenant(focalTenant), which picks an account in the focal tenant whose role is below the required role. For the invites endpoint, which requires admin, that’s viewer_team_alpha.
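The actor-selection rule is simple enough to sketch directly. The Python below mirrors findLowPrivInTenant under a few assumptions: the accounts from security/test-accounts.yaml are already loaded as a list of dicts, and roles are ranked by a hypothetical ROLE_RANK table (only admin and viewer appear in the demo, so member is an illustrative middle rung). The account ids are illustrative, and picking the least-privileged match is a design choice of this sketch.

```python
from typing import Optional

# Hypothetical privilege ordering; a higher number means more privilege.
ROLE_RANK = {"viewer": 0, "member": 1, "admin": 2}

# Mirrors the entries described for security/test-accounts.yaml (ids are illustrative).
TEST_ACCOUNTS = [
    {"id": "acct-1", "username": "admin_team_alpha", "tenant": "team-alpha", "role": "admin"},
    {"id": "acct-2", "username": "viewer_team_alpha", "tenant": "team-alpha", "role": "viewer"},
    {"id": "acct-3", "username": "admin_team_beta", "tenant": "team-beta", "role": "admin"},
]


def find_low_priv_in_tenant(accounts: list[dict], focal_tenant: str,
                            required_role: str) -> Optional[dict]:
    """Pick the least-privileged account in focal_tenant whose role is below required_role."""
    threshold = ROLE_RANK[required_role]
    candidates = [a for a in accounts
                  if a["tenant"] == focal_tenant and ROLE_RANK[a["role"]] < threshold]
    # None means no suitable actor exists, so the case can't be emitted.
    return min(candidates, key=lambda a: ROLE_RANK[a["role"]], default=None)


actor = find_low_priv_in_tenant(TEST_ACCOUNTS, "team-alpha", "admin")
print(actor["username"])  # viewer_team_alpha
```

Returning None instead of raising lets the skill skip a case cleanly when the test-accounts file has no low-privilege account in that tenant (team-beta in this data).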

#7 Second wave of artifacts: security test plan + Agent Mode brief

Two new files appear in security/artifacts/: security-test-plan.yaml and postman-agent-brief.md.

The plan is declarative, with one suite per endpoint and each test case named in plain English with its OWASP code attached. These are the two cases that will fail on the vulnerable server:

```yaml
- id: non_admin_blocked
  name: "Non-admin (viewer) must NOT be able to POST /teams/{teamId}/invites — only admins are authorized"
  owasp: API5_BFLA
  severity: high
  derived_from: [INTENT_VS_IMPL/checksRole]

- id: non_admin_cannot_escalate
  name: "viewer must NOT be able to escalate by submitting role=\"admin\" – privilege escalation via mass assignment"
  owasp: API3_BOPLA
  severity: high
  derived_from: [BODY_ACCEPTS_PRIV_FIELD/role]
```
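Each derived_from entry ties a test case back to a drift finding as TYPE/field. A minimal Python sketch of that mapping follows. The CASE_TEMPLATES table and cases_from_findings are hypothetical, the generated names are shorter than the plan’s, and the real skill also emits cases that aren’t derived from any finding.

```python
# Hypothetical mapping from a finding's (type, field) to a case id and name template.
CASE_TEMPLATES = {
    ("INTENT_VS_IMPL", "checksRole"): (
        "non_admin_blocked",
        "Non-admin ({actor}) must NOT be able to {method} {path}",
    ),
    ("BODY_ACCEPTS_PRIV_FIELD", "role"): (
        "non_admin_cannot_escalate",
        "{actor} must NOT be able to escalate by submitting role=\"admin\"",
    ),
}


def cases_from_findings(method: str, path: str, actor: str,
                        findings: list[dict]) -> list[dict]:
    """Turn drift findings into declarative test cases with a derived_from back-link."""
    cases = []
    for finding in findings:
        template = CASE_TEMPLATES.get((finding["type"], finding["field"]))
        if template is None:
            continue  # no case category applies to this finding
        case_id, name = template
        cases.append({
            "id": case_id,
            "name": name.format(actor=actor, method=method, path=path),
            "owasp": finding["owaspCandidate"],
            "severity": finding["severity"],
            "derived_from": [f"{finding['type']}/{finding['field']}"],
        })
    return cases


findings = [
    {"type": "INTENT_VS_IMPL", "field": "checksRole",
     "owaspCandidate": "API5_BFLA", "severity": "high"},
    {"type": "BODY_ACCEPTS_PRIV_FIELD", "field": "role",
     "owaspCandidate": "API3_BOPLA", "severity": "high"},
]
for case in cases_from_findings("POST", "/teams/{teamId}/invites", "viewer", findings):
    print(case["id"], case["owasp"], case["derived_from"])
```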

#8 Agent Mode reads the brief and authors the security collection

Agent Mode picks up postman-agent-brief.md, applies the structure it specifies, and writes a new security Postman collection in the current workspace. A setup folder logs in the three test accounts using environment variables; below it, three endpoint folders hold 13 test requests in total, with OWASP codes in their names.
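For a sense of what such a collection contains, here is a Python sketch that turns one plan case into a request item in Postman Collection Format v2.1 JSON, with a test script built on the pm.test and pm.response.to.have.status assertions from Postman’s scripting API. The {{baseUrl}} and {{viewerToken}} variable names, the expected 403, the folder layout, and the request body are assumptions; the collection Agent Mode actually generates may be organized differently.

```python
import json

SCHEMA = "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"


def case_to_item(case: dict, method: str, url: str, token_var: str,
                 body: dict, expected_status: int) -> dict:
    """Build one Postman v2.1 request item whose test asserts the expected status."""
    test_script = [
        f'pm.test("{case["owasp"]}: expect {expected_status}", function () {{',
        f"    pm.response.to.have.status({expected_status});",
        "});",
    ]
    return {
        "name": f'[{case["owasp"]}] {case["name"]}',  # OWASP code in the request name
        "request": {
            "method": method,
            "header": [
                # Renders as "Bearer {{viewerToken}}", resolved from the environment.
                {"key": "Authorization", "value": f"Bearer {{{{{token_var}}}}}"},
                {"key": "Content-Type", "value": "application/json"},
            ],
            "url": url,
            "body": {"mode": "raw", "raw": json.dumps(body)},
        },
        "event": [{"listen": "test",
                   "script": {"type": "text/javascript", "exec": test_script}}],
    }


case = {"id": "non_admin_blocked", "owasp": "API5_BFLA",
        "name": "Non-admin (viewer) must NOT be able to POST /teams/{teamId}/invites"}
item = case_to_item(case, "POST", "{{baseUrl}}/teams/team-alpha/invites",
                    "viewerToken", {"email": "new.user@example.com"}, 403)
collection = {
    "info": {"name": "Security contract (sketch)", "schema": SCHEMA},
    "item": [{"name": "POST /teams/{teamId}/invites", "item": [item]}],  # one folder
}
print(item["name"])
```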

#9 Run the security collection against the vulnerable server

Run the collection with an environment file containing the appropriate test credentials. Two requests fail. Both carry their OWASP code in the name, so a reviewer reading the runner output immediately knows the category and the severity.
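If you run the collection outside the app (for example with Newman, Postman’s command-line collection runner), the same naming convention makes triage scriptable. This Python sketch assumes the run results have already been reduced to (request name, passed) pairs; extracting them from a particular reporter’s output is left out, and the sample names, including the passing third one, are illustrative.

```python
import re
from collections import Counter

# Matches OWASP API Top 10 codes embedded in request names, e.g. "[API5_BFLA] ...".
OWASP_CODE = re.compile(r"\bAPI(?:[1-9]|10)_[A-Z]+\b")


def failures_by_category(results: list[tuple[str, bool]]) -> Counter:
    """Count failed requests per OWASP code found in the request name."""
    counts: Counter = Counter()
    for name, passed in results:
        if passed:
            continue
        match = OWASP_CODE.search(name)
        counts[match.group(0) if match else "UNCATEGORIZED"] += 1
    return counts


results = [
    ("[API5_BFLA] Non-admin (viewer) must NOT be able to POST /teams/{teamId}/invites", False),
    ("[API3_BOPLA] viewer must NOT be able to escalate by submitting role=admin", False),
    ("[API1_BOLA] admin_team_beta must NOT list team-alpha invites", True),
]
print(dict(failures_by_category(results)))  # {'API5_BFLA': 1, 'API3_BOPLA': 1}
```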

#10 Use the security collection as a continuous security-contract monitor

The same collection stays in the workspace as the regression contract and runs on a schedule via Postman Monitors. If a future code change breaks role gating, you’ll detect it without re-testing everything by hand, which helps keep the APIs you run in production healthy and secure.

Going further

The natural next step is to make a similar system fully autonomous for your use case. The skills can be tailored to your infrastructure and can run cleanly in CI, for example as a GitHub Actions workflow on every new PR. Same flow, no human hand-holding, and institutional teeth instead of “remember to run this.”

The shape (readable contract plus agent enforcement on every change) generalizes well beyond APIs and authZ. The same pattern can work for IaC policy drift, for data-classification labels on new fields, for SBOM compliance on new dependencies, for secrets scanning, and for any security control where the rule is checkable but the per-PR human-review cost has become prohibitive.

As always, if you’d like to chat more about solving security problems, AI, APIs, or tech in general, hit me up on LinkedIn. My DMs are always open!