BSides Fort Wayne 2026 • 10:00–11:00 AM • International Ball Room • Jim Haist
"I didn't engineer this. I had a conversation with it."
A support specialist — heading into a night-shift Data Center Tech role at the AWS site in New Carlisle, grinding through a Cloud Computing degree at WGU. $200/month. Three enterprise servers already in the house. Over roughly three months, starting from basically nothing, that conversation became a production security platform on OpenShift: Kubernetes, a secrets vault, single sign-on, active threat blocking, GitOps end to end. On paper, the guy you hand a ticket or a password reset to. The whole point is that the barrier just came down.
Section 01 — The Real Enemy
Most security talks pick the wrong enemy — the nation-state genius in a hoodie. The data says otherwise. Verizon's latest breach report covers tens of thousands of real incidents: about 62% still involve a human element. For nineteen years stolen credentials were the number-one way in; this year, for the first time, exploiting software vulnerabilities edged them out. The door changed. The story didn't — a person still walked through it, and the basics that stop that still weren't running.
Those basics — MFA, email filtering, patching, someone actually reading the logs — don't get skipped because they're hard to understand. They get skipped because they carry too much friction: too many steps, too much time, and they need someone to own a system when nobody has the bandwidth.
"AI made security frictionless. And once the friction's gone, staying exposed isn't a constraint — it's a choice."
The barriers were always budget and expertise. The rest of this is a tool erasing both, in real time.
"I built a defensive platform. Someone with worse intentions and the same $200 could build offensive infrastructure just as easily. AI doesn't check your motives."
Keeping it out was never the right play. It's getting too good, too fast, and everybody else is already using it. The only question left is what you're going to do with it.
Section 02 — The Build
It did not make me a Kubernetes master: sit me at a terminal with no agent and I can't type the kubectl commands cold. What it gave me is the shape: the control plane, the worker nodes, how the cluster scales. I directed and watched. That's not operator knowledge. It's enough to guide the build and catch the tool when it's wrong, which turns out to be the only kind of knowledge that matters here. Everything below is verified operational, in its real mode.
New to these tool names? The plain-English glossary in Section 08 explains every one.
The platform lives in 12 repos on a private Forgejo (commit counts live, June 2026). Eight of them are now published — scrubbed and public — on GitHub under the Sentinel-PoC org: the framework payload plus the real IaC, GitOps, and compliance repos. Links below.
Ansible (56 roles), Terraform, CI pipelines, compliance docs. The backbone.
OKD cluster config, Terraform VMs, agent role definitions.
ArgoCD manifests for 52 apps, Backstage catalog.
OSCAL assessments, automated daily evidence collection.
SSP (763 lines), SAR (1,010 lines), POA&M (606 lines), audit artifacts.
Custom management UI — Python backend + TypeScript/React frontend.
Consulting website.
Developer portal config.
UniFi network API integration, firewall migration.
Agent roles, 11 hooks, Langfuse integration.
Session context cache, platform state snapshots.
Ongoing agent runtime development.
The real platform, published clean — sanitized snapshots (lab IPs → 192.0.2.x, hostnames → example.com, client references removed); the structure, code, and compliance artifacts are real. Start at the org landing page for the guided tour.
Agent operating framework — CLAUDE.md, role definitions, assertion rules. The governance layer that makes autonomous agents auditable.
Documented verification catches: what the Judge caught, what the Worker claimed, what the evidence showed.
Reusable skill files — knowledge packages the agents load on demand.
Runtime guardrails — pre-commit / pre-push hooks + installer that gate every agent commit.
The backbone — Ansible (56 roles), Terraform, CI pipelines, compliance docs.
OKD cluster config, Terraform VMs, agent role definitions.
ArgoCD manifests for 52 apps + Backstage catalog.
OSCAL SSP / SAR / POA&M + automated NIST 800-53 evidence collection.
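The sanitization rule described for the public snapshots (lab IPs to 192.0.2.x, hostnames to example.com) can be sketched roughly as below. This is an illustration, not the platform's actual scrub tooling: the `lab.internal` domain, the choice of RFC 1918 ranges as "lab IPs", and the function names are all assumptions made for the example.

```python
import ipaddress
import re

# Assumption: "lab IPs" means the RFC 1918 private ranges.
PRIVATE_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
# Hypothetical internal domain; the real one is not given in the text.
INTERNAL_HOST = re.compile(r"\b(?:[a-z0-9-]+\.)*lab\.internal\b", re.IGNORECASE)
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _scrub_ip(match: re.Match) -> str:
    """Map a private IPv4 address into 192.0.2.0/24, keeping the last octet."""
    text = match.group(0)
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return text  # looked like an IP but is not a valid one; leave it alone
    if any(addr in net for net in PRIVATE_NETS):
        return f"192.0.2.{int(addr) & 0xFF}"
    return text


def scrub(text: str) -> str:
    """Replace private IPs and internal hostnames with documentation values."""
    text = IPV4.sub(_scrub_ip, text)
    return INTERNAL_HOST.sub("example.com", text)


if __name__ == "__main__":
    print(scrub("ssh admin@10.0.5.17  # vault.lab.internal"))
```

Keeping the last octet preserves some readability but can collapse two distinct hosts onto one address; a real scrub would likely need a stable mapping table and, as with any scrub, only covers the files and history it is actually pointed at.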
Section 03 — The Cost
"Cheap" is a feeling; here's the arithmetic. In about a month, $200 of subscription did roughly $9,861 of compute at retail prices — call it ~49× the value. 16.3 billion tokens. 129,318 model calls. One person, part-time, on a consumer plan.
Careful with this number, because it's the kind that gets abused: that's retail-equivalent value, not money that changed hands. The cash was $200. The point isn't "genius investor" — it's leverage: the ratio between what one person can now direct and what it costs them moved by an order of magnitude. Argue the comparison if you want; the ratio is the point.
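The arithmetic behind the ratio is short enough to write down. The sketch below uses only the figures quoted above; the token count is the rounded 16.3 billion from the text, so the per-call averages are approximate, and the variable names are mine.

```python
# Figures as quoted in the text (the token count is rounded there).
SUBSCRIPTION_USD = 200.00
RETAIL_EQUIVALENT_USD = 9_861.00
TOKENS = 16_300_000_000
MODEL_CALLS = 129_318

# Retail-equivalent value per dollar of cash actually spent.
ratio = RETAIL_EQUIVALENT_USD / SUBSCRIPTION_USD

# Rough averages per model call, at retail-equivalent pricing.
retail_usd_per_call = RETAIL_EQUIVALENT_USD / MODEL_CALLS
tokens_per_call = TOKENS / MODEL_CALLS

if __name__ == "__main__":
    print(f"leverage ratio: {ratio:.1f}x")  # prints "leverage ratio: 49.3x"
    print(f"retail-equivalent USD per call: {retail_usd_per_call:.4f}")
    print(f"average tokens per call: {tokens_per_call:,.0f}")
```

The ratio compares list-price value to cash spent; it says nothing about what the same work would have cost under a different pricing plan.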
Security work is labor — not rare materials, not exotic hardware. Skilled, boring, repetitive hours watching logs and checking configs. The small shop never had security because those hours were too expensive to buy. Drop the cost of those hours ~49× and you don't make security a little cheaper — you change who's allowed to have it.
That word in the title — democratized — is literal. Security has always been a luxury good: the more you could spend, the safer you got to be. When doing it right costs $200 instead of $4 million, safety stops being something you buy and becomes something you can just have. The floor didn't drop a notch — it fell out, and everyone who got priced out is suddenly standing inside.
"You can't sit in a boardroom and say you accept the risk because the remediation costs two hundred dollars. That's indefensible. The cost excuse is dead." — Jim Haist
Section 04 — The Receipts
Claude Code transcripts log every model call end-to-end. The cost/usage numbers below cover a ~34-day window (April 30 – June 3, 2026) — the complete local record (every call, including subagents). The commit counts are from the live platform (queried June 2026); the platform has been in active daily development since January 2026.
Cost/usage from Claude Code transcripts, ~34-day window (Apr 30–Jun 3, 2026), priced at Anthropic list rates — $200 of subscription bought ~$9,861 of retail-equivalent compute (~49×). Retail-equivalent, not cash spent. The live commit total reflects ongoing development — still building daily.
Section 05 — The Catch
"It doesn't lie to you — lying takes intent, and it has none. It's probability. It hands you its best guess in the exact same confident voice whether that guess is right or dead wrong — because it can't tell the difference, and at a glance, neither can you. No malice. No tell. That's what makes it dangerous." — Jim Haist
In security that's not a quirk you can shrug off. The whole job is knowing what's true about your own systems — and this tool will, with the exact same confidence it gets things right, hand you a beautiful, completely made-up picture of a system that doesn't exist.
I had the agent set up Kyverno policies. It applied them and reported back: done, policies active, cluster secured. So I did what I always do — "verify it; show me it's actually blocking." And the part that matters: it doesn't verify by re-reading the config it just wrote. It tests against the live cluster. It deployed a pod that violates the policy — and the pod came up clean. Should've been blocked. Wasn't. The policy was in Audit mode, not Enforce. Audit-first is correct — that's how you stage it. The policy wasn't wrong; the report was. One word — "secured" — papering over the gap between what it built and what it claimed.
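The gap in this catch has two halves: what the policy manifest says it will do, and what the live probe showed. A minimal sketch, assuming the policy has already been loaded as a Python dict (for example from YAML) and that it uses Kyverno's `validationFailureAction` field at the spec level or a per-rule `failureAction` under `validate`. Field placement varies by Kyverno version, the Audit default is my assumption, and the function names and sample policy are hypothetical.

```python
def effective_actions(policy: dict) -> list[str]:
    """Return the failure action for each validate rule in a policy dict."""
    spec = policy.get("spec", {})
    # Assumption: when nothing is set, the policy behaves as Audit.
    default = spec.get("validationFailureAction", "Audit")
    actions = []
    for rule in spec.get("rules", []):
        validate = rule.get("validate")
        if validate is None:
            continue  # mutate/generate rules do not block admission
        actions.append(str(validate.get("failureAction", default)).capitalize())
    return actions


def blocks_violations(policy: dict) -> bool:
    """True only if every validate rule is in Enforce mode."""
    actions = effective_actions(policy)
    return bool(actions) and all(action == "Enforce" for action in actions)


def report_is_consistent(claimed_secured: bool, probe_pod_admitted: bool) -> bool:
    """A 'secured' claim is contradicted if a violating pod was admitted."""
    return not (claimed_secured and probe_pod_admitted)


# Hypothetical policy shaped like the one in the story: present, but Audit.
SAMPLE_POLICY = {
    "kind": "ClusterPolicy",
    "spec": {
        "validationFailureAction": "Audit",
        "rules": [{"name": "example-rule", "validate": {"message": "example"}}],
    },
}

if __name__ == "__main__":
    print("blocks violations:", blocks_violations(SAMPLE_POLICY))
    print("report consistent:", report_is_consistent(True, True))
```

Reading the manifest only tells you what the policy is configured to do; the second function is the part that matters, because its input comes from deploying a violating pod against the live cluster rather than from the file.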
Veracode gave ~80 coding tasks to 100+ models and scanned the output: about 45% had a known, serious vuln — barely moved in two years, no matter how big or fast the models got. The catch: that's raw output — no security prompt, no review, what the model hands you when you never mention security. That's the before, and you don't ship the before. Veracode's own takeaway: validate it, scan it, review it.
"Here's what I genuinely can't answer: is this platform actually good? That kind of judgment comes from scar tissue I don't have." Compliance docs might say compliant. Wazuh might confirm. Neither validates an architectural decision. If someone at BSides pokes holes in this, it doesn't embarrass me — it proves the thesis in real time.
And here's where half the industry takes the wrong lesson. They hear "45%" and reach for the lock — ban it, wall it off, wait for it to get safe on its own. The data says it isn't getting safer, and everybody else is adopting it anyway. You don't get safety by keeping it out. You get safety by learning to drive it.
"AI didn't make me an expert. It made the expert optional — as long as somebody stays in charge of what's true. You can't let it decide. But you can work with it, and keep the final call for yourself."
The platform engages the NIST 800-53 Rev5 Moderate baseline across all 20 control families. It maintains formal OSCAL SSP, SAR, and POA&M documents (2,379 combined lines). A 105-check automated compliance script runs daily, emitting scored JSON output against the engaged control families.
The platform also runs an independent verification layer outside the AI build chain. During pre-talk audit, that layer caught the platform's own AI-generated compliance docs overstating capabilities — labeling controls as implemented that are not fully operational on the live platform.
Those docs are in active reconciliation. The detailed compliance exhibit — control-by-control, with honest pass/warn/fail/hold status — will be published after remediation is complete.
This incident is the thesis of the talk made literal: AI generated perfect-looking compliance evidence for a platform it helped build. Independent verification caught it. That's why the independent layer exists. That's why it's the technical core of this talk.
Section 06 — Validation
The answer to "how can an autonomous AI agent be safe enough to touch production?" is not "the AI is aligned." It's: the agents don't trust each other, and that distrust is enforced mechanically at the token layer and procedurally at the workflow layer. NIST AC-5 (separation of duties) — applied to AI agents.
Each agent role is its own OS-level user. A SPIRE server attests that Unix identity and issues a JWT-SVID that expires in five minutes (trust domain agents.haist.farm; X509-SVID on a one-hour TTL).
The agent exchanges that JWT with Vault for a scoped token; Forgejo enforces the last wall. sentinel-worker: push, open PRs — cannot merge (403). sentinel-judge: review and merge — can't push. The worker physically cannot approve its own work. Not "shouldn't." Cannot.
SPIRE 1.14.6 — server, agent, OIDC discovery provider running as containers on the IaC host. Enforced at the API layer, not the prompt layer; every exchange in Vault's audit log. The separation is one grep from verifiable.
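The separation of duties described here reduces to a small permission table with one invariant: no role may both push and merge. The sketch below models that table in plain Python. It is an illustration of the rule, not the Forgejo or Vault configuration itself, and returning 403 for the judge's push is my assumption (the text only states 403 for the worker's merge).

```python
from enum import Enum


class Action(Enum):
    PUSH = "push"
    OPEN_PR = "open_pr"
    REVIEW = "review"
    MERGE = "merge"


# Role names and allowed actions as described in the text.
ROLE_PERMISSIONS = {
    "sentinel-worker": {Action.PUSH, Action.OPEN_PR},
    "sentinel-judge": {Action.REVIEW, Action.MERGE},
}


def authorize(role: str, action: Action) -> int:
    """Return an HTTP-style status: 200 if allowed, 403 otherwise."""
    return 200 if action in ROLE_PERMISSIONS.get(role, set()) else 403


def can_self_approve(role: str) -> bool:
    """A role could approve its own work only if it can both push and merge."""
    perms = ROLE_PERMISSIONS.get(role, set())
    return Action.PUSH in perms and Action.MERGE in perms


def separation_holds() -> bool:
    """The invariant: no single role holds both push and merge."""
    return not any(can_self_approve(role) for role in ROLE_PERMISSIONS)
```

Checking the invariant as code means a later change that quietly grants merge to the worker fails a test instead of relying on someone noticing.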
Worker proposes. Judge verifies independently — different session, different brief, different code path. The Judge re-fetches state independently and does not trust the Worker's claims.
This catches the most dangerous AI failure mode: an agent rationalizing away its own mistake because it's invested in its own work.
Worker added AGENT-STATE.md with git add -f, violating §9 of CLAUDE.md. Judge ran with a clean brief, hit the .gitignore check, refused to merge. A single-actor system would have shipped this silently.
Worker said 'CI green.' Judge pulled the actual CI status and found the green was on the previous commit, not the head SHA. The backstage pipeline had been silently failing for 19 days. Caught by re-query, not by trusting the Worker's text.
A scribe agent wrote a playbook using community.hashi_vault.vault_delete — a module that does not exist in that collection. CI went red. On a passing CI run it would have silently no-op'd on apply. Caught by the CI gate, not by the author.
DR-site geolocation was removed from current files before the public mirror publish. A full-history scan caught it still present in git history. Current-file scope would have shipped it. Caught by widening the scan perimeter.
A compliance scrub covered one subdirectory. Judge re-ran the check repo-wide and bounced it for strings it missed outside that directory. Worker's scope was correct for its brief; the brief was too narrow. Caught only by independent re-check.
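The stale "CI green" catch above comes down to one rule: a status only counts if it is attached to the head commit. A minimal sketch of that rule, assuming the CI statuses have already been fetched from the forge as (SHA, state) pairs; the `CiStatus` type and the state strings are hypothetical, not a Forgejo API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CiStatus:
    sha: str    # commit the status is attached to
    state: str  # assumed values: "success", "failure", "pending"


def ci_green_for_head(head_sha: str, statuses: list[CiStatus]) -> bool:
    """Green only if the head commit has statuses and all of them succeeded."""
    for_head = [s for s in statuses if s.sha == head_sha]
    # No status on the head commit is NOT green, even if older commits passed.
    return bool(for_head) and all(s.state == "success" for s in for_head)


if __name__ == "__main__":
    history = [
        CiStatus(sha="aaa111", state="success"),  # previous commit: green
        CiStatus(sha="bbb222", state="failure"),  # head commit: red
    ]
    print(ci_green_for_head("bbb222", history))
```

The important branch is the empty one: a head commit with no status at all is treated as not green, which is the case a text claim of "CI green" tends to hide.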
Give it a contained place to be wrong where being wrong is cheap — a runtime hook that blocks writes before they execute, an identity that expires in five minutes and only reaches its own lane. Enforced beats hoped-for. The one time I forgot all of it, a model wiped my repos.
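One way to picture "only reaches its own lane" is a guard that runs before any write executes. A minimal sketch, assuming writes are expressed as repo-relative POSIX paths; the role-to-directory map is invented for illustration and is not the platform's actual hook.

```python
import posixpath

# Hypothetical lanes: each role may write only under its own prefixes.
LANES = {
    "worker": ("src/", "tests/"),
    "scribe": ("docs/",),
}


def is_write_allowed(role: str, path: str) -> bool:
    """Allow a write only if the normalized path stays inside the role's lane."""
    normalized = posixpath.normpath(path)
    # Reject absolute paths and anything that climbs out of the repo root.
    if normalized.startswith("/") or normalized == ".." or normalized.startswith("../"):
        return False
    candidate = normalized + "/"
    return any(candidate.startswith(prefix) for prefix in LANES.get(role, ()))


if __name__ == "__main__":
    print(is_write_allowed("scribe", "docs/runbook.md"))
    print(is_write_allowed("scribe", "docs/../src/main.py"))
```

Normalizing before the prefix check is the design point: without it, a path such as `docs/../src/main.py` passes a naive "starts with docs/" test while landing outside the lane. Unknown roles get no lane at all, so the default is deny.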
The best AI output isn't AI you trust — it's a script you can read. Don't ask the AI to be your nightly security check. Ask it to write the check, once; read it, verify it, then run that — predictable, forever. Move the trust onto the predictable thing.
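"A script you can read" might look something like the sketch below: a fixed list of checks, each a plain function returning pass or fail, rolled up into scored JSON. This is a shape, not the platform's 105-check script; the check IDs and the lambda probes are placeholders, and real probes would query live systems.

```python
import json
from typing import Callable, Iterable

# (check id, control family, probe); a probe returns True on pass.
Check = tuple[str, str, Callable[[], bool]]


def run_checks(checks: Iterable[Check]) -> dict:
    """Run every check once and return a scored, JSON-serializable report."""
    results = []
    for check_id, family, probe in checks:
        try:
            status = "pass" if probe() else "fail"
        except Exception as exc:  # a probe that crashes must never count as a pass
            status = f"error: {exc}"
        results.append({"id": check_id, "family": family, "status": status})
    passed = sum(1 for r in results if r["status"] == "pass")
    score = round(passed / len(results), 3) if results else 0.0
    return {"total": len(results), "passed": passed, "score": score, "results": results}


# Placeholder checks for illustration only.
EXAMPLE_CHECKS: list[Check] = [
    ("AC-EXAMPLE-1", "AC", lambda: True),
    ("CM-EXAMPLE-1", "CM", lambda: False),
]

if __name__ == "__main__":
    print(json.dumps(run_checks(EXAMPLE_CHECKS), indent=2))
```

The same inputs always produce the same report, which is the property the paragraph above is asking for: the model writes the checks once, a person reads them, and the nightly run involves no model at all.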
Never check whether a system's secure by reading the config that swears it is — the paperwork is the thing most likely to be lying. Deploy the thing and watch what happens. Query the live box. Don't ask the file. (The Kyverno catch, generalized.)
I can't personally audit every line those agents wrote — some of it is past my own ceiling. So how do I sleep? Not on faith. On convergence: hundreds of independent agent sessions, different context each time, kept landing on the same answer — this works. That's not proof. It's evidence. And honest evidence I can defend beats a clean claim I can't.
And I'll name the loop instead of hiding it: the agent also wrote a lot of the documentation future sessions read to understand the environment — so some of the time it's checking its own homework. That gap between theory and practice is in every shop running this tool. Equifax had a security department. SolarWinds had a security department. Every big breach of the last decade happened at a place sure its implementation matched its design — and was wrong. The expensive shops have the same gap and a nicer brochure over it. I just got here for $200 instead of $4 million.
A daily service runs ansible-playbook --check --diff across platform roles, emitting results to a log that Wazuh monitors. Ansible scans in check-mode against Git; Wazuh forwards drift alerts. Five of six playbooks are covered.
The sixth — the iac-control role — silently failed for 19 consecutive days (first entry: 2026-05-09T08:00:05Z; confirmed still failing 2026-05-28T08:00:02Z). Pre-talk audit caught it.
"The pattern is real, the gap is real, and the gap proves the pattern works on the other five."
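Evidence: /var/log/sentinel/drift-detection.log shows 20 consecutive CHECK_FAILED entries, one per daily run from 2026-05-09 through 2026-05-28 (19 days between the first and last entry). Ansible does the declared-vs-live comparison; Wazuh forwards the alerts. These are distinct roles.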
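A silent 19-day failure is the kind of thing a streak counter over the drift log would surface early. A minimal sketch, assuming one line per run in the form `<timestamp> <playbook> <status>`; that line format and the `CHECK_OK` status are assumptions for illustration, since only `CHECK_FAILED` and the log path appear in the text.

```python
def trailing_failures(lines: list[str]) -> dict[str, int]:
    """Count consecutive CHECK_FAILED entries at the end of each playbook's history."""
    streaks: dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        _timestamp, playbook, status = parts
        if status == "CHECK_FAILED":
            streaks[playbook] = streaks.get(playbook, 0) + 1
        else:
            streaks[playbook] = 0  # any other result resets the streak
    return streaks


def stuck_playbooks(lines: list[str], threshold: int = 3) -> list[str]:
    """Playbooks whose most recent runs have failed at least `threshold` times in a row."""
    return sorted(p for p, n in trailing_failures(lines).items() if n >= threshold)


if __name__ == "__main__":
    sample = [
        "2026-05-09T08:00:05Z iac-control CHECK_FAILED",
        "2026-05-09T08:00:09Z example-role CHECK_OK",
        "2026-05-10T08:00:04Z iac-control CHECK_FAILED",
        "2026-05-11T08:00:03Z iac-control CHECK_FAILED",
    ]
    print(stuck_playbooks(sample))
```

The threshold of three runs is arbitrary; the point is that a repeated failure becomes its own alert rather than one more line in a log nobody is reading.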
Section 07 — Where It Lives
The platform ran on Gemini 3.0 Pro when the abstract was submitted. It now runs on Claude Opus 4.8. Model advancement plus the worker/judge scaffolding are starting to address the mistake modes the abstract names. Not solve. Starting to address. The scaffolding (token-policy enforcement, worker/judge separation, CLAUDE.md operating framework) is the durable, inspectable answer regardless of which model you run.
CLAUDE.md is the operating framework — agent roles, assertion rules, file ownership, worker/judge separation, the compliance pipeline. One Markdown file. In the repo. Open to inspection. The thesis of this talk isn't "AI is magic" — it's "AI is a tool, and here is what it takes to use it responsibly."
Keycloak is fully decommissioned. Authentik is the live SSO IdP for all apps. Agent-to-Vault auth migrated to SPIRE x509pop: agents attest via kernel-held cert, no bearer token, same SPIFFE ID survives restarts. Proven by restart-survival test — all 4 agent roles (worker/judge/planner/scribe) re-attested without intervention.
The honest current-state map — what's actually operational today, with the SAR control scores that back each one.
Indiana. Three Proxmox hypervisors on one LAN. 3-node OKD 4.21 cluster. All production workloads. The build happened here.
Different county, 30–100 mi, separate utility grid. 1G symmetric fiber, natural-gas generator + UPS, UniFi site-to-site VPN. Networking and power are provisioned; compute (a Proxmox node) is deploying, weeks out. SAR: CP-7 Partial (alt-processing compute pending), CP-4(2) Non-Compliant (failover never tested). Gaps are on the POA&M.
Geographic offsite. Different facility, utility grid, and geographic region from the primary site. SAR: CP-6 Compliant. Encrypted backup storage.
The proposal I submitted to speak at this conference said the platform has "three physical sites." It does not. One is in production. One is a room with power and network run but the compute still going in — the DR site, offsite backup already wired. A third is planned, not built. My own pitch ran ahead of my own build — present tense for something still in the future. That's the exact drift this talk is about, in my own words, about my own work. The only reason I can correct it clean instead of getting caught in it is the discipline above: I check against reality, so I caught it. In the SAR that honesty is load-bearing — CP-7 Partial (alt-processing compute pending), CP-4(2) Non-Compliant (failover never tested), tracked as Finding F-005, opened during the pre-talk audit.
Section 08 — Glossary
The stack above is a wall of tool names, and most of them mean nothing unless you already live in this world. Here's the plain-English version — what each thing is, in one sentence, no jargon. If a name above lost you, it's defined here.
Section 09 — Close
Let me demystify the thing, because the hype cuts both ways. "AI is a really good dictaphone bolted to a really good encyclopedia." It listens to what you say, and it knows a lot of stuff. Everything else is the implication of those two being very good at the same time. But you still have to be specific — "my security feels wrong" isn't a ticket, it's a vibe. The skill is turning intent into direction.
"AI isn't a nail gun that does the same thing for everyone. It's a paintbrush. Give it to someone with drive and curiosity and you get a security platform. Give it to someone who just wants the ticket closed and you get a closed ticket. Same tool, completely different output. The tool isn't the variable. You are."
"You're the validation layer. That's the scarce skill, and most of you already have it. AI didn't replace it — it removed everything around it. You can look at a config and know it's wrong; read a control and know it's not actually met. That judgment took years to earn — and it was never the bottleneck. The bottleneck was time and money, and that just went to zero."
$400–500k/year + 6–9 months vs. $200/month — in active daily development since January 2026. Production security infrastructure assembled by one IT professional with a Claude subscription and hardware he already owned. The expertise barrier that priced most organizations out of real security posture is collapsing. The cost excuse is dead.
Cost — expertise barrier collapsing
Human friction — no corner-cutting at 2 AM
Confident wrongness — invisible without independent verification
"Every conversation about AI in security is stuck on the wrong question — how do we contain it, keep it out. It's getting better too fast and spreading too wide for that to be a strategy. The only question that's ever moved anything forward is the one you're equipped to answer: what are we going to do with it? The cost excuse is dead. You're the validation layer. Go build something." — Jim Haist, The Close
Figures sourced from internal platform telemetry, commit history, and security audits. Internal network details redacted.