Practical AI Workshops for Software Teams

Cut through AI noise. Ship engineering, not party tricks.

Updates

Your team is already using coding agents every day - Copilot, Claude Code, Cursor, Codex. Individual engineers are getting better, unevenly. But the practice around the practice is missing: specs don't hold up under non-determinism, there are no evals to catch regressions, and there's no shared model of what to delegate and what to own.

The result: lots of code, unclear quality signals, and no honest answer for leadership on whether AI is actually helping.

These three workshops close that gap. They're grounded in the XP and Agile fundamentals that ship reliable software for a living, re-applied to a technology where the thing you're building is itself unpredictable. No 10x promises. Durable practice that earns trust.

Who these workshops are for

Working software teams - mid-level, senior, and staff engineers; tech leads; engineering managers; product and design partners who pair with engineering. Mixed-seniority cohorts are welcome. The sweet spot is 6–16 attendees.

Not the right fit: absolute beginners, developers who haven't yet shipped production code, and non-technical executives - a separate briefing format suits them better.

Prerequisites: comfortable reading Python (or another mainstream language); have used an LLM coding assistant for a few months; know HTTP, JSON, git, and a terminal. Combo 3 additionally assumes Python asyncio and basic TypeScript + React.

What every attendee walks out with

Four durable takeaways from every workshop:

01 · Mental Model: A clear picture of what an agent actually is

Loop, context, tools, LLM - not a slide, not a metaphor. The four pieces and how they interact, named the same way across the team.

02 · Durable Practices: Spec patterns, evals, delegation, review loops

The engineering discipline that turns agent output into something a team can trust - and that survives staff churn and model upgrades.

03 · Concrete Artefacts: Working agent, rewritten spec, team playbook

You leave with built things - a small Python agent, a real backlog item rewritten with teeth, or a production-readiness checklist for your stack.

04 · A Point of View: Honest framing for leadership

Grounded in engineering fundamentals, with the research numbers - enough to brief executives without hype or fatalism.

The Three Combos

The workshops are designed as three standalone combos that can be taken individually or combined.

Combo 01 · Agentic Engineering for Teams (Intro)

One-day workshop · R 31,500 for 6–10 attendees · +R 2,300 per extra attendee

Lightweight and practice-first. Builds a shared mental model of what an agent actually is - loop, context, tools, LLM - and applies it to the work the team is already doing.

Right for: any team using coding agents daily but lacking a shared model for how.

You'll leave able to:

Explain the agent loop - LLM, context, tools, observation - without a slide
Build a tiny Python agent and run it locally
Sort tickets into delegate / delegate-review / own yourself with confidence
Rewrite a user story to include performance thresholds, graceful degradation, and failure modes
Name the perception vs reality productivity gap, with the numbers to back it up

Topics covered:

Opening framing - vibe coding to agentic engineering, with the honest research numbers
Live build of a ~150-line Python agent, mirroring Thorsten Ball's canonical reference
The delegation framework, applied to your actual sprint tickets
Specs with teeth - rewriting a real backlog item together
The 5-step hypothesis cadence, and context discipline

Combo 02 · Advanced Agentic Engineering

Three-day workshop (recommended) · R 91,000 for 6–10 attendees · +R 6,600 per extra attendee Also available as two-day · R 63,000 · +R 4,550 per extra attendee

Specs, evals, review loops, team patterns, governance - the engineering discipline that turns agents into something you can trust in production.

Right for: engineering teams with at least one senior engineer in the room, past the "we use Copilot" phase, running into real questions - how to review AI-generated code at scale, keep agents honest in production, and not get hacked.

You'll leave able to:

Write full AI user stories with measurable acceptance criteria - and turn those criteria into running evals
Stand up a basic CI/CD/CE pipeline where a failing eval blocks a merge
Apply the plan/execute split and the writer/reviewer loop in team exercises
Use Claude Code sub-agents (context isolation) and agent teams (cross-session coordination) appropriately - they're not the same thing
Produce a first-draft agent inventory and least-privilege access matrix
Brief leadership honesty on AI productivity

Topics covered:

Lightweight spec-driven development via a Claude Code skill + CLAUDE.md; SpecKit shown as the cross-vendor alternative
Hands-on evals with DeepEval, wired into a CI pipeline that blocks a regressing commit
Writer/reviewer loop exercise; plan/execute split across Opus and Haiku
Multi-agent patterns (sub-agents, agent teams, MCP) with the CooperBench caution - multi-agent is not automatically better
Team-shape workshop: conductor model, centaur pod, evolving role definitions
Security, governance, shadow-agent inventory; MCP as the cross-vendor standard

Combo 03 · Building Agentic Backends & Chat UIs

Two-day workshop · R 63,000 for 6–10 attendees · +R 4,550 per extra attendee

From "I've called an LLM API" to a deployed agentic backend with a streaming chat UI, evals, tracing, and guardrails.

Right for: backend and full-stack engineers embedding AI features into a product. Python + TypeScript (React + Vite) comfort assumed.

You'll leave able to:

Implement an agent loop from scratch in Python - no framework
Design tool schemas an LLM uses reliably; handle parallel tool calls and strict mode
Stream tool-call activity and text to a React chat UI (SSE)
Integrate MCP - consuming an existing server, and exposing your own tools
Use prompt and context caching with awareness of TTL and invalidation
Build evals covering tool-use correctness, task success, and catastrophic failures
Instrument tracing and debug a broken agent run from a trace
Mitigate prompt injection, runaway loops, tool misuse, and data exfil
Deploy a containerised agent and reason about session state, concurrency, and cost

Topics covered:

Agent loop from scratch - "an LLM, a loop, and enough tokens"
Streaming backend in FastAPI; chat UI in Vite + React rendering tool-call activity
MCP (spec 2025-11-25): primitives plus Elicitation as a first-class client capability
Anthropic prompt caching vs Gemini context caching - the cost-shape comparison
DeepEval in CI; Opik for production tracing; Inspect AI for trajectory evals at scale
Two running-artefact tracks - codebase Q&A + stubbed PR drafting, or helpdesk with escalation - chosen to fit your domain

Delivery & logistics

Format: On-site preferred. Remote with working video and unmuted discussion is fine.

Cohort size: 6–16 attendees. Larger cohorts need a second facilitator. On-site cohorts can go up to 20.

Customisation: Every combo includes a pre-engagement call to tune examples to your stack and pick the right running-artefact track for Combo 3.

Pricing & terms

All pricing is in South African Rand, exclusive of VAT. Base rate covers 6–10 attendees.

Deposit: 40% on booking. Balance due within 14 days after delivery.
Early-pay discount: 10% off when the full fee is paid in advance.
On-site travel: Travel and accommodation billed at cost, agreed up front.
Quotes valid for 60 days from the date of issue.

Let's talk

A 30-minute call is usually enough to scope the right combo for your team. Tell us how your team uses coding agents today and where it's breaking — we'll come back with a tailored combo and a date.

workshops@octoco.ai · view the full PDF

Ground truth over hype. Fundamentals over fashion. Working software over wishful thinking.