Josh Kappler

I build autonomous AI agents.

Nine shipped AI agent systems, all solo. I write the orchestration layer myself. Tool loops, state machines, memory, multi-provider routing. No LangChain, no CrewAI. Also grew a YouTube channel to 2.1M subscribers, which is where I learned to stick with long, messy projects until they work.

01 / Projects

What I have built

Everything here was built from scratch. I write the orchestration layer myself. No LangChain, no CrewAI, no agent frameworks.

01 Deal Analysis + Investment Memo Platform

memo-engine

Built for a private credit investment firm under NDA. Ingests messy deal documents (PDFs, Excel models, Word drafts, .msg emails) and produces institutional-grade investment memos through a six-step human-in-the-loop pipeline. Every claim in the output is cited back to the source file and location. Analysts iterate with the AI through a chat review gate, with voice input via Whisper, before advancing to the next stage.

Next.js 16 · TypeScript · Anthropic SDK · Postgres · pgvector · Voyage AI · Vercel Workflow DevKit

Technical Details

  • Contextual retrieval: Sonnet 4.6 writes a context prefix for each chunk against the full document. The first 400K characters are cached via ephemeral prompt caching, so calls after the first read them at the $0.30/M cached rate
  • Voyage AI voyage-3 embeddings (1024-dim), batched by byte budget (≤400KB, ≤96 items) to respect the 320K-token-per-batch cap on dense financial text
  • Forced tool_use with Zod-to-JSON-schema for ~40-field structured extraction: credit snapshot, capital structure, financials, covenants, management, comps, scenarios
  • Durable pipeline orchestration via Vercel Workflow DevKit: parse, analysis, research, internal memo, and external memo each run as a step with its own 800s budget
  • Multi-format export: PDF via @sparticuz/chromium + puppeteer-core (Vercel-compatible headless Chromium), Excel with ExcelJS formulas and sensitivity tables, DOCX, ZIP bundle
  • Multi-user auth with bcrypt (cost 12), JWT sessions via jose, and an admin approval gate, plus a legacy shared-password fallback for continuity
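
A minimal sketch of the byte-budget batching mentioned above, assuming byte length is used as a cheap proxy for token count. The function name, constants, and the choice to give an oversized text its own batch are illustrative assumptions, not taken from memo-engine.

```typescript
// Hypothetical helper: groups texts into batches that stay under both a
// byte budget and an item-count cap. Defaults mirror the figures quoted above.
const MAX_BATCH_BYTES = 400 * 1024; // ≤400KB per batch
const MAX_BATCH_ITEMS = 96; // ≤96 items per batch

function batchByByteBudget(
  texts: string[],
  maxBytes: number = MAX_BATCH_BYTES,
  maxItems: number = MAX_BATCH_ITEMS,
): string[][] {
  const encoder = new TextEncoder();
  const batches: string[][] = [];
  let current: string[] = [];
  let currentBytes = 0;

  for (const text of texts) {
    const size = encoder.encode(text).length; // UTF-8 byte length
    const wouldOverflow =
      current.length > 0 &&
      (currentBytes + size > maxBytes || current.length >= maxItems);
    if (wouldOverflow) {
      // Close the current batch and start a new one.
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    // Assumption: a single oversized text still gets its own batch
    // rather than being dropped or split.
    current.push(text);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each returned batch would then go out as one embedding request. Counting bytes avoids running a tokenizer locally, at the cost of a looser bound than an exact token count.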
02 Autonomous Security Agent

autohack

GitHub

A 5-package TypeScript monorepo that polls four bounty platforms, spawns hour-long Claude sessions to hunt for vulnerabilities, validates its own findings through adversarial review, and submits reports without human intervention. A separate Sonnet pass compresses verbose findings before submission. The system writes hunt outcomes, near-misses, and triager feedback to a JSON memory store so every future session starts with context from every past one.

TypeScript · Anthropic SDK · Next.js 15 · SQLite · Drizzle · tRPC

Technical Details

  • 12-state finding lifecycle from discovery through submission across HackerOne, Immunefi, Huntr, and an aggregator covering Bugcrowd, Intigriti, and YesWeHack
  • Adversarial review: a separate Claude instance scores findings on a 0-15 binary rubric. Anything below 8 is rejected before it reaches a triager
  • Ephemeral prompt caching cuts input tokens by roughly 90% across repeated hunt sessions, with a local backend fallback for development
  • Cross-process coordination via lock files, shared runtime-override JSON with a 2-second TTL cache, and stale-PID detection on startup
  • Error classification (transient, permanent, validation, timeout) decides whether to retry, skip, or kill the hunt
  • Real-time tRPC dashboard with xterm.js terminal streaming live Claude tool calls and reasoning
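
A minimal sketch of how the four error classes above could map onto retry, skip, or kill. The classification rules (HTTP status codes, the ETIMEDOUT code), the class-to-action mapping, and the attempt limit are all assumptions for illustration; autohack's actual rules are not described on this page.

```typescript
type ErrorClass = "transient" | "permanent" | "validation" | "timeout";
type HuntAction = "retry" | "skip" | "kill";

// Hypothetical error shape; real errors would come from the SDK or platform APIs.
interface HuntError {
  status?: number;
  code?: string;
  message: string;
}

// Assumed rules: timeouts first, then rate limits and 5xx as transient,
// malformed-request statuses as validation, everything else as permanent.
function classifyError(err: HuntError): ErrorClass {
  if (err.code === "ETIMEDOUT" || err.status === 408) return "timeout";
  if (err.status === 429 || (err.status !== undefined && err.status >= 500)) {
    return "transient";
  }
  if (err.status === 400 || err.status === 422) return "validation";
  return "permanent";
}

// Assumed policy: retry recoverable errors up to a limit, skip the current
// item on bad input, and kill the hunt on anything unrecoverable.
function decideAction(
  cls: ErrorClass,
  attempt: number,
  maxAttempts: number = 3,
): HuntAction {
  switch (cls) {
    case "transient":
    case "timeout":
      return attempt < maxAttempts ? "retry" : "skip";
    case "validation":
      return "skip";
    case "permanent":
      return "kill";
  }
}
```

Keeping classification separate from the decision means the retry policy can change without touching the code that inspects errors.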
03 Autonomous Lead-Finding Pipeline

property-leads

Built for a private real-estate cash buyer flipping foreclosure houses. An hourly Vercel cron triggers an orchestrator agent that plans which scrapers to run. Per-property research and scoring agents then produce a maximum allowable offer (MAO) with reasoning, a rehab band, and a 0-100 lead score with a tier. Outreach drafts go through a writer agent and a separate reviewer before Resend dispatch. Runs at $0.22 for a 33-property cycle.

Next.js 16 · Anthropic SDK · Neon Postgres · Drizzle · Apify · Resend · Leaflet

Technical Details

  • 4-stage agent pipeline with tiered models: Haiku for orchestration and the outreach reviewer, Sonnet for research, scoring, and draft writing
  • Research agent folds FEMA flood zones and municipal violation and permit data into a single MAO with cited reasoning per property
  • Scoring returns 0-100 with hot/warm/cold tiering and a breakdown so an analyst can disagree with the model in one read
  • Outreach has a Sonnet drafter and a separate Haiku reviewer that can block or rewrite a draft before it reaches Resend. The emailPolicy setting defaults to off so test runs never blast
  • Scheduling is three knobs on a versioned config row: pause, interval in minutes, and time-of-day with IANA timezone. Vercel cron fires hourly and the route gates itself
  • Idempotent ALTER TABLE migration runner, fingerprint-based dedup across runs, Nominatim geocoding queue with a hard 1 req/sec rate limit
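
A minimal sketch of a route that gates itself on the three scheduling knobs above. The config shape, the field names, and the hour-level granularity for time-of-day are assumptions chosen to fit an hourly cron; the actual property-leads schema is not shown on this page.

```typescript
// Hypothetical config row; field names are illustrative.
interface ScheduleConfig {
  paused: boolean;
  intervalMinutes: number;
  hourOfDay: number | null; // local hour 0-23, or null for "any hour"
  timeZone: string; // IANA name, e.g. "America/New_York"
}

// Reads the wall-clock hour in the given IANA timezone.
function localHour(now: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "numeric",
    hour12: false,
  }).formatToParts(now);
  const hour = parts.find((p) => p.type === "hour");
  // Some engines render midnight as "24" when hour12 is false.
  return Number(hour?.value) % 24;
}

// Called on every cron tick; returns true only when a run is actually due.
function shouldRun(
  cfg: ScheduleConfig,
  now: Date,
  lastRunAt: Date | null,
): boolean {
  if (cfg.paused) return false;
  if (lastRunAt !== null) {
    const elapsedMinutes = (now.getTime() - lastRunAt.getTime()) / 60_000;
    if (elapsedMinutes < cfg.intervalMinutes) return false;
  }
  if (cfg.hourOfDay !== null && localHour(now, cfg.timeZone) !== cfg.hourOfDay) {
    return false;
  }
  return true;
}
```

Passing `now` in as a parameter keeps the gate deterministic, so it can be tested without mocking the clock.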

02 / YouTube

2.1M

subscribers on YouTube

7+

Years

I have been creating content on YouTube for over seven years under the name Boffy. I grew the channel from zero to 2.1 million subscribers. No team at first. I just figured out what gets clicks and did more of it.

Eventually I hired editors and designers, negotiated sponsorships with RedMagic, Wargaming, GeoGuessr, and others, and spent a lot of time in analytics trying to figure out what was actually working.

Running a YouTube channel at this scale is a lot like running a product. You put something out, look at the numbers, adjust, and do it again.

Brand Partnerships
RedMagic · Wargaming · GeoGuessr · YouTooz · Factor · GamerSupps · Ellify

03 / About

How I got here

I build AI agents from scratch. I write the orchestration layer myself. Tool-use loops, state machines, memory management, multi-provider routing. Every system in the project list was built solo: no LangChain, no CrewAI, no agent frameworks.

Before this I spent seven years growing a YouTube channel from zero to 2.1 million subscribers. That is where I learned to follow through on long, unglamorous projects and manage something that kept getting bigger and more complex. Same instinct, different medium.

What I work with

TypeScript · Python · Next.js · PostgreSQL · SQLite · Zod · Pydantic · Anthropic SDK · Groq · OpenRouter

How I build

  • Hand-rolled orchestration, no LangChain, no CrewAI
  • Claude Code as primary dev tool
  • Multi-provider LLM routing (Claude, Groq, OpenRouter, Ollama)
  • Full-stack: backend, frontend, dashboards, deployment
  • State machines for agent lifecycle management
  • Recording outcomes and feeding them back into future runs
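
A minimal sketch of the state-machine approach to agent lifecycles listed above, using a transition table. The five states here are illustrative placeholders only; the real lifecycles (such as the 12-state one in autohack) are not enumerated on this page.

```typescript
// Illustrative states, not the actual lifecycle of any project listed here.
type State = "discovered" | "validating" | "approved" | "rejected" | "submitted";

// Each state lists the states it may legally move to.
const TRANSITIONS: Record<State, readonly State[]> = {
  discovered: ["validating"],
  validating: ["approved", "rejected"],
  approved: ["submitted"],
  rejected: [], // terminal
  submitted: [], // terminal
};

// Returns the new state, or throws if the move is not in the table.
function transition(from: State, to: State): State {
  if (TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

Putting every legal move in one table means an agent cannot skip a stage by accident, and adding a state is a one-line change plus whatever the compiler then flags.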

04 / Contact

Get in touch.

I am looking for AI agent engineering roles, or other AI-adjacent work, at early-stage startups in the Bay Area. If you are building something interesting, I want to hear about it.

Josh Kappler · 2026