Josh Kappler

I build autonomous
AI agents.

2.1M YouTube subscribers. I build AI agents from scratch — no frameworks, no shortcuts. Looking for an early-stage startup where I can do more of it.

Agent Orchestration · Prompt Engineering · Multi-Provider LLM · Evolutionary AI · Tool Use · Streaming · State Machines · Self-Validation · Knowledge Graphs · Voice + TTS · Learning Loops · Structured Output · Video Generation

01 / YouTube

2.1M

subscribers on YouTube

7+

Years

I have been creating content on YouTube for over seven years under the name Boffy. I grew the channel from zero to 2.1 million subscribers. No team at first — just figuring out what gets clicks and doing more of it.

Eventually I hired editors and designers, negotiated sponsorships with RedMagic, Wargaming, GeoGuessr, and others, and spent a lot of time in analytics trying to figure out what was actually working.

Running a YouTube channel at this scale is a lot like running a product. You put something out, look at the numbers, adjust, and do it again.

Brand Partnerships
RedMagic · Wargaming · GeoGuessr · YouTooz · Factor · GamerSupps · Ellify

02 / Projects

What I have built

Everything here was built from scratch — direct SDK calls, no LangChain, no CrewAI, no agent frameworks. Repos are private but I can walk through any of them.

01 Autonomous Security Agent

autohack

I built an agent that finds bug bounty programs, hunts for security vulnerabilities using Claude, reviews its own findings, and submits reports on its own. It runs hour-long sessions with 200-turn conversations and gets better over time because it records what worked and what didn't.

TypeScript · Anthropic SDK · Next.js 15 · SQLite · Drizzle · tRPC

Technical Details

  • 12-state pipeline from discovery through submission across HackerOne, Immunefi, and Huntr
  • A separate Claude instance validates findings on a 0-15 rubric before anything gets submitted
  • Records outcomes, near-misses, and triager feedback so future hunts are better
  • Two Claude backends: CLI mode (Max subscription) or API mode with prompt caching, which cuts the cost of cached input tokens by up to 90%
  • Watchdog process that detects stalled solves and resets state automatically
  • Real-time dashboard with xterm.js terminal showing live Claude output
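
The state pipeline and watchdog described above can be sketched roughly as follows. This is a minimal illustration, not the actual autohack code: the four state names, the `Hunt` class, and the `watchdog_check` function are all hypothetical, and the real pipeline has 12 states.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class HuntState(Enum):
    # Hypothetical subset of states; the real pipeline has 12.
    DISCOVERY = "discovery"
    HUNTING = "hunting"
    VALIDATING = "validating"
    SUBMITTED = "submitted"


# Allowed transitions between states (illustrative only).
TRANSITIONS = {
    HuntState.DISCOVERY: {HuntState.HUNTING},
    HuntState.HUNTING: {HuntState.VALIDATING, HuntState.DISCOVERY},
    HuntState.VALIDATING: {HuntState.SUBMITTED, HuntState.HUNTING},
    HuntState.SUBMITTED: set(),
}


@dataclass
class Hunt:
    state: HuntState = HuntState.DISCOVERY
    last_progress: float = field(default_factory=time.monotonic)

    def advance(self, new_state: HuntState) -> None:
        """Move to new_state if the transition is allowed, and record progress."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self.last_progress = time.monotonic()


def watchdog_check(hunt: Hunt, stall_seconds: float, now: Optional[float] = None) -> bool:
    """Reset a stalled, unfinished hunt to DISCOVERY; return True if it was reset."""
    now = time.monotonic() if now is None else now
    finished = hunt.state is HuntState.SUBMITTED
    if not finished and now - hunt.last_progress > stall_seconds:
        hunt.state = HuntState.DISCOVERY
        hunt.last_progress = now
        return True
    return False
```

Keeping the transition table as data makes illegal jumps fail loudly, and passing `now` explicitly makes the stall check easy to test without sleeping.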

02 Multi-Agent Simulation Platform

AgentArena

A sandbox where multiple LLM agents interact in open-ended scenarios to see what happens. It has a genetic algorithm that evolves prompt configs over hundreds of runs, and a village sim where agents remember things, form opinions, and build relationships.

TypeScript · Next.js 15 · Claude Agent SDK · Groq · OpenRouter

Technical Details

  • Village mode: 4-8 agents each run a 5-step cognitive pipeline (perception, action selection, reflection, planning, state resolution)
  • 3-tier memory system: episodic with embeddings, semantic from reflection, and a knowledge graph tracking who knows what
  • Genetic optimizer that evolves prompt configs across hundreds of scenarios, scored by an LLM judge on 5 dimensions
  • Agents have emotional state (grief, fear, anger, hope, loneliness) that changes how they act
  • A DM orchestrator that watches tension metrics and creates dramatic events when things get too calm
  • Multi-provider LLM client routing to Claude, Groq, or OpenRouter through a single interface
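
The single-interface routing idea can be sketched like this. It is an assumption-heavy illustration written in Python for brevity (the project itself is TypeScript): `LLMProvider`, `StubProvider`, and `LLMRouter` are hypothetical names, and the stub echoes the prompt instead of calling any real SDK.

```python
from typing import Dict, Optional, Protocol


class LLMProvider(Protocol):
    """Anything with a complete() method can act as a backend."""

    def complete(self, prompt: str) -> str: ...


class StubProvider:
    """Stand-in for a real SDK client; echoes instead of calling an API."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class LLMRouter:
    """Routes each request to a named provider behind one call signature."""

    def __init__(self, providers: Dict[str, LLMProvider], default: str) -> None:
        if default not in providers:
            raise ValueError(f"default provider not registered: {default}")
        self.providers = providers
        self.default = default

    def complete(self, prompt: str, provider: Optional[str] = None) -> str:
        name = provider or self.default
        if name not in self.providers:
            raise ValueError(f"unknown provider: {name}")
        return self.providers[name].complete(prompt)


router = LLMRouter(
    {"claude": StubProvider("claude"), "groq": StubProvider("groq")},
    default="claude",
)
print(router.complete("hello"))                   # handled by the default backend
print(router.complete("hello", provider="groq"))  # explicit per-call override
```

Callers only ever see `router.complete(...)`, so swapping or adding a backend does not touch agent code.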

03 Directed Evolution Engine

genisis

I wanted to see if I could breed better AI agent prompts instead of writing them by hand. This system mutates traits, runs agents through behavioral scenarios, and selects winners through natural selection. The agents end up developing behavior they were never explicitly told to have.

Python · Ollama · Gemma 4 · Pydantic · Textual TUI

Technical Details

  • ~20 modular traits across 5 psychological categories that get assembled into system prompts
  • 4 mutation operators: tweak (60%), swap (20%), delete (10%), duplicate+drift (10%)
  • Two-phase evaluation to prevent gaming: binary pass/fail gates, then anonymous head-to-head ranking
  • 4 behavioral scenarios testing threat detection, manipulation resistance, moral reasoning, and opportunism
  • Trait pool that accumulates proven traits from past champions for future mutations
  • Runs entirely locally on Ollama with no cloud dependencies
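
The weighted choice between the four mutation operators can be sketched as below. The 60/20/10/10 split comes from the list above; everything else is an assumption for illustration, including the function names, the representation of a genome as a list of trait strings, and the placeholder string edits standing in for whatever the real operators do.

```python
import random
from typing import List

OPERATORS = ["tweak", "swap", "delete", "duplicate_drift"]
WEIGHTS = [0.60, 0.20, 0.10, 0.10]  # probabilities from the list above


def pick_operator(rng: random.Random) -> str:
    """Choose one operator according to the configured weights."""
    return rng.choices(OPERATORS, weights=WEIGHTS, k=1)[0]


def mutate(traits: List[str], pool: List[str], rng: random.Random) -> List[str]:
    """Apply one random mutation and return a new trait list (input is not modified).

    Assumes both traits and pool are non-empty.
    """
    if not traits:
        raise ValueError("need at least one trait to mutate")
    child = list(traits)
    op = pick_operator(rng)
    i = rng.randrange(len(child))
    if op == "tweak":
        child[i] = child[i] + " (reworded)"  # placeholder for a real rewrite step
    elif op == "swap":
        child[i] = rng.choice(pool)          # pull a proven trait from the pool
    elif op == "delete":
        if len(child) > 1:                   # never delete the last remaining trait
            del child[i]
    elif op == "duplicate_drift":
        child.insert(i + 1, child[i] + " (drifted)")
    return child


rng = random.Random(0)
parent = ["cautious", "curious", "loyal"]
print(mutate(parent, pool=["skeptical"], rng=rng))
```

Passing in a seeded `random.Random` keeps a generation reproducible, which helps when comparing champions across runs.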

04 Autonomous Bounty Solver

algora

This agent finds GitHub bounties with financial rewards, figures out which ones are worth attempting, and solves them using Claude Code. It clones repos, implements fixes, runs tests, creates PRs, and responds to reviewer feedback.

TypeScript · pnpm monorepo · Anthropic SDK · Next.js 15 · Drizzle · tRPC

Technical Details

  • Spawns Claude Code with up to 100 turns and a 45-minute timeout per solve
  • Priority scoring: (reward * feasibility) / (1 + competitors) to pick the best bounties to attempt
  • Sends specific lint/test errors back to Claude for targeted fixes, up to 3 retry attempts
  • Monitors PRs for review comments, generates responses, and pushes code fixes based on feedback
  • Classifies errors as transient, permanent, validation, or timeout to decide whether to retry
  • Tracks success rates by language, repo, and failure pattern so it picks better bounties over time
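
The priority formula above is simple enough to show directly. This sketch is in Python for brevity (the project itself is TypeScript); the `Bounty` fields, the dollar unit, and the assumption that feasibility is an estimate between 0 and 1 are all illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bounty:
    title: str
    reward_usd: float   # assumed unit
    feasibility: float  # assumed to be an estimate in [0, 1]
    competitors: int    # other people already attempting it


def priority(b: Bounty) -> float:
    """(reward * feasibility) / (1 + competitors), as stated in the list above."""
    return (b.reward_usd * b.feasibility) / (1 + b.competitors)


def rank(bounties: List[Bounty]) -> List[Bounty]:
    """Highest-priority bounty first."""
    return sorted(bounties, key=priority, reverse=True)


candidates = [
    Bounty("big but crowded", reward_usd=500, feasibility=0.8, competitors=3),
    Bounty("small but open", reward_usd=200, feasibility=0.9, competitors=0),
]
for b in rank(candidates):
    print(f"{b.title}: {priority(b):.1f}")
```

The `1 + competitors` denominator keeps the score finite when nobody else is working on a bounty, and it lets a smaller uncontested bounty outrank a larger crowded one.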

05 AI Video Generation

ContentPipeline

A pipeline that produces YouTube Shorts from start to finish. Claude writes the script, image and video providers generate the visuals, TTS handles narration, and ffmpeg assembles the final video with word-level subtitles synced to the audio.

Python · Anthropic SDK · Gemini · Imagen 4 · faster-whisper · ffmpeg

Technical Details

  • Claude generates structured story JSON via tool_use, then writes image prompts for each scene
  • Word-level subtitle sync using faster-whisper for timestamps and difflib to align back to the original script
  • Swappable provider backends: Gemini/Imagen for images, Motion/Kling/Veo for video, Gemini/Edge/ElevenLabs for TTS
  • Retries image generation up to 4 times when safety filters reject a prompt, with exponential backoff
  • Python picks the comedy concept randomly, then Claude writes a unique story around it
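
The subtitle alignment step can be sketched with `difflib.SequenceMatcher`. This is one plausible way to do it, not necessarily how ContentPipeline does: it assumes the transcribed words have already been collected as `(word, start, end)` tuples, and the `align` and `_norm` helpers are hypothetical. The goal is to show the script's own spelling and punctuation while borrowing timing from what the speech model heard.

```python
import difflib
from typing import List, Optional, Tuple

Timed = Tuple[str, float, float]                       # (heard word, start sec, end sec)
Aligned = Tuple[str, Optional[float], Optional[float]]  # (script word, start, end)


def _norm(word: str) -> str:
    """Lowercase and strip punctuation so 'Sneezed,' matches 'sneezed'."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def align(script_words: List[str], heard: List[Timed]) -> List[Aligned]:
    """Attach heard timestamps to script words; unmatched words keep None."""
    a = [_norm(w) for w in script_words]
    b = [_norm(w) for w, _, _ in heard]
    out: List[Aligned] = [(w, None, None) for w in script_words]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        # 'equal' runs map one-to-one; an equal-length 'replace' run is treated
        # as a one-to-one mishearing (an assumption, e.g. wizard -> lizard).
        if tag == "equal" or (tag == "replace" and same_length):
            for k in range(i2 - i1):
                _, start, end = heard[j1 + k]
                out[i1 + k] = (script_words[i1 + k], start, end)
    return out


script = ["The", "wizard", "sneezed,", "loudly."]
heard = [("the", 0.0, 0.2), ("lizard", 0.2, 0.6), ("sneezed", 0.6, 1.0), ("loudly", 1.0, 1.4)]
for word, start, end in align(script, heard):
    print(word, start, end)
```

Words that stay at `None` (inserted or dropped by the transcription) would need a fallback such as interpolating between their neighbours.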

06 Local AI Chat with Voice

chadGPT

A chat app that runs entirely on local models through Ollama. It has voice input via Whisper, multiple TTS backends (Gemini, Edge, ElevenLabs), and configurable AI personalities. The Node launcher handles spinning up Ollama and the Python venv automatically.

Python · FastAPI · Ollama · Whisper · ElevenLabs · JavaScript

Technical Details

  • Runs local LLMs through Ollama with automatic process management and port detection
  • 3 swappable TTS backends: Gemini, Edge TTS, and ElevenLabs
  • Whisper-based voice input with word-level timestamps
  • Node.js launcher that creates the Python venv, installs deps, and starts the FastAPI server in one command
  • Configurable AI personalities with persistent conversation history
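
Port detection of the kind mentioned above can be done with the standard `socket` module. This is a rough sketch under assumptions: the helper names are hypothetical, the check only covers IPv4 loopback, and 11434 is Ollama's default port, which a custom install may have changed.

```python
import socket

OLLAMA_DEFAULT_PORT = 11434  # Ollama's default listen port


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


if port_in_use(OLLAMA_DEFAULT_PORT):
    print("Something is already listening on the Ollama port; reuse it.")
else:
    print("Nothing on the Ollama port; the launcher would start it here.")
print("Free port for the app server:", find_free_port())
```

A port returned by `find_free_port` can be taken by another process before the server binds to it, so a launcher should still handle a bind failure.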

07 Full-Stack SaaS Marketplace

sniply.biz

A live marketplace where people find and book barbers and stylists. I built and deployed the whole thing — matching algorithm, booking system, messaging, auth, all of it.

Next.js 16 · TypeScript · PostgreSQL · Tailwind · Vitest · Playwright

Technical Details

  • Match scoring algorithm weighing hair type compatibility (40%) and style preference overlap (60%)
  • PostgreSQL advisory locks to prevent double-booking race conditions
  • Haversine distance calculation for geographic search
  • Session auth with HMAC-signed cookies and rate-limited login (5 attempts per IP per 15 min)
  • 22+ seed professionals with deterministic availability generation for testing
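
Two of the calculations above are compact enough to sketch. This is written in Python for brevity (the site itself is TypeScript), and the function names, the kilometre unit, and the assumption that both match inputs are already normalised to the range 0 to 1 are illustrative. The 40/60 weights come from the list above; the Haversine formula is the standard one.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def match_score(hair_type_compat: float, style_overlap: float) -> float:
    """Weighted match score; both inputs are assumed to lie in [0, 1]."""
    return 0.40 * hair_type_compat + 0.60 * style_overlap


# San Francisco to Los Angeles, as a sanity check on the distance helper.
print(round(haversine_km(37.7749, -122.4194, 34.0522, -118.2437)), "km")
print(match_score(hair_type_compat=1.0, style_overlap=0.5))
```

For a real geographic search, a bounding-box filter in SQL before the exact Haversine check usually keeps the query from scanning every row.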

03 / About

How I got here

I grew a YouTube channel to 2.1 million subscribers over seven years. No team at the start, no playbook. I just figured out what worked and did more of it.

Now I build AI agents. Same approach — pick a hard problem, build the whole thing from scratch, make it work. Seven AI systems in the past several months, all solo, all direct SDK calls, no frameworks.

I studied CS at the University of Oregon with a specialization in AI and machine learning. I was a top student, but the curriculum was mostly theoretical, focused on weights, layers, and statistics without much applied work. The things I work with now (agent orchestration, tool use, prompt engineering, multi-model pipelines) were not on any syllabus. So I left and learned by building.

What I work with

TypeScript · Python · Next.js · React · Node.js · SQLite · PostgreSQL · Drizzle ORM · tRPC · Zod · Pydantic · Anthropic SDK · Claude Agent SDK · Claude Code · Groq · OpenRouter · Ollama · Whisper · FastAPI · Playwright · ffmpeg

How I build

  • Direct SDK calls — no LangChain, no CrewAI
  • Claude Code as primary dev tool
  • Multi-provider LLM routing (Claude, Groq, OpenRouter, Ollama)
  • Full-stack: backend, frontend, dashboards, deployment
  • State machines for agent lifecycle management
  • Recording outcomes and feeding them back into future runs

04 / Contact

Get in touch.

I am looking for founding engineer and AI agent roles at early-stage startups in the Bay Area. If you are building something interesting, I want to hear about it.

Josh Kappler · 2026