yolodex: an openai codex skill for turning videos into fully trained object detection models. yolo + codex.
from a youtube video to a trained YOLO model, autonomously
yolodex is an openai codex skill that takes a video url and a class list and outputs production-ready YOLO weights. point it at a clip of subway surfers, say "detect player, train, coins," and 30 minutes later you've got a model you can drop into prod.
the pipeline
.agents/skills/ exposes 5 discrete steps the codex runtime orchestrates:
- collect — yt-dlp pulls the video, ffmpeg extracts frames at the rate yolo needs
- label — dispatches subagents in parallel git worktrees to label classes per-frame
- augment — pillow runs a deterministic augmentation set (rotation, hue jitter, etc.)
- train — ultralytics YOLOv8 with config-driven epoch counts
- eval — mAP@50 scoring against a held-out split, with a fail-fast threshold
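a minimal sketch of what the collect step boils down to. this is not the actual skill code — `build_collect_cmds` and its defaults are hypothetical, and it just assembles the yt-dlp and ffmpeg invocations rather than running them:

```python
# hypothetical helper sketching the collect step; the real skill's
# file names, flags, and frame rate are config-driven
def build_collect_cmds(url: str, fps: int = 2, out_dir: str = "frames") -> list[list[str]]:
    """Return the yt-dlp download and ffmpeg frame-extraction commands."""
    # yt-dlp grabs the clip as mp4
    download = ["yt-dlp", "-f", "mp4", "-o", "clip.mp4", url]
    # ffmpeg samples frames at the requested rate into numbered jpgs
    extract = ["ffmpeg", "-i", "clip.mp4", "-vf", f"fps={fps}", f"{out_dir}/%06d.jpg"]
    return [download, extract]
```

in the real pipeline these would be handed to `subprocess.run`; splitting command construction from execution keeps the step easy to test.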
yolodex-run.sh runs the autonomous loop: train → eval → if mAP@50 falls below the target, regenerate labels and retrain.
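the loop itself is simple enough to sketch. this is a schematic of the control flow, not the shell script — the function name and signature are hypothetical, with the train/eval/relabel steps passed in as callables:

```python
# hypothetical sketch of the yolodex-run.sh control flow
def autonomous_loop(train, evaluate, relabel, target_map=0.5, max_rounds=3):
    """train -> eval -> if mAP@50 is below target, regenerate labels and retrain."""
    score = 0.0
    for round_idx in range(max_rounds):
        train()
        score = evaluate()  # mAP@50 on the held-out split
        if score >= target_map:
            return score, round_idx + 1  # fail-fast threshold met
        relabel()  # regenerate labels before the next round
    return score, max_rounds  # gave up after max_rounds
```

capping the rounds matters: without `max_rounds`, a class the labeler consistently misses would loop forever.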
why parallel git worktrees
vision dataset labeling at scale is the bottleneck. companies pay $100k+/yr for manual annotation. naive llm labeling is sequential and slow. yolodex spawns subagents in isolated git worktrees — each agent labels its own subset of frames, commits to its own branch, and the orchestrator merges results.
worktrees give you fork-level isolation without the spin-up cost of containers. dispatch 4 agents in parallel and labeling finishes in roughly 1/4 the time. the orchestrator runs on top of the codex sdk, so it inherits the model orchestration primitives directly.
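the fan-out can be sketched in two pieces: split the frames into one chunk per agent, and create an isolated worktree per chunk. both function names are hypothetical, and `worktree_cmd` only builds the git invocation (the orchestrator would run it and then dispatch a subagent into each directory):

```python
# hypothetical sketch of the worktree fan-out; not the actual skill code
def partition_frames(frames, n_agents):
    """Split frames into near-equal contiguous chunks, one per labeling agent."""
    k, r = divmod(len(frames), n_agents)
    chunks, start = [], 0
    for i in range(n_agents):
        size = k + (1 if i < r else 0)  # first r chunks get one extra frame
        chunks.append(frames[start:start + size])
        start += size
    return chunks

def worktree_cmd(branch, path):
    """git command that gives one agent its own branch and working directory."""
    return ["git", "worktree", "add", "-b", branch, path]
```

each agent commits labels on its own branch, so the merge at the end is an ordinary git merge of non-overlapping label files.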
the demo
at the openai codex hackathon (feb 2026): "we're 19-year-old founders building opal, an ai gaming companion you can play alongside. but before an agent can play, it has to see — and training vision usually means manually labeling thousands of frames, costing companies $100k+ a year. so we supercharged codex to automate that entire workflow, for us and any team building vision-powered ai."
what shipped
top 5 finalist at the openai codex hackathon, $10,000 prize, presented to sam altman and greg brockman. team: joshua lin, philip chen, ryan ni (the ucsd goats). open-source skill repo, runs locally with the uv package manager.