yolodex: an openai codex skill for turning videos into fully trained object detection models. yolo + codex.
from a youtube video to a trained YOLO model, autonomously
yolodex is an openai codex skill that takes a video url and a class list and outputs production-ready YOLO weights. point it at a clip of subway surfers, say "detect player, train, coins," and 30 minutes later you've got a model you can drop into prod.
the pipeline
.agents/skills/ exposes 5 discrete steps the codex runtime orchestrates:
- collect — yt-dlp pulls the video, ffmpeg extracts frames at the rate yolo needs
- label — dispatches subagents in parallel git worktrees to label classes per-frame
- augment — pillow runs a deterministic augmentation set (rotation, hue jitter, etc.)
- train — ultralytics YOLOv8 with config-driven epoch counts
- eval — mAP@50 scoring against a held-out split, with a fail-fast threshold
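a minimal sketch of what the collect step boils down to. this is not the actual skill code — `build_collect_cmds` and its defaults are hypothetical, and it just assembles the yt-dlp and ffmpeg invocations rather than running them:

```python
# hypothetical helper sketching the collect step; the real skill's
# file names, flags, and frame rate are config-driven
def build_collect_cmds(url: str, fps: int = 2, out_dir: str = "frames") -> list[list[str]]:
    """Return the yt-dlp download and ffmpeg frame-extraction commands."""
    # yt-dlp grabs the clip as mp4
    download = ["yt-dlp", "-f", "mp4", "-o", "clip.mp4", url]
    # ffmpeg samples frames at the requested rate into numbered jpgs
    extract = ["ffmpeg", "-i", "clip.mp4", "-vf", f"fps={fps}", f"{out_dir}/%06d.jpg"]
    return [download, extract]
```

in the real pipeline these would be handed to `subprocess.run`; splitting command construction from execution keeps the step easy to test.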
yolodex-run.sh runs the autonomous loop: train → eval → if mAP@50 falls below the target, regenerate labels and retrain.
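the loop itself is simple enough to sketch. this is a schematic of the control flow, not the shell script — the function name and signature are hypothetical, with the train/eval/relabel steps passed in as callables:

```python
# hypothetical sketch of the yolodex-run.sh control flow
def autonomous_loop(train, evaluate, relabel, target_map=0.5, max_rounds=3):
    """train -> eval -> if mAP@50 is below target, regenerate labels and retrain."""
    score = 0.0
    for round_idx in range(max_rounds):
        train()
        score = evaluate()  # mAP@50 on the held-out split
        if score >= target_map:
            return score, round_idx + 1  # fail-fast threshold met
        relabel()  # regenerate labels before the next round
    return score, max_rounds  # gave up after max_rounds
```

capping the rounds matters: without `max_rounds`, a class the labeler consistently misses would loop forever.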
why parallel git worktrees
vision dataset labeling at scale is the bottleneck. companies pay $100k+/yr for manual annotation. naive llm labeling is sequential and slow. yolodex spawns subagents in isolated git worktrees — each agent labels its own subset of frames, commits to its own branch, and the orchestrator merges results.
worktrees give you fork-level isolation without the spin-up cost of containers. dispatch 4 agents in parallel and labeling finishes in roughly 1/4 the time. the orchestrator runs on top of the codex sdk, so it inherits the model orchestration primitives directly.
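the fan-out can be sketched in two pieces: split the frames into one chunk per agent, and create an isolated worktree per chunk. both function names are hypothetical, and `worktree_cmd` only builds the git invocation (the orchestrator would run it and then dispatch a subagent into each directory):

```python
# hypothetical sketch of the worktree fan-out; not the actual skill code
def partition_frames(frames, n_agents):
    """Split frames into near-equal contiguous chunks, one per labeling agent."""
    k, r = divmod(len(frames), n_agents)
    chunks, start = [], 0
    for i in range(n_agents):
        size = k + (1 if i < r else 0)  # first r chunks get one extra frame
        chunks.append(frames[start:start + size])
        start += size
    return chunks

def worktree_cmd(branch, path):
    """git command that gives one agent its own branch and working directory."""
    return ["git", "worktree", "add", "-b", branch, path]
```

each agent commits labels on its own branch, so the merge at the end is an ordinary git merge of non-overlapping label files.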
the demo
at the openai codex hackathon (feb 2026): "we're 19-year-old founders building opal, an ai gaming companion you can play alongside. but before an agent can play, it has to see — and training vision usually means manually labeling thousands of frames, costing companies $100k+ a year. so we supercharged codex to automate that entire workflow, for us and any team building vision-powered ai."
what shipped
top 5 finalist at the openai codex hackathon, $10,000 prize, presented to sam altman and greg brockman. team: joshua lin, philip chen, ryan ni (the ucsd goats). open-source skill repo, runs locally with the uv package manager.