Give Claude Eyes: How to Use /watch to Analyze Any Video

Stop guessing what's in a video. Learn how to use the /watch skill to let Claude download, transcribe, and visually analyze any video file or URL.

Modern LLMs are powerful, but they have historically been blind to one of our most common information sources: video. If you wanted to know what happened in a 20-minute YouTube tutorial or a screen recording of a bug, you had to watch it yourself or rely on auto-generated transcripts, which are often inaccurate and miss the visual context entirely.

claude-video changes this by giving Claude a new skill, invoked with the /watch command. It turns a raw video file or URL into frames and a transcript that Claude can reason over.

How It Works: Under the Hood

The /watch skill is an orchestration layer that automates four stages of video processing:

  1. Ingestion: It uses yt-dlp to fetch content from virtually any source (YouTube, TikTok, Loom, X, etc.) or accepts local file paths.
  2. Frame Extraction: It uses ffmpeg to sample frames from the video. Crucially, it uses an auto-scaled frame budget based on video duration. For example, a 30-second clip gets ~30 frames, while a 10-minute video gets a sparse scan of 100 frames to keep token usage efficient.
  3. Transcription: It prioritizes native captions (free and fast). If none exist, it falls back to Whisper (via Groq or OpenAI) to generate a timestamped transcript.
  4. Multimodal Synthesis: It packages the frames and the timestamped transcript into Claude's context, so Claude can "see" the visuals and read what was said, and can answer questions grounded in the actual footage.
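
The auto-scaled frame budget in step 2 can be pictured with a short Python sketch. The scaling rule used here (about one frame per second, capped at 100 frames) is an assumption chosen only because it reproduces the two examples above. The skill's actual formula may differ, and the function names are hypothetical.

```python
def frame_budget(duration_s: float, per_second: float = 1.0, cap: int = 100) -> int:
    """Hypothetical rule: ~1 frame per second of video, capped at `cap` frames."""
    if duration_s <= 0:
        return 0
    return max(1, min(cap, round(duration_s * per_second)))


def sample_timestamps(duration_s: float, start_s: float = 0.0) -> list[float]:
    """Evenly spaced timestamps (seconds), one at the midpoint of each slice."""
    n = frame_budget(duration_s)
    if n == 0:
        return []
    step = duration_s / n
    return [round(start_s + step * (i + 0.5), 3) for i in range(n)]


if __name__ == "__main__":
    print(frame_budget(30))             # 30-second clip
    print(frame_budget(600))            # 10-minute video
    print(sample_timestamps(600)[:3])   # first few sampling points
```

Under this assumed rule, a 10-minute video is sampled roughly once every six seconds. Each timestamp could then be handed to ffmpeg (for example with its -ss seek option) to grab a single frame.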

Why Developers Need This

Beyond summarizing videos, this tool removes friction from several developer workflows:

  • Bug Reproduction: Instead of asking a user to describe a bug, have them send a screen recording. Run /watch bug-repro.mov "What is the UI state when the crash occurs?" and let Claude point to the sampled frame where the error first appears.
  • Content Engineering: Analyze viral hooks or competitor ad creative. You can ask, "What is on screen during the first 3 seconds of this video?" to reverse-engineer successful content structures.
  • Deep-Dive Research: Instead of watching a 30-minute technical talk at 2x speed, use /watch to extract the key moments, code snippets shown on screen, and the core arguments made by the speaker.

Getting Started

Installation depends on your environment:

For Claude Code users:

/plugin marketplace add bradautomates/claude-video
/plugin install watch@claude-video

For web users: Download the watch.skill file from the GitHub releases page and add it via Settings → Capabilities → Skills. Ensure "Code execution" is enabled.

Pro-Tips for Efficiency

Because image tokens are expensive, the tool includes a "focused mode" and a few other options that help you spend your budget where it matters:

  • Use --start and --end flags: If you only care about one segment, define it. The frame budget is then spent inside that window, which raises frame density and accuracy for the segment without wasting tokens on the rest of the video.
  • Adjust Resolution: If the video contains small text (like a terminal or code editor), use --resolution 1024 to ensure Claude can read the on-screen details clearly.
  • Whisper Backend: If you are processing many videos, use the Groq API for Whisper; it is significantly faster and cheaper than the standard OpenAI path.
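
To see why a narrower window helps, compare the average gap between sampled frames. This sketch assumes a hypothetical budget of about one frame per second, capped at 100 frames. That matches the examples given earlier but is not confirmed as the skill's actual rule.

```python
def seconds_between_frames(window_s: float, cap: int = 100) -> float:
    """Average gap between sampled frames for a window of `window_s` seconds.

    Assumes a hypothetical budget of ~1 frame per second, capped at `cap`.
    """
    if window_s <= 0:
        raise ValueError("window must be positive")
    frames = max(1, min(cap, round(window_s)))
    return window_s / frames


if __name__ == "__main__":
    full = seconds_between_frames(600)     # whole 10-minute video
    focused = seconds_between_frames(90)   # a 90-second window set with --start/--end
    print(f"full scan: one frame every {full:.1f}s")
    print(f"focused:   one frame every {focused:.1f}s")
```

With these assumed numbers, focusing on a 90-second segment shrinks the gap between frames from about six seconds to about one, which is the difference between missing and catching a brief on-screen event.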

Limitations to Keep in Mind

  • The 10-Minute Rule: While /watch can process longer videos, accuracy is highest under 10 minutes, because the frame budget is spread more thinly over longer footage. For longer content, use the --start and --end flags to break the analysis into manageable chunks.
  • No Auth: This tool does not handle private, authenticated video streams. It works best with public URLs and local files.
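
One way to plan that chunking is to split the duration into fixed windows and run /watch once per window. The sketch below only computes the windows. The 10-minute chunk size follows the rule of thumb above, the helper names are hypothetical, and the MM:SS formatting is an assumption, so check which time format --start and --end actually accept.

```python
def chunk_windows(duration_s: int, chunk_s: int = 600) -> list[tuple[int, int]]:
    """Split [0, duration_s) into consecutive (start, end) windows in seconds."""
    if duration_s < 0 or chunk_s <= 0:
        raise ValueError("duration must be >= 0 and chunk size > 0")
    return [(s, min(s + chunk_s, duration_s)) for s in range(0, duration_s, chunk_s)]


def mmss(seconds: int) -> str:
    """Format seconds as MM:SS (assumed format, not confirmed for the flags)."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


if __name__ == "__main__":
    for start, end in chunk_windows(25 * 60):   # a 25-minute talk
        print(f"--start {mmss(start)} --end {mmss(end)}")
```

For a 25-minute talk this yields three windows: two full 10-minute chunks and a final 5-minute remainder.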

By turning video into a searchable, queryable data source, /watch transforms Claude from a text-based assistant into a true multimodal analyst. Whether you're debugging or researching, it's a must-have addition to your AI toolkit.

Source

bradautomates/claude-video: Give Claude the ability to watch any video. /watch downloads, extracts frames, transcribes, hands it all to Claude.