You recorded a 45-minute tutorial. Somewhere in that recording are three or four moments that would make excellent Shorts. The problem: finding those moments requires watching the entire video. AI changes this equation completely.
## How AI Identifies "Good Parts"
AI-based clip selection works by analyzing multiple signals simultaneously:
| Signal | What It Detects | Why It Matters |
|---|---|---|
| Speech energy | Moments of emphasis, excitement | High-energy speech correlates with engaging content |
| Silence gaps | Natural segment boundaries | Silences indicate topic transitions -- good clip boundaries |
| Visual change rate | Rapid screen changes, demos | Active demonstrations are more interesting than static slides |
| Transcript keywords | "Watch this," "the trick is," "here is how" | Verbal cues that signal valuable moments |
| Code changes | Active typing, new code appearing | Coding moments are the most shareable for dev content |
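One way to combine these signals is a weighted sum normalized to a 0-100 scale. The sketch below is illustrative; the signal names and weights are assumptions, not taken from any specific tool.

```python
# Hypothetical weights for combining per-moment signals into one score.
# Each signal is assumed to be pre-normalized to the 0-1 range.
WEIGHTS = {
    "speech_energy": 0.30,
    "silence_boundary": 0.15,
    "visual_change": 0.20,
    "keyword_hit": 0.20,
    "code_change": 0.15,
}

def score_moment(signals: dict) -> float:
    """Weighted sum of normalized signals, scaled to 0-100."""
    total = sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    return round(total * 100, 1)

# A moment with energetic speech, a verbal cue, and active code changes:
demo = {"speech_energy": 0.8, "keyword_hit": 1.0, "code_change": 0.9}
print(score_moment(demo))  # 57.5
```

Missing signals simply contribute zero, so a moment does not need every signal to score well.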
## The Extraction Pipeline
Once AI identifies candidate moments, the extraction pipeline runs:
1. Score each candidate moment (0-100) based on combined signals
2. Filter to moments scoring above your threshold (I use 70)
3. Expand each moment to include 2 seconds of lead-in and 1 second of lead-out
4. Crop to 9:16 vertical format, centering on the active area of the screen
5. Add captions from the transcript
6. Render each clip as a standalone Short
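The scoring, filtering, and padding steps can be sketched as follows. This is a minimal illustration under assumed names (`Moment`, `extract_clips`); the crop, caption, and render steps are left out.

```python
from dataclasses import dataclass

THRESHOLD = 70          # minimum combined-signal score to keep a moment
LEAD_IN, LEAD_OUT = 2.0, 1.0  # seconds of padding around each moment

@dataclass
class Moment:
    start: float  # seconds into the source video
    end: float
    score: int    # 0-100 combined-signal score

def extract_clips(moments: list[Moment], video_length: float) -> list[tuple[float, float]]:
    """Return (start, end) windows for moments above threshold, best first."""
    clips = []
    for m in sorted(moments, key=lambda m: m.score, reverse=True):
        if m.score < THRESHOLD:
            continue
        # Expand by the lead-in/lead-out, clamped to the video bounds.
        start = max(0.0, m.start - LEAD_IN)
        end = min(video_length, m.end + LEAD_OUT)
        clips.append((start, end))
    return clips

candidates = [Moment(120, 145, 83), Moment(300, 330, 64), Moment(900, 940, 91)]
print(extract_clips(candidates, 2700))  # [(898.0, 941.0), (118.0, 146.0)]
```

The moment scoring 64 falls below the threshold of 70 and is dropped; the survivors come back sorted best-first, which is also the order you would review them in.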
## Cropping Strategy for Screen Recordings
Horizontal screen recordings do not fit vertical Shorts without cropping. The naive approach -- center crop -- often misses the relevant part of the screen. Better: use OCR and mouse position tracking to identify where the action is happening, then crop around that region.
```shell
ffmpeg -i input.mp4 -vf "crop=608:1080:656:0" -c:a copy short-clip.mp4
```
That FFmpeg command crops a 608x1080 region (1080 × 9/16 rounded up to an even pixel count, so effectively 9:16) from a 1920x1080 source. The `crop` filter takes width:height:x:y, so `656:0` places the crop window's left edge 656 pixels from the left of the frame. AI determines this offset by analyzing where the code editor or terminal is active.
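A simple way to derive that offset from mouse tracking is to center the crop window on the median cursor position during the clip. This is a hypothetical sketch, not a real tool's API; the constants match the FFmpeg example above.

```python
SOURCE_W = 1920  # source recording dimensions
CROP_W = 608     # ~9:16 crop width at 1080p, rounded to an even pixel count

def crop_offset(mouse_xs: list[int]) -> int:
    """Center the crop window on the median mouse x, clamped to the frame."""
    xs = sorted(mouse_xs)
    median_x = xs[len(xs) // 2]
    offset = median_x - CROP_W // 2
    # Keep the window fully inside the 1920-pixel-wide source frame.
    return max(0, min(SOURCE_W - CROP_W, offset))

# Mouse hovering around x=960 (screen center) during the clip:
print(crop_offset([940, 955, 960, 970, 985]))  # 656, as in the command above
```

The median resists outliers better than the mean: one stray mouse movement to a far corner will not drag the crop window off the action.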
## VidNo's Clip Detection
VidNo combines OCR analysis with git diff detection to find the most meaningful moments in developer recordings. When it sees a code change that corresponds to a working feature or a bug fix, it marks that as a high-value clip. This is more precise than generic engagement signals because it understands what developers actually care about seeing.
## Quality Over Quantity
Resist the temptation to extract every possible clip. A 45-minute video might yield 15 candidate moments, but only 3-5 will genuinely work as standalone Shorts. Each Short needs to make sense without context, deliver value in under 60 seconds, and end with a satisfying conclusion. AI can score candidates, but review the top picks before publishing.
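That review step is easy to build into the pipeline: rather than publishing everything above the threshold, surface only the top few candidates for a human pass. A minimal sketch, with hypothetical candidate data:

```python
def top_picks(scored: list[tuple[str, int]], n: int = 5) -> list[tuple[str, int]]:
    """Return the n highest-scored candidates for manual review."""
    return sorted(scored, key=lambda c: c[1], reverse=True)[:n]

candidates = [
    ("intro", 52), ("bugfix demo", 88), ("deploy", 71),
    ("outro", 40), ("refactor", 76), ("q&a", 63),
]
print(top_picks(candidates, 3))  # [('bugfix demo', 88), ('refactor', 76), ('deploy', 71)]
```

Capping the output at three to five clips per video keeps the review burden small and filters out the marginal candidates that score above threshold but would not stand alone.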