Dead time is the number one reason raw screen recordings are unwatchable. A typical 30-minute coding session contains 10-15 minutes of dead time: waiting for builds, reading documentation, thinking with the cursor idle, scrolling through search results. Removing it manually takes hours. AI does it in seconds.

What Counts as "Dead Time"

Not all inactivity is dead time. VidNo classifies screen segments using multiple signals:

Definite Dead Time (always cut)

  • Long pauses with no activity: Cursor idle, no typing, no scrolling for 10+ seconds
  • Build/compile waiting: Terminal showing a progress bar or compilation output with no developer interaction
  • File browser navigation: Clicking through folders to find the right file (the result matters, the navigation does not)
  • Repeated trial-and-error: Five attempts at the same typo fix -- only the final, successful attempt is kept
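The "always cut" rules above amount to a simple predicate. Here is a minimal sketch in Python; the `Segment` fields and function name are hypothetical, but the 10-second idle threshold and the build-waiting rule match the list above:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    idle_seconds: float    # time with no typing, clicking, or scrolling
    is_build_output: bool  # terminal showing compiler/progress output
    user_interacted: bool  # any keyboard/mouse input during the segment

def is_definite_dead_time(seg: Segment, idle_threshold: float = 10.0) -> bool:
    """Flag segments that are always safe to cut."""
    if seg.idle_seconds >= idle_threshold:
        return True  # long pause: cursor idle, nothing happening
    if seg.is_build_output and not seg.user_interacted:
        return True  # waiting on a build/compile with no interaction
    return False
```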

Context-Dependent (sometimes kept)

  • Reading documentation: If the documentation directly informs the next code change, a compressed version is kept with narration explaining what was referenced
  • Debugging: If the debugging process is educational (demonstrates a technique), it is kept. If it is just repeatedly running the same command, it is cut.
  • Thinking pauses: A short pause before a key decision can be compressed to 2-3 seconds with narration like "At this point, I needed to decide between..."
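Compressing a thinking pause can be as simple as clamping it to a short beat. A toy sketch, where the function name and the 2.5-second default are illustrative (chosen to match the 2-3 second figure above):

```python
def compress_pause(pause_seconds: float, keep_seconds: float = 2.5) -> float:
    """Clamp a thinking pause to a short beat, leaving room for a
    narration line like "At this point, I needed to decide between..."."""
    return min(pause_seconds, keep_seconds)
```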

Not Dead Time (never cut)

  • Writing code: Every line of code typed is analyzed and potentially narrated
  • Running tests: Test output is valuable context, especially when tests pass after changes
  • Terminal commands: Commands show workflow and tool usage
  • Application switching that matters: Moving from editor to browser to verify the result

How VidNo Detects Dead Time

VidNo uses a multi-signal approach rather than simple silence detection:

  1. Frame differencing: Consecutive frames are compared. If less than 2% of pixels change over 5+ seconds, the segment is flagged as potentially dead.
  2. OCR change tracking: The text content of each frame is compared. No text changes = no meaningful action.
  3. Cursor tracking: Cursor position and movement patterns distinguish active coding from idle staring.
  4. Audio analysis: If you were narrating live, silence indicates dead time. If not, audio is not weighted heavily.
  5. Context awareness: VidNo knows that a terminal showing "running tests..." is a build process, not active development.
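Signal 1 can be sketched with plain array math. This is an illustrative NumPy version, not VidNo's actual implementation: the 2% pixel threshold and 5-second window come from the description above, while the function names and the per-pixel tolerance are assumptions:

```python
import numpy as np

def changed_fraction(prev: np.ndarray, curr: np.ndarray, tol: int = 8) -> float:
    """Fraction of pixels whose intensity changed by more than `tol`
    (a small tolerance absorbs compression noise between frames)."""
    diff = np.abs(prev.astype(np.int16) - curr.astype(np.int16))
    return float((diff > tol).mean())

def flag_dead_span(frames: list, fps: float,
                   pixel_frac: float = 0.02, min_seconds: float = 5.0) -> bool:
    """True if fewer than `pixel_frac` of pixels change between every
    consecutive pair of frames across the last `min_seconds` of footage."""
    needed = int(min_seconds * fps)
    if len(frames) < needed + 1:
        return False
    recent = frames[-(needed + 1):]
    return all(changed_fraction(a, b) < pixel_frac
               for a, b in zip(recent, recent[1:]))
```

A fully static screen is flagged; a screen whose contents shift every frame is not.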

Dead Time Removal vs Silence Removal

Tools like Gling and Descript focus on silence removal -- cutting audio gaps. This works for podcast-style content but fails for screen recordings because:

  • Coding is mostly silent. Removing all silence from a screen recording removes most of the recording.
  • Some silence is meaningful (thinking before a key decision, reading test output).
  • The visual content matters more than the audio. A developer typing code in silence is not "dead time."

VidNo's approach is visual dead time detection, not audio silence detection. It understands that a silent screen with active typing is productive footage, while a screen showing a loading spinner with narration is dead time.

Configuring Sensitivity

# Default: 0.6 (balanced)
vidno config set cut-sensitivity 0.6

# Aggressive: cuts more dead time (tighter video)
vidno config set cut-sensitivity 0.8

# Conservative: keeps more footage (more complete)
vidno config set cut-sensitivity 0.3

For most tutorials, the default 0.6 sensitivity produces good results. Increase it for longer sessions (45+ minutes) where you want a tighter output. Decrease it for walkthroughs where you want to preserve more of the process.
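One plausible way a single sensitivity knob maps to cut decisions is as an inverted threshold on a per-segment dead-time score: higher sensitivity lowers the bar for cutting. This is a sketch of the mechanics, not VidNo's documented behavior; the function and parameter names are hypothetical:

```python
def should_cut(dead_score: float, sensitivity: float = 0.6) -> bool:
    """Cut a segment when its dead-time score (0.0-1.0) exceeds the
    threshold implied by the sensitivity setting. At the default 0.6,
    anything scoring above 0.4 is cut; at 0.8 (aggressive), above 0.2;
    at 0.3 (conservative), only above 0.7."""
    threshold = 1.0 - sensitivity
    return dead_score > threshold
```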

Before and After

A typical 30-minute recording processed with default settings:

  • Input: 30 minutes of raw footage
  • Dead time detected: 12 minutes (40%)
  • Compressed pauses: 4 minutes reduced to 45 seconds
  • Final tutorial: 14 minutes of dense, narrated content

The resulting video feels purposeful. Every second either shows meaningful code changes or provides context through narration. There are no "uh, let me think about this" moments or 30-second loading screens.

For more on VidNo's editing pipeline, see how FFmpeg powers VidNo's editing. To understand the full pipeline, read how VidNo works.