Processing time depends on three factors: your GPU, the length of your recording, and the complexity of what is on screen. The general rule of thumb is a 1:3 ratio of processing time to recording length on a recommended GPU: a 30-minute recording takes about 10 minutes to process into four output videos.
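As a quick illustration of the rule of thumb, the sketch below turns a recording length into a rough processing estimate. The function name and the divisor parameter are hypothetical and not part of VidNo; the estimate only applies to a recommended GPU and ignores screen complexity.

```python
def estimate_processing_minutes(recording_minutes: float, divisor: float = 3.0) -> float:
    """Rough estimate from the 1:3 rule of thumb (processing : recording).

    `divisor` is an illustrative parameter: 3.0 encodes the 1:3 ratio quoted
    for a recommended GPU. Slower GPUs would need a smaller divisor.
    """
    if recording_minutes < 0:
        raise ValueError("recording_minutes must be non-negative")
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return recording_minutes / divisor


if __name__ == "__main__":
    for minutes in (15, 30, 60):
        print(f"{minutes} min recording -> about {estimate_processing_minutes(minutes):.0f} min")
```

Treat the result as a ballpark figure; the per-GPU estimates later in this section are a better guide for a specific card.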
Here is a more detailed breakdown of where time is spent in the pipeline:
Frame extraction and OCR takes about 10-15% of total processing time. VidNo extracts frames at regular intervals and runs optical character recognition on each one to read code, terminal output, and UI text. Longer recordings with more text-dense screens (multiple editor splits, small fonts, lots of terminal output) take proportionally longer in this phase.
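To see why this phase scales with recording length, here is a minimal sketch of sampling frames at a fixed interval. The 2-second interval is an assumption chosen for illustration; the actual interval VidNo uses is not stated here, and the function is hypothetical.

```python
import math


def frame_timestamps(duration_s: float, interval_s: float) -> list[float]:
    """Return the timestamps (in seconds) at which frames would be sampled.

    Sampling starts at 0 and stops before the end of the recording, so the
    number of OCR passes grows linearly with the recording's duration.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    count = math.ceil(duration_s / interval_s)
    return [i * interval_s for i in range(count)]


if __name__ == "__main__":
    # Assumed interval of 2 seconds on a 30-minute (1800 s) recording.
    stamps = frame_timestamps(1800, 2)
    print(f"{len(stamps)} frames to OCR")
```

Doubling the recording length doubles the number of sampled frames, and denser screens add OCR work per frame on top of that.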
Script generation via the Claude API takes 30-60 seconds regardless of recording length. This is the fastest step and, apart from the final YouTube upload, the only one that requires an internet connection. Claude processes the extracted context and generates scripts for all four output videos in a single API call.
Voice synthesis takes about 35-45% of total processing time. This is the most GPU-intensive phase. MOSS TTS generates speech for each of the four scripts sequentially. The full tutorial script is the longest and takes the most time. On an RTX 4090, voice synthesis runs at roughly 2x real-time speed — a 15-minute script synthesizes in about 7-8 minutes.
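The real-time factor above translates into a simple calculation. The sketch below assumes scripts are synthesized one after another, as described, and uses a hypothetical helper name; the lengths of the three shorter scripts in the example are made up for illustration.

```python
def synthesis_minutes(script_minutes: list[float], realtime_factor: float = 2.0) -> float:
    """Estimate sequential voice-synthesis time for a set of scripts.

    `script_minutes` holds the spoken length of each script. A
    `realtime_factor` of 2.0 means audio is generated twice as fast as it
    plays back (the figure quoted for an RTX 4090).
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    # Scripts run sequentially, so the total is the sum of the individual times.
    return sum(script_minutes) / realtime_factor


if __name__ == "__main__":
    # A 15-minute tutorial script on its own.
    print(synthesis_minutes([15]))
    # The same script plus three shorter ones (lengths are illustrative).
    print(synthesis_minutes([15, 1, 1, 1]))
```

On a slower GPU, lower the real-time factor accordingly; the value for other cards is not given in this section.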
Video compositing and rendering takes about 25-30% of total processing time. VidNo makes intelligent cuts, adds transitions, syncs voiceover to screen content, and renders the final MP4 files for all four outputs. This phase uses both GPU (for hardware-accelerated encoding) and CPU (for compositing logic).
Thumbnail generation takes about 5% of total processing time. VidNo creates custom thumbnails for each video using key frames from the recording combined with text overlays optimized for YouTube click-through rates.
YouTube upload takes about 5-10% of total processing time depending on your internet connection speed and the total file sizes. VidNo uploads all four videos via the YouTube Data API, setting titles, descriptions, tags, chapters, thumbnails, and scheduling for each. This runs at the end of the pipeline so your local processing is not bottlenecked by upload bandwidth.
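The percentages above can be applied to a total run time to see roughly where the minutes go. This is a sketch with hypothetical names; the shares are the approximate ranges quoted in this breakdown, so they do not add up to exactly 100%, and script generation is left out because it is a fixed 30-60 seconds rather than a share.

```python
# Approximate share of total time per phase, as (low %, high %).
PHASE_SHARES: dict[str, tuple[float, float]] = {
    "frame extraction and OCR": (10, 15),
    "voice synthesis": (35, 45),
    "compositing and rendering": (25, 30),
    "thumbnail generation": (5, 5),
    "YouTube upload": (5, 10),
}


def phase_breakdown(total_minutes: float) -> dict[str, tuple[float, float]]:
    """Convert a total run time into a (low, high) minute range per phase."""
    if total_minutes < 0:
        raise ValueError("total_minutes must be non-negative")
    return {
        phase: (total_minutes * low / 100, total_minutes * high / 100)
        for phase, (low, high) in PHASE_SHARES.items()
    }


if __name__ == "__main__":
    for phase, (low, high) in phase_breakdown(10).items():
        print(f"{phase}: {low:.1f}-{high:.1f} min")
```

For a 10-minute run, voice synthesis comes out at about 3.5 to 4.5 minutes, which makes it the first place to look if processing feels slow.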
Approximate total processing times for a 30-minute recording (excluding upload):
RTX 3060 12GB: 35-45 minutes
RTX 4060 Ti 16GB: 20-25 minutes
RTX 4070 Ti Super 16GB: 12-15 minutes
RTX 4080 Super 16GB: 10-12 minutes
RTX 4090 24GB: 8-10 minutes
Add upload time on top of these estimates — typically 5-15 minutes depending on your connection speed and total output size. You can continue using your computer during processing, but avoid GPU-heavy tasks. VidNo runs at lower priority to minimize interference with your normal workflow.
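Putting the per-GPU estimates and the upload window together gives an end-to-end range. The sketch below is a lookup table built from the figures quoted above for a 30-minute recording; the function and constant names are hypothetical, and the numbers do not extend to other recording lengths.

```python
# Local processing estimates for a 30-minute recording, in minutes (low, high).
GPU_MINUTES: dict[str, tuple[int, int]] = {
    "RTX 3060 12GB": (35, 45),
    "RTX 4060 Ti 16GB": (20, 25),
    "RTX 4070 Ti Super 16GB": (12, 15),
    "RTX 4080 Super 16GB": (10, 12),
    "RTX 4090 24GB": (8, 10),
}

# Typical upload window quoted above, in minutes (low, high).
UPLOAD_MINUTES: tuple[int, int] = (5, 15)


def total_minutes_range(gpu: str) -> tuple[int, int]:
    """Return the (low, high) end-to-end estimate including upload."""
    if gpu not in GPU_MINUTES:
        raise KeyError(f"No estimate available for GPU: {gpu!r}")
    low, high = GPU_MINUTES[gpu]
    return (low + UPLOAD_MINUTES[0], high + UPLOAD_MINUTES[1])


if __name__ == "__main__":
    for name in GPU_MINUTES:
        low, high = total_minutes_range(name)
        print(f"{name}: {low}-{high} min including upload")
```

On faster cards the upload window can be as long as the local processing itself, so connection speed matters more as the GPU gets faster.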