One recording session. Five published assets. Zero manual editing. That is the promise of a fully automated video pipeline, and it is achievable today with the right architecture.
The Five Outputs
- Full tutorial video (10-20 minutes, 16:9, narrated and edited)
- Recap video (2-3 minutes, 16:9, highlights only)
- YouTube Short (under 60 seconds, 9:16, captioned)
- Custom thumbnail (1280x720 JPG with text overlay)
- Metadata package (title, description, tags, chapters, hashtags)
Pipeline Stages
Stage 1: Capture
Record your screen while you work, using OBS, SimpleScreenRecorder, or any other tool that outputs MP4. Do not worry about mistakes or pacing; the pipeline handles cleanup. Just narrate what you are doing as you do it.
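Because every later stage depends on the recording containing both picture and narration, a quick sanity check before analysis can save a wasted run. The following is a minimal sketch, not part of any described tool: `probe` and `check_recording` are hypothetical helper names, and the check assumes `ffprobe` (shipped with FFmpeg) is on the PATH.

```python
import json
import subprocess


def probe(path: str) -> str:
    """Return ffprobe's stream report for `path` as a JSON string."""
    cmd = ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def check_recording(probe_json: str) -> tuple:
    """Return (width, height) of the first video stream.

    Raises ValueError if the recording has no video or no audio track,
    since the pipeline needs both the screen and the spoken narration.
    """
    streams = json.loads(probe_json).get("streams", [])
    video = [s for s in streams if s.get("codec_type") == "video"]
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    if not video:
        raise ValueError("no video stream found")
    if not audio:
        raise ValueError("no audio stream: narration was not captured")
    return video[0]["width"], video[0]["height"]
```

In use, `check_recording(probe("screen.mp4"))` would return something like `(1920, 1080)` for a typical full-HD capture, and the resolution can then drive the crop geometry for the Short.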
Stage 2: Analysis
The pipeline ingests the raw recording and runs three parallel analyses:
- OCR: Extract text from every frame to understand what is on screen
- Transcription: Convert your narration to text with timestamps
- Git diff detection: Identify code changes happening in the recording
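The three analyses above are independent of each other, so they can be dispatched at the same time and joined at the end. Here is a rough sketch of that fan-out using Python's standard `concurrent.futures`. The three worker functions are hypothetical stand-ins that return placeholder data; a real pipeline would call an OCR engine, a speech-to-text model, and a diff detector in their place, and the result schema shown is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the real analyzers (placeholder output only).
def run_ocr(path: str) -> list:
    return [{"t": 0.0, "text": f"(on-screen text from {path})"}]


def run_transcription(path: str) -> list:
    return [{"start": 0.0, "end": 2.5, "text": f"(speech from {path})"}]


def detect_code_changes(path: str) -> list:
    return [{"t": 0.0, "diff": f"(code change seen in {path})"}]


ANALYSES = {
    "ocr": run_ocr,
    "transcript": run_transcription,
    "diffs": detect_code_changes,
}


def analyze(path: str) -> dict:
    """Run all analyses concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=len(ANALYSES)) as pool:
        futures = {name: pool.submit(fn, path) for name, fn in ANALYSES.items()}
        # .result() blocks until that analysis finishes and re-raises its errors.
        return {name: fut.result() for name, fut in futures.items()}
```

Threads are a reasonable fit when each analysis mostly waits on an external process or native library. For CPU-bound pure-Python work, `ProcessPoolExecutor` would be the drop-in alternative.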
Stage 3: Script Generation
An LLM receives the OCR output, transcript, and git diffs. It generates a polished narration script that accurately describes what happens on screen. It also identifies the best moment for a Short and writes a summary for the recap video.
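To make this stage concrete, the sketch below shows one way the three evidence streams could be merged into a single prompt, and how the model's reply could be checked before it is used. The field names, the JSON reply format, and the helper names `build_script_prompt` and `parse_script_response` are all assumptions for illustration; no particular LLM API is implied.

```python
import json

# Assumed reply schema: the keys the downstream stages would need.
REQUIRED_KEYS = {"narration", "short_start", "short_end", "recap_summary"}


def build_script_prompt(ocr: list, transcript: list, diffs: list) -> str:
    """Combine timestamped OCR text, speech, and code changes into one prompt."""
    spoken = "\n".join(
        f"[{s['start']:.1f}-{s['end']:.1f}] {s['text']}" for s in transcript
    )
    screen = "\n".join(f"[{f['t']:.1f}] {f['text']}" for f in ocr)
    changes = "\n".join(f"[{d['t']:.1f}]\n{d['diff']}" for d in diffs)
    return "\n\n".join([
        "Write a polished narration script for this coding tutorial.",
        "Describe only what the evidence below shows on screen.",
        "Reply as JSON with keys: " + ", ".join(sorted(REQUIRED_KEYS)) + ".",
        "TRANSCRIPT\n" + spoken,
        "ON-SCREEN TEXT\n" + screen,
        "CODE CHANGES\n" + changes,
    ])


def parse_script_response(raw: str) -> dict:
    """Validate the model's JSON reply before later stages rely on it."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    if data["short_end"] - data["short_start"] >= 60:
        raise ValueError("selected Short segment must be under 60 seconds")
    return data
```

Validating the reply at this boundary keeps a malformed or overlong Short selection from propagating into audio and video production.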
Stage 4: Audio Production
Voice cloning synthesizes the narration script in your voice. The output is timed to match the screen recording, with pauses inserted where the viewer needs time to read code on screen.
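The timing step can be pictured as a small scheduling problem: each narration line should not start before its matching action appears on screen, and should not overlap the previous line or its reading pause. The following is a simplified sketch under those assumptions. The `Line` fields and the `schedule` function are hypothetical, and the anchor times and pause lengths would come from the earlier analysis stages.

```python
from dataclasses import dataclass


@dataclass
class Line:
    anchor: float            # when the matching action appears on screen (s)
    duration: float          # length of the synthesized audio clip (s)
    read_pause: float = 0.0  # extra silence so the viewer can read code (s)


def schedule(lines: list) -> list:
    """Return a (start, end) time for each narration line, in order.

    A line starts at its anchor, or later if the previous line plus its
    reading pause has not finished yet.
    """
    cursor = 0.0  # earliest time the next line may start
    placed = []
    for line in lines:
        start = max(line.anchor, cursor)
        end = start + line.duration
        placed.append((start, end))
        cursor = end + line.read_pause
    return placed
```

For example, `schedule([Line(0.0, 4.0, read_pause=1.5), Line(5.0, 3.0), Line(6.0, 2.0)])` places the second line at 5.5 s rather than 5.0 s, because the 1.5 s reading pause after the first line has to finish first. When narration runs later than its anchor, as the third line does here, a fuller implementation would also need to hold or slow the video to stay in sync.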
Stage 5: Visual Production
FFmpeg assembles the final outputs:
# Full tutorial
ffmpeg -i screen.mp4 -i narration.wav -filter_complex "..." tutorial.mp4
# Recap (selected segments concatenated)
ffmpeg -f concat -i recap_segments.txt recap.mp4
# Short (captioned; center crop to 9:16, assuming a 1920x1080 source)
ffmpeg -i screen.mp4 -vf "crop=608:1080:656:0,subtitles=captions.srt" -t 58 short.mp4
# Thumbnail (frame extraction + text overlay)
ffmpeg -ss 120 -i screen.mp4 -frames:v 1 -vf scale=1280:720 thumb_base.jpg
convert thumb_base.jpg -gravity center -pointsize 96 -fill white -annotate +0+0 "Title" thumbnail.jpg
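Two inputs to the commands above are worth deriving rather than hard-coding: the crop geometry for the Short and the contents of `recap_segments.txt`. The sketch below shows one way to compute both. The helper names are hypothetical, the crop assumes a full-height, horizontally centered window, and the list file uses the concat demuxer's `file`, `inpoint`, and `outpoint` directives.

```python
def center_crop_9x16(width: int, height: int) -> str:
    """Build an FFmpeg crop filter for a full-height, centered 9:16 window."""
    crop_w = height * 9 // 16
    crop_w += crop_w % 2  # round up to an even width, which most encoders require
    if crop_w > width:
        raise ValueError("source is too narrow for a full-height 9:16 crop")
    x = (width - crop_w) // 2  # left edge of the centered window
    return f"crop={crop_w}:{height}:{x}:0"


def concat_list(source: str, segments: list) -> str:
    """Render a concat-demuxer list that plays (start, end) ranges of one file.

    Assumes `source` contains no single quotes; those would need escaping.
    """
    lines = ["ffconcat version 1.0"]
    for start, end in segments:
        lines.append(f"file '{source}'")
        lines.append(f"inpoint {start:.3f}")
        lines.append(f"outpoint {end:.3f}")
    return "\n".join(lines) + "\n"
```

For a 1920x1080 recording, `center_crop_9x16(1920, 1080)` yields `crop=608:1080:656:0`, matching the filter used for the Short. Writing `concat_list("screen.mp4", [(12, 20.5), (300, 318)])` to `recap_segments.txt` would give the recap command two highlight ranges to join. Cuts made this way may land on the nearest keyframe unless the output is re-encoded.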
Stage 6: Metadata and Upload
The LLM generates an SEO-optimized title, a description with chapters, tags, and hashtags. The upload module then publishes the tutorial and the Short to YouTube with appropriate scheduling.
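Chapters are the one part of the metadata with a strict format, since YouTube only recognizes them when the description lists timestamps in a particular way. As a hedged illustration, the sketch below formats chapter markers and enforces the rules as YouTube documents them at the time of writing: the first chapter starts at 0:00, there are at least three chapters, and each lasts at least 10 seconds. The function names and the input shape are assumptions.

```python
def format_timestamp(seconds: int) -> str:
    """Format seconds as M:SS, or H:MM:SS for videos over an hour."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def chapter_block(chapters: list) -> str:
    """Render (start_seconds, title) pairs as description lines.

    Raises ValueError if the list would not be recognized as chapters.
    """
    if len(chapters) < 3:
        raise ValueError("at least three chapters are required")
    if chapters[0][0] != 0:
        raise ValueError("the first chapter must start at 0:00")
    for (prev, _), (cur, _) in zip(chapters, chapters[1:]):
        if cur - prev < 10:
            raise ValueError("each chapter must last at least 10 seconds")
    return "\n".join(f"{format_timestamp(t)} {title}" for t, title in chapters)
```

Calling `chapter_block([(0, "Intro"), (95, "Setup"), (610, "Wrap-up")])` produces three lines, `0:00 Intro`, `1:35 Setup`, and `10:10 Wrap-up`, ready to paste into the description.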
VidNo Implements This Pipeline
This is VidNo's core pipeline. Every stage described above is automated and runs locally on your machine. The input is a screen recording. The output is five published assets. Your involvement is limited to four steps: start recording, stop recording, review the output, and approve the upload.