VidNo runs locally on your machine. That means your hardware directly affects processing speed and output quality. This guide covers the minimum specs, recommended specs, and how to calculate storage needs for your workflow.
GPU Requirements
The GPU is the most critical component. VidNo uses CUDA for voice synthesis and video rendering acceleration. AMD GPUs are not currently supported for voice synthesis (FFmpeg encoding works on any GPU).
| GPU | VRAM | Voice Synthesis Speed | Verdict |
|---|---|---|---|
| RTX 3060 | 12 GB | ~4 min per 10-min script | Minimum viable |
| RTX 3080 | 10 GB | ~2.5 min per 10-min script | Good |
| RTX 3090 | 24 GB | ~1.8 min per 10-min script | Great |
| RTX 4070 Ti | 12 GB | ~2 min per 10-min script | Good |
| RTX 4090 | 24 GB | ~1.2 min per 10-min script | Optimal |
| RTX 5090 | 32 GB | ~0.8 min per 10-min script | Overkill (but fast) |
The 12 GB VRAM floor matters because the voice synthesis model alone requires approximately 8 GB of VRAM, so a 12 GB card leaves headroom for everything else that runs on the GPU. Cards with 8 GB of VRAM (like the RTX 4060 or RTX 3070) can technically run VidNo, but only with a quantized model, which degrades voice quality.
For detailed GPU benchmarks and purchase recommendations, see the NVIDIA GPU guide.
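To check a machine against the VRAM thresholds above, a minimal sketch along these lines may help. It is not part of VidNo: the function names and the three verdict labels are hypothetical, and treating anything between 8 GB and 12 GB as "below-floor" is an assumption, since the guide only describes the 8 GB and 12 GB cases. The `nvidia-smi` query flags used are the standard ones for CSV output.

```python
import subprocess

VRAM_FLOOR_GB = 12   # floor stated in this guide
MODEL_VRAM_GB = 8    # approximate voice-model footprint stated in this guide


def parse_gpu_line(line: str) -> tuple[str, float]:
    """Parse one 'name, memory_in_MiB' CSV line into (name, size in GiB)."""
    name, mem_mib = line.rsplit(",", 1)
    return name.strip(), int(mem_mib.strip()) / 1024


def vram_verdict(vram_gb: float) -> str:
    """Hypothetical classification; the labels are illustrative only."""
    # Round to the nominal size, since the driver may report slightly less
    # than the advertised capacity (for example 24564 MiB on a 24 GB card).
    nominal = round(vram_gb)
    if nominal >= VRAM_FLOOR_GB:
        return "ok"
    if nominal >= MODEL_VRAM_GB:
        return "below-floor"   # assumption: may need a quantized model
    return "too-small"         # assumption: the guide does not cover < 8 GB


def query_gpus() -> list[tuple[str, float]]:
    """Ask nvidia-smi for each GPU's name and total memory in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

Calling `query_gpus()` on a machine with an NVIDIA driver installed returns one `(name, GiB)` pair per GPU, which can then be passed to `vram_verdict`. The parsing and classification are kept separate from the subprocess call so they can be exercised without a GPU present.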
CPU Requirements
The CPU handles OCR, frame extraction, and FFmpeg encoding. Any modern multi-core processor works fine:
- Minimum: 4-core / 8-thread or better (for example, the 6-core Intel i5-10400 or AMD Ryzen 5 3600)
- Recommended: 8-core / 16-thread class (Intel i7-12700 or AMD Ryzen 7 5800X)
- Optimal for batch processing: 12+ cores (Intel i9-13900 or AMD Ryzen 9 7950X)
CPU bottlenecks usually appear during batch processing when multiple recordings are being analyzed simultaneously. For single-video workflows, any modern quad-core is sufficient.
RAM Requirements
- Minimum: 16 GB -- enough for single-video processing
- Recommended: 32 GB -- comfortable headroom for longer recordings
- Batch processing: 64 GB -- if you plan to process multiple recordings in parallel
RAM usage scales with recording length. A 60-minute recording at 1080p generates roughly 6 GB of frame data during the analysis stage. This data is streamed and discarded rather than held in memory all at once, but peak usage can still spike.
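As a rough planning aid, the figures above can be turned into a small estimator. This is a sketch under stated assumptions, not VidNo's own logic: it assumes frame data grows linearly with recording length at the 1080p rate quoted above (6 GB per 60 minutes), and the function names and tier labels are hypothetical.

```python
# Rate derived from the guide: roughly 6 GB of frame data per 60 minutes at 1080p.
FRAME_DATA_GB_PER_MIN = 6 / 60


def frame_data_gb(minutes: float) -> float:
    """Estimate total frame data for a recording, assuming linear scaling."""
    return round(minutes * FRAME_DATA_GB_PER_MIN, 2)


def ram_tier(installed_gb: int) -> str:
    """Map installed RAM onto the tiers listed in this guide."""
    if installed_gb >= 64:
        return "batch processing"
    if installed_gb >= 32:
        return "recommended"
    if installed_gb >= 16:
        return "minimum"
    return "below minimum"
```

For example, `frame_data_gb(90)` estimates the frame data for a 90-minute recording. Because the data is streamed, the estimate is an upper bound on what passes through memory over the whole run, not a prediction of peak usage.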
Storage Requirements
VidNo needs storage for three things:
- Installation + models: ~10 GB (one-time)
- Input recordings: varies by your recording settings
- Output videos + intermediate files: roughly 2-3x your input file size
A typical workflow looks like this:
# Example: 30-minute recording at 1080p/30fps
Input recording: ~1.2 GB
Intermediate files: ~2.8 GB (deleted after rendering)
Output (3 formats): ~1.6 GB total
Peak storage needed: ~5.6 GB per session (~2.8 GB remains after intermediate files are deleted)
Use an SSD. Mechanical drives create a bottleneck during frame extraction and video rendering. NVMe is ideal but SATA SSD works fine.
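The 30-minute example above can be scaled to other recording lengths with a short calculator. This is a hedged sketch: it assumes every size grows linearly with duration at the same 1080p/30fps settings, and the class and function names are hypothetical rather than anything VidNo provides.

```python
from dataclasses import dataclass

# Baseline figures from the 30-minute 1080p/30fps example in this guide.
BASELINE_MINUTES = 30
BASELINE_INPUT_GB = 1.2
BASELINE_INTERMEDIATE_GB = 2.8
BASELINE_OUTPUT_GB = 1.6


@dataclass
class SessionStorage:
    input_gb: float
    intermediate_gb: float
    output_gb: float

    @property
    def peak_gb(self) -> float:
        """Space needed while rendering, before intermediates are deleted."""
        return round(self.input_gb + self.intermediate_gb + self.output_gb, 2)

    @property
    def retained_gb(self) -> float:
        """Space still used after intermediate files are deleted."""
        return round(self.input_gb + self.output_gb, 2)


def estimate_session(minutes: float) -> SessionStorage:
    """Scale the baseline example linearly to another recording length."""
    factor = minutes / BASELINE_MINUTES
    return SessionStorage(
        input_gb=BASELINE_INPUT_GB * factor,
        intermediate_gb=BASELINE_INTERMEDIATE_GB * factor,
        output_gb=BASELINE_OUTPUT_GB * factor,
    )
```

Keeping peak and retained space separate matters when sizing a drive: free space must cover the peak for the largest single session, while long-term growth follows the retained figure.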
Operating System
- Linux (recommended): Ubuntu 20.04+, Fedora 36+, Arch Linux. Native CUDA support, best FFmpeg performance.
- Windows: Windows 10/11 with WSL2. VidNo runs inside WSL. Native Windows support is planned but not yet available.
- macOS: Not currently supported. Apple Silicon lacks CUDA, and the Metal backend for voice synthesis is in development.
Software Dependencies
# Required
node --version # v18.0.0 or higher
ffmpeg -version # v5.0 or higher
nvidia-smi # NVIDIA driver 525+ with CUDA 12+
# Optional (improves results)
git --version # Any recent version (for diff analysis)
tesseract --version # v5+ (fallback OCR; VidNo has its own built-in OCR)
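The version checks above can be automated by comparing each tool's reported version against the minimum. The sketch below is illustrative only and its helper names are hypothetical. It takes the first dotted number in the output as the version, which works for strings such as `v20.11.0` or `ffmpeg version 6.1.1`, but is an assumption that may not hold for development builds with non-standard version strings.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted number in `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(text: str, minimum: tuple[int, ...]) -> bool:
    """Return True if the version found in `text` is at least `minimum`."""
    found = parse_version(text)
    # Pad the shorter tuple with zeros so (5,) and (5, 0, 0) compare equal.
    width = max(len(found), len(minimum))
    found += (0,) * (width - len(found))
    minimum += (0,) * (width - len(minimum))
    return found >= minimum
```

For instance, `meets_minimum("v20.11.0", (18, 0, 0))` checks Node.js output against the v18 requirement, and `meets_minimum(first_line, (5, 0))` does the same for the first line printed by `ffmpeg -version`.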
Quick System Check
VidNo includes a built-in system check command:
vidno doctor
# Output:
# GPU: NVIDIA RTX 4090 (24 GB VRAM) .......... OK
# CUDA: 12.3 ................................. OK
# Node.js: v20.11.0 .......................... OK
# FFmpeg: 6.1.1 .............................. OK
# RAM: 32 GB ................................. OK
# Disk: 180 GB free .......................... OK
# Voice model: loaded ........................ OK
Run vidno doctor before your first session to verify everything is configured correctly. If anything fails, the output includes specific fix instructions.
Ready to install? Follow the getting started guide.