VidNo's script quality comes from one key architectural decision: instead of using a fine-tuned model or template system for narration, it sends structured context about your coding session to Claude and lets a frontier language model write the script. This article explains what that integration looks like under the hood.
What Gets Sent to Claude
VidNo does not send your screen recording, your full source tree, or your audio to Claude. It sends structured metadata, including only the code that changed during the recording:
- Scene summaries: A description of each scene (e.g., "Developer opened api/users.ts and added a new function")
- Git diffs: The actual code changes, scoped to the recording window
- OCR extracts: Text visible on screen, categorized by context (editor, terminal, browser)
- Frame classifications: What application was active in each scene
- Timing data: How long each scene lasted (used for pacing the script)
This is typically 10-50 KB of structured text per video. No binary data, no media files, no full source trees.
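As a rough illustration, the metadata listed above could be modeled with types like the ones below. This is a sketch, not VidNo's actual schema: every type, field, and function name here (`SceneMetadata`, `ocrExtracts`, `payloadSizeKB`, and so on) is hypothetical and chosen only to mirror the list.

```typescript
// Hypothetical shapes for the metadata; VidNo's real schema is not documented here.
type OcrContext = "editor" | "terminal" | "browser";

interface OcrExtract {
  context: OcrContext; // where on screen the text was read
  text: string;
}

interface SceneMetadata {
  index: number;
  summary: string; // scene summary
  activeApp: string; // frame classification
  startSeconds: number; // timing data, used for pacing
  endSeconds: number;
  gitDiff: string; // changes scoped to the recording window
  ocrExtracts: OcrExtract[];
}

// Approximate serialized size in KB. String length counts UTF-16 code units,
// which matches the byte count only for ASCII text.
function payloadSizeKB(scenes: SceneMetadata[]): number {
  return JSON.stringify(scenes).length / 1024;
}

const exampleScene: SceneMetadata = {
  index: 3,
  summary: "Developer opened api/users.ts and added a new function",
  activeApp: "Code editor",
  startSeconds: 134,
  endSeconds: 278,
  gitDiff: "+ export async function fetchUserData(page = 1) {",
  ocrExtracts: [{ context: "terminal", text: "PASS src/api/users.test.ts" }],
};

const exampleSizeKB = payloadSizeKB([exampleScene]);
```

A size check like `payloadSizeKB` is one way to confirm that a payload stays text-only and small before it is sent.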
The Prompt Engineering
VidNo's prompt to Claude has two layers: a system prompt and a user message. The system prompt establishes the role:
You are writing the narration script for a developer
tutorial video. The video shows a screen recording of
a coding session. Your job is to explain what the
developer is doing, why they are doing it, and what
the viewer should learn from each step.
Rules:
- Reference specific function names, variable names,
  and file paths from the diffs
- Explain the reasoning behind decisions, not just
  the actions
- Use a conversational but technical tone
- Flag potential gotchas or alternative approaches
- Generate chapter titles for each major section
- Target the specified output length
The user message then includes the structured scene data, with each scene formatted as a block:
## Scene 3 (02:14 - 04:38)
Context: Code editor - api/users.ts
Action: Added fetchUserData function
Git diff:
+ export async function fetchUserData(page = 1) {
+   const res = await fetch(`/api/v2/users?page=${page}`);
+   if (!res.ok) throw new ApiError(res.status);
+   return res.json();
+ }
Terminal output after scene:
$ npm test
PASS src/api/users.test.ts
12 tests passed
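A scene block in this layout could be rendered from scene data with a small formatter. The sketch below is an assumption about how such a step might look, not VidNo's code: `SceneBlockInput`, `toTimestamp`, `formatScene`, and `buildUserMessage` are hypothetical names, and joining blocks with blank lines is a guess.

```typescript
// Hypothetical input shape; field names are illustrative, not VidNo's schema.
interface SceneBlockInput {
  index: number;
  startSeconds: number;
  endSeconds: number;
  context: string;
  action: string;
  gitDiff?: string;
  terminalOutput?: string;
}

// Zero-pad a non-negative integer to at least two digits.
function pad2(n: number): string {
  return n < 10 ? `0${n}` : `${n}`;
}

// Format a duration in seconds as MM:SS.
function toTimestamp(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return `${pad2(minutes)}:${pad2(seconds)}`;
}

// Render one scene in the block layout shown above.
// Optional sections are omitted when the scene has no diff or terminal output.
function formatScene(scene: SceneBlockInput): string {
  const lines: string[] = [
    `## Scene ${scene.index} (${toTimestamp(scene.startSeconds)} - ${toTimestamp(scene.endSeconds)})`,
    `Context: ${scene.context}`,
    `Action: ${scene.action}`,
  ];
  if (scene.gitDiff) lines.push("Git diff:", scene.gitDiff);
  if (scene.terminalOutput) lines.push("Terminal output after scene:", scene.terminalOutput);
  return lines.join("\n");
}

// Assumption: the user message is the scene blocks separated by blank lines.
function buildUserMessage(scenes: SceneBlockInput[]): string {
  return scenes.map(formatScene).join("\n\n");
}
```

Keeping the formatter separate from the API call makes the prompt text easy to inspect or snapshot-test on its own.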
Why Claude and Not a Fine-Tuned Model
VidNo chose Claude over a custom fine-tuned model for several reasons:
- Code understanding: Claude understands programming languages, frameworks, patterns, and conventions. It knows that useEffect with an empty dependency array runs once on mount. A fine-tuned narration model would not.
- Reasoning about intent: When it sees a developer add error handling to a function, Claude can explain why -- not just describe the syntax. It understands defensive programming, edge cases, and API contract design.
- Natural language quality: Claude writes conversational prose that sounds like a developer explaining their work to a colleague. Fine-tuned models tend toward either robotic descriptions or generic fluff.
- Zero training data needed: VidNo does not need thousands of example scripts to produce good output. Claude's pre-training already covers the programming knowledge the narration depends on.
API Cost Per Video
Claude API pricing is token-based. For a typical VidNo script generation:
- Input tokens: 3,000-15,000 (depending on recording length and code complexity)
- Output tokens: 1,500-5,000 (the generated script)
- Cost per video: approximately $0.10-0.30
This is the only recurring cost for VidNo Free tier users. Pro subscribers have API costs included in the monthly fee.
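Because pricing is token-based, the cost of one script is a weighted sum of input and output tokens. The sketch below shows that arithmetic with the rates passed in as parameters. The function name, the `TokenRates` type, and the rates in the usage note are placeholders, not Anthropic's prices, so the result is not expected to reproduce the range quoted above; check the current price list for the model you use.

```typescript
// Rates are in USD per million tokens and are supplied by the caller,
// since model pricing changes over time.
interface TokenRates {
  inputPerMillion: number;
  outputPerMillion: number;
}

// cost = (inputTokens * inputRate + outputTokens * outputRate) / 1,000,000
function estimateCostUSD(inputTokens: number, outputTokens: number, rates: TokenRates): number {
  const total = inputTokens * rates.inputPerMillion + outputTokens * rates.outputPerMillion;
  return total / 1e6;
}
```

For example, with placeholder rates of 2 USD per million input tokens and 10 USD per million output tokens, `estimateCostUSD(10000, 2000, { inputPerMillion: 2, outputPerMillion: 10 })` evaluates to 0.04.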
Script Output Format
Claude returns a structured JSON script that VidNo's rendering engine can process:
{
  "chapters": [
    {
      "title": "Setting Up the API Route",
      "timestamp_start": "00:00",
      "narration": "We're starting with a fresh API route...",
      "scenes": [1, 2, 3],
      "emphasis_points": ["error handling", "pagination"]
    }
  ],
  "total_duration_estimate": "8:30",
  "suggested_title": "Building a Paginated API with Error Handling",
  "suggested_description": "..."
}
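Model output should be validated before it reaches a renderer, since nothing guarantees that a response matches the expected shape. The sketch below types the fields from the example above and checks them at runtime. The type names and `parseScript` are hypothetical, and VidNo's own validation, if any, may differ.

```typescript
// Field names follow the example JSON; the type names are hypothetical.
interface ScriptChapter {
  title: string;
  timestamp_start: string;
  narration: string;
  scenes: number[];
  emphasis_points: string[];
}

interface GeneratedScript {
  chapters: ScriptChapter[];
  total_duration_estimate: string;
  suggested_title: string;
  suggested_description: string;
}

function isArrayOf(value: unknown, kind: "string" | "number"): boolean {
  return Array.isArray(value) && value.every((item) => typeof item === kind);
}

function isChapter(value: unknown): value is ScriptChapter {
  if (typeof value !== "object" || value === null) return false;
  const chapter = value as Record<string, unknown>;
  return (
    typeof chapter["title"] === "string" &&
    typeof chapter["timestamp_start"] === "string" &&
    typeof chapter["narration"] === "string" &&
    isArrayOf(chapter["scenes"], "number") &&
    isArrayOf(chapter["emphasis_points"], "string")
  );
}

// Parse a raw response string and throw a descriptive error if the shape is wrong.
function parseScript(raw: string): GeneratedScript {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Script must be a JSON object");
  }
  const script = data as Record<string, unknown>;
  const chapters = script["chapters"];
  if (!Array.isArray(chapters) || !chapters.every((c) => isChapter(c))) {
    throw new Error("Invalid or missing chapters");
  }
  const stringFields = ["total_duration_estimate", "suggested_title", "suggested_description"];
  for (const key of stringFields) {
    if (typeof script[key] !== "string") {
      throw new Error(`Missing or non-string field: ${key}`);
    }
  }
  return data as GeneratedScript;
}
```

Failing fast here means a malformed response surfaces as a clear error rather than as a broken render later in the pipeline.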
Customizing the AI Script
You can influence Claude's output through VidNo configuration:
# Set the target audience
vidno config set audience "intermediate developers"
# Adjust verbosity
vidno config set script-detail "high" # or "medium", "low"
# Add custom instructions
vidno config set script-notes "Always mention TypeScript types.
Avoid explaining basic JavaScript concepts."
These settings are appended to the Claude prompt, allowing you to adjust the output style without modifying VidNo's core prompt engineering.
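One way the appending step could work is sketched below. The `ScriptSettings` type, the `appendSettings` function, and the wording of the appended lines are all assumptions made for illustration; the source only states that the settings are appended to the prompt.

```typescript
// Hypothetical mirror of the three settings shown above.
interface ScriptSettings {
  audience?: string;
  scriptDetail?: "high" | "medium" | "low";
  scriptNotes?: string;
}

// Append any configured settings to the base prompt, one per line.
// The base prompt is returned unchanged when nothing is configured.
function appendSettings(basePrompt: string, settings: ScriptSettings): string {
  const extras: string[] = [];
  if (settings.audience) extras.push(`Target audience: ${settings.audience}`);
  if (settings.scriptDetail) extras.push(`Level of detail: ${settings.scriptDetail}`);
  if (settings.scriptNotes) extras.push(`Additional instructions: ${settings.scriptNotes}`);
  if (extras.length === 0) return basePrompt;
  return `${basePrompt}\n\n${extras.join("\n")}`;
}
```

Appending rather than rewriting keeps the core rules intact, so a custom note can narrow the style but cannot accidentally delete a rule such as generating chapter titles.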
For editing the generated script before rendering, see the script editing guide. To learn about the full pipeline that wraps this API call, read how VidNo works.