The script makes or breaks a tutorial video. A well-written script that accurately explains what the code does and why each decision was made produces a video that teaches something. A vague script that says "now we add some code to handle the request" produces a video that wastes the viewer's time. Most AI script generators produce the vague version because they do not understand the content they are writing about.
The Git Diff Approach
The most effective AI script generation for developer content does not start with "describe this video." It starts with structured data about what actually happened during the recording. The two richest data sources are OCR output (what appeared on screen) and git diffs (what changed in the codebase).
A git diff for a coding session might look like this:
diff --git a/src/middleware/rateLimit.ts b/src/middleware/rateLimit.ts
new file mode 100644
--- /dev/null
+++ b/src/middleware/rateLimit.ts
@@ -0,0 +1,25 @@
+import { Redis } from 'ioredis';
+import type { Request, Response, NextFunction } from 'express';
+
+interface RateLimitConfig {
+  windowMs: number;
+  maxRequests: number;
+}
+
+export function createRateLimiter(redis: Redis, config: RateLimitConfig) {
+  return async (req: Request, res: Response, next: NextFunction) => {
+    const key = `ratelimit:${req.ip}`;
+    const current = await redis.incr(key);
+
+    if (current === 1) {
+      await redis.expire(key, Math.ceil(config.windowMs / 1000)); // EXPIRE takes whole seconds
+    }
+
+    if (current > config.maxRequests) {
+      return res.status(429).json({ error: 'Rate limit exceeded' });
+    }
+
+    res.setHeader('X-RateLimit-Remaining', config.maxRequests - current);
+    next();
+  };
+}
From this diff alone, an AI script writer can produce accurate narration:
"We start by creating a rate limiting middleware using Redis as the backing store. The function takes a Redis client and a configuration object specifying the time window and maximum request count. For each incoming request, we increment a counter keyed by the client's IP address. On the first request in a window, we set the expiry to match our window duration. If the counter exceeds the maximum, we return a 429 status. Otherwise, we set a header showing remaining requests and pass control to the next middleware."
That narration is technically accurate, explains the purpose of each step, and uses correct terminology. No human writer was needed. The AI could describe the code correctly because it had the actual code as input, not a vague description.
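To make the "actual code as input" idea concrete, here is a minimal TypeScript sketch of turning raw git diff text into per-file data that could be placed in a prompt. The FileChange shape and the parseUnifiedDiff and summarizeChange names are assumptions made for illustration, not part of Git or of any particular tool. The parser ignores renames, binary files, and other cases a production parser would need to handle.

```typescript
// Hypothetical data shape for this sketch.
interface FileChange {
  path: string;
  isNewFile: boolean;
  addedLines: string[];
  removedLines: string[];
}

// Parse unified diff text (as printed by `git diff`) into per-file changes.
function parseUnifiedDiff(diffText: string): FileChange[] {
  const changes: FileChange[] = [];
  let current: FileChange | null = null;
  let inHunk = false;

  for (const line of diffText.split("\n")) {
    if (line.startsWith("diff --git ")) {
      // Header form: diff --git a/<path> b/<path>
      const match = /^diff --git a\/(.+) b\/(.+)$/.exec(line);
      current = {
        path: match ? match[2] : "(unknown)",
        isNewFile: false,
        addedLines: [],
        removedLines: [],
      };
      changes.push(current);
      inHunk = false;
    } else if (current === null) {
      continue; // skip anything before the first file header
    } else if (line.startsWith("@@")) {
      inHunk = true; // hunk header: content lines follow
    } else if (!inHunk) {
      // Still in the file header ("new file mode", "---", "+++", ...)
      if (line.startsWith("new file mode")) current.isNewFile = true;
    } else if (line.startsWith("+")) {
      current.addedLines.push(line.slice(1));
    } else if (line.startsWith("-")) {
      current.removedLines.push(line.slice(1));
    }
  }
  return changes;
}

// One-line summary that could introduce a file's changes in a prompt.
function summarizeChange(change: FileChange): string {
  const label = change.isNewFile ? " (new file)" : "";
  return `${change.path}${label}: +${change.addedLines.length} / -${change.removedLines.length} lines`;
}
```

Structured output like this lets the prompt say which file was created and which lines were added, so the model does not have to interpret diff syntax on its own.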
VidNo's Script Generation Pipeline
The script generation in VidNo follows a specific chain:
- Collect data: OCR text from every frame, git diff from the working directory, audio transcription if present
- Build timeline: Map each code change to its timestamp in the recording
- Identify segments: Group changes into logical segments (setup, implementation, testing, debugging, completion)
- Generate per-segment narration: Send each segment's data to the Claude API with context about the overall project
- Assemble script: Combine segment narrations with transitions, adjust pacing, add chapter markers
- Review pass: Send the complete script back to Claude for consistency checking. For example, if segment 3 references something that is not introduced until segment 5, the order is fixed
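The timeline, segmentation, and per-segment prompt steps above could be sketched roughly as follows. This is not VidNo's actual implementation: every type and function name here is hypothetical, and grouping by time gap is only a simple stand-in for whatever logic identifies setup, implementation, testing, and debugging segments. The model call itself is left out so the sketch does not depend on any SDK.

```typescript
// Hypothetical types for this sketch.
interface TimedChange {
  timestampSec: number; // position of the change in the recording
  path: string;
  addedLines: string[];
}

interface Segment {
  startSec: number;
  endSec: number;
  changes: TimedChange[];
}

// Assumed heuristic: a new segment starts whenever the gap since the
// previous change exceeds maxGapSec.
function groupIntoSegments(changes: TimedChange[], maxGapSec: number): Segment[] {
  const sorted = [...changes].sort((a, b) => a.timestampSec - b.timestampSec);
  const segments: Segment[] = [];
  for (const change of sorted) {
    const last = segments[segments.length - 1];
    if (last !== undefined && change.timestampSec - last.endSec <= maxGapSec) {
      last.changes.push(change);
      last.endSec = change.timestampSec;
    } else {
      segments.push({
        startSec: change.timestampSec,
        endSec: change.timestampSec,
        changes: [change],
      });
    }
  }
  return segments;
}

// Build the text that would be sent to the model for one segment.
function buildSegmentPrompt(
  segment: Segment,
  index: number,
  total: number,
  projectContext: string
): string {
  const code = segment.changes
    .map((c) => `// ${c.path} at ${c.timestampSec}s\n${c.addedLines.join("\n")}`)
    .join("\n\n");
  return [
    `Project context: ${projectContext}`,
    `Segment ${index + 1} of ${total} (${segment.startSec}s to ${segment.endSec}s)`,
    "Narrate the following changes:",
    code,
  ].join("\n");
}
```

Because each prompt carries its position in the sequence and the project context, the model can be asked to narrate one segment at a time without losing track of the overall project.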
Common Script Quality Issues
Even with git diff input, AI scripts can go wrong. The most common issues:
Over-explanation. The AI describes every line change, including import statements and variable declarations that do not need narration. Fix: instruct the model to skip boilerplate and focus on logic.
Incorrect causation. The AI says "we add error handling because the previous version could crash" when the error handling was always planned and was not a response to a bug. Fix: instruct the model not to infer motivation beyond what the code shows.
Wrong audience level. The AI explains what a function is to an audience that already knows TypeScript. Fix: specify the target audience's skill level in the prompt ("experienced developers learning a new library, not beginners learning to code").
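The three fixes above can be applied partly in the prompt and partly by filtering the input before it reaches the model. The sketch below is one possible way to do that. The function names and the guideline wording are assumptions for illustration, and the boilerplate check is a deliberately crude heuristic that only catches blank lines, import statements, and punctuation-only lines.

```typescript
// Hypothetical options for this sketch.
interface NarrationPromptOptions {
  audience: string; // e.g. "experienced developers learning a new library"
}

// Turn the three fixes into explicit instructions for the model.
function buildNarrationGuidelines(options: NarrationPromptOptions): string {
  return [
    `Target audience: ${options.audience}.`,
    "Skip boilerplate such as imports and trivial declarations; focus on logic.",
    "Do not infer motivation beyond what the code shows.",
  ].join("\n");
}

// Crude heuristic: blank lines, import statements, and lines made up only of
// closing punctuation rarely need narration.
function isBoilerplateLine(line: string): boolean {
  const trimmed = line.trim();
  return (
    trimmed === "" ||
    /^import[\s{"']/.test(trimmed) ||
    /^[{}()\[\];,]+$/.test(trimmed)
  );
}

// Drop boilerplate lines before they are placed in the prompt.
function filterBoilerplate(lines: string[]): string[] {
  return lines.filter((line) => !isBoilerplateLine(line));
}
```

Filtering before the prompt reduces the chance of over-explanation even when the model ignores the instruction, at the cost of occasionally hiding an import that matters to the narration.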
The Quality Ceiling
AI-generated scripts from git diffs tend to be more consistently accurate than human-written scripts. Humans forget what they changed, misremember the order of operations, and skip details they think are obvious. The AI has every diff line in front of it, so nothing is left out by accident. The tradeoff is that AI scripts are less engaging: they lack anecdotes, humor, and the personal asides that make a human narrator compelling. For tutorial content where accuracy matters more than entertainment, this is an acceptable tradeoff.