AI Agents for Video Content — From Footage to Published
Video Production · 2026-03-08 · 8 min read

I shoot for 20 minutes. My AI agents handle the rest: editing, subtitles, thumbnails, titles, descriptions, chapters, and publishing. Here's the complete system.

#video production  #AI agents  #content automation  #YouTube  #video editing

I pushed 47 YouTube videos live last year.

Hours spent editing: zero.

Thumbnails? Zero.

Descriptions and chapters? Zero.

My job: talk to camera for 15-20 minutes, hit stop.

The full workflow—editing, color, audio, subtitles, thumbnails, titles, chapters, publishing—runs itself while I sleep. AI handles everything.

Here’s exactly how I built it.

How Video Production Used to Suck

Old workflow:

  • Shoot 30-45 min (multiple takes, setup hell)
  • Transfer footage (10 min, SD card roulette)
  • Edit 2-4 hours (cuts, color, audio, transitions)
  • Thumbnail: 30-60 minutes staring at my own face
  • Title/description/tags: 45 minutes fighting the YouTube beast
  • Upload/schedule: 10 min

Net: 4-7 hours per video. Weekly was exhausting; daily was a joke.

Now? 20 minutes of recording, then go ride.

The Actual System

Let’s get tactical—here’s the tech stack.

1. Footage Ingestion

Tools:

  • Camera: Sony A7IV with WiFi
  • Mic: Rode Wireless Go II
  • Lighting: Standard 3-point

When I stop recording, the camera shoves the footage straight to Google Drive via WiFi. No cable, no SD card drama.

Zapier sits on the Drive folder, watching for files. New file? Workflow fires.
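Zapier handles this trigger without any code. For anyone who would rather script it, here is a minimal sketch of the same "new file fires the workflow" idea. It assumes the Drive folder is synced to a local directory; the function name, the suffix list, and the `seen` set are illustrative, not part of the original setup.

```python
from pathlib import Path

# Hypothetical list of extensions treated as footage
VIDEO_SUFFIXES = {".mp4", ".mov", ".mxf"}

def find_new_footage(folder, seen):
    """Return video files in `folder` that are not in `seen`, and record them."""
    new_files = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in VIDEO_SUFFIXES and path.name not in seen:
            seen.add(path.name)       # remember it so it only fires once
            new_files.append(path)
    return new_files
```

Calling this on a timer and handing each returned path to the edit step gives roughly the same behavior. A real version would also need to wait until a file has finished syncing before processing it.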

2. AI Video Editing

Here's where the heavy work shifts to silicon.

I lean on Descript (API access, $40/mo). You could wire up Premiere via API, but Descript is dead simple.

My script:

  • Auto-transcribe the audio (about 99% accurate on my footage, thanks to GPT-4.1 or Claude Sonnet, depending on file length)
  • Detect and nuke filler words ("um", "uh", "like", all of them)
  • Chop pauses longer than 2 seconds and cut false starts
  • Clean up the audio: remove background noise, level the volume, apply YouTube-friendly compression
  • Auto-frame the shot (keeps me in the middle even if I scoot)
  • Color-correct for consistency and add a basic transition between scenes

Descript spits out a clean video: no weird pauses, crisp color and sound.
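Descript does the cutting internally, so the following is only a sketch of how the filler and pause rules could be expressed in code. It assumes a transcript with word-level timestamps as `(text, start, end)` tuples; that shape and the function name are assumptions, not Descript's API.

```python
FILLERS = {"um", "uh", "like"}

def plan_keep_segments(words, max_pause=2.0, fillers=FILLERS):
    """Turn timed words into (start, end) spans of footage worth keeping."""
    segments = []
    force_cut = False  # set when a filler was just dropped
    for text, start, end in words:
        if text.lower().strip(".,!?") in fillers:
            force_cut = True          # skip the filler and cut around it
            continue
        close_enough = bool(segments) and (start - segments[-1][1]) <= max_pause
        if close_enough and not force_cut:
            segments[-1] = (segments[-1][0], end)   # extend the current span
        else:
            segments.append((start, end))           # start a new span after a cut
        force_cut = False
    return segments
```

Treating every "like" as filler would also cut legitimate uses of the word, so a real pipeline needs more context than a word list.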

Time saved: 2-3 hours per video.

3. AI Subtitles

Subtitles boost accessibility and SEO, and Descript already outputs the transcript as part of the edit.

Workflow:

  • Slice it into subtitle chunks timed for readability
  • White sans-serif, black outline, always in safe zone
  • Multiple speakers? Highlight names

Burned-in for clips (Reels, TikTok), SRT for YouTube.
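For the SRT side, here is a minimal sketch of chunking timed words into subtitle cues. It assumes the same `(text, start, end)` word tuples as input, and the 42-character cue limit is a common readability default rather than a number from this workflow.

```python
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm form used in SRT files."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words, max_chars=42):
    """Group (text, start, end) words into cues no longer than max_chars."""
    cues, current = [], []
    for word in words:
        candidate = " ".join(w[0] for w in current + [word])
        if current and len(candidate) > max_chars:
            cues.append(current)      # close the cue before it gets too long
            current = []
        current.append(word)
    if current:
        cues.append(current)
    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w[0] for w in cue)
        timing = f"{srt_timestamp(cue[0][1])} --> {srt_timestamp(cue[-1][2])}"
        blocks.append(f"{i}\n{timing}\n{text}")
    return "\n\n".join(blocks) + "\n"
```

Each cue runs from the start of its first word to the end of its last, which keeps the captions in sync with the trimmed audio.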

Sometimes swap in Rev.ai’s API ($1.50/hr) for longer stuff. But mostly Descript.

4. AI Thumbnails

Key frames get pulled by code as soon as the edit is done. Typical function looks like:

from vision_social_poster import extract_frames, select_best_frame, enhance, add_text_overlay

def build_thumbnail(videopath, transcript):
    # Sample candidate frames at a fixed interval through the video
    frames = extract_frames(videopath, interval=30)
    # Rank the candidates against plain-language criteria
    frame = select_best_frame(frames, ["face clear", "lighting good", "interesting expression", "rule of thirds"])
    # Boost saturation and contrast, then sharpen
    enhanced = enhance(frame, sat=20, contrast=15, sharpen=True)
    title = "How I Automated My YouTube"  # Or generate_title(transcript)
    return add_text_overlay(enhanced, title, style="bold_yellow")

Most of the time, I let the AI pick the best shot and overlay the title. I only step in manually to veto anything janky.
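The internals of `extract_frames` are not shown here. One common way to pull a frame every N seconds is ffmpeg's `fps` filter, so here is a sketch of how such a helper might build the command. It assumes the interval is in seconds; the function name and output file pattern are illustrative.

```python
from pathlib import Path

def frame_extract_command(video_path, out_dir, interval=30):
    """Build an ffmpeg command that writes one JPEG every `interval` seconds."""
    pattern = str(Path(out_dir) / "frame_%04d.jpg")
    return [
        "ffmpeg",
        "-i", str(video_path),
        "-vf", f"fps=1/{interval}",   # one output frame per `interval` seconds
        pattern,
    ]
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it, provided ffmpeg is installed and the output directory exists.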

I also mess with hybrid systems: sometimes I grab templates from the Canva API, sometimes I spin up DALL-E for a synthetic background, then assemble everything in code.

Saves 30-45 min per video, plus zero time spent scrolling for a “good” frame.

5. AI Metadata (Titles, Description, Tags)

Feed transcript into Azure GPT-4.1 (or local Ollama when I’m offline). Prompts like:

“Here’s the transcript. Generate 10 titles, 60-70 chars, max click potential, primary keyword, no hype, real curiosity gap.”

Same for descriptions:

  • First 2 lines are the hook
  • Timestamps, summary, resources, hashtags

Tags: straight from keywords in content (OpenAI or Claude picks the best n-grams and adds brand tags for @billy_kennedy_bmx, @nepa_ai, etc).
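The title step can be sketched independently of the model provider: build the prompt, then keep only the candidates that actually land in the 60-70 character window. The function names and the transcript cutoff are assumptions; the model call itself is left out because it depends on which provider is in use.

```python
import re

TITLE_PROMPT = (
    "Here's the transcript. Generate 10 titles, 60-70 chars, "
    "max click potential, primary keyword, no hype, real curiosity gap.\n\n"
    "Transcript:\n{transcript}"
)

def build_title_prompt(transcript, max_chars=12000):
    """Fill the prompt, truncating very long transcripts (cutoff is arbitrary)."""
    return TITLE_PROMPT.format(transcript=transcript[:max_chars])

def filter_titles(raw_response, min_len=60, max_len=70):
    """Strip list markers from the model's reply and enforce the length rule."""
    titles = []
    for line in raw_response.splitlines():
        title = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s+", "", line).strip().strip('"')
        if min_len <= len(title) <= max_len:
            titles.append(title)
    return titles
```

Checking length in code matters because models often miss character targets, so the prompt alone does not guarantee the 60-70 range.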

6. AI-Generated Chapters

Simple script:

  • Claude Sonnet (or GPT) parses the transcript
  • Breaks at topic changes (“Now let’s talk about...”) and natural pauses
  • Timestamps & concise, keyword-rich chapter titles
  • Jammed into the YouTube description
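Once the model has returned topic breaks, the formatting step is mechanical. This sketch assumes chapters arrive as `(start_seconds, title)` pairs. It enforces YouTube's chapter requirements as I understand them: the first timestamp at 0:00, at least three chapters, and each at least 10 seconds long.

```python
def format_timestamp(seconds):
    """Format seconds as M:SS, or H:MM:SS past the hour mark."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"

def format_chapters(chapters, min_gap=10):
    """Turn (start_seconds, title) pairs into description-ready chapter lines."""
    kept = []
    for start, title in sorted(chapters):
        if kept and start - kept[-1][0] < min_gap:
            continue                  # too close to the previous chapter, drop it
        kept.append((start, title))
    if not kept or kept[0][0] != 0:
        raise ValueError("first chapter must start at 0:00")
    if len(kept) < 3:
        raise ValueError("need at least three chapters")
    return "\n".join(f"{format_timestamp(s)} {t}" for s, t in kept)
```

The returned block can be pasted into the description as-is, one chapter per line.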

7. Auto-Publish

The YouTube API handles the upload:

def publish(video, metadata):
    upload_video(video, metadata)                            # goes up as private first
    add_to_playlist(video["id"], "AI Automation Series")     # file it under the series
    schedule_publish(video["id"], when="Tomorrow 9AM EST")   # public release time
    notify("Ready for review: " + metadata["title"])         # phone ping with the preview

I get a preview link on my phone. 95% of the time, it’s good to go. If not, tweak and re-run. When I approve, it drops as scheduled.
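In the YouTube Data API v3, a scheduled release is expressed by uploading with `status.privacyStatus` set to `private` and `status.publishAt` set to an ISO 8601 time. Here is a sketch of building that request body, under a few assumptions: the metadata keys are illustrative, and "EST" is treated as a fixed UTC-5 offset, so daylight saving is ignored.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5))  # fixed offset; does not follow daylight saving

def next_morning_publish_time(now, hour=9):
    """Return tomorrow at `hour`:00 EST as a UTC ISO 8601 string."""
    local = now.astimezone(EST) + timedelta(days=1)
    local = local.replace(hour=hour, minute=0, second=0, microsecond=0)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def build_upload_body(metadata, publish_at):
    """Build the snippet/status body for a private, scheduled upload."""
    return {
        "snippet": {
            "title": metadata["title"],
            "description": metadata["description"],
            "tags": metadata.get("tags", []),
        },
        "status": {
            "privacyStatus": "private",   # stays private until publishAt
            "publishAt": publish_at,
        },
    }
```

The body would be passed to `videos.insert` with `part="snippet,status"` through an authorized client. Since the video stays private until `publishAt`, the review step can happen in between.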

8. Short-Form Clipping (Reels/TikTok/Shorts/X)

I run vision_social_poster.py and social_poster.py for slicing:

  • Picks 30-60s segments with the strongest hooks or explanations
  • Reframes 9:16 (vertical)
  • Captions always on
  • Auto-adds trending audio if needed
  • Schedules to Instagram/TikTok via automation (or uses Repurpose.io as backup)

Each long vid throws off 8-12 short clips, all posted on a schedule.
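The 9:16 reframe comes down to a crop calculation. Here is a sketch that assumes the horizontal position of the face is already known (for example from a face detector); the function name and the even-width rounding are choices of this example, not of the scripts above.

```python
def vertical_crop(width, height, center_x, aspect=(9, 16)):
    """Return (x, y, w, h) of a full-height 9:16 window centered on center_x."""
    crop_w = int(height * aspect[0] / aspect[1])
    crop_w -= crop_w % 2              # many encoders want even dimensions
    crop_w = min(crop_w, width)
    x = int(center_x - crop_w / 2)
    x = max(0, min(x, width - crop_w))  # clamp so the window stays in frame
    return x, 0, crop_w, height
```

For a 1920x1080 source this gives a 606-pixel-wide window, which is then scaled up to the platform's vertical resolution.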

Real Workflow, Real Times

  • Monday 10:00 AM: Shoot for 20 min
  • 10:25: Uploads via camera WiFi
  • 10:30: Edit pipeline triggers
  • 11:15: Video, meta, and thumbs ready
  • 11:20: Uploads as private
  • 11:30: Phone notification links to preview
  • 11:45: Approve
  • 12:00: Scheduled for next morning
  • 1:00 PM: Short clips queued up

Total: 35 minutes hands-on. Old-school: 4-7 hrs.

Actual Results

2024 numbers, no fluff:

  • Pre-automation: 12 videos/year, 5 hours each, 60 hours/year, always behind, quality all over the place
  • Now: 47 videos, 35 minutes each, 27.5 hours total, consistent weekly drops, always at least “good” quality

Subs up 340%, watch time up 5x, revenue up 4x.

Spent ~$1,140 on tools and saved myself ~32 hours/year while publishing almost four times as many videos. At $100/hr, that’s roughly $3.2k in recovered time, or about $2k net after tool costs.
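The arithmetic behind those figures, written out. The $95/month tool cost is my reading of the ~$1,140 figure (95 × 12), and the last line is an extra comparison, not a number from the post.

```python
def hours_per_year(videos, minutes_each):
    """Total hands-on hours for a year of videos."""
    return videos * minutes_each / 60

before = hours_per_year(12, 5 * 60)   # 60.0 hours for 12 videos
after = hours_per_year(47, 35)        # about 27.4 hours for 47 videos
saved = before - after                # about 32.6 hours
tool_cost = 95 * 12                   # 1140 dollars per year (assumed $95/month)
net = saved * 100 - tool_cost         # about 2118 dollars at $100/hr

# Extra comparison: 47 videos at the old 5 hours each
old_way_same_output = hours_per_year(47, 5 * 60)   # 235.0 hours
```

The 32-hour figure compares 12 videos against 47, so it understates the gap. Producing 47 videos the old way would have taken about 235 hours.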

Stack & Services

What I use:

  • Descript (edit/transcribe, $40)
  • Zapier (workflow glue, $49)
  • OpenAI/Claude/Azure GPT-4.1 (metadata, $20-30)
  • YouTube API (publishing, free)
  • Brand_cron.py / social_poster.py + custom scripts for IG/TikTok/Shorts
  • Repurpose.io ($25) if I want a SaaS backup

Extras: Canva API (for templates, $13), TubeBuddy ($9) for SEO.

$95-165/month total for all the air cover.

What I DON’T Automate

  • Content topics—what I say and teach
  • Actual on-camera performance
  • My last review before publishing
  • Talking to my community
  • The “me” part—personality, voice, jokes, screwups

Everything else? Gone.

More Advanced Stuff

  • Dynamic editing that adapts (tests fast cuts vs. long takes, tunes audio style, etc) based on channel analytics, routed back to the edit AI via omni-bridge
  • A/B tests on thumbnails; tracks what works and updates generation logic
  • AI voice enhancements (fix flubbed lines, intro/outros, language dubs)
  • Social + email auto-promotion via Playwright + CDP, pushing natively into Threads, X, Instagram

The stack keeps learning—use stats from published videos to tune up everything downstream.
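As an illustration of the thumbnail A/B idea, here is a minimal sketch that picks the variant with the higher click-through rate once each has enough impressions. The data shape and the 1,000-impression floor are assumptions, and this is a plain comparison, not a statistical significance test.

```python
def pick_winner(variants, min_impressions=1000):
    """variants maps name -> (impressions, clicks). Returns the best name or None."""
    ctr = {
        name: clicks / impressions
        for name, (impressions, clicks) in variants.items()
        if impressions >= min_impressions   # ignore variants with too little data
    }
    if len(ctr) < 2:
        return None                         # nothing to compare yet
    return max(ctr, key=ctr.get)
```

The winning style can then be fed back as the default for the next thumbnail.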

Getting Started Fast

Week 1: Sign up for Descript, run a test video, get familiar
Week 2: Set up OpenAI or Claude; automate titles/descriptions
Week 3: Hook up YouTube API, auto-upload private tests
Week 4: Record a fresh video, full E2E run, refine

Month 2+: Add short-form, tweak the workflow, bolt on custom features.

FAQ

“Will AI kill my vibe?”
Nope. You’re the content; AI is just the wrench and the pit crew.

“What if I want creative edits?”
Set your style rules. AI edits your way, all day. Rushed and sloppy? That was human you, not the AI.

“Is it overpriced?”
Price it out: a human editor costs $500+ per video. This is a rounding error.

“Will viewers smell the robots?”
Not one comment about “AI” on my channel. All they see is consistency and quality.

The Future: Even Less Work

Next up:

  • Describe a video, AI generates the visuals—a real “make the YouTube for me” button
  • Synthetic presenters (never get tired, never get sick)
  • B-roll, multi-language dubs, fully personalized clips, all handled
  • The line where “automation” stops just keeps moving

But even now, this system is game-changing.

Don’t Wait

Don’t get caught chasing every shiny tool. Pick one pain point (editing? metadata? thumbnails?) and automate that first. Stack the rest as you go.

A month from now, you’ll be shipping weekly with half the effort and none of the burnout.

I record. AI does the rest. You can, too.

Build your stack this weekend—and reclaim your time.

See what you can automate at axon.nepa-ai.com
