Nico Digitals by Nico Bernaola

How I Built an Open-Source Python Engine to Fully Automate Short-Form Video Production

Build a local, open-source video automation pipeline with Python, faster-whisper & auto-editor. Nico Digitals Engine: the $0 tool for automated captions.

By Nicolas Bernaola

Let’s be honest: the absolute worst part of making short-form videos (TikToks, Reels, or Shorts) isn’t the creative work. It’s the endless, mindless friction.

Spending hours manually slicing out awkward pauses, waiting for a slow cloud platform to upload your footage, or burning your hard-earned cash on monthly AI subscriptions with strict credit limits is just bad business.

To fix this bottleneck once and for all, I applied pure systems thinking and engineered a solution: Nico Digitals Engine. It’s a local, open-source automation pipeline that transforms raw video footage into a publish-ready content package with a single terminal command.

No hidden fees, no cloud queues, and absolutely zero subscription costs. It runs on your local machine for free (and you don't need a NASA computer); the only outside call is the Gemini free-tier request that writes your social copy.

The Core Concept: How It Works

A lot of automation tools are bloated and fragile. If one piece breaks, the whole thing crashes. To avoid that, this engine uses a super clean workflow built around a single data contract: a word-level JSON transcript with exact timestamps.
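The post doesn't publish the transcript's exact schema, so here is a minimal sketch of what such a word-level contract could look like. The field names text, start and end (in seconds) are assumptions, not the engine's documented format. The point is that every stage reads and writes the same validated structure.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical schema: field names and units (seconds) are assumptions,
# not the engine's documented format.
@dataclass(frozen=True)
class Word:
    """One transcribed word with its start/end time in seconds."""
    text: str
    start: float
    end: float


def words_to_json(words: list[Word]) -> str:
    """Serialize the transcript into the JSON contract shared by every stage."""
    return json.dumps({"words": [asdict(w) for w in words]}, indent=2)


def words_from_json(payload: str) -> list[Word]:
    """Parse the contract and reject out-of-order or impossible timestamps."""
    words = [Word(**item) for item in json.loads(payload)["words"]]
    for prev, cur in zip(words, words[1:]):
        if cur.start < prev.start:
            raise ValueError(f"word {cur.text!r} starts before {prev.text!r}")
    for w in words:
        if w.start < 0 or w.end < w.start:
            raise ValueError(f"invalid timestamps for {w.text!r}")
    return words
```

Because validation happens once at the boundary, a downstream stage (subtitles, social copy) can trust the data instead of re-checking it.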

Think of it like a production line with three simple stages:

  1. The Input: You drop a raw video file (or a whole folder of videos) into the engine.
  2. The Throughput: The engine extracts the audio, analyzes its loudness to slice out dead air, and runs a high-speed local AI model to transcribe every single word.
  3. The Output: The engine spits out an organized folder containing your cleaned video file, custom-timed subtitles, and platform-specific social media copy.
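The engine delegates silence cutting to auto-editor, which has far more options. The function below is only a simplified illustration of the loudness-threshold idea behind stage 2, not auto-editor's implementation. It assumes you already have one normalized loudness value (0 to 1) per short analysis frame; threshold and margin_frames are illustrative defaults.

```python
def keep_intervals(levels: list[float], frame_sec: float,
                   threshold: float = 0.04, margin_frames: int = 2) -> list[tuple[float, float]]:
    """Return (start_sec, end_sec) spans to keep; everything else is dead air.

    Simplified illustration of loudness-threshold silence cutting,
    not auto-editor's actual algorithm.
    """
    n = len(levels)
    keep = [False] * n
    # Mark every loud frame, padded by a margin so word edges aren't clipped.
    for i, level in enumerate(levels):
        if level > threshold:
            for j in range(max(0, i - margin_frames), min(n, i + margin_frames + 1)):
                keep[j] = True
    # Merge runs of kept frames into time spans.
    spans: list[tuple[float, float]] = []
    start = None
    for i, k in enumerate(keep):
        if k and start is None:
            start = i
        elif not k and start is not None:
            spans.append((start * frame_sec, i * frame_sec))
            start = None
    if start is not None:
        spans.append((start * frame_sec, n * frame_sec))
    return spans
```

The margin is what keeps the cut from sounding robotic: without it, the soft start and end of each word fall below the threshold and get chopped.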

Our transcription model has higher fidelity than CapCut Pro's auto-caption engine, and it's free.

Inside the Open-Source Tech Stack

Instead of wasting money on heavy cloud servers, I built this engine using lightweight, top-tier tools specifically optimized for speed and local performance:

  • faster-whisper: This handles the transcription. It’s a fast reimplementation of OpenAI's Whisper, built on the CTranslate2 inference engine, that runs directly on your computer's hardware. Total cost? Zero.
  • auto-editor: The ultimate tool for speed-editing. It analyzes audio tracks natively to detect silence and automatically chops it out, giving you that fast-paced, high-retention rhythm.
  • Gemini 2.5 Flash: Using the free-tier API, the engine feeds the transcript to Gemini, which instantly writes hooks and SEO-friendly descriptions for Instagram, TikTok, YouTube, and LinkedIn.
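  • Questionary & FFmpeg: These power the user experience and the media plumbing. Questionary gives you an interactive command-line menu where you can literally drag and drop your folders and select your settings with your arrow keys, while FFmpeg handles the underlying audio and video processing.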
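To show how faster-whisper can feed a word-level contract like the one described earlier, here is a minimal sketch. The model size ("small"), CPU device and int8 compute type are illustrative choices, not the engine's settings, and the output dict shape (text, start, end) is the same hypothetical schema as before. The conversion step is kept separate so it can be tested without downloading a model.

```python
def segments_to_words(segments) -> list[dict]:
    """Flatten faster-whisper segments into plain word dicts (text, start, end)."""
    words = []
    for segment in segments:
        # segment.words is None unless transcribe() ran with word_timestamps=True.
        for w in segment.words or []:
            words.append({
                "text": w.word.strip(),  # Whisper tokens usually carry a leading space
                "start": round(w.start, 3),
                "end": round(w.end, 3),
            })
    return words


def transcribe(path: str, model_size: str = "small") -> list[dict]:
    """Run faster-whisper locally and return word-level timestamps."""
    # Imported lazily; requires `pip install faster-whisper`.
    from faster_whisper import WhisperModel

    # Model size, device and compute_type are illustrative, not the engine's config.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, word_timestamps=True)
    # segments is a lazy generator: decoding actually happens while we iterate it.
    return segments_to_words(segments)
```

Usage would be something like words = transcribe("raw_clip.mp4"), where the file name is just a placeholder.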

We can agree AI is great, but you need an architectural approach to make it part of the workflow. AI is not the workflow; it's just a gear with great leverage inside your system. Those who try to swap the whole system for AI will get nothing but slop, lack of control, and a massive API bill each month.
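Here is one way to keep Gemini as "just a gear": in the sketch below, the model only ever receives the finished transcript and returns text; it never touches cutting or timing. The prompt wording is hypothetical, and the call uses the google-genai Python SDK, which may not be the client the engine itself uses.

```python
PLATFORMS = ("Instagram", "TikTok", "YouTube", "LinkedIn")


def build_copy_prompt(words: list[dict], platforms: tuple[str, ...] = PLATFORMS) -> str:
    """Turn the word-level transcript into one bounded prompt for social copy."""
    transcript = " ".join(w["text"] for w in words)
    # Prompt wording is hypothetical, not the engine's actual prompt.
    return (
        "You write social media copy for short-form videos.\n"
        f"For each of these platforms: {', '.join(platforms)}, write one hook and one "
        "SEO-friendly description based only on the transcript below.\n\n"
        f"Transcript:\n{transcript}"
    )


def generate_copy(words: list[dict], api_key: str) -> str:
    """Send the prompt to Gemini 2.5 Flash and return the raw text response."""
    # Imported lazily; requires `pip install google-genai`.
    from google import genai

    client = genai.Client(api_key=api_key)
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=build_copy_prompt(words),
    )
    return response.text
```

Since the prompt is built by plain Python from the same transcript, you can inspect or version it without spending a single API call.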

Why We Do NOT "Burn" Subtitles (A Core Design Decision)

Many automated tools hardcode ("burn") subtitles directly into the video file (yes, you, Submagic). Nico Digitals Engine rejects this on purpose. As a rule of thumb, the engine should process the data; the video editor should design the content.

Forcing your computer to render subtitles onto a video file locally takes forever and ruins your editing flexibility. If the AI makes a typo, or if a caption accidentally covers a speaker's face, you have to re-render the entire video from scratch. That's a massive waste of time.

Instead, the engine exports a lightweight .srt subtitle file, grouped according to popular retention presets:

  • MrBeast Style: 1 word per caption for hyper-aggressive pacing.
  • Hormozi Style: 3 words per caption for an organic, highly engaging reading rhythm.
  • Podcast Style: 5 words per caption for a natural, conversational flow.
  • Full Custom: You choose how many words appear in each caption.
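A minimal sketch of how those presets could turn a word-level transcript into an .srt file, assuming the same hypothetical word dicts (text, start, end in seconds) used above. The preset keys are made-up names, and fixed-size grouping is just the simplest strategy; each caption runs from its first word's start to its last word's end.

```python
# Hypothetical preset keys mapping to the words-per-caption counts described above.
PRESETS = {"mrbeast": 1, "hormozi": 3, "podcast": 5}


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], words_per_caption: int) -> str:
    """Group words into fixed-size captions and render them as .srt text."""
    if words_per_caption < 1:
        raise ValueError("words_per_caption must be at least 1")
    blocks = []
    for index, i in enumerate(range(0, len(words), words_per_caption), start=1):
        chunk = words[i:i + words_per_caption]
        text = " ".join(w["text"] for w in chunk)
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        # SRT cue: sequence number, time range, text, then a blank line between cues.
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

For example, words_to_srt(words, PRESETS["hormozi"]) would produce three-word captions, and "Full Custom" is simply any other integer.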

The ideal workflow: You drag the clean, silence-cut .mp4 and the custom .srt file straight into Premiere Pro or CapCut. You apply your custom brand fonts and colors in exactly one click. The entire process takes 30 seconds.

Scale Massively with Batch Processing

The engine is built to scale. If you shoot 10 or 15 raw videos in one session, you don’t have to process them one by one. You can drag the entire folder into the engine, pick your retention presets, and walk away.

While you go grab a cup of coffee, the system quietly processes everything in the background and organizes your outputs into clean, platform-ready folders.
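The batch loop can be sketched as follows, assuming one output folder per source video named after the file. Here process_one is a hypothetical callback standing in for the per-video pipeline (silence cut, transcription, subtitles, copy), and the extension list is my assumption. This sketch processes files one after another and records failures instead of aborting.

```python
from pathlib import Path

# Assumed set of accepted video extensions.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv"}


def find_videos(folder) -> list[Path]:
    """List video files directly inside folder, sorted for a predictable order."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def output_dir_for(video: Path, out_root) -> Path:
    """Create (if needed) one output folder per video, named after the file."""
    target = Path(out_root) / video.stem
    target.mkdir(parents=True, exist_ok=True)
    return target


def process_folder(folder, out_root, process_one) -> dict[str, str]:
    """Run process_one(video, target_dir) on every video; return {file: error}."""
    failures: dict[str, str] = {}
    for video in find_videos(folder):
        try:
            process_one(video, output_dir_for(video, out_root))
        except Exception as exc:  # one bad file shouldn't stop the whole batch
            failures[video.name] = str(exc)
    return failures
```

Returning the failures instead of raising means you come back from your coffee to a finished batch plus a short list of clips that need a second look.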

Building this project proved that you don't need complex cloud infrastructure or expensive monthly software bills to solve massive operational bottlenecks. Good data structure, clean Python logic, and local tools are all it takes to build a high-performance media engine.

The beta is officially live and completely free under the Apache 2.0 License.

👉 Check out the source code, clone the repository, and run it yourself: GitHub - Nico-Bernaola/Nico-Digitals-Engine
