Video Slideshow Generator

Generate video slideshows from text with AI-generated images and voice narration. Each text segment (a sentence or a paragraph, depending on the splitting mode) becomes a slide with its own image and audio.

⚠️ Requirements

This tool requires:

WebGPU: Required for image generation. Supported in Chrome 113+, Edge 113+, or Safari 18+
Models: Image generation and TTS models will be downloaded on first use
Processing Time: Video generation can take several minutes depending on text length
Memory: Large videos may require significant browser memory
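
A page can check for WebGPU before offering image generation. The following is a minimal sketch of such a check, not this tool's actual code: supportsWebGPU and canGetAdapter are hypothetical helper names, while navigator.gpu and requestAdapter() are the standard WebGPU entry points.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Hypothetical helper: true when the browser exposes the WebGPU entry point.
function supportsWebGPU(nav: NavigatorLike): boolean {
  return nav.gpu != null;
}

// Hypothetical helper: the API can be present while no usable adapter exists,
// so request an adapter before committing to image generation.
async function canGetAdapter(nav: NavigatorLike): Promise<boolean> {
  if (!supportsWebGPU(nav)) return false;
  try {
    const adapter = await nav.gpu!.requestAdapter();
    return adapter != null;
  } catch {
    return false;
  }
}
```

In a browser, the global navigator object would be passed in (with a cast if WebGPU typings are not installed). Taking the object as a parameter keeps the check testable outside a browser.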

Input Text

Text Splitting Mode

Image Generation Model

Image Style (Optional): Style instructions will be added to each image prompt.

Voice

Video Resolution

Generated Video

FAQ

How does it work?

The tool splits your text into segments (sentences or paragraphs), generates an AI image for each segment, creates voice narration using text-to-speech, and combines everything into a video slideshow using FFmpeg.
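
The first step, splitting text into slide segments, can be sketched as follows. This is an illustrative approximation under stated assumptions, not the tool's implementation: the names splitText and buildSlides, the punctuation-based sentence rule, and appending the style after a comma are all assumptions. The image, speech, and FFmpeg stages are left out.

```typescript
type SplitMode = "sentence" | "paragraph";

interface SlidePlan {
  index: number;
  narration: string;   // text to be spoken for this slide
  imagePrompt: string; // text handed to the image model
}

// Naive splitter (assumption): paragraphs are separated by blank lines,
// sentences end at '.', '!' or '?'. Abbreviations such as "e.g." will be
// split incorrectly by this simple rule.
function splitText(text: string, mode: SplitMode): string[] {
  const parts: string[] =
    mode === "paragraph"
      ? text.split(/\n\s*\n/)
      : text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((p) => p.trim()).filter((p) => p.length > 0);
}

// One slide per segment; the optional style is appended to every image prompt.
function buildSlides(text: string, mode: SplitMode, style?: string): SlidePlan[] {
  return splitText(text, mode).map((segment, index) => ({
    index,
    narration: segment,
    imagePrompt: style ? `${segment}, ${style}` : segment,
  }));
}
```

For example, buildSlides("A cat sleeps. A dog runs!", "sentence", "watercolor") yields two slide plans, the second with the image prompt "A dog runs!, watercolor".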

How long does it take?

Generation time depends on text length. Each slide requires generating an image (~10-30 seconds) and audio (~5-10 seconds), plus video assembly. A 5-sentence video typically takes 2-5 minutes.
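
The per-slide figures above can be turned into a rough estimate. The sketch below only multiplies the quoted ranges by the slide count; estimateGenerationTime is a hypothetical name, and assembly time is left as a parameter because the figures above do not quantify it.

```typescript
interface TimeRange {
  minSeconds: number;
  maxSeconds: number;
}

// Per-slide bounds taken from the ranges quoted above:
// image 10-30 s, audio 5-10 s.
function estimateGenerationTime(slideCount: number, assemblySeconds = 0): TimeRange {
  const perSlideMin = 10 + 5;
  const perSlideMax = 30 + 10;
  return {
    minSeconds: slideCount * perSlideMin + assemblySeconds,
    maxSeconds: slideCount * perSlideMax + assemblySeconds,
  };
}
```

For 5 slides this gives 75 to 200 seconds before video assembly is counted.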

Can I use this offline?

Once models are downloaded and cached, you can use the tool offline. Models are cached in your browser's IndexedDB.
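
The cache-first behavior that makes offline use possible can be illustrated with a small sketch. Everything here is an assumption for illustration: ModelStore, loadModel, and MemoryStore are hypothetical names, and the in-memory store merely stands in for an IndexedDB-backed one.

```typescript
// Hypothetical storage abstraction; in a browser this could wrap IndexedDB.
interface ModelStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, data: Uint8Array): Promise<void>;
}

// Return the cached model if present; otherwise download it once and cache it.
async function loadModel(
  key: string,
  store: ModelStore,
  download: (key: string) => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const cached = await store.get(key);
  if (cached !== undefined) return cached; // offline path: no network needed
  const data = await download(key);        // first use: requires network
  await store.put(key, data);
  return data;
}

// In-memory stand-in used only to demonstrate the flow.
class MemoryStore implements ModelStore {
  private items = new Map<string, Uint8Array>();
  async get(key: string) {
    return this.items.get(key);
  }
  async put(key: string, data: Uint8Array) {
    this.items.set(key, data);
  }
}
```

With this pattern the download callback runs only on the first request for a given key; later requests are served from the store.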

Is my data sent to a server?

No. All processing happens entirely in your browser. Images, audio, and video are generated locally.