Image Caption Generator
Generate AI-powered captions for images using the vit-gpt2-image-captioning model. The model runs entirely in your browser with Transformers.js.
How it works
This tool:
- Runtime: Transformers.js runs the model entirely in your browser
- Model: Xenova/vit-gpt2-image-captioning (Vision Encoder-Decoder: ViT image encoder, GPT-2 text decoder)
- First load: The model is downloaded on first use (~200 MB)
- Caching: The model is cached in your browser, so later uses start faster
- Privacy: All processing happens locally; your images are never sent to a server
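Because the model download is large, a page like this would typically load the captioning pipeline once and reuse it for every caption. The sketch below shows that pattern under stated assumptions: `createLazyLoader` is a hypothetical helper, not part of Transformers.js, and the stand-in loader replaces what in the browser could be `() => pipeline("image-to-text", "Xenova/vit-gpt2-image-captioning")`. Reuse across visits is handled separately by the browser cache.

```typescript
// Hypothetical helper (not part of Transformers.js): wraps an expensive
// async loader so that it runs at most once per page session.
function createLazyLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;

  return () => {
    if (pending === null) {
      pending = load().catch((err: unknown) => {
        // Clear the cached promise so a later call can retry,
        // e.g. after a failed download.
        pending = null;
        throw err;
      });
    }
    // Concurrent callers all receive the same in-flight promise.
    return pending;
  };
}

// Stand-in loader for illustration; in the browser this would wrap
// pipeline("image-to-text", "Xenova/vit-gpt2-image-captioning").
let loadCount = 0;
const getCaptioner = createLazyLoader(async () => {
  loadCount += 1;
  return (image: string): string => `caption for ${image}`;
});

getCaptioner().then((caption) => console.log(caption("cat.jpg"))); // caption for cat.jpg
```

Sharing the in-flight promise, rather than a finished result, means a second click on "Generate Caption" during the first download waits for the same load instead of starting another one.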
Usage: Click the upload area or drag and drop an image (JPG, PNG, GIF, or WebP), then click "Generate Caption" to get a description.
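The four supported formats correspond to standard MIME types. A minimal sketch of an upload check follows, assuming the page inspects the `type` and `name` properties of the selected or dropped `File`; `isSupportedImage` and the extension fallback are hypothetical, not taken from this tool's source.

```typescript
// MIME types for the formats the tool accepts (JPG, PNG, GIF, WebP).
const SUPPORTED_MIME_TYPES: ReadonlySet<string> = new Set([
  "image/jpeg",
  "image/png",
  "image/gif",
  "image/webp",
]);

// Fallback by file extension, for when the browser reports an empty type.
const SUPPORTED_EXTENSIONS: ReadonlySet<string> = new Set([
  "jpg", "jpeg", "png", "gif", "webp",
]);

// Hypothetical check: accepts a file if its MIME type is supported, or,
// when the type is missing, if its extension matches a supported format.
function isSupportedImage(file: { name: string; type: string }): boolean {
  const mime = file.type.toLowerCase();
  if (mime !== "") {
    return SUPPORTED_MIME_TYPES.has(mime);
  }
  const dot = file.name.lastIndexOf(".");
  if (dot === -1) return false;
  return SUPPORTED_EXTENSIONS.has(file.name.slice(dot + 1).toLowerCase());
}

// A browser File has string `name` and `type` properties, so it can be passed directly.
console.log(isSupportedImage({ name: "cat.webp", type: "image/webp" })); // true
console.log(isSupportedImage({ name: "scan.tiff", type: "image/tiff" })); // false
```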
FAQ
How accurate are the captions?
The model provides reasonable captions for most images, but accuracy depends on image complexity and content. It works best with clear, well-lit images containing recognizable objects and scenes.
What image formats are supported?
JPG, PNG, GIF, and WebP formats are supported. The image is preprocessed automatically (resized and normalized to the model's input format), so you do not need to convert it first.
How long does it take?
On first use, downloading the model (~200 MB) takes about 30-60 seconds, depending on your connection. Subsequent uses are much faster (5-15 seconds) because the model is loaded from your browser's cache.
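The 30-60 second figure is consistent with a simple size-over-bandwidth estimate: at ~200 MB, 60 seconds corresponds to roughly 27 Mbit/s and 30 seconds to roughly 53 Mbit/s. The sketch below assumes decimal megabytes (1 MB = 8 megabits) and a steady connection; `estimateDownloadSeconds` is a hypothetical helper for illustration only.

```typescript
// Estimated download time in seconds: size in megabytes times 8 bits per byte,
// divided by bandwidth in megabits per second. Ignores latency and protocol overhead.
function estimateDownloadSeconds(sizeMB: number, bandwidthMbps: number): number {
  if (bandwidthMbps <= 0) {
    throw new RangeError("bandwidth must be positive");
  }
  return (sizeMB * 8) / bandwidthMbps;
}

console.log(estimateDownloadSeconds(200, 50)); // 32
console.log(estimateDownloadSeconds(200, 25)); // 64
```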
Is my data sent to a server?
No. All processing happens entirely in your browser. The model is downloaded once and cached locally.
Can I use this offline?
Once the model has been downloaded and cached, you can use the tool offline. The model files are stored in your browser's cache (Transformers.js uses the Cache Storage API by default).