AI Media Models | AIBit-Discover Open Source Projects

June 6, 2026

Lance: ByteDance's 3B Unified Model for Image and Video Understanding, Generation, and Editing

Lance is ByteDance's 3B-parameter unified multimodal model for image and video understanding, generation, and editing, with competitive benchmark results.

Mar 10, 2026

AI Mondo Poster Generator: One-Line Master Designs

Turn an idea into a professional poster, book cover, or piece of album art from a single sentence. Qiaomu Mondo Poster Design draws on 33+ artist styles to generate visuals for WeChat, Xiaohongshu, Spotify, and more. No Photoshop skills are needed: describe your vision and get a Mondo-style design in seconds. It supports custom aspect ratios, style comparisons, and AI-enhanced prompts for social media graphics.

Mar 9, 2026

Edit Banana: AI Converts Images to Editable DrawIO

Edit Banana is an open-source tool that converts static diagrams, flowcharts, and PDFs into fully editable DrawIO and PPTX files. Powered by fine-tuned SAM 3 segmentation and multimodal LLMs, it preserves layout, colors, text, and connections. Try the online demo or run it locally with Python. It suits engineers, researchers, and designers tired of recreating diagrams by hand, and the project has 3.4k+ GitHub stars.

Mar 4, 2026

Jimeng AI Free API: Free Image/Video Generator

Jimeng AI Free API is an open-source service that provides free access to Jimeng's AI models for image and video generation. It supports 10+ models (4.5/4.1/3.0 Pro) and offers an OpenAI-compatible API, a web dashboard with a media library, and one-click Docker deployment. Other features include 66 free daily credits via token rotation, 2K image generation, smart aspect ratio detection, and automatic retry logic. It suits developers who want to build AI applications without API costs.

Jan 31, 2026

Qwen3-ASR: Alibaba's Open-Source 52-Language ASR Model

Alibaba Cloud's latest release, Qwen3-ASR, brings state-of-the-art multilingual speech recognition to the open-source community. It supports 52 languages and 22 Chinese dialects, and its 1.7B and 0.6B models perform strongly on benchmarks and rival commercial APIs. The repo ships with a full inference toolkit that works with transformers or the high-performance vLLM backend, automatic timestamping via the Qwen3-ForcedAligner, and a ready-to-run Gradio demo. This guide walks researchers, developers, and hobbyists through downloading, setting up, benchmarking, and deploying Qwen3-ASR in Docker or directly on a GPU, so you can start transcribing speech, music, and songs. Key highlights: multilingual support, streaming inference, forced alignment, quick-start scripts, Docker deployment, and API integration through OpenAI-compatible endpoints.

Jan 25, 2026

HeartMuLa: Open-Source Music Generation Models 2026

HeartMuLa is a family of open-source music foundation models that generate high-quality music from lyrics and tags. Learn how to install the library, run the quick-start demos, and configure it for multi-GPU use or lazy loading. It suits researchers, musicians, and developers who want to combine AI and creativity.

Jan 25, 2026

Qwen3-TTS: Fast, Open-Source Streaming TTS

Alibaba's Qwen3-TTS is an open-source, low-latency speech synthesis framework that supports multilingual coverage, voice cloning, and voice design with natural-language controls. This guide walks you through the models, architecture, quick-start installation, and real-world code examples. Whether you're building chatbots, audiobooks, or multilingual voice assistants, Qwen3-TTS offers a flexible, cloud-friendly solution available through Hugging Face and ModelScope. Explore the repository and learn how to generate custom voices, clone speakers, and fine-tune the system on your own data. The article also covers performance metrics, evaluation results, and practical deployment hints for both local and edge devices.

Jan 21, 2026

SongGeneration – LeVo Open-Source Music Model (NeurIPS 2025)

SongGeneration is the open-source version of LeVo, a state-of-the-art neural music generator that can produce full-length songs with vocals and accompaniment in seconds. With multiple pretrained checkpoints, a Gradio UI, Docker support, and comprehensive installation guides, developers and hobbyists can start generating high-fidelity tracks right away or experiment with multilingual lyrics. This article walks you through the repository's structure and key features, then shows how to set up the environment, run inference, and follow the prompt and lyrics formatting rules. Whether you're building a music app or just curious about AI-driven composition, SongGeneration offers a ready-to-use platform.

Jan 19, 2026

Pocket-TTS: Lightweight CPU-Only Text-to-Speech Library

Pocket-TTS is an ultra-compact, CPU-friendly TTS library that needs neither a GPU nor calls to a web API. Learn how to install it with a single pip or uv command, clone voices from WAV files, run a local HTTP server for instant audio streaming, and integrate it into Python projects or Colab notebooks. Its 100M-parameter model runs on two CPU cores and delivers ~200 ms latency and 6× real-time speed on modern CPUs. This guide covers setup, voice management, CLI usage, and best practices, making it useful for developers and hobbyists who want to embed TTS in small devices or edge environments.

Jan 16, 2026

Sopro – Lightweight Text-to-Speech with Zero-Shot Voice Cloning

Sopro is a lightweight English TTS model built on WaveNet-style dilated convolutions. With only 169M parameters, it delivers fast, streaming synthesis and zero-shot voice cloning from just a few seconds of audio. Learn how to install it, run it from the CLI, or embed it in Python, and explore the demo web UI. It suits developers who want fast, flexible TTS without the overhead of a large Transformer.