NexaSDK: Run Multi‑Modal AI On‑Device with Day‑0 Models

What is NexaSDK?

NexaSDK is a high‑performance, cross‑platform inference framework that lets developers run the most advanced large language models (LLMs), vision‑language models (VLMs), automatic speech recognition (ASR) systems, optical character recognition (OCR), and image‑generation models directly on device (GPU, NPU, or CPU) without relying on cloud back‑ends. Built for minimal energy consumption and maximum speed, NexaSDK offers day‑0 loading for the newest multimodal releases such as Qwen3‑VL, Gemma‑3n (vision), DeepSeek‑OCR, and Granite‑4.0.

Why NexaSDK Stands Out

| Feature | NexaSDK | Ollama | llama.cpp | LM Studio |
| --- | --- | --- | --- | --- |
| NPU support | ✅ | ❌ | ❌ | ❌ |
| Cross‑platform (Android, iOS, Windows, macOS, Linux, IoT) | ✅ | ⚠️ | ⚠️ | ❌ |
| Day‑0 model support (GGUF, MLX, NEXA) | ✅ | ⚠️ | ❌ | ❌ |
| Full multimodality | ✅ | ⚠️ | ⚠️ | ⚠️ |
| One‑line deployment | ✅ | ✅ | ❌ | ⚠️ |
| OpenAI‑compatible APIs | ✅ | ✅ | ✅ | ✅ |

The result: developer‑friendly, power‑efficient, and ready‑to‑go. Whether you’re building a quick prototype or a production‑level app, NexaSDK gives you the freedom to experiment with a wide range of models locally.

Supported Platforms & SDKs

| Platform | Quick Start | SDK Language |
| --- | --- | --- |
| Windows / macOS / Linux (Desktop) | CLI | Python / C++ |
| Android | Android SDK | Kotlin / Java |
| iOS / macOS | iOS SDK | Swift |
| Linux / IoT (Docker) | Docker | None (CLI inside container) |

Example: Running Qwen3-1.7B with the Python SDK

# Install the Python package (shell)
pip install nexaai

# Load model and chat
from nexaai import LLM, LlmChatMessage, GenerationConfig, ModelConfig

llm = LLM.from_(model="NexaAI/Qwen3-1.7B-GGUF", config=ModelConfig())
conversation = [LlmChatMessage(role="user", content="Tell me a joke!")]
prompt = llm.apply_chat_template(conversation)
for token in llm.generate_stream(prompt, GenerationConfig(max_tokens=150)):
    print(token, end="", flush=True)

The output streams in real time, just like a cloud call, but all computation happens on the local device.

Day‑0 Model Support

Day‑0 means the model is ready to run immediately after download—no additional conversion or training required. NexaSDK supports thousands of community‑compiled GGUF weights as well as native NEXA and MLX formats. The SDK automatically detects the best inference engine for the hardware:

  1. CPU/Intel‑Xe — default for desktop.
  2. NPU — Qualcomm Hexagon, AMD NPU, Apple Neural Engine (ANE).
  3. GPU — NVIDIA, AMD, Apple GPU.

This selects the fastest available backend on each device at launch.
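
If you need to pin a specific backend rather than rely on auto‑detection, the Android API shown later exposes a plugin_id field ("npu", for example). The Python sketch below assumes the Python bindings accept a comparable hint; the plugin_id keyword here is an illustrative assumption, not a documented signature:

# Illustrative sketch only: the `plugin_id` keyword mirrors the Android
# VlmCreateInput field and is an ASSUMED parameter for the Python bindings.
from nexaai import LLM, ModelConfig

# Default: NexaSDK picks the engine for the current hardware automatically
llm = LLM.from_(model="NexaAI/Qwen3-1.7B-GGUF", config=ModelConfig())

# Assumed override: pin inference to the NPU plugin
llm_npu = LLM.from_(
    model="NexaAI/Qwen3-1.7B-GGUF",
    plugin_id="npu",
    config=ModelConfig(),
)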

One‑Line Deployment on Android

Add to your build.gradle.kts:

implementation("ai.nexa:core:0.0.15")

Then initialize the SDK and stream a response (collect must be called from a coroutine scope):

// Initialize the SDK once, e.g. in Application.onCreate()
NexaSdk.getInstance().init(this)

// Build an NPU-backed VLM and stream generated tokens
VlmWrapper.builder()
    .vlmCreateInput(
        VlmCreateInput(
            model_name = "omni-neural",
            model_path = "/data/data/your.app/files/models/OmniNeural-4B/files-1-1.nexa",
            plugin_id = "npu",
            config = ModelConfig()
        )
    )
    .build()
    .onSuccess { vlm ->
        // generateStreamFlow returns a Kotlin Flow of tokens
        vlm.generateStreamFlow("Hello!", GenerationConfig()).collect { print(it) }
    }

With just a few lines of code, a complex VLM is running directly on Snapdragon hardware.

Community and Ecosystem

  • OpenAI‑compatible APIs: Switch between local and remote models effortlessly (see the sketch after this list).
  • Extensible plugin architecture: Add custom hardware or new model formats.
  • Active GitHub repo (7.5k ⭐, 939 forks) with frequent releases, extensive documentation, and a robust test suite.
  • Partnerships with Qualcomm, IBM, Google, AMD, NVIDIA, and Microsoft reflect strong industry support.
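
Because the server speaks the OpenAI protocol, existing client code can simply be pointed at it. A minimal sketch, assuming a local NexaSDK server is already running (the nexa serve command and port 18181 below are assumptions; check the docs for the exact invocation):

# Minimal sketch: standard OpenAI Python client against a local server.
# ASSUMPTIONS: the server was started with something like `nexa serve`
# and listens on port 18181 -- verify both against the NexaSDK docs.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18181/v1",  # assumed local endpoint
    api_key="not-needed-locally",          # placeholder; local servers typically ignore it
)

resp = client.chat.completions.create(
    model="NexaAI/Qwen3-1.7B-GGUF",
    messages=[{"role": "user", "content": "Tell me a joke!"}],
)
print(resp.choices[0].message.content)

Swapping base_url back to a hosted endpoint (and supplying a real key) is all it takes to move the same code to a remote model.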

Licensing and Commercial Use

NexaSDK is dual‑licensed:

  • CPU/GPU components: Apache‑2.0.
  • NPU components: Free for personal use with a key from the Nexa AI Model Hub; commercial use requires a license (contact [email protected]).

This structure lets startups and enterprises adopt the SDK with clear licensing terms.

Getting Started

  1. Clone the repo: git clone https://github.com/NexaAI/nexa-sdk
  2. Install Docker or the SDK for your platform (see the table above).
  3. Run nexa infer NexaAI/Qwen3-1.7B-GGUF to verify your environment.
  4. Explore the cookbook and solutions directories for ready‑to‑run examples.

For detailed documentation, visit the official docs: https://docs.nexa.ai.

Final Thoughts

NexaSDK democratizes on‑device AI by offering a single, unified framework that removes the friction of converting models, managing dependencies, or tuning for specific hardware. Its day‑0 model support means you can experiment with the cutting‑edge multimodal landscape—no waiting for cloud callbacks or license approvals.

Whether you’re building a voice‑enabled assistant, a real‑time image classifier for a drone, or a cross‑platform note‑taking app, NexaSDK provides the speed, efficiency, and simplicity to keep your focus on the user experience.

Ready to take your AI workloads off‑cloud? Check out NexaSDK today and join a growing community of developers bringing multimodal intelligence straight to the edge.
