RAGbits: The Toolkit for Rapid GenAI Application Development
In the rapidly evolving landscape of Generative AI, developers constantly seek robust and efficient tools to bring their innovative applications to life. Enter RAGbits, an open-source framework by deepsense.ai, purpose-built to accelerate the creation of reliable and scalable GenAI solutions, particularly those leveraging Retrieval-Augmented Generation (RAG).
What is RAGbits?
RAGbits is a comprehensive set of building blocks designed to streamline the entire GenAI application development lifecycle. It offers a modular and flexible architecture, allowing developers to integrate only the components they need, thereby reducing dependencies and optimizing performance. The framework is heavily focused on practical application, providing robust features for managing Large Language Models (LLMs), handling diverse data types, and deploying sophisticated RAG pipelines.
Key Features of RAGbits:
RAGbits stands out with its powerful feature set, empowering developers to build sophisticated AI applications with ease:
Build Reliable & Scalable GenAI Apps
- Flexible LLM Integration: Seamlessly swap between over 100 LLMs via LiteLLM or integrate local models, offering unparalleled flexibility (see the sketch after this list).
- Type-Safe LLM Calls: Utilize Python generics to enforce strict type safety during model interactions, ensuring robustness and reducing errors.
- Bring Your Own Vector Store: Connect with popular vector stores like Qdrant, PgVector, and more, or easily integrate custom solutions.
- Developer Tools Included: Access a suite of command-line tools for managing vector stores, configuring query pipelines, and testing prompts directly from your terminal.
- Modular Installation: Install only the necessary components, tailoring the framework to your specific project needs and improving efficiency.
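To make the first two points concrete, here is a minimal sketch of how the pieces compose, using only the constructors that appear in the quickstart examples further down. The model names are illustrative, and which providers actually work depends on your LiteLLM credentials.

```python
# Minimal composition sketch: swapping providers or vector stores is a
# constructor-level change. Model names below are illustrative.
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

# Any model LiteLLM can route to is a one-line change of the model name.
llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)
# llm = LiteLLM(model_name="claude-3-5-haiku-20241022", use_structured_output=True)

# The embedder and vector store plug in the same way; a different backend
# (e.g. a Qdrant or pgvector adapter) would replace InMemoryVectorStore here
# without touching the rest of the pipeline.
embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)
```

Because prompts are declared with Python generics (see the full examples below), the object returned by `llm.generate(prompt)` is the typed output model rather than raw text.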
Fast & Flexible RAG Processing
- Extensive Data Ingestion: Process over 20 data formats, including PDFs, HTML, spreadsheets, and presentations. Leverage powerful parsers like Docling and Unstructured, or implement custom parsers.
- Complex Data Handling: Extract structured content, tables, and images with built-in Visual Language Model (VLM) support.
- Any Data Source Connectivity: Use pre-built connectors for cloud storage services like S3, GCS, and Azure, or develop your own connectors.
- Scalable Ingestion: Process large datasets efficiently using Ray-based parallel processing for rapid data onboarding.
Deploy & Monitor with Confidence
- Real-time Observability: Track application performance and gain insights using OpenTelemetry and comprehensive CLI analytics.
- Built-in Testing: Validate and refine your prompts with integrated promptfoo testing before deploying your applications.
- Auto-Optimization: Continuously evaluate and optimize model performance through systematic evaluation processes.
- Chat UI: Deploy a ready-to-use chatbot interface complete with API, data persistence, and user feedback mechanisms.
Getting Started with RAGbits
Installation is straightforward. You can get started quickly with a simple `pip` command:

```bash
pip install ragbits
```

This command installs a starter bundle, including `ragbits-core` (fundamental tools), `ragbits-agents` (for agentic systems), `ragbits-document-search` (retrieval and ingestion), `ragbits-evaluate` (unified evaluation), `ragbits-chat` (conversational AI), and `ragbits-cli` (command-line interface). Alternatively, individual components can be installed as needed.
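For instance, a retrieval-only service could install just the pieces it needs; the subset below is illustrative:

```bash
# Install only selected building blocks (illustrative subset)
pip install ragbits-core ragbits-document-search
```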
Practical Examples:
The RAGbits documentation provides clear quickstart guides, demonstrating common use cases. Here's a glimpse into its simplicity:
- Defining and Running LLM Prompts: Easily define type-safe prompts and generate responses from your chosen LLM.

```python
# Example of LLM prompt generation
import asyncio

from pydantic import BaseModel

from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt


class QuestionAnswerPromptInput(BaseModel):
    question: str


class QuestionAnswerPromptOutput(BaseModel):
    answer: str


class QuestionAnswerPrompt(Prompt[QuestionAnswerPromptInput, QuestionAnswerPromptOutput]):
    system_prompt = """
    You are a question answering agent.
    Answer the question to the best of your ability.
    """
    user_prompt = """
    Question: {{ question }}
    """


llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)


async def main() -> None:
    prompt = QuestionAnswerPrompt(
        QuestionAnswerPromptInput(question="What are high memory and low memory on linux?")
    )
    response = await llm.generate(prompt)
    print(response.answer)


if __name__ == "__main__":
    asyncio.run(main())
```
- Building a Vector Store Index: Ingest documents and query your custom knowledge base.

```python
# Example of document search
import asyncio

from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)


async def run() -> None:
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search("What are the key findings presented in this paper?")
    print(result)


if __name__ == "__main__":
    asyncio.run(run())
```
- Constructing a RAG Pipeline: Combine LLMs with retrieved context for accurate and relevant responses.

```python
# Example of a RAG pipeline
import asyncio

from pydantic import BaseModel

from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch


class QuestionAnswerPromptInput(BaseModel):
    question: str
    context: list[str]


class QuestionAnswerPromptOutput(BaseModel):
    answer: str


class QuestionAnswerPrompt(Prompt[QuestionAnswerPromptInput, QuestionAnswerPromptOutput]):
    system_prompt = """
    You are a question answering agent.
    Answer the question that will be provided using context.
    If in the given context there is not enough information, refuse to answer.
    """
    user_prompt = """
    Question: {{ question }}
    Context: {% for item in context %}
    {{ item }}
    {%- endfor %}
    """


embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)
llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)


async def run() -> None:
    question = "What are the key findings presented in this paper?"

    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search(question)

    # Pass the retrieved chunks as prompt context (each returned element is
    # assumed to expose its text via `text_representation`).
    prompt = QuestionAnswerPrompt(
        QuestionAnswerPromptInput(
            question=question,
            context=[element.text_representation for element in result],
        )
    )
    response = await llm.generate(prompt)
    print(response.answer)


if __name__ == "__main__":
    asyncio.run(run())
```