AIBit-Discover Open Source Projects

RAG & Data Research

April 9, 2026

Zvec: Lightning-Fast In-Process Vector DB from Alibaba

Discover Zvec, Alibaba's open-source vector database that embeds directly into your application with no server to set up. It searches billions of vectors in milliseconds, supports dense and sparse embeddings and hybrid search, and runs anywhere from notebooks to edge devices. The latest release, v0.3.0, adds Windows and Android support, RaBitQ quantization, and a C API for AI agents. Install it via pip or npm and start building RAG apps with this lightweight, production-grade library, which has 9.3k GitHub stars.
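The Zvec summary mentions hybrid search over dense and sparse embeddings. The sketch below shows one common way to fuse the two signals, a weighted sum of dense cosine similarity and sparse dot product, in plain Python. It is not Zvec's API: the `alpha` weight, the toy vectors, and every function name here are assumptions chosen for illustration.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sparse_dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as {term: weight}."""
    return sum(w * b[t] for t, w in a.items() if t in b)

def hybrid_search(query_dense, query_sparse, docs, alpha=0.5, k=2):
    """Rank docs by alpha * dense score + (1 - alpha) * sparse score."""
    scored = []
    for doc_id, (dense, sparse) in docs.items():
        score = alpha * cosine(query_dense, dense) + (1 - alpha) * sparse_dot(query_sparse, sparse)
        scored.append((score, doc_id))
    scored.sort(reverse=True)  # highest combined score first
    return [doc_id for _, doc_id in scored[:k]]

# Toy corpus (hypothetical): each doc has a dense embedding and a sparse term-weight vector.
docs = {
    "a": ([0.9, 0.1, 0.0], {"vector": 1.0, "database": 0.5}),
    "b": ([0.1, 0.9, 0.0], {"cooking": 1.0}),
    "c": ([0.7, 0.3, 0.1], {"database": 1.0}),
}
print(hybrid_search([1.0, 0.0, 0.0], {"database": 1.0}, docs))
```

With `alpha=1.0` the ranking falls back to dense similarity alone; lowering it lets exact keyword matches (here, "database") pull a document up the list.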

  • Apr 8, 2026

    txtai: All-in-One AI Framework for RAG & Agents

    Discover txtai, an all-in-one open-source AI framework that combines semantic search, LLM orchestration, autonomous agents, and RAG pipelines. Build production-ready AI apps with vector search, multimodal embeddings, and workflow automation. Get started in minutes with pip install txtai and explore 70+ Colab notebooks covering everything from semantic graphs to speech-to-speech RAG.

  • Apr 3, 2026

    SentrySearch: Semantic Video Search with AI

    Discover SentrySearch, the open-source tool that turns hours of video footage into searchable clips using Google's Gemini Embedding API or local Qwen3-VL models. Type a query such as 'red truck running a stop sign' and get trimmed video clips back. It suits Tesla dashcam analysis, security footage, or any MP4/MOV files, and features local GPU acceleration, Tesla metadata overlays, and automatic still-frame skipping to save cost and time.
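Skipping still frames saves embedding calls, because near-identical frames add no new information. A minimal sketch, assuming frames arrive as equal-length lists of grayscale pixel values and that a mean absolute difference below a hypothetical `threshold` counts as "still"; SentrySearch's actual detection method may differ.

```python
def mean_abs_diff(a: list[int], b: list[int]) -> float:
    """Average per-pixel absolute difference between two grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def select_changed_frames(frames: list[list[int]], threshold: float = 5.0) -> list[int]:
    """Return indices of frames that differ enough from the last kept frame."""
    if not frames:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        # Compare against the last *kept* frame so slow drift still triggers a keep.
        if mean_abs_diff(frames[kept[-1]], frames[i]) >= threshold:
            kept.append(i)
    return kept

# Four tiny 4-pixel "frames": frame 1 is nearly identical to frame 0, frame 3 to frame 2.
frames = [
    [10, 10, 10, 10],
    [11, 10, 10, 10],
    [90, 80, 70, 60],
    [90, 81, 70, 60],
]
print(select_changed_frames(frames))
```

Only the kept indices would then be sent for embedding, which is where the cost saving comes from.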

  • Mar 29, 2026

    TurboQuant+: 6.4x KV Cache Compression for LLMs

    TurboQuant+ implements the KV cache compression method from ICLR 2026, achieving 4.6-6.4x compression with quality and speed close to q8_0. It features turbo2/turbo3/turbo4 formats, attention-gated Sparse V decoding (+22.8% decode speed), and full llama.cpp Metal integration. On an M5 Max it runs Qwen 3.5 35B-A3B with 93.9% NIAH retrieval and 1.02x q8_0 prefill speed. It also ships a complete Python prototype with 511+ tests and has community validation across Apple Silicon, NVIDIA, and AMD.
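KV cache compression stores each cached key/value element in fewer bits. As rough arithmetic, if the baseline is fp16 (an assumption), 6.4x compression corresponds to 16 / 6.4 = 2.5 bits per element on average. The sketch below shows generic symmetric block-wise integer quantization with one fp16 scale per block. It is not the TurboQuant algorithm, which the summary does not describe; the block size, bit width, and sample values are illustrative assumptions.

```python
def quantize_block(values: list[float], bits: int) -> tuple[float, list[int]]:
    """Symmetric quantization: map floats to signed ints in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    return scale, [round(v / scale) for v in values]

def dequantize_block(scale: float, codes: list[int]) -> list[float]:
    """Reconstruct approximate floats from integer codes."""
    return [c * scale for c in codes]

def bits_per_element(bits: int, block_size: int, scale_bits: int = 16) -> float:
    """Storage cost per element, including one fp16 scale shared by the block."""
    return bits + scale_bits / block_size

kv_block = [0.12, -0.50, 0.33, 0.07, -0.21, 0.44, -0.05, 0.29]
scale, codes = quantize_block(kv_block, bits=4)
restored = dequantize_block(scale, codes)
max_err = max(abs(a - b) for a, b in zip(kv_block, restored))
ratio = 16 / bits_per_element(4, block_size=32)  # compression vs. fp16
print(codes, round(max_err, 3), round(ratio, 2))
```

A plain 4-bit scheme with a 32-element block costs 4.5 bits per element (about 3.56x vs. fp16), which shows why reaching the quoted ratios requires going well below 4 bits per element.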

  • Mar 3, 2026

    br/acc: Brazil's Open Graph for Civic Intelligence

    Discover br/acc, the open-source graph infrastructure that unifies Brazil's scattered public databases into a single queryable Neo4j graph. From company registries and procurement data to health records and environmental sanctions, this decentralized project makes government data actionable for civic improvement. It features 45+ ETL pipelines, a React frontend, a FastAPI backend, and a one-command Docker bootstrap. LGPD-compliant and privacy-first, it's ready for local development with make bootstrap-demo.
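Unifying separate public datasets into one graph comes down to keying records on a shared identifier, such as a company's CNPJ registration number, so that rows from different sources become edges between the same nodes. The sketch below merges two tiny in-memory tables into node and edge collections. The field names (`cnpj`, `supplier_cnpj`, `contract_id`) and sample rows are hypothetical, and the project's real pipelines load into Neo4j rather than Python dicts.

```python
from collections import Counter

# Hypothetical rows from two separate public sources, joined on the company ID.
companies = [
    {"cnpj": "11.111.111/0001-11", "name": "Alpha Ltda"},
    {"cnpj": "22.222.222/0001-22", "name": "Beta SA"},
]
contracts = [
    {"contract_id": "C-1", "supplier_cnpj": "11.111.111/0001-11", "agency": "Ministry X"},
    {"contract_id": "C-2", "supplier_cnpj": "11.111.111/0001-11", "agency": "Ministry Y"},
    {"contract_id": "C-3", "supplier_cnpj": "33.333.333/0001-33", "agency": "Ministry X"},
]

def build_graph(companies, contracts):
    """Return (nodes, edges); nodes are keyed by (label, id)."""
    nodes = {("Company", c["cnpj"]): {"name": c["name"]} for c in companies}
    edges = []
    for k in contracts:
        company = ("Company", k["supplier_cnpj"])
        agency = ("Agency", k["agency"])
        # Keep contracts whose supplier is missing from the registry as placeholder nodes.
        nodes.setdefault(company, {"name": None})
        nodes.setdefault(agency, {})
        edges.append((company, "CONTRACT", agency, k["contract_id"]))
    return nodes, edges

def contracts_per_company(edges):
    """Count contract edges per company node, a typical cross-source query."""
    return Counter(src for src, rel, _, _ in edges if rel == "CONTRACT")

nodes, edges = build_graph(companies, contracts)
print(len(nodes), len(edges), contracts_per_company(edges).most_common(1))
```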

  • Feb 20, 2026

    Dash: Self‑Learning Data Agent with 6 Layers of Context

    Discover Dash, an open‑source self‑learning data agent that grounds its answers in six layers of context. Learn how to set it up locally or on Railway, how the agent uses hybrid search to generate correct SQL, and how it continuously improves without retraining. The article walks through installation, data loading, knowledge organization, and real‑world query examples—offering a practical guide for developers building AI‑powered data tools.

  • Feb 12, 2026

    World Monitor: Open‑Source AI‑Powered Global Intelligence Dashboard

    World Monitor is a free, open‑source platform that unifies real‑time news, satellite imagery, military flight data, and market feeds into a single interactive map. Leveraging LLMs for summarization, hybrid threat classification, and anomaly detection, it delivers actionable situational awareness for governments, researchers, and journalists. The dashboard is built with TypeScript, Vite, and deck.gl, and can be self‑hosted or run on the web. Read on to discover how the system aggregates 100+ data sources, how it uses edge functions for caching and security, and how you can contribute or deploy your own instance.
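Anomaly detection on feeds like these often starts from a simple statistical baseline. A minimal sketch, assuming hourly event counts per region and a z-score threshold of 3 (both assumptions; the summary does not say which method World Monitor uses): flag the latest count if it sits more than `z_threshold` standard deviations above the recent mean.

```python
import statistics

def is_anomalous(history: list[int], latest: int, z_threshold: float = 3.0) -> bool:
    """Flag `latest` as a spike if its z-score against `history` exceeds the threshold."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest > mean  # flat history: any increase is unusual
    return (latest - mean) / stdev > z_threshold

hourly_counts = [12, 15, 11, 14, 13, 12, 16, 14]
print(is_anomalous(hourly_counts, 14), is_anomalous(hourly_counts, 40))
```

The check is one-sided on purpose, since sudden spikes in event volume are usually what a situational-awareness dashboard wants to surface.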

  • Feb 6, 2026

    Web Search MCP Server: Local LLM Web Search Without API Keys

    Looking to give your locally hosted LLMs a powerful, on‑premise web‑search capability? The Web Search MCP Server offers a TypeScript‑based, browser‑driven solution that pulls real‑time content from Bing, Brave, and DuckDuckGo. It provides three dedicated tools (full-web-search, get-web-search-summaries, and get-single-web-page-content), so you can choose between deep content extraction and quick snippets. This article walks you through installation, configuration, environment variables, and real‑world examples, plus troubleshooting tips and performance tricks, so you can integrate fast, reliable web search into any local LLM workflow.
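MCP clients invoke a server's tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for the `full-web-search` tool. The argument names `query` and `limit` are assumptions, so check the server's `tools/list` response for its actual input schema before relying on them.

```python
import json
from itertools import count

_ids = count(1)  # request ids must not be reused within a session

def tool_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as one line of JSON."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)  # json.dumps emits no embedded newlines by default

# Argument names below are assumptions; the server's tools/list schema is authoritative.
line = tool_call("full-web-search", {"query": "llama.cpp KV cache quantization", "limit": 3})
print(line)
```

On the stdio transport, the client writes each such line to the server's stdin and matches the response by its `id`.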

  • Jun 11, 2025

    Common Crawl: Free & Open Web Data for Everyone

    Discover Common Crawl, a non-profit organization offering a massive, free, and open repository of web crawl data. Since 2007, Common Crawl has accumulated over 250 billion pages, with 3-5 billion new pages added monthly, making it an invaluable resource for researchers, developers, and data scientists. Learn how this extensive dataset has been cited in over 10,000 research papers and continues to support advancements in AI, language models, and web analysis. Explore their latest web graphs and understand the impact of this foundational open-source project.
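Individual pages can be pulled from Common Crawl without downloading whole archive files: the CDX index returns, for each capture, a WARC filename plus a byte offset and length, which map directly to an HTTP Range request against data.commoncrawl.org. The sketch below only builds the URLs and header, so it runs offline; the crawl ID is an example and the sample record is made up to mirror the index's JSON-lines shape.

```python
import json
from urllib.parse import urlencode

INDEX_BASE = "https://index.commoncrawl.org"
DATA_BASE = "https://data.commoncrawl.org"

def index_query_url(crawl_id: str, url: str) -> str:
    """URL for querying one crawl's CDX index for captures of `url`."""
    return f"{INDEX_BASE}/{crawl_id}-index?" + urlencode({"url": url, "output": "json"})

def warc_range_request(record_line: str) -> tuple[str, dict[str, str]]:
    """Turn one JSON line of index output into a WARC URL and a Range header."""
    record = json.loads(record_line)
    start = int(record["offset"])
    end = start + int(record["length"]) - 1  # HTTP byte ranges are inclusive
    return f"{DATA_BASE}/{record['filename']}", {"Range": f"bytes={start}-{end}"}

# Example crawl ID and a made-up record (not real index output).
print(index_query_url("CC-MAIN-2024-10", "example.com"))
sample = '{"filename": "crawl-data/example.warc.gz", "offset": "1000", "length": "500"}'
print(warc_range_request(sample))
```

Fetching the WARC URL with that header returns a single gzip-compressed WARC record, which `gzip.decompress` can unpack on its own.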

  • Jun 10, 2025

    Master Advanced RAG Techniques: A GitHub Repository

    Dive into the world of Retrieval-Augmented Generation (RAG) with a comprehensive GitHub repository featuring advanced techniques. This resource provides practical implementations and tutorials covering foundational RAG, query enhancement, context enrichment, and advanced retrieval methods. Aimed at developers and researchers looking to improve their RAG systems, it includes runnable scripts, detailed explanations, and integration examples with popular frameworks like LangChain and LlamaIndex. Explore approaches such as Graph RAG, Self-RAG, and Corrective RAG, along with evaluation methods for measuring and improving your RAG pipelines. Join the community and contribute to this evolving knowledge hub for RAG innovation.
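One context-enrichment idea covered in this space is to retrieve the best-matching chunk and then return its neighbors as well, so the LLM sees the surrounding text. A minimal sketch, using plain word overlap as a stand-in for embedding similarity (an assumption made so the example runs without a model); the chunks and window size are illustrative, not the repository's code.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve_with_window(chunks: list[str], query: str, window: int = 1) -> list[str]:
    """Return the best-matching chunk plus `window` neighbors on each side."""
    q = tokenize(query)
    best = max(range(len(chunks)), key=lambda i: len(q & tokenize(chunks[i])))
    lo, hi = max(0, best - window), min(len(chunks), best + window + 1)
    return chunks[lo:hi]

chunks = [
    "RAG combines retrieval with generation.",
    "Documents are split into chunks before indexing.",
    "Each chunk is embedded and stored in a vector index.",
    "At query time the closest chunks are retrieved.",
    "Evaluation checks faithfulness and relevance.",
]
print(retrieve_with_window(chunks, "how is each chunk embedded and stored"))
```

The window trades a few extra tokens for continuity: a hit in the middle of an explanation arrives with the sentences that set it up and follow it.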

  • Jun 9, 2025

    RAGbits: Rapid Development for GenAI Applications

    Discover RAGbits, an open-source framework designed to accelerate the development of reliable and scalable Generative AI applications. This innovative toolkit provides modular components for building sophisticated RAG (Retrieval-Augmented Generation) pipelines, managing LLMs, and integrating various data sources. Learn how RAGbits simplifies complex tasks like data ingestion, vector store management, and chatbot deployment, enabling developers to create robust AI solutions efficiently. Explore its features, including type-safe LLM calls, extensive format support, and built-in testing tools, to streamline your GenAI projects.
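"Type-safe LLM calls" generally means the model's raw text output is parsed into a declared schema and rejected if it does not match. The sketch below shows that general pattern with a standard-library dataclass and no LLM at all; it does not use or reproduce RAGbits' API, and the `Answer` schema is hypothetical.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Answer:
    """Hypothetical schema the LLM is asked to return as JSON."""
    answer: str
    confidence: float
    sources: list

def parse_llm_output(raw: str) -> Answer:
    """Parse raw LLM text into an Answer, raising ValueError on any mismatch."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    expected = {f.name: f.type for f in fields(Answer)}
    if set(data) != set(expected):
        raise ValueError(f"unexpected keys: {sorted(set(data) ^ set(expected))}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise ValueError(f"{name} should be {typ.__name__}")
    return Answer(**data)

good = '{"answer": "Paris", "confidence": 0.9, "sources": ["doc1"]}'
print(parse_llm_output(good))
try:
    parse_llm_output('{"answer": "Paris", "confidence": "high", "sources": []}')
except ValueError as err:
    print("rejected:", err)
```

Rejecting malformed output at this boundary means downstream code can rely on typed fields instead of re-checking free-form text.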

  • Jun 3, 2025

    MinerU: Transform Unstructured Documents into Accessible Knowledge with Cloud-Based Mining

    MinerU is a cloud-based knowledge mining platform that helps you extract insights from documents. Upload files, ask questions, and receive factual answers with citations. It suits researchers, professionals, and educators who need efficient information retrieval.

Curated AI tools, open source projects, tutorials, and resources for developers building with artificial intelligence.

© 2026 AIBit-Discover Open Source Projects