rag‑chunk: CLI Tool to Benchmark and Optimize RAG Chunking

Retrieval‑Augmented Generation (RAG) is becoming a cornerstone of modern NLP pipelines, but the quality of a RAG system heavily depends on how well the source text is split into manageable chunks. Too many tiny fragments and you explode your index; too large and you lose contextual fidelity.

rag‑chunk solves this pain point with a simple command‑line interface that lets you test, benchmark, and compare multiple chunking strategies side‑by‑side. It is written in Python, released under the MIT license, and is available on PyPI so you can drop it into any container or CI workflow with minimal friction.


Core Features

  • Multiple strategies – fixed-size (word- or token-based), sliding-window (context-preserving), paragraph (semantic boundaries), and recursive-character (LangChain integration).
  • Token-accurate splitting – optional tiktoken support for GPT-3.5 and GPT-4 token limits; choose the model with --tiktoken-model.
  • Recall evaluation – supply a JSON test file (examples/questions.json) to measure how many relevant phrases appear in the top-k retrieved chunks.
  • Rich CLI output – clear, readable tables powered by Rich.
  • Export – save results in table, JSON, or CSV format; chunks can be dumped into a .chunks/ folder for inspection.
  • Extensible – add custom chunking logic in src/chunker.py and register it in the STRATEGIES dictionary.
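To get a feel for the recall evaluation, here is one way such a test file could be built. The schema shown (a list of question/expected-phrase pairs) is an assumption for illustration; check examples/questions.json in the repo for the actual format:

```python
import json

# Hypothetical schema: each entry pairs a question with the phrases a
# relevant chunk should contain. This is an assumed layout, NOT the
# documented format -- see examples/questions.json in the repo.
questions = [
    {"question": "What license is rag-chunk released under?",
     "expected_phrases": ["MIT license"]},
    {"question": "Where are generated chunks saved?",
     "expected_phrases": [".chunks/"]},
]

with open("questions.json", "w") as f:
    json.dump(questions, f, indent=2)
```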

Quick Start

Installation

# From PyPI
pip install rag-chunk          # basic
pip install "rag-chunk[tiktoken]"  # with optional tiktoken support (quotes keep zsh happy)

Tip – if you are working inside a virtual environment, install the tiktoken extra only when you need token-exact splitting.

Simple Chunk Generation

rag-chunk analyze examples/ --strategy paragraph
You’ll get a table showing the number of chunks, the average recall (0 when no test file is supplied), and the directory where the fragments live.

Benchmark All Strategies

rag-chunk analyze examples/ \
  --strategy all \
  --chunk-size 100 \
  --overlap 20 \
  --output table
The CLI will run four strategies (fixed‑size, sliding‑window, paragraph, and recursive‑character) and report a concise comparison.

Validate with a Test File

rag-chunk analyze examples/ \
  --strategy all \
  --chunk-size 150 \
  --overlap 30 \
  --test-file examples/questions.json \
  --top-k 3 \
  --output json > results.json
The resulting JSON will contain overall recall per strategy and detailed per‑question metrics.
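Conceptually, the per-question recall behind these numbers can be sketched as follows. This is a simplified stand-in (the first top-k chunks play the role of the retrieved set), not rag-chunk's actual retrieval code:

```python
from typing import List

def phrase_recall(chunks: List[str], phrases: List[str], top_k: int) -> float:
    """Fraction of expected phrases found in the top_k retrieved chunks.

    Simplified sketch: a real ranker would score chunks against the
    question (embeddings, BM25, ...); here the first top_k chunks
    simply stand in for the retrieved set.
    """
    if not phrases:
        return 0.0
    retrieved = " ".join(chunks[:top_k]).lower()
    hits = sum(1 for p in phrases if p.lower() in retrieved)
    return hits / len(phrases)

chunks = ["rag-chunk is MIT licensed.", "Chunks are saved to .chunks/."]
print(phrase_recall(chunks, ["MIT licensed", ".chunks/"], top_k=2))  # → 1.0
```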


Choosing the Right Strategy

  • Fixed-size – uniform latency, baseline comparison; 150–250 words (or tokens with --use-tiktoken).
  • Sliding-window – long paragraphs where context bleed matters; 120–200 words with 20–30% overlap.
  • Paragraph – Markdown or prose with clear sections; variable size, following natural paragraph boundaries.
  • Recursive-character – semantically rich texts and LangChain integration; LangChain defaults, overridable with --chunk-size.

If the avg_recall for a strategy falls below 0.70, consider tweaking the chunk size, switching strategies, or increasing the overlap.
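To see how chunk size and overlap interact, here is a minimal word-based sliding-window splitter. It is a sketch of the general technique, not rag-chunk's own implementation:

```python
from typing import List

def sliding_window(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into word chunks of `chunk_size` words, each sharing
    `overlap` words with its predecessor."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

text = " ".join(f"w{i}" for i in range(10))
for chunk in sliding_window(text, chunk_size=4, overlap=1):
    print(chunk)
```

With 10 words, a window of 4, and an overlap of 1, the window advances 3 words at a time, so consecutive chunks share exactly one word of context.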


Extending rag‑chunk

If you have a proprietary splitting algorithm, you can plug it in:

# src/chunker.py
from typing import Dict, List

def my_custom_chunks(text: str, chunk_size: int, overlap: int) -> List[Dict]:
    chunks: List[Dict] = []
    # Your logic here – e.g., split by specific markdown headings
    return chunks

# Register it alongside the built-in strategies
STRATEGIES = {
    # ... existing strategies ...
    "custom": my_custom_chunks,
}
Run it via the CLI:
rag-chunk analyze docs/ --strategy custom --chunk-size 180
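As a concrete filling-in of that skeleton, the sketch below splits on level-2 markdown headings. The chunk dict shape ("id"/"text") and the unused chunk_size/overlap parameters are assumptions to match the strategy signature above, not rag-chunk's documented interface:

```python
import re
from typing import Dict, List

def heading_chunks(text: str, chunk_size: int, overlap: int) -> List[Dict]:
    """Illustrative custom strategy: one chunk per '## ' markdown section.

    chunk_size and overlap are accepted only to match the assumed
    strategy signature; this splitter ignores them.
    """
    # Zero-width split just before each line starting with '## '
    sections = re.split(r"(?m)^(?=## )", text)
    return [{"id": i, "text": s.strip()}
            for i, s in enumerate(sections) if s.strip()]

doc = "intro\n## Setup\npip install\n## Usage\nrun it"
for chunk in heading_chunks(doc, 0, 0):
    print(chunk["id"], chunk["text"].splitlines()[0])
```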


Real‑World Use Cases

  1. RAG Model Prototyping – Quickly measure how well your embeddings capture meaningful content.
  2. Production Index Tuning – Reduce the number of chunks to cut down storage while maintaining recall.
  3. Model‑Specific Token Boundary – For GPT‑4 with a 32k token context, generate exactly 512‑token chunks that fit.
  4. Automated CI Checks – Add rag‑chunk as a step in your CI pipeline to flag regressions in chunk quality.
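For the CI use case, a small gate script can fail the build when recall drops. This sketch assumes a results.json mapping each strategy name to an object with an avg_recall field; the tool's actual output schema may differ:

```python
import json
import sys

THRESHOLD = 0.70  # the tuning guideline suggested above

def failing_strategies(results: dict, threshold: float = THRESHOLD) -> list:
    """Return the strategies whose avg_recall falls below threshold.

    Assumes results maps strategy name -> {"avg_recall": float};
    adapt the lookup to rag-chunk's real JSON layout as needed.
    """
    return [name for name, r in results.items()
            if r.get("avg_recall", 0.0) < threshold]

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        bad = failing_strategies(json.load(f))
    if bad:
        print("Recall regression in:", ", ".join(bad))
        sys.exit(1)  # non-zero exit fails the CI step
```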

Getting Help & Contributing

  • Source Code – https://github.com/messkan/rag‑chunk
  • Documentation – Read the full README at the repo or use rag‑chunk --help.
  • Issues/PRs – The repo is open for pull requests; feel free to propose new strategies or improve docs.
  • Community – Reach out on the issues page if you hit a bug or have a feature request.

TL;DR

  • rag‑chunk is an MIT-licensed Python CLI that lets you benchmark RAG chunking strategies.
  • Install it with pip install "rag-chunk[tiktoken]".
  • Run a quick benchmark with rag-chunk analyze <folder> --strategy all --chunk-size 150.
  • Export results as tables, JSON, or CSV; enable token-accurate splitting with --use-tiktoken.

Take the guesswork out of chunk selection, gain actionable metrics, and accelerate your RAG pipeline development today!
