chunking-algorithm

Here are 25 public repositories matching this topic...

chonkie-inc / chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines

ai splitting-algorithms similarity-search chunker rag retrieval-systems chunking-algorithm text-splitter llms chonkie semantic-chunker

Updated Mar 18, 2026
Python

chonkie-inc / chonkiejs

🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library

typescript ai splitting-algorithms chunker rag retrieval-systems chunking-algorithm text-splitter llms chonkie semantic-chunker

Updated Mar 17, 2026
TypeScript

nlfiedler / fastcdc-rs

FastCDC implementation in Rust

rust deduplication chunking-algorithm

Updated Feb 21, 2026
Rust

iscc / fastcdc-py

FastCDC implementation in Python https://pypi.org/project/fastcdc/

python chunking deduplication content-dependent chunking-algorithm

Updated Jun 27, 2024
Python

GiovanniPasq / chunky

Your RAG pipeline is broken and you don't know it. Chunky lets you validate your Markdown and choose the best chunking strategy before indexing.

Updated Mar 21, 2026
TypeScript

SmartChunk is a lightweight, structure-aware semantic chunking toolkit designed to supercharge RAG (Retrieval-Augmented Generation) and LLM pipelines. Unlike naive splitters that break text arbitrarily, SmartChunk respects document structure (headings, lists, tables, code blocks) and semantic flow, ensuring cleaner, more coherent chunks.

nlp cli package semantic pip chunking rag chunking-algorithm llm agentic-workflow

Updated Feb 6, 2026
Python

mg98 / ae-chunker-go

Go implementation of the AE chunking algorithm.

go golang chunking chunking-algorithm

Updated Jan 4, 2023
Go

gidea / chunkpad

Chunkpad is designed to prepare documents for Retrieval-Augmented Generation (RAG) pipelines and AI applications.

text-editor chunking-algorithm vector-database rag-pipeline

Updated Nov 30, 2025
TypeScript

FastPix / android-uploads-sdk

Android Resumable Uploads SDK from FastPix

android kotlin java retrofit2 resumable-upload chunking-algorithm

Updated Nov 27, 2025
Kotlin

arcadiasofts / clast-rs

A Rust library for Content-Defined Chunking (CDC).

rust-library chunking-algorithm content-defined-chunking

Updated Jan 6, 2026
Rust

mahnoorsheikh16 / NLP-Framework-for-Literature-Summarization-in-Law-and-Policy

Implementation of an interactive chatbot for summarizing legal and policy documents. Includes data preprocessing (cleaning, tokenization, hierarchical chunking), extractive TF-IDF baselines, and fine-tuned abstractive models (DistilBART, LED). Integrates a retrieval layer for document relevance and uses ROUGE, BLEU, and cosine similarity metrics.

led text-summarization cosine-similarity rouge-metric nlp-keywords-extraction policy-analysis tokenization bleu-score encoder-decoder-model retrieval-chatbot rag chunking-algorithm longformer-models distilbart rag-chatbot qa-reterival

Updated Jan 16, 2026
Jupyter Notebook

isaka-james / chunks-to-file

A Node.js chunking system

nodejs chunk chunking chunked-uploads chunks chunking-algorithm chunking-files nodejs-chunking node-chunking

Updated Sep 26, 2024
JavaScript

Fallen-Breath / pyfastcdc

A high-performance FastCDC 2020 implementation written in Python + Cython

python deduplication chunking-algorithm fastcdc

Updated Feb 19, 2026
Python

i5heu / ChunkingChampions

Explore and benchmark the world of data chunking algorithms in 'ChunkingChampions' - a competitive arena to determine the most efficient and effective chunking strategies for varied data sizes.

benchmark ranking chunking chunking-algorithm