#nlp

  1. fancy-regex

    regexes, supporting a relatively rich set of features, including backreferences and look-around

    v0.17.0 11.3M #regex-engine #backreferences #look-around #expression #analysis #nlp #regex-matching #text-pattern-matching #oniguruma #backtracking
  2. stop-words

    Common stop words in many languages

    v0.10.0 549K #nlp #localization #language
  3. whatlang

    Fast and lightweight language identification library for Rust

    v0.18.0 134K #nlp #language-text #lang
  4. apalis-cron

    extensible library for scheduling recurring tasks in Rust

    v1.0.0-rc.7 23K #cron-job #job-scheduling #apalis #cron-job-scheduling #cron-scheduling #nlp #cron-expression
  5. textsurf

    Web service for efficiently serving multiple plain text documents or excerpts thereof (by Unicode character offset), without loading everything into memory

    v0.7.0 #nlp #annotations #standoff #text
  6. markdown_timesheet

    processing markdown files to extract and format timesheet data

    v0.8.0 310 #time-tracking #markdown #timesheet #nlp #entries #status-bar #case-insensitive #weekday #today #summary
  7. google-language1

    A complete library to interact with Cloud Natural Language (protocol v1)

    v7.0.0+20251219 #nlp #google-cloud #google-api #google-cloud-api
  8. qdrant-rust-stemmers

    some popular snowball stemming algorithms

    v1.2.2 7.7K #information-retrieval #stemming #nlp
  9. google-language1_beta1

    A complete library to interact with Cloud Natural Language (protocol v1beta1)

    v7.0.0+20220218 #nlp #google-cloud #google-api #google-cloud-api
  10. deformat

    Extract plain text from HTML, PDF, and other document formats

    v0.6.0 1.2K #html #pdf #nlp #text-html #text-extraction
  11. ripopt

    A memory-safe interior point optimizer in Rust

    v0.6.1 #linear-solver #hessian #nlp #barrier #line-search #lbfgs #ipopt #factorization #memory-safe #restoration
  12. kalosm-sample

    A common interface for token sampling and helpers for structured LLM sampling

    v0.4.1 1.6K #artificial-intelligence #llama #nlp #mistral #llm
  13. ck-embed

    Text embedding providers for ck semantic search

    v0.7.2 180 #semantic-search #nlp #embedding
  14. english-to-cron

    converts natural language into cron expressions

    v0.1.7 21K #cron-expression #cron-job #convert #nlp #range #expression-language #pm #24-hour
  15. computer-says-no

    Local embedding service for text classification using ONNX models

    v0.2.3 #onnx #classification #nlp
  16. axonml

    A complete ML/AI framework in pure Rust - PyTorch-equivalent functionality

    v0.6.0 #onnx #artificial-intelligence #nlp #deep-learning #dataset #neural-network #distributed #pytorch #llm #audio
  17. astorion

    A Duckling-inspired, rule-based entity parsing engine in Rust, designed for extensible time and numeral parsing using a saturation-style pipeline

    v0.4.2 #rule-based #ast #saturation #nlp #parser
  18. gline-rs

    Inference engine for GLiNER models

    v1.0.1 3.2K #nlp #ner #g-li-ner
  19. popsam-cli

    CLI for AI-assisted selection of semantically representative texts

    v0.1.0 #embedding #nlp #feedback #sampling
  20. mmd-mpl

    MPL is a rule-based Domain-Specific Language for creating MMD poses and animations using natural semantic syntax

    v0.3.4 850 #mpl #domain-specific-language #mmd #animation #semantic #nlp #artificial-intelligence #motion-animation #pose #poses
  21. two_timer

    parser for English time expressions

    v2.2.5 380 #nlp #parse-time #time
  22. you

    Translate your natural language into executable command(s)

    v0.1.90 1.4K #artificial-intelligence #translation #unix #shell #nlp #interactive-cli #openai #llm #explanation #command-line-interface
  23. bareun_rs

    an unofficial Rust library for Bareun, a Korean morphological analyzer

    v1.7.2 #korean #nlp #analyzer #api-bindings
  24. crfsuite-compliant-rs

    Pure Rust implementation of CRFsuite (Conditional Random Fields) for labeling sequential data

    v0.2.1 #crf #machine-learning #nlp #sequence-labeling
  25. sastrawi-rs

    High-performance Indonesian stemmer (Nazief-Adriani + ECS). Zero-regex, FST-powered, Rust 2024.

    v0.5.1 #stemming #nlp #indonesian #bahasa
  26. wicket

    Wikipedia corpus knowledge extractor

    v0.1.1 #nlp #wikipedia #processing
  27. wikiext

    extracting and processing Wikipedia data, implemented in Rust

    v0.1.1 #nlp #processing #extract #wikipedia
  28. riptoken

    Fast BPE tokenizer for LLMs — a faster, drop-in compatible reimplementation of tiktoken

    v0.2.3 #tiktoken #bpe #llm #nlp #tokenizer
  29. kiwi-rs

    Ergonomic Rust bindings for the Kiwi Korean morphological analyzer C API

    v0.1.4 130 #tokenize #nlp #korean #morphology #api-bindings #tokenizer
  30. rustling

    A blazingly fast library for computational linguistics

    v0.8.0 1.4K #nlp #linguistics #text-processing
  31. nattydate

    Lightweight, deterministic natural language date/time preprocessor — no ML, no clock fragility

    v0.2.0 #date-parser #date-time #testing #pre-processor #natural-language #ml #string-format #date-time-parser #canonical #classify
  32. tokie

    Blazingly fast tokenizer - 50x faster tokenization, 10x smaller model files, 100% accurate drop-in replacement for HuggingFace

    v0.0.8 #bpe #tokenize #word-piece #nlp #transformer #tokenizer
  33. wordvec

    A compact SmallVec<T>-like container with only align_of::<T>() overhead for small stack-only instances

    v0.2.0 140 #small-vec #instance #alignment #usize #component #stack-only #memory-layout #memory-footprint #ecs #nlp
  34. bm25_turbo

    The fastest BM25 information retrieval engine — 28K QPS on 8.8M docs

    v0.1.0 #bm25 #nlp #search #ranking
  35. writing-analysis

    Lightweight writing analysis and NLP tools for Rust

    v0.1.1 #sentiment-analysis #nlp #readability #analysis #writing
  36. haqumei-cli

    Command-line interface for the Haqumei G2P (Grapheme-to-Phoneme) engine

    v0.1.0 #nlp #g2p #phoneme #japanese
  37. clockwords

    Find and resolve natural-language time expressions across multiple languages

    v0.4.0 #nlp #multilingual #chrono #parser
  38. agentzero-plugin-sdk

    Plugin SDK for building AgentZero WASM plugins

    v0.10.0 #artificial-intelligence #agentzero #wasm-plugin #encryption #agent-zero #ai-agents #sdk-for-building #plugin-sdk #encrypted-storage #nlp
  39. haqumei

    Japanese Grapheme-to-Phoneme (G2P) library implemented in Rust

    v0.3.0 #phoneme #japanese #g2p #nlp
  40. phonetik

    Phonetic analysis engine for English. Rhyme detection, stress scanning, meter analysis, and syllable counting with a 126K-word embedded dictionary.

    v0.3.2 #nlp #phonetic #rhyme #linguistics #prosody
  41. cronify

    convert natural language time expressions into cron syntax

    v1.0.0 #cron #time #nlp #time-parser #scheduling
  42. normy

    Ultra-fast, zero-copy text normalization for Rust NLP pipelines & tokenizers

    v0.1.4 #nlp #zero-copy #llm #normalization #tokenizer
  43. corpa

    The ripgrep of text analysis. Blazing-fast CLI for corpus-level NLP statistics.

    v0.4.11 #nlp #corpus #linguistics #text-analysis
  44. model2vec-rs

    Official Rust Implementation of Model2Vec

    v0.1.4 3.1K #nlp #embedding #rust
  45. pretokie

    Fast, zero-allocation pretokenizers for BPE tokenizers

    v0.0.4 #bpe #nlp #pretokenizer #transformer #tokenizer
  46. mongodb-voyageai

    A client for generating embeddings and reranking with Voyage AI

    v0.1.3 #rerank #embedding #voyageai #nlp
  47. wikipedia-article-transform

    Transform Wikipedia articles in HTML to plaintext and Markdown formats

    v0.4.0 #tree-sitter #html #nlp #text-extraction
  48. opencc-jieba-rs

    High-performance Chinese text conversion and segmentation using Jieba and OpenCC-style dictionaries

    v0.7.4 7.4K #opencc #nlp #chinese #jieba
  49. todoist-api-rs

    Todoist API client library

    v0.2.0 #task #due-date #todoist #api-token #list #sync-api #today #api-client #td #nlp
  50. langextract-rust

    extracting structured and grounded information from text using LLMs

    v0.5.0 270 #llm #nlp #structured-data #extract
  51. anno

    Named entity recognition, coreference resolution, and zero-shot entity types

    v0.5.0 #nlp #extract #ner #zero-shot #coreference
  52. kiru

    Fast text chunking for Rust

    v0.1.11 230 #nlp #chunking #rag #text
  53. yaak

    Translate natural language to bash commands using an OpenAI-compatible LLM

    v0.1.3 #artificial-intelligence #open-ai-compatible #nlp #bash #translation #llm #ai-api #command-history #config-file #explain
  54. chunk

    The fastest semantic text chunking library — up to 1TB/s chunking throughput

    v0.10.1 600 #chunking #simd #nlp
  55. embellama

    High-performance Rust library for generating text embeddings using llama-cpp

    v0.10.1 #llama #nlp #machine-learning #api-bindings
  56. idoit

    AI-powered command line simplifier — do it!

    v0.2.1 #nlp #shell #llm
  57. jon

    Natural language interface for Joy and Jot - CLI, TUI, and desktop app

    v0.1.3 #nlp #language-interface #tui #desktop #jot #joy
  58. hypembed

    Pure-Rust BERT-compatible text embedding inference for local-first applications

    v0.2.1 #inference #nlp #bert #transformer
  59. unimorph

    Command-line interface for UniMorph morphological data

    v0.2.1 #nlp #linguistics #morphology
  60. aprender-rag

    Pure-Rust Retrieval-Augmented Generation pipeline built on Trueno

    v0.29.0 #vector-search #rag #nlp #embedding #vector-embedding
  61. duckling

    port of Facebook's Duckling library for parsing natural language into structured data

    v0.3.0 #nlp #parser #time-parser #time
  62. gibberish-or-not

    Figure out if text is gibberish or not

    v5.0.7 140 #classification #gibberish #ngrams #medium #password #sensitivity-level #transition #nlp #text-classification
  63. embedd

    Embedding interfaces + local backends (Candle/HF)

    v0.1.0 #candle #inference #embedding #nlp
  64. opencc-fmmseg

    High-performance Chinese conversion library (Simplified ↔ Traditional) using OpenCC lexicons and FMM segmentation — no runtime I/O, cross-platform, and production-ready

    v0.9.1 #nlp #opencc #unicode #localization #chinese
  65. kawat

    Web content extraction library inspired by trafilatura. Extracts main text, metadata, and comments from HTML.

    v0.1.3 #web-scraping #text-extraction #nlp #html #boilerplate-removal
  66. wicket-cli

    Wikipedia corpus knowledge extractor

    v0.1.1 #nlp #extract #wikipedia #data #processing
  67. sai-cli

    ('sai') — Tell the shell what you want, not how to do it. Natural-language to safe shell command generator.

    v1.0.0 #artificial-intelligence #prompt #generator #explain #safe #llm #command-history #logging #system-prompt #nlp
  68. textalyzer

    Analyze key metrics like number of words, readability, and complexity of any kind of text

    v0.5.0 280 #nlp #metrics #analysis
  69. unimorph-cli

    Command-line interface for UniMorph morphological data

    v0.1.3 #nlp #linguistics #morphology
  70. intentdb

    Schema-free, natural language storage engine

    v0.1.0 #vector-search #vector-database #nlp #database #embedding #vector-embedding
  71. markovify-rs

    A fast, extensible Rust implementation of a Markov chain text generator, inspired by markovify

    v0.1.2 #markov-chain #generator #nlp #text-generator
  72. rosetta-aisp

    Bidirectional prose ↔ AISP symbolic notation conversion based on the Rosetta Stone mappings

    v0.2.0 #symbolic #aisp #nlp #convert #conversion
  73. udpipe-rs

    Rust bindings for UDPipe - a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files

    v0.2.0 #nlp #dependency-parsing #pos-tagging #parser
  74. lingua-latvian-language-model

    The Latvian language model for Lingua, an accurate natural language detection library

    v1.3.0 274K #language-recognition #language-detection #nlp
  75. ynab-mcp

    Model Context Protocol server for YNAB (You Need A Budget)

    v0.1.6 #model-context-protocol #ynab #mcp-server #budget #api-token #nlp #secure-token #keyring #account-balance #artificial-intelligence
  76. whichtime-cli

    Command-line interface for parsing natural language dates

    v0.1.0 #date-parser #cli-parser #nlp #date-time #date-time-parser
  77. pii-masker

    Rust port of the HydroXai PII masker with a library API and CLI

    v0.1.0 #pii #nlp #privacy #redaction #deberta
  78. lingua-japanese-language-model

    The Japanese language model for Lingua, an accurate natural language detection library

    v1.3.0 281K #language-recognition #language-detection #nlp
  79. attuned-infer

    Fast, transparent inference of human state axes from natural language

    v1.0.1 #nlp #state #llm #llm-inference #context
  80. nanofts

    High-performance full-text search engine in Rust

    v0.7.0 230 #search-engine #full-text-search #full-text-search-engine #lsm-tree #nlp #roaring-bitmaps
  81. langmail

    Email preprocessing for LLMs

    v0.6.0 #pre-processor #llm #mime #nlp
  82. lingua-swedish-language-model

    The Swedish language model for Lingua, an accurate natural language detection library

    v1.3.0 281K #language-recognition #language-detection #nlp
  83. isu

    Information State Update theory, applicable in Issue-Based Dialogue Management and Conversational Agent Architecture

    v0.1.6 #ai-agent #nlp #dialogue
  84. lingua-czech-language-model

    The Czech language model for Lingua, an accurate natural language detection library

    v1.3.0 281K #language-recognition #language-detection #nlp
  85. instant-segment

    Fast English word segmentation

    v0.11.1 #word-segmentation #english-words #search #segment #instant #nlp #word-search
  86. lingua-hindi-language-model

    The Hindi language model for Lingua, an accurate natural language detection library

    v1.3.0 280K #language-recognition #language-detection #nlp
  87. ticktickrs

    A CLI Tool for TickTick tasks

    v0.1.4 #command-line-tool #task-list #tick-tick #tickrs #task-id #date-parser #oauth #json-output #nlp #authentication
  88. lingua-irish-language-model

    The Irish language model for Lingua, an accurate natural language detection library

    v1.3.0 276K #language-recognition #language-detection #nlp
  89. cjclassifier

    Classify ideograph text as Chinese Simplified, Chinese Traditional, or Japanese using a statistical model

    v0.1.0 #japanese #language-detection #cjk #nlp #chinese
  90. lingua-bulgarian-language-model

    The Bulgarian language model for Lingua, an accurate natural language detection library

    v1.3.0 279K #language-model #language-recognition #language-detection #nlp
  91. langidentify-models-lite

    Lite embedded model data for the langidentify language detection library

    v0.1.0 #data-model #language-detection #model-data #ngrams #nlp
  92. lingua-hungarian-language-model

    The Hungarian language model for Lingua, an accurate natural language detection library

    v1.3.0 276K #language-recognition #language-detection #nlp
  93. lingua-serbian-language-model

    The Serbian language model for Lingua, an accurate natural language detection library

    v1.3.0 279K #language-recognition #language-detection #nlp
  94. lingua-bosnian-language-model

    The Bosnian language model for Lingua, an accurate natural language detection library

    v1.3.0 274K #language-recognition #language-detection #nlp
  95. memchunk

    The fastest semantic text chunking library — up to 1TB/s chunking throughput

    v0.4.0 #chunking #simd #nlp
  96. lingua-tagalog-language-model

    The Tagalog language model for Lingua, an accurate natural language detection library

    v1.3.0 291K #language-recognition #language-detection #nlp
  97. wg-ragsmith

    Semantic chunking and RAG utilities for document processing and retrieval-augmented generation

    v0.1.5 #chunking #rag #vector-search #nlp #semantic-chunking
  98. lingua-afrikaans-language-model

    The Afrikaans language model for Lingua, an accurate natural language detection library

    v1.3.0 276K #language-recognition #nlp #language-detection
  99. pgf2json

    Application Programming Interface to load and interpret grammars compiled in Portable Grammar Format (PGF). The PGF format is produced as a final output from the GF compiler. The library…

    v0.2.5 230 #grammar #pgf #gf #nlp #grammars
  100. lingua-thai-language-model

    The Thai language model for Lingua, an accurate natural language detection library

    v1.3.0 279K #language-recognition #language-detection #nlp
  101. rus-torch

    A comprehensive deep learning framework in Rust, merging core, nn, vision, text, and wasm

    v0.1.0 #deep-learning #computer-vision #nlp #neural-network #machine-learning
  102. lingua-tamil-language-model

    The Tamil language model for Lingua, an accurate natural language detection library

    v1.3.0 279K #language-recognition #language-detection #nlp
  103. lingua-yoruba-language-model

    The Yoruba language model for Lingua, an accurate natural language detection library

    v1.3.0 298K #language-recognition #language-detection #nlp
  104. lingua-maori-language-model

    The Māori language model for Lingua, an accurate natural language detection library

    v1.3.0 298K #language-recognition #language-detection #nlp
  105. wetext-rs

    Text normalization library for TTS, Rust implementation of WeText

    v0.1.2 #text-normalization #text-to-speech #nlp #fst
  106. lingua-ganda-language-model

    The Ganda language model for Lingua, an accurate natural language detection library

    v1.3.0 298K #language-recognition #language-detection #nlp
  107. lingua-mongolian-language-model

    The Mongolian language model for Lingua, an accurate natural language detection library

    v1.3.0 279K #language-recognition #language-detection #nlp
  108. lingua-albanian-language-model

    The Albanian language model for Lingua, an accurate natural language detection library

    v1.3.0 279K #language-recognition #language-detection #nlp
  109. bm-25

    BM25 embedder, scorer, and search engine

    v3.0.0 260 #nlp #embed #search #sparse
  110. lingua-danish-language-model

    The Danish language model for Lingua, an accurate natural language detection library

    v1.3.0 276K #language-recognition #language-detection #nlp
  111. lingua-romanian-language-model

    The Romanian language model for Lingua, an accurate natural language detection library

    v1.3.0 274K #language-recognition #language-detection #nlp
  112. lingua-persian-language-model

    The Persian language model for Lingua, an accurate natural language detection library

    v1.3.0 274K #language-recognition #language-detection #nlp
  113. lingua-catalan-language-model

    The Catalan language model for Lingua, an accurate natural language detection library

    v1.3.0 275K #language-recognition #language-detection #nlp
  114. lingua-welsh-language-model

    The Welsh language model for Lingua, an accurate natural language detection library

    v1.3.0 276K #language-recognition #language-detection #nlp
  115. lingua-german-language-model

    The German language model for Lingua, an accurate natural language detection library

    v1.3.0 377K #language-recognition #language-detection #nlp
  116. lingua-portuguese-language-model

    The Portuguese language model for Lingua, an accurate natural language detection library

    v1.3.0 361K #language-recognition #language-detection #nlp
  117. lingua-icelandic-language-model

    The Icelandic language model for Lingua, an accurate natural language detection library

    v1.3.0 332K #language-recognition #language-detection #nlp
  118. lingua-french-language-model

    The French language model for Lingua, an accurate natural language detection library

    v1.3.0 313K #language-recognition #language-detection #nlp
  119. mecrab

    A high-performance, thread-safe morphological analyzer compatible with MeCab, written in pure Rust

    v0.1.0 #japanese #morphology #nlp #mecab
  120. lingua-tswana-language-model

    The Tswana language model for Lingua, an accurate natural language detection library

    v1.3.0 351K #language-recognition #language-detection #nlp
  121. lingua-marathi-language-model

    The Marathi language model for Lingua, an accurate natural language detection library

    v1.3.0 340K #language-recognition #language-detection #nlp
  122. lingua-sotho-language-model

    The Sotho language model for Lingua, an accurate natural language detection library

    v1.3.0 329K #language-recognition #language-detection #nlp
  123. newsfresh

    CLI and library for querying, filtering, and analyzing GDELT Global Knowledge Graph (GKG) v2.1 data — the world's largest open news event dataset

    v0.1.0-beta.2 #nlp #gkg #gdelt #geopolitics #news
  124. ai-translator

    AI-based multilingual text translation tool with support for custom prompts

    v0.1.1 #translator #nlp #translation
  125. nodedb-document

    Shared document engine (text analysis, BM25, inverted index) for NodeDB Origin and Lite

    v0.0.0-beta.1 #database #engine #bm25 #nlp #origin #inverted-index #hybrid #distributed #time-series #multimodal
  126. umsc

    Uyghur multi-script converter for Arabic, Latin, Yengi, Cyrillic, XJUS, and Uzbek Latin scripts

    v1.0.0 #nlp #transliteration #uyghur #unicode
  127. mecha10-nodes-llm-command

    Natural language command parsing via LLM APIs (OpenAI, Claude, Ollama)

    v0.1.39 #artificial-intelligence #command-parser #openai #node #llm #claude #motor #mecha10 #ollama #nlp
  128. wikiext-cli

    Wikiext is a tool for extracting and processing Wikipedia data, implemented in Rust

    v0.1.1 #extract #nlp #wikipedia #data #processing
  129. trustformers

    port of Hugging Face Transformers

    v0.1.0-alpha.1 #nlp #bert #machine-learning #transformer
  130. llm-text

    processing text for LLM consumption

    v0.1.0 #nlp #llm #text-extraction #html #text-html
  131. ctranslate2-server

    A high-performance inference server for CTranslate2 models, compatible with OpenAI's API

    v0.1.2 #openai #ctranslate2 #inference-server #nlp #inference
  132. llm_utils

    The best possible text chunker and text splitter and other text tools

    v0.0.11 750 #nlp #llm #encoding
  133. mecab-ko

    Korean morphological analyzer - a pure Rust implementation of MeCab-Ko

    v0.6.0 #tokenize #nlp #korean #mecab #morphology #tokenizer
  134. lingua-turkish-language-model

    The Turkish language model for Lingua, an accurate natural language detection library

    v1.2.0 177K #language-recognition #language-detection #nlp
  135. cro_stem

    A lightning-fast, zero-dependency Croatian stemming library written in Rust

    v0.1.7-rc.2 #stemming #nlp #python #croatian #rust
  136. natural

    Pure Rust library for natural language processing

    v0.5.0 5.7K #natural-language-processing #nlp #soundex #tf-idf #levenshtein-distance #naive-bayes #ngrams #jaro-winkler #phonetic #classification
  137. pii

    PII detection and anonymization with deterministic, capability-aware NLP pipelines

    v0.1.0 #anonymization #privacy #redaction #nlp
  138. langdetect-rs

    Language detection in Rust. Port of Mimino666's langdetect.

    v0.2.3 #nlp #language-id #text-processing
  139. reinfer-client

    API client for Re:infer, the conversational data intelligence platform

    v0.38.19 #api-client #nlp #re-infer #conversational #platform
  140. wideword

    Fast word-length bucketing for text documents using SIMD

    v0.1.0 #word-length #simd #nlp #bucketing
  141. ragrep

    A fast, natural language code search tool

    v0.2.0 #search-query #codebase #semantic-search #search-server #embedding #cache #tree-sitter #nlp
  142. legalis

    Command-line interface for Legalis-RS

    v0.1.4 #command-line-interface #verification #statutes #diff #dsl #shell-completion #viz #visualization #nlp
  143. pdfvec

    High-performance PDF text extraction library for vectorization pipelines

    v0.1.1 #pdf #vectorization #nlp #text-extraction
  144. lingua-kazakh-language-model

    The Kazakh language model for Lingua, an accurate natural language detection library

    v1.2.0 109K #language-recognition #language-detection #nlp
  145. a3s-cron

    Cron scheduling library for A3S with natural language support

    v0.1.2 140 #cron #nlp #a3s #cron-scheduler #task
  146. budouy

    Rust port of BudouX with optional HTML processing and CLI

    v0.2.1 #nlp #line-break #text-segmentation #budoux
  147. gitctx

    MCP server for GitHub repository exploration

    v0.1.0 #github-repository #mcp-server #rate-limiting #mcp-client #pull-request #github-api #exploration #claude #api-request #nlp
  148. lingua-vietnamese-language-model

    The Vietnamese language model for Lingua, an accurate natural language detection library

    v1.2.0 73K #language-recognition #language-detection #nlp
  149. semantic-commands

    A lightweight Rust framework for defining and executing semantic commands using text embeddings

    v0.1.1 #bot #nlp #semantic #embedding #command
  150. date_time_parser

    Rust NLP library for parsing English natural language into dates and times

    v0.2.0 2.3K #date-parser #nlp #date-time #english #times #ical #dates-and-times
  151. fibpetokenizer

    A blazing fast Byte Pair Encoding (BPE) tokenizer library with Python bindings

    v0.1.0 #nlp #bpe #machine-learning #bpe-tokenizer #tokenizer
  152. lingua-slovene-language-model

    The Slovene language model for Lingua, an accurate natural language detection library

    v1.2.0 73K #language-recognition #language-detection #nlp
  153. wordcutw

    A C-interface wrapper for Wordcut - a Lao/Thai word segmentation/breaking library

    v0.5.0 250 #thai #nlp #lao #c
  154. lingua-slovak-language-model

    The Slovak language model for Lingua, an accurate natural language detection library

    v1.2.0 60K #language-recognition #nlp #language-detection
  155. commit_crafter

    AI-powered Git commit message generator

    v0.1.6 360 #git-commit #commit-message #git #nlp #productivity #ai
  156. ds-r1-rs

    A DeepSeek R1-inspired reasoning model prototype in Rust

    v0.1.1 #deepseek #nlp #ml #reasoning #transformer
  157. bayesian

    A naive Bayesian classifier with optional TF-IDF support

    v0.3.0 #naive-bayes #naive-bayesian #nlp #machine-learning
  158. mecab-ko-hangul

    Hangul processing utilities - jamo decomposition/composition, syllable handling, normalization

    v0.6.0 #hangul #nlp #korean #jamo #unicode
  159. fast-bpe-rs

    Fast Byte Pair Encoding (BPE) tokenizer with Python bindings powered by PyO3

    v0.6.1 #bpe #pyo3 #nlp #python
  160. flerp

    CLI tool that does XYZ

    v0.5.0 #tui #text-search #nlp #terminal-text #analysis #statistics #case-sensitive #xyz #tui-text #text-viewer
  161. edgebert

    Fast local text embeddings library for Rust and WASM for BERT inference on native and edge devices with no dependencies

    v0.4.0 #bert #nlp #transformer #wasm
  162. hy-mt

    A lightweight machine translation inference library for Tencent Hunyuan MT models

    v0.1.0 #machine-translation #candle #hunyuan #nlp #rust
  163. embedcache

    High-performance text embedding service with caching capabilities

    v0.1.1 #vector-search #nlp #search
  164. cali

    A terminal calculator with real-time evaluation, unit conversions, and natural language expressions

    v0.9.0 750 #unit-conversion #expression-language #real-time #terminal-calculator #nlp #calculator-with-real-time #currency #date
  165. nysiis

    A fast NYSIIS (New York State Identification and Intelligence System) phonetic encoding library

    v1.0.1 #soundex #phonetic #nlp #encoding
  166. avila-tokenizers

    The most complete tokenizer library in Rust - BPE, WordPiece, Unigram, with native support for GPT, BERT, Llama, Claude

    v0.1.0 #tokenize #bert #llm #nlp #gpt #tokenizer
  167. thulp-query

    Query engine for searching and filtering thulp tools

    v0.3.1 #query-engine #search-engine #thulp #filtering #criteria #query-builder #wildcard #nlp #artificial-intelligence
  168. waken_snowball

    Snowball stemming algorithms for 33 languages

    v0.1.0 200 #stemming #snowball #nlp #linguistics
  169. anno-metrics

    Shared evaluation/analysis primitives for anno (metrics + cluster encoders)

    v0.5.0 #evaluation-metrics #nlp #coreference #evaluation
  170. natural-date-rs

    A parser to convert natural language date and time specifications into DateTime

    v0.3.0 #date-parser #date-time #specification-language #convert #relative-time #expression-parser #nlp #tomorrow #monday #pm
  171. kizame

    (刻め!) - CLI for MeCrab morphological analyzer and data pipeline

    v0.1.0 #japanese #morphology #nlp #mecab
  172. tessera-embeddings

    Multi-paradigm embedding library: ColBERT, dense, sparse, vision-language, and time series models

    v0.1.0 #semantic-search #embedding #ml #nlp #colbert
  173. cairn-extract

    Rule-based claim extraction from markdown with confidence scoring

    v0.1.0 #markdown #claim #nlp #confidence #extract #claims
  174. sdaas-rs

    Official Rust SDK for SDaaS — Semantic Delta as a Service

    v0.1.0 #semantic #nlp #embedding
  175. treebender

    An HDPSG-inspired symbolic NLP library for Rust

    v0.1.1 #nlp #earley #hdpsg #syntax #earley-parser
  176. langid-rs

    A fast and lightweight language identification library in Rust, inspired by py3langid

    v1.1.0 380 #nlp #language-detection #py3langid
  177. slabs

    Text chunking for RAG: fixed, sentence, recursive, and semantic strategies

    v0.1.4 #nlp #chunking #rag #text-segmentation
  178. ayumu

    A small, lightweight, user-oriented query language for search forms

    v0.3.6 600 #query-language #forms #nlp #user-oriented #sorting #terms #white-space #fault-tolerant
  179. jon-cli

    Natural language interface for Joy and Jot - CLI for the Joyint ecosystem

    v0.1.3 #language-interface #ecosystem #jot #joy #nlp
  180. mathsys

    The Natural Language of Math

    v9.0.0 #math #nlp #dsl #latex
  181. whichtime

    High-level Rust API for natural language date parsing

    v0.1.0 #date-parser #nlp #date-time #chrono-node #date-time-parser
  182. xase-sidecar

    XASE AI Sidecar: high-performance evidence and data processing sidecar (audio/image/DICOM/NLP) with S3, Redis, JWT auth, and metrics

    v0.1.0 #dicom #sidecar #nlp #xase
  183. intent-gen

    Natural language to IntentLang spec generation via LLM (Layer 0)

    v0.7.0-beta.1 #nlp #dsl #llm
  184. lingua-belarusian-language-model

    The Belarusian language model for Lingua, an accurate natural language detection library

    v1.2.0 39K #language-recognition #language-detection #nlp
  185. popsam-py

    Python extension crate for AI-assisted selection of semantically representative texts

    v0.1.0 #embedding #python #nlp #feedback #sampling
  186. kalosm-learning

    A simplified machine learning library for building off of pretrained models

    v0.4.0 450 #artificial-intelligence #bert #machine-learning #nlp #transformer
  187. whichtime-sys

    Lower-level parsing engine for natural language date parsing

    v0.1.0 #date-parser #nlp #date-time #performance #date-time-parser
  188. popsam-core

    Core library for AI-assisted selection of semantically representative texts

    v0.1.0 #nlp #semantic-search #embedding #feedback #sampling
  189. tekken-rs

    Mistral Tekken tokenizer with audio support

    v0.1.1 140 #tokenize #artificial-intelligence #mistral #nlp #audio #tokenizer
  190. vn-nlp

    Vietnamese NLP library — tokenization, normalization, segmentation

    v0.1.3 #nlp #tokenize #vietnamese #linguistics
  191. almanaculum

    Core types and traits for analysis

    v0.1.1 #nlp #linguistics #poetry #greek
  192. legalis-llm

    LLM integration layer for Legalis-RS

    v0.1.4 #artificial-intelligence #model-name #llm #document #law #model-provider #legalis-rs #generate-text #mocking #nlp
  193. mecab-ko-dict-builder

    Korean morpheme dictionary builder - generates binary dictionaries from CSV

    v0.6.0 #nlp #korean #mecab #dictionary #builder
  194. remindee-parser

    Natural language reminder parser for remindee-bot

    v0.3.0 #telegram-bot #remindee-bot #reminder #parser #nlp #cron #docker
  195. mecab-ko-dict-validator

    Korean morpheme dictionary validation tool - CSV format validation, part-of-speech scheme checks

    v0.6.0 #validation #korean #nlp #dictionary #mecab
  196. vader_sentiment

    Bindings for Rust from the original Python VaderSentiment analysis tool

    v0.1.1 2.1K #sentiment-analysis #python #vader #lexicon #original #emoji #caps #nlp #social-media #understanding
  197. mecab-ko-dict

    Korean morpheme dictionary management - binary format, FST search, connection costs

    v0.6.0 #fst #nlp #korean #morpheme #dictionary
  198. kalosm-model-types

    Shared types for Kalosm models

    v0.4.0 550 #llama #llm #mistral #nlp
  199. rs-jptxt2tokens

    wrapper to convert Japanese text to tokens

    v0.1.0 #japanese #nlp #wrapper #tokenizer
  200. nlcep

    parsing natural language calendar events

    v0.9.0 #calendar #calendar-events #nlp #parser #tomorrow #machine-readable #string-parser
  201. nlsd

    Natural Language Structured Documents

    v0.1.3 #nlp #parser #serde
  202. langidentify-models-lite-a

    Lite embedded model data (part A: European Latin) for the langidentify language detection library

    v0.1.0 #data-model #language-detection #model-data #ngrams #nlp
  203. langidentify-models-lite-b

    Lite embedded model data (part B: other scripts) for the langidentify language detection library

    v0.1.0 #data-model #language-detection #model-data #ngrams #nlp
  204. reggy

    friendly, resumable regular expressions for text analytics

    v0.0.6 220 #regex #nlp #stream #search #analytics #case-sensitive #unicode #pattern-match #ascii
  205. unitoken

    Fast BPE tokenizer/trainer with a Rust core and Python bindings

    v0.1.1 #tokenize #bpe #nlp #tokenizer
  206. flash_rerank

    Core reranking engine — cross-encoder and ColBERT inference via ONNX Runtime

    v0.1.1 #reranking #semantic-search #cross-encoder #nlp #search
  207. amdm

    Rust client for amdm.ru with Russian lyrics stress marking and meter analysis

    v0.1.0 #russian #lyrics #chord #stress #nlp
  208. byteforge

    A next-generation byte-level transformer with multi-signal patching and SIMD optimization

    v0.1.1 #byte-level #nlp #transformer #simd-performance #simd
  209. aistack

    Functional text-to-function AI utilities

    v0.1.0 #function-calling #nlp #llm #llm-function-calling
  210. rust-chatgpt

    OpenAI API Client for Rust

    v0.1.2 #image #chat-completion #audio #interact #embedding #openai-api #nlp
  211. ragegun

    Performs lexica-based analysis on text (e.g. age, gender, PERMA, OCEAN personality traits, ...)

    v0.2.2 #gender #personality #lexica #nlp #analysis #emotions #ocean #lexicon #temporal #intensity
  212. rusty-llm-jury

    CLI tool for estimating success rates when using LLM judges for evaluation

    v0.1.0 #llm #judge #nlp #evaluation #machine-learning
  213. rust_readability

    A package to assess the complexity of texts using a variety of readability formulas

    v0.2.0 170 #nlp #formula
  214. vader-sentimental

    A faster Rust version of the original Python VaderSentiment analysis tool

    v0.1.2 250 #sentiment-analysis #nlp #text-analysis
  215. vn-nlp-tokenize

    Vietnamese tokenization algorithms for vn-nlp

    v0.1.3 #tokenize #nlp #vietnamese #linguistics
  216. sisu

    working with SISU (Statecharts-based implementation of Information State Update)

    v0.0.5 #nlp #ai
  217. str-distance

    Distance metrics to evaluate distances between strings

    v0.1.0 230 #levenshtein #levenshtein-distance #nlp #string-similarity #jaro
  218. oxur-lang

    Oxur language processing: parser, expander, and Core Forms IR

    v0.2.0 #forms #oxur #nlp #expander #parser #macro-expansion #intermediate-representation
  219. rsnltk

    Rust-based Natural Language Toolkit

    v0.1.3 #nlp #stanza #nltk #text-analysis
  220. vn-nlp-segment

    Vietnamese sentence segmentation for vn-nlp

    v0.1.3 #tokenize #nlp #vietnamese #linguistics
  221. vn-nlp-normalize

    Vietnamese text normalization — diacritics, unicode NFC/NFD

    v0.1.3 #nlp #tokenize #vietnamese #linguistics
  222. qtransformers-core

    Quantum-inspired attention mechanisms for transformer models

    v0.1.0 #nlp #machine-learning #quantum
  223. llm-shield-nlp

    Natural language processing utilities for LLM Shield

    v0.1.0 #llm #shield #utilities #nlp #processing
  224. geocoder_nlp

    Rust bindings for geocoder-nlp

    v0.2.1 #geocoding #nlp #postal #location
  225. stylometry-analyzer

    Minimal CLI tool that combines one or more .txt files, extracts user-authored text, and enforces a minimum size. Hash-embeds text chunks and queries a local vector DB to classify writing style…

    v0.1.1 #nlp #embedding #stylometry
  227. mecrab-word2vec

    High-performance Word2Vec implementation with Hogwild! parallelization for MeCrab

    v0.1.0 #japanese #word2vec #embedding #nlp #hogwild
  228. libtqsm

    Sentence segmenter that supports ~300 languages

    v0.6.1 750 #nlp #ml #text
  229. edge-transformers

    wrapper over ONNXRuntime that implements Huggingface's Optimum pipelines for inference and generates bindings for C# and C

    v0.1.2 #nlp #machine-learning #ml
  230. sbert

    Sentence Bert (SBert)

    v0.4.1 #nlp #bert #embedding #transformer
  231. gematria_rs

    Gematria, a traditional Hebrew numerology system

    v0.1.1 #command-line-interface #word #hebrew #numerology #calculations #nlp #processing-analysis
  232. chrono-english

    parses simple English dates, inspired by Linux date command

    v0.1.8 340K #date-parser #date-time #nlp #convert #seconds #dialect #chrono #date-format #american
  233. symbol-map

    Memory-efficient mapping from values to integer identifiers (AKA a lexicon or symbol table), with options for fast bidirectional lookup

    v1.0.2 #symbol-table #value #nlp #symbols #integer #hash-map #memory-efficient #lexicon #bidirectional #owns
  234. wikidump

    parsing Mediawiki XML dumps

    v0.3.1 270 #mediawiki #xml #data-processing #language-analysis #dump #wikipedia #backup #revision #article #nlp
  235. intent-classifier

    A flexible few-shot intent classification library for natural language processing

    v0.1.0 #classification #nlp #machine-learning #few-shot #intent
  236. mecrab-builder

    Semantic dictionary builder for MeCrab - Wikidata/Wikipedia pipeline

    v0.1.0 #wikidata #dictionary #japanese #pipeline #nlp
  237. sagacity

    A Rust-based project for conversing with your codebase and handling codebase contextualization

    v0.1.2 #codebase #conversation #artificial-intelligence #browse #context #nlp #chat #summaries #command-line-interface #interactive-command-line
  238. qsv_vader_sentiment_analysis

    Bindings for Rust from the original Python VaderSentiment analysis tool. Forked for use with qsv.

    v0.2.0 2.2K #sentiment-analysis #python #qsv #lexicon #vader #emoji #caps #nlp #vader-sentiment #social-media
  239. deepfrog

    A deep learning NLP suite (PoS, lemmatiser, NER) with FoLiA XML support

    v0.2.1 #nlp #xml #annotations #linguistics
  240. when

    'When' parses natural language date/time and produces computer friendly output structures

    v0.1.4 #date-parser #natural-language #rules #rules-language #structures #past #deadlines #pm #tuesday #web-server
  241. cp-embeddings

    Local embedding generation using GTE-Qwen2-1.5B-instruct via Candle — private, on-device AI inference

    v0.3.1 #artificial-intelligence #candle #nlp #semantic-search
  242. event_parser

    Rust NLP library for parsing English natural language into icalendar events

    v0.1.1 #nlp #date-parser #ical #date-time #english #summary
  243. zoea

    by and for baby Rustaceans. It contains 'easy' buttons for common things like http get requests, key-value database persistence, and Natural Language Processing.

    v0.1.0 #database #key-value #http-request #database-persistence #rustaceans #button #nlp #baby
  244. mcprs

    Model Context Protocol for Rust - a unified library for communicating with different LLMs and AI APIs

    v0.1.0 #deepseek #openai #llm #nlp
  245. timewarp

    NLP library for parsing English and German natural language into dates and times

    v0.4.0 #nlp #human-friendly #time #human
  246. temporis

    Parse natural date strings into valid dates

    v1.0.2 #date-parser #date-format #year #object #expression #nlp #expression-parser #marker #human-friendly
  247. langram_train

    Langram train models

    v0.11.1 #langram #language-model #train #detect #language-detection #nlp