Lib.rs

#text-extraction

pdf_oxide

The fastest Rust PDF library with text extraction: 0.8ms mean, 100% pass rate on 3,830 PDFs. 5× faster than pdf_extract, 17× faster than oxidize_pdf. Extract, create, and edit PDFs.

v0.3.21 15K #pdf #pdf-parser #text-extraction #document-parser #pdf-to-markdown #text-document
deformat

Extract plain text from HTML, PDF, and other document formats

v0.6.0 1.2K #html #pdf #nlp #text-html #text-extraction
unpdf

High-performance PDF content extraction to Markdown, text, and JSON

v0.2.4 1.4K #pdf #markdown #text-extraction #document-parser #text-document #pdf-parser
bolivar-cli

PDF text extraction CLI tools

v1.6.1 #pdf #text-extraction #pdfplumber #pdfminer
pdf-text-extract

Extract text, tables, and structured content from PDF files

v0.2.0 #pdf-parser #pdf #markdown #text-extraction #parser
wikipedia-article-transform

Transform Wikipedia articles from HTML to plain text and Markdown formats

v0.4.0 #tree-sitter #html #nlp #text-extraction
folio-pdf

A comprehensive PDF library for Rust

v0.0.3 #pdf #pdf-parser #text-extraction #document-parser #parser
keyword_extraction

Collection of algorithms for keyword extraction from text

v1.5.0 180 #tf-idf #algorithm #text-extraction
kawat

Web content extraction library inspired by trafilatura. Extracts main text, metadata, and comments from HTML.

v0.1.3 #web-scraping #text-extraction #nlp #html #boilerplate-removal
lopdf-parang

A fork of lopdf optimized for PDF text extraction — lazy streams, O(1) object slicing, zlib-rs

v0.39.1 #pdf #text-extraction #parser
pdfplumber-cli

Command-line tool to extract text, characters, words, and tables from PDF documents

v0.2.0 #pdf #table #text-extraction #cli-table
llm-text

Process text for LLM consumption

v0.1.0 #nlp #llm #text-extraction #html #text-html
justpdf

Pure Rust PDF engine - read, render, extract, create, modify

v0.1.2 #pdf #document #text-extraction #render #parser #graphics
rpdfium

A faithful Rust port of Google's PDFium PDF rendering engine

v7676.6.4 #pdf #text-extraction #pdfium #document
hwarang

Fast HWP document text extractor

v0.2.0 #hwp #hwpx #text-extraction #korean #hancom
pdfvec

High-performance PDF text extraction library for vectorization pipelines

v0.1.1 #pdf #vectorization #nlp #text-extraction
pdfplumber

Extract chars, words, lines, rects, and tables from PDF documents with precise coordinates

v0.2.0 #pdf #table #text-extraction #document
pdf_oxide_cli

CLI for pdf-oxide — the fastest PDF toolkit. 22 commands: text extraction, PDF to markdown, search, merge, split, images, compress, encrypt, watermark, forms, and more.

v0.3.21 #pdf #text-extraction #pdf-to-markdown #pdf-toolkit #cli-toolkit
papyrus-core

PDF-to-Markdown conversion engine with smart heading detection, bold/italic text extraction, and CommonMark output. Pure Rust, best-effort parsing for corrupted PDFs.

v0.1.0 #markdown #convert #pdf #text-extraction #extract-text
docx-lite

Lightweight, fast DOCX text extraction library with minimal dependencies

v0.2.0 1.4K #docx #text-extraction #parser #word #office
elizaos-plugin-pdf

elizaOS PDF Plugin - PDF reading and text extraction

v2.0.0 #pdf #elizaos #document-processing #text-extraction
unpdf-cli

CLI tool for extracting PDF content to Markdown, text, and JSON

v0.2.3 #markdown #pdf #text-extraction
parangi

PDF text extraction library — Rust port of Apache PDFBox

v0.1.0 #pdf #text-extraction #pdfbox
heavy-pdf-parser

Extract text from PDF files with support for multiple output formats

v0.1.0 #pdf #text-extraction #document-processing #rust
epub-parser

Extract metadata, table of contents, text, cover, and images from EPUB files

v0.3.4 #ebook #epub #text-extraction #metadata #parser
parser-core

Extract text from various file formats, including PDF, DOCX, XLSX, PPTX, images via OCR, and more

v0.1.3 120 #docx #text-parser #pdf #ocr #text-extraction
arabic_pdf_to_text

A CLI tool to convert Arabic PDFs to text using Google's Gemini API

v0.1.0 #gemini-api #pdf #arabic #text-extraction
justpdf-core

Pure Rust PDF engine — parsing, writing, compression, text extraction, encryption, digital signatures

v0.1.3 #pdf #signature #pdf-parser #text-extraction #compression