-
pdf_oxide
The fastest Rust PDF library with text extraction: 0.8ms mean, 100% pass rate on 3,830 PDFs. 5× faster than pdf_extract, 17× faster than oxidize_pdf. Extract, create, and edit PDFs.
-
deformat
Extract plain text from HTML, PDF, and other document formats
-
unpdf
High-performance PDF content extraction to Markdown, text, and JSON
-
bolivar-cli
PDF text extraction CLI tools
-
pdf-text-extract
Extract text, tables, and structured content from PDF files
-
wikipedia-article-transform
Transform Wikipedia articles in HTML to plain text and Markdown formats
-
folio-pdf
A comprehensive PDF library for Rust
-
keyword_extraction
Collection of algorithms for keyword extraction from text
-
kawat
Web content extraction library inspired by trafilatura. Extracts main text, metadata, and comments from HTML.
-
lopdf-parang
A fork of lopdf optimized for PDF text extraction — lazy streams, O(1) object slicing, zlib-rs
-
pdfplumber-cli
Command-line tool to extract text, characters, words, and tables from PDF documents
-
llm-text
Processing text for LLM consumption
-
justpdf
Pure Rust PDF engine: read, render, extract, create, modify
-
rpdfium
A faithful Rust port of Google's PDFium PDF rendering engine
-
hwarang
Fast HWP document text extractor
-
pdfvec
High-performance PDF text extraction library for vectorization pipelines
-
pdfplumber
Extract chars, words, lines, rects, and tables from PDF documents with precise coordinates
-
pdf_oxide_cli
CLI for pdf-oxide — the fastest PDF toolkit. 22 commands: text extraction, PDF to markdown, search, merge, split, images, compress, encrypt, watermark, forms, and more.
-
papyrus-core
PDF-to-Markdown conversion engine with smart heading detection, bold/italic text extraction, and CommonMark output. Pure Rust, best-effort parsing for corrupted PDFs.
-
docx-lite
Lightweight, fast DOCX text extraction library with minimal dependencies
-
elizaos-plugin-pdf
elizaOS PDF Plugin - PDF reading and text extraction
-
unpdf-cli
CLI tool for extracting PDF content to Markdown, text, and JSON
-
parangi
PDF text extraction library — Rust port of Apache PDFBox
-
heavy-pdf-parser
Extract text from PDF files with support for multiple output formats
-
epub-parser
Extracting metadata, table of contents, text, cover, and images from EPUB files
-
parser-core
Extracting text from various file formats, including PDF, DOCX, XLSX, PPTX, images (via OCR), and more
-
arabic_pdf_to_text
A CLI tool to convert Arabic PDFs to text using Google's Gemini API
-
justpdf-core
Pure Rust PDF engine — parsing, writing, compression, text extraction, encryption, digital signatures
-
pdfplumber-core
Core data types and algorithms for pdfplumber-rs (backend-independent)
-
the-daily-stallman
Read the news like Stallman would. No JavaScript required.