Written in Rust. Built for LLMs.

Email preprocessing for LLM consumption

Parse, clean, and structure raw emails into formats optimized for language models. Written in Rust. Available for Node.js and Python.

install
# Node.js
npm install langmail

# Python
pip install langmail

# Rust
cargo add langmail
What it does
HTML to Markdown
Converts HTML email bodies to clean Markdown. Preserves semantic structure, strips tracking pixels and decorative elements.
Reply detection
Identifies and removes quoted content across Gmail, Outlook, Apple Mail, and non-standard clients. Works in multiple languages.
Signature stripping
Detects and removes email signatures using heuristic pattern matching. No ML, no training data required.
CTA extraction
Surfaces calls-to-action by position analysis and structural patterns. Returns structured data your LLM can act on.
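
As a rough illustration of what heuristic reply and signature stripping can look like, here is a minimal TypeScript sketch. It is not langmail's implementation: the function names `stripQuotedReply` and `stripSignature` are hypothetical, and the sketch assumes top-posted English replies with an "On ... wrote:" attribution line, ">"-prefixed quotes, and the conventional "-- " signature delimiter.

```typescript
// Illustrative sketch only; these helpers are hypothetical and are not part
// of the langmail API.

// Matches attribution lines such as "On Mon, 11 Nov 2024, Bob wrote:".
const ATTRIBUTION = /^On .+ wrote:$/;

// Matches the conventional signature delimiter ("-- "), tolerating clients
// that drop the trailing space.
const SIG_DELIMITER = /^-- ?$/;

// Drop quoted history: everything from the first attribution line or
// ">"-prefixed line onward. Assumes a top-posted reply, so inline replies
// placed below quoted text would be lost.
function stripQuotedReply(body: string): string {
  const lines = body.split(/\r?\n/);
  const cut = lines.findIndex(
    (line) => ATTRIBUTION.test(line.trim()) || line.startsWith(">")
  );
  const kept = cut === -1 ? lines : lines.slice(0, cut);
  return kept.join("\n").trimEnd();
}

// Drop the signature: everything from the first delimiter line onward.
function stripSignature(body: string): string {
  const lines = body.split(/\r?\n/);
  const cut = lines.findIndex((line) => SIG_DELIMITER.test(line));
  const kept = cut === -1 ? lines : lines.slice(0, cut);
  return kept.join("\n").trimEnd();
}
```

Calling `stripSignature(stripQuotedReply(body))` on a top-posted reply leaves only the new message text. Real-world mail needs many more patterns than this, such as localized attribution lines, Outlook-style header blocks, and signatures without a delimiter.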

From raw email to clean context

Two functions. One pipeline. Pass a raw RFC 5322 email string to preprocess, then serialize the result with toLLMContext. What comes out is clean Markdown — no noise, no quoted history, no signature clutter.

01. preprocess(raw): parses the MIME structure, extracts the relevant body variant, and strips reply chains and signatures.
02. toLLMContext(parsed): serializes the structured result into a Markdown string, ready to drop into a prompt.
03. Pass the output directly to your LLM. No further cleaning needed.
TypeScript
example.ts
import { preprocess, toLLMContext } from "langmail"
import { readFileSync } from "fs"

// raw RFC 5322 email string
const raw = readFileSync("email.eml", "utf8")

// parse and clean
const parsed = await preprocess(raw)

// serialize to LLM-ready Markdown
const context = toLLMContext(parsed)

// drop into your prompt
// (`llm` is a placeholder for your own LLM client; it is not defined in this snippet)
llm.complete({ messages: [
  { role: "user", content: context }
]})
context output
From: Alice <alice@example.com>
Subject: Q4 budget review
Date: 2024-11-12

Hi,

Following up on the Q4 numbers. Can you send
the updated forecast by Friday?

— [quoted reply removed] —
— [signature removed] —