4 releases
| Version | Released |
|---|---|
| 0.2.1 | Feb 7, 2026 |
| 0.2.0 | Feb 7, 2026 |
| 0.1.2 | Feb 6, 2026 |
| 0.1.1 | Feb 6, 2026 |
ChadSelect
One query. Any format. Every selector.
Unified data extraction — Regex, XPath 1.0, CSS Selectors, and JMESPath behind one query interface. Load your content, prefix your query, get results. Never panics.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"<span class="price">$49.99</span>"#.to_string());
let price = cs.select(0, "css:.price");
assert_eq!(price, "$49.99");
```
Install
```toml
[dependencies]
chadselect = "0.2.1"
```
Query Syntax
Every query takes the form `engine:expression`. A query with no recognized prefix defaults to regex.
| Prefix | Engine | Content Types | Backed By |
|---|---|---|---|
| `css:` | CSS Selectors | HTML | scraper |
| `xpath:` | XPath 1.0 | HTML, Text | sxd-xpath |
| `regex:` | Regular Expressions | All | regex |
| `json:` | JMESPath | JSON | jmespath |
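The routing rule can be made concrete with a short Rust sketch that splits a query into engine and expression. This is a minimal sketch that assumes only the four prefixes in the table are recognized, and that anything else is passed to regex unchanged. `Engine` and `route` are hypothetical names, not ChadSelect's internal API.

```rust
/// Hypothetical stand-in for the engines listed in the prefix table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Engine {
    Css,
    XPath,
    Regex,
    Json,
}

/// Split a query into (engine, expression). Only the four known prefixes are
/// recognized; any other string is treated as a bare regex pattern (assumption).
pub fn route(query: &str) -> (Engine, &str) {
    const PREFIXES: [(&str, Engine); 4] = [
        ("css:", Engine::Css),
        ("xpath:", Engine::XPath),
        ("regex:", Engine::Regex),
        ("json:", Engine::Json),
    ];
    for &(prefix, engine) in PREFIXES.iter() {
        if let Some(expr) = query.strip_prefix(prefix) {
            return (engine, expr);
        }
    }
    // No recognized prefix: the whole query string is the regex.
    (Engine::Regex, query)
}
```

Under this assumption a pattern such as `Stock #:\s*\S+` is safe to use without a prefix, because `Stock #` is not a known engine name.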
The index Parameter
Every query method takes an index argument that controls which match to return:
| Value | Behavior |
|---|---|
| `-1` | Return all matches across every loaded document |
| `0` | Return only the first match |
| `N` | Return only the Nth match (0-based) |
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"<ul><li>A</li><li>B</li><li>C</li></ul>"#.to_string());
let all = cs.query(-1, "css:li"); // vec!["A", "B", "C"]
let first = cs.query(0, "css:li"); // vec!["A"]
let third = cs.query(2, "css:li"); // vec!["C"]
let oob = cs.query(99, "css:li"); // vec![] (out of bounds, never panics)
// select() wraps query() and returns a single String
let s = cs.select(0, "css:li"); // "A"
let s = cs.select(-1, "css:li"); // "A" (first of all matches)
```
When multiple documents are loaded, -1 aggregates results from all compatible documents before indexing.
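Here is a sketch of these index rules over plain vectors. It assumes results are concatenated in document load order before the index is applied, and that negative values other than `-1` return nothing. `apply_index` and `first_or_empty` are hypothetical helpers. The second mirrors how `select()` is described: it returns the first element, or an empty string.

```rust
/// Hypothetical helper: gather per-document results in load order, then apply
/// the index. -1 keeps everything; N >= 0 keeps only the Nth match (0-based).
/// Out-of-range indices and other negative values yield an empty Vec (assumption).
pub fn apply_index(per_document: Vec<Vec<String>>, index: i64) -> Vec<String> {
    let all: Vec<String> = per_document.into_iter().flatten().collect();
    match index {
        -1 => all,
        n if n >= 0 => all.into_iter().nth(n as usize).into_iter().collect(),
        _ => Vec::new(),
    }
}

/// Hypothetical model of select(): the first result, or "" when there is none.
pub fn first_or_empty(results: Vec<String>) -> String {
    results.into_iter().next().unwrap_or_default()
}
```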
Content Management
Load one or more documents. Each document is tagged by type and only queried by compatible engines.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
// HTML: compatible with css:, xpath:, regex:
cs.add_html(r#"
<html>
<body>
<h1 class="title">2024 Honda Civic</h1>
<span class="price">$28,500</span>
<div class="details">
<div class="item"><span class="label">VIN:</span> 1HGFE2F59PA000001</div>
<div class="item"><span class="label">Exterior:</span> Blue Metallic</div>
<div class="item"><span class="label">Interior:</span> Black Leather</div>
<div class="item"><span class="label">Mileage:</span> 12,345 mi</div>
</div>
<a class="dealer-link" href="https://example.com/dealer/42">View Dealer</a>
</body>
</html>
"#.to_string());
// JSON: compatible with json:, regex:
cs.add_json(r#"{
"inventory": [
{"id": 1, "name": "Civic", "price": 28500, "tags": ["sedan", "honda"]},
{"id": 2, "name": "Accord", "price": 34000, "tags": ["sedan", "honda"]},
{"id": 3, "name": "CR-V", "price": 32500, "tags": ["suv", "honda"]}
],
"dealer": {"name": "Metro Honda", "rating": 4.8}
}"#.to_string());
// Plain text: compatible with regex:, xpath:
cs.add_text("Order #12345 confirmed. Total: $99.50".to_string());
assert_eq!(cs.content_count(), 3);
cs.clear(); // remove all content
```
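The compatibility rules in the comments above amount to a small lookup table. This hypothetical sketch encodes them with stand-in `ContentType` and `Engine` enums. It lists which loaded documents an engine would search, in load order. It models the documented behavior, not ChadSelect's implementation.

```rust
/// Hypothetical stand-ins for the document tags and query engines.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ContentType {
    Html,
    Json,
    Text,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Engine {
    Css,
    XPath,
    Regex,
    Json,
}

/// Compatibility matrix as stated in the README comments.
pub fn is_compatible(engine: Engine, content: ContentType) -> bool {
    match engine {
        Engine::Css => content == ContentType::Html,
        Engine::XPath => matches!(content, ContentType::Html | ContentType::Text),
        Engine::Json => content == ContentType::Json,
        Engine::Regex => true, // regex runs on every content type
    }
}

/// Positions (in load order) of the documents an engine would search.
pub fn searchable(docs: &[ContentType], engine: Engine) -> Vec<usize> {
    docs.iter()
        .enumerate()
        .filter(|&(_, &c)| is_compatible(engine, c))
        .map(|(i, _)| i)
        .collect()
}
```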
CSS Selectors
Standard CSS selectors, plus custom text pseudo-selectors for scraping.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"
<ul class="products">
<li class="product" data-id="1"><span class="name">Widget</span><span class="price">$19.99</span></li>
<li class="product" data-id="2"><span class="name">Gadget</span><span class="price">$49.99</span></li>
<li class="product" data-id="3"><span class="name">Doohickey</span><span class="price">$9.99</span></li>
</ul>
"#.to_string());
// Basic selectors
let first_name = cs.select(0, "css:.product .name");
assert_eq!(first_name, "Widget");
// All matches: index -1
let all_prices = cs.query(-1, "css:.product .price");
assert_eq!(all_prices, vec!["$19.99", "$49.99", "$9.99"]);
// Nth match: index 2 (0-based)
let third = cs.query(2, "css:.product .name");
assert_eq!(third, vec!["Doohickey"]);
// Attribute extraction via get-attr()
let id = cs.select(0, "css:.product >> get-attr('data-id')");
assert_eq!(id, "1");
```
Text Pseudo-Selectors
These work like Playwright's text pseudo-selectors: they match elements by their text content.
| Pseudo-Selector | Behavior |
|---|---|
| `:has-text('x')` | The element or any of its descendants contains `x` |
| `:contains-text('x')` | The element's own text contains `x` |
| `:text-equals('x')` | The element's text exactly equals `x` |
| `:text-starts('x')` | The element's text starts with `x` |
| `:text-ends('x')` | The element's text ends with `x` |
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"
<div class="specs">
<div class="row"><span class="label">Exterior</span><span class="value">Blue Metallic</span></div>
<div class="row"><span class="label">Interior</span><span class="value">Black Leather</span></div>
<div class="row"><span class="label">Engine</span><span class="value">2.0L Turbo</span></div>
</div>
"#.to_string());
// :has-text matches the .row whose subtree contains "Exterior"
let color = cs.select(0, "css:.row:has-text('Exterior') .value");
assert_eq!(color, "Blue Metallic");
// :text-equals: exact match on element text
let engine_label = cs.select(0, "css:.label:text-equals('Engine')");
assert_eq!(engine_label, "Engine");
// :text-starts: prefix match
let starts_e = cs.select(0, "css:.label:text-starts('Ext')");
assert_eq!(starts_e, "Exterior");
// :text-ends: suffix match ("Exterior" is the first label ending in "ior")
let ends_or = cs.select(0, "css:.label:text-ends('ior')");
assert_eq!(ends_or, "Exterior");
// Combine with function piping
let upper_interior = cs.select(0, "css:.row:has-text('Interior') .value >> uppercase()");
assert_eq!(upper_interior, "BLACK LEATHER");
```
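The pseudo-selector semantics reduce to string predicates over either an element's own text or the text of its whole subtree. The sketch below uses a hypothetical `Element` type with `own_text` and `children` fields. It compares raw text, so whether ChadSelect trims whitespace before comparing remains an open assumption. It also assumes that "element's text" in the equals, starts, and ends rows means the full subtree text.

```rust
/// Hypothetical element model: the element's own text plus child elements.
pub struct Element {
    pub own_text: String,
    pub children: Vec<Element>,
}

impl Element {
    /// Text of this element and all descendants, concatenated
    /// (simplification: own text is placed before the children's text).
    pub fn full_text(&self) -> String {
        let mut out = self.own_text.clone();
        for child in &self.children {
            out.push_str(&child.full_text());
        }
        out
    }
}

/// Hypothetical representation of the five text pseudo-selectors.
pub enum TextPseudo<'a> {
    HasText(&'a str),
    ContainsText(&'a str),
    TextEquals(&'a str),
    TextStarts(&'a str),
    TextEnds(&'a str),
}

pub fn matches_pseudo(el: &Element, pseudo: &TextPseudo<'_>) -> bool {
    let full = el.full_text();
    match pseudo {
        TextPseudo::HasText(t) => full.contains(*t), // subtree text
        TextPseudo::ContainsText(t) => el.own_text.contains(*t), // own text only
        TextPseudo::TextEquals(t) => full == *t,
        TextPseudo::TextStarts(t) => full.starts_with(*t),
        TextPseudo::TextEnds(t) => full.ends_with(*t),
    }
}
```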
XPath 1.0
Full XPath 1.0 support including axes, predicates, and XPath functions.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"
<html>
<body>
<h1 id="title"> 2024 Honda Civic </h1>
<table class="specs">
<tr><td>VIN</td><td>1HGFE2F59PA000001</td></tr>
<tr><td>Price</td><td>$28,500</td></tr>
<tr><td>Mileage</td><td>12,345 mi</td></tr>
</table>
</body>
</html>
"#.to_string());
// text() extraction
let title = cs.select(0, "xpath://h1[@id='title']/text()");
assert_eq!(title, " 2024 Honda Civic ");
// With normalize-space
let clean_title = cs.select(0, "xpath:normalize-space(//h1[@id='title'])");
assert_eq!(clean_title, "2024 Honda Civic");
// Predicate-based selection: find the <td> after "VIN"
let vin = cs.select(0, "xpath://tr[td='VIN']/td[2]/text()");
assert_eq!(vin, "1HGFE2F59PA000001");
// All values from the second column
let all_values = cs.query(-1, "xpath://table[@class='specs']//tr/td[2]/text()");
assert_eq!(all_values, vec!["1HGFE2F59PA000001", "$28,500", "12,345 mi"]);
// XPath string() on attribute
let title_id = cs.select(0, "xpath:string(//h1/@id)");
assert_eq!(title_id, "title");
```
Regex
Returns the first capture group when the pattern has one; otherwise returns the full match. Works on HTML, JSON, and plain text content.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_text("VIN: 1HGFE2F59PA000001 | Stock #: A12345 | Price: $28,500".to_string());
// Capture group: returns the group, not the full match
let vin = cs.select(0, r"regex:VIN:\s*([A-HJ-NPR-Z0-9]{17})");
assert_eq!(vin, "1HGFE2F59PA000001");
// Full match: no capture group
let stock = cs.select(0, r"regex:Stock #:\s*\S+");
assert_eq!(stock, "Stock #: A12345");
// Escape the literal `$`; only the digits after it are captured
let price_digits = cs.select(0, r"regex:Price:\s*\$([0-9,]+)");
assert_eq!(price_digits, "28,500");
// All matches: returns every digit sequence found in the text
let all_numbers = cs.query(-1, r"regex:\d+");
// No prefix: defaults to regex
let vin2 = cs.select(0, r"[A-HJ-NPR-Z0-9]{17}");
assert_eq!(vin2, "1HGFE2F59PA000001");
```
Regex on HTML
Regex runs on the raw HTML string, not parsed text — useful for extracting from attributes, comments, or script tags.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"<script>var price = 28500;</script>"#.to_string());
let price = cs.select(0, r"regex:var price\s*=\s*(\d+)");
assert_eq!(price, "28500");
```
JMESPath (JSON)
Full JMESPath expression support for structured JSON extraction.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_json(r#"{
"inventory": [
{"id": 1, "name": "Civic", "price": 28500, "tags": ["sedan", "honda"]},
{"id": 2, "name": "Accord", "price": 34000, "tags": ["sedan", "honda"]},
{"id": 3, "name": "CR-V", "price": 32500, "tags": ["suv", "honda"]}
],
"dealer": {"name": "Metro Honda", "rating": 4.8}
}"#.to_string());
// Simple field access
let dealer = cs.select(0, "json:dealer.name");
assert_eq!(dealer, "Metro Honda");
// Array indexing
let first = cs.select(0, "json:inventory[0].name");
assert_eq!(first, "Civic");
// Projection: all names
let names = cs.query(-1, "json:inventory[*].name");
assert_eq!(names, vec!["Civic", "Accord", "CR-V"]);
// Filter expression
let expensive = cs.query(-1, "json:inventory[?price > `30000`].name");
assert_eq!(expensive, vec!["Accord", "CR-V"]);
// Nested access (numbers come back as strings)
let rating = cs.select(0, "json:dealer.rating");
assert_eq!(rating, "4.8");
// Flatten nested arrays
let all_tags = cs.query(-1, "json:inventory[*].tags[]");
assert_eq!(all_tags, vec!["sedan", "honda", "sedan", "honda", "suv", "honda"]);
```
Post-Processing Functions
Pipe results through text transformations using `>>`. This operator was chosen over `|` because `|` is already reserved by XPath (union) and JMESPath (pipe).
```text
css:.selector >> function1() >> function2()
xpath://path/text() >> trim() >> uppercase()
regex:pattern >> replace('$', 'USD ')
```
| Function | Description | Example |
|---|---|---|
| `normalize-space()` | Trim and collapse internal whitespace | `css:.desc >> normalize-space()` |
| `trim()` | Trim leading/trailing whitespace | `css:.title >> trim()` |
| `uppercase()` | Convert to UPPER CASE | `css:.vin >> uppercase()` |
| `lowercase()` | Convert to lower case | `css:.name >> lowercase()` |
| `substring(start, len)` | Extract substring (0-based start) | `css:.code >> substring(0, 3)` |
| `substring-after('delim')` | Text after the first delimiter | `css:.info >> substring-after('VIN: ')` |
| `substring-before('delim')` | Text before the first delimiter | `css:.info >> substring-before(': ')` |
| `replace('find', 'repl')` | Replace all occurrences | `css:.price >> replace('$', 'USD ')` |
| `get-attr('name')` | Element attribute (CSS only) | `css:a.link >> get-attr('href')` |
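Most of these functions map directly onto Rust string methods. The following is a minimal sketch with hypothetical helper names. It assumes `substring` counts characters rather than bytes. It also assumes that `substring-after` and `substring-before` return an empty string when the delimiter is missing, as their XPath namesakes do. `trim`, `uppercase`, and `lowercase` correspond to `str::trim`, `str::to_uppercase`, and `str::to_lowercase`.

```rust
/// Trim both ends and collapse internal whitespace runs to one space.
pub fn normalize_space(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// `len` characters starting at character offset `start` (0-based; assumption).
pub fn substring(s: &str, start: usize, len: usize) -> String {
    s.chars().skip(start).take(len).collect()
}

/// Text after the first occurrence of `delim`, or "" if absent (assumption).
pub fn substring_after(s: &str, delim: &str) -> String {
    s.split_once(delim)
        .map(|(_, after)| after.to_string())
        .unwrap_or_default()
}

/// Text before the first occurrence of `delim`, or "" if absent (assumption).
pub fn substring_before(s: &str, delim: &str) -> String {
    s.split_once(delim)
        .map(|(before, _)| before.to_string())
        .unwrap_or_default()
}

/// Replace every occurrence of `find` with `repl`.
pub fn replace_all(s: &str, find: &str, repl: &str) -> String {
    s.replace(find, repl)
}
```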
Chaining Functions
Functions run left to right. Empty results are dropped after each step, so a value that a function turns into an empty string is removed from the result set.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"<div class="info"> VIN: 1HGFE2F59PA000001 </div>"#.to_string());
// Chain: extract text → get everything after "VIN: " → first 3 chars → lowercase
let result = cs.select(0, "css:.info >> substring-after('VIN: ') >> substring(0, 3) >> lowercase()");
assert_eq!(result, "1hg");
```
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"<a class="link" href="/inventory/123">View Car</a>"#.to_string());
// Attribute extraction
let href = cs.select(0, "css:a.link >> get-attr('href')");
assert_eq!(href, "/inventory/123");
```
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"<span class="price"> $ 28,500 </span>"#.to_string());
// Clean + transform
let clean_price = cs.select(0, "css:.price >> normalize-space() >> replace('$ ', '$')");
assert_eq!(clean_price, "$28,500");
```
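This is a hypothetical sketch of the chaining rule. It splits the query on `>>`, applies each function to every value in turn, and drops empty strings after each step. Only a few zero-argument functions are supported. Unknown function names are assumed to produce empty output. The split is naive and would break on an argument that itself contains `>>`. `apply` and `run_pipeline` are illustrative names only.

```rust
/// Apply one zero-argument post-processing function (hypothetical subset).
fn apply(func: &str, value: &str) -> String {
    match func {
        "trim()" => value.trim().to_string(),
        "uppercase()" => value.to_uppercase(),
        "lowercase()" => value.to_lowercase(),
        "normalize-space()" => value.split_whitespace().collect::<Vec<_>>().join(" "),
        // Unknown function: emit "" so the value is filtered out (assumption).
        _ => String::new(),
    }
}

/// Split `query` on ">>", returning the base query and the raw engine output
/// transformed by each function in order, with empty strings dropped after
/// every step.
pub fn run_pipeline<'q>(query: &'q str, raw: Vec<String>) -> (&'q str, Vec<String>) {
    let mut parts = query.split(">>").map(str::trim);
    let base = parts.next().unwrap_or("");
    let mut values: Vec<String> = raw.into_iter().filter(|v| !v.is_empty()).collect();
    for func in parts {
        values = values
            .iter()
            .map(|v| apply(func, v))
            .filter(|v| !v.is_empty())
            .collect();
    }
    (base, values)
}
```

Dropping empties after every step, rather than only at the end, means a later function never sees a value that an earlier step emptied.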
API Reference
Core Query Methods
```rust
use chadselect::ChadSelect;

let html = r#"<span class="price">$28,500</span>"#.to_string();
let mut cs = ChadSelect::new();
cs.add_html(html);
// query() returns Vec<String> and never panics
let all_matches = cs.query(-1, "css:.price"); // all results
let first_only = cs.query(0, "css:.price"); // vec with 1st result or empty
let third = cs.query(2, "css:.price"); // vec with 3rd result or empty
// select() returns String, empty on no match
let price = cs.select(0, "css:.price"); // first valid result or ""
```
Fallback Chains — select_first
Try queries in priority order. Returns the first result set where all values pass validation.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"<span class="alt-price">$28,500</span>"#.to_string());
// #exact-id doesn't exist, so the chain falls through to .alt-price
let result = cs.select_first(vec![
(0, "css:#exact-id"),
(0, "css:.alt-price"),
(0, r"regex:\$[\d,]+"),
]);
assert_eq!(result, vec!["$28,500"]);
```
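The fallback logic can be modeled independently of any engine. In this hypothetical sketch, each inner `Vec` stands for the results one query produced, in priority order. `first_passing` returns the first non-empty set whose values all satisfy the validator. Passing `|_| true` approximates plain `select_first`, and a real predicate approximates `select_first_where`.

```rust
/// Hypothetical model of select_first: return the first candidate result set
/// that is non-empty and whose values all pass `valid`, else an empty Vec.
pub fn first_passing<F>(candidates: Vec<Vec<String>>, valid: F) -> Vec<String>
where
    F: Fn(&str) -> bool,
{
    candidates
        .into_iter()
        .find(|set| !set.is_empty() && set.iter().all(|v| valid(v.as_str())))
        .unwrap_or_default()
}
```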
Multi-Source — select_many
Run several queries and merge their results, keeping each distinct value once.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"
<span class="msrp">$30,000</span>
<span class="sale">$28,500</span>
"#.to_string());
let prices = cs.select_many(vec![
(0, "css:.msrp"),
(0, "css:.sale"),
]);
// Contains both "$30,000" and "$28,500" (unique, unordered)
assert!(prices.contains(&"$30,000".to_string()));
assert!(prices.contains(&"$28,500".to_string()));
```
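A `HashSet` over the concatenated results is one way to merge the result sets while keeping only unique values. The hypothetical `merge_unique` below keeps first-seen order as a side effect. Because the README describes the output as unordered, callers should not depend on that ordering.

```rust
use std::collections::HashSet;

/// Hypothetical model of select_many: concatenate the result sets and keep
/// only the first occurrence of each value.
pub fn merge_unique(result_sets: Vec<Vec<String>>) -> Vec<String> {
    let mut seen = HashSet::new();
    result_sets
        .into_iter()
        .flatten()
        .filter(|v| seen.insert(v.clone())) // insert returns false for duplicates
        .collect()
}
```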
Custom Validators — select_where
Filter results with a closure. `_where` variants exist for `select`, `select_first`, and `select_many`.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"<span class="price">0</span><span class="price">28500</span>"#.to_string());
// Reject "0" as a valid price
let price = cs.select_where(0, "css:.price", |s| s != "0");
assert_eq!(price, ""); // first match "0" rejected, no fallback within select_where
// With select_first_where, a rejected query falls through to the next one
let mut cs2 = ChadSelect::new();
cs2.add_text("a: 0\nb: 42".to_string());
let r = cs2.select_first_where(
vec![(0, r"a: (\d+)"), (0, r"b: (\d+)")],
|s| s != "0",
);
assert_eq!(r, vec!["42"]);
```
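The `select_where` example above implies that the index is applied before the validator runs. This hypothetical model makes that ordering explicit. The chosen match is either accepted or replaced by an empty string, with no scan for a later valid match. `select_where_model` is an illustrative name, and non-negative indices only are handled to keep it short.

```rust
/// Hypothetical model of select_where: pick the match at `index`, then
/// validate it. A rejected or missing match yields "".
pub fn select_where_model<F>(matches: &[String], index: usize, valid: F) -> String
where
    F: Fn(&str) -> bool,
{
    match matches.get(index) {
        Some(m) if valid(m.as_str()) => m.clone(),
        _ => String::new(),
    }
}
```

A filter-first design (`matches.iter().filter(...).nth(index)`) would return `28500` for the example instead. When that behavior is wanted, the fallback-style `select_first_where` is the closer fit.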
Batch Queries — query_batch
Execute many queries in one call. Returns `Vec<Vec<String>>`, one inner `Vec` per query, in input order.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"<h1>Civic</h1><span class="price">$28,500</span>"#.to_string());
cs.add_json(r#"{"dealer": "Metro Honda"}"#.to_string());
let results = cs.query_batch(&[
(0, "css:h1"),
(0, "css:.price"),
(0, "json:dealer"),
]);
assert_eq!(results[0], vec!["Civic"]);
assert_eq!(results[1], vec!["$28,500"]);
assert_eq!(results[2], vec!["Metro Honda"]);
```
Multi-Content Queries
When multiple documents are loaded, queries search across all compatible content. Use query(-1, ...) to get results from every document.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html(r#"<span class="title">Page 1</span>"#.to_string());
cs.add_html(r#"<span class="title">Page 2</span>"#.to_string());
// Searches both HTML documents
let titles = cs.query(-1, "css:.title");
assert_eq!(titles, vec!["Page 1", "Page 2"]);
// Mixing content types
cs.add_json(r#"{"title": "JSON Title"}"#.to_string());
// css: only queries HTML content; JSON is skipped
let html_titles = cs.query(-1, "css:.title");
assert_eq!(html_titles, vec!["Page 1", "Page 2"]);
// json: only queries JSON content
let json_title = cs.select(0, "json:title");
assert_eq!(json_title, "JSON Title");
// regex: searches everything
let all = cs.query(-1, r"regex:(?:Page \d|JSON Title)");
assert_eq!(all.len(), 3);
```
Error Handling
ChadSelect never panics. Every invalid query, malformed content, or out-of-bounds index returns empty results.
```rust
use chadselect::ChadSelect;

let mut cs = ChadSelect::new();
cs.add_html("<div>hello</div>".to_string());
// Invalid CSS selector: returns ""
let r = cs.select(0, "css:][invalid");
assert_eq!(r, "");
// Out of bounds index: returns empty vec
let r = cs.query(999, "css:div");
assert_eq!(r, Vec::<String>::new());
// Wrong engine for content type: css: doesn't apply to JSON
cs.add_json(r#"{"a": 1}"#.to_string());
let r = cs.select(0, "css:.something");
// Only the HTML is searched, and it has no ".something", so the result is ""
assert_eq!(r, "");
```
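The README states the never-panic guarantee but not how it is enforced. One defensive pattern in Rust, sketched here under that caveat, is to route errors through `Result`. Engine calls can additionally be wrapped in `std::panic::catch_unwind`, mapping both failure kinds to an empty `Vec`. `guarded` is a hypothetical helper, not ChadSelect's API.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical guard: run a query closure and turn both an Err result and
/// an unwinding panic into an empty Vec, so neither escapes to the caller.
pub fn guarded<F>(run: F) -> Vec<String>
where
    F: FnOnce() -> Result<Vec<String>, String>,
{
    match panic::catch_unwind(AssertUnwindSafe(run)) {
        Ok(Ok(values)) => values,
        // Query error (e.g. an invalid selector) or a panic inside the engine.
        Ok(Err(_)) | Err(_) => Vec::new(),
    }
}
```

`catch_unwind` only works when panics unwind. A binary built with `panic = "abort"` cannot intercept anything this way, and the default panic hook still prints the message to stderr. Returning `Result` from every fallible step is the more conventional primary mechanism.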
Design Principles
- Never panic: invalid queries, malformed content, and out-of-bounds indices all return empty results
- Prefix routing: the query string declares the engine; no mode switching or builder patterns
- `>>` function pipe: unambiguous across all engines, so XPath `|` and JMESPath `|` work natively
- Batteries included: post-processing, text pseudo-selectors, validators, and index selection are all built in
Also Available
ChadSelect is also available as a Python package with an identical API and query syntax.
License
MIT