1 release (0 unstable)
Uses new Rust 2024
| 1.0.0-beta.1 | Nov 26, 2025 |
|---|
#3 in #content-length
Used in s5_cli
295KB
5.5K
SLoC
s5_importer_http
Imports content from HTTP URLs into an S5 filesystem (FS5).
Features
- Recursive Crawling: Parses HTML to find links (
<a href="...">) and recursively imports them (within the base URL scope). - Incremental: Uses
HEADrequests to checkContent-LengthandLast-Modifiedbefore downloading. - Concurrency: Parallel processing of URLs.
Usage
use s5_importer_http::HttpImporter;
use url::Url;
let base_url = Url::parse("https://example.com/docs/")?;
let importer = HttpImporter::create(
fs,
blob_store,
4, // concurrency
base_url.clone(),
true, // use_base_relative_keys
)?;
importer.import_url(base_url).await?;
Dependencies
~26–48MB
~679K SLoC