This web page parser library provides asynchronous fetching and extraction of data from web pages in multiple formats.
- Asynchronous web search using the search engines (Google, Bing, Duck, Ecosia, Yahoo, Wiki) with domain blacklisting (feature `search`).
- Creating a custom search engine via the `SearchEngine` trait (feature `search`).
- Reading an HTML document from a URL with a randomized user-agent (`User::random()`).
- Selecting elements by CSS selectors and retrieving their attributes and content.
- Fetching the full page as plain text.
- Fetching and parsing page content as JSON with serde_json support.
This tool is well-suited for web scraping and data extraction tasks, offering flexible parsing of HTML, plain text, and JSON to enable comprehensive data gathering from various web sources.
Requires the chromedriver tool to be installed!
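A sketch of the dependencies used in the examples below (the version numbers are placeholders — check crates.io for current releases; `macron` supplies the `path!` macro and `search` gates the web-search API):

```toml
[dependencies]
# versions are placeholders -- check crates.io for current releases
web_parser = { version = "*", features = ["search"] }
macron = "*"    # provides the path! macro used in the examples
tokio = { version = "1", features = ["full"] }
serde_json = "1"
```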
```rust
use web_parser::prelude::*;
use macron::path;

#[tokio::main]
async fn main() -> Result<()> {
    // WEB SEARCH:
    let chrome_path = path!("bin/chromedriver/chromedriver.exe");
    let session_path = path!("%/ChromeDriver/WebSearch");

    // start search engine:
    let mut engine = SearchEngine::<Duck>::new(
        chrome_path,
        Some(session_path),
        false,
    ).await?;

    println!("Searching results..");

    // send search query:
    let results = engine.search(
        "Rust (programming language)",          // query
        &["support.google.com", "youtube.com"], // domain black list
        1000                                    // sleep in millis
    ).await;

    // handle search results:
    match results {
        Ok(cites) => {
            println!("Result cites list: {:#?}", cites.get_urls());

            /*
            println!("Reading result pages..");

            let contents = cites.read(
                5,  // cites count to read
                &[  // tag name black list
                    "header", "footer", "style", "script", "noscript",
                    "iframe", "button", "img", "svg"
                ]
            ).await?;

            println!("Results: {contents:#?}");
            */
        }
        Err(e) => eprintln!("Search error: {e}")
    }

    // stop search engine:
    engine.stop().await?;

    Ok(())
}
```

```rust
use web_parser::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // READ PAGE AS HTML DOCUMENT:

    // read website page:
    let mut doc = Document::read("https://example.com/", User::random()).await?;

    // select title:
    let title = doc.select("h1")?.expect("No elements found");
    println!("Title: '{}'", title.text());

    // select descriptions:
    let mut descrs = doc.select_all("p")?.expect("No elements found");
    while let Some(descr) = descrs.next() {
        println!("Description: '{}'", descr.text())
    }

    // READ PAGE AS PLAIN TEXT:
    let text: String = Document::text("https://example.com/", User::random()).await?;
    println!("Text: {text}");

    // READ PAGE AS JSON:
    let json: serde_json::Value = Document::json("https://example.com/", User::random())
        .await?
        .expect("Failed to parse JSON");
    println!("Json: {json}");

    Ok(())
}
```

Distributed under the MIT license.
You can find me here; also see my channel. I welcome your suggestions and feedback!
Copyright (c) 2025 Bulat Sh. (fuderis)