This web page parser library provides asynchronous fetching and extraction of data from web pages in multiple formats.
- Asynchronous web search using the search engines (Google, Bing, Duck, Ecosia, Yahoo, Wiki) with domain blacklisting (feature `search`).
- Creating a custom search engine via the `SearchEngine` trait (feature `search`).
- Reading an HTML document from a URL with a randomized user-agent (`User::random()`).
- Selecting elements by CSS selectors and retrieving their attributes and content.
- Fetching the full page as plain text.
- Fetching and parsing page content as JSON with serde_json support.
This tool is well-suited for web scraping and data extraction tasks, offering flexible parsing of HTML, plain text, and JSON to enable comprehensive data gathering from various web sources.
Requires the chromedriver tool to be installed!
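A sketch of the dependencies used in the examples below (the version numbers are placeholders — check crates.io for current releases; `macron` supplies the `path!` macro and `search` gates the web-search API):

```toml
[dependencies]
# versions are placeholders -- check crates.io for current releases
web_parser = { version = "*", features = ["search"] }
macron = "*"    # provides the path! macro used in the examples
tokio = { version = "1", features = ["full"] }
serde_json = "1"
```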
```rust
use web_parser::prelude::*;
use macron::path;

#[tokio::main]
async fn main() -> Result<()> {
    // WEB SEARCH:
    let chrome_path = path!("bin/chromedriver/chromedriver.exe");
    let session_path = path!("%/ChromeDriver/WebSearch");

    // start search engine:
    let mut engine = SearchEngine::<Duck>::new(
        chrome_path,
        Some(session_path),
        false,
    ).await?;

    println!("Searching results..");

    // send search query:
    let results = engine.search(
        "Rust (programming language)",          // query
        &["support.google.com", "youtube.com"], // domain black list
        1000                                    // sleep in millis
    ).await;

    // handle search results:
    match results {
        Ok(cites) => {
            println!("Result cites list: {:#?}", cites.get_urls());

            /*
            println!("Reading result pages..");

            let contents = cites.read(
                5,  // cites count to read
                &[  // tag name black list
                    "header", "footer", "style", "script", "noscript",
                    "iframe", "button", "img", "svg"
                ]
            ).await?;

            println!("Results: {contents:#?}");
            */
        }
        Err(e) => eprintln!("Search error: {e}")
    }

    // stop search engine:
    engine.stop().await?;

    Ok(())
}
```

```rust
use web_parser::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // READ PAGE AS HTML DOCUMENT:

    // read website page:
    let mut doc = Document::read("https://example.com/", User::random()).await?;

    // select title:
    let title = doc.select("h1")?.expect("No elements found");
    println!("Title: '{}'", title.text());

    // select descriptions:
    let mut descrs = doc.select_all("p")?.expect("No elements found");
    while let Some(descr) = descrs.next() {
        println!("Description: '{}'", descr.text())
    }

    // READ PAGE AS PLAIN TEXT:
    let text: String = Document::text("https://example.com/", User::random()).await?;
    println!("Text: {text}");

    // READ PAGE AS JSON:
    let json: serde_json::Value = Document::json("https://example.com/", User::random())
        .await?
        .expect("Failed to parse JSON");
    println!("Json: {json}");

    Ok(())
}
```

Distributed under the MIT license.
You can find me here; also see my channel. I welcome your suggestions and feedback!
Copyright (c) 2025 Bulat Sh. (fuderis)