A super basic web crawler prototype in Haskell.

## Features

- Fetches web pages via HTTP
- Extracts links from HTML
- Tracks visited URLs to avoid duplicates
- Depth-limited crawling
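The visited-set tracking and depth limit above can be sketched as a pure worklist traversal. This is a minimal illustration over a hypothetical in-memory link graph (`sampleWeb` is made up for the example), not the project's actual `Main.hs`, which fetches real pages:

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical in-memory "web": URL -> outgoing links (assumption for the demo).
type Web = Map.Map String [String]

-- Breadth-first crawl that skips already-visited URLs and stops past maxDepth.
crawl :: Web -> Int -> String -> Set.Set String
crawl web maxDepth start = go Set.empty [(start, 0)]
  where
    go visited [] = visited
    go visited ((url, depth) : rest)
      | url `Set.member` visited || depth > maxDepth = go visited rest
      | otherwise =
          let links = Map.findWithDefault [] url web
              next  = [(link, depth + 1) | link <- links]
          in go (Set.insert url visited) (rest ++ next)

sampleWeb :: Web
sampleWeb = Map.fromList
  [ ("https://example.com",   ["https://example.com/a", "https://example.com/b"])
  , ("https://example.com/a", ["https://example.com/c"])
  ]

main :: IO ()
main = print (Set.toList (crawl sampleWeb 1 "https://example.com"))
-- With depth limit 1, /c (reachable only at depth 2) is never visited.
```

The real crawler replaces the `Map` lookup with an HTTP fetch plus link extraction, but the duplicate-avoidance and depth-cutoff logic is the same.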
## Building and running

With Cabal:

```
cabal build
cabal run
```

Or with Stack:

```
stack build
stack run
```

When you run the program, it prompts for a starting URL. Enter any valid HTTP/HTTPS URL and it will crawl up to depth 2 (configurable in `Main.hs`).
Example:

```
Enter starting URL:
https://example.com
```
## Dependencies

- `http-conduit`: HTTP client
- `tagsoup`: HTML parsing
- `containers`: Set data structure for tracking visited URLs
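To show how `tagsoup` fits in, here is a minimal sketch of link extraction: parse an HTML fragment into tags and keep the `href` of each `<a>` element. The function name `extractLinks` is illustrative, not necessarily what the project's source calls it:

```haskell
import Text.HTML.TagSoup

-- Pull every non-empty href out of the <a> tags in an HTML fragment.
-- fromAttrib returns "" when the attribute is absent, so we filter those out.
extractLinks :: String -> [String]
extractLinks html =
  [ fromAttrib "href" tag
  | tag@(TagOpen "a" _) <- parseTags html
  , not (null (fromAttrib "href" tag))
  ]

main :: IO ()
main = print (extractLinks "<p><a href=\"https://example.com\">home</a><a>no href</a></p>")
-- prints ["https://example.com"]
```

A real crawler would additionally resolve relative URLs against the page's base URL before adding them to the work queue.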