reatlat/webchronicle

webChronicle

License: MIT · Node.js · Eleventy

A self-hosted web archiving tool that captures and explores snapshots of webpages over time—like the Wayback Machine, but as your own personal Time Machine.

Live Demo | Blog Post

Features

  • Capture website snapshots — Archive any website with a single command
  • Time travel through history — Browse previous versions of archived pages
  • Self-hosted & private — Your archives stay on your own infrastructure
  • Multi-site support — Archive multiple domains in a single snapshot
  • Recursive scraping — Automatically follow links to capture entire sites
  • Overlay navigation — Injected UI shows archive date and quick navigation

Quick Start

# Clone and install
git clone https://github.com/reatlat/webchronicle.git
cd webchronicle
npm install

# Configure your URLs in webchronicle.config.js, then:
npm run scraper    # Capture snapshots
npm run start      # Start local server at http://localhost:8080

Configuration

Edit webchronicle.config.js to specify which websites to archive:

export default {
  urls: [
    'https://example.com',
    'https://example.org',
  ],
  recursive: true,
  maxRecursiveDepth: 3,
  urlFilter: (url) => {
    return url.startsWith('https://example.com') || url.startsWith('https://example.org');
  },
};

Full configuration options are available in the website-scraper documentation.

Usage

  1. Configure your target URLs in webchronicle.config.js
  2. Run the scraper to capture snapshots:
    npm run scraper
  3. Commit your archives:
    git add ./scraped-websites
    git commit -m "Add website snapshots"
    git push
  4. Deploy or run locally:
    npm run start      # Development
    npm run build      # Production build
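Steps 2 and 3 above can be scripted for unattended runs (e.g. from cron or CI). A minimal sketch; the timestamped commit message is an assumption for illustration, not something webChronicle requires:

```javascript
// Sketch: build the capture-and-commit command sequence from the steps
// above. Stamping the commit message with the run time (an assumption,
// not part of webChronicle) makes each snapshot easy to find in history.
function snapshotCommands(now = new Date()) {
  return [
    'npm run scraper',
    'git add ./scraped-websites',
    `git commit -m "Add website snapshots ${now.toISOString()}"`,
    'git push',
  ];
}
```

To actually run it, feed the array to child_process.execSync (or your CI runner's shell step) one command at a time.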

Snapshot Structure

Each snapshot is stored with a timestamp and organized by domain:

scraped-websites/
├── 2024-12-01T01-41-35/
│   ├── example.com/
│   └── example.org/
└── ledger.json
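The snapshot directory names appear to be ISO-8601 timestamps with `-` in place of `:` for filesystem safety. A small sketch of turning such a name back into a Date for sorting or filtering, assuming the YYYY-MM-DDTHH-MM-SS pattern shown above and treating the time as UTC (an assumption, since the README does not state the timezone):

```javascript
// Sketch: parse a snapshot directory name like "2024-12-01T01-41-35"
// back into a Date. Returns null for non-snapshot entries (e.g. ledger.json).
function snapshotDate(dirName) {
  const m = dirName.match(/^(\d{4}-\d{2}-\d{2})T(\d{2})-(\d{2})-(\d{2})$/);
  if (!m) return null;
  // Reassemble an ISO string; the trailing "Z" assumes UTC timestamps.
  return new Date(`${m[1]}T${m[2]}:${m[3]}:${m[4]}Z`);
}
```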

Deployment

Deploy to your preferred platform:

  • Deploy to Netlify
  • Deploy to Vercel

Also compatible with Cloudflare Pages, AWS, Heroku, and Google Cloud.

Tech Stack

Built with Node.js and Eleventy; snapshot capture is powered by website-scraper.

Contributing

  1. Fork this repo
  2. Clone: git clone git@github.com:YOUR_USERNAME/webchronicle.git
  3. Create your feature branch: git checkout -b my-new-feature
  4. Commit your changes: git commit -am 'Add some feature'
  5. Push to the branch: git push origin my-new-feature
  6. Create a Pull Request

Found a bug? Open an issue.

Credits

Special thanks to James Dancer for the inspiration behind the name.

Logo design by Tatiana Zappa.

License

This project is open source and available under the MIT License.
