Website Change Detector: Track HTML & Visual Updates



For: Me, Myself and I

Project Goal

This project aims to monitor web pages for changes, detecting both modifications to the HTML structure and visual differences via screenshots. It provides detailed reports, including an HTML diff, to keep track of website updates.

The scraper is deployed on Apify; check it out here!

How it was built

The core functionality involves several key steps:

  1. HTML Fetching: Uses httpx to retrieve the HTML content of specified URLs.
  2. HTML Signature Generation: Hashes the HTML content after removing dynamic attributes and comments and normalizing whitespace, so that the signature changes only when the page structure does.
  3. Screenshot Capturing: Employs Selenium with a headless Chrome driver to capture full-page screenshots.
  4. Change Detection: Compares the current HTML signature and screenshot against the previous ones to detect modifications, and generates HTML diffs using the htmldiff2 library.
  5. Data Storage: Stores all results, including the URL, change flags, timestamps, HTML diffs, and screenshots, using Apify’s key-value store.
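
The signature and comparison steps can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the list of "dynamic" attributes, the regex-based cleanup, the choice of SHA-256, and the function names html_signature and has_changed are all assumptions made for the example.

```python
import hashlib
import re

# Hypothetical set of attributes treated as "dynamic" (they change on every
# request without the page really changing). The real project may use another list.
DYNAMIC_ATTRS = ("nonce", "data-reactid", "data-timestamp", "csrf-token")

_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
_ATTR_RE = re.compile(
    r"\s+(?:%s)\s*=\s*(?:\"[^\"]*\"|'[^']*'|[^\s>]+)"
    % "|".join(map(re.escape, DYNAMIC_ATTRS)),
    re.IGNORECASE,
)
_WS_RE = re.compile(r"\s+")


def html_signature(html: str) -> str:
    """Return a SHA-256 hex digest of the HTML with volatile parts removed."""
    cleaned = _COMMENT_RE.sub("", html)         # drop HTML comments
    cleaned = _ATTR_RE.sub("", cleaned)         # drop the assumed dynamic attributes
    cleaned = _WS_RE.sub(" ", cleaned).strip()  # collapse runs of whitespace
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()


def has_changed(previous_signature, current_html):
    """Compare against the stored signature; return (changed, new_signature).

    A first run (no stored signature) is reported as unchanged.
    """
    current = html_signature(current_html)
    changed = previous_signature is not None and current != previous_signature
    return changed, current
```

Regex cleanup is only an approximation of HTML parsing; a parser-based approach would be more robust on unusual markup. The returned signature would then be stored alongside the other results so the next run has something to compare against.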

Technologies used

  • Python
  • asyncio
  • httpx
  • Selenium
  • Apify SDK
  • htmldiff2
  • hashlib
